nntm.datasets package

Module contents

nntm.datasets.fetch_numerai_example_predictions(*args, **kwargs)

Load the Numerai example predictions.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_y (bool, default=False) – If True, returns only the prediction instead of a Bunch object.
as_frame (bool, default=False) – If True, prediction and id are pandas Series, and the returned Bunch also has a frame attribute.
round_num (int, default=None) – Prediction round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

prediction ({ndarray, Series} of shape (n_samples,)) – When as_frame=True, prediction is a pandas Series.
id ({ndarray, Series} of shape (n_samples,)) – id of each prediction row. When as_frame=True, id is a pandas Series.
round_num (int) – Round number of the predictions.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with prediction.
y ({ndarray, Series} of shape (n_samples,), if return_y=True) – Only present when return_y=True. When as_frame=True, y is a pandas Series.

Return type:

Bunch

Notes

Data changes weekly.
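
The return value is described as a dictionary-like Bunch. As a minimal sketch of what that means for calling code, the stand-in class below exposes the documented fields both as keys and as attributes. The class is a hypothetical illustration, not the nntm or scikit-learn implementation, and the ids, predictions and round number are made up.

```python
class Bunch(dict):
    """Hypothetical stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


# Made-up values shaped like the documented fields (as_frame=False).
dataset = Bunch(
    prediction=[0.48, 0.52, 0.50],
    id=["id_a", "id_b", "id_c"],
    round_num=123,
)

# Attribute access and key access return the same object.
same_object = dataset.prediction is dataset["prediction"]

# 'frame' is documented as present only when as_frame=True.
has_frame = "frame" in dataset
```

Checking for optional fields such as frame with the in operator avoids an AttributeError when as_frame was left at its default.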

nntm.datasets.fetch_numerai_example_validation_predictions(*args, **kwargs)

Load the Numerai example validation predictions.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_y (bool, default=False) – If True, returns only the prediction instead of a Bunch object.
as_frame (bool, default=False) – If True, prediction and id are pandas Series, and the returned Bunch also has a frame attribute.

Returns:

dataset – Dictionary-like object, with the following attributes.

prediction ({ndarray, Series} of shape (539658,)) – When as_frame=True, prediction is a pandas Series.
id ({ndarray, Series} of shape (539658,)) – id of each prediction row. When as_frame=True, id is a pandas Series.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with prediction.
y ({ndarray, Series} of shape (539658,), if return_y=True) – Only present when return_y=True. When as_frame=True, y is a pandas Series.

Return type:

Bunch

nntm.datasets.fetch_numerai_feature_metadata(*, data_home=None, download_if_missing=True, keep=False)

Load the Numerai feature metadata.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the feature metadata. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

Returns:

feature_metadata – Dictionary with the following keys.

feature_sets (dict) – Dictionary containing lists of feature names as values.
feature_stats (dict) – Dictionary with feature names as keys and a dictionary of different statistics about each feature as values.

Return type:

dict

References

https://forum.numer.ai/t/october-2021-updates/4384
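
One possible use of the metadata, sketched here under assumptions, is to turn a feature set into a columns list for the dataset loaders. The feature-set names, feature names and statistic name below are made up; the real keys come from the downloaded metadata. Adding "target" explicitly is also an assumption, based on the loaders naming only index, era and data_type as columns that are always read.

```python
# Hypothetical metadata with the two documented keys; contents are made up.
feature_metadata = {
    "feature_sets": {
        "small": ["feature_a", "feature_b"],
        "all": ["feature_a", "feature_b", "feature_c"],
    },
    "feature_stats": {
        "feature_a": {"example_stat": 0.01},
        "feature_b": {"example_stat": 0.02},
        "feature_c": {"example_stat": 0.03},
    },
}


def columns_for(metadata, set_name, extra=()):
    """Return the feature names of one feature set plus any extra columns."""
    features = metadata["feature_sets"][set_name]
    return list(features) + list(extra)


# Assumption: the target column has to be requested explicitly.
columns = columns_for(feature_metadata, "small", extra=("target",))
print(columns)  # ['feature_a', 'feature_b', 'target']

# With nntm installed, the list could then be passed on, for example:
# fetch_numerai_training(columns=columns)
```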

nntm.datasets.fetch_numerai_live(*args, **kwargs)

Load the Numerai live dataset.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also has a frame attribute.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
round_num (int, default=None) – Tournament round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

data ({ndarray, DataFrame} of shape (n_samples, 1050)) – Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
target ({ndarray, Series} of shape (n_samples,)) – When as_frame=True, target is a pandas Series.
targets ({ndarray, DataFrame} of shape (n_samples, 21)) – When as_frame=True, targets is a pandas DataFrame.
target_<name> ({ndarray, Series} of shape (n_samples,)) – See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
id ({ndarray, Series} of shape (n_samples,)) – id of each row in data. When as_frame=True, id is a pandas Series.
era ({ndarray, Series} of shape (n_samples,)) – era of each row in data. When as_frame=True, era is a pandas Series.
data_type ({ndarray, Series} of shape (n_samples,)) – data_type of each row in data. When as_frame=True, data_type is a pandas Series.
feature_names (list of length 1050) – List of ordered feature names used in the dataset.
target_names (list of length 21) – List of ordered target names used in the dataset.
int8 (bool) – True when features use the int8 data type.
DESCR (string) – Description of the dataset.
round_num (int) – Round number of the dataset.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
(data, target) (tuple, if return_X_y=True) – Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch

Notes

Data changes weekly.

nntm.datasets.fetch_numerai_test(*args, **kwargs)

Load the Numerai test dataset.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also has a frame attribute.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data ({ndarray, DataFrame} of shape (1407586, 1050)) – Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
target ({ndarray, Series} of shape (1407586,)) – When as_frame=True, target is a pandas Series.
targets ({ndarray, DataFrame} of shape (1407586, 21)) – When as_frame=True, targets is a pandas DataFrame.
target_<name> ({ndarray, Series} of shape (1407586,)) – See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
id ({ndarray, Series} of shape (1407586,)) – id of each row in data. When as_frame=True, id is a pandas Series.
era ({ndarray, Series} of shape (1407586,)) – era of each row in data. When as_frame=True, era is a pandas Series.
data_type ({ndarray, Series} of shape (1407586,)) – data_type of each row in data. When as_frame=True, data_type is a pandas Series.
feature_names (list of length 1050) – List of ordered feature names used in the dataset.
target_names (list of length 21) – List of ordered target names used in the dataset.
int8 (bool) – True when features use the int8 data type.
DESCR (string) – Description of the dataset.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
(data, target) (tuple, if return_X_y=True) – Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch
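
The download_if_missing flag lends itself to an offline-first pattern: try the local cache, and download only when that fails. The sketch below takes the loader as an argument so that it runs without nntm installed. Both fetch_offline_first and fake_fetch are hypothetical names; fake_fetch only mimics the documented IOError behaviour and returns made-up data.

```python
def fetch_offline_first(fetch, **kwargs):
    """Try the local cache first; fall back to downloading on IOError."""
    try:
        return fetch(download_if_missing=False, **kwargs), "cache"
    except IOError:  # IOError is an alias of OSError in Python 3
        return fetch(download_if_missing=True, **kwargs), "download"


def fake_fetch(download_if_missing=True, **kwargs):
    """Hypothetical loader whose data is never available locally."""
    if not download_if_missing:
        raise IOError("data not found locally")
    return {"data": [[0, 1], [2, 3]]}


dataset, source = fetch_offline_first(fake_fetch)
print(source)  # download
```

With nntm installed, a loader such as fetch_numerai_test could presumably be passed in place of fake_fetch, since it accepts the same download_if_missing keyword.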

nntm.datasets.fetch_numerai_tournament(*args, **kwargs)

Load the Numerai tournament dataset.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also has a frame attribute.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
round_num (int, default=None) – Tournament round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

data ({ndarray, DataFrame} of shape (n_samples, 1050)) – Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
target ({ndarray, Series} of shape (n_samples,)) – When as_frame=True, target is a pandas Series.
targets ({ndarray, DataFrame} of shape (n_samples, 21)) – When as_frame=True, targets is a pandas DataFrame.
target_<name> ({ndarray, Series} of shape (n_samples,)) – See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
id ({ndarray, Series} of shape (n_samples,)) – id of each row in data. When as_frame=True, id is a pandas Series.
era ({ndarray, Series} of shape (n_samples,)) – era of each row in data. When as_frame=True, era is a pandas Series.
data_type ({ndarray, Series} of shape (n_samples,)) – data_type of each row in data. When as_frame=True, data_type is a pandas Series.
feature_names (list of length 1050) – List of ordered feature names used in the dataset.
target_names (list of length 21) – List of ordered target names used in the dataset.
int8 (bool) – True when features use the int8 data type.
DESCR (string) – Description of the dataset.
round_num (int) – Round number of the dataset.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
(data, target) (tuple, if return_X_y=True) – Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch

Notes

Data changes weekly.
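
How target, targets, target_names and the target parameter relate can be illustrated with plain lists. This is a sketch of the documented relationship, not nntm's implementation: select_target is a hypothetical helper, the target names and values are made up, and only 2 targets are used instead of 21.

```python
# Made-up stand-ins for dataset.target_names and dataset.targets.
target_names = ["target", "target_example_20"]
targets = [
    [0.25, 0.50],
    [0.75, 0.50],
    [0.50, 1.00],
]


def select_target(targets, target_names, name="target"):
    """Return the column of `targets` whose name is `name`."""
    col = target_names.index(name)  # raises ValueError for unknown names
    return [row[col] for row in targets]


y_default = select_target(targets, target_names)
y_custom = select_target(targets, target_names, "target_example_20")
print(y_default)  # [0.25, 0.75, 0.5]
print(y_custom)   # [0.5, 0.5, 1.0]
```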

nntm.datasets.fetch_numerai_training(*args, **kwargs)

Load the Numerai training dataset.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also has a frame attribute.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data ({ndarray, DataFrame} of shape (2412105, 1050)) – Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
target ({ndarray, Series} of shape (2412105,)) – When as_frame=True, target is a pandas Series.
targets ({ndarray, DataFrame} of shape (2412105, 21)) – When as_frame=True, targets is a pandas DataFrame.
target_<name> ({ndarray, Series} of shape (2412105,)) – See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
id ({ndarray, Series} of shape (2412105,)) – id of each row in data. When as_frame=True, id is a pandas Series.
era ({ndarray, Series} of shape (2412105,)) – era of each row in data. When as_frame=True, era is a pandas Series.
data_type ({ndarray, Series} of shape (2412105,)) – data_type of each row in data. When as_frame=True, data_type is a pandas Series.
feature_names (list of length 1050) – List of ordered feature names used in the dataset.
target_names (list of length 21) – List of ordered target names used in the dataset.
int8 (bool) – True when features use the int8 data type.
DESCR (string) – Description of the dataset.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
(data, target) (tuple, if return_X_y=True) – Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch
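
The effect of the int8 option on memory follows from the documented shape. The back-of-the-envelope calculation below assumes a dense feature array with 1 byte per int8 value and 4 bytes per float32 value, and ignores targets, ids and any container overhead.

```python
n_samples, n_features = 2_412_105, 1_050  # documented training data shape

int8_bytes = n_samples * n_features * 1     # 1 byte per int8 value
float32_bytes = n_samples * n_features * 4  # 4 bytes per float32 value

gib = 1024 ** 3
print(f"int8:    {int8_bytes / gib:.2f} GiB")
print(f"float32: {float32_bytes / gib:.2f} GiB")
```

Under these assumptions the features take roughly 2.4 GiB as int8 and roughly 9.4 GiB as float32, a factor of four.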

nntm.datasets.fetch_numerai_validation(*args, **kwargs)

Load the Numerai validation dataset.

Parameters:

data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also has a frame attribute.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data ({ndarray, DataFrame} of shape (539658, 1050)) – Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
target ({ndarray, Series} of shape (539658,)) – When as_frame=True, target is a pandas Series.
targets ({ndarray, DataFrame} of shape (539658, 21)) – When as_frame=True, targets is a pandas DataFrame.
target_<name> ({ndarray, Series} of shape (539658,)) – See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
id ({ndarray, Series} of shape (539658,)) – id of each row in data. When as_frame=True, id is a pandas Series.
era ({ndarray, Series} of shape (539658,)) – era of each row in data. When as_frame=True, era is a pandas Series.
data_type ({ndarray, Series} of shape (539658,)) – data_type of each row in data. When as_frame=True, data_type is a pandas Series.
feature_names (list of length 1050) – List of ordered feature names used in the dataset.
target_names (list of length 21) – List of ordered target names used in the dataset.
int8 (bool) – True when features use the int8 data type.
DESCR (string) – Description of the dataset.
frame (DataFrame, if as_frame=True) – Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
(data, target) (tuple, if return_X_y=True) – Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch

nntm.datasets.submit_numerai_tournament(prediction, model_id=None, public_id=None, secret_key=None, data_home=None, keep=False, version=None)

Submit predictions to the Numerai main tournament for the current round.

Parameters:

prediction ({list, Series}) – Predicted values. Must be in the same order as the example predictions.
model_id (str, default=None) – Target model UUID. Required for accounts with multiple models. See https://numer.ai/models
public_id (str, default=None) – ID of an API key. Needs Upload submissions scope. See https://numer.ai/account -> AUTOMATION.
secret_key (str, default=None) – Secret of an API key. Needs Upload submissions scope. See https://numer.ai/account -> AUTOMATION.
data_home (str, default=None) – Specify another download and cache folder for the predictions. By default all data is stored in ~/scikit_learn_data subfolders.
keep (bool, default=False) – If True, does not remove the prediction csv file from disk after uploading it.
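
Because prediction must follow the row order of the example predictions, values produced in a different order need to be realigned before submission. A minimal sketch with made-up ids and values, assuming the reference order is taken from the id attribute returned by fetch_numerai_example_predictions; align_to_example is a hypothetical helper, not part of nntm.

```python
def align_to_example(example_ids, ids, predictions):
    """Reorder `predictions` so they follow the order of `example_ids`."""
    by_id = dict(zip(ids, predictions))
    missing = [i for i in example_ids if i not in by_id]
    if missing:
        raise ValueError(f"{len(missing)} ids have no prediction")
    return [by_id[i] for i in example_ids]


example_ids = ["id_a", "id_b", "id_c"]  # made-up example order
my_ids = ["id_c", "id_a", "id_b"]       # order produced by a model
my_predictions = [0.9, 0.1, 0.5]

aligned = align_to_example(example_ids, my_ids, my_predictions)
print(aligned)  # [0.1, 0.5, 0.9]
```

Raising on missing ids, rather than silently dropping rows, keeps the aligned list the same length as the example predictions.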