nntm.datasets package

Module contents

nntm.datasets.fetch_numerai_example_predictions(*args, **kwargs)

Load the Numerai example predictions.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_y (bool, default=False) – If True, returns prediction instead of a Bunch object.

  • as_frame (bool, default=False) – If True, prediction and id are pandas Series, and the returned Bunch also includes a frame attribute.

  • round_num (int, default=None) – Prediction round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

prediction : {ndarray, Series} of shape (n_samples,)

When as_frame=True, prediction is a pandas Series.

id : {ndarray, Series} of shape (n_samples,)

id of each prediction row. When as_frame=True, id is a pandas Series.

round_num : int

Round number of the predictions.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with prediction.

y : {ndarray, Series} of shape (n_samples,) if return_y=True

Only present when return_y=True. When as_frame=True, y is a pandas Series.

Return type:

Bunch

Notes

Data changes weekly.
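As a rough usage sketch, the helper below pairs each id with its example prediction. The fetch argument is a hypothetical injection point added for illustration, so the helper can be tried with a stand-in and without network access; when it is omitted, the helper falls back to fetch_numerai_example_predictions with the parameters documented above. Key access on the returned object assumes the Bunch behaves like a dictionary, as the Returns section describes.

```python
def example_predictions_by_id(fetch=None, round_num=None):
    """Return a dict mapping each row id to its example prediction."""
    if fetch is None:
        # Imported lazily so the helper also works with a stand-in fetcher.
        from nntm.datasets import fetch_numerai_example_predictions as fetch
    # as_frame keeps its default, so id and prediction are plain array-likes.
    dataset = fetch(round_num=round_num)
    return dict(zip(dataset["id"], dataset["prediction"]))
```

Calling example_predictions_by_id() with no arguments would request the current round, which means the result can differ from week to week.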

nntm.datasets.fetch_numerai_example_validation_predictions(*args, **kwargs)

Load the Numerai example validation predictions.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_y (bool, default=False) – If True, returns prediction instead of a Bunch object.

  • as_frame (bool, default=False) – If True, prediction and id are pandas Series, and the returned Bunch also includes a frame attribute.

Returns:

dataset – Dictionary-like object, with the following attributes.

prediction : {ndarray, Series} of shape (539658,)

When as_frame=True, prediction is a pandas Series.

id : {ndarray, Series} of shape (539658,)

id of each prediction row. When as_frame=True, id is a pandas Series.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with prediction.

y : {ndarray, Series} of shape (539658,) if return_y=True

Only present when return_y=True. When as_frame=True, y is a pandas Series.

Return type:

Bunch

nntm.datasets.fetch_numerai_feature_metadata(*, data_home=None, download_if_missing=True, keep=False)

Load the Numerai feature metadata.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the feature metadata. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

Returns:

feature_metadata – Dictionary with the following keys.

feature_sets : dict

Dictionary containing lists of feature names as values.

feature_stats : dict

Dictionary with feature names as keys and a dictionary of different statistics about each feature as values.

Return type:

dict

References

https://forum.numer.ai/t/october-2021-updates/4384
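One possible use of the feature_sets entry is to pick the feature names of a single set. The sketch below runs on a synthetic stand-in for the returned dictionary; the set name "tiny" and the feature names are made up for illustration and are not taken from the real metadata.

```python
def columns_for_feature_set(feature_metadata, set_name):
    """Return the feature names of one feature set as a new list."""
    feature_sets = feature_metadata["feature_sets"]
    if set_name not in feature_sets:
        available = ", ".join(sorted(feature_sets))
        raise KeyError(f"unknown feature set {set_name!r}; available: {available}")
    return list(feature_sets[set_name])


# Synthetic stand-in with the two documented keys; all names are hypothetical.
metadata = {
    "feature_sets": {"tiny": ["feature_a", "feature_b"]},
    "feature_stats": {"feature_a": {}, "feature_b": {}},
}
columns = columns_for_feature_set(metadata, "tiny")
```

A list like this could then be passed as the columns argument of the dataset fetchers in this module. Whether target columns also have to be listed there is not stated in this reference, so that is worth checking before relying on it.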

nntm.datasets.fetch_numerai_live(*args, **kwargs)

Load the Numerai live dataset.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.

  • target (str, default="target") – Target column to return as y when return_X_y=True.

  • as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also includes a frame attribute.

  • columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.

  • int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

  • round_num (int, default=None) – Tournament round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

data : {ndarray, DataFrame} of shape (n_samples, 1050)

Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.

target : {ndarray, Series} of shape (n_samples,)

When as_frame=True, target is a pandas Series.

targets : {ndarray, DataFrame} of shape (n_samples, 21)

When as_frame=True, targets is a pandas DataFrame.

target_<name> : {ndarray, Series} of shape (n_samples,)

See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.

id : {ndarray, Series} of shape (n_samples,)

id of each row in data. When as_frame=True, id is a pandas Series.

era : {ndarray, Series} of shape (n_samples,)

era of each row in data. When as_frame=True, era is a pandas Series.

data_type : {ndarray, Series} of shape (n_samples,)

data_type of each row in data. When as_frame=True, data_type is a pandas Series.

feature_names : list of length 1050

List of ordered feature names used in the dataset.

target_names : list of length 21

List of ordered target names used in the dataset.

int8 : bool

True when features use int8 data type.

DESCR : string

Description of the dataset.

round_num : int

Round number of the dataset.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.

(data, target) : tuple if return_X_y=True

Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch

Notes

Data changes weekly.

nntm.datasets.fetch_numerai_test(*args, **kwargs)

Load the Numerai test dataset.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.

  • target (str, default="target") – Target column to return as y when return_X_y=True.

  • as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also includes a frame attribute.

  • columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.

  • int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data : {ndarray, DataFrame} of shape (1407586, 1050)

Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.

target : {ndarray, Series} of shape (1407586,)

When as_frame=True, target is a pandas Series.

targets : {ndarray, DataFrame} of shape (1407586, 21)

When as_frame=True, targets is a pandas DataFrame.

target_<name> : {ndarray, Series} of shape (1407586,)

See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.

id : {ndarray, Series} of shape (1407586,)

id of each row in data. When as_frame=True, id is a pandas Series.

era : {ndarray, Series} of shape (1407586,)

era of each row in data. When as_frame=True, era is a pandas Series.

data_type : {ndarray, Series} of shape (1407586,)

data_type of each row in data. When as_frame=True, data_type is a pandas Series.

feature_names : list of length 1050

List of ordered feature names used in the dataset.

target_names : list of length 21

List of ordered target names used in the dataset.

int8 : bool

True when features use int8 data type.

DESCR : string

Description of the dataset.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.

(data, target) : tuple if return_X_y=True

Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch
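The download_if_missing parameter makes it possible to work from the local cache only. The helper below is a minimal sketch of that pattern; fetch_cached_only is a hypothetical name, and fetch stands for any of the fetchers in this module. In Python 3, IOError is an alias of OSError, so catching OSError covers the documented IOError.

```python
def fetch_cached_only(fetch, **kwargs):
    """Call a fetcher without downloading; return None if nothing is cached."""
    try:
        return fetch(download_if_missing=False, **kwargs)
    except OSError:
        # Raised by the fetcher when the data is not available locally.
        return None
```

For example, fetch_cached_only(fetch_numerai_test, as_frame=True) would return the cached test dataset, or None when it has not been downloaded yet.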

nntm.datasets.fetch_numerai_tournament(*args, **kwargs)

Load the Numerai tournament dataset.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.

  • target (str, default="target") – Target column to return as y when return_X_y=True.

  • as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also includes a frame attribute.

  • columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.

  • int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

  • round_num (int, default=None) – Tournament round to download. If None, the current round is downloaded.

Returns:

dataset – Dictionary-like object, with the following attributes.

data : {ndarray, DataFrame} of shape (n_samples, 1050)

Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.

target : {ndarray, Series} of shape (n_samples,)

When as_frame=True, target is a pandas Series.

targets : {ndarray, DataFrame} of shape (n_samples, 21)

When as_frame=True, targets is a pandas DataFrame.

target_<name> : {ndarray, Series} of shape (n_samples,)

See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.

id : {ndarray, Series} of shape (n_samples,)

id of each row in data. When as_frame=True, id is a pandas Series.

era : {ndarray, Series} of shape (n_samples,)

era of each row in data. When as_frame=True, era is a pandas Series.

data_type : {ndarray, Series} of shape (n_samples,)

data_type of each row in data. When as_frame=True, data_type is a pandas Series.

feature_names : list of length 1050

List of ordered feature names used in the dataset.

target_names : list of length 21

List of ordered target names used in the dataset.

int8 : bool

True when features use int8 data type.

DESCR : string

Description of the dataset.

round_num : int

Round number of the dataset.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.

(data, target) : tuple if return_X_y=True

Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch

Notes

Data changes weekly.

nntm.datasets.fetch_numerai_training(*args, **kwargs)

Load the Numerai training dataset.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.

  • target (str, default="target") – Target column to return as y when return_X_y=True.

  • as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also includes a frame attribute.

  • columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.

  • int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data : {ndarray, DataFrame} of shape (2412105, 1050)

Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.

target : {ndarray, Series} of shape (2412105,)

When as_frame=True, target is a pandas Series.

targets : {ndarray, DataFrame} of shape (2412105, 21)

When as_frame=True, targets is a pandas DataFrame.

target_<name> : {ndarray, Series} of shape (2412105,)

See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.

id : {ndarray, Series} of shape (2412105,)

id of each row in data. When as_frame=True, id is a pandas Series.

era : {ndarray, Series} of shape (2412105,)

era of each row in data. When as_frame=True, era is a pandas Series.

data_type : {ndarray, Series} of shape (2412105,)

data_type of each row in data. When as_frame=True, data_type is a pandas Series.

feature_names : list of length 1050

List of ordered feature names used in the dataset.

target_names : list of length 21

List of ordered target names used in the dataset.

int8 : bool

True when features use int8 data type.

DESCR : string

Description of the dataset.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.

(data, target) : tuple if return_X_y=True

Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch
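To get a feel for what the int8 parameter means at this size, the arithmetic below estimates the raw size of the training arrays from the shapes listed above. It counts only the values themselves (1 byte per int8, 4 bytes per float32) and ignores any pandas or container overhead, so actual memory use is likely higher.

```python
N_SAMPLES = 2412105  # rows in the training data, per the shapes above
N_FEATURES = 1050    # feature columns
N_TARGETS = 21       # target columns, always float32


def array_bytes(n_rows, n_cols, bytes_per_value):
    """Raw size in bytes of a dense n_rows x n_cols array."""
    return n_rows * n_cols * bytes_per_value


features_int8 = array_bytes(N_SAMPLES, N_FEATURES, 1)
features_float32 = array_bytes(N_SAMPLES, N_FEATURES, 4)
targets_float32 = array_bytes(N_SAMPLES, N_TARGETS, 4)

GIB = 1024 ** 3
print(f"features as int8:    {features_int8 / GIB:.2f} GiB")
print(f"features as float32: {features_float32 / GIB:.2f} GiB")
print(f"targets as float32:  {targets_float32 / GIB:.2f} GiB")
```

The feature matrix is four times larger as float32 than as int8, which is why keeping int8=True and restricting columns can matter on machines with limited memory.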

nntm.datasets.fetch_numerai_validation(*args, **kwargs)

Load the Numerai validation dataset.

Parameters:
  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.

  • keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.

  • target (str, default="target") – Target column to return as y when return_X_y=True.

  • as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. The returned Bunch also includes a frame attribute.

  • columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.

  • int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.

Returns:

dataset – Dictionary-like object, with the following attributes.

data : {ndarray, DataFrame} of shape (539658, 1050)

Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.

target : {ndarray, Series} of shape (539658,)

When as_frame=True, target is a pandas Series.

targets : {ndarray, DataFrame} of shape (539658, 21)

When as_frame=True, targets is a pandas DataFrame.

target_<name> : {ndarray, Series} of shape (539658,)

See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.

id : {ndarray, Series} of shape (539658,)

id of each row in data. When as_frame=True, id is a pandas Series.

era : {ndarray, Series} of shape (539658,)

era of each row in data. When as_frame=True, era is a pandas Series.

data_type : {ndarray, Series} of shape (539658,)

data_type of each row in data. When as_frame=True, data_type is a pandas Series.

feature_names : list of length 1050

List of ordered feature names used in the dataset.

target_names : list of length 21

List of ordered target names used in the dataset.

int8 : bool

True when features use int8 data type.

DESCR : string

Description of the dataset.

frame : DataFrame if as_frame=True

Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.

(data, target) : tuple if return_X_y=True

Only present when return_X_y=True. target corresponds to the column set by the target parameter.

Return type:

Bunch
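Because era is returned per row, one way to use it is to group row positions by era, for example to score validation predictions one era at a time. The helper below is a sketch that accepts any iterable of era labels; the labels in the usage line are made up, and the real label format is not specified in this reference.

```python
from collections import defaultdict


def row_indices_by_era(era):
    """Group row positions by era label, keeping eras in first-seen order."""
    groups = defaultdict(list)
    for position, label in enumerate(era):
        groups[label].append(position)
    return dict(groups)


# Hypothetical era labels for three rows.
groups = row_indices_by_era(["0001", "0001", "0002"])
```

With as_frame=False, the index lists can be used directly to select rows from data, target or a prediction array.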

nntm.datasets.submit_numerai_tournament(prediction, model_id=None, public_id=None, secret_key=None, data_home=None, keep=False, version=None)

Submit a Numerai main tournament prediction for the current round.

Parameters:
  • prediction ({list, Series}) – Predicted values. Must be in the same order as the example predictions.

  • model_id (str, default=None) – Target model UUID. Required for accounts with multiple models. See https://numer.ai/models

  • public_id (str, default=None) – ID of an API key. Needs Upload submissions scope. See https://numer.ai/account -> AUTOMATION.

  • secret_key (str, default=None) – Secret of an API key. Needs Upload submissions scope. See https://numer.ai/account -> AUTOMATION.

  • data_home (str, default=None) – Specify another download and cache folder for the predictions. By default all data is stored in ~/scikit_learn_data subfolders.

  • keep (bool, default=False) – If True, does not remove the prediction csv file from disk after uploading it.
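
Since prediction has to follow the order of the example predictions, a small alignment step before submitting can help. The sketch below assumes predictions are held in a dict keyed by row id and that example_ids is the id sequence from fetch_numerai_example_predictions; order_like_example is a hypothetical helper, not part of this module.

```python
def order_like_example(prediction_by_id, example_ids):
    """Return predictions as a list ordered like example_ids."""
    missing = [row_id for row_id in example_ids if row_id not in prediction_by_id]
    if missing:
        raise ValueError(
            f"{len(missing)} ids have no prediction, e.g. {missing[0]!r}"
        )
    return [prediction_by_id[row_id] for row_id in example_ids]


# Made-up ids and values to show the reordering.
ordered = order_like_example({"b": 0.7, "a": 0.2}, ["a", "b"])
```

The resulting list could then be passed as prediction to submit_numerai_tournament, together with public_id and secret_key.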