nntm.datasets package
Module contents
- nntm.datasets.fetch_numerai_example_predictions(*args, **kwargs)
Load the Numerai example predictions.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_y (bool, default=False) – If True, returns prediction instead of a Bunch object.
as_frame (bool, default=False) – If True, prediction and id are pandas Series, and a frame attribute is included.
round_num (int, default=None) – Prediction round to download. If None, current round will be downloaded.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- prediction : {ndarray, Series} of shape (n_samples,)
When as_frame=True, prediction is a pandas Series.
- id : {ndarray, Series} of shape (n_samples,)
id of each prediction row. When as_frame=True, id is a pandas Series.
- round_num : int
Round number of the predictions.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with prediction.
- y : {ndarray, Series} of shape (n_samples,) if return_y=True
Only present when return_y=True. When as_frame=True, y is a pandas Series.
- Return type:
Bunch
Notes
Data changes weekly.
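As a rough usage sketch, the id and prediction attributes can be paired into a lookup table. The helper below is hypothetical (it is not part of nntm). It takes the fetcher as an argument so the snippet runs without network access, and it assumes the returned Bunch allows dictionary-style key access.

```python
def predictions_by_id(fetch, round_num=None):
    """Return a {id: prediction} dict from an example-predictions fetcher.

    `fetch` is expected to behave like
    nntm.datasets.fetch_numerai_example_predictions. Assumption: the
    returned Bunch supports key access such as bunch["id"].
    """
    bunch = fetch(round_num=round_num)
    # id and prediction have the same length, one entry per row.
    return dict(zip(bunch["id"], bunch["prediction"]))


# Intended use (downloads data, so it is left commented out):
# from nntm.datasets import fetch_numerai_example_predictions
# lookup = predictions_by_id(fetch_numerai_example_predictions)
```

Passing round_num=None keeps the documented default, which downloads the current round.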
- nntm.datasets.fetch_numerai_example_validation_predictions(*args, **kwargs)
Load the Numerai example validation predictions.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_y (bool, default=False) – If True, returns prediction instead of a Bunch object.
as_frame (bool, default=False) – If True, prediction and id are pandas Series, and a frame attribute is included.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- prediction : {ndarray, Series} of shape (539658,)
When as_frame=True, prediction is a pandas Series.
- id : {ndarray, Series} of shape (539658,)
id of each prediction row. When as_frame=True, id is a pandas Series.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with prediction.
- y : {ndarray, Series} of shape (539658,) if return_y=True
Only present when return_y=True. When as_frame=True, y is a pandas Series.
- Return type:
Bunch
- nntm.datasets.fetch_numerai_feature_metadata(*, data_home=None, download_if_missing=True, keep=False)
Load the Numerai feature metadata.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the feature metadata. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
- Returns:
feature_metadata – Dictionary with the following keys.
- feature_sets : dict
Dictionary containing lists of feature names as values.
- feature_stats : dict
Dictionary with feature names as keys and a dictionary of different statistics about each feature as values.
- Return type:
dict
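One plausible use of feature_sets is to build the list passed to the columns parameter of the dataset fetchers. The sketch below uses a hypothetical helper and a hypothetical set name ("small" is an assumption, not a name confirmed by this page). Whether target columns must also be listed in columns is not stated here, so check that before relying on it.

```python
def columns_for_feature_set(feature_metadata, set_name):
    """Return the feature names of one feature set as a new list.

    `feature_metadata` is the dict described above, with a
    "feature_sets" key mapping set names to lists of feature names.
    """
    feature_sets = feature_metadata["feature_sets"]
    if set_name not in feature_sets:
        available = sorted(feature_sets)
        raise KeyError(f"unknown feature set {set_name!r}; available: {available}")
    # Copy so callers can modify the result without touching the metadata.
    return list(feature_sets[set_name])


# Intended use (downloads data, so it is left commented out):
# from nntm.datasets import fetch_numerai_feature_metadata, fetch_numerai_training
# metadata = fetch_numerai_feature_metadata()
# cols = columns_for_feature_set(metadata, "small")  # "small" is an assumed name
# data = fetch_numerai_training(columns=cols, as_frame=True)
```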
- nntm.datasets.fetch_numerai_live(*args, **kwargs)
Load the Numerai live dataset.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. A frame attribute is also included.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
round_num (int, default=None) – Tournament round to download. If None, current round will be downloaded.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- data : {ndarray, DataFrame} of shape (n_samples, 1050)
Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
- target : {ndarray, Series} of shape (n_samples,)
When as_frame=True, target is a pandas Series.
- targets : {ndarray, DataFrame} of shape (n_samples, 21)
When as_frame=True, targets is a pandas DataFrame.
- target_<name> : {ndarray, Series} of shape (n_samples,)
See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
- id : {ndarray, Series} of shape (n_samples,)
id of each row in data. When as_frame=True, id is a pandas Series.
- era : {ndarray, Series} of shape (n_samples,)
era of each row in data. When as_frame=True, era is a pandas Series.
- data_type : {ndarray, Series} of shape (n_samples,)
data_type of each row in data. When as_frame=True, data_type is a pandas Series.
- feature_names : list of length 1050
List of ordered feature names used in the dataset.
- target_names : list of length 21
List of ordered target names used in the dataset.
- int8 : bool
True when features use int8 data type.
- DESCR : string
Description of the dataset.
- round_num : int
Round number of the dataset.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
- (data, target) : tuple if return_X_y=True
Only present when return_X_y=True. target corresponds to the column set by the target parameter.
- Return type:
Bunch
Notes
Data changes weekly.
- nntm.datasets.fetch_numerai_test(*args, **kwargs)
Load the Numerai test dataset.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. A frame attribute is also included.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- data : {ndarray, DataFrame} of shape (1407586, 1050)
Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
- target : {ndarray, Series} of shape (1407586,)
When as_frame=True, target is a pandas Series.
- targets : {ndarray, DataFrame} of shape (1407586, 21)
When as_frame=True, targets is a pandas DataFrame.
- target_<name> : {ndarray, Series} of shape (1407586,)
See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
- id : {ndarray, Series} of shape (1407586,)
id of each row in data. When as_frame=True, id is a pandas Series.
- era : {ndarray, Series} of shape (1407586,)
era of each row in data. When as_frame=True, era is a pandas Series.
- data_type : {ndarray, Series} of shape (1407586,)
data_type of each row in data. When as_frame=True, data_type is a pandas Series.
- feature_names : list of length 1050
List of ordered feature names used in the dataset.
- target_names : list of length 21
List of ordered target names used in the dataset.
- int8 : bool
True when features use int8 data type.
- DESCR : string
Description of the dataset.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
- (data, target) : tuple if return_X_y=True
Only present when return_X_y=True. target corresponds to the column set by the target parameter.
- Return type:
Bunch
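Because era is returned as one value per row of data, rows can be grouped by era for per-era evaluation. The helper below is a hypothetical sketch, not part of nntm, and the era labels in the usage comment are illustrative only. It works on any iterable of hashable era values.

```python
from collections import defaultdict


def row_indices_by_era(eras):
    """Map each era value to the list of row positions that belong to it."""
    groups = defaultdict(list)
    for index, era in enumerate(eras):
        groups[era].append(index)
    # Return a plain dict so missing eras raise KeyError instead of
    # silently creating empty groups.
    return dict(groups)


# Intended use (downloads data, so it is left commented out):
# from nntm.datasets import fetch_numerai_test
# bunch = fetch_numerai_test()
# groups = row_indices_by_era(bunch["era"])  # assumes key access on the Bunch
```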
- nntm.datasets.fetch_numerai_tournament(*args, **kwargs)
Load the Numerai tournament dataset.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. A frame attribute is also included.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
round_num (int, default=None) – Tournament round to download. If None, current round will be downloaded.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- data : {ndarray, DataFrame} of shape (n_samples, 1050)
Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
- target : {ndarray, Series} of shape (n_samples,)
When as_frame=True, target is a pandas Series.
- targets : {ndarray, DataFrame} of shape (n_samples, 21)
When as_frame=True, targets is a pandas DataFrame.
- target_<name> : {ndarray, Series} of shape (n_samples,)
See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
- id : {ndarray, Series} of shape (n_samples,)
id of each row in data. When as_frame=True, id is a pandas Series.
- era : {ndarray, Series} of shape (n_samples,)
era of each row in data. When as_frame=True, era is a pandas Series.
- data_type : {ndarray, Series} of shape (n_samples,)
data_type of each row in data. When as_frame=True, data_type is a pandas Series.
- feature_names : list of length 1050
List of ordered feature names used in the dataset.
- target_names : list of length 21
List of ordered target names used in the dataset.
- int8 : bool
True when features use int8 data type.
- DESCR : string
Description of the dataset.
- round_num : int
Round number of the dataset.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
- (data, target) : tuple if return_X_y=True
Only present when return_X_y=True. target corresponds to the column set by the target parameter.
- Return type:
Bunch
Notes
Data changes weekly.
- nntm.datasets.fetch_numerai_training(*args, **kwargs)
Load the Numerai training dataset.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. A frame attribute is also included.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- data : {ndarray, DataFrame} of shape (2412105, 1050)
Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
- target : {ndarray, Series} of shape (2412105,)
When as_frame=True, target is a pandas Series.
- targets : {ndarray, DataFrame} of shape (2412105, 21)
When as_frame=True, targets is a pandas DataFrame.
- target_<name> : {ndarray, Series} of shape (2412105,)
See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
- id : {ndarray, Series} of shape (2412105,)
id of each row in data. When as_frame=True, id is a pandas Series.
- era : {ndarray, Series} of shape (2412105,)
era of each row in data. When as_frame=True, era is a pandas Series.
- data_type : {ndarray, Series} of shape (2412105,)
data_type of each row in data. When as_frame=True, data_type is a pandas Series.
- feature_names : list of length 1050
List of ordered feature names used in the dataset.
- target_names : list of length 21
List of ordered target names used in the dataset.
- int8 : bool
True when features use int8 data type.
- DESCR : string
Description of the dataset.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
- (data, target) : tuple if return_X_y=True
Only present when return_X_y=True. target corresponds to the column set by the target parameter.
- Return type:
Bunch
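The int8 option matters mostly for memory. A back-of-the-envelope calculation from the documented training shape (2412105, 1050) shows the difference. This counts only a dense feature matrix and ignores targets, ids and any pandas overhead, so treat the figures as a lower bound rather than a measurement.

```python
N_SAMPLES = 2_412_105   # rows of the training data, from the shape above
N_FEATURES = 1_050      # feature columns, from the shape above
BYTES_INT8 = 1          # one byte per int8 value
BYTES_FLOAT32 = 4       # four bytes per float32 value


def feature_matrix_bytes(n_samples, n_features, bytes_per_value):
    """Size in bytes of a dense n_samples x n_features matrix."""
    return n_samples * n_features * bytes_per_value


int8_bytes = feature_matrix_bytes(N_SAMPLES, N_FEATURES, BYTES_INT8)
float32_bytes = feature_matrix_bytes(N_SAMPLES, N_FEATURES, BYTES_FLOAT32)

print(f"int8:    {int8_bytes:,} bytes ({int8_bytes / 2**30:.2f} GiB)")
print(f"float32: {float32_bytes:,} bytes ({float32_bytes / 2**30:.2f} GiB)")
```

The float32 matrix is exactly four times the size of the int8 one, which is why int8=True is a sensible default on machines with limited RAM.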
- nntm.datasets.fetch_numerai_validation(*args, **kwargs)
Load the Numerai validation dataset.
- Parameters:
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default all data is stored in ~/scikit_learn_data subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source bucket.
keep (bool, default=False) – If True, does not remove the downloaded file from disk after reading it.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object. A custom target column can be selected through the target parameter.
target (str, default="target") – Target column to return as y when return_X_y=True.
as_frame (bool, default=False) – If True, data and targets are pandas DataFrames, and target, target_<name>, id, era and data_type are pandas Series. A frame attribute is also included.
columns (list, default=None) – If not None, only these columns will be read from the file. index, era and data_type columns are always read.
int8 (bool, default=True) – If True, the feature columns will use the int8 data type instead of float32. Target columns are always float32.
- Returns:
dataset – Dictionary-like object, with the following attributes.
- data : {ndarray, DataFrame} of shape (539658, 1050)
Each row holds the feature values in the order given by feature_names. When as_frame=True, data is a pandas DataFrame.
- target : {ndarray, Series} of shape (539658,)
When as_frame=True, target is a pandas Series.
- targets : {ndarray, DataFrame} of shape (539658, 21)
When as_frame=True, targets is a pandas DataFrame.
- target_<name> : {ndarray, Series} of shape (539658,)
See target_names for available targets. When as_frame=True, target_<name> is a pandas Series.
- id : {ndarray, Series} of shape (539658,)
id of each row in data. When as_frame=True, id is a pandas Series.
- era : {ndarray, Series} of shape (539658,)
era of each row in data. When as_frame=True, era is a pandas Series.
- data_type : {ndarray, Series} of shape (539658,)
data_type of each row in data. When as_frame=True, data_type is a pandas Series.
- feature_names : list of length 1050
List of ordered feature names used in the dataset.
- target_names : list of length 21
List of ordered target names used in the dataset.
- int8 : bool
True when features use int8 data type.
- DESCR : string
Description of the dataset.
- frame : DataFrame if as_frame=True
Only present when as_frame=True. Pandas DataFrame with era, data_type, features and targets.
- (data, target) : tuple if return_X_y=True
Only present when return_X_y=True. target corresponds to the column set by the target parameter.
- Return type:
Bunch
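The return_X_y, target, columns and as_frame parameters are shared by the dataset fetchers, so a thin wrapper can load an (X, y) pair from any of them. This is a sketch with a hypothetical helper name. The fetcher is passed in as an argument so the snippet can be exercised without downloading anything.

```python
def load_xy(fetch, target="target", columns=None, as_frame=True):
    """Call an nntm-style dataset fetcher and return its (data, target) tuple.

    `fetch` is expected to accept the keyword arguments documented above,
    for example nntm.datasets.fetch_numerai_validation.
    """
    X, y = fetch(
        return_X_y=True,   # ask for a tuple instead of a Bunch
        target=target,     # column returned as y
        columns=columns,   # None reads every column
        as_frame=as_frame,
    )
    return X, y


# Intended use (downloads data, so it is left commented out):
# from nntm.datasets import fetch_numerai_validation
# X_val, y_val = load_xy(fetch_numerai_validation)
```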
- nntm.datasets.submit_numerai_tournament(prediction, model_id=None, public_id=None, secret_key=None, data_home=None, keep=False, version=None)
Submit a prediction to the Numerai main tournament for the current round.
- Parameters:
prediction ({list, Series}) – Predicted values. Must be in the same order as the example predictions.
model_id (str, default=None) – Target model UUID. Required for accounts with multiple models. See https://numer.ai/models
public_id (str, default=None) – ID of an API key. The key needs the Upload submissions scope. See https://numer.ai/account -> AUTOMATION.
secret_key (str, default=None) – Secret of an API key. The key needs the Upload submissions scope. See https://numer.ai/account -> AUTOMATION.
data_home (str, default=None) – Specify another download and cache folder for the predictions. By default all data is stored in ~/scikit_learn_data subfolders.
keep (bool, default=False) – If True, does not remove the prediction CSV file from disk after uploading it.
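Since prediction must follow the order of the example predictions, it can help to reorder model output by id before submitting. The helper below is a hypothetical sketch, not part of nntm. It assumes predictions are held in a mapping from id to value and that the id sequence comes from the example predictions.

```python
def align_to_example_order(example_ids, predictions_by_id):
    """Return predictions as a list ordered like `example_ids`.

    Raises ValueError if any example id has no prediction, so an
    incomplete submission is caught before upload.
    """
    missing = [row_id for row_id in example_ids if row_id not in predictions_by_id]
    if missing:
        raise ValueError(
            f"{len(missing)} ids have no prediction, for example {missing[0]!r}"
        )
    return [predictions_by_id[row_id] for row_id in example_ids]


# Intended use (network access and API keys required, so it is commented out;
# the credential strings are placeholders):
# from nntm.datasets import (
#     fetch_numerai_example_predictions,
#     submit_numerai_tournament,
# )
# example = fetch_numerai_example_predictions()
# ordered = align_to_example_order(example["id"], my_predictions_by_id)
# submit_numerai_tournament(ordered, public_id="YOUR_PUBLIC_ID",
#                           secret_key="YOUR_SECRET_KEY")
```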