Solid Numerai Pipelines

These details have not been verified by PyPI

Project description

NumerBlox

NumerBlox offers components that help with developing strong Numerai models and inference pipelines. From downloading data to submitting predictions, NumerBlox has you covered.

All components can be used standalone, and all processors are fully compatible with scikit-learn pipelines.

Documentation: crowdcent.github.io/numerblox

1. Installation

Install numerblox from PyPI by running:

pip install numerblox

Alternatively, you can clone this repository and install it in development mode using poetry:

git clone https://github.com/crowdcent/numerblox.git
pip install poetry
cd numerblox
poetry install

Installation without dev dependencies can be done by adding --only main to the poetry install line.

Test your installation using one of the educational notebooks in examples. Good places to start are quickstart.ipynb and numerframe_tutorial.ipynb. Run one of them in your notebook environment to quickly check that the installation succeeded. The documentation contains examples and explanations for each component of NumerBlox.
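
Before opening a notebook, a quick check that the package is visible to your interpreter can save time. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of NumerBlox.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("numerblox")
    if version is None:
        print("numerblox is not installed in this environment")
    else:
        print(f"numerblox {version} is installed")
```

If the check reports that the package is missing, the interpreter running the script is probably not the one that `pip` or `poetry` installed into.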

2. Core functionality

NumerBlox has the following features for both Numerai Classic and Signals:

Data Download: Automated retrieval of Numerai datasets.

NumerFrame: A custom Pandas DataFrame for easier Numerai data manipulation.

Preprocessors: Customizable techniques for data preprocessing.

Target Engineering: Tools for creating new target variables.

Postprocessors: Ensembling, neutralization, and penalization.

MetaPipeline: An era-aware pipeline extension of scikit-learn's Pipeline. Specifically designed to integrate with era-specific Postprocessors such as neutralization and ensembling. Can be optionally bypassed for custom implementations.

MetaEstimators: Era-aware estimators that extend scikit-learn's functionality. Includes features like CrossValEstimator, which allows era-specific fitting over multiple folds, seamlessly integrated into the pipeline.

Evaluation: Comprehensive metrics aligned with Numerai's evaluation criteria.

Submitters: Facilitates secure and easy submission of predictions.

Example notebooks for each of these components can be found in the examples directory. Also check out the documentation for more information.
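
To make "era-aware" concrete: every row of Numerai data belongs to an era, and time-ordered validation is usually done over whole eras so that no era is split between train and test. The sketch below illustrates that idea in plain Python. It is not NumerBlox code; `era_time_series_splits` is a hypothetical helper, and it assumes era labels sort chronologically as strings (for example zero-padded labels such as "0001").

```python
from typing import Iterator, List, Sequence, Tuple


def era_time_series_splits(
    eras: Sequence[str], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) row indices for expanding-window folds over eras.

    Every era lands entirely in train or entirely in test, and the test eras
    of a fold always come after all of its train eras.
    """
    unique_eras = sorted(set(eras))
    if n_splits < 1 or len(unique_eras) < n_splits + 1:
        raise ValueError("need at least n_splits + 1 distinct eras")
    fold_size = len(unique_eras) // (n_splits + 1)
    for fold in range(1, n_splits + 1):
        train_eras = set(unique_eras[: fold * fold_size])
        test_eras = set(unique_eras[fold * fold_size : (fold + 1) * fold_size])
        train_idx = [i for i, era in enumerate(eras) if era in train_eras]
        test_idx = [i for i, era in enumerate(eras) if era in test_eras]
        yield train_idx, test_idx


# Toy data: three eras with two rows each.
eras = ["0001", "0001", "0002", "0002", "0003", "0003"]
for train_idx, test_idx in era_time_series_splits(eras, n_splits=2):
    print("train rows:", train_idx, "test rows:", test_idx)
```

Splitting on eras rather than rows keeps rows from the same era out of both sides of a fold, which a plain row-based split does not guarantee.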

3. Quick Start

Below are two examples of how NumerBlox can be used to train and do inference on Numerai data. For a full overview of all components check out the documentation. More advanced examples to leverage NumerBlox to the fullest can be found in the End-To-End Example section.

3.1 Simple example

The example below shows how NumerBlox simplifies training and inference on an XGBoost model. NumerBlox is used here for easy downloading, data parsing, evaluation, inference and submission. You can experiment with this setup yourself in the example notebook quickstart.ipynb.

import pandas as pd
from xgboost import XGBRegressor
from numerblox.misc import Key
from numerblox.numerframe import create_numerframe
from numerblox.download import NumeraiClassicDownloader
from numerblox.prediction_loaders import ExamplePredictions
from numerblox.evaluation import NumeraiClassicEvaluator
from numerblox.submission import NumeraiClassicSubmitter

# Download data
downloader = NumeraiClassicDownloader("data")
# Training and validation data
downloader.download_training_data("train_val", version="4.3")
df = create_numerframe("data/train_val/train_int8.parquet")

# Train
X, y = df.get_feature_target_pair(multi_target=False)
xgb = XGBRegressor()
xgb.fit(X.values, y.values)

# Evaluate
val_df = create_numerframe("data/train_val/validation_int8.parquet")
val_df['prediction'] = xgb.predict(val_df.get_feature_data)
val_df['example_preds'] = ExamplePredictions("v4.3/validation_example_preds.parquet").fit_transform(None)['prediction'].values
evaluator = NumeraiClassicEvaluator()
metrics = evaluator.full_evaluation(val_df, 
                                    example_col="example_preds", 
                                    pred_cols=["prediction"], 
                                    target_col="target")

# Inference
downloader.download_live_data("current_round", version="4.3")
live_df = create_numerframe(file_path="data/current_round/live_int8.parquet")
live_X, live_y = live_df.get_feature_target_pair(multi_target=False)
preds = xgb.predict(live_X)

# Submit
NUMERAI_PUBLIC_ID = "YOUR_PUBLIC_ID"
NUMERAI_SECRET_KEY = "YOUR_SECRET_KEY"
key = Key(pub_id=NUMERAI_PUBLIC_ID, secret_key=NUMERAI_SECRET_KEY)
submitter = NumeraiClassicSubmitter(directory_path="sub_current_round", key=key)
# Your prediction file with 'id' as index and defined 'cols' below.
pred_dataf = pd.DataFrame(preds, index=live_df.index, columns=["prediction"])
# Only works with valid key credentials and model_name
submitter.full_submission(dataf=pred_dataf,
                          cols="prediction",
                          file_name="submission.csv",
                          model_name="MY_MODEL_NAME")

3.2 Advanced NumerBlox modeling

This example shows how NumerBlox components can be combined into a more powerful pipeline. The pipeline treats the Numerai Classic data as a classification problem. It fits multiple cross-validation folds, reduces the classification probabilities to single values, and creates a weighted ensemble of these in which the most recent folds get a higher weight. Finally, the predictions are neutralized. The model is evaluated on validation data, inference is run on live data, and a submission is made. At the end, the download and submission directories are removed to clean up the environment, which is especially convenient if you run daily inference on your own server or a cloud VM.

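import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import TimeSeriesSplit
from numerblox.misc import Key
from numerblox.numerframe import create_numerframe
from numerblox.download import NumeraiClassicDownloader
from numerblox.meta import CrossValEstimator, make_meta_pipeline
from numerblox.prediction_loaders import ExamplePredictions
from numerblox.ensemble import NumeraiEnsemble, PredictionReducer
from numerblox.neutralizers import FeatureNeutralizer
from numerblox.evaluation import NumeraiClassicEvaluator
from numerblox.submission import NumeraiClassicSubmitter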

# Download data
downloader = NumeraiClassicDownloader("data")
# Training and validation data
downloader.download_training_data("train_val", version="4.3")
df = create_numerframe("data/train_val/train_int8.parquet")

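# Setup model pipeline
model = XGBClassifier()
# Fit one classifier per time-ordered fold and collect class probabilities
crossval = CrossValEstimator(estimator=model, cv=TimeSeriesSplit(n_splits=5), predict_func='predict_proba')
# Reduce the class probabilities of each fold model to a single value
pred_rud = PredictionReducer(n_models=5, n_classes=5)
# Weighted ensemble in which more recent folds get a higher weight
ens = NumeraiEnsemble(donate_weighted=True)
# Neutralize the ensembled predictions against the features
neut = FeatureNeutralizer(proportion=0.5)
full_pipe = make_meta_pipeline(crossval, pred_rud, ens, neut)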

# Train
X, y = df.get_feature_target_pair(multi_target=False)
y_int = (y * 4).astype(int)
eras = df.get_era_data
features = df.get_feature_data
full_pipe.fit(X, y_int, numeraiensemble__eras=eras)

# Evaluate
val_df = create_numerframe("data/train_val/validation_int8.parquet")
val_X, _ = val_df.get_feature_target_pair(multi_target=False)
val_eras = val_df.get_era_data
val_features = val_df.get_feature_data
val_df['prediction'] = full_pipe.predict(val_X, eras=val_eras, features=val_features)
val_df['example_preds'] = ExamplePredictions("v4.3/validation_example_preds.parquet").fit_transform(None)['prediction'].values
evaluator = NumeraiClassicEvaluator()
metrics = evaluator.full_evaluation(val_df, 
                                    example_col="example_preds", 
                                    pred_cols=["prediction"], 
                                    target_col="target")

# Inference
downloader.download_live_data("current_round", version="4.3")
live_df = create_numerframe(file_path="data/current_round/live_int8.parquet")
live_X, live_y = live_df.get_feature_target_pair(multi_target=False)
live_eras = live_df.get_era_data
live_features = live_df.get_feature_data
preds = full_pipe.predict(live_X, eras=live_eras, features=live_features)

# Submit
NUMERAI_PUBLIC_ID = "YOUR_PUBLIC_ID"
NUMERAI_SECRET_KEY = "YOUR_SECRET_KEY"
key = Key(pub_id=NUMERAI_PUBLIC_ID, secret_key=NUMERAI_SECRET_KEY)
submitter = NumeraiClassicSubmitter(directory_path="sub_current_round", key=key)
# Your prediction file with 'id' as index and defined 'cols' below.
pred_dataf = pd.DataFrame(preds, index=live_df.index, columns=["prediction"])
# Only works with valid key credentials and model_name
submitter.full_submission(dataf=pred_dataf,
                          cols="prediction",
                          file_name="submission.csv",
                          model_name="MY_MODEL_NAME")

# Clean up environment
downloader.remove_base_directory()
submitter.remove_base_directory()
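
Two steps of this pipeline are easy to picture with small numbers: collapsing class probabilities into one value, and weighting folds so that recent ones count more. The plain-Python sketch below shows one plausible way to do each. The function names and the exact formulas (expected class index, weights that halve for each older fold) are illustrative assumptions; PredictionReducer and NumeraiEnsemble may compute these differently.

```python
from typing import List, Sequence


def reduce_class_probabilities(probs: Sequence[float]) -> float:
    """Collapse one row of class probabilities into a single value in [0, 1].

    Uses the expected class index divided by the largest index, so classes
    0..4 map back onto the 0, 0.25, ..., 1 target scale.
    """
    n_classes = len(probs)
    expected_class = sum(k * p for k, p in enumerate(probs))
    return expected_class / (n_classes - 1)


def halving_weights(n_folds: int) -> List[float]:
    """Weights that grow from the oldest to the newest fold and sum to 1.

    Each fold gets half the weight of the next newer one; the two oldest
    folds share the smallest weight so the total is exactly 1.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    if n_folds == 1:
        return [1.0]
    return [0.5 ** (n_folds - 1)] + [0.5 ** (n_folds - k) for k in range(1, n_folds)]


def weighted_ensemble(
    fold_preds: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """fold_preds[f][i] is fold f's prediction for row i; returns one value per row."""
    n_rows = len(fold_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, fold_preds))
        for i in range(n_rows)
    ]


# Toy data: 3 folds (oldest first), 2 rows, 5 classes per prediction.
fold_probs = [
    [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0]],
]
fold_preds = [[reduce_class_probabilities(row) for row in fold] for fold in fold_probs]
weights = halving_weights(len(fold_preds))
print(weights, weighted_ensemble(fold_preds, weights))
```

With time-ordered folds, models trained on later folds have seen more data, which is one reason to give them more weight in the ensemble.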

4. Contributing

Be sure to read the How To Contribute section in the documentation for detailed instructions on contributing.

If you have questions or want to discuss new ideas for NumerBlox, please create a GitHub issue first.

5. Crediting sources

Some of the components in this library may be based on forum posts, notebooks or ideas made public by the Numerai community. We have done our best to ask all parties who posted a specific piece of code for their permission and credit their work in the documentation. If your code is used in this library without credits, please let us know, so we can add a link to your article/code.

If you are contributing to NumerBlox and are using ideas posted earlier by someone else, make sure to credit them by posting a link to their article/code in the documentation.

Project details

Release history

1.6.1

Sep 11, 2025

1.6.0

Jan 16, 2025

1.5.0

Sep 30, 2024

1.4.0

Sep 27, 2024

1.3.8

Sep 14, 2024

1.3.7

Sep 3, 2024

1.3.6

Sep 2, 2024

1.3.5

Sep 2, 2024

1.3.4

Aug 29, 2024

1.3.3

Aug 29, 2024

1.3.2

Jul 17, 2024

1.3.1

Mar 29, 2024

1.3.0

Mar 27, 2024

1.2.2

Mar 23, 2024

This version

1.2.1

Feb 27, 2024

1.2.0

Feb 27, 2024

1.1.18

Jan 16, 2024

1.1.17

Jan 10, 2024

1.1.16

Jan 9, 2024

1.1.15

Jan 4, 2024

1.1.14

Jan 2, 2024

1.1.13

Dec 22, 2023

1.1.12

Dec 21, 2023

1.1.11

Dec 20, 2023

1.1.10

Dec 20, 2023

1.1.9

Dec 20, 2023

1.1.8

Dec 12, 2023

1.1.7

Dec 8, 2023

1.1.6

Dec 7, 2023

1.1.5

Dec 6, 2023

1.1.4

Dec 5, 2023

1.1.3

Nov 30, 2023

1.1.2

Nov 30, 2023

1.1.1

Nov 29, 2023

1.1.0

Nov 28, 2023

1.0.3

Nov 16, 2023

1.0.2

Nov 16, 2023

1.0.1

Nov 15, 2023

1.0.0

Nov 15, 2023

0.5.14

Sep 23, 2023

0.5.13

Sep 20, 2023

0.5.12

Sep 11, 2023

0.5.11

Sep 7, 2023

0.5.10

Sep 4, 2023

0.5.9

Apr 24, 2023

0.5.8

Apr 18, 2023

0.5.7

Apr 17, 2023

0.5.6

Apr 5, 2023

0.5.5

Mar 25, 2023

0.5.4

Mar 23, 2023

0.5.3

Mar 14, 2023

0.5.2

Feb 20, 2023

0.5.1

Jan 31, 2023

0.5.0

Jan 5, 2023

0.4.0

Nov 18, 2022

0.3.12

Jun 1, 2022

0.3.11

May 26, 2022

0.3.10

May 26, 2022

0.3.9

May 25, 2022

0.3.8

May 25, 2022

0.3.6

May 12, 2022

0.3.5

May 8, 2022

0.3.4

May 6, 2022

0.3.3

May 3, 2022

0.3.2

May 3, 2022

0.3.1

May 2, 2022

0.3.0

May 2, 2022

0.2.19

Apr 28, 2022

0.2.18

Apr 27, 2022

0.2.17

Apr 25, 2022

0.2.16

Apr 21, 2022

0.2.15

Apr 19, 2022

0.2.14

Apr 13, 2022

0.2.13

Apr 13, 2022

0.2.12

Apr 10, 2022

0.2.11

Apr 8, 2022

0.2.10

Apr 7, 2022

0.2.9

Apr 6, 2022

0.2.8

Apr 2, 2022

0.2.7

Apr 2, 2022

0.2.6

Mar 24, 2022

0.2.5

Mar 16, 2022

0.2.4

Mar 16, 2022

0.2.3

Mar 15, 2022

0.2.2

Mar 15, 2022

Download files

Download the file for your platform.

Source Distribution

numerblox-1.2.1.tar.gz (107.1 kB)

Uploaded Feb 27, 2024 Source

Built Distribution

numerblox-1.2.1-py3-none-any.whl (111.3 kB)

Uploaded Feb 27, 2024 Python 3

File details

Details for the file numerblox-1.2.1.tar.gz.

File metadata

Download URL: numerblox-1.2.1.tar.gz
Upload date: Feb 27, 2024
Size: 107.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.3.0

File hashes

Hashes for numerblox-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`a6932296d1088103cfe703686bb5daa5df3aa57e65481c0c255a34321c21ea67`
MD5	`3289532449577f260d8f99fae020756e`
BLAKE2b-256	`523c6c0a1587a3c5e7e95b0cceef5df0dc8d0e0daee5efec3d1bb536925f0f15`
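
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` is a hypothetical helper, and the usage part assumes the archive was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Digest copied from the table above; the local path is an assumption.
    expected = "a6932296d1088103cfe703686bb5daa5df3aa57e65481c0c255a34321c21ea67"
    archive = Path("numerblox-1.2.1.tar.gz")
    if archive.exists():
        print("match" if sha256_of(archive) == expected else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 107 kB archive but makes the helper reusable.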

File details

Details for the file numerblox-1.2.1-py3-none-any.whl.

File metadata

Download URL: numerblox-1.2.1-py3-none-any.whl
Upload date: Feb 27, 2024
Size: 111.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.3.0

File hashes

Hashes for numerblox-1.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f7568115dbba0a8a4fec4b1e73712ea012a09bf4a92b8697b5bde55c15110eea`
MD5	`57551c851bac3844e2e7e1da3e8a0544`
BLAKE2b-256	`2d014de7c20960280a56506425cdbba14a5e18b320f6d56fc7979e598556fed5`

numerblox 1.2.1
