Tools for solid Numerai pipelines
NumerBlox
Solid Numerai pipelines
numerblox offers Numerai-specific functionality, so you can worry less about software/data engineering and focus more on building great Numerai models!
Most of the components in this library are designed for solid weekly inference pipelines, but tools like NumerFrame, preprocessors and evaluators also greatly simplify the training process.
Questions and discussion: rocketchat.numer.ai/channel/numerblox
Documentation: crowdcent.github.io/numerblox
1. Getting Started
This document has been generated by NBDev. Please edit nbs/index.ipynb instead of this README.md. Read CONTRIBUTING.MD for more information on the contribution process and how to change files. Thank you!
1.1 Installation
Install numerblox from PyPI by running:
pip install numerblox
Alternatively, clone this repository and install it in development mode by running the following from the root of the repository:
pip install -e .
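To confirm that either install route worked before opening a notebook, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of numerblox.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("numerblox")
    if version is None:
        print("numerblox is not installed; run `pip install numerblox` first.")
    else:
        print(f"numerblox {version} is installed.")
```

The check only reads package metadata, so it does not import numerblox or any of its dependencies.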
1.2 Running Notebooks
Start by spinning up your favorite Jupyter Notebook environment. Here we'll use:
jupyter notebook
Test your installation using one of the educational notebooks in nbs/edu_nbs. A good example is numerframe_tutorial. Run it in your Notebook environment to quickly test whether your installation has succeeded.
2.1. Contents
2.1.1. Core functionality
numerblox features the following functionality:
- Downloading data (NumeraiClassicDownloader and KaggleDownloader)
- A custom data structure extending Pandas DataFrame (NumerFrame)
- A suite of preprocessors for Numerai Classic and Signals (feature selection, engineering and manipulation)
- Model objects for easy inference
- A suite of postprocessors for Numerai Classic and Signals (standardization, ensembling, neutralization and penalization)
- Pipelines handling processing and prediction (ModelPipeline and ModelPipelineCollection)
- Evaluation (NumeraiClassicEvaluator and NumeraiSignalsEvaluator)
- Authentication (Key and load_key_from_json)
- Submitting (NumeraiClassicSubmitter and NumeraiSignalsSubmitter)
- Automated staking (NumeraiClassicStaker and NumeraiSignalsStaker)
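The pipeline idea in the list above (preprocessors, then models, then postprocessors) can be pictured as a chain of callables applied in order. The sketch below is a conceptual illustration in plain Python only: SketchPipeline and the three step functions are hypothetical names, the rows are plain dictionaries rather than a NumerFrame, and nothing here reflects how ModelPipeline is actually implemented.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Step = Callable[[List[Row]], List[Row]]


class SketchPipeline:
    """Hypothetical stand-in: runs preprocessors, models, then postprocessors in order."""

    def __init__(self, preprocessors: Sequence[Step], models: Sequence[Step],
                 postprocessors: Sequence[Step]) -> None:
        self.steps: List[Step] = [*preprocessors, *models, *postprocessors]

    def __call__(self, rows: List[Row]) -> List[Row]:
        # Each step receives the output of the previous one.
        for step in self.steps:
            rows = step(rows)
        return rows


def add_feature_sum(rows: List[Row]) -> List[Row]:
    """Toy preprocessor: adds a derived feature column."""
    return [{**r, "feature_sum": r["feature_a"] + r["feature_b"]} for r in rows]


def toy_model(rows: List[Row]) -> List[Row]:
    """Toy model: writes a prediction column based on the derived feature."""
    return [{**r, "prediction_toy": r["feature_sum"] / 2} for r in rows]


def clip_predictions(rows: List[Row]) -> List[Row]:
    """Toy postprocessor: clips predictions into the range [0, 1]."""
    return [{**r, "prediction_toy": min(max(r["prediction_toy"], 0.0), 1.0)} for r in rows]


pipeline = SketchPipeline([add_feature_sum], [toy_model], [clip_predictions])
result = pipeline([
    {"feature_a": 0.25, "feature_b": 0.75},
    {"feature_a": 1.0, "feature_b": 1.5},
])
print([r["prediction_toy"] for r in result])  # prints [0.5, 1.0]
```

Keeping every stage behind the same call signature is what lets stages be swapped or left empty, as the examples with preprocessors=[] or postprocessors=[] do.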
2.1.2. Educational notebooks
Example notebooks can be found in the nbs/edu_nbs directory.
nbs/edu_nbs currently contains the following examples:
- numerframe_tutorial.ipynb: A deep dive into what NumerFrame has to offer.
- pipeline_construction.ipynb: How to use numerblox tools for efficient Numerai inference.
- submitting.ipynb: How to use Submitters for safe and easy Numerai submissions.
- google_cloud_storage.ipynb: How to use Downloaders and Submitters to interact with Google Cloud Storage (GCS).
- load_model_from_wandb.ipynb: For Weights & Biases users. Easily pull a model from W&B for inference.
Development notebooks are also in the nbs directory. These notebooks are also used to generate the documentation.
Questions or idea discussion for educational notebooks: rocketchat.numer.ai/channel/numerblox
Full documentation: crowdcent.github.io/numerblox
2.2. Examples
Below we illustrate a common use case for inference pipelines. For a more in-depth look at the features of this library, check out the notebooks in nbs/edu_nbs.
2.2.1. Numerai Classic
# --- 0. Numerblox dependencies ---
from numerblox.download import NumeraiClassicDownloader
from numerblox.numerframe import create_numerframe
from numerblox.postprocessing import FeatureNeutralizer
from numerblox.model import SingleModel
from numerblox.model_pipeline import ModelPipeline
from numerblox.key import load_key_from_json
from numerblox.submission import NumeraiClassicSubmitter

# --- 1. Download version 4 data ---
downloader = NumeraiClassicDownloader("data")
downloader.download_inference_data("current_round")

# --- 2. Initialize NumerFrame ---
metadata = {"version": 4,
            "joblib_model_name": "test",
            "joblib_model_path": "test_assets/joblib_v2_example_model.joblib",
            "numerai_model_name": "test_model1",
            "key_path": "test_assets/test_credentials.json"}
dataf = create_numerframe(file_path="data/current_round/live.parquet",
                          metadata=metadata)

# --- 3. Define and run pipeline ---
models = [SingleModel(dataf.meta.joblib_model_path,
                      model_name=dataf.meta.joblib_model_name)]
# No preprocessing and 0.5 feature neutralization
postprocessors = [FeatureNeutralizer(pred_name=f"prediction_{dataf.meta.joblib_model_name}",
                                     proportion=0.5)]
pipeline = ModelPipeline(preprocessors=[],
                         models=models,
                         postprocessors=postprocessors)
dataf = pipeline(dataf)

# --- 4. Submit ---
# Load credentials from .json (random credentials in this example)
key = load_key_from_json(dataf.meta.key_path)
submitter = NumeraiClassicSubmitter(directory_path="sub_current_round", key=key)
# full_submission checks contents, saves as csv and submits.
submitter.full_submission(dataf=dataf,
                          cols=f"prediction_{dataf.meta.joblib_model_name}_neutralized_0.5",
                          model_name=dataf.meta.numerai_model_name,
                          version=dataf.meta.version)

# --- 5. Clean up environment (optional) ---
downloader.remove_base_directory()
submitter.remove_base_directory()
💻 Directory structure before starting
┗━━ 📁 test_assets
    ┣━━ 📄 joblib_v2_example_model.joblib
    ┗━━ 📄 test_credentials.json

💻 Directory structure after submitting
┣━━ 📁 data
┃   ┗━━ 📁 current_round
┃       ┗━━ 📄 numerai_tournament_data.parquet
┗━━ 📁 sub_current_round
    ┗━━ 📄 test_model1.csv
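The Classic example neutralizes predictions with proportion=0.5. As a rough intuition for what that setting means, the sketch below removes half of the component of a prediction vector that lies along a single feature. It is a simplified, standard-library illustration under that single-feature assumption; neutralize_single_feature and dot are hypothetical helpers, and FeatureNeutralizer itself may differ (for instance by neutralizing against many features at once, working per era, or rescaling the result).

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def neutralize_single_feature(preds: List[float], feature: List[float],
                              proportion: float = 0.5) -> List[float]:
    """Subtract `proportion` of the projection of `preds` onto `feature`.

    Illustrative only: a single feature, no centering, no per-era grouping.
    """
    if len(preds) != len(feature):
        raise ValueError("preds and feature must have the same length")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    denom = dot(feature, feature)
    if denom == 0.0:
        # An all-zero feature carries no exposure to remove.
        return list(preds)
    beta = dot(preds, feature) / denom  # least-squares coefficient of preds on feature
    return [p - proportion * beta * f for p, f in zip(preds, feature)]


preds = [0.2, 0.4, 0.6, 0.8]
feature = [0.0, 0.25, 0.75, 1.0]
half = neutralize_single_feature(preds, feature, proportion=0.5)
full = neutralize_single_feature(preds, feature, proportion=1.0)

print("exposure before:", dot(preds, feature))
print("exposure at 0.5:", dot(half, feature))
print("exposure at 1.0:", dot(full, feature))
```

In this toy setting the exposure (the dot product with the feature) is halved at proportion=0.5 and vanishes, up to floating-point error, at proportion=1.0.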
2.2.2. Numerai Signals
# --- 0. Numerblox dependencies ---
from numerblox.download import KaggleDownloader
from numerblox.numerframe import create_numerframe
from numerblox.preprocessing import KatsuFeatureGenerator
from numerblox.model import SingleModel
from numerblox.model_pipeline import ModelPipeline
from numerblox.key import load_key_from_json
from numerblox.submission import NumeraiSignalsSubmitter

# --- 1. Download Katsu1110 yfinance dataset from Kaggle ---
kd = KaggleDownloader("data")
kd.download_inference_data("code1110/yfinance-stock-price-data-for-numerai-signals")

# --- 2. Initialize NumerFrame with metadata ---
metadata = {"numerai_model_name": "test_model1",
            "key_path": "test_assets/test_credentials.json"}
dataf = create_numerframe("data/full_data.parquet", metadata=metadata)

# --- 3. Define and run pipeline ---
models = [SingleModel("models/signals_model.cbm", model_name="cb")]
# Simple and fast feature generator based on Katsu Signals starter notebook
# https://www.kaggle.com/code1110/numeraisignals-starter-for-beginners
pipeline = ModelPipeline(preprocessors=[KatsuFeatureGenerator(windows=[20, 40, 60])],
                         models=models,
                         postprocessors=[])
dataf = pipeline(dataf)

# --- 4. Submit ---
# Load credentials from .json (random credentials in this example)
key = load_key_from_json(dataf.meta.key_path)
submitter = NumeraiSignalsSubmitter(directory_path="sub_current_round", key=key)
# full_submission checks contents, saves as csv and submits.
# cols selection must contain at least 1 ticker column and a signal column.
dataf['signal'] = dataf['prediction_cb']
submitter.full_submission(dataf=dataf,
                          cols=['bloomberg_ticker', 'signal'],
                          model_name=dataf.meta.numerai_model_name)

# --- 5. Clean up environment (optional) ---
kd.remove_base_directory()
submitter.remove_base_directory()
💻 Directory structure before starting
┣━━ 📁 test_assets
┃   ┗━━ 📄 test_credentials.json
┗━━ 📁 models
    ┗━━ 📄 signals_model.cbm

💻 Directory structure after submitting
┣━━ 📁 data
┃   ┗━━ 📄 full_data.parquet
┗━━ 📁 sub_current_round
    ┗━━ 📄 submission.csv
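The Signals example notes that the submitted columns need at least one ticker column and a signal column. A pre-submission sanity check might look like the sketch below. It works on plain dictionaries rather than a NumerFrame, the helper check_signals_rows is hypothetical, and the rule that signals lie strictly between 0 and 1 is an assumption about Numerai Signals requirements that should be verified against the official documentation. full_submission performs its own checks, so this is purely illustrative.

```python
from typing import Dict, List, Union

Value = Union[str, float]


def check_signals_rows(rows: List[Dict[str, Value]],
                       ticker_col: str = "bloomberg_ticker",
                       signal_col: str = "signal") -> List[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    if not rows:
        return ["no rows to submit"]
    problems: List[str] = []
    for i, row in enumerate(rows):
        if ticker_col not in row or signal_col not in row:
            problems.append(f"row {i}: missing '{ticker_col}' or '{signal_col}'")
            continue
        signal = row[signal_col]
        # Assumed rule: signals are numeric and strictly between 0 and 1 (NaN fails too).
        if not isinstance(signal, (int, float)) or not 0.0 < signal < 1.0:
            problems.append(f"row {i}: signal {signal!r} is not strictly between 0 and 1")
    return problems


good = [{"bloomberg_ticker": "AAPL US", "signal": 0.61},
        {"bloomberg_ticker": "MSFT US", "signal": 0.42}]
bad = [{"bloomberg_ticker": "AAPL US", "signal": 1.2},
       {"bloomberg_ticker": "MSFT US"}]

print(check_signals_rows(good))
for problem in check_signals_rows(bad):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix a whole submission in a single pass.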
3. Contributing
Be sure to read CONTRIBUTING.md for detailed instructions on contributing.
If you have questions or want to discuss new ideas for numerblox, check out rocketchat.numer.ai/channel/numerblox.
4. Branch structure
Every new feature should be implemented in a branch that branches from dev and has the naming convention feature/{FEATURE_DESCRIPTION}. Explicit bugfixes should be named bugfix/{FIX_DESCRIPTION}. An example structure is given below.

Branch structure
┗━━ 📦 master (release)
    ┗━━ 👨‍💻 dev
        ┣━━ ✨ feature/ta-signals-features
        ┣━━ ✨ feature/news-api-downloader
        ┣━━ ✨ feature/staking-portfolio-management
        ┗━━ ✨ bugfix/evaluator-metrics-fix
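A contributor who wants to check a branch name against this convention before pushing could use something like the sketch below. The helper is_valid_branch_name is hypothetical, and restricting descriptions to lowercase words separated by hyphens is an assumption inferred from the example names, not a rule stated by the project.

```python
import re

# Assumption: descriptions are lowercase alphanumeric words joined by hyphens,
# as in the example branch names; the project does not state this explicitly.
BRANCH_PATTERN = re.compile(r"(feature|bugfix)/[a-z0-9]+(-[a-z0-9]+)*")

# Long-lived branches named in the branch structure.
BASE_BRANCHES = {"master", "dev"}


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` is a base branch or follows feature/... or bugfix/... naming."""
    return name in BASE_BRANCHES or BRANCH_PATTERN.fullmatch(name) is not None


for candidate in ["feature/ta-signals-features", "bugfix/evaluator-metrics-fix",
                  "hotfix/urgent", "feature/"]:
    print(candidate, is_valid_branch_name(candidate))
```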
5. Crediting sources
Some of the components in this library may be based on forum posts, notebooks or ideas made public by the Numerai community. We have done our best to ask all parties who posted a specific piece of code for their permission and credit their work in the documentation. If your code is used in this library without credits, please let us know, so we can add a link to your article/code.
If you are contributing to numerblox and are using ideas posted earlier by someone else, make sure to credit them by posting a link to their article/code in the documentation.