A robust backtesting engine for asset-pricing models.
paper-model: Advanced Model Implementation & Evaluation for Asset Pricing 🧠
paper-model is a powerful and extensible component of the P.A.P.E.R (Platform for Asset Pricing Experimentation and Research) toolchain. It is the engine for implementing, training, evaluating, and managing a wide array of asset pricing models, from classic linear regressions to deep neural networks.
Its primary objective is to bridge the gap between the clean, processed data from paper-data and the portfolio construction phase in paper-portfolio, by providing robust model evaluations and generating actionable out-of-sample predictions.
✨ Features
paper-model provides a comprehensive, configuration-driven framework for quantitative researchers, enabling the replication and extension of sophisticated asset pricing studies.
- Broad Model Support: 🏗️
  - Linear Models: `ols` (Ordinary Least Squares), `enet` (Elastic Net), `pcr` (Principal Component Regression), and `pls` (Partial Least Squares).
  - Non-Linear & Tree-Based Models: `glm` (Generalized Linear Models with splines and Group Lasso), `rf` (Random Forest), and `gbrt` (Gradient Boosted Regression Trees).
  - Deep Learning: `nn` (Feed-Forward Neural Networks) with extensive regularization and ensembling options.
- Advanced Feature Implementation: ⚙️
  - Robust Objective Functions: Support for both standard `l2` (least squares) and robust `huber` loss across most model types.
  - Adaptive Hyperparameter Tuning: A validation-set-driven grid search to find optimal hyperparameters (e.g., regularization strength, number of components, tree depth) for each rolling window.
  - Sample Weighting: The OLS implementation supports weighting by `inv_n_stocks` (inverse number of stocks per month) or `mkt_cap` (market capitalization).
  - Specialized Regularization: Implements Group Lasso for GLMs and a full suite of NN regularization techniques (L1 penalty, Early Stopping, Batch Normalization, and Ensembling).
- Rigorous Model Evaluation: 📊
  - Implements a standard rolling-window methodology for true out-of-sample testing.
  - Calculates standard asset pricing metrics such as out-of-sample R² (`r2_oos`) and Mean Squared Error (`mse`).
  - Generates detailed evaluation reports and time-series metrics for each model.
- Reproducible, Configuration-Driven Workflow: 📝
  - Define all aspects of the modeling pipeline—data inputs, evaluation windows, and all model specifications—declaratively in a single `models-config.yaml` file.
  - Ensures perfect reproducibility and simplifies experimentation.
- Seamless Integration: 🔗
  - Designed to work hand-in-hand with `paper-data` for input and `paper-portfolio` for downstream portfolio construction.
  - Orchestrated by `paper-asset-pricing` for a unified command-line experience.
📦 Installation
paper-model is designed to be part of the larger PAPER monorepo.
Recommended (as part of paper-asset-pricing):
This method ensures paper-model is available to the main paper CLI orchestrator.
```bash
# Using pip
pip install "paper-asset-pricing[models]"

# Using uv
uv pip install "paper-asset-pricing[models]"
```
Standalone Installation:
Use this if you only need paper-model and its core functionality in a different project.
```bash
# Using pip
pip install paper-model

# Using uv
uv pip install paper-model
```
From Source (for development within the monorepo):
Navigate to the root of your PAPER monorepo and install paper-model in editable mode. This will also install all required dependencies like scikit-learn, torch, and group-lasso.
```bash
# Using pip
pip install -e ./paper-model

# Using uv
uv pip install -e ./paper-model
```
📖 Usage Workflow
The typical workflow for paper-model involves:

1. Data Preparation: Use `paper-data` to process your raw financial data. The resulting Parquet files in `data/processed/` are the direct input for `paper-model`.

2. Configuration: Define your entire experiment in the `models-config.yaml` file. This includes the evaluation window, metrics, and a list of all models to be trained and compared.

3. Model Execution: Run the models phase using the `paper-asset-pricing` CLI from your project's root directory:

   ```bash
   paper execute models
   ```

   This command triggers `paper-model` to:
   - Load and validate the configuration.
   - Iterate through each rolling window defined by your parameters.
   - For each model and each window: perform hyperparameter tuning (if configured), train the best model, and generate out-of-sample predictions.
   - Save evaluation reports, detailed metrics, prediction files, and optional model checkpoints.

4. Review Outputs:
   - Check `logs.log` for detailed execution information.
   - Review summary evaluation reports in `models/evaluations/`.
   - Analyze detailed, per-window metrics from the Parquet files in the same directory.
   - Use the generated predictions from `models/predictions/` as input for the `paper-portfolio` stage.
⚙️ Configuration (models-config.yaml)
The models-config.yaml file is the heart of paper-model. It defines the entire experiment structure.
Top-Level Configuration
- `input_data`: Specifies the dataset name and key column identifiers.
- `evaluation`: Defines the rolling window structure (`train_month`, `validation_month`, `testing_month`, `step_month`) and the list of metrics to compute (e.g., `r2_oos`, `mse`).
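Putting these together, a top-level skeleton might look like the following. The `evaluation` keys and metric names come from this section; the keys and values under `input_data` (dataset and column identifiers) are illustrative placeholders, since their exact names are not documented here.

```yaml
input_data:
  dataset_name: my_processed_dataset   # placeholder
  date_column: date                    # placeholder
  id_column: permno                    # placeholder

evaluation:
  train_month: 120        # 10-year training window
  validation_month: 24    # 2 years for hyperparameter tuning
  testing_month: 12       # 1 year of out-of-sample predictions
  step_month: 12          # roll the window forward annually
  metrics: [r2_oos, mse]

models: []                # model specifications go here (see below)
```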
Model Configuration
The `models` section is a list where each item defines a model to be run. Common keys for all models include `name`, `type`, `target_column`, `features`, `save_model_checkpoints`, and `save_prediction_results`.
OLS (type: "ols")
- `weighting_scheme`: `none` (default), `inv_n_stocks`, or `mkt_cap`.
- `market_cap_column`: Required if `weighting_scheme` is `mkt_cap`.
- `objective_function`: `l2` (default) or `huber`.
- `huber_epsilon_quantile`: If using Huber loss, sets the `epsilon` adaptively based on a quantile of residuals (e.g., `0.999`).
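An illustrative entry under `models:` combining these keys (the `target_column` and `features` values are placeholders, not documented names):

```yaml
  - name: ols_huber_weighted
    type: "ols"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    weighting_scheme: mkt_cap
    market_cap_column: mkt_cap           # placeholder column name
    objective_function: huber
    huber_epsilon_quantile: 0.999
```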
Elastic Net (type: "enet")
- `alpha`: Regularization strength (λ). Can be a float or a list for tuning (e.g., `[0.01, 0.1, 1.0]`).
- `l1_ratio`: Mixing parameter (ρ). Can be a float or a list for tuning (e.g., `[0.1, 0.5, 0.9]`).
- `objective_function`: `l2` (default) or `huber`.
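A hypothetical Elastic Net entry with both hyperparameters given as grids for the validation-set search (target and feature names are placeholders):

```yaml
  - name: enet_tuned
    type: "enet"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    alpha: [0.01, 0.1, 1.0]              # tuned per rolling window
    l1_ratio: [0.1, 0.5, 0.9]            # tuned per rolling window
    objective_function: l2
```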
PCR & PLS (type: "pcr", type: "pls")
- `n_components`: Number of components (K). Can be an integer or a list for tuning (e.g., `[5, 10, 15]`).
- `objective_function`: `l2` or `huber` (for the final regression step in PCR).
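An illustrative PCR entry; swapping `type` to `"pls"` yields the PLS variant (target and feature names are placeholders):

```yaml
  - name: pcr_tuned
    type: "pcr"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    n_components: [5, 10, 15]            # tuned on the validation set
    objective_function: huber            # applies to the final regression step
```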
Generalized Linear Model (type: "glm")
- `n_knots`: Number of knots for the quadratic spline transformer. Fixed value (e.g., `3`).
- `alpha`: Group Lasso regularization strength (λ). Can be a float or a list for tuning.
- `objective_function`: Must be `l2`.
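A hypothetical GLM entry (the `alpha` grid values are illustrative; target and feature names are placeholders):

```yaml
  - name: glm_spline
    type: "glm"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    n_knots: 3
    alpha: [0.0001, 0.001, 0.01]         # illustrative Group Lasso grid
    objective_function: l2               # required for glm
```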
Random Forest (type: "rf")
- `n_estimators`: Number of trees (B). Typically a fixed integer (e.g., `300`).
- `max_depth`: Tree depth (L). Can be an integer or a list for tuning.
- `max_features`: Features per split. Can be a string (`"sqrt"`), float, or list for tuning.
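An illustrative Random Forest entry (tuning-grid values, target, and feature names are placeholders):

```yaml
  - name: rf_tuned
    type: "rf"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    n_estimators: 300                    # fixed
    max_depth: [3, 6]                    # illustrative grid, tuned per window
    max_features: ["sqrt", 0.3]          # string or float options
```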
Gradient Boosted Trees (type: "gbrt")
- `n_estimators`: Number of trees (B). Can be an integer or a list for tuning.
- `max_depth`: Tree depth (L). Can be an integer or a list for tuning.
- `learning_rate`: Shrinkage parameter (ν). Can be a float or a list for tuning.
- `objective_function`: `l2` or `huber`.
- `use_hist_implementation`: `true` to use scikit-learn's faster `HistGradientBoostingRegressor`.
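A hypothetical GBRT entry using the histogram-based implementation (grid values, target, and feature names are placeholders):

```yaml
  - name: gbrt_hist
    type: "gbrt"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    n_estimators: [100, 300]             # illustrative grid
    max_depth: [1, 2]                    # shallow trees, tuned per window
    learning_rate: [0.01, 0.1]           # illustrative grid
    objective_function: huber
    use_hist_implementation: true        # scikit-learn HistGradientBoostingRegressor
```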
Neural Network (type: "nn")
- `hidden_layer_sizes`: A tuple defining the architecture (e.g., `[32, 16, 8]`).
- `alpha`: L1 penalty (λ). Can be a float or a list for tuning.
- `learning_rate`: Adam optimizer learning rate. Can be a float or a list for tuning.
- `batch_size`: e.g., `10000`.
- `epochs`: Max epochs, e.g., `100`.
- `patience`: For early stopping, e.g., `5`.
- `n_ensembles`: Number of models to train and average, e.g., `10`.
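An illustrative neural network entry using the example values from this section (tuning grids, target, and feature names are placeholders):

```yaml
  - name: nn3_ensemble
    type: "nn"
    target_column: ret_excess            # placeholder
    features: [size, value, momentum]    # placeholders
    hidden_layer_sizes: [32, 16, 8]
    alpha: [0.00001, 0.0001]             # illustrative L1 penalty grid
    learning_rate: [0.001, 0.01]         # illustrative Adam learning-rate grid
    batch_size: 10000
    epochs: 100
    patience: 5                          # early stopping
    n_ensembles: 10                      # train and average 10 models
```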
🤝 Contributing
We welcome contributions to paper-model! If you have suggestions for new models, evaluation techniques, or architectural improvements, please feel free to open an issue or submit a pull request.
📄 License
paper-model is distributed under the MIT License. See the LICENSE file for more information.