Skip to main content

A robust backtesting engine for asset-pricing models.

Project description

paper-model: Advanced Model Implementation & Evaluation for Asset Pricing 🧠

codecov PyPI version Python 3.11+ Ruff License: MIT

paper-model is a powerful and extensible component of the P.A.P.E.R (Platform for Asset Pricing Experimentation and Research) toolchain. It is the engine for implementing, training, evaluating, and managing a wide array of asset pricing models, from classic linear regressions to deep neural networks.

Its primary objective is to bridge the gap between the clean, processed data from paper-data and the portfolio construction phase in paper-portfolio, by providing robust model evaluations and generating actionable out-of-sample predictions.


✨ Features

paper-model provides a comprehensive, configuration-driven framework for quantitative researchers, enabling the replication and extension of sophisticated asset pricing studies.

  • Broad Model Support: 🏗️
    • Linear Models: ols (Ordinary Least Squares), enet (Elastic Net), pcr (Principal Component Regression), and pls (Partial Least Squares).
    • Non-Linear & Tree-Based Models: glm (Generalized Linear Models with splines and Group Lasso), rf (Random Forest), and gbrt (Gradient Boosted Regression Trees).
    • Deep Learning: nn (Feed-Forward Neural Networks) with extensive regularization and ensembling options.
  • Advanced Feature Implementation: ⚙️
    • Robust Objective Functions: Support for both standard l2 (least squares) and robust huber loss across most model types.
    • Adaptive Hyperparameter Tuning: A validation-set-driven grid search to find optimal hyperparameters (e.g., regularization strength, number of components, tree depth) for each rolling window.
    • Sample Weighting: OLS implementation supports weighting by inv_n_stocks (inverse number of stocks per month) or mkt_cap (market capitalization).
    • Specialized Regularization: Implements Group Lasso for GLMs and a full suite of NN regularization techniques (L1 penalty, Early Stopping, Batch Normalization, and Ensembling).
  • Rigorous Model Evaluation: 📊
    • Implements a standard rolling-window methodology for true out-of-sample testing.
    • Calculates standard asset pricing metrics like out-of-sample R² (r2_oos) and Mean Squared Error (mse).
    • Generates detailed evaluation reports and time-series metrics for each model.
  • Reproducible, Configuration-Driven Workflow: 📝
    • Define all aspects of the modeling pipeline—data inputs, evaluation windows, and all model specifications—declaratively in a single models-config.yaml file.
    • Ensures perfect reproducibility and simplifies experimentation.
  • Seamless Integration: 🔗
    • Designed to work hand-in-hand with paper-data for input and paper-portfolio for downstream portfolio construction.
    • Orchestrated by paper-asset-pricing for a unified command-line experience.

📦 Installation

paper-model is designed to be part of the larger PAPER monorepo.

Recommended (as part of paper-asset-pricing):

This method ensures paper-model is available to the main paper CLI orchestrator.

# Using pip
pip install "paper-asset-pricing[models]"

# Using uv
uv pip install "paper-asset-pricing[models]"

Standalone Installation:

If you only need paper-model and its core functionalities for a different project.

# Using pip
pip install paper-model

# Using uv
uv pip install paper-model

From Source (for development within the monorepo):

Navigate to the root of your PAPER monorepo and install paper-model in editable mode. This will also install all required dependencies like scikit-learn, torch, and group-lasso.

# Using pip
pip install -e ./paper-model

# Using uv
uv pip install -e ./paper-model

📖 Usage Workflow

The typical workflow for paper-model involves:

  1. Data Preparation: Use paper-data to process your raw financial data. The resulting Parquet files in data/processed/ are the direct input for paper-model.

  2. Configuration: Define your entire experiment in the models-config.yaml file. This includes the evaluation window, metrics, and a list of all models to be trained and compared.

  3. Model Execution: Run the models phase using the paper-asset-pricing CLI from your project's root directory:

    paper execute models
    

    This command triggers paper-model to:

    • Load and validate the configuration.
    • Iterate through each rolling window defined by your parameters.
    • For each model and each window: perform hyperparameter tuning (if configured), train the best model, and generate out-of-sample predictions.
    • Save evaluation reports, detailed metrics, prediction files, and optional model checkpoints.
  4. Review Outputs:

    • Check logs.log for detailed execution information.
    • Review summary evaluation reports in models/evaluations/.
    • Analyze detailed, per-window metrics from the Parquet files in the same directory.
    • Use the generated predictions from models/predictions/ as input for the paper-portfolio stage.

⚙️ Configuration (models-config.yaml)

The models-config.yaml file is the heart of paper-model. It defines the entire experiment structure.

Top-Level Configuration

  • input_data: Specifies the dataset name and key column identifiers.
  • evaluation: Defines the rolling window structure (train_month, validation_month, testing_month, step_month) and the list of metrics to compute (e.g., r2_oos, mse).

Model Configuration

The models section is a list where each item defines a model to be run. Common keys for all models include name, type, target_column, features, save_model_checkpoints, and save_prediction_results.

OLS (type: "ols")

  • weighting_scheme: none (default), inv_n_stocks, or mkt_cap.
  • market_cap_column: Required if weighting_scheme is mkt_cap.
  • objective_function: l2 (default) or huber.
  • huber_epsilon_quantile: If using Huber loss, sets the epsilon adaptively based on a quantile of residuals (e.g., 0.999).

Elastic Net (type: "enet")

  • alpha: Regularization strength (λ). Can be a float or a list for tuning (e.g., [0.01, 0.1, 1.0]).
  • l1_ratio: Mixing parameter (ρ). Can be a float or a list for tuning (e.g., [0.1, 0.5, 0.9]).
  • objective_function: l2 (default) or huber.

PCR & PLS (type: "pcr", type: "pls")

  • n_components: Number of components (K). Can be an integer or a list for tuning (e.g., [5, 10, 15]).
  • objective_function: l2 or huber (for the final regression step in PCR).

Generalized Linear Model (type: "glm")

  • n_knots: Number of knots for the quadratic spline transformer. Fixed value (e.g., 3).
  • alpha: Group Lasso regularization strength (λ). Can be a float or a list for tuning.
  • objective_function: Must be l2.

Random Forest (type: "rf")

  • n_estimators: Number of trees (B). Typically a fixed integer (e.g., 300).
  • max_depth: Tree depth (L). Can be an integer or a list for tuning.
  • max_features: Features per split. Can be a string ("sqrt"), float, or list for tuning.

Gradient Boosted Trees (type: "gbrt")

  • n_estimators: Number of trees (B). Can be an integer or a list for tuning.
  • max_depth: Tree depth (L). Can be an integer or a list for tuning.
  • learning_rate: Shrinkage parameter (ν). Can be a float or a list for tuning.
  • objective_function: l2 or huber.
  • use_hist_implementation: true to use scikit-learn's faster HistGradientBoostingRegressor.

Neural Network (type: "nn")

  • hidden_layer_sizes: A tuple defining the architecture (e.g., [32, 16, 8]).
  • alpha: L1 penalty (λ). Can be a float or a list for tuning.
  • learning_rate: Adam optimizer learning rate. Can be a float or a list for tuning.
  • batch_size: e.g., 10000.
  • epochs: Max epochs, e.g., 100.
  • patience: For early stopping, e.g., 5.
  • n_ensembles: Number of models to train and average, e.g., 10.

🤝 Contributing

We welcome contributions to paper-model! If you have suggestions for new models, evaluation techniques, or architectural improvements, please feel free to open an issue or submit a pull request.


📄 License

paper-model is distributed under the MIT License. See the LICENSE file for more information.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paper_model-0.1.1.tar.gz (21.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

paper_model-0.1.1-py3-none-any.whl (27.1 kB view details)

Uploaded Python 3

File details

Details for the file paper_model-0.1.1.tar.gz.

File metadata

  • Download URL: paper_model-0.1.1.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for paper_model-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0ce986d85cb417e342559eeaf0d63fa0fe5a616b73cae90715e55d5d60de078b
MD5 4613b0f850c929c4ae3bef53fa8f0697
BLAKE2b-256 27c0de40fc2bcfc849a2f50e09b17ee6ffae32223f6bc017efd6fe7c9cf442a0

See more details on using hashes here.

File details

Details for the file paper_model-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: paper_model-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for paper_model-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 09c0427a6eb1df1ad978bab6803757a2e15d3f9eab0036ee91321c4904c773b1
MD5 6c33dd564a5cf6253b4fcc2f3ad76ca7
BLAKE2b-256 20f2f2a92e9f408b141895464f3f18d4165f3b6f98173e5625e9f84b930239af

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page