Tools for portfolio construction experiments using datasets and models.

paper-portfolio: Portfolio Construction & Performance Evaluation 📈

paper-portfolio is the final component of the P.A.P.E.R (Platform for Asset Pricing Experimentation and Research) monorepo. It provides a powerful, configuration-driven framework for constructing investment portfolios based on the predictive outputs of machine learning models and evaluating their economic significance.

Using the prediction files generated by paper-model, this package allows you to backtest various long-short portfolio strategies, calculate standard performance metrics, and generate insightful reports and visualizations.


✨ Features

  • Flexible Portfolio Construction:
    • Long-Short Strategies: Easily construct long-short portfolios by sorting assets based on their predicted returns each month.
    • Quantile-Based Selection: Define portfolio legs using specific quantile ranges (e.g., long top 10%, short bottom 10%).
    • Weighting Schemes: Supports equal weighting ("equal") and value weighting ("value", e.g., by market capitalization).
  • Comprehensive Performance Evaluation:
    • Standard Metrics: Calculates the annualized Sharpe ratio (sharpe_ratio), expected shortfall (expected_shortfall, also known as CVaR), and cumulative return (cumulative_return).
    • Benchmarking: Automatically compares strategies against the risk-free rate and an optional, user-provided market index benchmark.
  • In-Depth Analysis:
    • Cross-Sectional Analysis: Optionally generates plots showing the cumulative performance of assets sorted into deciles by prediction. This is crucial for checking if the model's predictions are monotonically related to returns.
  • Configuration-Driven Workflow:
    • Define all portfolio strategies, the models to test, benchmarks, and metrics to calculate in a single, human-readable portfolio-config.yaml file. This ensures reproducibility and simplifies experimentation.
  • Automated Reporting & Visualization:
    • Generates detailed summary reports in text files for each model-strategy combination.
    • Automatically creates and saves PNG plots of cumulative returns, providing a clear visual comparison of the long, short, and combined portfolios against benchmarks.
    • Saves detailed monthly portfolio returns to Parquet files for deeper, custom analysis.
  • Seamless Integration:
    • Directly consumes the .parquet prediction files produced by paper-model.
    • Orchestrated by the paper-tools CLI for a smooth, end-to-end research pipeline.

🚀 Installation

paper-portfolio is designed to be part of the larger PAPER monorepo.

Recommended (as part of paper-tools):

This method ensures paper-portfolio is available to the main paper CLI orchestrator.

# Using pip
pip install "paper-tools[portfolio]"

# Using uv
uv pip install "paper-tools[portfolio]"

Standalone Installation:

Use this method if you only need paper-portfolio and its core functionality in a different project.

# Using pip
pip install paper-portfolio

# Using uv
uv pip install paper-portfolio

From Source (for development within the monorepo):

Navigate to the root of your PAPER monorepo and install paper-portfolio in editable mode.

# Using pip
pip install -e ./paper-portfolio

# Using uv
uv pip install -e ./paper-portfolio

📖 Usage Workflow

The paper-portfolio pipeline is the final step in the P.A.P.E.R workflow.

1. Prerequisites: Data and Model Pipelines

Before running the portfolio phase, you must first run the data and model pipelines to generate the necessary inputs.

# Assuming you are in your project directory (e.g., MyProjectExample)

# 1. Run the data phase
paper execute data

# 2. Run the models phase
paper execute models

After these steps, your project's models/predictions/ directory should contain files like OLS_model_predictions.parquet.
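
Before moving on, it can help to confirm that the expected prediction files are present. The following is a minimal sketch, not part of paper-portfolio: the helper name missing_prediction_files is hypothetical, and it assumes only the models/predictions/ layout and the {model_name}_predictions.parquet naming described in this README.

```python
from pathlib import Path


def missing_prediction_files(project_dir, model_names):
    """Return the expected prediction files that do not exist yet.

    Hypothetical helper: it only checks file names, not file contents.
    """
    predictions_dir = Path(project_dir) / "models" / "predictions"
    expected = [
        predictions_dir / f"{name}_predictions.parquet" for name in model_names
    ]
    return [path for path in expected if not path.is_file()]
```

Calling missing_prediction_files(".", ["OLS_model", "GBRT_tuned"]) from the project directory returns an empty list when both example models have produced their predictions.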

2. Portfolio Configuration (portfolio-config.yaml)

Create or edit the portfolio-config.yaml file in your project's configs directory. This file defines which models to test and which portfolio strategies to apply.

# MyProjectExample/configs/portfolio-config.yaml

input_data:
  # List of model names whose predictions you want to evaluate.
  # These must match the names from models-config.yaml.
  prediction_model_names:
    - "OLS_model"
    - "GBRT_tuned"
  
  # The base name of the processed dataset used by the models.
  processed_dataset_name: "processed_panel_data"
  
  # Column names required for calculations.
  date_column: "date"
  id_column: "permno"
  risk_free_rate_col: "rf"
  value_weight_col: "marketcap" # For value-weighting

# Optional: Define a market index for benchmark comparison.
# The CSV file must be placed in the `portfolios/indexes/` directory.
market_benchmark:
  name: "Market Index"
  file_name: "market_index.csv"
  date_column: "caldt"
  return_column: "vwretd"
  date_format: "%Y%m%d"

# A list of portfolio strategies to backtest for each model.
strategies:
  - name: "Decile_Sort_Equal_Weighted"
    weighting_scheme: "equal"
    long_quantiles: [0.9, 1.0]   # Long the top 10%
    short_quantiles: [0.0, 0.1]  # Short the bottom 10%

  - name: "Decile_Sort_Value_Weighted"
    weighting_scheme: "value"
    long_quantiles: [0.9, 1.0]
    short_quantiles: [0.0, 0.1]

# A list of performance metrics to calculate and report.
metrics:
  - "sharpe_ratio"
  - "expected_shortfall"
  - "cumulative_return"

# Enable the generation of cross-sectional decile return plots.
cross_sectional_analysis: true
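
The rules stated in this README for the configuration above can be checked before a run. The sketch below is illustrative only and is not how paper-portfolio validates its input. It assumes the YAML file has already been loaded into a plain dict by a YAML parser of your choice, the function name validate_portfolio_config is hypothetical, and the constraint 0 <= lower < upper <= 1 on quantile bounds is an assumption about what a sensible range looks like.

```python
SUPPORTED_WEIGHTING = {"equal", "value"}
SUPPORTED_METRICS = {"sharpe_ratio", "expected_shortfall", "cumulative_return"}


def validate_portfolio_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not config.get("input_data", {}).get("prediction_model_names"):
        problems.append("input_data.prediction_model_names must list at least one model")
    for strategy in config.get("strategies", []):
        name = strategy.get("name", "<unnamed>")
        if strategy.get("weighting_scheme") not in SUPPORTED_WEIGHTING:
            problems.append(f"{name}: weighting_scheme must be 'equal' or 'value'")
        for key in ("long_quantiles", "short_quantiles"):
            bounds = strategy.get(key)
            # Assumed rule: exactly two numbers with 0 <= lower < upper <= 1.
            if (
                not isinstance(bounds, (list, tuple))
                or len(bounds) != 2
                or not 0.0 <= bounds[0] < bounds[1] <= 1.0
            ):
                problems.append(f"{name}: {key} must be [lower, upper] within [0, 1]")
    for metric in config.get("metrics", []):
        if metric not in SUPPORTED_METRICS:
            problems.append(f"unsupported metric: {metric}")
    return problems
```

Returning a list of messages rather than raising on the first problem lets all configuration mistakes be reported in a single pass.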

3. Running the Portfolio Pipeline

Execute the portfolio phase using the paper-tools CLI from your project directory.

# Assuming you are in your project directory (e.g., MyProjectExample)
paper execute portfolio

4. Expected Output

Console Output:

The console will show a high-level success message.

>>> Executing Portfolio Phase <<<
Portfolio phase completed successfully. Additional information in 'MyProjectExample/logs.log'

MyProjectExample/portfolios/results/ Directory:

The results directory will be populated with detailed reports and plots for each model-strategy combination.

├── cross_sectional_analysis/
│   ├── GBRT_tuned_cross_sectional_returns.png
│   └── OLS_model_cross_sectional_returns.png
├── GBRT_tuned_Decile_Sort_Equal_Weighted_cumulative_return.png
├── GBRT_tuned_Decile_Sort_Equal_Weighted_monthly_returns.parquet
├── GBRT_tuned_Decile_Sort_Equal_Weighted_report.txt
├── GBRT_tuned_Decile_Sort_Value_Weighted_cumulative_return.png
├── ... (and so on for all models and strategies)
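
The naming pattern in the listing above can be enumerated programmatically, for example to check that a run produced everything it should. This is a sketch inferred from the example file names only. The function expected_result_files is hypothetical, and the pattern is an assumption based on the listing rather than a documented guarantee.

```python
from itertools import product


def expected_result_files(model_names, strategy_names, cross_sectional=True):
    """List result file paths relative to portfolios/results/ (inferred pattern)."""
    files = []
    for model, strategy in product(model_names, strategy_names):
        stem = f"{model}_{strategy}"
        files.append(f"{stem}_cumulative_return.png")
        files.append(f"{stem}_monthly_returns.parquet")
        files.append(f"{stem}_report.txt")
    if cross_sectional:
        # One cross-sectional plot per model, independent of the strategy.
        for model in model_names:
            files.append(f"cross_sectional_analysis/{model}_cross_sectional_returns.png")
    return files
```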

Example Report (OLS_model_Decile_Sort_Value_Weighted_report.txt):

--- Portfolio Performance Report ---
Model: OLS_model
Strategy: Decile_Sort_Value_Weighted
------------------------------
sharpe_ratio: 1.2543
expected_shortfall: -0.0312
final_cumulative_return: 8.1234
------------------------------

⚙️ Configuration Reference

The portfolio-config.yaml file controls the entire portfolio evaluation process.

input_data

  • prediction_model_names (list, required): A list of model names. The manager will look for prediction files named {model_name}_predictions.parquet.
  • processed_dataset_name (string, required): The base name of the processed dataset used for modeling. This is needed to fetch columns like the risk-free rate and value-weighting characteristic.
  • date_column, id_column, risk_free_rate_col, value_weight_col (string, optional): Names of the date, asset identifier, risk-free rate, and value-weighting columns, respectively.

market_benchmark (optional)

  • name (string, required): Display name for the benchmark.
  • file_name (string, required): The name of the CSV file in the portfolios/indexes/ directory.
  • date_column, return_column, date_format (string, required): Column names and date format for the benchmark file.
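
To illustrate how date_column, return_column, and date_format fit together, here is a small standard-library sketch that parses a benchmark CSV. It is not the package's own loader: the function read_benchmark_returns is hypothetical, and the two sample rows are made-up values used only to show the "%Y%m%d" format.

```python
import csv
import io
from datetime import datetime


def read_benchmark_returns(csv_text, date_column, return_column, date_format):
    """Parse benchmark rows into (date, return) tuples sorted by date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (
            datetime.strptime(row[date_column], date_format).date(),
            float(row[return_column]),
        )
        for row in reader
    ]
    return sorted(rows)


# Made-up sample data; column names follow the example configuration.
sample = "caldt,vwretd\n20200228,-0.0815\n20200131,0.0012\n"
for day, ret in read_benchmark_returns(sample, "caldt", "vwretd", "%Y%m%d"):
    print(day.isoformat(), ret)
```

A date_format that does not match the file makes datetime.strptime raise a ValueError, which is usually the first sign of a misconfigured benchmark.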

strategies

A list of portfolio strategies to backtest. Each strategy requires:

  • name (string, required): A unique name for the strategy (e.g., "Value_Weighted_Decile").
  • weighting_scheme (string, required): Must be either "equal" or "value".
  • long_quantiles (list of two floats, required): The lower and upper quantile boundaries for the long leg (e.g., [0.9, 1.0] for the top 10%).
  • short_quantiles (list of two floats, required): The lower and upper quantile boundaries for the short leg (e.g., [0.0, 0.1] for the bottom 10%).
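
One way to read these settings is sketched below for a single month. This is an illustration under stated assumptions, not the package's implementation: the asset dict schema ("id", "prediction", "marketcap"), the function names, and the choice to select legs by rank position (rather than by quantile thresholds on the prediction values) are all hypothetical.

```python
def leg_weights(assets, quantiles, weighting_scheme):
    """Select assets whose prediction rank falls in the quantile range and weight them."""
    lower, upper = quantiles
    ranked = sorted(assets, key=lambda a: a["prediction"])
    n = len(ranked)
    # Slice by rank position: [0.9, 1.0] keeps the top 10% of predictions.
    selected = ranked[int(round(lower * n)):int(round(upper * n))]
    if not selected:
        return {}
    if weighting_scheme == "equal":
        raw = {a["id"]: 1.0 for a in selected}
    elif weighting_scheme == "value":
        raw = {a["id"]: a["marketcap"] for a in selected}
    else:
        raise ValueError(f"unknown weighting_scheme: {weighting_scheme}")
    total = sum(raw.values())
    # Normalize so the weights within a leg sum to one.
    return {asset_id: weight / total for asset_id, weight in raw.items()}


def long_short_return(assets, realized, long_q, short_q, weighting_scheme):
    """Return of the long leg minus the return of the short leg for one period."""
    long_w = leg_weights(assets, long_q, weighting_scheme)
    short_w = leg_weights(assets, short_q, weighting_scheme)
    long_ret = sum(w * realized[i] for i, w in long_w.items())
    short_ret = sum(w * realized[i] for i, w in short_w.items())
    return long_ret - short_ret
```

Here realized is assumed to be a mapping from asset id to the return realized over the period that follows the prediction date.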

metrics

A list of performance metrics to compute. Supported values: "sharpe_ratio", "expected_shortfall", "cumulative_return".
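
For orientation, textbook versions of these three metrics on a series of monthly returns might look like the sketch below. The exact conventions used by paper-portfolio are not stated in this README, so the annualization factor of 12, the sample standard deviation, the 5% tail level, and the function names are all assumptions.

```python
import math
import statistics


def annualized_sharpe_ratio(monthly_excess_returns, periods_per_year=12):
    """Mean excess return over its sample standard deviation, scaled to a year."""
    mean = statistics.mean(monthly_excess_returns)
    stdev = statistics.stdev(monthly_excess_returns)
    return mean / stdev * math.sqrt(periods_per_year)


def expected_shortfall(monthly_returns, alpha=0.05):
    """Mean of the worst alpha fraction of returns (at least one observation)."""
    ordered = sorted(monthly_returns)
    tail_size = max(1, int(len(ordered) * alpha))
    return statistics.mean(ordered[:tail_size])


def cumulative_return(monthly_returns):
    """Compounded return over the whole series, as a fraction of initial value."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0
```

With this sign convention, expected shortfall is negative whenever the worst months lose money, which matches the sign shown in the example report.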

cross_sectional_analysis (optional)

  • Set to true to enable the generation of decile-sorted performance plots for each model. Defaults to false.
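
The idea behind the decile check can be sketched for a single period as follows. The package produces cumulative performance plots, whereas this simplified example only averages one period of realized returns per bucket. The function names and the equal-count bucketing rule are assumptions made for illustration.

```python
def decile_mean_returns(predictions, realized_returns, n_buckets=10):
    """Sort assets by prediction, split into equal-count buckets, average returns."""
    pairs = sorted(zip(predictions, realized_returns))
    n = len(pairs)
    means = []
    for b in range(n_buckets):
        bucket = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        if bucket:  # Buckets can be empty when there are fewer assets than buckets.
            means.append(sum(r for _, r in bucket) / len(bucket))
    return means


def is_monotonic_increasing(values):
    """True when each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))
```

If the model's predictions carry information, the bucket means should tend to rise from the lowest-prediction bucket to the highest.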

🤝 Contributing

Contributions to paper-portfolio are highly welcome! If you have ideas for new performance metrics, portfolio construction techniques, or reporting features, please feel free to open an issue or submit a pull request.


📄 License

paper-portfolio is distributed under the MIT License. See the LICENSE file for more information.

