
Empirical asset pricing toolkit: ML prediction, factor models, cross-sectional tests, SDF/GMM


eapctf: Empirical Asset Pricing Toolkit

A Python package for empirical asset pricing research, covering factor construction, cross-sectional tests, SDF/GMM estimation, ML-based return prediction, and portfolio optimization.

Installation

# with uv (recommended)
uv add eapctf

# with pip
pip install eapctf

Optional ML extras (PyTorch, LightGBM):

uv add "eapctf[ml]"

Quick Start

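All examples assume a long-format panel DataFrame named data, with one row per stock-date and columns date, permno, ret, mktcap and exchcd, plus any characteristic columns (e.g., bm, size, mom). The factor-construction examples also read a risk-free rate column (rf_col="rf").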
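For orientation, the snippet below builds a toy panel in that layout. It is a minimal sketch with random values: the column names follow the description above, while the number of stocks and dates, the value distributions, and the example characteristic bm are arbitrary illustrative choices, not real CRSP data.

```python
import numpy as np
import pandas as pd

# Toy long-format panel: one row per (date, permno). Values are random and only
# illustrate the layout; they are not real CRSP data.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=24, freq="MS")  # 24 monthly dates
permnos = np.arange(10001, 10051)                           # 50 made-up identifiers

data = pd.MultiIndex.from_product(
    [dates, permnos], names=["date", "permno"]
).to_frame(index=False)
n = len(data)
data["ret"] = rng.normal(0.01, 0.08, n)                            # monthly return
data["mktcap"] = rng.lognormal(6.0, 1.5, n)                        # market capitalization
data["exchcd"] = rng.choice([1, 2, 3], size=n, p=[0.4, 0.1, 0.5])  # CRSP code, 1 = NYSE
data["bm"] = rng.lognormal(-0.5, 0.6, n)                           # example characteristic

print(data.head())
```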

1. Portfolio Sorting

from eapctf.sorting import univariate_sort, bivariate_sort, ff3_factors, ff5_factors

# Decile sort on book-to-market with NYSE breakpoints (JKP micro-cap filter on by default)
result = univariate_sort(data, char_col="bm", n_portfolios=10, weighting="vw")
print(result.portfolio_returns)   # DataFrame: date x [port_1, ..., port_10, long_short]
print(result.portfolio_stats)     # mean, std, Sharpe, t-stat per decile

# Fama-French 3-factor model
ff3 = ff3_factors(data, bm_col="bm", rf_col="rf")
print(ff3.factors)   # DataFrame: date x [mkt_rf, smb, hml]

# Fama-French 5-factor model
ff5 = ff5_factors(data, bm_col="bm", op_col="op", inv_col="inv", rf_col="rf")
print(ff5.factors)   # DataFrame: date x [mkt_rf, smb, hml, rmw, cma]
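The core of a univariate sort can be sketched in plain pandas. This is a simplified illustration, not eapctf's implementation: it uses NYSE (exchcd == 1) breakpoints and value weights as described above, but omits the JKP micro-cap filter and the one-period lag between the characteristic/market cap and the return that a real sort needs. The function name nyse_sort_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def nyse_sort_sketch(panel: pd.DataFrame, char_col: str, n_portfolios: int = 10) -> pd.DataFrame:
    """Value-weighted portfolio returns per date using NYSE breakpoints.

    Simplification: char_col and mktcap are taken from the same row as ret;
    a real sort lags both by at least one period.
    """
    qs = np.linspace(0, 1, n_portfolios + 1)[1:-1]  # interior quantiles

    def one_date(g: pd.DataFrame) -> pd.Series:
        # Breakpoints come from NYSE stocks only ...
        bps = g.loc[g["exchcd"] == 1, char_col].quantile(qs).to_numpy()
        # ... but every stock is assigned to a portfolio (1..n) with them
        port = np.searchsorted(bps, g[char_col].to_numpy(), side="right") + 1
        r, w = g["ret"].to_numpy(), g["mktcap"].to_numpy()
        out = {}
        for p in range(1, n_portfolios + 1):
            mask = port == p
            out[f"port_{p}"] = np.average(r[mask], weights=w[mask]) if mask.any() else np.nan
        out["long_short"] = out[f"port_{n_portfolios}"] - out["port_1"]
        return pd.Series(out)

    cols = [char_col, "exchcd", "mktcap", "ret"]
    return panel.groupby("date")[cols].apply(one_date)


# Usage on random data in which bm predicts returns
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=12, freq="MS").repeat(200),
    "exchcd": rng.choice([1, 3], size=2400),
    "mktcap": rng.lognormal(6.0, 1.0, 2400),
    "bm": rng.normal(size=2400),
})
toy["ret"] = 0.01 * toy["bm"] + rng.normal(0.0, 0.05, 2400)
ports = nyse_sort_sketch(toy, "bm")
print(ports.mean().round(4))
```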

2. Fama-MacBeth Cross-Sectional Regression

from eapctf.crosssection import fama_macbeth

# Characteristic-based FM: cross-sectional regression of ret on chars each period
# (named fm so that result, the step-1 sort, stays available below)
fm = fama_macbeth(data, char_cols=["bm", "size", "mom"])
print(fm.lambdas)     # time-series average slopes with t-stats
print(fm.r_squared)   # time-series average cross-sectional R²

# Factor-based FM (two-pass): estimate betas first, then price them
fm2 = fama_macbeth(data, factor_cols=["mkt_rf", "smb", "hml"])
print(fm2.lambdas[["coef", "t_shanken"]])  # risk premia with Shanken-corrected t-stats
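To make the characteristic-based estimator concrete, here is a minimal NumPy/pandas sketch of the textbook Fama-MacBeth (1973) procedure: one cross-sectional OLS per date, then the time-series mean of the slopes with a t-statistic from their standard deviation. It uses plain standard errors; eapctf's own output may also report Newey-West or Shanken adjustments. The Shanken (1992) correction targets the two-pass, factor-based case, where betas are estimated in a first stage; observed characteristics carry no such errors-in-variables problem. The function name fama_macbeth_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(panel, y_col, x_cols, date_col="date"):
    """Textbook characteristic-based Fama-MacBeth regression."""
    slopes = []
    for _, g in panel.groupby(date_col):
        # Cross-sectional OLS of returns on an intercept and the characteristics
        X = np.column_stack([np.ones(len(g)), g[x_cols].to_numpy()])
        coef, *_ = np.linalg.lstsq(X, g[y_col].to_numpy(), rcond=None)
        slopes.append(coef)
    slopes = pd.DataFrame(slopes, columns=["const", *x_cols])
    mean = slopes.mean()
    se = slopes.std(ddof=1) / np.sqrt(len(slopes))  # plain FM standard error
    return pd.DataFrame({"coef": mean, "t_stat": mean / se})


# Usage: 60 dates x 300 stocks with a true bm slope of 0.005 and no size effect
rng = np.random.default_rng(2)
T, N = 60, 300
toy = pd.DataFrame({
    "date": np.repeat(np.arange(T), N),
    "bm": rng.normal(size=T * N),
    "size": rng.normal(size=T * N),
})
toy["ret"] = 0.005 * toy["bm"] + rng.normal(0.0, 0.05, T * N)
print(fama_macbeth_sketch(toy, "ret", ["bm", "size"]).round(4))
```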

3. Time-Series Alpha and GRS Test

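from eapctf.timeseries import time_series_alpha, grs_test

# Test assets: the step-1 deciles. long_short equals port_10 - port_1, so keeping
# it makes the residual covariance singular. The factors are excess returns;
# the test-asset returns should be excess returns too.
port_ret = result.portfolio_returns.drop(columns="long_short")

# Single portfolio (Series) -> AlphaResult
ls_alpha = time_series_alpha(result.portfolio_returns["long_short"], ff3.factors)
print(ls_alpha.alpha)    # intercept
print(ls_alpha.alpha_t)  # Newey-West t-statistic

# Multiple portfolios (DataFrame) -> list[AlphaResult]
for res in time_series_alpha(port_ret, ff3.factors):
    print(res.alpha, res.alpha_t)

# GRS test: are all portfolio alphas jointly zero?
grs = grs_test(port_ret, ff3.factors)
print(grs.statistic, grs.p_value)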
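The GRS statistic has a closed form. Below is a NumPy/SciPy sketch of the Gibbons, Ross & Shanken (1989) F-test using MLE (divide-by-T) covariance estimates; it is an illustration rather than eapctf's implementation, and grs_sketch is a hypothetical name. It also shows why the test assets must exclude long_short: the residual covariance matrix Sigma has to be invertible, which fails when one asset is an exact linear combination of others, and the test needs T > N + K.

```python
import numpy as np
from scipy import stats


def grs_sketch(R: np.ndarray, F: np.ndarray) -> tuple[float, float]:
    """GRS test of H0: all N time-series alphas are zero.

    R: (T, N) test-asset excess returns; F: (T, K) factor excess returns.
    GRS = (T-N-K)/N * a' Sigma^-1 a / (1 + mu' Omega^-1 mu) ~ F(N, T-N-K).
    """
    T, N = R.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)  # (K+1, N); row 0 holds the alphas
    alpha = B[0]
    resid = R - X @ B
    Sigma = resid.T @ resid / T                # MLE residual covariance (N x N)
    mu = F.mean(axis=0)
    Fc = F - mu
    Omega = Fc.T @ Fc / T                      # MLE factor covariance (K x K)
    stat = (T - N - K) / N * (alpha @ np.linalg.solve(Sigma, alpha)) / (
        1.0 + mu @ np.linalg.solve(Omega, mu)
    )
    return float(stat), float(stats.f.sf(stat, N, T - N - K))


# Usage: 240 months, 3 factors, 10 assets; true alphas of zero vs. 1% per month
rng = np.random.default_rng(3)
T, K, N = 240, 3, 10
F = rng.normal(0.005, 0.04, (T, K))
beta = rng.normal(1.0, 0.3, (K, N))
noise = rng.normal(0.0, 0.02, (T, N))
print(grs_sketch(F @ beta + noise, F))          # alphas truly zero
print(grs_sketch(0.01 + F @ beta + noise, F))   # alphas of 1% per month
```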

4. SDF / GMM Estimation

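import pandas as pd
from eapctf.sdf import gmm_estimate, hj_distance, hj_bounds

# port_ret: T x N test-asset excess returns (e.g., the step-1 deciles without long_short)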

# Two-step efficient GMM (default)
gmm = gmm_estimate(port_ret, ff3.factors, two_step=True)
print(gmm.b)           # SDF loadings (K,)
print(gmm.t_stats)     # t-statistics
print(gmm.j_statistic, gmm.j_p_value)  # overidentification J-test

# HJ distance: pass a pre-computed SDF proxy (e.g., from GMM)
f_demeaned = ff3.factors.sub(ff3.factors.mean())
sdf_proxy = 1 - f_demeaned.values @ gmm.b
hj = hj_distance(port_ret, pd.Series(sdf_proxy, index=port_ret.index))
print(hj.distance)

# HJ volatility bounds
bounds = hj_bounds(port_ret)
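The quantities above follow textbook formulas for an SDF that is linear in demeaned factors, m = 1 - (f - E[f])'b, priced on excess returns. The sketch below is a minimal NumPy illustration, not eapctf's estimator: first_stage_gmm uses an identity weighting matrix (the first step of two-step GMM) and treats the sample mean of f as known, hj_distance_sketch weights the pricing errors g = E[m R^e] by the inverse second-moment matrix of returns, and hj_sharpe_bound returns the maximum Sharpe ratio of the test assets, which bounds sigma(m)/E[m] from below. All three function names are hypothetical.

```python
import numpy as np


def first_stage_gmm(Re: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Identity-weighted GMM for m = 1 - (f - E[f])'b on excess returns Re (T x N).

    Moments g(b) = E[Re] - Cov(Re, f) b; with W = I, b = (C'C)^-1 C' E[Re].
    Treats the sample mean of f as known (no extra moment for E[f]).
    """
    mu_R = Re.mean(axis=0)
    fc = f - f.mean(axis=0)
    C = Re.T @ fc / len(Re)                 # (N x K) sample Cov(Re, f)
    return np.linalg.solve(C.T @ C, C.T @ mu_R)


def hj_distance_sketch(Re: np.ndarray, m: np.ndarray) -> float:
    """HJ distance for excess returns: sqrt(g' E[Re Re']^-1 g), g = E[m Re]."""
    g = (m[:, None] * Re).mean(axis=0)
    second_moment = Re.T @ Re / len(Re)
    return float(np.sqrt(g @ np.linalg.solve(second_moment, g)))


def hj_sharpe_bound(Re: np.ndarray) -> float:
    """Maximum Sharpe ratio of Re: a lower bound on sigma(m) / E[m]."""
    mu = Re.mean(axis=0)
    Sigma = np.cov(Re, rowvar=False)
    return float(np.sqrt(mu @ np.linalg.solve(Sigma, mu)))


# Usage: 2 factors, 6 test assets whose expected returns follow the factor model
rng = np.random.default_rng(4)
T = 600
f = rng.normal(0.005, 0.04, (T, 2))
Re = f @ rng.normal(1.0, 0.3, (2, 6)) + rng.normal(0.0, 0.02, (T, 6))
b = first_stage_gmm(Re, f)
m = 1 - (f - f.mean(axis=0)) @ b
print(b, hj_distance_sketch(Re, m), hj_sharpe_bound(Re))
```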

5. ML Out-of-Sample Return Prediction

from eapctf.predict import expanding_window_oos, make_predictor

model = make_predictor("lasso", alpha=0.01)
oos = expanding_window_oos(
    data,
    char_cols=["bm", "size", "mom", "op", "inv"],
    models=[model],
    train_min_periods=240,   # minimum 20 years of training data
)
print(oos.oos_r2)              # OOS R² averaged across models (GKX 2020)
print(oos.oos_r2_by_model)     # OOS R² per model
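The expanding-window protocol and the Gu, Kelly & Xiu (2020) OOS R² can be sketched in a few lines. This is an assumption-laden illustration rather than eapctf's implementation: it swaps in a closed-form ridge regression (no intercept) for the lasso, refits at every test date instead of on a fixed schedule, and treats each row's ret as the return to be predicted from that row's characteristics. The OOS R² follows GKX in comparing forecasts against a zero prediction, so the denominator is the raw (not demeaned) sum of squared returns. The function and parameter names are hypothetical.

```python
import numpy as np
import pandas as pd


def expanding_oos_r2_sketch(panel, char_cols, train_min_periods, ridge_lambda=1.0,
                            date_col="date", y_col="ret"):
    dates = np.sort(panel[date_col].unique())
    y_true, y_pred = [], []
    for t in dates[train_min_periods:]:
        train = panel[panel[date_col] < t]     # all dates strictly before t
        test = panel[panel[date_col] == t]
        X, y = train[char_cols].to_numpy(), train[y_col].to_numpy()
        # Closed-form ridge: (X'X + lambda I)^-1 X'y
        beta = np.linalg.solve(X.T @ X + ridge_lambda * np.eye(X.shape[1]), X.T @ y)
        y_true.append(test[y_col].to_numpy())
        y_pred.append(test[char_cols].to_numpy() @ beta)
    y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)
    # GKX (2020) OOS R^2: benchmark is a zero forecast, so no demeaning
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)


# Usage: 60 dates x 100 stocks; only the first characteristic predicts returns
rng = np.random.default_rng(5)
T, N = 60, 100
toy = pd.DataFrame({"date": np.repeat(np.arange(T), N),
                    "x1": rng.normal(size=T * N), "x2": rng.normal(size=T * N)})
toy["ret"] = 0.02 * toy["x1"] + rng.normal(0.0, 0.05, T * N)
print(round(expanding_oos_r2_sketch(toy, ["x1", "x2"], train_min_periods=24), 3))
```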

6. Portfolio Optimization

from eapctf.sorting import long_short_portfolio
from eapctf.portfolio import mean_variance_weights, hrp_weights

# Long-short portfolio from a signal
ls = long_short_portfolio(data, signal_col="bm", n_portfolios=10, weighting="vw")
print(ls.returns["long_short"])   # long-short return series
print(ls.metrics)                 # mean, std, Sharpe, etc.

# Mean-variance optimization (inputs: sample moments of the test assets)
mu = port_ret.mean()
sigma = port_ret.cov()
weights = mean_variance_weights(
    expected_returns=mu,
    cov_matrix_input=sigma,
    method="max_sharpe",
)

# Hierarchical Risk Parity
weights_hrp = hrp_weights(returns_data=port_ret)
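For reference, the unconstrained solution to a max-Sharpe objective is the tangency portfolio, with weights proportional to Sigma^{-1} mu and rescaled to sum to one. The sketch below shows only that closed form; eapctf's mean_variance_weights may apply constraints (e.g., long-only or weight bounds) that this ignores, and max_sharpe_sketch is a hypothetical name. The rescaling breaks down if the elements of Sigma^{-1} mu sum to (nearly) zero.

```python
import numpy as np


def max_sharpe_sketch(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Unconstrained tangency weights: w = Sigma^-1 mu / sum(Sigma^-1 mu)."""
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()


# Two uncorrelated assets with equal means: weights are proportional to 1 / variance
mu = np.array([0.02, 0.02])
sigma = np.diag([0.01, 0.04])
w = max_sharpe_sketch(mu, sigma)
print(w)  # [0.8 0.2]
```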

CTF (Common Task Framework)

eapctf.ctf provides a local replication pipeline for the Common Task Framework introduced in Hoberg, Jensen, Kelly & Pedersen (2025). The CTF evaluates portfolio strategies on a shared holdout test set across 402 firm characteristics (153 JKP + 249 additional GFD factors).

Pipeline

import pandas as pd
from eapctf.ctf import run_local, compute_metrics, validate

# 1. Run a CTF model script locally
weights = run_local("models/my-model.py", data_dir="data/ctf/")

# 2. Evaluate performance (10% vol-targeting matches CTF server methodology)
daily_ret = pd.read_parquet("data/ctf/ctff_daily_ret.parquet")
metrics = compute_metrics(weights, daily_ret, vol_target=0.10)
print(metrics)

# 3. Check compliance before submission
report = validate("models/my-model.py", data_dir="data/ctf/")
print(report)
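A simplified version of the evaluation step scales a strategy's daily returns to a 10% annualized volatility target and then computes summary statistics. This is an assumption-based sketch, not the CTF server's code: it uses a single full-sample (ex post) scale factor, 252 trading days per year, arithmetic annualization of the mean, and a drawdown computed on compounded scaled returns; vol_targeted_metrics is a hypothetical name. Because the scale factor is a positive constant, vol targeting leaves the Sharpe ratio unchanged and only affects the reported return, volatility and drawdown.

```python
import numpy as np
import pandas as pd


def vol_targeted_metrics(daily_ret: pd.Series, vol_target: float = 0.10,
                         periods_per_year: int = 252) -> dict:
    """Scale returns to vol_target (annualized) and report summary statistics."""
    realized_vol = daily_ret.std(ddof=1) * np.sqrt(periods_per_year)
    r = daily_ret * (vol_target / realized_vol)      # one ex post scale factor
    ann_ret = r.mean() * periods_per_year            # arithmetic annualization
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    wealth = (1.0 + r).cumprod()
    max_dd = (wealth / wealth.cummax() - 1.0).min()  # worst peak-to-trough loss
    return {"sharpe": ann_ret / ann_vol, "annual_return": ann_ret,
            "vol": ann_vol, "max_drawdown": max_dd}


rng = np.random.default_rng(6)
ret = pd.Series(rng.normal(0.0004, 0.012, 2520))     # ~10 years of daily returns
print(vol_targeted_metrics(ret))
```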

Starting a New Model

cp reference/template-ctf-model.py models/my-model.py
# Edit models/my-model.py — replace TODO sections with your implementation

The template provides a complete rolling-window train/predict loop with rank-normalized features, OLS prediction, and z-score portfolio weights. Replace train_model() / predict_returns() / construct_weights() with your approach; the rest of the pipeline stays the same.
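The template's loop can be pictured with the sketch below. It is not the contents of reference/template-ctf-model.py: the function names mirror the three hooks named above, but their signatures are hypothetical, and so are the other details: the per-date rank normalization to [-0.5, 0.5], the OLS-with-intercept model, the z-score weights scaled to unit gross exposure, and the assumption that each row's ret is the return to be predicted from that row's characteristics.

```python
import numpy as np
import pandas as pd


def rank_normalize(panel, cols, date_col="date"):
    """Per-date cross-sectional ranks mapped linearly onto [-0.5, 0.5]."""
    grouped = panel.groupby(date_col)[cols]
    return (grouped.rank() - 1) / (grouped.transform("count") - 1) - 0.5


def train_model(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])     # add an intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS
    return coef


def predict_returns(coef, X):
    return coef[0] + X @ coef[1:]


def construct_weights(pred):
    z = (pred - pred.mean()) / pred.std(ddof=1)    # cross-sectional z-score
    return z / np.abs(z).sum()                     # net zero, gross exposure 1


def rolling_weights(panel, char_cols, window=120, date_col="date", y_col="ret"):
    panel = panel.copy()
    panel[char_cols] = rank_normalize(panel, char_cols, date_col)
    dates = np.sort(panel[date_col].unique())
    out = []
    for i in range(window, len(dates)):
        train = panel[panel[date_col].isin(dates[i - window:i])]  # trailing window
        test = panel[panel[date_col] == dates[i]]
        coef = train_model(train[char_cols].to_numpy(), train[y_col].to_numpy())
        w = construct_weights(predict_returns(coef, test[char_cols].to_numpy()))
        out.append(pd.DataFrame({date_col: dates[i],
                                 "permno": test["permno"].to_numpy(), "w": w}))
    return pd.concat(out, ignore_index=True)


rng = np.random.default_rng(7)
T, N = 30, 50
toy = pd.DataFrame({"date": np.repeat(np.arange(T), N), "permno": np.tile(np.arange(N), T),
                    "x1": rng.normal(size=T * N), "x2": rng.normal(size=T * N)})
toy["ret"] = 0.01 * toy["x1"] + rng.normal(0.0, 0.05, T * N)
weights = rolling_weights(toy, ["x1", "x2"], window=12)
print(weights.head())
```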

Replication Results

The table below compares eapctf.ctf.compute_metrics() output with known CTF leaderboard entries. With vol_target=0.10, the local IPCA Sharpe ratio matches the CTF server value to within 0.5%, while the 1/N benchmark differs by about 12%. All returns are scaled to 10% annualized volatility before computing statistics (CTF standard).

| Model | Sharpe (local) | Sharpe (CTF) | Sharpe Diff % | Annual Return | Vol | Max Drawdown |
|---|---|---|---|---|---|---|
| 1/N (equal weight) | 0.551 | 0.491 | +12.2% | 5.13% | 10.00% | -30.43% |
| IPCA (KPS 2019) | 1.939 | 1.948 | -0.5% | 20.64% | 10.00% | -10.30% |

The IPCA replication uses the parallelized benchmark script at reference/benchmark-ipca-pf.py (n_factors=5, window=120 months, 402 features, 408 test dates), and its Sharpe ratio is within 0.5% of the CTF leaderboard value. The larger 1/N gap (+12.2%) reflects differences in stock-universe filtering conventions between local evaluation and the CTF server; an equal-weighted portfolio is especially sensitive to which small stocks enter the universe.
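The Sharpe Diff % column is consistent with the relative difference between the local and CTF Sharpe ratios, (local - CTF) / CTF. A quick check using the values from the table:

```python
# Relative Sharpe difference, local vs. CTF leaderboard, in percent
sharpes = {"1/N (equal weight)": (0.551, 0.491), "IPCA (KPS 2019)": (1.939, 1.948)}
diff_pct = {name: 100 * (local - ctf) / ctf for name, (local, ctf) in sharpes.items()}
for name, d in diff_pct.items():
    print(f"{name}: {d:+.1f}%")
# 1/N (equal weight): +12.2%
# IPCA (KPS 2019): -0.5%
```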

Module Overview

| Module | Key Functions | Reference |
|---|---|---|
| eapctf.ctf | run_local, compute_metrics, validate, fetch_leaderboard, pipeline, download_ctf_data | Hoberg, Jensen, Kelly & Pedersen (2025) |
| eapctf.sorting | univariate_sort, bivariate_sort, char_factor, ff3_factors, ff5_factors, hxz4_factors, sy4_factors, mom_factor, long_short_portfolio | Fama & French (1993, 2015); Hou, Xue & Zhang (2015); Stambaugh & Yuan (2017) |
| eapctf.crosssection | fama_macbeth, cs_regression, multiple_testing_correction | Fama & MacBeth (1973); Shanken (1992) |
| eapctf.timeseries | time_series_alpha, grs_test, spanning_test, rolling_beta | Gibbons, Ross & Shanken (1989) |
| eapctf.sdf | gmm_estimate, hj_distance, hj_bounds, pricing_errors | Hansen (1982); Hansen & Jagannathan (1991) |
| eapctf.predict | expanding_window_oos, make_predictor, char_prep | Gu, Kelly & Xiu (2020) |
| eapctf.portfolio | mean_variance_weights, hrp_weights, black_litterman_weights, ParametricPolicy, evaluate_portfolio | Markowitz (1952); Lopez de Prado (2016) |
| eapctf.utils | rank_normalize, classify, EAPPanel, JKP_153, load_gfd_chars | |

Development

# install with dev dependencies
uv sync --dev

# run tests
uv run python -m pytest

# lint and type check
uv run ruff check eapctf/
uv run mypy eapctf/ --ignore-missing-imports

License

MIT


Download files


Source Distribution

eapctf-0.1.0.tar.gz (8.5 MB)

Uploaded Source

Built Distribution


eapctf-0.1.0-py3-none-any.whl (132.4 kB)

Uploaded Python 3

File details

Details for the file eapctf-0.1.0.tar.gz.

File metadata

  • Download URL: eapctf-0.1.0.tar.gz
  • Upload date:
  • Size: 8.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eapctf-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f8b67fddcbfbce8628f451459f041c66dfeff4f45bbf7d70b4eadfeb9f624ca6
MD5 712174a56088a4eed7c3cd8b9e6f8aa7
BLAKE2b-256 3c162dd8b395bad2d0d658329fe1d5bf740b2e7efdcb32d31aed632a4ef7c0d0


File details

Details for the file eapctf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: eapctf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 132.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eapctf-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ad74307fe40d0aeaebe63ce191ba308d80da7d1e0c8f88c675c215818a16b4c3
MD5 22d47bc75a1a72ebae8e25774647f860
BLAKE2b-256 d4af6df918b3f7d7db01f8a54f26d6a1fc218800f3d555bda3f0f11e427ad279
