
Empirical asset pricing toolkit: ML prediction, factor models, cross-sectional tests, SDF/GMM


eapctf: Empirical Asset Pricing Toolkit

A Python package for empirical asset pricing research, covering factor construction, cross-sectional tests, SDF/GMM estimation, ML-based return prediction, and portfolio optimization.

Installation

# with uv (recommended)
uv add eapctf

# with pip
pip install eapctf

Optional ML extras (PyTorch, LightGBM):

uv add "eapctf[ml]"

Quick Start

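All examples assume a long-format panel DataFrame named data, with one row per stock and date. It has the columns date, permno, ret, mktcap and exchcd, plus the characteristic columns each example uses (e.g., bm, size, mom, op, inv). Factor construction also needs a risk-free rate column rf.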
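To try the examples without real CRSP data, you can generate a synthetic panel with this layout. The sketch below is minimal: the function name make_synthetic_panel and every distribution in it are illustrative assumptions. The values are pure noise, so results computed from them carry no economic meaning. exchcd follows the CRSP convention that 1 denotes NYSE.

```python
import numpy as np
import pandas as pd


def make_synthetic_panel(n_stocks: int = 200, n_months: int = 60, seed: int = 0) -> pd.DataFrame:
    """Random long-format panel (one row per permno-month) with the assumed columns."""
    rng = np.random.default_rng(seed)
    dates = pd.period_range("2000-01", periods=n_months, freq="M").to_timestamp()
    permnos = np.arange(10001, 10001 + n_stocks)
    idx = pd.MultiIndex.from_product([dates, permnos], names=["date", "permno"])
    panel = pd.DataFrame(index=idx).reset_index()
    n = len(panel)

    panel["ret"] = rng.normal(0.01, 0.10, n)                # monthly simple return (noise)
    panel["mktcap"] = np.exp(rng.normal(6.0, 1.5, n))       # market cap, lognormal
    panel["exchcd"] = rng.choice([1, 2, 3], size=n, p=[0.4, 0.2, 0.4])  # CRSP: 1 = NYSE
    for char in ["bm", "mom", "op", "inv"]:
        panel[char] = rng.normal(size=n)                    # placeholder characteristics
    panel["size"] = np.log(panel["mktcap"])
    panel["rf"] = 0.002                                     # constant monthly risk-free rate
    return panel


data = make_synthetic_panel()
print(data.head())
```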

1. Portfolio Sorting

from eapctf.sorting import univariate_sort, bivariate_sort, ff3_factors, ff5_factors

# Decile sort on book-to-market with NYSE breakpoints (JKP micro-cap filter on by default)
result = univariate_sort(data, char_col="bm", n_portfolios=10, weighting="vw")
print(result.portfolio_returns)   # DataFrame: date x [port_1, ..., port_10, long_short]
print(result.portfolio_stats)     # mean, std, Sharpe, t-stat per decile

# Fama-French 3-factor model
ff3 = ff3_factors(data, bm_col="bm", rf_col="rf")
print(ff3.factors)   # DataFrame: date x [mkt_rf, smb, hml]

# Fama-French 5-factor model
ff5 = ff5_factors(data, bm_col="bm", op_col="op", inv_col="inv", rf_col="rf")
print(ff5.factors)   # DataFrame: date x [mkt_rf, smb, hml, rmw, cma]
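The sketch below shows what a univariate NYSE-breakpoint sort computes. It builds value-weighted decile returns one date at a time in plain pandas. It is not eapctf's implementation, and the function name nyse_decile_vw_returns is hypothetical. It also simplifies in four ways:

- It omits the JKP micro-cap filter.
- It weights by the same row's mktcap. Real applications lag characteristics and weights relative to returns.
- It assumes long_short is the top decile minus the bottom decile.
- It assumes exchcd == 1 marks NYSE (the CRSP convention).

```python
import numpy as np
import pandas as pd


def nyse_decile_vw_returns(panel: pd.DataFrame, char_col: str, n_portfolios: int = 10) -> pd.DataFrame:
    """Value-weighted portfolio returns from NYSE-breakpoint sorts, one date at a time."""
    rows = {}
    for date, g in panel.dropna(subset=[char_col]).groupby("date"):
        # Interior breakpoints computed from NYSE stocks only (CRSP exchcd == 1)
        nyse_vals = g.loc[g["exchcd"] == 1, char_col]
        probs = np.linspace(0, 1, n_portfolios + 1)[1:-1]
        breakpoints = nyse_vals.quantile(probs).to_numpy()
        # Every stock, NYSE or not, is assigned to bins 1..n_portfolios
        bins = np.searchsorted(breakpoints, g[char_col].to_numpy(), side="right") + 1
        ret = g["ret"].to_numpy()
        weight = g["mktcap"].to_numpy()  # should be lagged market cap in real data
        rows[date] = {
            f"port_{p}": (np.average(ret[bins == p], weights=weight[bins == p])
                          if (bins == p).any() else np.nan)
            for p in range(1, n_portfolios + 1)
        }
    out = pd.DataFrame.from_dict(rows, orient="index").sort_index()
    # Assumed sign convention: top decile minus bottom decile
    out["long_short"] = out[f"port_{n_portfolios}"] - out["port_1"]
    return out
```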

2. Fama-MacBeth Cross-Sectional Regression

from eapctf.crosssection import fama_macbeth

# Characteristic-based FM: cross-sectional regression of ret on chars each period
fm = fama_macbeth(data, char_cols=["bm", "size", "mom"])
print(fm.lambdas[["coef", "t_shanken"]])  # average slopes (characteristic premia) and t-stats
print(fm.r_squared)                        # time-series average of cross-sectional R²

# Factor-based FM (two-pass): estimate betas first, then price them
# (the Shanken (1992) errors-in-variables correction matters here, because betas are estimated)
result2 = fama_macbeth(data, factor_cols=["mkt_rf", "smb", "hml"])
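The characteristic-based version takes only a few lines to sketch. Run one cross-sectional OLS per date, average the slopes across dates, and divide each average by its standard error across dates. The function name fama_macbeth_sketch is hypothetical. It reports plain Fama-MacBeth t-stats only, with no Newey-West or Shanken adjustment, so its numbers need not match eapctf's output.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(panel: pd.DataFrame, y_col: str, x_cols: list[str],
                        date_col: str = "date") -> pd.DataFrame:
    """Fama-MacBeth: per-date OLS slopes, their time-series means, and plain FM t-stats."""
    slopes = []
    for _, g in panel.dropna(subset=[y_col, *x_cols]).groupby(date_col):
        X = np.column_stack([np.ones(len(g)), g[x_cols].to_numpy()])
        beta, *_ = np.linalg.lstsq(X, g[y_col].to_numpy(), rcond=None)
        slopes.append(beta)
    slopes = pd.DataFrame(slopes, columns=["const", *x_cols])
    coef = slopes.mean()
    se = slopes.std(ddof=1) / np.sqrt(len(slopes))  # no Newey-West or Shanken adjustment
    return pd.DataFrame({"coef": coef, "t_stat": coef / se})
```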

3. Time-Series Alpha and GRS Test

from eapctf.timeseries import time_series_alpha, grs_test

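# Single portfolio (Series) -> AlphaResult
ls_alpha = time_series_alpha(result.portfolio_returns["long_short"], ff3.factors)
print(ls_alpha.alpha)    # intercept
print(ls_alpha.alpha_t)  # Newey-West t-statistic

# Multiple portfolios (DataFrame) -> list[AlphaResult], one per column
deciles = result.portfolio_returns.drop(columns="long_short")
decile_alphas = time_series_alpha(deciles, ff3.factors)
print([a.alpha for a in decile_alphas])

# GRS test: are all decile alphas jointly zero? (test assets should be excess returns)
# long_short is excluded: it is a combination of the extreme deciles,
# which would make the residual covariance matrix singular
grs = grs_test(deciles, ff3.factors)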
print(grs.statistic, grs.p_value)
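You can reproduce both statistics in NumPy as a cross-check. The sketch below assumes excess returns and uses a Bartlett-kernel Newey-West estimator with a fixed lag of 6. That lag is an arbitrary choice; eapctf's default lag is not documented here. The GRS statistic is computed in its maximum-likelihood form, which is distributed F(N, T − N − L) under i.i.d. normal errors. The function names alpha_newey_west and grs_statistic are hypothetical.

```python
import numpy as np


def alpha_newey_west(y, factors, lags: int = 6):
    """Time-series OLS of y (T,) on [1, factors]; returns alpha and its Newey-West t-stat."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    T = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    xu = X * u[:, None]                    # score contributions x_t * u_t
    S = xu.T @ xu / T                      # lag-0 term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)       # Bartlett kernel weight
        gamma = xu[lag:].T @ xu[:-lag] / T
        S += w * (gamma + gamma.T)
    Q_inv = np.linalg.inv(X.T @ X / T)
    cov = Q_inv @ S @ Q_inv / T            # HAC covariance of the OLS coefficients
    return beta[0], beta[0] / np.sqrt(cov[0, 0])


def grs_statistic(excess_returns, factors) -> float:
    """GRS statistic, ~ F(N, T - N - L) under i.i.d. normal errors.

    p-value, if SciPy is available: scipy.stats.f.sf(stat, N, T - N - L).
    """
    R = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    T, N = R.shape
    L = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)  # (1 + L, N) coefficients
    alpha = B[0]
    resid = R - X @ B
    sigma = resid.T @ resid / T                # MLE residual covariance
    f_mean = F.mean(axis=0)
    f_dev = F - f_mean
    omega = f_dev.T @ f_dev / T                # MLE factor covariance
    a_quad = alpha @ np.linalg.solve(sigma, alpha)
    f_quad = f_mean @ np.linalg.solve(omega, f_mean)
    return float((T - N - L) / N * a_quad / (1.0 + f_quad))
```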

4. SDF / GMM Estimation

import pandas as pd
from eapctf.sdf import gmm_estimate, hj_distance, hj_bounds

# Test assets: decile returns from the sort in section 1 (long_short excluded)
port_ret = result.portfolio_returns.drop(columns="long_short")

# Two-step efficient GMM (default)
gmm = gmm_estimate(port_ret, ff3.factors, two_step=True)
print(gmm.b)           # SDF loadings (K,)
print(gmm.t_stats)     # t-statistics
print(gmm.j_statistic, gmm.j_p_value)  # overidentification J-test

# HJ distance: pass a pre-computed SDF proxy, here the linear SDF m = 1 - b'(f - E[f]) from GMM
f_demeaned = ff3.factors.sub(ff3.factors.mean())
sdf_proxy = pd.Series(1 - f_demeaned.values @ gmm.b, index=ff3.factors.index)
hj = hj_distance(port_ret, sdf_proxy)
print(hj.distance)

# HJ volatility bounds
bounds = hj_bounds(port_ret)
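The HJ distance of a candidate SDF m is δ = sqrt(g′ W⁻¹ g). Here g = E[mR] − 1 are the pricing errors on gross returns R, and W = E[RR′] is their second-moment matrix (Hansen & Jagannathan 1997). The NumPy sketch below uses sample moments and assumes gross returns. For excess returns the pricing errors would be E[m Rᵉ] instead. hj_distance_sketch is a hypothetical name, not eapctf's function. As a check, the in-sample SDF m* = R′W⁻¹1 prices every asset exactly and so has distance zero.

```python
import numpy as np


def hj_distance_sketch(gross_returns, m) -> float:
    """HJ distance sqrt(g' W^{-1} g) with g = E_T[m R] - 1 and W = E_T[R R']."""
    R = np.asarray(gross_returns, dtype=float)   # (T, N) gross returns
    m = np.asarray(m, dtype=float)               # (T,) SDF realizations
    T = R.shape[0]
    g = (m[:, None] * R).mean(axis=0) - 1.0      # pricing errors, one per asset
    W = R.T @ R / T                              # second-moment matrix of returns
    q = g @ np.linalg.solve(W, g)
    return float(np.sqrt(max(q, 0.0)))           # clip tiny negative rounding error


# Check: the in-sample SDF m* = R' W^{-1} 1 prices all assets exactly
rng = np.random.default_rng(0)
R = 1.0 + rng.normal(0.01, 0.05, size=(240, 4))
W = R.T @ R / len(R)
m_star = R @ np.linalg.solve(W, np.ones(4))
print(hj_distance_sketch(R, m_star))        # ~0 up to floating-point error
print(hj_distance_sketch(R, np.ones(240)))  # > 0: m = 1 misprices risky returns
```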

5. ML Out-of-Sample Return Prediction

from eapctf.predict import expanding_window_oos, make_predictor

model = make_predictor("lasso", alpha=0.01)
oos = expanding_window_oos(
    data,
    char_cols=["bm", "size", "mom", "op", "inv"],
    models=[model],
    train_min_periods=240,   # minimum 20 years of training data
)
print(oos.oos_r2)              # OOS R² averaged across models (GKX 2020)
print(oos.oos_r2_by_model)     # OOS R² per model
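The GKX (2020) out-of-sample R² benchmarks forecasts against a zero forecast rather than the historical mean: R²_oos = 1 − Σ(r − r̂)² / Σ r², pooled over all stock-months in the test period. The sketch below pairs that formula with a bare-bones expanding window. It swaps in plain OLS for the Lasso above, and the names oos_r2_gkx and expanding_window_ols are hypothetical. It illustrates only the loop structure, not eapctf's validation or tuning scheme.

```python
import numpy as np


def oos_r2_gkx(actual, predicted) -> float:
    """GKX (2020) out-of-sample R^2: the benchmark is a zero forecast, not the mean."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((actual - predicted) ** 2) / np.sum(actual ** 2))


def expanding_window_ols(X, y, periods, train_min_periods: int):
    """Refit OLS each period on all earlier periods; predict the current period.

    X: (n, k) characteristics, y: (n,) returns, periods: (n,) sortable period ids.
    Rows in the first train_min_periods periods get NaN predictions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    periods = np.asarray(periods)
    preds = np.full(len(y), np.nan)
    for t in np.unique(periods)[train_min_periods:]:
        train, test = periods < t, periods == t
        X_train = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        preds[test] = np.column_stack([np.ones(test.sum()), X[test]]) @ beta
    return preds
```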

6. Portfolio Optimization

from eapctf.sorting import long_short_portfolio
from eapctf.portfolio import mean_variance_weights, hrp_weights

# Long-short portfolio from a signal
ls = long_short_portfolio(data, signal_col="bm", n_portfolios=10, weighting="vw")
print(ls.returns["long_short"])   # long-short return series
print(ls.metrics)                 # mean, std, Sharpe, etc.

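# Mean-variance optimization (sample moments of port_ret as illustrative inputs)
mu = port_ret.mean()     # expected returns
sigma = port_ret.cov()   # covariance matrix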
weights = mean_variance_weights(
    expected_returns=mu,
    cov_matrix_input=sigma,
    method="max_sharpe",
)

# Hierarchical Risk Parity
weights_hrp = hrp_weights(returns_data=port_ret)
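For reference, the unconstrained max-Sharpe (tangency) portfolio has the closed form w = Σ⁻¹μ / (1′Σ⁻¹μ). It is not documented here whether mean_variance_weights with method="max_sharpe" adds constraints, such as long-only positions or a risk-free rate. This hypothetical tangency_weights sketch may therefore differ from its output.

```python
import numpy as np


def tangency_weights(mu, cov) -> np.ndarray:
    """Unconstrained max-Sharpe weights: Sigma^{-1} mu scaled to sum to one.

    mu: expected excess returns (N,), cov: covariance matrix (N, N).
    Shorting is allowed; the scaling is unstable if 1' Sigma^{-1} mu is near zero.
    """
    z = np.linalg.solve(np.asarray(cov, dtype=float), np.asarray(mu, dtype=float))
    return z / z.sum()


w = tangency_weights([1.0, 1.0], np.diag([1.0, 4.0]))
print(w)  # [0.8 0.2]
```

With uncorrelated assets and equal expected returns, the weights are proportional to inverse variances, which is why the lower-variance asset receives four times the weight.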

CTF (Common Task Framework)

eapctf.ctf provides a local replication pipeline for the Common Task Framework (CTF) introduced in Hoberg, Jensen, Kelly & Pedersen (2025). The CTF evaluates portfolio strategies on a shared holdout test set, using a common set of 402 firm characteristics (153 JKP + 249 additional GFD factors).

Pipeline

import pandas as pd
from eapctf.ctf import run_local, compute_metrics, validate

# 1. Run a CTF model script locally
weights = run_local("models/my-model.py", data_dir="data/ctf/")

# 2. Evaluate performance (10% vol-targeting matches CTF server methodology)
daily_ret = pd.read_parquet("data/ctf/ctff_daily_ret.parquet")
metrics = compute_metrics(weights, daily_ret, vol_target=0.10)
print(metrics)

# 3. Check compliance before submission
report = validate("models/my-model.py", data_dir="data/ctf/")
print(report)
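The effect of vol_target=0.10 can be sketched as follows. Rescale the whole return series so that its annualized volatility equals 10%, then compute summary statistics. This reflects an assumption about the methodology, not the CTF server's code. The function name vol_targeted_metrics, the compounding used for drawdowns, and the annualization factor (12 for monthly data, 252 for daily data) are illustrative choices. Because the scaling is a single constant, the Sharpe ratio is unchanged; only the mean, volatility and drawdown move.

```python
import numpy as np
import pandas as pd


def vol_targeted_metrics(returns, vol_target: float = 0.10, periods_per_year: int = 12) -> dict:
    """Rescale returns to a target annualized volatility, then summarize.

    Full-sample (ex post) scaling: the Sharpe ratio is unchanged, while the
    mean, volatility and drawdowns scale with the constant.
    """
    r = pd.Series(returns, dtype=float).dropna()
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    scaled = r * (vol_target / ann_vol)
    mean_ann = scaled.mean() * periods_per_year
    vol_ann = scaled.std(ddof=1) * np.sqrt(periods_per_year)
    wealth = (1 + scaled).cumprod()             # compounded growth of $1
    drawdown = wealth / wealth.cummax() - 1     # distance below the running peak
    return {
        "sharpe": mean_ann / vol_ann,
        "annual_return": mean_ann,
        "vol": vol_ann,
        "max_drawdown": drawdown.min(),
    }
```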

Starting a New Model

cp reference/template-ctf-model.py models/my-model.py
# Edit models/my-model.py: replace the TODO sections with your implementation

The template provides a complete rolling-window train/predict loop with rank-normalized features, OLS prediction, and z-score portfolio weights. Replace train_model() / predict_returns() / construct_weights() with your approach; the rest of the pipeline stays the same.
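The template's loop structure can be approximated in plain pandas and NumPy. This is a sketch, not the template's code. The function names are hypothetical stand-ins for train_model(), predict_returns() and construct_weights(). Ranks are mapped to [-0.5, 0.5], with missing values set to 0. The sketch also assumes that ret on each row is already the next-period return for that row's characteristics, so no look-ahead occurs.

```python
import numpy as np
import pandas as pd


def rank_normalize_sketch(df: pd.DataFrame, cols: list[str], date_col: str = "date") -> pd.DataFrame:
    """Cross-sectional ranks mapped to [-0.5, 0.5] per date; missing values set to 0."""
    grouped = df.groupby(date_col)[cols]
    ranks = grouped.rank(method="average")
    counts = grouped.transform("count")
    return ((ranks - 1) / (counts - 1) - 0.5).fillna(0.0)


def fit_ols(features: pd.DataFrame, target: pd.Series) -> np.ndarray:
    """Pooled OLS with intercept (stand-in for train_model())."""
    X = np.column_stack([np.ones(len(features)), features.to_numpy()])
    beta, *_ = np.linalg.lstsq(X, target.to_numpy(dtype=float), rcond=None)
    return beta


def predict_ols(beta: np.ndarray, features: pd.DataFrame) -> np.ndarray:
    """Linear forecasts (stand-in for predict_returns())."""
    return np.column_stack([np.ones(len(features)), features.to_numpy()]) @ beta


def zscore_weights(pred: np.ndarray, dates: pd.Series) -> pd.Series:
    """Cross-sectional z-scores of forecasts (stand-in for construct_weights())."""
    s = pd.Series(pred, index=dates.index)
    by_date = s.groupby(dates)
    return (s - by_date.transform("mean")) / by_date.transform("std")


def rolling_zscore_strategy(panel: pd.DataFrame, char_cols: list[str], window: int = 120) -> pd.DataFrame:
    """Fit on the previous `window` dates, predict the next date, convert to weights."""
    feats = rank_normalize_sketch(panel, char_cols)
    dates = np.sort(panel["date"].unique())
    out = []
    for i in range(window, len(dates)):
        train = panel["date"].isin(dates[i - window:i]) & panel["ret"].notna()
        test = panel["date"] == dates[i]
        beta = fit_ols(feats[train], panel.loc[train, "ret"])
        pred = predict_ols(beta, feats[test])
        weight = zscore_weights(pred, panel.loc[test, "date"])
        out.append(pd.DataFrame({"date": dates[i], "permno": panel.loc[test, "permno"], "weight": weight}))
    return pd.concat(out, ignore_index=True)
```

Z-score weights average zero within each date, so the resulting portfolio is dollar-neutral. How they are scaled to gross exposure is left to the evaluation step.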

Replication Results

The table below compares eapctf.ctf.compute_metrics() output with two known CTF leaderboard entries. All returns are scaled to 10% annualized volatility before statistics are computed (the CTF standard), so local evaluation uses vol_target=0.10. For IPCA, the local Sharpe ratio closely reproduces the CTF server value; the 1/N entry shows a larger gap.

| Model | Sharpe (local) | Sharpe (CTF) | Diff | Annual Return | Vol | Max Drawdown |
| --- | --- | --- | --- | --- | --- | --- |
| 1/N (equal weight) | 0.551 | 0.491 | +12.2% | 5.13% | 10.00% | -30.43% |
| IPCA (KPS 2019) | 1.939 | 1.948 | -0.5% | 20.64% | 10.00% | -10.30% |

The IPCA replication uses the parallelized benchmark script at reference/benchmark-ipca-pf.py (n_factors=5, window=120 months, 402 features, 408 test dates). Its Sharpe ratio is within 0.5% of the CTF leaderboard value. The larger 1/N gap (+12.2%) is attributed to differences in stock-universe filtering conventions between local evaluation and the CTF server.

Module Overview

| Module | Key Functions | Reference |
| --- | --- | --- |
| eapctf.ctf | run_local, compute_metrics, validate, fetch_leaderboard, pipeline, download_ctf_data | Hoberg, Jensen, Kelly & Pedersen (2025) |
| eapctf.sorting | univariate_sort, bivariate_sort, char_factor, ff3_factors, ff5_factors, hxz4_factors, sy4_factors, mom_factor, long_short_portfolio | Fama & French (1993, 2015); Hou, Xue & Zhang (2015); Stambaugh & Yuan (2017) |
| eapctf.crosssection | fama_macbeth, cs_regression, multiple_testing_correction | Fama & MacBeth (1973); Shanken (1992) |
| eapctf.timeseries | time_series_alpha, grs_test, spanning_test, rolling_beta | Gibbons, Ross & Shanken (1989) |
| eapctf.sdf | gmm_estimate, hj_distance, hj_bounds, pricing_errors | Hansen (1982); Hansen & Jagannathan (1991, 1997) |
| eapctf.predict | expanding_window_oos, make_predictor, char_prep | Gu, Kelly & Xiu (2020) |
| eapctf.portfolio | mean_variance_weights, hrp_weights, black_litterman_weights, ParametricPolicy, evaluate_portfolio | Markowitz (1952); López de Prado (2016) |
| eapctf.utils | rank_normalize, classify, EAPPanel, JKP_153, load_gfd_chars | |

Development

# install with dev dependencies
uv sync --dev

# run tests
uv run python -m pytest

# lint and type check
uv run ruff check eapctf/
uv run mypy eapctf/ --ignore-missing-imports

License

MIT


Download files


Source Distribution

eapctf-0.3.1.tar.gz (327.8 kB)

Uploaded Source

Built Distribution


eapctf-0.3.1-py3-none-any.whl (143.2 kB)

Uploaded Python 3

File details

Details for the file eapctf-0.3.1.tar.gz.

File metadata

  • Download URL: eapctf-0.3.1.tar.gz
  • Upload date:
  • Size: 327.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eapctf-0.3.1.tar.gz
Algorithm Hash digest
SHA256 f1807055387b2a5cb07c89da9bc4882ab8188ef321ff8cd2c8173c0d4bbdec24
MD5 58872f94a7190dd4adb42766119a2253
BLAKE2b-256 6350127c75e0673b454d04c895acaee66487aa34f73861eefc340a9d2f6e50ce


File details

Details for the file eapctf-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: eapctf-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 143.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eapctf-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bed7e4055d18e24411b1f4108b49d9521047a7c569f4b2310ace3062bb24b8e6
MD5 c99b7ee0d889a86004cb965303d692b2
BLAKE2b-256 cf09f7a32a54cd1b0950a87e484d9cd42af5fafc57bb90afd5dfecf91292b0e7

