Empirical asset pricing toolkit: ML prediction, factor models, cross-sectional tests, SDF/GMM
eapctf: Empirical Asset Pricing Toolkit
A Python package for empirical asset pricing research, covering factor construction, cross-sectional tests, SDF/GMM estimation, ML-based return prediction, and portfolio optimization.
Installation
# with uv (recommended)
uv add eapctf
# with pip
pip install eapctf
Optional ML extras (PyTorch, LightGBM):
uv add "eapctf[ml]"
Quick Start
All examples assume a long-format panel DataFrame named data with the columns
date, permno, ret, mktcap, and exchcd, plus whatever characteristic columns an example uses (for instance bm, size, mom).
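To try the examples without real data, a synthetic panel with the same column layout can be useful. The sketch below is an illustration only: make_toy_panel is a hypothetical helper, not part of eapctf, and the random values carry no economic meaning. It assumes one row per (date, permno) pair and the CRSP exchange-code convention (1 = NYSE, 2 = AMEX, 3 = NASDAQ).

```python
import numpy as np
import pandas as pd


def make_toy_panel(n_firms=50, n_months=24, seed=0):
    """Build a synthetic long-format panel (hypothetical helper, not eapctf API)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2000-01-01", periods=n_months, freq="MS")  # month-start stamps
    permnos = np.arange(10001, 10001 + n_firms)
    # Exchange code is fixed per firm; CRSP convention: 1 = NYSE, 2 = AMEX, 3 = NASDAQ
    exch = pd.Series(rng.choice([1, 2, 3], size=n_firms), index=permnos)

    # One row per (date, permno) pair
    idx = pd.MultiIndex.from_product([dates, permnos], names=["date", "permno"])
    panel = idx.to_frame(index=False)
    n = len(panel)
    panel["ret"] = rng.normal(0.01, 0.08, size=n)       # monthly return
    panel["mktcap"] = rng.lognormal(6.0, 1.5, size=n)   # market capitalization
    panel["exchcd"] = panel["permno"].map(exch).to_numpy()
    panel["bm"] = rng.lognormal(-0.5, 0.6, size=n)      # one example characteristic
    return panel


data = make_toy_panel()
```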
1. Portfolio Sorting
from eapctf.sorting import univariate_sort, bivariate_sort, ff3_factors, ff5_factors
# Decile sort on book-to-market with NYSE breakpoints (JKP micro-cap filter on by default)
result = univariate_sort(data, char_col="bm", n_portfolios=10, weighting="vw")
print(result.portfolio_returns) # DataFrame: date x [port_1, ..., port_10, long_short]
print(result.portfolio_stats) # mean, std, Sharpe, t-stat per decile
# Fama-French 3-factor model
ff3 = ff3_factors(data, bm_col="bm", rf_col="rf")
print(ff3.factors) # DataFrame: date x [mkt_rf, smb, hml]
# Fama-French 5-factor model
ff5 = ff5_factors(data, bm_col="bm", op_col="op", inv_col="inv", rf_col="rf")
print(ff5.factors) # DataFrame: date x [mkt_rf, smb, hml, rmw, cma]
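For readers who want to see what a sort with NYSE breakpoints and value weighting involves, here is a minimal pandas sketch. It is not the package's implementation: nyse_vw_sort is a hypothetical name, the sketch assumes a unique row index, weights by same-period mktcap (research code usually uses lagged market cap), and it applies no micro-cap filter.

```python
import numpy as np
import pandas as pd


def nyse_vw_sort(panel, char_col, n_portfolios=10):
    """Sort stocks each date on char_col using NYSE-only breakpoints, then value-weight."""
    qs = np.linspace(0, 1, n_portfolios + 1)[1:-1]  # interior quantiles
    port = pd.Series(0, index=panel.index)
    for _, group in panel.groupby("date"):
        # Breakpoints come from NYSE stocks only (exchcd == 1) ...
        nyse = group.loc[group["exchcd"] == 1, char_col]
        breaks = nyse.quantile(qs).to_numpy()
        # ... but every stock is assigned to a portfolio (1 = lowest characteristic)
        port.loc[group.index] = (
            np.searchsorted(breaks, group[char_col].to_numpy(), side="right") + 1
        )

    out = panel.assign(port=port)
    out["w_ret"] = out["ret"] * out["mktcap"]
    sums = out.groupby(["date", "port"])[["w_ret", "mktcap"]].sum()
    vw = (sums["w_ret"] / sums["mktcap"]).unstack("port")
    vw.columns = [f"port_{p}" for p in vw.columns]
    vw["long_short"] = vw[f"port_{n_portfolios}"] - vw["port_1"]
    return vw


# Tiny worked example: four NYSE stocks on one date, split at the median
toy = pd.DataFrame({
    "date": ["2000-01"] * 4,
    "exchcd": [1, 1, 1, 1],
    "bm": [1.0, 2.0, 3.0, 4.0],
    "ret": [0.1, 0.2, 0.3, 0.4],
    "mktcap": [1.0, 1.0, 1.0, 3.0],
})
toy_vw = nyse_vw_sort(toy, "bm", n_portfolios=2)
```

In the toy example the low portfolio holds the first two stocks with equal caps, while the high portfolio's return is tilted toward the larger fourth stock.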
2. Fama-MacBeth Cross-Sectional Regression
from eapctf.crosssection import fama_macbeth
# Characteristic-based FM: cross-sectional regression of ret on chars each period
fm = fama_macbeth(data, char_cols=["bm", "size", "mom"])
print(fm.lambdas[["coef", "t_shanken"]]) # risk premia with Shanken-corrected t-stats
print(fm.r_squared) # time-series average cross-sectional R²
# Factor-based FM (two-pass): estimate betas first, then price them
fm_factor = fama_macbeth(data, factor_cols=["mkt_rf", "smb", "hml"])
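The characteristic-based procedure can be sketched in a few lines of NumPy: run one cross-sectional OLS per period, then average the slopes over time. The function below is a hypothetical illustration, not the package's fama_macbeth. It reports a plain time-series t-statistic and omits the Shanken and Newey-West adjustments.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(panel, char_cols, ret_col="ret"):
    """Per-period cross-sectional OLS, then time-series mean and plain t-stat of the slopes."""
    rows = {}
    for date, g in panel.groupby("date"):
        X = np.column_stack([np.ones(len(g)), g[char_cols].to_numpy()])
        coef, *_ = np.linalg.lstsq(X, g[ret_col].to_numpy(), rcond=None)
        rows[date] = coef
    lam = pd.DataFrame.from_dict(rows, orient="index", columns=["const", *char_cols])

    n_periods = len(lam)
    out = pd.DataFrame({
        "coef": lam.mean(),
        "se": lam.std(ddof=1) / np.sqrt(n_periods),  # no autocorrelation correction
    })
    out["t_stat"] = out["coef"] / out["se"]
    return out


# Simulated check: the true slope on "bm" is 0.02
rng = np.random.default_rng(0)
n_periods, n_firms = 60, 200
demo = pd.DataFrame({
    "date": np.repeat(np.arange(n_periods), n_firms),
    "bm": rng.normal(size=n_periods * n_firms),
})
demo["ret"] = 0.02 * demo["bm"] + rng.normal(0.0, 0.05, size=len(demo))
fm_demo = fama_macbeth_sketch(demo, ["bm"])
```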
3. Time-Series Alpha and GRS Test
from eapctf.timeseries import time_series_alpha, grs_test
# result is the univariate_sort output from section 1
# A single return series gives one AlphaResult;
# a DataFrame of several portfolios gives list[AlphaResult]
alpha_res = time_series_alpha(result.portfolio_returns["long_short"], ff3.factors)
print(alpha_res.alpha) # intercept
print(alpha_res.alpha_t) # Newey-West t-statistic
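# GRS test: are all portfolio alphas jointly zero?
# Drop long_short: it is a linear combination of port_10 and port_1,
# which would make the residual covariance matrix singular
grs = grs_test(result.portfolio_returns.drop(columns="long_short"), ff3.factors)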
print(grs.statistic, grs.p_value)
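As background, the Gibbons, Ross & Shanken (1989) statistic can be computed directly from the time-series regressions. The sketch below uses the textbook form with maximum-likelihood (divide-by-T) covariance estimates; grs_statistic is a hypothetical helper, and the package's grs_test may differ in details. Under the null and normal errors the statistic follows an F distribution with (N, T - N - K) degrees of freedom.

```python
import numpy as np


def grs_statistic(excess_returns, factors):
    """GRS statistic for H0: all N time-series alphas are zero (K traded factors)."""
    R = np.asarray(excess_returns, dtype=float)  # T x N test-asset excess returns
    F = np.asarray(factors, dtype=float)         # T x K factor returns
    T, N = R.shape
    K = F.shape[1]

    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)    # (K+1) x N; first row holds the alphas
    alpha = B[0]
    resid = R - X @ B
    sigma = resid.T @ resid / T                  # ML residual covariance

    mu_f = F.mean(axis=0)
    Fc = F - mu_f
    omega = Fc.T @ Fc / T                        # ML factor covariance

    quad_alpha = alpha @ np.linalg.solve(sigma, alpha)
    quad_f = mu_f @ np.linalg.solve(omega, mu_f)
    stat = (T - N - K) / N * quad_alpha / (1.0 + quad_f)
    return stat, (N, T - N - K)


# Simulated check: adding a 2% alpha to every asset should inflate the statistic
rng = np.random.default_rng(1)
T, N, K = 600, 5, 3
sim_factors = rng.normal(0.005, 0.04, size=(T, K))
betas = rng.normal(1.0, 0.3, size=(K, N))
noise = rng.normal(0.0, 0.02, size=(T, N))
stat_null, dof = grs_statistic(sim_factors @ betas + noise, sim_factors)
stat_alt, _ = grs_statistic(0.02 + sim_factors @ betas + noise, sim_factors)
```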
4. SDF / GMM Estimation
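import pandas as pd
from eapctf.sdf import gmm_estimate, hj_distance, hj_bounds
# port_ret is not defined above: a date x portfolio DataFrame of test-asset returns
# aligned with ff3.factors (for example, the decile columns of a univariate_sort result)
# Two-step efficient GMM (default)
gmm = gmm_estimate(port_ret, ff3.factors, two_step=True)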
print(gmm.b) # SDF loadings (K,)
print(gmm.t_stats) # t-statistics
print(gmm.j_statistic, gmm.j_p_value) # overidentification J-test
# HJ distance: pass a pre-computed SDF proxy (e.g., from GMM)
f_demeaned = ff3.factors.sub(ff3.factors.mean())
sdf_proxy = 1 - f_demeaned.values @ gmm.b
hj = hj_distance(port_ret, pd.Series(sdf_proxy, index=port_ret.index))
print(hj.distance)
# HJ volatility bounds
bounds = hj_bounds(port_ret)
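The SDF proxy above has the linear form m = 1 - b'(f - E[f]). For excess returns, the moment condition E[m R] = 0 reduces to E[R] = Cov(R, f) b, so a first-stage estimate with an identity weighting matrix is an OLS of mean returns on covariances. The sketch below illustrates that step only: first_stage_sdf_loadings is a hypothetical helper, and it does not reproduce the two-step weighting, standard errors, or J-test of gmm_estimate.

```python
import numpy as np


def first_stage_sdf_loadings(excess_returns, factors):
    """First-stage (identity-weighted) GMM for the SDF m = 1 - b'(f - E[f])."""
    R = np.asarray(excess_returns, dtype=float)  # T x N
    F = np.asarray(factors, dtype=float)         # T x K
    T = R.shape[0]

    mu = R.mean(axis=0)                          # sample E[R]
    Fc = F - F.mean(axis=0)                      # demeaned factors
    C = R.T @ Fc / T                             # N x K sample E[R (f - E[f])']

    # Minimize |mu - C b|^2, i.e. b = (C'C)^{-1} C' mu
    b, *_ = np.linalg.lstsq(C, mu, rcond=None)
    m = 1.0 - Fc @ b                             # SDF realizations
    pricing_errors = (R * m[:, None]).mean(axis=0)  # sample E[m R], equals mu - C b
    return b, m, pricing_errors


# Exactly identified example (N = K = 2): pricing errors should vanish
rng = np.random.default_rng(2)
f = rng.normal(0.005, 0.04, size=(240, 2))
r = f @ np.array([[1.0, 0.5], [0.2, 0.8]]) + rng.normal(0.0, 0.01, size=(240, 2))
b, m, errs = first_stage_sdf_loadings(r, f)
```

With more test assets than factors the pricing errors are generally nonzero, and their size is what the J-test and the HJ distance summarize.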
5. ML Out-of-Sample Return Prediction
from eapctf.predict import expanding_window_oos, make_predictor
model = make_predictor("lasso", alpha=0.01)
oos = expanding_window_oos(
data,
char_cols=["bm", "size", "mom", "op", "inv"],
models=[model],
train_min_periods=240, # minimum 20 years of training data
)
print(oos.oos_r2) # OOS R² averaged across models (GKX 2020)
print(oos.oos_r2_by_model) # OOS R² per model
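The out-of-sample R² in Gu, Kelly & Xiu (2020) benchmarks forecasts against zero rather than against the historical mean: R²_oos = 1 - Σ(r - r̂)² / Σ r². The sketch below pairs that metric with a plain expanding-window pooled OLS. Both function names are hypothetical, and the sketch assumes the characteristic columns are already lagged relative to ret; it is not the package's expanding_window_oos.

```python
import numpy as np
import pandas as pd


def oos_r2_zero_benchmark(actual, predicted):
    """OOS R² against a zero forecast (denominator is not demeaned)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((actual - predicted) ** 2) / np.sum(actual ** 2)


def expanding_ols_oos(panel, char_cols, train_min_periods, ret_col="ret"):
    """Refit pooled OLS on all earlier dates, predict the next date, pool the forecasts."""
    dates = sorted(panel["date"].unique())
    actual, predicted = [], []
    for t in range(train_min_periods, len(dates)):
        train = panel[panel["date"] < dates[t]]    # strictly past data only
        test = panel[panel["date"] == dates[t]]
        X_tr = np.column_stack([np.ones(len(train)), train[char_cols].to_numpy()])
        coef, *_ = np.linalg.lstsq(X_tr, train[ret_col].to_numpy(), rcond=None)
        X_te = np.column_stack([np.ones(len(test)), test[char_cols].to_numpy()])
        predicted.append(X_te @ coef)
        actual.append(test[ret_col].to_numpy())
    return oos_r2_zero_benchmark(np.concatenate(actual), np.concatenate(predicted))


# Simulated check: signal and noise have equal variance, so R² should be near 0.5
rng = np.random.default_rng(3)
demo = pd.DataFrame({"date": np.repeat(np.arange(36), 100), "x": rng.normal(size=3600)})
demo["ret"] = 0.05 * demo["x"] + rng.normal(0.0, 0.05, size=3600)
r2 = expanding_ols_oos(demo, ["x"], train_min_periods=24)
```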
6. Portfolio Optimization
from eapctf.sorting import long_short_portfolio
from eapctf.portfolio import mean_variance_weights, hrp_weights
# Long-short portfolio from a signal
ls = long_short_portfolio(data, signal_col="bm", n_portfolios=10, weighting="vw")
print(ls.returns["long_short"]) # long-short return series
print(ls.metrics) # mean, std, Sharpe, etc.
# Mean-variance optimization
# mu (expected returns) and sigma (covariance matrix) are user-supplied and not defined above
weights = mean_variance_weights(
expected_returns=mu,
cov_matrix_input=sigma,
method="max_sharpe",
)
# Hierarchical Risk Parity
weights_hrp = hrp_weights(returns_data=port_ret)
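For intuition about the max-Sharpe case, the unconstrained tangency portfolio has a closed form: weights proportional to Σ⁻¹μ, rescaled to sum to one. The sketch below assumes mu holds expected excess returns and allows short positions. max_sharpe_weights is a hypothetical helper; the package's mean_variance_weights with method="max_sharpe" may impose constraints this sketch ignores.

```python
import numpy as np


def max_sharpe_weights(mu, sigma):
    """Unconstrained tangency weights: proportional to inv(sigma) @ mu, summing to one."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    raw = np.linalg.solve(sigma, mu)
    total = raw.sum()
    if total <= 0:
        # Normalizing by a non-positive sum would flip or blow up the portfolio
        raise ValueError("sum of inv(sigma) @ mu must be positive to normalize")
    return raw / total


def sharpe(weights, mu, sigma):
    """Ex-ante Sharpe ratio of a weight vector."""
    return (weights @ mu) / np.sqrt(weights @ sigma @ weights)


# Two uncorrelated assets: inv(sigma) @ mu = [2, 4], so weights are [1/3, 2/3]
mu_demo = np.array([0.08, 0.04])
sigma_demo = np.diag([0.04, 0.01])
w_demo = max_sharpe_weights(mu_demo, sigma_demo)
```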
CTF (Common Task Framework)
eapctf.ctf provides a local replication pipeline for the Common Task Framework
introduced in Hoberg, Jensen, Kelly & Pedersen (2025). The CTF evaluates portfolio strategies on a
shared holdout test set across 402 firm characteristics (153 JKP + 249 additional GFD factors).
Pipeline
import pandas as pd
from eapctf.ctf import run_local, compute_metrics, validate
# 1. Run a CTF model script locally
weights = run_local("models/my-model.py", data_dir="data/ctf/")
# 2. Evaluate performance (10% vol-targeting matches CTF server methodology)
daily_ret = pd.read_parquet("data/ctf/ctff_daily_ret.parquet")
metrics = compute_metrics(weights, daily_ret, vol_target=0.10)
print(metrics)
# 3. Check compliance before submission
report = validate("models/my-model.py", data_dir="data/ctf/")
print(report)
Starting a New Model
cp reference/template-ctf-model.py models/my-model.py
# Edit models/my-model.py — replace TODO sections with your implementation
The template provides a complete rolling-window train/predict loop with rank-normalized features,
OLS prediction, and z-score portfolio weights. Replace train_model() / predict_returns() /
construct_weights() with your approach; the rest of the pipeline stays the same.
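The three building blocks named above (rank-normalized features, OLS prediction, z-score weights) can be sketched for a single train/predict step as follows. This is not the template's code: all function names and signatures here are hypothetical, the rolling-window loop is omitted, and scaling the weights to unit gross exposure is an assumption made for the illustration.

```python
import numpy as np
import pandas as pd


def rank_to_unit_interval(features):
    """Map each column's cross-sectional ranks to [-0.5, 0.5] (ties get average ranks)."""
    ranks = features.rank(method="average")
    return (ranks - 1.0) / (len(features) - 1.0) - 0.5


def fit_ols(X, y):
    """OLS with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_ols(coef, X):
    """Apply fitted OLS coefficients to new features."""
    design = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return design @ coef


def zscore_weights(pred):
    """Z-score the predictions, then scale so absolute weights sum to one."""
    pred = np.asarray(pred, dtype=float)
    z = (pred - pred.mean()) / pred.std()
    return z / np.abs(z).sum()


# One train/predict step on simulated data
rng = np.random.default_rng(5)
train_x = rank_to_unit_interval(
    pd.DataFrame({"bm": rng.normal(size=300), "mom": rng.normal(size=300)})
)
train_y = 0.03 * train_x["bm"] + rng.normal(0.0, 0.02, size=300)
test_x = rank_to_unit_interval(
    pd.DataFrame({"bm": rng.normal(size=100), "mom": rng.normal(size=100)})
)
coef = fit_ols(train_x, train_y)
demo_weights = zscore_weights(predict_ols(coef, test_x))
```

Because z-scores have zero mean, the resulting portfolio is dollar-neutral by construction.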
Replication Results
The table below compares eapctf.ctf.compute_metrics() output against known CTF leaderboard entries.
Local evaluation with vol_target=0.10 reproduces the CTF server Sharpe ratio closely for IPCA and less closely for 1/N.
All returns are scaled to 10% annualized volatility before computing statistics (CTF standard).
| Model | Sharpe (local) | Sharpe (CTF) | Diff % | Annual Return | Vol | Max Drawdown |
|---|---|---|---|---|---|---|
| 1/N (equal weight) | 0.551 | 0.491 | +12.2% | 5.13% | 10.00% | -30.43% |
| IPCA (KPS 2019) | 1.939 | 1.948 | -0.5% | 20.64% | 10.00% | -10.30% |
The IPCA replication uses the parallelized benchmark script at reference/benchmark-ipca-pf.py
(n_factors=5, window=120 months, 402 features, 408 test dates). The Sharpe replicates within
0.5% of the CTF leaderboard value; the 1/N discrepancy reflects minor differences in stock
universe filtering conventions between local evaluation and the CTF server.
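The volatility-targeting step can be illustrated with a short sketch: rescale a return series so its realized annualized volatility equals the target, then compute summary statistics on the scaled series. The assumptions here are full-sample (ex-post) scaling, monthly data, and arithmetic annualization. vol_target_metrics is a hypothetical helper, and compute_metrics may follow different conventions, so the numbers need not match the table.

```python
import numpy as np
import pandas as pd


def vol_target_metrics(returns, vol_target=0.10, periods_per_year=12):
    """Scale returns to a target annualized volatility, then summarize performance."""
    r = pd.Series(returns, dtype=float)
    realized_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    scaled = r * (vol_target / realized_vol)       # one constant leverage factor

    ann_ret = scaled.mean() * periods_per_year     # arithmetic annualization
    ann_vol = scaled.std(ddof=1) * np.sqrt(periods_per_year)
    wealth = (1.0 + scaled).cumprod()
    max_dd = (wealth / wealth.cummax() - 1.0).min()  # most negative peak-to-trough loss
    return {
        "ann_return": ann_ret,
        "ann_vol": ann_vol,
        "sharpe": ann_ret / ann_vol,
        "max_drawdown": max_dd,
    }


rng = np.random.default_rng(4)
raw_returns = rng.normal(0.01, 0.05, size=240)
vt = vol_target_metrics(raw_returns)
```

Constant scaling leaves the Sharpe ratio unchanged; it affects only the reported return, volatility, and drawdown.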
Module Overview
| Module | Key Functions | Reference |
|---|---|---|
| eapctf.ctf | run_local, compute_metrics, validate, fetch_leaderboard, pipeline, download_ctf_data | Hoberg, Jensen, Kelly & Pedersen (2025) |
| eapctf.sorting | univariate_sort, bivariate_sort, char_factor, ff3_factors, ff5_factors, hxz4_factors, sy4_factors, mom_factor, long_short_portfolio | Fama & French (1993, 2015); Hou, Xue & Zhang (2015); Stambaugh & Yuan (2017) |
| eapctf.crosssection | fama_macbeth, cs_regression, multiple_testing_correction | Fama & MacBeth (1973); Shanken (1992) |
| eapctf.timeseries | time_series_alpha, grs_test, spanning_test, rolling_beta | Gibbons, Ross & Shanken (1989) |
| eapctf.sdf | gmm_estimate, hj_distance, hj_bounds, pricing_errors | Hansen (1982); Hansen & Jagannathan (1991) |
| eapctf.predict | expanding_window_oos, make_predictor, char_prep | Gu, Kelly & Xiu (2020) |
| eapctf.portfolio | mean_variance_weights, hrp_weights, black_litterman_weights, ParametricPolicy, evaluate_portfolio | Markowitz (1952); Lopez de Prado (2016) |
| eapctf.utils | rank_normalize, classify, EAPPanel, JKP_153, load_gfd_chars | - |
Development
# install with dev dependencies
uv sync --dev
# run tests
uv run python -m pytest
# lint and type check
uv run ruff check eapctf/
uv run mypy eapctf/ --ignore-missing-imports
License
MIT