Single-factor evaluation/testing toolkit (pandas-first).
bagel-factor
A pandas-first toolkit for single-factor evaluation in quantitative finance.
What is this?
bagel-factor helps you answer: "Does my factor predict future returns?"
Given a factor (signal) and price data, it computes:
- ✅ IC/ICIR - Information coefficient (predictive correlation)
- ✅ Quantile returns - Performance by factor bucket
- ✅ Long-short spread - Top-minus-bottom returns
- ✅ Turnover - Trading cost implications
- ✅ Coverage - Data quality metrics
- ✅ Statistical tests - Significance testing
Perfect for: Alpha researchers, quant traders, and anyone evaluating predictive signals.
Scope (by design)
What it does:
- 📊 Canonical point-in-time panel data structure (date × asset)
- 🔄 Preprocessing transforms (clip/zscore/rank)
- 📈 Single-factor evaluation metrics
- 📉 Publication-quality visualizations
- 🧪 Statistical testing
What it doesn't do (by design):
- ❌ Multi-factor portfolio optimization
- ❌ Backtesting with transaction costs
- ❌ Risk model construction
- ❌ Position sizing / execution
This is a precision calculation engine for factor evaluation, not a full backtesting framework.
Install
Requires Python >=3.12
pip install bagel-factor
Install (dev / from source)
This repo is managed with uv.
uv sync
Quick Example
from bagelfactor import SingleFactorJob, plot_result_summary
# Run evaluation
res = SingleFactorJob.run(
    panel,                 # Your data: (date, asset) indexed DataFrame
    factor="alpha",        # Factor column name
    price="close",         # Price column for forward returns
    horizons=(1, 5, 20),   # Evaluate 1, 5, and 20-period returns
    n_quantiles=5,         # Split into 5 buckets
)
# Check results
print(f"IC: {res.ic[5].mean():.3f}")
print(f"ICIR: {res.icir[5]:.2f}")
print(f"Sharpe: {res.long_short[5].mean() / res.long_short[5].std():.2f}")
# Visualize
fig = plot_result_summary(res, horizon=5)
fig.show()
Output: A comprehensive 4×2 plot showing IC, quantile returns, long-short performance, turnover, and coverage.
User Guide
Step-by-Step Tutorial
0) Data preparation (CRITICAL)
Before using bagel-factor, ensure your data meets these requirements:
import pandas as pd
from bagelfactor.data import ensure_panel_index, lag_by_asset
# 1. Load your data
df = pd.read_csv("your_data.csv")
# 2. Create canonical panel index
panel = ensure_panel_index(df, date="date", asset="ticker")
# 3. CRITICAL: Sort the panel
panel = panel.sort_index()
# 4. Lag factors to avoid lookahead bias
# (If factor data is "as-of" date t, use it starting from t+1)
panel = lag_by_asset(panel, columns=["your_factor"], periods=1)
⚠️ Critical: Unsorted data produces incorrect results. Point-in-time integrity is your responsibility.
📖 See Data Format Requirements for the complete guide.
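To make the lagging step concrete, here is a minimal pandas-only sketch of a per-asset lag. It assumes lag_by_asset behaves like a grouped shift; that is an assumption about the library, not a description of its implementation, and the toy values are made up.

```python
import pandas as pd

# Illustrative panel: three dates, two assets (values are made up).
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]), ["A", "B"]],
    names=["date", "asset"],
)
panel = pd.DataFrame({"your_factor": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]}, index=idx)
panel = panel.sort_index()

# Shift each asset's own history forward by one period, so the value
# observed on date t only becomes usable on the next date.
lagged = panel.groupby(level="asset")["your_factor"].shift(1)
```

The first observation of every asset becomes NaN after the shift, which is the price of avoiding lookahead.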
1) Prepare a canonical panel
Most APIs expect a canonical panel:
- pd.DataFrame
- indexed by pd.MultiIndex with names ("date", "asset")
import pandas as pd
from bagelfactor.data import ensure_panel_index
raw = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-01"],
        "asset": ["A", "B"],
        "close": [10.0, 20.0],
        "alpha": [1.0, 2.0],
    }
)
panel = ensure_panel_index(raw)
panel = panel.sort_index() # ← CRITICAL: Always sort!
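For readers who want to see the target shape without the helper, the sketch below builds an equivalent panel with plain pandas. It assumes that ensure_panel_index essentially parses dates and moves the two key columns into a MultiIndex; the library may do more (validation, renaming), so treat this as an illustration only.

```python
import pandas as pd

# Rows deliberately out of order to show the effect of sorting.
raw = pd.DataFrame(
    {
        "date": ["2020-01-02", "2020-01-01", "2020-01-01"],
        "asset": ["A", "B", "A"],
        "close": [11.0, 20.0, 10.0],
    }
)

# Parse dates, move (date, asset) into a MultiIndex, then sort.
manual = (
    raw.assign(date=pd.to_datetime(raw["date"]))
    .set_index(["date", "asset"])
    .sort_index()
)
```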
2) (Optional) preprocess the factor
from bagelfactor.preprocess import Clip, Pipeline, Rank, ZScore
preprocess = Pipeline([
    Clip("alpha", lower=0.0, upper=2.0),
    ZScore("alpha"),
    Rank("alpha"),
])
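The three transforms can be sketched in plain pandas as follows. This assumes the z-score and rank are applied cross-sectionally (within each date), which is the usual convention for factor preprocessing but is not confirmed by the text above; the data is made up.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C"]],
    names=["date", "asset"],
)
panel = pd.DataFrame({"alpha": [0.5, 1.0, 5.0, -1.0, 1.5, 2.0]}, index=idx)

# Clip: cap values to [0, 2] (same bounds as the Clip step above).
clipped = panel["alpha"].clip(lower=0.0, upper=2.0)

# Z-score within each date: subtract the date's mean, divide by its std.
by_date = clipped.groupby(level="date")
zscored = (clipped - by_date.transform("mean")) / by_date.transform("std")

# Rank within each date (1 = smallest value on that date).
ranked = zscored.groupby(level="date").rank()
```

Note that ranking after z-scoring discards the z-score magnitudes; the chain is shown only because it mirrors the pipeline above.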
3) Run single-factor evaluation
from bagelfactor import SingleFactorJob
res = SingleFactorJob.run(
    panel,
    factor="alpha",          # Factor column name
    price="close",           # Price for computing returns
    horizons=(1, 5, 20),     # Multiple forward-return windows
    n_quantiles=5,           # Number of buckets (quintiles)
    preprocess=preprocess,   # Optional
)
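As a rough illustration of what a forward-return window means, the sketch below computes h-period forward returns per asset from a price column. The function name forward_return and the simple-return formula are assumptions made for this example; the library's own calculation may differ (for instance in how it treats gaps).

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]), ["A", "B"]],
    names=["date", "asset"],
)
panel = pd.DataFrame(
    {"close": [10.0, 20.0, 11.0, 19.0, 12.1, 19.0]}, index=idx
).sort_index()


def forward_return(close: pd.Series, horizon: int) -> pd.Series:
    """Return close[t + horizon] / close[t] - 1, computed per asset."""
    future = close.groupby(level="asset").shift(-horizon)
    return future / close - 1.0


fwd_1 = forward_return(panel["close"], horizon=1)
fwd_2 = forward_return(panel["close"], horizon=2)
```

Because the shift works on row order within each asset, this sketch only gives correct values on a sorted panel, and the last horizon rows of each asset are NaN.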
What you get:
# Information Coefficient (per horizon)
res.ic[1] # IC time series for the 1-period horizon
res.icir[1] # IC Information Ratio
# Quantile analysis
res.quantile_returns[5] # Mean returns per quantile (5-day horizon)
res.long_short[5] # Top minus bottom returns
# Diagnostics
res.coverage # Data availability
res.turnover # Trading cost proxy
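To show what the IC series and ICIR represent, here is a small pandas sketch: one Pearson correlation between factor and forward return per date, then mean divided by standard deviation. The column name fwd_ret and the data are invented for illustration, and this is not the library's implementation.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
df = pd.DataFrame(
    {
        "alpha": [1, 2, 3, 4] * 3,
        "fwd_ret": [
            0.01, 0.02, 0.03, 0.04,  # date 1: perfectly aligned with alpha
            0.04, 0.03, 0.02, 0.01,  # date 2: perfectly reversed
            0.02, 0.01, 0.04, 0.03,  # date 3: partially aligned
        ],
    },
    index=idx,
)


def cross_sectional_ic(group: pd.DataFrame) -> float:
    """Pearson correlation between factor and forward return on one date."""
    return group["alpha"].corr(group["fwd_ret"])


ic = df.groupby(level="date").apply(cross_sectional_ic)  # one IC value per date
icir = ic.mean() / ic.std()                               # IC stability (mean / std)
```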
4) Interpret results
Quick health check:
h = 5 # 5-day horizon
# 1. Check IC
ic_mean = res.ic[h].mean()
print(f"Mean IC: {ic_mean:.4f}") # Want: |IC| of roughly 0.03-0.10 (the sign only gives the direction)
# 2. Check stability
icir = res.icir[h]
print(f"ICIR: {icir:.2f}") # Want: > 0.5
# 3. Check economic significance
ls_mean = res.long_short[h].mean()
ls_std = res.long_short[h].std()
sharpe = ls_mean / ls_std if ls_std > 0 else 0
print(f"L/S Sharpe: {sharpe:.2f}") # Want: > 0.5
# 4. Check tradability
turnover = res.turnover.mean()
print(f"Avg turnover: {turnover:.1%}") # Want: < 40%
📖 Complete interpretation guide: Result Interpretation Guide
5) Visualize results
from bagelfactor import plot_result_summary
# All-in-one summary (4×2 grid)
fig = plot_result_summary(res, horizon=5)
fig.savefig('factor_summary.png', dpi=150)
Or use individual plots:
from bagelfactor import (
    plot_ic_time_series,
    plot_quantile_cumulative_returns,
    plot_long_short_time_series,
)
# IC over time
plot_ic_time_series(res.ic[5], rolling=20)
# Cumulative wealth by quantile
plot_quantile_cumulative_returns(res.quantile_returns[5])
# Long-short equity curve
plot_long_short_time_series(res.long_short[5], cumulative=True)
6) Statistical tests
from bagelfactor import ttest_1samp, ols_alpha_tstat
# Test if mean IC is significantly different from 0
ic_test = ttest_1samp(res.ic[5], popmean=0.0)
print(f"IC t-stat: {ic_test.statistic:.2f}, p-value: {ic_test.pvalue:.4f}")
# Test if long-short has significant alpha
ls_alpha = ols_alpha_tstat(res.long_short[5])
print(f"L/S alpha t-stat: {ls_alpha.tstat:.2f}")
# Interpretation:
# |t-stat| > 2: Significant at ~5% level
# |t-stat| > 3: Strong evidence
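The t-statistic rule of thumb above can be checked by hand. The sketch below computes a one-sample t-statistic against zero using only the standard library; the IC values are hypothetical and exist only to make the arithmetic concrete.

```python
import math
import statistics

# Hypothetical IC values for illustration only.
ic_values = [0.05, 0.02, -0.01, 0.04, 0.03, 0.06, 0.00, 0.05]

n = len(ic_values)
mean_ic = statistics.fmean(ic_values)
std_ic = statistics.stdev(ic_values)         # sample standard deviation (ddof = 1)
t_stat = mean_ic / (std_ic / math.sqrt(n))   # one-sample t-statistic against 0
```

This treats the observations as independent; overlapping multi-period returns are autocorrelated, so a plain t-statistic tends to overstate significance for longer horizons.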
7) (Optional) Validate your data
Use the diagnostic utility to check for common issues:
from bagelfactor import diagnose_panel
diag = diagnose_panel(panel)
print(diag)
Example output:
Panel Diagnostics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Valid MultiIndex with names ['date', 'asset']
✓ Index is sorted
✓ No duplicate entries
⚠ Missing data: 5.2% of values are NaN
Date range: 2020-01-01 to 2023-12-31 (1000 dates)
Assets: 500 unique
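If you want to run similar checks yourself, they map onto a few pandas calls. The helper below, basic_panel_checks, is a hypothetical stand-in written for this example; it is not the library's diagnose_panel and may check different things.

```python
import numpy as np
import pandas as pd


def basic_panel_checks(panel: pd.DataFrame) -> dict:
    """Hypothetical helper mirroring the checks in the example output above."""
    return {
        "index_names_ok": list(panel.index.names) == ["date", "asset"],
        "sorted": panel.index.is_monotonic_increasing,
        "no_duplicates": not panel.index.duplicated().any(),
        "nan_share": float(panel.isna().to_numpy().mean()),
        "n_dates": panel.index.get_level_values(0).nunique(),
        "n_assets": panel.index.get_level_values(1).nunique(),
    }


# Toy panel: unsorted on purpose, with one missing price.
idx = pd.MultiIndex.from_tuples(
    [("2020-01-02", "A"), ("2020-01-01", "A"), ("2020-01-01", "B")],
    names=["date", "asset"],
)
panel = pd.DataFrame(
    {"close": [11.0, 10.0, np.nan], "alpha": [1.0, 2.0, 3.0]}, index=idx
)
report = basic_panel_checks(panel)
```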
Understanding Results
What do these metrics mean?
| Metric | What it measures | Good range | Red flag |
|---|---|---|---|
| IC | Cross-sectional correlation between factor and forward returns | 0.03-0.10 (absolute value) | < 0.01 (absolute value) |
| ICIR | IC stability (mean/std) | > 0.5 | < 0.2 |
| Quantile spread | Q5 - Q1 average return | Context-dependent | Non-monotonic |
| Turnover | Portfolio changes between periods | < 30% (daily) | > 60% |
| Coverage | Data availability | > 90% | < 80% |
📖 Detailed interpretation: Result Interpretation Guide
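The quantile spread and turnover rows can be made concrete with a toy example. The sketch below buckets assets by factor rank on each date, averages forward returns per bucket, and takes top minus bottom. The turnover definition used here (share of the top bucket that was not in the top bucket on the previous date) is an assumption for illustration; the library may define turnover differently.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), list("ABCDEF")],
    names=["date", "asset"],
)
df = pd.DataFrame(
    {
        "alpha": [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1],
        "fwd_ret": [-0.02, -0.01, 0.00, 0.01, 0.02, 0.03,
                    0.03, 0.02, 0.01, 0.00, -0.01, -0.02],
    },
    index=idx,
)
n_quantiles = 3

# Assign each asset to a bucket per date (1 = lowest factor values).
bucket = df.groupby(level="date")["alpha"].transform(
    lambda s: pd.qcut(s.rank(method="first"), n_quantiles, labels=False) + 1
)

# Mean forward return per (date, bucket), with buckets as columns.
quantile_returns = (
    df.assign(bucket=bucket)
    .groupby(["date", "bucket"])["fwd_ret"]
    .mean()
    .unstack("bucket")
)
long_short = quantile_returns[n_quantiles] - quantile_returns[1]  # top minus bottom

# Assumed turnover definition: share of today's top bucket that is new.
top_sets = {
    date: set(grp.index.get_level_values("asset"))
    for date, grp in bucket[bucket == n_quantiles].groupby(level="date")
}
dates = sorted(top_sets)
turnover = {
    cur: 1.0 - len(top_sets[cur] & top_sets[prev]) / len(top_sets[cur])
    for prev, cur in zip(dates, dates[1:])
}
```

In this toy data the factor ordering flips completely between the two dates, so the top bucket is entirely replaced.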
Example: Good vs Concerning Factor
✅ Good Factor:
IC: 0.045, ICIR: 1.2
Quantiles: Q1=-0.8%, Q2=-0.1%, Q3=0.2%, Q4=0.6%, Q5=1.2%
L/S Sharpe: 1.8
Turnover: 25%
Coverage: 95%
→ Strong, stable signal with monotonic quantiles and reasonable turnover.
⚠️ Concerning Factor:
IC: 0.015, ICIR: 0.3
Quantiles: Q1=0.2%, Q2=-0.5%, Q3=0.8%, Q4=-0.2%, Q5=0.3%
L/S Sharpe: 0.4
Turnover: 65%
Coverage: 75%
→ Weak, unstable signal with non-monotonic quantiles, high turnover, and data quality issues.
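The monotonicity judgement in the two examples is easy to verify mechanically. The short sketch below uses the quantile returns quoted above (in percent); the helper name is made up for this example.

```python
def is_monotonic_increasing(values: list[float]) -> bool:
    """True if every quantile's return is >= the previous quantile's."""
    return all(a <= b for a, b in zip(values, values[1:]))


good = [-0.8, -0.1, 0.2, 0.6, 1.2]        # Q1..Q5 from the good example, in %
concerning = [0.2, -0.5, 0.8, -0.2, 0.3]  # Q1..Q5 from the concerning example, in %

good_is_monotonic = is_monotonic_increasing(good)
concerning_is_monotonic = is_monotonic_increasing(concerning)

good_spread = good[-1] - good[0]                    # Q5 - Q1
concerning_spread = concerning[-1] - concerning[0]  # Q5 - Q1
```

The concerning factor still has a small positive Q5 - Q1 spread, which is why the spread alone is not enough and the shape across all buckets matters.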
Documentation
Getting Started
- 🚀 Quick Start (above) - 5-minute intro
- 📊 Result Interpretation Guide - How to understand your results
- ⚠️ Data Format Requirements - Critical data prep guide
- 📝 Complete Example - Full workflow with outputs
- 📚 Factor Evaluation Theory - Statistical background
Complete Example
# Run the included example
uv run python examples/example.py
# View outputs in examples/outputs/
Full example with expected outputs: docs/example.md.
Performance
Optimized vectorized implementations:
| Metric | Speedup | Notes |
|---|---|---|
| IC | 4-5x | Vectorized correlation |
| Coverage | 20-30x | Single pass counting |
| Quantiles | 10x+ | Optimized groupby |
Reproduce: uv run python examples/benchmark_ic.py
API Reference
Table of contents
- Getting started
  - 🚀 Quick Start (in README above)
  - 📊 Result Interpretation Guide ⭐ Start here for understanding results!
  - ⚠️ Data Format Requirements - Critical for correct results
  - 📝 Complete Example - Full workflow with outputs
  - 📚 Factor Evaluation Theory - Statistical background
- Modules (API reference)
- Design docs
Install (dev / from source)
This repo uses uv for development:
git clone https://github.com/bagelquant/bagel-factor.git
cd bagel-factor
uv sync
uv run pytest # Run tests
See CONTRIBUTING.md for development guidelines.
FAQ
Q: What's the difference between IC and RankIC?
A: IC uses Pearson correlation (linear), RankIC uses Spearman (rank-based). RankIC is more robust to outliers.
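A small made-up example shows the difference. Spearman correlation is computed here as the Pearson correlation of the ranks, which is its definition and avoids any extra dependency; the numbers are illustrative only.

```python
import pandas as pd

factor = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])    # last value is an outlier
returns = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05])

pearson_ic = factor.corr(returns)                  # linear correlation
spearman_ic = factor.rank().corr(returns.rank())   # correlation of the ranks
```

The ordering of the two series is identical, so the rank-based value is 1, while the single outlier pulls the Pearson value well below 1.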
Q: Why is my IC negative?
A: Negative IC means higher factor values predict lower returns. Consider inverting your factor (multiply by -1).
Q: What IC value is "good"?
A: Context-dependent, but for daily equity factors an absolute IC of 0.03-0.06 is solid and >0.10 is exceptional (or suspicious: check for data leakage).
Q: My quantile returns aren't monotonic. Is that bad?
A: Yes, it suggests the factor doesn't cleanly order assets. Check data quality, try different preprocessing, or investigate non-linear relationships.
Q: How do I handle missing data?
A: Cross-sectional operations skip missing values (NaN), so the package runs on incomplete data. Still check coverage: if it is low, your results may be biased.
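The sketch below shows, in plain pandas, what skipping NaN looks like and how a per-date coverage figure can be computed. Defining coverage as the share of assets with a non-missing factor value on each date is an assumption for this example; the data is made up.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
alpha = pd.Series(
    [1.0, 2.0, np.nan, 4.0, np.nan, np.nan, 3.0, 4.0], index=idx, name="alpha"
)

# Share of assets with a usable factor value on each date.
coverage = alpha.notna().groupby(level="date").mean()

# pandas reductions skip NaN by default, so each mean uses observed values only.
cross_sectional_mean = alpha.groupby(level="date").mean()
```

On the second date the mean is computed from only two of four assets, which is the kind of thin cross-section that low coverage warns about.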
Q: Can I use this for non-equity asset classes?
A: Yes! The package is asset-class agnostic. Just provide a (date, asset) panel with factor and price data.
📖 More details: Interpretation Guide
Citation
If you use bagel-factor in academic research, please cite:
@software{bagel_factor,
  title = {bagel-factor: A pandas-first toolkit for single-factor evaluation},
  author = {{Bagel Quant}},
  year = {2024},
  url = {https://github.com/bagelquant/bagel-factor}
}
License
MIT (see LICENSE).