Fast C++ implementation of Alpha191 financial factors
Project description
alpha-features-core
Fast C++ implementation of the Alpha191 quantitative finance factor library, with a pure-Python/pandas fallback.
Installation
pip install alpha-features-core
Building from source requires CMake ≥ 3.15 and a C++17 compiler (MSVC on Windows, GCC/Clang on Linux/macOS).
Quick start
import pandas as pd
from alpha_features_core.bridge import compute_alphas_cpp # C++ backend
from alpha_features_core.alpha191 import Alphas191 # pure-Python backend
# df is a long-format OHLCV DataFrame with columns:
# ticker, date, open, high, low, close, volume, amount, past_return
# --- C++ backend (fast) ---
cpp_result = compute_alphas_cpp(df) # all 191 factors
cpp_result = compute_alphas_cpp(df, nums=[1, 5, 10]) # subset
# --- Pure-Python backend ---
py_result = Alphas191(df).calculate_all_alphas(return_long=True)
Both functions return a long-format pd.DataFrame with columns [ticker, date, alpha001, alpha002, ...].
Performance
The C++ backend compiles all factors in a single pass over the data and significantly outperforms the pandas/numba implementation at scale.
from alpha_features_core.bridge import compute_alphas_cpp
from alpha_features_core.alpha191 import Alphas191
import numpy as np, pandas as pd, time, matplotlib.pyplot as plt
EXPERIMENTS = [
(5, 40), (10, 50), (20, 50), (30, 70), (50, 80),
(80, 100), (100, 120), (150, 150), (200, 180),
(250, 200), (300, 250), (350, 300), (500, 500),
]
ALL_NUMS = list(range(1, 192))
times_cpp, times_py, n_rows = [], [], []
for T, D in EXPERIMENTS:
rng = np.random.default_rng(42)
tickers = [f"T{i:04d}" for i in range(T)]
dates = pd.date_range("2020-01-01", periods=D, freq="B")
idx = pd.MultiIndex.from_product([tickers, dates], names=["ticker", "date"])
n = len(idx)
close = 100 + np.cumsum(rng.normal(0, 0.5, n))
df = pd.DataFrame({
"close": close,
"open": close + rng.normal(0, 0.2, n),
"high": close + np.abs(rng.normal(0, 0.5, n)),
"low": close - np.abs(rng.normal(0, 0.5, n)),
"volume": np.abs(rng.normal(1e6, 1e5, n)),
"past_return": rng.normal(0, 0.01, n),
}, index=idx).reset_index()
df["amount"] = df["close"] * df["volume"]
df = df.sort_values(["ticker", "date"]).reset_index(drop=True)
t0 = time.perf_counter()
compute_alphas_cpp(df, nums=ALL_NUMS, verbose=False)
times_cpp.append(time.perf_counter() - t0)
t0 = time.perf_counter()
Alphas191(df).calculate_all_alphas(return_long=True)
times_py.append(time.perf_counter() - t0)
n_rows.append(n)
print(f"{T}×{D} = {n:>7,} rows C++: {times_cpp[-1]:.2f}s "
f"Python: {times_py[-1]:.2f}s speedup: {times_py[-1]/times_cpp[-1]:.1f}x")
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(n_rows, times_cpp, "o-", color="steelblue", lw=2, ms=6, label="C++ backend")
ax.plot(n_rows, times_py, "s-", color="darkorange", lw=2, ms=6, label="Pure Python (numba)")
ax.set_xlabel("Number of rows (tickers × dates)")
ax.set_ylabel("Execution time (s)")
ax.set_title("Alpha191: C++ vs Pure Python — scaling benchmark")
ax.legend()
plt.tight_layout()
plt.savefig("benchmark.png", dpi=150)
Numerical differences between backends
The C++ and Python backends produce identical results for the vast majority of factors. Small numerical differences exist for factors that combine Corr and Rank operations (e.g. alpha001, alpha016, alpha090).
The root cause is a floating-point edge case in pandas' rolling().corr(): when one input series is constant within the rolling window, pandas internally produces ±inf or NaN depending on accumulated floating-point rounding errors that vary per window. The C++ backend applies a deterministic kEps threshold and cannot replicate this non-deterministic behaviour exactly.
In practice the affected factors differ by at most 1–2 rank positions out of 100 tickers per date, which has negligible impact on factor signal quality.
The following factors are affected:
alpha001, alpha005, alpha016, alpha036, alpha054, alpha056,
alpha061, alpha064, alpha073, alpha077, alpha083, alpha090,
alpha091, alpha092, alpha099, alpha101, alpha113, alpha115,
alpha119, alpha121, alpha123, alpha130, alpha131, alpha138,
alpha141, alpha148, alpha170, alpha176, alpha179, alpha191.
All remaining 160+ factors match the Python backend within floating-point tolerance (atol=1e-5).
Input DataFrame format
| Column | Type | Description |
|---|---|---|
ticker |
str | Asset identifier |
date |
datetime | Trading date |
open |
float | Opening price |
high |
float | High price |
low |
float | Low price |
close |
float | Closing price |
volume |
float | Trading volume |
amount |
float | Traded amount (close × volume) |
past_return |
float | Previous period return |
The DataFrame must be sorted by [ticker, date]. Column names can be customised via keyword arguments to compute_alphas_cpp().
Performance boost in using C++ vs pure Python:
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file alpha_features_core-0.1.7.tar.gz.
File metadata
- Download URL: alpha_features_core-0.1.7.tar.gz
- Upload date:
- Size: 6.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a10f51a39c899e010f4e2f84dc70a0fb90567b0f618af0479f25c8ef7322bf09
|
|
| MD5 |
4c64e92f57bade67ebfe28113d411839
|
|
| BLAKE2b-256 |
3c8b2a6b1645238aea5add0bdeaeb68ea42723e35b7a6947bab1a7a0b871307c
|