High-performance feature engineering library for quantitative investment
QFeatureLib
QFeatureLib is a high-performance, production-grade feature engineering library for quantitative investment. It focuses on financial time-series processing, with three priorities: strict avoidance of look-ahead bias (the "future function" problem), computational efficiency, and rigorous sample splitting.
Key Features
- Zero Future Function: All time-series operations use `shift=1` by default to prevent data leakage. The library raises `FutureFunctionError` if you accidentally try to use future information.
- High Performance: Pure NumPy implementation with vectorized operations, 10-100x faster than pandas.
- Memory Efficient: Uses views instead of copies, supports in-place operations for large-scale panel data.
- Quantitative Finance Focused: Specialized for financial scenarios - suspended stock handling, industry neutralization, market cap neutralization, etc.
Installation
pip install qfeaturelib
For development:
pip install "qfeaturelib[dev]"
Quick Start
import numpy as np
from qfeaturelib import PanelData
from qfeaturelib.standardization import rolling_zscore, cs_zscore
from qfeaturelib.splitting import RollingWindowSplitter
# Create panel data (T=100 days, N=50 stocks, F=5 features)
values = np.random.randn(100, 50, 5)
dates = np.arange(100)
tickers = [f'STOCK_{i:02d}' for i in range(50)]
panel = PanelData(values, dates, tickers)
# Time-series standardization (rolling Z-score with shift=1 to prevent leakage)
zscore_values = rolling_zscore(
    panel.values[..., 0],  # First feature
    window=20,
    shift=1,  # Use past 20 days only, excluding current moment
)
# Cross-sectional standardization (Z-score across all stocks each day)
cs_values = cs_zscore(panel.values[..., 0])
# Sample splitting for backtesting
splitter = RollingWindowSplitter(
    n_samples=100,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
)
for split in splitter.split():
    train_data = zscore_values[split.train]
    val_data = zscore_values[split.val]
    test_data = zscore_values[split.test]
    # Train your model...
Core Modules
1. Time-Series Standardization
Operations along the time dimension with rolling windows:
from qfeaturelib.standardization import (
    rolling_zscore,         # Rolling Z-Score
    rolling_robust_zscore,  # Robust Z-Score using Median/MAD
    rolling_minmax,         # Rolling Min-Max scaling
)

# Parameters explained
result = rolling_zscore(
    data,
    window=20,                    # Rolling window size
    shift=1,                      # Window end offset (shift=1 excludes current moment)
    outlier_method="squash",      # Outlier handling: 'truncate' or 'squash'
    outlier_bounds=(0.01, 0.99),  # Quantile bounds for outliers
)
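To make the role of `shift` concrete, here is a minimal NumPy sketch of a rolling Z-score whose window ends `shift` steps before the current row. It is not the library's implementation: the function name `rolling_zscore_sketch`, the use of the sample standard deviation (`ddof=1`), and the NaN padding for rows without a full history are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_zscore_sketch(x, window, shift=1):
    """Z-score x[t] against the `window` observations ending at t - shift.

    x has shape (T,) or (T, N) with time on axis 0. Rows that lack a full
    window of history are returned as NaN. (Hypothetical helper.)
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    n_rows = x.shape[0]
    first = window + shift - 1  # first row with a complete history window
    if n_rows <= first:
        return out
    # windows[i] is a view of x[i : i + window]; the window axis comes last.
    windows = sliding_window_view(x, window, axis=0)
    # Row t uses the window that starts at t - first and ends at t - shift.
    hist = windows[: n_rows - first]
    mean = hist.mean(axis=-1)
    std = hist.std(axis=-1, ddof=1)  # assumption: sample standard deviation
    std = np.where(std > 0, std, np.nan)  # constant windows yield NaN, not inf
    out[first:] = (x[first:] - mean) / std
    return out


prices = np.arange(10.0)
z = rolling_zscore_sketch(prices, window=3, shift=1)
```

With `shift=1`, the statistics for row `t` are computed from rows `t-3` to `t-1` only, so changing a later observation never alters an earlier output. Setting `shift=0` would include the current row in its own mean and standard deviation.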
2. Cross-Sectional Standardization
Operations across all assets at each time point:
from qfeaturelib.standardization import (
    cs_zscore,         # Cross-sectional Z-Score
    cs_robust_zscore,  # Cross-sectional robust Z-Score
    cs_minmax,         # Cross-sectional Min-Max
    cs_rank,           # Cross-sectional rank (percentile)
)
# Support for group-wise operations
result = cs_zscore(data, groups=industry_labels)
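The group-wise behaviour can be pictured with a short NumPy sketch. It assumes a `(T, N)` array, one static integer label per asset, and the population standard deviation; `cs_zscore_sketch` is a hypothetical name, and the library's actual handling of NaNs, time-varying labels, and degrees of freedom may differ.

```python
import numpy as np


def cs_zscore_sketch(x, groups=None):
    """Z-score each row of x (shape (T, N)) across assets, ignoring NaNs.

    groups: optional labels of shape (N,). When given, each asset is
    standardized only against assets that share its label.
    """
    x = np.asarray(x, dtype=float)
    if groups is None:
        mean = np.nanmean(x, axis=1, keepdims=True)
        std = np.nanstd(x, axis=1, keepdims=True)
        return (x - mean) / np.where(std > 0, std, np.nan)
    groups = np.asarray(groups)
    out = np.full(x.shape, np.nan)
    for g in np.unique(groups):
        cols = groups == g  # boolean mask selecting one group's columns
        out[:, cols] = cs_zscore_sketch(x[:, cols])
    return out


data = np.array([[1.0, 2.0, 3.0, 10.0, 20.0, 30.0]])
industry_labels = np.array([0, 0, 0, 1, 1, 1])
by_industry = cs_zscore_sketch(data, groups=industry_labels)
```

In this toy row, the two industries differ in scale by a factor of ten, yet after group-wise standardization both map to the same three scores, which is the point of standardizing within groups.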
3. Sample Splitting Engine
Time-series aware train/validation/test splitting:
from qfeaturelib.splitting import RollingWindowSplitter, ExpandingWindowSplitter
# Rolling window (fixed training size)
rolling_splitter = RollingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=100,  # Roll forward 100 samples each iteration
    gap=0,     # Gap between train/val/test to prevent leakage
)

# Expanding window (growing training size)
expanding_splitter = ExpandingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=50,  # Expand by 50 samples each iteration
)

# Use split.apply() to split multiple arrays consistently
for split in rolling_splitter.split():
    (X_train, X_val, X_test), (y_train, y_val, y_test) = split.apply([X, y])
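The sketch below shows one way rolling train/validation/test windows with a gap can be generated. It deliberately takes explicit window sizes instead of ratios, because this README does not spell out how the ratio parameters map to window lengths; `Split` and `rolling_splits` are hypothetical names, not the library's classes.

```python
from typing import Iterator, NamedTuple


class Split(NamedTuple):
    train: slice
    val: slice
    test: slice


def rolling_splits(n_samples, train_size, val_size, test_size,
                   step, gap=0) -> Iterator[Split]:
    """Yield fixed-size, time-ordered splits that roll forward by `step`.

    `gap` samples are skipped between train and val and between val and
    test, so labels that look ahead cannot straddle a boundary.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    span = train_size + val_size + test_size + 2 * gap
    start = 0
    while start + span <= n_samples:
        train_end = start + train_size
        val_start = train_end + gap
        val_end = val_start + val_size
        test_start = val_end + gap
        yield Split(
            slice(start, train_end),
            slice(val_start, val_end),
            slice(test_start, test_start + test_size),
        )
        start += step


splits = list(rolling_splits(1000, 600, 100, 100, step=100, gap=5))
```

Because each field is a plain `slice`, the same split can index any number of arrays that share the time axis, which is one way to keep features and labels aligned.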
4. Missing Value Imputation
from qfeaturelib.imputation import (
    ffill,           # Forward fill
    ffill_limit,     # Forward fill with limit (prevents stale data filling)
    cs_median_fill,  # Cross-sectional median fill
    cs_mean_fill,    # Cross-sectional mean fill
)
# Forward fill with maximum 5 consecutive fills
result = ffill_limit(data, limit=5)
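A limited forward fill can be written without Python loops by tracking, for every cell, the row index of the most recent valid observation. The following is a sketch of that idea under the assumption that time is axis 0 and missing values are NaN; `ffill_limit_sketch` is a hypothetical name and not the library's function.

```python
import numpy as np


def ffill_limit_sketch(x, limit=None):
    """Forward-fill NaNs along axis 0, at most `limit` consecutive rows."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    # Row indices shaped (T, 1, ..., 1) so they broadcast against x.
    t = np.arange(x.shape[0]).reshape((-1,) + (1,) * (x.ndim - 1))
    # Index of the latest valid row at or before each row (-1 if none yet).
    last = np.maximum.accumulate(np.where(valid, t, -1), axis=0)
    filled = np.take_along_axis(x, np.maximum(last, 0), axis=0)
    ok = last >= 0
    if limit is not None:
        ok &= (t - last) <= limit  # drop fills older than `limit` rows
    return np.where(ok, filled, np.nan)


series = np.array([1.0, np.nan, np.nan, np.nan, 5.0, np.nan])
capped = ffill_limit_sketch(series, limit=2)
```

Here the third consecutive gap after the value `1.0` stays NaN, while the single gap after `5.0` is filled. This is the behaviour that keeps a long trading suspension from being papered over with a stale price.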
5. Feature Neutralization
Remove effects of control factors via regression residuals:
from qfeaturelib.neutralization import (
    neutralize,
    industry_neutralize,
    size_neutralize,
)
# Industry neutralization
neutralized = industry_neutralize(feature, industry_labels)
# Size (market cap) neutralization
neutralized = size_neutralize(feature, log_market_cap)
# Custom control factors
neutralized = neutralize(feature, control_factors, method="ols")
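For readers unfamiliar with residual-based neutralization, the sketch below regresses one cross-section of a feature on its control factors with ordinary least squares and keeps the residual. It is an illustration under stated assumptions, not the library's code: `neutralize_row` and `one_hot` are hypothetical helpers, an intercept is always added, and assets with any NaN are simply left as NaN.

```python
import numpy as np


def neutralize_row(y, controls):
    """Residual of y (N,) after OLS on controls (N,) or (N, K) plus intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(controls, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), X])
    ok = ~np.isnan(y) & ~np.isnan(X).any(axis=1)  # assets with complete data
    out = np.full(y.shape, np.nan)
    if ok.sum() > X.shape[1]:
        # lstsq copes with collinear columns, e.g. dummies plus an intercept.
        beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
        out[ok] = y[ok] - X[ok] @ beta
    return out


def one_hot(labels):
    """Dummy matrix (N, G) built from integer or string group labels."""
    _, codes = np.unique(labels, return_inverse=True)
    return np.eye(codes.max() + 1)[codes]


feature = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
industry = [0, 0, 0, 1, 1, 1]
industry_resid = neutralize_row(feature, one_hot(industry))
```

With industry dummies as the only controls, the residual is the feature minus its industry mean. For a `(T, N)` panel, the same function would be applied to each date's cross-section in turn.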
6. Macro Indicators
Special handling for macroeconomic indicators, which have a time dimension but no asset dimension:
from qfeaturelib import (
    macro_rolling_zscore,
    adapt_macro_to_panel,
)
# Direct standardization of 1D macro data
gdp_zscore = macro_rolling_zscore(gdp_growth, window=12, shift=1)
# Broadcast to panel format for combination with asset features
gdp_panel = adapt_macro_to_panel(gdp_growth, n_assets=50) # (T,) -> (T, N)
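The `(T,) -> (T, N)` broadcast can be reproduced in plain NumPy in two ways, which differ in memory cost and writability. This is only a sketch of the general technique with made-up numbers; it does not claim to show which of the two `adapt_macro_to_panel` uses internally.

```python
import numpy as np

gdp_growth = np.array([0.5, 0.7, 0.6, 0.9])  # (T,) made-up illustrative values
n_assets = 3
n_dates = gdp_growth.shape[0]

# Option 1: read-only broadcast view. No data is copied; every column
# aliases the original series.
panel_view = np.broadcast_to(gdp_growth[:, None], (n_dates, n_assets))

# Option 2: materialized copy. Uses T * N floats but can be modified in place.
panel_copy = np.repeat(gdp_growth[:, None], n_assets, axis=1)

# Either form can be stacked as an extra feature onto a (T, N, F) array.
asset_features = np.zeros((n_dates, n_assets, 2))
combined = np.concatenate([asset_features, panel_view[..., None]], axis=-1)
```

The view is the cheaper choice when the macro series is only read, while the copy is safer if later steps write into the panel.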
Performance Benchmarks
On standard test data (T=5000, N=1000, F=50):
| Operation | Pandas | QFeatureLib | Speedup |
|---|---|---|---|
| Rolling Z-Score | ~5s | ~0.1s | 50x |
| Cross-sectional Z-Score | ~2s | ~0.02s | 100x |
| Rolling Rank | ~10s | ~0.5s | 20x |
Design Principles
- Safety First: Default `shift=1` prevents accidental future function usage
- Vectorization: All core computations use NumPy vectorized operations
- Memory Efficiency: Return views instead of copies, support in-place operations
- Type Safety: Full type annotations, passes mypy strict mode
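The views-versus-copies principle has a practical consequence worth seeing once in plain NumPy: writing into a view changes the array it came from. The snippet is a general NumPy illustration with hypothetical variable names, not code taken from the library.

```python
import numpy as np

panel = np.zeros((4, 3, 2))   # (T, N, F)
feature0 = panel[..., 0]      # basic slicing returns a view, not a copy

# In-place update through the view: no temporary array is allocated,
# and the change is visible in `panel` as well.
np.add(feature0, 1.0, out=feature0)

# Fancy (list-based) indexing always returns a copy, so edits to
# `picked` would leave `panel` untouched.
picked = panel[:, [0, 2], :]

print(np.shares_memory(feature0, panel))  # True
print(np.shares_memory(picked, panel))    # False
```

When a function returns a view, callers who need an independent array should call `.copy()` explicitly before mutating the result.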
Related Projects
- AssetPanelForest - Supervised clustering for panel data
- MASFactorMiner - Factor mining and analysis
- GeneralBacktest - Backtesting framework
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Changelog
See CHANGELOG.md for version history and changes.
Support
- GitHub Issues: https://github.com/ElenYoung/QFeatureLib/issues
- Documentation: https://github.com/ElenYoung/QFeatureLib#readme
Note: This library is part of a quantitative finance ecosystem. When implementing features, consider compatibility with downstream projects.