High-performance feature engineering library for quantitative investment

QFeatureLib

PyPI version | Python 3.10+ | License: MIT | Code style: black

QFeatureLib is a high-performance, production-grade feature engineering library for quantitative investment. It focuses on financial time-series processing, with three priorities: strict avoidance of look-ahead bias (the "future function" problem), computational efficiency, and rigorous sample splitting.

Key Features

  • Zero Future Function: All time-series operations default to shift=1, so each rolling window ends one step before the current observation and cannot leak its value. The library raises FutureFunctionError if a call would use future information.
  • High Performance: Pure NumPy implementation with vectorized operations, roughly 10-100x faster than the equivalent pandas operations (see Performance Benchmarks).
  • Memory Efficient: Uses views instead of copies and supports in-place operations for large-scale panel data.
  • Quantitative Finance Focused: Specialized for financial scenarios such as suspended stock handling, industry neutralization, and market cap neutralization.

Installation

pip install qfeaturelib

For development:

pip install qfeaturelib[dev]

Quick Start

import numpy as np
from qfeaturelib import PanelData
from qfeaturelib.standardization import rolling_zscore, cs_zscore
from qfeaturelib.splitting import RollingWindowSplitter

# Create panel data (T=100 days, N=50 stocks, F=5 features)
values = np.random.randn(100, 50, 5)
dates = np.arange(100)
tickers = [f'STOCK_{i:02d}' for i in range(50)]

panel = PanelData(values, dates, tickers)

# Time-series standardization (rolling Z-score with shift=1 to prevent leakage)
zscore_values = rolling_zscore(
    panel.values[..., 0],  # First feature
    window=20,
    shift=1,  # Use past 20 days only, excluding current moment
)

# Cross-sectional standardization (Z-score across all stocks each day)
cs_values = cs_zscore(panel.values[..., 0])

# Sample splitting for backtesting
splitter = RollingWindowSplitter(
    n_samples=100,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
)

for split in splitter.split():
    train_data = zscore_values[split.train]
    val_data = zscore_values[split.val]
    test_data = zscore_values[split.test]
    # Train your model...

Core Modules

1. Time-Series Standardization

Operations along the time dimension with rolling windows:

from qfeaturelib.standardization import (
    rolling_zscore,      # Rolling Z-Score
    rolling_robust_zscore,  # Robust Z-Score using Median/MAD
    rolling_minmax,      # Rolling Min-Max scaling
)

# Parameters explained
result = rolling_zscore(
    data,
    window=20,      # Rolling window size
    shift=1,        # Window end offset (shift=1 excludes current moment)
    outlier_method="squash",  # Outlier handling: 'truncate' or 'squash'
    outlier_bounds=(0.01, 0.99),  # Quantile bounds for outliers
)
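
To make the role of shift concrete, here is a minimal NumPy sketch of a rolling Z-score whose window ends shift rows before the row being scored. It is not the library's implementation: the function name rolling_zscore_sketch, the use of the sample standard deviation (ddof=1), and the NaN padding of the warm-up rows are all assumptions made for illustration, and outlier handling is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_zscore_sketch(x, window, shift=1):
    """Z-score each row of x (time on axis 0) against a trailing window.

    Hypothetical helper: the window for row t covers rows
    t-shift-window+1 .. t-shift, so shift=1 excludes row t itself.
    """
    x = np.asarray(x, dtype=float)
    n_rows = x.shape[0]
    out = np.full_like(x, np.nan)
    first = window - 1 + shift          # first row with a complete window
    if n_rows <= first:
        return out
    # windows[i] holds rows i .. i+window-1, stacked on a new last axis
    windows = sliding_window_view(x, window, axis=0)
    mean = windows.mean(axis=-1)
    std = windows.std(axis=-1, ddof=1)  # assumption: sample std
    n = n_rows - first
    with np.errstate(divide="ignore", invalid="ignore"):
        out[first:] = (x[first:] - mean[:n]) / std[:n]
    return out


x = np.arange(10.0).reshape(10, 1)      # one asset, prices 0..9
z = rolling_zscore_sketch(x, window=3, shift=1)
```

In this sketch the first window + shift - 1 rows stay NaN because no complete past window exists for them, and changing any later row leaves earlier outputs untouched.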

2. Cross-Sectional Standardization

Operations across all assets at each time point:

from qfeaturelib.standardization import (
    cs_zscore,           # Cross-sectional Z-Score
    cs_robust_zscore,    # Cross-sectional robust Z-Score
    cs_minmax,           # Cross-sectional Min-Max
    cs_rank,             # Cross-sectional rank (percentile)
)

# Support for group-wise operations
result = cs_zscore(data, groups=industry_labels)
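
As a rough illustration of what a group-wise cross-sectional Z-score computes, the sketch below standardizes each row across assets, optionally within groups. The name cs_zscore_sketch, the assumption that groups is a static length-N label array, and the population standard deviation are illustrative choices, not the library's documented behavior.

```python
import numpy as np


def cs_zscore_sketch(x, groups=None):
    """Z-score each row of x (T, N) across assets, optionally per group.

    Hypothetical helper: `groups` is assumed to be one label per asset.
    """
    x = np.asarray(x, dtype=float)
    if groups is None:
        mean = np.nanmean(x, axis=1, keepdims=True)
        std = np.nanstd(x, axis=1, keepdims=True)
        with np.errstate(divide="ignore", invalid="ignore"):
            return (x - mean) / std
    groups = np.asarray(groups)
    out = np.full_like(x, np.nan)
    for label in np.unique(groups):
        cols = groups == label
        # Standardize only against peers that share the same label
        out[:, cols] = cs_zscore_sketch(x[:, cols])
    return out


values = np.array([[1.0, 2.0, 3.0, 10.0, 20.0, 30.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
grouped = cs_zscore_sketch(values, groups=labels)
```

With grouping, the two industries in the example receive identical scores even though their raw levels differ by a factor of ten, which is the point of standardizing within groups.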

3. Sample Splitting Engine

Time-series aware train/validation/test splitting:

from qfeaturelib.splitting import RollingWindowSplitter, ExpandingWindowSplitter

# Rolling window (fixed training size)
rolling_splitter = RollingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=100,  # Roll forward 100 samples each iteration
    gap=0,     # Gap between train/val/test to prevent leakage
)

# Expanding window (growing training size)
expanding_splitter = ExpandingWindowSplitter(
    n_samples=1000,
    train_ratio=0.6,
    val_ratio=0.2,
    test_ratio=0.2,
    step=50,   # Expand by 50 samples each iteration
)

# Use split.apply() to split multiple arrays consistently
for split in rolling_splitter.split():
    (X_train, X_val, X_test), (y_train, y_val, y_test) = split.apply([X, y])
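
The index arithmetic behind a rolling split with a gap can be sketched as follows. This is an assumption-laden illustration rather than the library's algorithm: it takes explicit window sizes instead of ratios (how the library converts ratios into sizes is not described here), and rolling_splits and SplitSketch are hypothetical names.

```python
from typing import Iterator, NamedTuple


class SplitSketch(NamedTuple):
    """Hypothetical container for one train/val/test index triple."""
    train: slice
    val: slice
    test: slice


def rolling_splits(n_samples: int, train_size: int, val_size: int,
                   test_size: int, step: int, gap: int = 0) -> Iterator[SplitSketch]:
    """Yield fixed-size windows that slide forward by `step` samples."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while True:
        train_end = start + train_size
        val_start = train_end + gap      # skip `gap` samples after train
        val_end = val_start + val_size
        test_start = val_end + gap       # and again after validation
        test_end = test_start + test_size
        if test_end > n_samples:
            break
        yield SplitSketch(slice(start, train_end),
                          slice(val_start, val_end),
                          slice(test_start, test_end))
        start += step


splits = list(rolling_splits(1000, 300, 100, 100, step=100, gap=5))
```

An expanding variant would keep the training slice anchored at 0 and only move its end forward on each iteration.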

4. Missing Value Imputation

from qfeaturelib.imputation import (
    ffill,          # Forward fill
    ffill_limit,    # Forward fill with limit (prevents stale data filling)
    cs_median_fill, # Cross-sectional median fill
    cs_mean_fill,   # Cross-sectional mean fill
)

# Forward fill with maximum 5 consecutive fills
result = ffill_limit(data, limit=5)
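
A vectorized forward fill with a limit can be written in plain NumPy by tracking, for every cell, the row index of the most recent observation. The sketch below is one way to do it under the assumption that missing values are NaN and time runs along axis 0; ffill_limit_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def ffill_limit_sketch(x, limit):
    """Forward-fill NaNs along axis 0, at most `limit` rows per gap."""
    x = np.asarray(x, dtype=float)
    n_rows = x.shape[0]
    valid = ~np.isnan(x)
    # Row numbers shaped to broadcast against x
    rows = np.arange(n_rows).reshape((n_rows,) + (1,) * (x.ndim - 1))
    # Index of the latest valid row at or before each cell (-1 if none yet)
    last_valid = np.maximum.accumulate(np.where(valid, rows, -1), axis=0)
    age = rows - last_valid              # rows elapsed since that observation
    fillable = (last_valid >= 0) & (age <= limit)
    carried = np.take_along_axis(x, np.clip(last_valid, 0, None), axis=0)
    return np.where(fillable, carried, np.nan)


series = np.array([1.0, np.nan, np.nan, np.nan, 5.0, np.nan])
filled = ffill_limit_sketch(series, limit=2)
```

In the example the third consecutive gap stays NaN, so a long suspension is not papered over with a stale price.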

5. Feature Neutralization

Remove effects of control factors via regression residuals:

from qfeaturelib.neutralization import (
    neutralize,
    industry_neutralize,
    size_neutralize,
)

# Industry neutralization
neutralized = industry_neutralize(feature, industry_labels)

# Size (market cap) neutralization
neutralized = size_neutralize(feature, log_market_cap)

# Custom control factors
neutralized = neutralize(feature, control_factors, method="ols")
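
Neutralization by regression residuals can be illustrated for a single cross-section with ordinary least squares. The sketch assumes an intercept is included and that rows containing NaN are skipped; neutralize_sketch is a hypothetical name and may differ from how the library handles these details.

```python
import numpy as np


def neutralize_sketch(feature, controls):
    """Return OLS residuals of `feature` (N,) on `controls` (N,) or (N, K)."""
    y = np.asarray(feature, dtype=float)
    # Design matrix with an intercept column (assumption)
    X = np.column_stack([np.ones(len(y)), np.asarray(controls, dtype=float)])
    ok = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    resid = np.full_like(y, np.nan)
    beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
    resid[ok] = y[ok] - X[ok] @ beta
    return resid


rng = np.random.default_rng(0)
log_cap = rng.normal(size=200)                  # synthetic size factor
raw = 2.0 * log_cap + rng.normal(size=200)      # feature loaded on size
residual = neutralize_sketch(raw, log_cap)
```

The residual is uncorrelated with the control by construction. Industry neutralization fits the same pattern if the controls are one-hot industry dummies.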

6. Macro Indicators

Special handling for macroeconomic indicators, which have a time dimension but no asset dimension:

from qfeaturelib import (
    macro_rolling_zscore,
    adapt_macro_to_panel,
)

# Direct standardization of 1D macro data
gdp_zscore = macro_rolling_zscore(gdp_growth, window=12, shift=1)

# Broadcast to panel format for combination with asset features
gdp_panel = adapt_macro_to_panel(gdp_growth, n_assets=50)  # (T,) -> (T, N)
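
The (T,) -> (T, N) adaptation can be done without copying data by using NumPy broadcasting. The sketch below shows one plausible approach with np.broadcast_to; whether the library returns a view or a copy is not stated, and adapt_macro_to_panel_sketch is a hypothetical name.

```python
import numpy as np


def adapt_macro_to_panel_sketch(series, n_assets):
    """Repeat a (T,) macro series across n_assets columns as a (T, N) view."""
    series = np.asarray(series, dtype=float)
    # broadcast_to returns a read-only view; no data is duplicated
    return np.broadcast_to(series[:, None], (series.shape[0], n_assets))


gdp = np.linspace(0.01, 0.03, 12)       # synthetic monthly growth rates
panel_view = adapt_macro_to_panel_sketch(gdp, n_assets=50)
```

Because the result is read-only, call .copy() on it before any in-place modification.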

Performance Benchmarks

On standard test data (T=5000, N=1000, F=50):

Operation               | Pandas | QFeatureLib | Speedup
Rolling Z-Score         | ~5s    | ~0.1s       | 50x
Cross-sectional Z-Score | ~2s    | ~0.02s      | 100x
Rolling Rank            | ~10s   | ~0.5s       | 20x

Design Principles

  1. Safety First: Default shift=1 prevents accidental future function usage
  2. Vectorization: All core computations use NumPy vectorized operations
  3. Memory Efficiency: Return views instead of copies, support in-place operations
  4. Type Safety: Full type annotations, passes mypy strict mode
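
The view-versus-copy distinction behind the memory efficiency principle is standard NumPy behavior and can be checked directly. The snippet below uses only NumPy, not QFeatureLib, and the array shapes are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
panel = rng.standard_normal((100, 50, 5))   # (T, N, F)

feature0 = panel[..., 0]            # basic slicing returns a view
subset = panel[:, [0, 1, 2], 0]     # integer-list indexing returns a copy

is_view = np.shares_memory(panel, feature0)
is_copy = not np.shares_memory(panel, subset)

# In-place cross-sectional demeaning: writes through the view into `panel`
np.subtract(feature0, feature0.mean(axis=1, keepdims=True), out=feature0)
```

After the last line, panel[..., 0] itself has zero mean in every row, with no second (T, N) array allocated for the result.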

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Changelog

See CHANGELOG.md for version history and changes.

Note: This library is part of a quantitative finance ecosystem. When implementing features, consider compatibility with downstream projects.
