Skip to main content

High-performance quantitative finance feature engineering library

Project description

ml4t-engineer

Python 3.11+ License: MIT

High-performance feature engineering for financial machine learning.

ml4t-engineer provides 107+ technical indicators, triple-barrier labeling, and alternative bar sampling with a Polars-first implementation that's 10-100x faster than pandas alternatives.

Features

  • 107+ Technical Indicators: Momentum, trend, volatility, volume, and more
  • TA-Lib Validated: 59 indicators validated against TA-Lib at 1e-6 tolerance
  • Triple-Barrier Labeling: AFML-compliant labeling with ATR-based barriers
  • Alternative Bars: Volume, dollar, tick, and imbalance bars
  • Microstructure Metrics: Kyle's Lambda, VPIN, Amihud, Roll spread
  • ML-Specific Features: Fractional differencing, entropy, Hurst exponent
  • Polars-First: 10-100x faster than pandas, ~0.8x TA-Lib C speed
  • Type-Safe: Full type hints with mypy strict mode

Installation

pip install ml4t-engineer

With optional dependencies:

pip install ml4t-engineer[talib]      # TA-Lib backend
pip install ml4t-engineer[numba]      # Numba acceleration
pip install ml4t-engineer[all]        # All optional dependencies

Quick Start

import polars as pl
from ml4t.engineer import compute_features, list_features

# See available features
print(list_features("momentum"))  # RSI, MACD, Stochastic, etc.

# Load OHLCV data
df = pl.read_parquet("ohlcv.parquet")

# Compute features with default parameters
result = compute_features(df, ["rsi", "macd", "atr", "obv"])

# Or with custom parameters
result = compute_features(df, [
    {"name": "rsi", "params": {"period": 20}},
    {"name": "sma", "params": {"period": 50}},
    {"name": "bollinger_bands", "params": {"period": 20, "std_dev": 2.0}},
])

Feature Categories

Category Count Examples
Momentum 31 RSI, MACD, Stochastic, CCI, ADX, MFI
Trend 10 SMA, EMA, WMA, DEMA, TEMA, KAMA
Volatility 15 ATR, Bollinger, Yang-Zhang, GARCH
Volume 3 OBV, AD, ADOSC
Statistics 8 Variance, Linear Regression, Correlation
Math 3 MAX, MIN, SUM
Price Transform 5 Typical Price, Weighted Close
Microstructure 12 Kyle Lambda, VPIN, Amihud, Roll
ML 11 Fractional Diff, Entropy, Hurst

Triple-Barrier Labeling

from ml4t.engineer.labeling import triple_barrier_labels, atr_barriers

# Fixed barriers
labels = triple_barrier_labels(
    df,
    upper_barrier=0.02,  # 2% profit target
    lower_barrier=0.01,  # 1% stop loss
    max_holding=20,       # 20 bar horizon
)

# Dynamic ATR-based barriers
labels = atr_barriers(
    df,
    atr_period=14,
    upper_multiplier=2.0,  # 2x ATR profit target
    lower_multiplier=1.0,  # 1x ATR stop loss
    max_holding=20,
)

Alternative Bar Sampling

from ml4t.engineer.bars import volume_bars, dollar_bars, tick_imbalance_bars

# Volume bars (equal volume per bar)
vbars = volume_bars(tick_data, volume_threshold=1000)

# Dollar bars (equal dollar volume per bar)
dbars = dollar_bars(tick_data, dollar_threshold=1_000_000)

# Tick imbalance bars (information-driven)
ibars = tick_imbalance_bars(tick_data, expected_imbalance=100)

Preprocessing

from ml4t.engineer import Preprocessor, StandardScaler, RobustScaler

# Leakage-safe preprocessing
preprocessor = Preprocessor([
    StandardScaler(),
])

# Fit on train only, transform both
X_train_scaled = preprocessor.fit_transform(X_train)
X_test_scaled = preprocessor.transform(X_test)

Configuration via YAML

# features.yaml
features:
  - name: rsi
    params:
      period: 14
  - name: macd
    params:
      fast: 12
      slow: 26
      signal: 9
  - name: bollinger_bands
    params:
      period: 20
      std_dev: 2.0
result = compute_features(df, "features.yaml")

Performance

Benchmark ml4t-engineer pandas-ta Speedup
RSI (1M rows) 12ms 850ms 70x
MACD (1M rows) 18ms 1200ms 67x
Bollinger (1M rows) 15ms 920ms 61x
Triple-barrier (1M rows) 20ms N/A -

Benchmarks on M1 MacBook Pro with Polars 0.20+

API Reference

Core Functions

from ml4t.engineer import (
    compute_features,   # Compute features from config
    list_features,      # List available features
    list_categories,    # List feature categories
    describe_feature,   # Get feature metadata
)

Labeling

from ml4t.engineer.labeling import (
    triple_barrier_labels,  # Triple-barrier method
    atr_barriers,           # ATR-based barriers
    meta_labels,            # Meta-labeling
)

Bars

from ml4t.engineer.bars import (
    volume_bars,           # Volume bars
    dollar_bars,           # Dollar bars
    tick_imbalance_bars,   # Tick imbalance bars
    volume_imbalance_bars, # Volume imbalance bars
)

Preprocessing

from ml4t.engineer import (
    Preprocessor,      # Preprocessing pipeline
    StandardScaler,    # Z-score normalization
    RobustScaler,      # Robust scaling (median/IQR)
    MinMaxScaler,      # Min-max scaling
)

Integration with ML4T Libraries

ml4t-engineer is part of the ML4T library ecosystem:

from ml4t.data import DataManager
from ml4t.engineer import compute_features
from ml4t.engineer.labeling import triple_barrier_labels
from ml4t.diagnostic import Evaluator
from ml4t.backtest import Engine

# Complete workflow
data = DataManager().fetch("SPY", "2020-01-01", "2023-12-31")
features = compute_features(data, ["rsi", "macd", "atr"])
labels = triple_barrier_labels(data, 0.02, 0.01, 20)
# ... train model, evaluate, backtest

Development

# Clone repository
git clone https://github.com/applied-ai/ml4t-engineer.git
cd ml4t-engineer

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest tests/ -v

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

Testing

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_api.py

# Run with coverage
uv run pytest tests/ --cov=ml4t.engineer

# TA-Lib validation tests (requires TA-Lib)
uv run pytest tests/test_talib_validation.py

References

  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
  • López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge.
  • Easley, D., López de Prado, M., & O'Hara, M. (2012). "Flow Toxicity and Liquidity in a High-Frequency World."

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ml4t_engineer-0.1.0a3.tar.gz (564.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ml4t_engineer-0.1.0a3-py3-none-any.whl (367.4 kB view details)

Uploaded Python 3

File details

Details for the file ml4t_engineer-0.1.0a3.tar.gz.

File metadata

  • Download URL: ml4t_engineer-0.1.0a3.tar.gz
  • Upload date:
  • Size: 564.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ml4t_engineer-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 ba1bfbb27524eea4cef37be724c0ee865a5f60764327a57b0e7ded69eaaa1cc7
MD5 a3906d469352b56c82dd32e97b89cc47
BLAKE2b-256 96dfd43bd98017dfe9bd28c13a03e8a694d12da374d0c148063648d0a2ac4fef

See more details on using hashes here.

File details

Details for the file ml4t_engineer-0.1.0a3-py3-none-any.whl.

File metadata

File hashes

Hashes for ml4t_engineer-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 f52e49e8320f9433802c99a9adc463ae10384932cd7b97c3ca381fc9e8fb3692
MD5 ae8bb4ab9068f1d8101b9b00bb55f4d4
BLAKE2b-256 3ddb5158d600cd5a19ca9a0271d627927e4e282494336da552e1f30f62984c4b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page