High-performance quantitative finance feature engineering library

ml4t-engineer

Python 3.11+ License: MIT

High-performance feature engineering for financial machine learning.

ml4t-engineer provides 107+ technical indicators, triple-barrier labeling, and alternative bar sampling with a Polars-first implementation that's 10-100x faster than pandas alternatives.

Features

  • 107+ Technical Indicators: Momentum, trend, volatility, volume, and more
  • TA-Lib Validated: 59 indicators validated against TA-Lib at 1e-6 tolerance
  • Triple-Barrier Labeling: AFML-compliant labeling with ATR-based barriers
  • Alternative Bars: Volume, dollar, tick, and imbalance bars
  • Microstructure Metrics: Kyle's Lambda, VPIN, Amihud, Roll spread
  • ML-Specific Features: Fractional differencing, entropy, Hurst exponent
  • Polars-First: 10-100x faster than pandas-based implementations, at roughly 0.8x the speed of TA-Lib's C implementation
  • Type-Safe: Full type hints with mypy strict mode

Installation

pip install ml4t-engineer

With optional dependencies:

# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "ml4t-engineer[talib]"    # TA-Lib backend
pip install "ml4t-engineer[numba]"    # Numba acceleration
pip install "ml4t-engineer[all]"      # All optional dependencies

Quick Start

import polars as pl
from ml4t.engineer import compute_features, list_features

# See available features
print(list_features("momentum"))  # RSI, MACD, Stochastic, etc.

# Load OHLCV data
df = pl.read_parquet("ohlcv.parquet")

# Compute features with default parameters
result = compute_features(df, ["rsi", "macd", "atr", "obv"])

# Or with custom parameters
result = compute_features(df, [
    {"name": "rsi", "params": {"period": 20}},
    {"name": "sma", "params": {"period": 50}},
    {"name": "bollinger_bands", "params": {"period": 20, "std_dev": 2.0}},
])

Feature Categories

Category         Count  Examples
Momentum         31     RSI, MACD, Stochastic, CCI, ADX, MFI
Trend            10     SMA, EMA, WMA, DEMA, TEMA, KAMA
Volatility       15     ATR, Bollinger, Yang-Zhang, GARCH
Volume           3      OBV, AD, ADOSC
Statistics       8      Variance, Linear Regression, Correlation
Math             3      MAX, MIN, SUM
Price Transform  5      Typical Price, Weighted Close
Microstructure   12     Kyle Lambda, VPIN, Amihud, Roll
ML               11     Fractional Diff, Entropy, Hurst

Triple-Barrier Labeling

from ml4t.engineer.labeling import triple_barrier_labels, atr_barriers

# Fixed barriers
labels = triple_barrier_labels(
    df,
    upper_barrier=0.02,  # 2% profit target
    lower_barrier=0.01,  # 1% stop loss
    max_holding=20,       # 20 bar horizon
)

# Dynamic ATR-based barriers
labels = atr_barriers(
    df,
    atr_period=14,
    upper_multiplier=2.0,  # 2x ATR profit target
    lower_multiplier=1.0,  # 1x ATR stop loss
    max_holding=20,
)
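
To make the labeling rule concrete, here is a minimal pure-Python sketch of the triple-barrier idea from AFML. It is not the library's implementation: the function name `triple_barrier_sketch` is hypothetical, it uses closing prices only, and it assumes that bars with an incomplete look-ahead window get label 0, which the real API may handle differently.

```python
from typing import Sequence


def triple_barrier_sketch(
    prices: Sequence[float],
    upper_barrier: float,
    lower_barrier: float,
    max_holding: int,
) -> list[int]:
    """Label each bar +1, -1, or 0 by whichever barrier is touched first."""
    labels: list[int] = []
    n = len(prices)
    for i in range(n):
        entry = prices[i]
        label = 0  # default: the vertical (time) barrier is reached first
        # Look ahead at most max_holding bars, stopping at the end of the data.
        for j in range(i + 1, min(i + max_holding, n - 1) + 1):
            ret = prices[j] / entry - 1.0
            if ret >= upper_barrier:  # profit target touched first
                label = 1
                break
            if ret <= -lower_barrier:  # stop loss touched first
                label = -1
                break
        labels.append(label)
    return labels


prices = [100.0, 100.5, 102.5, 101.0, 99.0, 98.0]
labels = triple_barrier_sketch(
    prices, upper_barrier=0.02, lower_barrier=0.01, max_holding=3
)
print(labels)  # [1, -1, -1, -1, -1, 0]
```

The ATR-based variant follows the same loop, except that the two thresholds would be recomputed per bar from a volatility estimate instead of being fixed percentages.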

Alternative Bar Sampling

from ml4t.engineer.bars import volume_bars, dollar_bars, tick_imbalance_bars

# Volume bars (equal volume per bar)
vbars = volume_bars(tick_data, volume_threshold=1000)

# Dollar bars (equal dollar volume per bar)
dbars = dollar_bars(tick_data, dollar_threshold=1_000_000)

# Tick imbalance bars (information-driven)
ibars = tick_imbalance_bars(tick_data, expected_imbalance=100)
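
As an illustration of what "equal volume per bar" means, the following sketch aggregates `(price, size)` ticks into volume bars. It is a simplified stand-in, not the library's code: the `Bar` class and `volume_bars_sketch` function are hypothetical, overshoot volume stays in the bar that crossed the threshold, and a trailing partial bar is dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


def volume_bars_sketch(
    ticks: list[tuple[float, float]], volume_threshold: float
) -> list[Bar]:
    """Aggregate (price, size) ticks into bars that close once volume >= threshold."""
    bars: list[Bar] = []
    current: Optional[Bar] = None
    for price, size in ticks:
        if current is None:
            # Start a new bar at the first tick after the previous bar closed.
            current = Bar(price, price, price, price, 0.0)
        current.high = max(current.high, price)
        current.low = min(current.low, price)
        current.close = price
        current.volume += size
        if current.volume >= volume_threshold:
            bars.append(current)
            current = None  # any trailing partial bar is dropped in this sketch
    return bars


ticks = [(10.0, 400), (10.2, 300), (10.1, 400), (9.9, 1000), (10.0, 200)]
bars = volume_bars_sketch(ticks, volume_threshold=1000)
print(len(bars))  # 2
```

Dollar bars follow the same pattern with `price * size` accumulated in place of `size`.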

Preprocessing

from ml4t.engineer import Preprocessor, StandardScaler, RobustScaler

# Leakage-safe preprocessing
preprocessor = Preprocessor([
    StandardScaler(),
])

# Fit on train only, transform both
X_train_scaled = preprocessor.fit_transform(X_train)
X_test_scaled = preprocessor.transform(X_test)
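
The reason for fitting on the training split only can be shown with a small stand-alone sketch. `ZScoreSketch` is a hypothetical class, not the library's `StandardScaler`, and it assumes the population standard deviation; the point is only that the test data is scaled with statistics that never saw it.

```python
from statistics import fmean, pstdev


class ZScoreSketch:
    """Z-score scaler whose statistics come from the data passed to fit()."""

    def fit(self, values: list[float]) -> "ZScoreSketch":
        self.mean_ = fmean(values)
        # Fall back to 1.0 for constant input to avoid division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values: list[float]) -> list[float]:
        return [(v - self.mean_) / self.std_ for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

scaler = ZScoreSketch().fit(train)      # statistics from the training split only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # reuses the training mean and std
```

Fitting on the combined data instead would shift the mean toward the test values, letting information from the evaluation period leak into the features.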

Configuration via YAML

# features.yaml
features:
  - name: rsi
    params:
      period: 14
  - name: macd
    params:
      fast: 12
      slow: 26
      signal: 9
  - name: bollinger_bands
    params:
      period: 20
      std_dev: 2.0

# Python: compute features from the YAML file
result = compute_features(df, "features.yaml")

Performance

Benchmark                 ml4t-engineer  pandas-ta  Speedup
RSI (1M rows)             12 ms          850 ms     70x
MACD (1M rows)            18 ms          1200 ms    67x
Bollinger (1M rows)       15 ms          920 ms     61x
Triple-barrier (1M rows)  20 ms          N/A        -

Benchmarks on M1 MacBook Pro with Polars 0.20+
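
To reproduce comparisons like these on your own hardware, a simple timing harness is usually enough. The sketch below is an assumption about how such numbers could be measured, not the project's benchmark script; `best_of` and `speedup` are hypothetical helpers, and the lambda is a placeholder for the indicator call you want to time.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in milliseconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Placeholder workload; swap in the feature computation under test.
elapsed_ms = best_of(lambda: sum(range(100_000)))
print(f"{elapsed_ms:.3f} ms")

# Using the RSI row from the table: 850 ms versus 12 ms.
print(f"{speedup(850.0, 12.0):.1f}x")  # 70.8x
```

Taking the minimum over several runs reduces noise from caches and background processes, so it tends to be more stable than a single measurement.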

API Reference

Core Functions

from ml4t.engineer import (
    compute_features,   # Compute features from config
    list_features,      # List available features
    list_categories,    # List feature categories
    describe_feature,   # Get feature metadata
)

Labeling

from ml4t.engineer.labeling import (
    triple_barrier_labels,  # Triple-barrier method
    atr_barriers,           # ATR-based barriers
    meta_labels,            # Meta-labeling
)

Bars

from ml4t.engineer.bars import (
    volume_bars,           # Volume bars
    dollar_bars,           # Dollar bars
    tick_imbalance_bars,   # Tick imbalance bars
    volume_imbalance_bars, # Volume imbalance bars
)

Preprocessing

from ml4t.engineer import (
    Preprocessor,      # Preprocessing pipeline
    StandardScaler,    # Z-score normalization
    RobustScaler,      # Robust scaling (median/IQR)
    MinMaxScaler,      # Min-max scaling
)

Integration with ML4T Libraries

ml4t-engineer is part of the ML4T library ecosystem:

from ml4t.data import DataManager
from ml4t.engineer import compute_features
from ml4t.engineer.labeling import triple_barrier_labels
from ml4t.diagnostic import Evaluator
from ml4t.backtest import Engine

# Complete workflow
data = DataManager().fetch("SPY", "2020-01-01", "2023-12-31")
features = compute_features(data, ["rsi", "macd", "atr"])
labels = triple_barrier_labels(data, upper_barrier=0.02, lower_barrier=0.01, max_holding=20)
# ... train model, evaluate, backtest

Development

# Clone repository
git clone https://github.com/applied-ai/ml4t-engineer.git
cd ml4t-engineer

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest tests/ -v

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

Testing

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_api.py

# Run with coverage
uv run pytest tests/ --cov=ml4t.engineer

# TA-Lib validation tests (requires TA-Lib)
uv run pytest tests/test_talib_validation.py

References

  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
  • López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.
  • Easley, D., López de Prado, M., & O'Hara, M. (2012). "Flow Toxicity and Liquidity in a High-Frequency World." The Review of Financial Studies, 25(5), 1457-1493.

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.
