Temporal-aware Boruta feature selection for quantitative finance. OOS-only importance with purged cross-validation.

boruta-quant

Python 3.12+ | License: MIT | Ruff

boruta-quant computes feature importance on validation data only, using purged cross-validation to prevent lookahead bias. Built for financial time series where temporal integrity matters.

Why boruta-quant?

Feature importance from standard tools (SHAP, sklearn permutation importance) is often computed on training data, or on random splits that ignore time order. In financial time series, this lets in-sample fit and future information shape the feature rankings. boruta-quant addresses this with:

  • OOS-Only Importance: Importance computed exclusively on validation folds
  • Purged Cross-Validation: Train/test gap with purge and embargo windows
  • Shadow Features: Boruta's all-relevant selection via shadow comparison
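The shadow comparison can be sketched in a few lines. This is a simplified illustration of the general Boruta idea, not boruta-quant's implementation: the helper names (`count_hits`, `classify`) are hypothetical, the threshold is the shadow maximum, and the multiple-testing correction used by standard Boruta is left out. A feature scores a "hit" in a trial when its importance beats every shadow, and a binomial test on the hit count decides its fate.

```python
import math


def binomial_tail(hits: int, n_trials: int, upper: bool, p: float = 0.5) -> float:
    """P(X >= hits) if upper else P(X <= hits), for X ~ Binomial(n_trials, p)."""
    ks = range(hits, n_trials + 1) if upper else range(0, hits + 1)
    return sum(
        math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in ks
    )


def count_hits(real_importances, shadow_importances) -> int:
    """Count trials where the real feature beats the best shadow feature."""
    return sum(
        1
        for real, shadows in zip(real_importances, shadow_importances)
        if real > max(shadows)
    )


def classify(hits: int, n_trials: int, alpha: float = 0.05) -> str:
    """Simplified decision rule: no multiple-testing correction (an assumption)."""
    if binomial_tail(hits, n_trials, upper=True) < alpha:
        return "accepted"
    if binomial_tail(hits, n_trials, upper=False) < alpha:
        return "rejected"
    return "tentative"


# Two trials: the feature beats the shadows in the first trial only.
example_hits = count_hits([0.5, 0.1], [[0.2, 0.3], [0.2, 0.3]])
```

A feature that wins about half of its trials stays tentative under this rule, which is the case a second resolution step would need to handle.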

Installation

# Basic (permutation importance only)
pip install boruta-quant

# With LightGBM
pip install boruta-quant[lightgbm]

# With SHAP support
pip install boruta-quant[shap]

# Everything
pip install boruta-quant[all]

Development

git clone https://github.com/BlackArbsCEO/boruta-quant.git
cd boruta-quant
uv sync --all-extras --dev

Quick Start

from boruta_quant import BorutaSelector, BorutaSelectorConfig
from boruta_quant.oracle import PermutationImportanceOracle
from boruta_quant.temporal import PurgedTemporalCV, PurgedCVConfig
from boruta_quant.metrics import rank_ic_scorer
from lightgbm import LGBMRegressor

# 1. Configure purged temporal CV
cv = PurgedTemporalCV(PurgedCVConfig(
    n_splits=5,
    purge_window_days=5,      # gap before validation fold
    embargo_window_days=5,    # gap after validation fold
    min_train_size=100,
    test_size_ratio=0.2,
))

# 2. Configure importance oracle (OOS-only)
oracle = PermutationImportanceOracle(
    scoring=rank_ic_scorer,   # Spearman rank correlation
    n_repeats=10,
    random_state=42,
)

# 3. Configure Boruta selector
selector = BorutaSelector(
    config=BorutaSelectorConfig(
        n_trials=20,          # Boruta iterations
        percentile=100,       # shadow threshold percentile
        alpha=0.05,           # significance level
        two_step=True,        # resolve tentative features
        random_state=42,
    ),
    oracle=oracle,
    cv=cv,
)

# 4. Fit — model goes here, not in the constructor
result = selector.fit(
    X, y,
    timestamps=timestamps,   # must be timezone-aware
    model=LGBMRegressor(n_estimators=100, random_state=42),
)

# 5. Results
print(result.accepted_features)    # confirmed important
print(result.rejected_features)    # confirmed unimportant
print(result.tentative_features)   # borderline (resolved if two_step=True)
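The Quick Start assumes `X`, `y`, and `timestamps` already exist. One way to build toy inputs for experimentation is sketched below with pandas and NumPy. The column names, the synthetic target, and the choice of a `DataFrame`, `Series`, and `DatetimeIndex` as containers are assumptions for illustration; check the library's documentation for the types `fit` actually expects.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_samples = 500

# Timezone-aware daily timestamps (the Quick Start requires timezone awareness).
timestamps = pd.date_range("2020-01-01", periods=n_samples, freq="D", tz="UTC")

# Two informative columns and two pure-noise columns (hypothetical names).
X = pd.DataFrame(
    rng.normal(size=(n_samples, 4)),
    index=timestamps,
    columns=["signal_a", "signal_b", "noise_a", "noise_b"],
)

# Synthetic target: a linear mix of the informative columns plus noise.
y = pd.Series(
    0.5 * X["signal_a"] - 0.3 * X["signal_b"]
    + rng.normal(scale=0.5, size=n_samples),
    index=timestamps,
    name="forward_return",
)
```

With data like this, a working selector would be expected to favor `signal_a` and `signal_b` over the noise columns, which makes it a convenient sanity check.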

Importance Oracles

All oracles fit the model on training data but measure importance on validation data only.

| Oracle | How it works | When to use |
| --- | --- | --- |
| PermutationImportanceOracle | Shuffles one feature in validation set, measures prediction drop | Default: reliable, no refit needed |
| DropColumnImportanceOracle | Removes feature, refits model, measures prediction drop | When refit cost is acceptable |
| BlockPermutationImportanceOracle | Block-shuffles feature (preserves autocorrelation structure) | Autocorrelated time series |
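To make "fit on training data, measure on validation data" concrete, here is a minimal NumPy sketch of out-of-sample permutation importance. It is not the library's oracle: the function name is hypothetical, ordinary least squares stands in for the model, and negative mean squared error stands in for the scorer.

```python
import numpy as np


def oos_permutation_importance(X_train, y_train, X_val, y_val, n_repeats=10, seed=0):
    """Mean score drop on the validation set when each column is shuffled."""
    rng = np.random.default_rng(seed)
    # Fit on training data only (OLS as a stand-in model).
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    def score(features):
        # Negative MSE on validation targets, so higher is better.
        return -np.mean((features @ coef - y_val) ** 2)

    baseline = score(X_val)
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
            drops.append(baseline - score(X_perm))
        importances[j] = np.mean(drops)
    return importances


# Toy data: column 0 drives the target, column 1 is noise.
data_rng = np.random.default_rng(0)
X_all = data_rng.normal(size=(400, 2))
y_all = 2.0 * X_all[:, 0] + data_rng.normal(scale=0.1, size=400)

# Time-ordered split: earlier rows train, later rows validate.
importances = oos_permutation_importance(
    X_all[:300], y_all[:300], X_all[300:], y_all[300:]
)
```

The model is never refit inside the loop, which is why permutation is the cheaper option compared with a drop-column approach.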

Temporal Cross-Validation

     Training         Purge   Validation   Embargo
  |--------------| |-------| |----------| |-------|
  ^                                                ^
  train_start                               embargo_end

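- Purge: drops training observations next to the validation fold whose information could leak into it
- Embargo: drops observations immediately after the validation fold so that validation information does not bleed into later training data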
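The layout above can be sketched as index arithmetic. This is an illustrative helper under stated assumptions, not `PurgedTemporalCV`: the function name is hypothetical, and the gaps are measured in samples rather than in days derived from timestamps.

```python
def purged_split(n_samples, val_start, val_end, purge, embargo):
    """Return (train_idx, val_idx) for one fold, with gaps measured in samples."""
    if not 0 <= val_start < val_end <= n_samples:
        raise ValueError("validation window must lie inside the sample range")
    # Keep rows that end before the purge gap or start after the embargo gap.
    train_idx = [
        i
        for i in range(n_samples)
        if i < val_start - purge or i >= val_end + embargo
    ]
    val_idx = list(range(val_start, val_end))
    return train_idx, val_idx


train_idx, val_idx = purged_split(
    n_samples=100, val_start=60, val_end=80, purge=5, embargo=5
)
```

When the validation window sits at the end of the sample, no rows survive past the embargo and the split reduces to the walk-forward picture in the diagram.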

Shadow Shuffle Modes

Shadow features are shuffled copies of real features. The shuffle mode controls how temporal structure is handled:

| Mode | Description | Use case |
| --- | --- | --- |
| ShuffleMode.RANDOM | Standard i.i.d. permutation | Default: i.i.d. data |
| ShuffleMode.BLOCK | Block-preserving shuffle | Autocorrelated features |
| ShuffleMode.ERA | Shuffle within eras only | Regime-aware selection |
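The three modes differ only in which positions are allowed to exchange values. The sketch below shows one plausible reading of each using the standard library; the function names, the fixed `block_size` parameter, and the era labels are assumptions, and the library's own shuffling may differ in detail.

```python
import random


def random_shuffle(values, rng):
    """Plain permutation: any value may land anywhere."""
    out = list(values)
    rng.shuffle(out)
    return out


def block_shuffle(values, block_size, rng):
    """Permute contiguous blocks, keeping the order inside each block."""
    blocks = [
        list(values[i:i + block_size]) for i in range(0, len(values), block_size)
    ]
    rng.shuffle(blocks)
    return [v for block in blocks for v in block]


def era_shuffle(values, eras, rng):
    """Permute values only among positions that share an era label."""
    out = list(values)
    positions_by_era = {}
    for pos, era in enumerate(eras):
        positions_by_era.setdefault(era, []).append(pos)
    for positions in positions_by_era.values():
        shuffled = [out[p] for p in positions]
        rng.shuffle(shuffled)
        for p, v in zip(positions, shuffled):
            out[p] = v
    return out


rng = random.Random(42)
values = list(range(12))
eras = ["a"] * 6 + ["b"] * 6

shuffled_random = random_shuffle(values, rng)
shuffled_block = block_shuffle(values, 3, rng)
shuffled_era = era_shuffle(values, eras, rng)
```

Block shuffling keeps short-range autocorrelation inside each block intact, while era shuffling keeps each shadow value inside the regime it came from.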

Metrics

| Function | Description |
| --- | --- |
| rank_ic | Spearman correlation between predictions and actuals |
| rank_ic_scorer | sklearn-compatible scorer wrapping rank_ic |
| directional_accuracy | Fraction of correct sign predictions (up vs down) |
| directional_accuracy_scorer | sklearn-compatible scorer wrapping directional_accuracy |
| auc_score | Area under ROC curve |
| auc_scorer | sklearn-compatible scorer wrapping auc_score |
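For readers unfamiliar with the first two metrics, the following standard-library sketch shows what they compute. The names `spearman_rank_ic` and `sign_hit_rate` are hypothetical stand-ins, not the library's functions, and treating zero as "not up" is an assumption of this sketch.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rank_ic(predictions, actuals):
    """Pearson correlation of the ranks, i.e. Spearman correlation."""
    rp, ra = average_ranks(predictions), average_ranks(actuals)
    n = len(rp)
    mp, ma = sum(rp) / n, sum(ra) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(rp, ra))
    var_p = sum((p - mp) ** 2 for p in rp)
    var_a = sum((a - ma) ** 2 for a in ra)
    if var_p == 0 or var_a == 0:
        return float("nan")  # undefined when either side is constant
    return cov / math.sqrt(var_p * var_a)


def sign_hit_rate(predictions, actuals):
    """Fraction of pairs where prediction and actual agree on 'up' (> 0)."""
    hits = sum(1 for p, a in zip(predictions, actuals) if (p > 0) == (a > 0))
    return hits / len(predictions)


ic_up = spearman_rank_ic([0.1, 0.2, 0.3, 0.4], [1, 5, 9, 20])
ic_down = spearman_rank_ic([0.1, 0.2, 0.3, 0.4], [20, 9, 5, 1])
hit_rate = sign_hit_rate([1, -1, 1, -1], [2, -3, -1, -4])
```

Rank IC depends only on ordering, so it is insensitive to the scale of the predictions, which is one reason it is popular for return forecasts.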

Design Principles

  1. OOS-Only: Importance never computed on training data
  2. Fail-Fast: Invalid temporal data (naive timestamps, unsorted) raises immediately
  3. Type-Safe: Runtime enforcement with beartype, strict Pyright
  4. Explicit: All config parameters required — no hidden defaults
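The fail-fast principle for temporal data can be illustrated with a small standard-library check. This is a hypothetical helper, not the library's validation code, and raising `ValueError` is an assumption; the library may raise its own exception types.

```python
from datetime import datetime, timedelta, timezone


def validate_timestamps(timestamps):
    """Raise ValueError on timezone-naive or unsorted timestamps."""
    for i, ts in enumerate(timestamps):
        # A datetime is "aware" only if utcoffset() returns a value.
        if ts.tzinfo is None or ts.utcoffset() is None:
            raise ValueError(f"timestamp at position {i} is timezone-naive")
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            raise ValueError(f"timestamps are not sorted at position {i}")


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
aware_sorted = [start + timedelta(days=i) for i in range(5)]
validate_timestamps(aware_sorted)  # passes silently
```

Checking awareness first matters because Python refuses to compare naive and aware datetimes, so a mixed list would otherwise fail with a less helpful `TypeError` during the ordering check.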

License

MIT — see LICENSE.
