# boruta-quant

Temporal-aware Boruta feature selection for quantitative finance.

boruta-quant computes feature importance on validation data only, using purged cross-validation to prevent lookahead bias. It is built for financial time series, where temporal integrity matters.
## Why boruta-quant?

Standard feature selection tools (SHAP, scikit-learn permutation importance) compute importance on training data. In financial time series, this leaks future information into feature rankings. boruta-quant fixes this:

- **OOS-only importance**: importance is computed exclusively on validation folds
- **Purged cross-validation**: train/test gap with purge and embargo windows
- **Shadow features**: Boruta's all-relevant selection via shadow comparison
## Installation

```bash
# Basic (permutation importance only)
pip install boruta-quant

# With LightGBM
pip install boruta-quant[lightgbm]

# With SHAP support
pip install boruta-quant[shap]

# Everything
pip install boruta-quant[all]
```
## Development

```bash
git clone https://github.com/BlackArbsCEO/boruta-quant.git
cd boruta-quant
uv sync --all-extras --dev
```
## Quick Start

```python
from boruta_quant import BorutaSelector, BorutaSelectorConfig
from boruta_quant.oracle import PermutationImportanceOracle
from boruta_quant.temporal import PurgedTemporalCV, PurgedCVConfig
from boruta_quant.metrics import rank_ic_scorer
from lightgbm import LGBMRegressor

# 1. Configure purged temporal CV
cv = PurgedTemporalCV(PurgedCVConfig(
    n_splits=5,
    purge_window_days=5,    # gap before validation fold
    embargo_window_days=5,  # gap after validation fold
    min_train_size=100,
    test_size_ratio=0.2,
))

# 2. Configure importance oracle (OOS-only)
oracle = PermutationImportanceOracle(
    scoring=rank_ic_scorer,  # Spearman rank correlation
    n_repeats=10,
    random_state=42,
)

# 3. Configure Boruta selector
selector = BorutaSelector(
    config=BorutaSelectorConfig(
        n_trials=20,     # Boruta iterations
        percentile=100,  # shadow threshold percentile
        alpha=0.05,      # significance level
        two_step=True,   # resolve tentative features
        random_state=42,
    ),
    oracle=oracle,
    cv=cv,
)

# 4. Fit: the model is passed here, not in the constructor
result = selector.fit(
    X, y,
    timestamps=timestamps,  # must be timezone-aware
    model=LGBMRegressor(n_estimators=100, random_state=42),
)

# 5. Results
print(result.accepted_features)   # confirmed important
print(result.rejected_features)   # confirmed unimportant
print(result.tentative_features)  # borderline (resolved if two_step=True)
```
## Importance Oracles

All oracles fit the model on training data but measure importance on validation data only.

| Oracle | How it works | When to use |
|---|---|---|
| `PermutationImportanceOracle` | Shuffles one feature in the validation set, measures the prediction drop | Default: reliable, no refit needed |
| `DropColumnImportanceOracle` | Removes the feature, refits the model, measures the prediction drop | When refit cost is acceptable |
| `BlockPermutationImportanceOracle` | Block-shuffles the feature (preserves autocorrelation structure) | Autocorrelated time series |
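The OOS-only idea behind the first oracle can be sketched in a few lines. This is an illustrative implementation, not the library's code; the function name and the `scorer(y_true, y_pred)` signature are assumptions:

```python
import numpy as np

def oos_permutation_importance(model, X_val, y_val, scorer, n_repeats=10, seed=42):
    """Measure importance on validation data only: shuffle one feature at a
    time in X_val and record the resulting drop in the validation score."""
    rng = np.random.default_rng(seed)
    baseline = scorer(y_val, model.predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            rng.shuffle(X_perm[:, j])  # destroy feature j's signal only
            drops.append(baseline - scorer(y_val, model.predict(X_perm)))
        importances[j] = np.mean(drops)  # mean score drop = importance
    return importances
```

Because the model never sees the shuffled validation rows during training, a feature it merely memorized on the training set scores near zero here.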
## Temporal Cross-Validation

```
 Training         Purge     Validation    Embargo
|--------------| |-------| |----------|  |-------|
^                                                ^
train_start                            embargo_end
```

- **Purge**: removes observations that could leak into the validation fold
- **Embargo**: prevents information from the validation fold bleeding forward
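The split construction can be sketched with index arithmetic. This is illustrative only: the library configures its windows in days, while the sketch below counts observations, and the function name and fold layout are assumptions:

```python
import numpy as np

def purged_splits(n, n_splits=5, purge=5, embargo=5):
    """Sketch of purged CV: drop `purge` observations immediately before
    each validation fold and `embargo` observations immediately after it,
    so no training sample sits adjacent to the fold on either side."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        val_start, val_end = k * fold, min((k + 1) * fold, n)
        train = np.r_[0:max(val_start - purge, 0),   # before fold, minus purge
                      min(val_end + embargo, n):n]   # after fold, minus embargo
        yield train, np.arange(val_start, val_end)
```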
## Shadow Shuffle Modes

Shadow features are shuffled copies of real features. The shuffle mode controls how temporal structure is handled:

| Mode | Description | Use case |
|---|---|---|
| `ShuffleMode.RANDOM` | Standard i.i.d. permutation | Default (i.i.d. data) |
| `ShuffleMode.BLOCK` | Block-preserving shuffle | Autocorrelated features |
| `ShuffleMode.ERA` | Shuffle within eras only | Regime-aware selection |
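One way to picture the first two modes is a helper that builds shadow columns. This is a hypothetical sketch, not the library's API; `ShuffleMode.ERA` would additionally restrict each permutation to rows sharing an era label:

```python
import numpy as np

def make_shadows(X, mode="random", block_size=10, seed=42):
    """Each shadow column is a shuffled copy of a real feature.
    'random' destroys all structure; 'block' permutes contiguous blocks,
    so autocorrelation within a block survives the shuffle."""
    rng = np.random.default_rng(seed)
    shadows = X.copy()
    for j in range(X.shape[1]):
        if mode == "random":
            rng.shuffle(shadows[:, j])  # i.i.d. permutation of the column
        else:
            n_blocks = int(np.ceil(len(X) / block_size))
            order = rng.permutation(n_blocks)  # reorder blocks, keep insides
            col = np.concatenate(
                [X[b * block_size:(b + 1) * block_size, j] for b in order])
            shadows[:, j] = col
    return shadows
```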
## Metrics

| Function | Description |
|---|---|
| `rank_ic` | Spearman correlation between predictions and actuals |
| `rank_ic_scorer` | sklearn-compatible scorer wrapping `rank_ic` |
| `directional_accuracy` | Fraction of correct sign predictions (up vs. down) |
| `directional_accuracy_scorer` | sklearn-compatible scorer wrapping `directional_accuracy` |
| `auc_score` | Area under the ROC curve |
| `auc_scorer` | sklearn-compatible scorer wrapping `auc_score` |
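A minimal sketch of the first and third metrics under their common definitions (Spearman IC as the Pearson correlation of ranks, ignoring ties; directional accuracy as sign agreement). These implementations are illustrative, not the library's:

```python
import numpy as np

def rank_ic(y_true, y_pred):
    """Rank information coefficient: Pearson correlation of the ranks of
    y_true and y_pred (Spearman correlation, ties ignored)."""
    r1 = np.argsort(np.argsort(y_true)).astype(float)
    r2 = np.argsort(np.argsort(y_pred)).astype(float)
    r1 -= r1.mean()
    r2 -= r2.mean()
    return float(r1 @ r2 / np.sqrt((r1 @ r1) * (r2 @ r2)))

def directional_accuracy(y_true, y_pred):
    """Fraction of predictions whose sign matches the realized sign."""
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))
```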
## Design Principles

- **OOS-only**: importance is never computed on training data
- **Fail-fast**: invalid temporal data (naive timestamps, unsorted index) raises immediately
- **Type-safe**: runtime enforcement with beartype, strict Pyright
- **Explicit**: all config parameters are required; no hidden defaults
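The fail-fast principle can be illustrated with a small validator. This is a sketch of the idea, not the library's actual checks; the function name is hypothetical:

```python
import pandas as pd

def validate_timestamps(ts: pd.DatetimeIndex) -> None:
    """Reject timezone-naive or unsorted timestamps before any split is made,
    rather than producing silently misaligned folds later."""
    if ts.tz is None:
        raise ValueError("timestamps must be timezone-aware")
    if not ts.is_monotonic_increasing:
        raise ValueError("timestamps must be sorted ascending")
```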
## References

- Kursa, M. B. & Rudnicki, W. R. (2010), "Feature Selection with the Boruta Package" (the Boruta algorithm)
- López de Prado, M. (2018), *Advances in Financial Machine Learning*, Ch. 7 (purged cross-validation)
## License

MIT; see LICENSE.