Modern BorutaShap - Feature selection with SHAP values, NumPy 2.0+ compatible
borutashap-modern
A modernized fork of BorutaShap that works with NumPy 2.0+ and current releases of SciPy and scikit-learn. It includes performance improvements and bug fixes for SHAP-based feature selection.
Installation
# Install from PyPI (recommended)
pip install borutashap-modern
# With LightGBM support (recommended for speed)
pip install borutashap-modern[lightgbm]
# With all optional dependencies
pip install borutashap-modern[all]
Key Improvements
Compatibility Fixes
- NumPy 2.0+ support: Replaced the deprecated `np.NaN` with `np.nan`
- SciPy 1.11+ support: Updated `binom_test` to `binomtest` with backward compatibility
- Python 3.12+ support: Requires Python 3.12 or higher
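The SciPy fix above is a rename with a changed return type: `binomtest` returns a result object with a `pvalue` attribute, while the legacy `binom_test` returned the p-value directly. The sketch below shows one way to write a backward-compatible wrapper. The helper names `first_available` and `binom_pvalue` are hypothetical and are not taken from this package; the fork's actual shim may look different.

```python
import importlib


def first_available(module_name, *names):
    """Return (name, attribute) for the first name the module defines."""
    module = importlib.import_module(module_name)
    for name in names:
        if hasattr(module, name):
            return name, getattr(module, name)
    raise ImportError(f"{module_name} provides none of: {', '.join(names)}")


def binom_pvalue(k, n, p=0.5, alternative="two-sided"):
    """p-value of a binomial test using whichever SciPy API is installed."""
    # Prefer the newer function; fall back to the legacy one if it is missing.
    name, func = first_available("scipy.stats", "binomtest", "binom_test")
    result = func(k, n, p, alternative=alternative)
    # binomtest returns a result object; binom_test returned a plain float.
    return result.pvalue if name == "binomtest" else result
```

SciPy is imported only when `binom_pvalue` is called, so the lookup helper can be exercised on any module. The NumPy fix needs no wrapper, since `np.nan` is the spelling that works on both old and new releases.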
Bug Fixes
- RandomForest + SHAP: Fixed 3D array handling and indexing issues
- RandomForest + Gini: Fixed premature feature_importances_ check
- Missing imports: Added required imports (inspect, defaultdict)
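To illustrate the 3D array issue: for a multi-class RandomForest, SHAP values can arrive as a (samples, features, classes) array rather than the 2D (samples, features) layout, so code that assumes two dimensions indexes the wrong axis. The sketch below shows one plausible reduction to a per-feature importance, summing absolute values over classes and averaging over samples. It uses plain nested lists to stay dependency-free, the function name `mean_abs_importance` is hypothetical, and the fork's actual handling at the cited lines may differ.

```python
def mean_abs_importance(shap_values):
    """Collapse SHAP values into one non-negative score per feature.

    Accepts nested lists shaped (samples, features) or
    (samples, features, classes).
    """
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    totals = [0.0] * n_features
    for row in shap_values:
        for j, cell in enumerate(row):
            # In the 3D layout each cell holds one value per class.
            values = cell if isinstance(cell, (list, tuple)) else [cell]
            totals[j] += sum(abs(v) for v in values)
    # Average the per-sample magnitudes over all samples.
    return [total / n_samples for total in totals]


two_d = [[1.0, -2.0], [3.0, -4.0]]
three_d = [[[1.0, -1.0], [2.0, -2.0]], [[3.0, -3.0], [0.0, 0.0]]]
print(mean_abs_importance(two_d))    # 2D input
print(mean_abs_importance(three_d))  # 3D input, classes summed per feature
```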
Performance Insights
Based on extensive benchmarking:
- LightGBM: Best overall performer (0.6s avg SHAP time, F1=0.875)
- XGBoost: Good balance (1.6s avg SHAP time, F1=0.868)
- RandomForest: Best F1 on small datasets (F1=0.935 @ 1k samples)
- GradientBoosting: Highest accuracy but slow (13s avg SHAP time)
Requirements
- Python 3.12+
- NumPy 2.0+
- pandas 2.0+
- scikit-learn 1.3+
- SHAP 0.45+
- LightGBM 4.0+ (optional, recommended)
- XGBoost 2.0+ (optional)
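A quick way to check an environment against these minimums is to read installed versions with the standard library. The following is a minimal sketch, assuming the PyPI distribution names `numpy`, `pandas`, `scikit-learn`, and `shap`. The helper names are hypothetical and the version parsing is deliberately simple (numeric components only).

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the Requirements list (required packages only).
MINIMUMS = {"numpy": "2.0", "pandas": "2.0", "scikit-learn": "1.3", "shap": "0.45"}


def parse_version(version):
    """Turn '2.1.0rc1' into (2, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after padding both to the same length with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_environment():
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append("Python 3.12+ is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```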
Quick Start
from BorutaShap import BorutaShap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
import pandas as pd
# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(20)])
# Initialize with LightGBM (recommended for speed)
model = LGBMClassifier(n_estimators=50, max_depth=5, verbose=-1)
# Run BorutaShap
fs = BorutaShap(
    model=model,
    importance_measure='shap',  # or 'gini' for tree-based models
    classification=True
)
fs.fit(X=X, y=y, n_trials=100, random_state=42)
# Get results
print(f"Accepted features: {fs.accepted}")
print(f"Rejected features: {fs.rejected}")
print(f"Tentative features: {fs.tentative}")
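The accepted, rejected, and tentative lists come from a statistical decision made over the trials. In the Boruta approach, a feature scores a "hit" in a trial when its importance beats the best shuffled shadow feature, and a binomial test on the hit count decides its fate. The sketch below is a simplified, dependency-free illustration of that decision rule. It is not this package's code: the function names are hypothetical, and the multiple-testing correction that real implementations apply is omitted.

```python
from math import comb


def binom_tail(hits, n_trials, p=0.5, upper=True):
    """P(X >= hits) if upper else P(X <= hits), for X ~ Binomial(n_trials, p)."""
    ks = range(hits, n_trials + 1) if upper else range(0, hits + 1)
    return sum(comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in ks)


def classify(hits_by_feature, n_trials, alpha=0.05):
    """Label each feature from its number of hits across the trials."""
    decisions = {}
    for name, hits in hits_by_feature.items():
        if binom_tail(hits, n_trials, upper=True) < alpha:
            # Beats the shadows far more often than chance would allow.
            decisions[name] = "accepted"
        elif binom_tail(hits, n_trials, upper=False) < alpha:
            # Beats the shadows far less often than chance would allow.
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


decisions = classify({"feature_0": 19, "feature_1": 2, "feature_2": 11}, n_trials=20)
print(decisions)
```

A feature that hits about half the time cannot be separated from chance, which is why it stays tentative. Raising `n_trials` gives the test more evidence to resolve such cases.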
Performance Recommendations
Model Selection Guide
| Use Case | Recommended Model | F1 Score | SHAP Time |
|---|---|---|---|
| Small data (<5k samples) | RandomForest | 0.935 | 0.15s |
| Medium data (5k-50k samples) | LightGBM | 0.90 | 0.5-2s |
| Large data (>50k samples) | LightGBM | 0.89 | 2-5s |
| Best accuracy | GradientBoosting | 0.91 | 10-50s |
| Production/speed critical | LightGBM | 0.88 | <2s |
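The table can be read as a small decision rule. The sketch below encodes it as a lookup, assuming the sample-count thresholds shown in the table. The function `recommend_model` and its `priority` values are hypothetical and are not part of this package.

```python
def recommend_model(n_samples, priority="balanced"):
    """Return the model family the selection guide suggests.

    priority: "balanced" (choose by dataset size), "accuracy", or "speed".
    """
    if priority == "accuracy":
        return "GradientBoosting"  # highest accuracy, slowest SHAP
    if priority == "speed":
        return "LightGBM"  # production / speed critical
    if priority != "balanced":
        raise ValueError(f"unknown priority: {priority!r}")
    if n_samples < 5_000:
        return "RandomForest"  # small data
    return "LightGBM"  # medium and large data


print(recommend_model(1_000))
print(recommend_model(20_000))
print(recommend_model(1_000, priority="accuracy"))
```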
Dataset Size Impact
- Samples: More samples → better F1 (all models improve 5-9%)
- Features: More features → worse F1 (especially RandomForest: -15% from 10→200 features)
- Sweet spot: 5-10k samples with ≤50 features
Feature Importance Methods
- SHAP: More accurate but ~11x slower than Gini
- Gini: Fast but only for tree-based models (not XGBoost)
- Recommendation: Use SHAP for final models, Gini for exploration
Supported Models
✅ Fully Supported:
- LightGBM (fastest SHAP)
- XGBoost (SHAP only)
- RandomForest (both SHAP and Gini)
- ExtraTrees (both SHAP and Gini)
- GradientBoosting (both SHAP and Gini)
❌ Not Supported:
- BaggingClassifier (SHAP TreeExplainer incompatible)
- SVM, Neural Networks (no tree structure)
Testing
# Run basic test
python examples/test_basic.py
# Run performance comparison
python examples/compare_models.py
# Test with your data
python examples/test_custom.py --data your_data.csv
Changes from Original
- Fixed NumPy 2.0 compatibility (src/BorutaShap.py:L384-394)
- Fixed SciPy binomial test import (src/BorutaShap.py:L8-13)
- Fixed RandomForest SHAP 3D array handling (src/BorutaShap.py:L250-260)
- Fixed RandomForest Gini importance check (src/BorutaShap.py:L150-155)
- Added Python 3.12 support (setup.py)
- Added comprehensive benchmarks (examples/benchmark.py)
Citation
If you use this fork, please cite both the original and this fork:
# Original BorutaShap
@software{boruta_shap,
  author = {Eoghan Keany},
  title = {BorutaShap: A wrapper feature selection method using Boruta and SHAP},
  url = {https://github.com/Ekeany/Boruta-Shap},
  year = {2020}
}
# This fork
@software{boruta_shap_modern,
  author = {BlackArbsCEO},
  title = {BorutaShap Modern Fork: Compatible with NumPy 2.0+},
  url = {https://github.com/BlackArbsCEO/Boruta-Shap},
  year = {2024}
}
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Run tests with Python 3.12+
- Submit a pull request
License
MIT License (same as original)
Acknowledgments
- Original author: Eoghan Keany
- SHAP library: slundberg/shap
- Boruta algorithm: Boruta R package