Feature ranking ensemble using L1 penalization, random forests, XGBoost, ANOVA F-scores, and mutual information
featureranker
A lightweight Python package for robust feature importance ranking using an ensemble of methods with weighted voting.
The ensemble combines L1 penalization, random forests, XGBoost, ANOVA F-scores, and mutual information to rank feature importance for both classification and regression tasks.
Featured in:
- Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life (Scientific Reports, 2023)
- cdsBERT - Extending Protein Language Models with Codon Awareness (bioRxiv, 2023)
Installation
pip install featureranker
Quick Start
from sklearn.datasets import load_breast_cancer
from featureranker import get_data, feature_ranking, voting
from featureranker.plots import plot_after_vote, plot_rankings
# Load and prepare data
cancer = load_breast_cancer(as_frame=True)
df = cancer.data.merge(cancer.target, left_index=True, right_index=True)
X, y = get_data(df, target="target")
# Rank features using all five methods
rankings = feature_ranking(X, y, task="classification")
# Aggregate with weighted voting
scoring = voting(rankings)
# Visualize
plot_rankings(rankings, title="All methods")
plot_after_vote(scoring, title="Ensemble ranking")
Parallel execution
Speed up ranking by running methods in parallel:
rankings = feature_ranking(X, y, task="classification", n_jobs=-1)
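The ranking methods are independent of one another, which is what makes running them concurrently possible. The following is a minimal sketch of that idea using only the standard library; it is not the package's implementation. The two scoring functions (`rank_by_variance`, `rank_by_range`) and the `rank_in_parallel` helper are hypothetical stand-ins invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pvariance


def rank_by_variance(columns):
    # Hypothetical stand-in for a ranking method: highest variance first.
    return sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)


def rank_by_range(columns):
    # Hypothetical stand-in for a ranking method: widest value range first.
    return sorted(
        columns,
        key=lambda name: max(columns[name]) - min(columns[name]),
        reverse=True,
    )


METHODS = {"variance": rank_by_variance, "range": rank_by_range}


def rank_in_parallel(columns, methods=METHODS, max_workers=None):
    # Submit every method at once, then collect one ranking per method name.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, columns) for name, fn in methods.items()}
        return {name: future.result() for name, future in futures.items()}


columns = {"a": [1.0, 2.0, 3.0], "b": [1.0, 1.1, 1.2], "c": [0.0, 5.0, 10.0]}
parallel_rankings = rank_in_parallel(columns)
```

Threads keep the sketch short, but they do not speed up CPU-bound pure-Python work; for real model fitting, a process pool or a library such as joblib is the usual choice.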
Custom method selection and weights
rankings = feature_ranking(X, y, task="classification", choices=["mi", "f_test", "l1"])
scoring = voting(rankings, weights=[0.2, 0.4, 0.4])
Voting methods
Three aggregation schemes are available:
scoring = voting(rankings, method="reciprocal_rank") # default: weight * (1/rank)
scoring = voting(rankings, method="borda") # weight * (n_features - rank)
scoring = voting(rankings, method="exponential") # weight * exp(-rank / n_features)
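To make the three formulas concrete, here is a minimal, dependency-free sketch that applies them to a toy set of rankings. It is an illustration of the formulas in the comments above, not the package's `voting` code: the `aggregate` function, its dictionary-based inputs, and the assumption that ranks start at 1 are all hypothetical.

```python
import math


def aggregate(rankings, weights=None, method="reciprocal_rank"):
    """Combine per-method rankings into one score per feature.

    rankings: dict of method name -> list of feature names, best first.
    weights:  dict of method name -> weight (defaults to 1.0 for each method).
    """
    weights = weights or {name: 1.0 for name in rankings}
    scores = {}
    for name, ordered in rankings.items():
        n_features = len(ordered)
        # Assumption: the best feature has rank 1.
        for rank, feature in enumerate(ordered, start=1):
            if method == "reciprocal_rank":
                contribution = 1.0 / rank
            elif method == "borda":
                contribution = n_features - rank
            elif method == "exponential":
                contribution = math.exp(-rank / n_features)
            else:
                raise ValueError(f"unknown method: {method}")
            scores[feature] = scores.get(feature, 0.0) + weights[name] * contribution
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


toy = {
    "mi": ["a", "b", "c"],
    "f_test": ["b", "a", "c"],
    "l1": ["a", "c", "b"],
}
toy_weights = {"mi": 0.2, "f_test": 0.4, "l1": 0.4}
result = aggregate(toy, toy_weights, method="borda")
```

With these toy inputs, feature `a` comes out on top under Borda counting: it is ranked first by two of the three methods, including one of the more heavily weighted ones.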
Regression
from sklearn.datasets import load_diabetes
diabetes = load_diabetes(as_frame=True)
df = diabetes.data.merge(diabetes.target, left_index=True, right_index=True)
X, y = get_data(df, target="target")
rankings = feature_ranking(X, y, task="regression")
scoring = voting(rankings)
Ranking Methods
| Key | Method | How it works |
|---|---|---|
| rf | Random Forest | Feature importances from a tuned RandomForest model |
| xg | XGBoost | Feature importances from a tuned XGBoost model |
| mi | Mutual Information | Statistical dependency between each feature and target |
| f_test | ANOVA F-test | Variance-based scoring (f_classif / f_regression) |
| l1 | L1 Regularization | Regularization path analysis (lasso / logistic L1) |
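As an illustration of how a single method turns data into a ranking, the sketch below computes a one-way ANOVA F statistic per feature for a classification target and sorts features by it. This is the statistic that scikit-learn's `f_classif` reports, but the code here is a hand-written, dependency-free approximation for explanation only, not what the package runs. The helper names `anova_f` and `rank_features` and the toy data are hypothetical.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the class means around the overall mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(columns, labels):
    # Higher F means the class means are better separated by this feature.
    scores = {name: anova_f(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


columns = {
    "mostly_noise": [2.0, 3.1, 2.5, 2.4, 3.0, 2.3],
    "separates_classes": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
}
labels = [0, 0, 0, 1, 1, 1]
order = rank_features(columns, labels)
```

Each of the other methods produces a ranking in the same spirit (a score per feature, sorted from most to least important), which is what allows their outputs to be combined by voting.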
Documentation
See the full API documentation and example notebook.
Development
git clone https://github.com/lhallee/feature-ranker.git
cd feature-ranker
pip install -e ".[dev]"
pytest tests/ -v
Citation
@article{Hallee2023,
title = {Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life},
volume = {13},
ISSN = {2045-2322},
url = {http://dx.doi.org/10.1038/s41598-023-28965-7},
DOI = {10.1038/s41598-023-28965-7},
number = {1},
journal = {Scientific Reports},
publisher = {Springer Science and Business Media LLC},
author = {Hallee, Logan and Khomtchouk, Bohdan B.},
year = {2023},
month = feb
}
@article{Hallee2023cds,
title = {cdsBERT - Extending Protein Language Models with Codon Awareness},
url = {http://dx.doi.org/10.1101/2023.09.15.558027},
DOI = {10.1101/2023.09.15.558027},
publisher = {Cold Spring Harbor Laboratory},
author = {Hallee, Logan and Rafailidis, Nikolaos and Gleghorn, Jason P.},
year = {2023},
month = sep
}
License
CC-BY-NC-SA-4.0