Feature ranking ensemble using L1 penalization, random forests, XGBoost, ANOVA F-scores, and mutual information

featureranker

A lightweight Python package for robust feature importance ranking using an ensemble of methods with weighted voting.

The ensemble combines L1 penalization, random forests, XGBoost, ANOVA F-scores, and mutual information to rank feature importance for both classification and regression tasks.

Installation

pip install featureranker

Quick Start

from sklearn.datasets import load_breast_cancer
from featureranker import get_data, feature_ranking, voting
from featureranker.plots import plot_after_vote, plot_rankings

# Load and prepare data
cancer = load_breast_cancer(as_frame=True)
df = cancer.data.merge(cancer.target, left_index=True, right_index=True)
X, y = get_data(df, target="target")

# Rank features using all five methods
rankings = feature_ranking(X, y, task="classification")

# Aggregate with weighted voting
scoring = voting(rankings)

# Visualize
plot_rankings(rankings, title="All methods")
plot_after_vote(scoring, title="Ensemble ranking")

Parallel execution

Speed up ranking by running methods in parallel:

rankings = feature_ranking(X, y, task="classification", n_jobs=-1)

Custom method selection and weights

rankings = feature_ranking(X, y, task="classification", choices=["mi", "f_test", "l1"])
scoring = voting(rankings, weights=[0.2, 0.4, 0.4])

Voting methods

Three aggregation schemes are available:

scoring = voting(rankings, method="reciprocal_rank")  # default: weight * (1/rank)
scoring = voting(rankings, method="borda")             # weight * (n_features - rank)
scoring = voting(rankings, method="exponential")       # weight * exp(-rank / n_features)
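
The three formulas in the comments above can be worked through by hand on a toy example. The following is a minimal standalone sketch, not the package's implementation: the helper name `aggregate_votes`, the dictionary input format, the equal default weights, and the 1-based ranks are all assumptions made for illustration.

```python
import math


def aggregate_votes(rankings, weights=None, method="reciprocal_rank"):
    """Combine ordered feature lists into one score per feature (hypothetical helper).

    rankings: dict mapping a method name to a list of feature names, best first.
    weights:  optional list with one weight per method, in dict order.
    """
    names = list(rankings)
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0 / len(names)] * len(names)
    if len(weights) != len(names):
        raise ValueError("need exactly one weight per ranking method")

    scores = {}
    for name, weight in zip(names, weights):
        ordered = rankings[name]
        n_features = len(ordered)
        for rank, feature in enumerate(ordered, start=1):  # 1-based rank (assumed)
            if method == "reciprocal_rank":
                points = 1.0 / rank
            elif method == "borda":
                points = n_features - rank
            elif method == "exponential":
                points = math.exp(-rank / n_features)
            else:
                raise ValueError(f"unknown method: {method}")
            scores[feature] = scores.get(feature, 0.0) + weight * points

    # Highest combined score first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Two methods disagree on the top feature; the heavier weight decides.
toy = {"mi": ["a", "b", "c"], "f_test": ["b", "a", "c"]}
rr_scores = aggregate_votes(toy, weights=[0.25, 0.75], method="reciprocal_rank")
borda_scores = aggregate_votes(toy, weights=[0.25, 0.75], method="borda")
exp_scores = aggregate_votes(toy, weights=[0.25, 0.75], method="exponential")
print(rr_scores)
```

In this toy case feature `b` comes out on top under all three schemes because the method that ranks it first carries the larger weight. Reciprocal rank rewards top positions most sharply, while Borda spreads points linearly across the list.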

Regression

from sklearn.datasets import load_diabetes

diabetes = load_diabetes(as_frame=True)
df = diabetes.data.merge(diabetes.target, left_index=True, right_index=True)
X, y = get_data(df, target="target")
rankings = feature_ranking(X, y, task="regression")
scoring = voting(rankings)

Ranking Methods

| Key | Method | How it works |
| --- | --- | --- |
| rf | Random Forest | Feature importances from a tuned RandomForest model |
| xg | XGBoost | Feature importances from a tuned XGBoost model |
| mi | Mutual Information | Statistical dependency between each feature and target |
| f_test | ANOVA F-test | Variance-based scoring (f_classif / f_regression) |
| l1 | L1 Regularization | Regularization path analysis (lasso / logistic L1) |
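
To make the "variance-based scoring" of the ANOVA F-test concrete, here is a minimal pure-Python sketch of the one-way ANOVA F-statistic for a single feature in a classification setting. This is the same statistic scikit-learn's `f_classif` reports per feature. The helper name `anova_f` and the toy data are hypothetical, and the sketch does not show how featureranker calls scikit-learn internally.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for one feature (hypothetical helper).

    F = (between-class variance) / (within-class variance). A large F means
    the class means differ by much more than the spread inside each class.
    Assumes at least two classes and some within-class variation.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy data: x1 separates the two classes cleanly, x2 does not.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "x1": [1, 2, 3, 7, 8, 9],
    "x2": [1, 5, 9, 2, 5, 8],
}

f_scores = {name: anova_f(col, labels) for name, col in features.items()}
ranking = sorted(f_scores, key=f_scores.get, reverse=True)
print(f_scores, ranking)
```

Here `x1` scores far higher than `x2`, whose class means are identical, so `x1` is ranked first. For regression targets the table lists `f_regression`, which scores each feature with a univariate linear regression F-statistic instead.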

Documentation

See the full API documentation and example notebook.

Development

git clone https://github.com/lhallee/feature-ranker.git
cd feature-ranker
pip install -e ".[dev]"
pytest tests/ -v

Citation

@article{Hallee2023,
  title = {Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life},
  volume = {13},
  ISSN = {2045-2322},
  url = {http://dx.doi.org/10.1038/s41598-023-28965-7},
  DOI = {10.1038/s41598-023-28965-7},
  number = {1},
  journal = {Scientific Reports},
  publisher = {Springer Science and Business Media LLC},
  author = {Hallee, Logan and Khomtchouk, Bohdan B.},
  year = {2023},
  month = feb
}
@article{Hallee2023cds,
  title = {cdsBERT - Extending Protein Language Models with Codon Awareness},
  url = {http://dx.doi.org/10.1101/2023.09.15.558027},
  DOI = {10.1101/2023.09.15.558027},
  publisher = {Cold Spring Harbor Laboratory},
  author = {Hallee, Logan and Rafailidis, Nikolaos and Gleghorn, Jason P.},
  year = {2023},
  month = sep
}

License

CC-BY-NC-SA-4.0
