Feature ranking ensemble

FEATURE RANKER

featureranker is a lightweight Python package for the feature ranking ensemble developed by Logan Hallee, featured in the following works:

Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life

cdsBERT - Extending Protein Language Models with Codon Awareness

Exploring Phylogenetic Classification and Further Applications of Codon Usage Frequencies

The ensemble uses l1 penalization, random forests, extreme gradient boosting, ANOVA F-values, and mutual information to rank the importance of features for regression and classification tasks. The per-method scoring lists are combined with a weighted voting scheme.
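
To make the idea concrete, here is a minimal sketch of a ranking ensemble built directly on scikit-learn. It is not the featureranker implementation: the helper names `score_features` and `weighted_vote` are hypothetical, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so the example needs no extra dependency, and the vote shown (normalize each method's scores, then take a weighted average) is only one plausible scheme that may differ from the one the package uses.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import f_regression, mutual_info_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def score_features(X, y, seed=0):
    """Return {method name: non-negative importance score per feature}."""
    # Standardize so the l1 coefficients are comparable across features.
    X_scaled = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5, random_state=seed).fit(X_scaled, y)
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    # Assumption: sklearn gradient boosting used here in place of XGBoost.
    boost = GradientBoostingRegressor(random_state=seed).fit(X, y)
    f_values, _ = f_regression(X, y)
    mutual_info = mutual_info_regression(X, y, random_state=seed)
    return {
        "l1": np.abs(lasso.coef_),
        "random_forest": forest.feature_importances_,
        "boosting": boost.feature_importances_,
        "anova_f": f_values,
        "mutual_info": mutual_info,
    }


def weighted_vote(scores, weights=None):
    """Combine per-method scores into one score per feature.

    Each method's scores are normalized to sum to 1, multiplied by the
    method's weight, summed, and divided by the total weight.
    """
    weights = weights or {name: 1.0 for name in scores}
    total = np.zeros(len(next(iter(scores.values()))))
    for name, values in scores.items():
        values = np.asarray(values, dtype=float)
        norm = values.sum()
        if norm > 0:  # skip a method that scored every feature as zero
            total += weights[name] * values / norm
    return total / sum(weights.values())


data = load_diabetes(as_frame=True)
X, y = data.data, data.target
scores = score_features(X, y)
combined = weighted_vote(scores)

# Features sorted from most to least important under the combined vote.
ranking = sorted(zip(X.columns, combined), key=lambda pair: pair[1], reverse=True)
for feature, score in ranking:
    print(f"{feature:>4s}  {score:.3f}")
```

Normalizing before voting keeps a method with large raw scores, such as the F-values, from dominating the others; passing a `weights` dictionary lets one method count more than another.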

Usage

Install

!pip install featureranker

Imports

from featureranker.utils import *
from featureranker.plots import *
from featureranker.rankers import *

import pandas as pd
from sklearn.datasets import load_diabetes, load_breast_cancer
import warnings
warnings.filterwarnings('ignore')

Regression example (diabetes dataset)

diabetes = load_diabetes(as_frame=True)
df = diabetes.data.merge(diabetes.target, left_index=True, right_index=True)
view_data(df)
X, y = get_data(df, labels='target')
rankings = regression_ranking(X, y, predict=False)
scoring = voting(rankings)
plot_rankings(rankings, title='Regression example all methods')
plot_after_vote(scoring, title='Regression example full ensemble')

[Figures: regression rankings from all methods; regression ranking after the ensemble vote]

Classification example (breast cancer dataset)

cancer = load_breast_cancer(as_frame=True)
df = cancer.data.merge(cancer.target, left_index=True, right_index=True)
view_data(df)
X, y = get_data(df, labels='target')
rankings = classification_ranking(X, y, predict=False)
scoring = voting(rankings)
plot_rankings(rankings, title='Classification example all methods')
plot_after_vote(scoring, title='Classification example full ensemble')

[Figures: classification rankings from all methods; classification ranking after the ensemble vote]

More examples

Documentation

See documentation via the link above for more details

ISSUES WITH GOOGLE COLAB

The numpy / Linux build on Google Colab does not always work when installing featureranker on Colab. If the import fails, upgrade numpy (for example with !pip install --upgrade numpy) and restart the session.

Citation

Please cite Hallee, L., Khomtchouk, B.B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 13, 2088 (2023). https://doi.org/10.1038/s41598-023-28965-7

and

Hallee, L., Rafailidis, N., Gleghorn, J.P. cdsBERT - Extending Protein Language Models with Codon Awareness. bioRxiv 2023.09.15.558027 (2023). https://doi.org/10.1101/2023.09.15.558027

News

  • 1/22/2024: Version 1.1.0 is released with faster solvers, many more settings, and more plots. 1.1.1 fixes some bugs.
  • 1/3/2024: Version 1.0.2 is released with added clustering capabilities and better automatic plots.
  • 11/10/2023: Version 1.0.1 is published on PyPI under featureranker.
  • 11/9/2023: Version 1.0.0 of the package is published for testing on TestPyPI.
  • 11/8/2023: Various utility helpers and plot functions are added for ease of use. The proper l1 penalty constant is now found automatically. The automatic hyperparameter search also returns the best metrics found via the methodologies.
  • 11/7/2023: Recursive feature elimination is replaced with ANOVA F-scores because they rank features based on modeled variance.
  • 10/15/2023: Separate classification and regression versions are developed for more reliable results. Logistic regression (OvR) with an l1 penalty takes the place of lasso for classification.
  • 9/17/2023: The feature ranker is now a proper ensemble, with a custom soft voting scheme. XGBoost, recursive feature elimination, and mutual information are also leveraged. The ensemble is used to unify the results of the previous papers in the cdsBERT paper.
  • 2/6/2023: The preliminary work makes its way into Nature Scientific Reports!
  • 7/21/2022: A preliminary version of this feature ranker, leveraging lasso and random forests, is published on bioRxiv for phylogenetic and organelle prediction.
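
Two of the changes listed above, the automatically chosen l1 penalty constant and the l1-penalized logistic regression used for classification, can be sketched with scikit-learn alone. This is an illustration under stated assumptions rather than the package's own code: the penalty is chosen here by 5-fold cross-validation with `LogisticRegressionCV`, which may not match the search featureranker performs, and because the breast cancer dataset is binary the one-vs-rest scheme reduces to a single classifier. ANOVA F-values from `f_classif` are computed alongside for comparison.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: cross-validation over a grid of 10 values picks C,
# the inverse of the l1 penalty strength.
model = LogisticRegressionCV(
    Cs=10, cv=5, penalty="l1", solver="liblinear", max_iter=1000
).fit(X_scaled, y)
best_C = float(np.ravel(model.C_)[0])

# Importance under the l1 model: absolute coefficient per feature.
l1_scores = np.abs(model.coef_).ravel()

# ANOVA F-value per feature, computed independently of any model.
f_values, _ = f_classif(X, y)

print(f"selected C: {best_C:.4g}")
top_l1 = X.columns[np.argsort(l1_scores)[::-1][:5]]
top_f = X.columns[np.argsort(f_values)[::-1][:5]]
print("top 5 by l1 coefficient:", list(top_l1))
print("top 5 by ANOVA F-value:", list(top_f))
```

The two lists need not agree, which is the motivation for combining several rankers with a vote instead of trusting any single one.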

