
Feature ranking ensemble


FEATURE RANKER

featureranker is a lightweight Python package for the feature ranking ensemble developed by Logan Hallee, featured in the following works:

Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life

cdsBERT - Extending Protein Language Models with Codon Awareness

Exploring Phylogenetic Classification and Further Applications of Codon Usage Frequencies

The ensemble uses L1 penalization, random forests, extreme gradient boosting, ANOVA F-values, and mutual information to rank the importance of features for regression and classification tasks. The per-method scoring lists are combined with a weighted voting scheme.
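To make the weighted voting idea concrete, here is a minimal sketch in plain Python. It is not the featureranker API or its actual voting rule: the function names, the min-max normalization, the method weights, and the example scores are all assumptions chosen for illustration. Each method's scores are rescaled to [0, 1] so that methods on different scales (for example ANOVA F-values versus random forest importances) can be compared, then summed with normalized weights.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one method's feature scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All features scored the same: this method expresses no preference.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def weighted_vote(
    method_scores: Dict[str, Dict[str, float]],
    method_weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Combine per-method scores into one ranking, best feature first."""
    total_weight = sum(method_weights[m] for m in method_scores)
    combined: Dict[str, float] = {}
    for method, scores in method_scores.items():
        weight = method_weights[method] / total_weight
        for feature, value in min_max_normalize(scores).items():
            combined[feature] = combined.get(feature, 0.0) + weight * value
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three features from three of the methods.
method_scores = {
    "l1": {"feature_a": 0.9, "feature_b": 0.0, "feature_c": 0.3},
    "random_forest": {"feature_a": 0.5, "feature_b": 0.1, "feature_c": 0.4},
    "anova_f": {"feature_a": 12.0, "feature_b": 1.0, "feature_c": 30.0},
}
# Hypothetical weights; a real ensemble might derive them from model quality.
method_weights = {"l1": 1.0, "random_forest": 1.0, "anova_f": 0.5}

ranking = weighted_vote(method_scores, method_weights)
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

In this toy input, feature_a ranks first because two of the three methods score it highest, even though the ANOVA F-value favors feature_c.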

Usage

Install

!pip install featureranker

Example use

Documentation

ISSUES WITH GOOGLE COLAB

The numpy / Linux build on Google Colab does not always work when installing featureranker. Upgrading numpy (for example, with !pip install --upgrade numpy) and restarting the session fixes the problem.

Citation

Please cite

@article{Hallee2023,
  title = {Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life},
  volume = {13},
  ISSN = {2045-2322},
  url = {http://dx.doi.org/10.1038/s41598-023-28965-7},
  DOI = {10.1038/s41598-023-28965-7},
  number = {1},
  journal = {Scientific Reports},
  publisher = {Springer Science and Business Media LLC},
  author = {Hallee,  Logan and Khomtchouk,  Bohdan B.},
  year = {2023},
  month = feb 
}
@article{Hallee2023cdsBERT,
  title = {cdsBERT - Extending Protein Language Models with Codon Awareness},
  url = {http://dx.doi.org/10.1101/2023.09.15.558027},
  DOI = {10.1101/2023.09.15.558027},
  publisher = {Cold Spring Harbor Laboratory},
  author = {Hallee,  Logan and Rafailidis,  Nikolaos and Gleghorn,  Jason P.},
  year = {2023},
  month = sep 
}

News

  • 3/20/2025: Version 1.3.0 is released with improved runtime, documentation, and examples.
  • 10/22/2024: Versions 1.2.0 - 1.2.2 are released with improvements and bug fixes.
  • 1/22/2024: Version 1.1.0 is released with faster solvers, many more settings, and more plots. 1.1.1 fixes some bugs.
  • 1/3/2024: Version 1.0.2 is released with added clustering capabilities and better automatic plots.
  • 11/10/2023: Version 1.0.1 is published in PyPI under featureranker.
  • 11/9/2023: Version 1.0.0 of the package is published for testing on TestPyPI.
  • 11/8/2023: Various utility helpers and plot functions are added for ease of use. The proper l1 penalty constant is now found automatically. The automatic hyperparameter search also returns the best metrics found via the methodologies.
  • 11/7/2023: Recursive feature elimination is replaced with ANOVA F-scores, which rank features based on modeled variance.
  • 10/15/2023: Separate classification and regression versions are developed for more reliable results. Logistic regression (OvR) with an l1 penalty takes the place of lasso for classification.
  • 9/17/2023: The feature ranker is now a proper ensemble, with a custom soft voting scheme. XGBoost, recursive feature elimination, and mutual information are also leveraged. In the cdsBERT paper, the ensemble is used to unify the results of the previous papers.
  • 2/6/2023: The preliminary work makes its way into Nature Scientific Reports!
  • 7/21/2022: A preliminary version of this feature ranker leveraging lasso and random forests is published on bioRxiv for phylogenetic and organelle prediction.
