
Dynamic recursive feature elimination utilities built on scikit-learn.


dRFEtools - dynamic Recursive Feature Elimination

dRFEtools is a package for dynamic recursive feature elimination with scikit-learn.

Authors: Apuã Paquola, Kynon Jade Benjamin, and Tarun Katipalli

Package developed in Python 3.11+.

In addition to scikit-learn, dRFEtools is built with NumPy, SciPy, pandas, matplotlib, plotnine, and statsmodels. Currently, dynamic RFE supports models that expose a coef_ or feature_importances_ attribute.
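The attribute requirement can be illustrated with plain scikit-learn. The sketch below is not part of dRFEtools: `importance_vector` is a hypothetical helper, and summing absolute coefficients across rows is one assumed way to collapse multi-row coef_ arrays into a single score per feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def importance_vector(fitted_estimator):
    """Return one non-negative score per feature (hypothetical helper, not dRFEtools API)."""
    if hasattr(fitted_estimator, "feature_importances_"):
        return np.asarray(fitted_estimator.feature_importances_)
    if hasattr(fitted_estimator, "coef_"):
        coef = np.abs(np.asarray(fitted_estimator.coef_))
        # Some linear models store one row of coefficients per class or target;
        # summing the rows is an assumption made for this sketch.
        return coef if coef.ndim == 1 else coef.sum(axis=0)
    raise AttributeError("estimator exposes neither coef_ nor feature_importances_")


X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=1000).fit(X, y)

rf_importance = importance_vector(rf)  # taken from feature_importances_
lr_importance = importance_vector(lr)  # taken from |coef_|
```

An estimator with neither attribute, such as a k-nearest-neighbors classifier, would raise AttributeError here, which mirrors why such models cannot drive the elimination.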

This package provides functions to run dynamic recursive feature elimination (dRFE) with random forest and linear models, for both classification and regression. Random forest workflows assume that out-of-bag (OOB) scoring is enabled; linear-model workflows build a developmental split internally. For each task, three metrics are calculated for feature selection:

Classification:

  1. Normalized mutual information
  2. Accuracy
  3. Area under the ROC curve (ROC AUC)
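As a rough illustration of how these three classification metrics can be obtained from out-of-bag predictions, the following sketch uses only scikit-learn. It is not the dRFEtools scoring code; the synthetic data, the dictionary keys, and the binary setup are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, normalized_mutual_info_score, roc_auc_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=13)

# oob_score=True makes the forest keep out-of-bag predictions after fitting.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=13).fit(X, y)

# Each row holds class probabilities estimated only from trees that did not
# see that sample during training.
oob_proba = rf.oob_decision_function_
oob_labels = rf.classes_[np.argmax(oob_proba, axis=1)]

oob_metrics = {
    "nmi": normalized_mutual_info_score(y, oob_labels),
    "accuracy": accuracy_score(y, oob_labels),
    # Binary case: score with the OOB probability of the positive class.
    "roc_auc": roc_auc_score(y, oob_proba[:, 1]),
}
```

Because the OOB predictions come from trees that never saw the corresponding sample, no separate hold-out set is needed for the random forest case.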

Regression:

  1. R² (can be negative, because a model can be arbitrarily worse than a constant predictor)
  2. Explained variance
  3. Mean squared error
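The regression metrics can be sketched in the same spirit for a linear model scored on a developmental split. This is a minimal example with scikit-learn only; the 30% split size, the Ridge estimator, and the dictionary keys are assumptions and do not describe how dRFEtools builds its split internally.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=13)

# Hold out part of the training data as a developmental set (size is an assumption).
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3, random_state=13)

model = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = model.predict(X_dev)

dev_metrics = {
    "r2": r2_score(y_dev, y_pred),
    "explained_variance": explained_variance_score(y_dev, y_pred),
    "mse": mean_squared_error(y_dev, y_pred),
}
```

Explained variance is never smaller than R², since it ignores any systematic offset in the residuals; the two coincide when the residuals have zero mean.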

Package structure

The repository is organized into focused modules to match the runtime architecture:

  • dRFEtools.py – core interfaces for random-forest and developmental-set elimination workflows.
  • scoring/ – metric implementations for developmental splits and random-forest OOB scoring.
  • lowess/ – helpers for smoothing elimination curves and extracting optimal feature counts.
  • metrics/ – feature ranking utilities used during elimination.
  • plotting.py – visualization helpers re-exported from the top-level package.
  • cli.py – command-line entry points for running full dRFE pipelines.
  • utils.py – shared helpers for normalizing results and persisting plots.

Table of Contents

  1. Citation
  2. Installation
  3. Tutorials
  4. Reference Manual
    1. Core elimination functions
    2. Ranking and scoring utilities
    3. LOWESS helpers
    4. Plotting functions
    5. Utilities and CLI

Citation

If you use dRFEtools, please cite the following:

Kynon J M Benjamin, Tarun Katipalli, Apuã C M Paquola, dRFEtools: dynamic recursive feature elimination for omics, Bioinformatics, Volume 39, Issue 8, August 2023, btad513, https://doi.org/10.1093/bioinformatics/btad513

PMID: 37632789

DOI: 10.1093/bioinformatics/btad513.

Installation

pip install --user dRFEtools

Tutorials

Two tutorials, one on optimization and one on classification, align with the 0.4.x API documented on Read the Docs.

Example code from the manuscript (scikit-learn simulation, biological simulation, and BrainSEQ Phase 1) is available at the link below.

https://github.com/LieberInstitute/dRFEtools_manuscript

Reference Manual

Core elimination functions

  • rf_rfe – Runs random-forest feature elimination and returns a pair of standardized dictionaries: the full history keyed by feature count and the first elimination step. Each entry contains n_features, a metrics mapping appropriate for the task, the original indices, and the indices of surviving features.
  • dev_rfe – Performs the same elimination loop for estimators that rely on a developmental split, yielding the same standardized result structure as rf_rfe.
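To make the shape of such a result easier to picture, here is a minimal sketch of an elimination loop written with scikit-learn only. It does not call rf_rfe or dev_rfe and is not their implementation: `sketch_dynamic_rfe`, its `elimination_rate` parameter, the rule of dropping a fixed fraction of the remaining features per step, and the dictionary keys "metrics", "indices", and "selected" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def sketch_dynamic_rfe(X, y, elimination_rate=0.2, seed=0):
    """Hypothetical stand-in for an elimination loop; not the dRFEtools code."""
    indices = np.arange(X.shape[1])  # columns of X still in play
    history = {}
    while True:
        rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=seed)
        rf.fit(X[:, indices], y)

        # Drop a fraction of the remaining features, so steps shrink over time.
        n_drop = int(len(indices) * elimination_rate)
        n_keep = len(indices) - n_drop
        order = np.argsort(rf.feature_importances_)[::-1]  # most important first
        selected = np.sort(indices[order[:n_keep]])

        history[len(indices)] = {
            "n_features": len(indices),
            "metrics": {"accuracy": rf.oob_score_},
            "indices": indices.copy(),  # features used at this step
            "selected": selected,       # features surviving this step
        }
        if n_drop == 0:  # nothing left to eliminate at this rate
            break
        indices = selected
    # Mirror the described return shape: full history plus the first step.
    return history, history[X.shape[1]]


X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
history, first_step = sketch_dynamic_rfe(X, y)
```

Keying the history by feature count makes it straightforward to line up a metric against the number of features when plotting or smoothing the elimination curve.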

Ranking and scoring utilities

  • features_rank_fnc – Ranks features during elimination and optionally persists the ranking table for each fold.
  • Developmental-set metrics (dev_score_*) live under dRFEtools.scoring.dev.
  • Random-forest OOB metrics (oob_score_*) live under dRFEtools.scoring.random_forest.

LOWESS helpers

  • extract_max_lowess – Identifies the optimal feature count from the LOWESS-smoothed elimination curve.
  • extract_peripheral_lowess – Detects the inflection point associated with peripheral features.
  • optimize_lowess_plot – Visualizes the LOWESS curve with annotations about the selected feature counts.
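The idea of reading an optimal feature count off a smoothed curve can be sketched with the `lowess` function from statsmodels. This is not the implementation of extract_max_lowess: the synthetic curve, the smoothing on a log10 feature-count axis, and the `frac=0.3` setting are assumptions chosen for the example.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic elimination curve: the score peaks near 10**1.5 (about 32) features.
rng = np.random.default_rng(0)
log_n = np.linspace(0.3, 3.0, 30)  # log10 of the feature count
score = 0.9 - 0.2 * (log_n - 1.5) ** 2 + rng.normal(0.0, 0.01, size=log_n.size)

# lowess(endog, exog) returns an array of (x, fitted y) rows sorted by x.
smoothed = lowess(score, log_n, frac=0.3, return_sorted=True)

# Take the feature count at which the smoothed curve is highest.
best_row = smoothed[np.argmax(smoothed[:, 1])]
best_n_features = 10 ** best_row[0]
```

Smoothing before taking the maximum keeps a single noisy step from determining the selected feature count.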

Plotting functions

Plotting helpers are defined in dRFEtools.plotting and re-exported from the top-level package:

  • plot_metric – Render elimination trajectories for individual metrics.
  • plot_with_lowess_vline – Overlay LOWESS-derived selection cutoffs on the metric trajectory plot.

Utilities and CLI

  • normalize_rfe_result, get_feature_importances, and save_plot_variants are available under dRFEtools.utils and support the standardized dictionary-based API. ensure_path is provided to safely normalize user-supplied file paths.
  • The command-line interface in dRFEtools.cli wraps the same workflows for CSV inputs. Run python -m dRFEtools.cli --help to explore the available commands, including custom metric selection and developmental-split sizing.
