FeatRanker

Rank feature importance across multiple ML models using permutation importance.

FeatRanker trains a configurable set of scikit-learn, XGBoost, LightGBM, and CatBoost models on your data, computes permutation importance for every trained model, and returns per-model rankings plus an aggregated average ranking.

| Item | Value |
| --- | --- |
| Package name | featranker |
| Import module | featranker |
| CLI command | featranker |
| Model config | featranker/importance_config.yaml |
| Default prep file | featureCalc.py (project root) |


Installation

When working from a source checkout, install the dependencies:

pip install -r requirements.txt

Install the package in editable (development) mode:

pip install -e .

Or install from PyPI:

pip install featranker

Requirements

  • Python ≥ 3.10
  • numpy, scikit-learn, pyyaml, tqdm, xgboost, lightgbm, catboost

How It Works

  1. Load data — A user-defined prep class returns a feature dict with a "label" key.
  2. Initialize models — Model definitions are read from importance_config.yaml and instantiated for the requested task and group.
  3. Train models — Every initialized model is fitted on the feature matrix.
  4. Rank features — Permutation importance is computed per model, and an overall average ranking is produced.
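
To make step 4 concrete, here is a dependency-free sketch of permutation importance: shuffle one feature column at a time and measure how much the model's score drops. It illustrates the general technique only and is not FeatRanker's source; the names `permutation_importance_sketch`, `accuracy`, and `toy_predict` are hypothetical. scikit-learn ships its own implementation as `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn.

    predict: callable mapping a list of rows to a list of predictions.
    X: list of rows (each row a list of numbers); y: list of labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Copy X with only this column replaced by its shuffled values.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


# Toy stand-in for a trained model: it reads column 0 and ignores column 1.
def toy_predict(rows):
    return [int(row[0] > 0.5) for row in rows]


X = [[i % 2, (i * 7) % 5] for i in range(20)]
y = [row[0] for row in X]
scores = permutation_importance_sketch(toy_predict, X, y)
```

Because the toy model never looks at column 1, shuffling that column cannot change its predictions, so its importance is exactly zero, while column 0 receives a positive score.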

Data Preparation

Before running FeatRanker you need a prep class — a Python class with a _calc_features() method that returns your data as a dict.

Expected return format

{
    "feature_1": [v1, v2, v3, ...],
    "feature_2": [v1, v2, v3, ...],
    ...
    "label":     [y1, y2, y3, ...],
}
  • Every feature key maps to a list of numeric values.
  • All lists (including "label") must have the same length.
  • The "label" key is required.
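
The three rules above can be checked before handing the dict to FeatRanker. The helper below is a hypothetical sketch (`split_features` is not part of the package) that validates the dict and transposes the per-feature lists into one row per sample, which is the shape most estimators expect.

```python
def split_features(feature_dict):
    """Validate a prep-class dict and split it into names, rows, and labels."""
    if "label" not in feature_dict:
        raise KeyError("'label' key missing from feature dict")
    labels = list(feature_dict["label"])
    names = [key for key in feature_dict if key != "label"]
    for name in names:
        if len(feature_dict[name]) != len(labels):
            raise ValueError(
                f"Feature length mismatch: {name!r} has "
                f"{len(feature_dict[name])} values, 'label' has {len(labels)}"
            )
    # Transpose the column lists into one row per sample.
    rows = [list(values) for values in zip(*(feature_dict[n] for n in names))]
    return names, rows, labels


names, rows, labels = split_features(
    {"feature_1": [1.0, 2.0], "feature_2": [3.0, 4.0], "label": [0, 1]}
)
```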

Where to put it

Option A — Edit the default file (simplest)

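Define your class in featureCalc.py at the project root, which must also be the directory you run featranker from, because the default prep file is looked up in the current working directory. The default class name is prepFeature, but you can give the class any name and select it with --prep-class.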

Option B — Use a separate file (no reinstall needed)

Keep your prep logic in any Python file and point to it at runtime:

featranker --prep-file ./my_features.py --prep-class MyPrepClass --task clf
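
Loading a class from an arbitrary file at runtime is typically done with `importlib`. The sketch below shows one way this could work; it is an assumption about the mechanism, not FeatRanker's actual code, and `load_prep_class` is a hypothetical name.

```python
import importlib.util
from pathlib import Path


def load_prep_class(prep_file, prep_class):
    """Import the Python file at prep_file and return the class named prep_class."""
    path = Path(prep_file).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Prep file not found: {path}")
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Raises AttributeError if the class name is not defined in the file.
    return getattr(module, prep_class)
```

Because the file is imported by path each time, edits to it take effect on the next run without reinstalling anything.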

Example prep class

from sklearn.datasets import load_iris

class IrisPrep:
    def _calc_features(self):
        data = load_iris()
        features = {
            data.feature_names[i]: data.data[:, i].tolist()
            for i in range(data.data.shape[1])
        }
        features["label"] = data.target.tolist()
        return features
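
If your data lives in a CSV file rather than a scikit-learn dataset, a prep class could look like the sketch below. `CsvPrep`, the `data.csv` path, and the `target` column name are all hypothetical, and the constructor defaults assume that FeatRanker instantiates the prep class without arguments.

```python
import csv


class CsvPrep:
    """Hypothetical prep class that reads numeric columns from a CSV file."""

    def __init__(self, csv_path="data.csv", label_column="target"):
        # Defaults let the class be instantiated with no arguments.
        self.csv_path = csv_path
        self.label_column = label_column

    def _calc_features(self):
        with open(self.csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        if not rows:
            raise ValueError(f"No data rows in {self.csv_path}")
        features = {
            name: [float(row[name]) for row in rows]
            for name in rows[0]
            if name != self.label_column
        }
        # Labels are parsed as floats; cast to int here for classification tasks.
        features["label"] = [float(row[self.label_column]) for row in rows]
        return features
```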

Quick Start

  1. Implement _calc_features() in featureCalc.py (or your own file).
  2. Run the CLI:
# Classification with all model families, using the default prepFeature class
featranker --task clf --group all

# Regression with tree models only, custom prep file and class
featranker --task reg --group tree \
    --prep-file ./my_features.py --prep-class DiabetesPrep

# Save results to results.json (the .json extension is appended automatically)
featranker --task clf --group linear --output results

CLI Reference

featranker --task {clf,reg} [--group {linear,tree,all}]
           [--prep-file PATH] [--prep-class NAME]
           [--output PATH]
| Flag | Description | Default |
| --- | --- | --- |
| --task | clf (classification) or reg (regression) | required |
| --group | linear, tree, or all (both) | all |
| --prep-class | Name of the prep class to instantiate | prepFeature |
| --prep-file | Path to the Python file containing the prep class | featureCalc.py in the current working directory |
| --output | File path for JSON output (.json appended if missing) | print to stdout |
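
For readers who want to wrap or mirror the CLI, the flags and defaults listed above map onto a standard `argparse` parser roughly as follows. This is a sketch built from the documented options, not a copy of the package's parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser mirroring the documented featranker flags and defaults."""
    parser = argparse.ArgumentParser(prog="featranker")
    parser.add_argument("--task", choices=["clf", "reg"], required=True)
    parser.add_argument("--group", choices=["linear", "tree", "all"], default="all")
    parser.add_argument("--prep-file", default="featureCalc.py")
    parser.add_argument("--prep-class", default="prepFeature")
    # None stands for "print to stdout" in this sketch.
    parser.add_argument("--output", default=None)
    return parser


args = build_parser().parse_args(["--task", "reg", "--group", "tree"])
```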

Python API

Using FeatureRanker directly (default prep file)

When your default prepFeature class lives in featureCalc.py at the project root:

from featranker import FeatureRanker

ranker = FeatureRanker(task="clf", group="all")
results = ranker.rankFeatures()

Using build_ranker with a custom prep file

build_ranker is a convenience factory that returns a fully initialized FeatureRanker instance (features loaded, models trained, ready to rank):

from featranker import build_ranker

ranker = build_ranker(
    task="reg",
    group="tree",
    prep_file="./my_features.py",
    prep_class="DiabetesPrep",
)
results = ranker.rankFeatures()
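
When calling the Python API you may want to persist `results` the way the CLI's --output flag does. The helper below is a hypothetical sketch (`save_results` is not part of the package); it assumes that "appended if missing" means adding .json whenever the path does not already end with that extension.

```python
import json
from pathlib import Path


def save_results(results, output):
    """Write the results dict as JSON, appending .json if it is missing."""
    path = Path(output)
    if path.suffix != ".json":
        path = path.with_name(path.name + ".json")
    path.write_text(json.dumps(results, indent=2))
    return path
```

Calling `save_results(results, "results")` would write `results.json` in the current directory.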

Constructor parameters

| Parameter | Type | Description |
| --- | --- | --- |
| task | "clf" or "reg" | Classification or regression |
| group | "linear", "tree", or "all" | Which model family to use |
| prep_file | str or None | Path to prep file (defaults to featureCalc.py) |
| prep_class | str | Name of the prep class (defaults to "prepFeature") |

Model Configuration

Models are defined in featranker/importance_config.yaml, organized by task and group:

classification:
  linear:
    - name: logistic_regression
      import: sklearn.linear_model
      class: LogisticRegression
      params:
        max_iter: 2000
  tree:
    - name: random_forest
      import: sklearn.ensemble
      class: RandomForestClassifier
      params:
        random_state: 42

regression:
  linear:
    - ...
  tree:
    - ...

Each entry has four fields:

| Field | Description |
| --- | --- |
| name | Display name used in output |
| import | Python module to import (e.g., sklearn.ensemble) |
| class | Class name to instantiate from that module |
| params | Dict of keyword arguments passed to the constructor (optional) |

Edit this file to add, remove, or tune models. Changes take effect on the next run — no reinstall required.
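
The four fields are enough to build a model dynamically: import the module, look up the class, and call it with `params`. The sketch below shows that pattern on an already-parsed config dict. It is an illustration under assumptions, not the package's loader: `instantiate`, `build_models`, and `TASK_KEYS` are hypothetical names, and the demo entries use standard-library classes purely so the snippet runs without scikit-learn installed. A real entry would name an estimator such as RandomForestClassifier from sklearn.ensemble.

```python
import importlib

# Assumed mapping from the task argument to the top-level config keys.
TASK_KEYS = {"clf": "classification", "reg": "regression"}


def instantiate(entry):
    """Build one object from a config entry with import/class/params fields."""
    module = importlib.import_module(entry["import"])
    cls = getattr(module, entry["class"])
    # params is optional, so fall back to no keyword arguments.
    return entry["name"], cls(**(entry.get("params") or {}))


def build_models(config, task, group):
    """Return {name: instance}; group 'all' combines linear and tree."""
    groups = ["linear", "tree"] if group == "all" else [group]
    models = {}
    for group_name in groups:
        for entry in config.get(TASK_KEYS[task], {}).get(group_name, []):
            name, model = instantiate(entry)
            models[name] = model
    return models


# Stand-in entries using stdlib classes instead of real estimators.
demo_config = {
    "classification": {
        "linear": [
            {"name": "two_hours", "import": "datetime",
             "class": "timedelta", "params": {"hours": 2}},
        ],
        "tree": [
            {"name": "empty_counter", "import": "collections", "class": "Counter"},
        ],
    }
}
models = build_models(demo_config, task="clf", group="all")
```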


Available Models

Classification — Linear

| Name | Class |
| --- | --- |
| logistic_regression | LogisticRegression |
| logistic_regression_l1 | LogisticRegression (L1) |
| logistic_regression_l2 | LogisticRegression (L2) |
| logistic_regression_elasticnet | LogisticRegression (ElasticNet) |
| linear_svm | LinearSVC |
| sgd_classifier | SGDClassifier |
| ridge_classifier | RidgeClassifier |
| perceptron | Perceptron |
| passive_aggressive | PassiveAggressiveClassifier |
| lda | LinearDiscriminantAnalysis |
| qda | QuadraticDiscriminantAnalysis |
| naive_bayes_gaussian | GaussianNB |
| naive_bayes_bernoulli | BernoulliNB |
| naive_bayes_multinomial | MultinomialNB |
| pls_da | PLSRegression |

Classification — Tree

| Name | Class |
| --- | --- |
| decision_tree | DecisionTreeClassifier |
| random_forest | RandomForestClassifier |
| extra_trees | ExtraTreesClassifier |
| bagging_tree | BaggingClassifier |
| adaboost | AdaBoostClassifier |
| gradient_boosting | GradientBoostingClassifier |
| hist_gradient_boosting | HistGradientBoostingClassifier |
| xgboost | XGBClassifier |
| catboost | CatBoostClassifier |

Regression — Linear

| Name | Class |
| --- | --- |
| linear_regression | LinearRegression |
| ridge_regression | Ridge |
| lasso_regression | Lasso |
| elasticnet_regression | ElasticNet |
| elasticnet_cv_regression | ElasticNetCV |
| pls_regression | PLSRegression |
| huber_regression | HuberRegressor |
| ransac_regression | RANSACRegressor |
| kernel_ridge_regression | KernelRidge |
| svr_regression | SVR |

Regression — Tree

| Name | Class |
| --- | --- |
| decision_tree_regressor | DecisionTreeRegressor |
| random_forest_regressor | RandomForestRegressor |
| extra_trees_regressor | ExtraTreesRegressor |
| adaboost_regressor | AdaBoostRegressor |
| gradient_boosting_regressor | GradientBoostingRegressor |
| hist_gradient_boosting_regressor | HistGradientBoostingRegressor |
| xgboost_regressor | XGBRegressor |
| catboost_regressor | CatBoostRegressor |

Output Format

The result is a dict (or JSON object) keyed by model name, with an additional "average" entry that aggregates across all models. Each value is a list of single-entry dicts sorted by score in descending order. Scores are rounded to four decimal places.

{
  "logistic_regression": [
    {"feature_a": 0.1234},
    {"feature_b": 0.0567},
    {"feature_c": 0.0012}
  ],
  "random_forest": [
    {"feature_b": 0.0890},
    {"feature_a": 0.0745},
    {"feature_c": 0.0023}
  ],
  "average": [
    {"feature_a": 0.0990},
    {"feature_b": 0.0729},
    {"feature_c": 0.0018}
  ]
}
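
The sample above is consistent with the "average" entry being a plain mean of each feature's per-model scores. Under that assumption, the output structure can be reproduced from raw scores as sketched below; `format_ranking` and `add_average` are hypothetical names, not functions exported by the package.

```python
def format_ranking(scores, ndigits=4):
    """Turn {feature: score} into a descending list of single-entry dicts."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [{name: round(value, ndigits)} for name, value in ordered]


def add_average(per_model):
    """per_model maps model name -> {feature: raw score}; returns the output dict."""
    features = next(iter(per_model.values())).keys()
    average = {
        feature: sum(scores[feature] for scores in per_model.values()) / len(per_model)
        for feature in features
    }
    result = {name: format_ranking(scores) for name, scores in per_model.items()}
    result["average"] = format_ranking(average)
    return result


result = add_average({
    "model_1": {"feature_a": 0.4, "feature_b": 0.1},
    "model_2": {"feature_a": 0.2, "feature_b": 0.3},
})
```

To read a ranking back as an ordered list of feature names, take the single key of each entry: `[next(iter(entry)) for entry in result["average"]]`.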

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Prep file not found | FeatRanker can't locate featureCalc.py | Run the command from the directory that contains featureCalc.py, or pass an explicit path with --prep-file |
| AttributeError: … has no attribute 'X' | The prep class name doesn't match what's in the file | Check spelling of --prep-class against the class defined in your prep file |
| 'label' key missing | _calc_features() didn't include a "label" entry | Add features["label"] = ... to your return dict |
| Feature length mismatch | Feature lists have different lengths | Ensure every feature list and "label" have the same number of elements |
| Model training errors (printed, not fatal) | A model failed to converge or doesn't support the data | Check the printed warning; consider removing or tuning that model in importance_config.yaml |
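
The "printed, not fatal" behaviour in the last row usually comes from wrapping each fit in its own try/except so that one failing model does not abort the run. The sketch below shows that pattern; it is an assumption about how such a loop could look, and `fit_all`, `MeanModel`, and `BrokenModel` are hypothetical stand-ins rather than FeatRanker classes.

```python
def fit_all(models, X, y):
    """Fit each model; print and skip failures instead of aborting the run."""
    trained = {}
    for name, model in models.items():
        try:
            model.fit(X, y)
        except Exception as exc:
            print(f"[warning] {name} failed to train: {exc}")
            continue
        trained[name] = model
    return trained


class MeanModel:
    """Stand-in estimator that always trains successfully."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


class BrokenModel:
    """Stand-in estimator that rejects the data."""

    def fit(self, X, y):
        raise ValueError("negative values are not supported")


trained = fit_all(
    {"mean_model": MeanModel(), "broken_model": BrokenModel()},
    [[1.0], [2.0]],
    [0, 1],
)
```

Only the models that trained successfully remain in `trained`, so the ranking step can proceed with them.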
