
Benchmarking framework for Feature Selection algorithms 🚀


fseval


A Feature Selector and Feature Ranker benchmarking library. Integrates with Weights and Biases and scikit-learn, and uses Hydra as a config parser.

Install

pip install fseval

Usage

fseval is run via a CLI. As an example, this runs a very simple benchmark:

fseval +dataset=synclf_easy +estimator@ranker=chi2 +estimator@validator=decision_tree

This runs Chi2 feature ranking on the synclf_easy dataset and validates feature subsets using a Decision Tree. The results can be uploaded to a backend; we can use wandb for this.
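To make the rank-then-validate idea concrete, here is a minimal pure-Python sketch of chi-squared feature scoring followed by building nested feature subsets. It is not fseval's implementation: the function names `chi2_scores` and `ranked_subsets`, the toy data, and the choice of top-k subsets are all assumptions made for illustration.

```python
def chi2_scores(X, y):
    """Chi-squared statistic per feature, for non-negative features X and class labels y."""
    classes = sorted(set(y))
    n_samples = len(X)
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            class_prob = sum(1 for label in y if label == c) / n_samples
            # Observed: how much of feature j falls inside class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Expected: the share class c would get if feature and class were independent.
            expected = class_prob * feature_total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def ranked_subsets(scores):
    """Return the top-1, top-2, ... feature index subsets, best feature first."""
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is constant.
X = [[5, 1], [4, 1], [0, 1], [1, 1]]
y = [1, 1, 0, 0]
scores = chi2_scores(X, y)
subsets = ranked_subsets(scores)
print(scores, subsets)
```

A validator would then be trained and scored once per subset, which is what makes the ranking's quality measurable.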

Weights and Biases integration

Integration with wandb is built-in. Create an account and log in to the CLI with wandb login. Then, we can upload results like so:

fseval [...] callbacks="[wandb]" +callbacks.wandb.project=fseval-readme

Replace [...] with your dataset, ranker and validator config. This runs an experiment and uploads the results to wandb. We can then explore the results on the online dashboard.

To see all the configurable options, run:

fseval --help

Running bootstraps

Bootstraps can be run to approximate the stability of an algorithm. Bootstrapping works by creating multiple variations of the dataset and running the algorithm on each of them. A simple way to create such variations is to resample the dataset with replacement.
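Resampling with replacement can be sketched in a few lines of standard-library Python. This only illustrates the general technique; the helper name `bootstrap_resamples` and the seed handling are hypothetical and say nothing about how fseval draws its samples internally.

```python
import random


def bootstrap_resamples(n_samples, n_bootstraps, seed=0):
    """Return n_bootstraps lists of row indices, each drawn with replacement."""
    rng = random.Random(seed)
    return [
        # Every resample has the same size as the original dataset, so some
        # rows appear more than once and others not at all.
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_bootstraps)
    ]


resamples = bootstrap_resamples(n_samples=10, n_bootstraps=8)
for indices in resamples:
    print(sorted(indices))
```

Running a ranker on each resample and comparing the resulting rankings gives an impression of how sensitive the algorithm is to changes in the data.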

In fseval, bootstrapping can be configured like so:

fseval [...] resample=bootstrap n_bootstraps=8

This runs the entire experiment 8 times, each time on a resampled dataset. Ideally, when multiple processors are used, the number of bootstraps is divisible by the number of CPUs. For example:

fseval [...] resample=bootstrap n_bootstraps=8 n_jobs=4

would keep all 4 CPUs busy, with each one processing 2 bootstraps.
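The divisibility advice comes down to simple arithmetic, sketched below. The sketch assumes every bootstrap takes about the same time and that each worker handles one bootstrap at a time; `bootstrap_rounds` is a hypothetical helper, not part of fseval.

```python
import math


def bootstrap_rounds(n_bootstraps, n_jobs):
    """Return (rounds needed, worker slots left idle in the last round)."""
    rounds = math.ceil(n_bootstraps / n_jobs)
    idle_slots = rounds * n_jobs - n_bootstraps
    return rounds, idle_slots


print(bootstrap_rounds(8, 4))  # 2 rounds, no idle slots
print(bootstrap_rounds(8, 3))  # 3 rounds, 1 idle slot in the last round
```

With 8 bootstraps on 3 workers, the last round leaves one worker without work, which is the inefficiency a divisible setting avoids.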

When using bootstraps, all results in the dashboard will be aggregated over all bootstraps. ✨

Multiprocessing

The experiment can run in parallel. The list of bootstraps is distributed over the CPUs. To use all available processors:

fseval [...] n_jobs=-1

Alternatively, set n_jobs to the specific number of processors to use, e.g. n_jobs=4 on a quad-core machine.
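As a rough illustration of what these settings mean, the sketch below resolves n_jobs=-1 to the number of available processors and spreads bootstrap indices over the workers. Both helpers are hypothetical, and the round-robin assignment is an assumption chosen for simplicity; the actual scheduling inside fseval may well be dynamic.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Translate n_jobs=-1 into 'all available processors'."""
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if n_jobs == -1:
        return available
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


def assign_round_robin(n_bootstraps, n_workers):
    """Assign bootstrap indices to workers in turn (illustrative only)."""
    assignment = [[] for _ in range(n_workers)]
    for bootstrap in range(n_bootstraps):
        assignment[bootstrap % n_workers].append(bootstrap)
    return assignment


workers = resolve_n_jobs(-1, cpu_count=4)  # pretend the machine has 4 cores
print(assign_round_robin(8, workers))
```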

Configuring a Feature Ranker

Setting hyper-parameters of an estimator is easy. For example:

fseval [...] +validator.classifier.estimator.criterion=entropy

This changes the criterion of the validator's Decision Tree to entropy.
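Conceptually, a dotted override such as the one above addresses a single leaf in a nested configuration. The sketch below mimics that idea with plain dictionaries. It is not Hydra's implementation and ignores Hydra's distinction between adding (`+`) and overriding keys; the nested layout and the starting value `gini` (scikit-learn's default criterion for decision tree classifiers) are assumptions inferred from the override path.

```python
def apply_override(config, override):
    """Set a nested config value from a 'a.b.c=value' style override string."""
    dotted_key, value = override.lstrip("+").split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the hierarchy, creating intermediate levels when missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical config layout, mirroring the dotted path of the override.
config = {"validator": {"classifier": {"estimator": {"criterion": "gini"}}}}
apply_override(config, "+validator.classifier.estimator.criterion=entropy")
print(config)
```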

Built-in Feature Rankers

A collection of feature rankers is built-in. Some can be used without further configuration; others need their dependencies installed first. List of rankers:

| Ranker | Dependency | Command line argument |
| --- | --- | --- |
| ANOVA F-Value | <no dep> | +estimator@ranker=anova_f_value |
| Boruta | pip install Boruta | +estimator@ranker=boruta |
| Chi2 | <no dep> | +estimator@ranker=chi2 |
| Decision Tree | <no dep> | +estimator@ranker=decision_tree |
| FeatBoost | pip install git+https://github.com/amjams/FeatBoost.git | +estimator@ranker=featboost |
| MultiSURF | pip install skrebate | +estimator@ranker=multisurf |
| Mutual Info | <no dep> | +estimator@ranker=mutual_info |
| ReliefF | pip install skrebate | +estimator@ranker=relieff |
| Stability Selection | pip install git+https://github.com/dunnkers/stability-selection.git matplotlib (ℹ️) | +estimator@ranker=stability_selection |
| TabNet | pip install pytorch-tabnet | +estimator@ranker=tabnet |
| XGBoost | pip install xgboost | +estimator@ranker=xgb |
| Infinite Selection | pip install git+https://github.com/dunnkers/infinite-selection.git (ℹ️) | +estimator@ranker=infinite_selection |

ℹ️ This library was customized to make it compatible with the fseval pipeline.

To install all dependencies at once, download the fseval requirements.txt file and run pip install -r requirements.txt.
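To compare several built-in rankers on the same dataset, the command lines can be generated rather than typed by hand. The sketch below only assembles strings from the arguments documented above and does not run anything; `benchmark_command` and the chosen defaults are hypothetical conveniences, not part of fseval.

```python
import shlex

# Rankers listed above as needing no extra dependency.
NO_DEPENDENCY_RANKERS = ["anova_f_value", "chi2", "decision_tree", "mutual_info"]


def benchmark_command(ranker, dataset="synclf_easy", validator="decision_tree"):
    """Build the fseval command line for one ranker."""
    return shlex.join([
        "fseval",
        f"+dataset={dataset}",
        f"+estimator@ranker={ranker}",
        f"+estimator@validator={validator}",
    ])


commands = [benchmark_command(ranker) for ranker in NO_DEPENDENCY_RANKERS]
for command in commands:
    print(command)
```

Each printed line can be pasted into a shell or passed to a job scheduler.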

About

Built by Jeroen Overschie as part of a Master's thesis (Data Science and Computational Complexity track at the University of Groningen).
