Benchmarking framework for Feature Selection algorithms 🚀
fseval
A Feature Selector and Feature Ranker benchmarking library. Neatly integrates with Weights and Biases and scikit-learn. Uses Hydra for configuration.
Install
pip install fseval
Usage
fseval is run via a CLI. As an example, this runs a very simple benchmark:
fseval +dataset=synclf_easy +estimator@ranker=chi2 +estimator@validator=decision_tree
This runs Chi2 feature ranking on the synclf_easy dataset and validates feature subsets using a Decision Tree. The results can be uploaded to a backend; we can use wandb for this.
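Conceptually, this benchmark ranks features with Chi2 and then evaluates subsets of top-ranked features with a Decision Tree. The standalone scikit-learn sketch below illustrates the idea only; it uses a generated dataset in place of fseval's built-in synclf_easy and is not fseval's internal code:

```python
# Simplified sketch of what the benchmark above does conceptually:
# rank features with Chi2, then validate top-k feature subsets with a Decision Tree.
# Not fseval's internal code; a generated dataset stands in for `synclf_easy`.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X = X - X.min()  # chi2 requires non-negative feature values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rank features by their chi2 statistic (higher = more relevant).
scores, _ = chi2(X_train, y_train)
ranking = np.argsort(scores)[::-1]

# Validate feature subsets of increasing size with a Decision Tree.
for k in (1, 2, 5, 10):
    subset = ranking[:k]
    clf = DecisionTreeClassifier(random_state=0).fit(X_train[:, subset], y_train)
    print(f"top-{k} features: accuracy = {clf.score(X_test[:, subset], y_test):.3f}")
```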
Weights and Biases integration
Integration with wandb is built-in. Create an account and log in to the CLI with `wandb login`. Then, we can upload results like so:
fseval [...] callbacks="[wandb]" +callbacks.wandb.project=fseval-readme
Replace `[...]` with your dataset, ranker and validator config. This runs an experiment and uploads the results to wandb.
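For reference, uploading a metric to a wandb project from Python looks roughly like the sketch below. This only illustrates the wandb API; it is not fseval's callback code, and the metric name is hypothetical.

```python
# Minimal wandb logging sketch (illustrative only; the metric name is hypothetical,
# and this is not fseval's internal callback code).
import wandb

run = wandb.init(project="fseval-readme")
run.log({"validation/accuracy": 0.93})
run.finish()
```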
We can now explore the results on the online dashboard. ✨
To see all the configurable options, run:
fseval --help
Running bootstraps
Bootstraps can be run to approximate the stability of an algorithm. Bootstrapping works by creating multiple dataset permutations and running the algorithm on each of them. A simple way to create dataset permutations is to resample with replacement.
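As an illustration, resampling with replacement can be done with scikit-learn's `resample` utility (a toy sketch, independent of fseval):

```python
# Toy sketch of creating bootstrap permutations by resampling with replacement
# (independent of fseval, shown only to illustrate the idea).
import numpy as np
from sklearn.utils import resample

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

n_bootstraps = 8
for seed in range(n_bootstraps):
    X_boot, y_boot = resample(X, y, replace=True, random_state=seed)
    print(f"bootstrap {seed}: samples {y_boot.tolist()}")
```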
In fseval, bootstrapping can be configured like so:
fseval [...] resample=bootstrap n_bootstraps=8
This runs the entire experiment 8 times, each time on a resampled dataset. Ideally, when multiple processors are used, the number of bootstraps is set to a multiple of the number of CPUs. For example:
fseval [...] resample=bootstrap n_bootstraps=8 n_jobs=4
distributes the 8 bootstraps over 4 CPUs, utilizing all of them efficiently.
When using bootstraps, all results in the dashboard will be aggregated over all bootstraps. ✨
Multiprocessing
The experiment can run in parallel. The list of bootstraps is distributed over the CPUs. To use all available processors:
fseval [...] n_jobs=-1
Alternatively, set `n_jobs` to the specific number of processors to use, e.g. `n_jobs=4` if you have a quad-core.
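Conceptually, distributing bootstraps over worker processes resembles joblib's `Parallel`. The sketch below illustrates the idea only and is not necessarily how fseval implements its multiprocessing:

```python
# Conceptual sketch of distributing bootstraps over CPUs with joblib
# (illustrates the idea; not necessarily how fseval implements it).
from joblib import Parallel, delayed

def run_bootstrap(seed: int) -> dict:
    # Placeholder for resampling the dataset and running the ranker/validator.
    return {"bootstrap": seed}

n_bootstraps, n_jobs = 8, 4
results = Parallel(n_jobs=n_jobs)(delayed(run_bootstrap)(seed) for seed in range(n_bootstraps))
print(results)
```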
Configuring a Feature Ranker
Setting hyper-parameters of an estimator is easy. For example:
fseval [...] +validator.classifier.estimator.criterion=entropy
This changes the Decision Tree criterion to entropy.
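In plain scikit-learn terms, this override amounts to setting the hyper-parameter directly on the underlying estimator (a simplified sketch):

```python
# Simplified sketch: the override above corresponds to setting the hyper-parameter
# directly on the underlying scikit-learn estimator.
from sklearn.tree import DecisionTreeClassifier

validator = DecisionTreeClassifier(criterion="entropy")
print(validator.get_params()["criterion"])  # "entropy"
```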
Built-in Feature Rankers
A collection of feature rankers is already built in and can be used without further configuration. Others require their dependencies to be installed first. List of rankers:
| Ranker | Dependency | Command line argument |
|---|---|---|
| ANOVA F-Value | (none) | `+estimator@ranker=anova_f_value` |
| Boruta | `pip install Boruta` | `+estimator@ranker=boruta` |
| Chi2 | (none) | `+estimator@ranker=chi2` |
| Decision Tree | (none) | `+estimator@ranker=decision_tree` |
| FeatBoost | `pip install git+https://github.com/amjams/FeatBoost.git` | `+estimator@ranker=featboost` |
| MultiSURF | `pip install skrebate` | `+estimator@ranker=multisurf` |
| Mutual Info | (none) | `+estimator@ranker=mutual_info` |
| ReliefF | `pip install skrebate` | `+estimator@ranker=relieff` |
| Stability Selection | `pip install git+https://github.com/dunnkers/stability-selection.git matplotlib` (ℹ️) | `+estimator@ranker=stability_selection` |
| TabNet | `pip install pytorch-tabnet` | `+estimator@ranker=tabnet` |
| XGBoost | `pip install xgboost` | `+estimator@ranker=xgb` |
| Infinite Selection | `pip install git+https://github.com/dunnkers/infinite-selection.git` (ℹ️) | `+estimator@ranker=infinite_selection` |
ℹ️ This library was customized to make it compatible with the fseval pipeline.
If you would like to simply install all dependencies, download the fseval requirements.txt file and run `pip install -r requirements.txt`.
About
Built by Jeroen Overschie as part of a Master's thesis (Data Science and Computational Complexity track at the University of Groningen).