Evaluation and Benchmark Tool for Feature Selection
FSEval – Feature Selection Evaluation Suite
FSEval is a lightweight, modular Python library designed to benchmark feature selection and feature ranking methods across multiple datasets using both supervised and unsupervised downstream evaluation protocols.
It helps researchers and practitioners answer the question:
"Which feature selection method actually works best for my type of data and task?"
FSEval automates:
- Repeated training & evaluation at different feature subset sizes
- Stochastic method averaging
- Result persistence & incremental updates
- Support for both classification and clustering-based evaluation
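As a rough illustration of what "evaluation at different feature subset sizes" involves, the sketch below ranks features by score and keeps the top k for several ratios. The helper names `subset_sizes` and `top_k_features`, the ratio values, and the "higher score is better" convention are assumptions made for this example; they are not FSEval's API or its actual grid.

```python
def subset_sizes(n_features, ratios):
    """Convert ratios (a hypothetical grid) into feature counts, at least 1 each."""
    return [max(1, int(round(n_features * r))) for r in ratios]


def top_k_features(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# One score per feature; assumed here: a higher score means a more useful feature.
scores = [0.2, 0.9, 0.1, 0.5, 0.7]

for k in subset_sizes(len(scores), [0.2, 0.6, 1.0]):
    selected = top_k_features(scores, k)
    # A downstream classifier or clustering model would be trained on
    # these columns only, once per subset size.
    print(k, selected)
```

Each subset would then be scored by the downstream model, so a method can be compared at small and large feature budgets rather than at a single cutoff.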
📦 Dependencies and Requirements
FSEval requires:
- python>=3.8
- numpy
- pandas
- scikit-learn
- scipy
- clustpy (only needed for unsupervised_clustering_accuracy)
💡 Installation
You can download the source code and import fseval directly, or install it with pip:
pip install sdufseval
🚀 Quick Example
from sdufseval import FSEVAL
import numpy as np

if __name__ == "__main__":
    # The 23 benchmark datasets
    DATASETS_TO_RUN = [
        'ALLAML', 'CLL_SUB_111', 'COIL20', 'Carcinom', 'GLIOMA', 'GLI_85',
        'Isolet', 'ORL', 'Prostate_GE', 'SMK_CAN_187', 'TOX_171', 'Yale',
        'arcene', 'colon', 'gisette', 'leukemia', 'lung', 'lung_discrete',
        'madelon', 'orlraws10P', 'pixraw10P', 'warpAR10P', 'warpPIE10P'
    ]

    # Initialize FSEVAL
    evaluator = FSEVAL(output_dir="benchmark_results", avg_steps=10)

    # Configuration for methods
    methods_list = [
        {
            'name': 'Random',
            'stochastic': True,
            'func': evaluator.random_baseline
        },
        {
            'name': 'Variance_Baseline',
            'stochastic': False,
            'func': lambda X: np.var(X, axis=0)
        }
    ]

    # --- 1. Run Standard Benchmark ---
    # Evaluates methods on real-world datasets across different feature scales
    evaluator.run(DATASETS_TO_RUN, methods_list)

    # --- 2. Run Runtime Analysis ---
    # Performs scalability testing on synthetic data with a time cap.
    # vary_param='both' triggers both 'features' and 'instances' experiments.
    print("\n>>> Starting Scalability Analysis...")
    evaluator.timer(
        methods=methods_list,
        vary_param='both',
        time_limit=3600  # 1 hour limit
    )
Data Loading
load_dataset(dataset_name, data_dir="datasets") supports:
- Single .mat file with keys 'X' and 'Y'
- Two CSV files: {name}_X.csv and {name}_y.csv
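To check which of the two layouts is present before starting a benchmark, a small helper along these lines may be useful. It is only a sketch: `find_dataset_layout` is a hypothetical name, and the `.mat` file being called `{name}.mat` is an assumption, since only the CSV naming pattern is stated above.

```python
from pathlib import Path


def find_dataset_layout(dataset_name, data_dir="datasets"):
    """Report which on-disk layout exists for a dataset: "mat", "csv", or None."""
    root = Path(data_dir)

    # Assumption: the single .mat file is named after the dataset.
    if (root / f"{dataset_name}.mat").is_file():
        return "mat"

    # The CSV layout needs both the feature file and the label file.
    x_csv = root / f"{dataset_name}_X.csv"
    y_csv = root / f"{dataset_name}_y.csv"
    if x_csv.is_file() and y_csv.is_file():
        return "csv"

    return None


print(find_dataset_layout("colon"))
```

The helper only inspects file names. It does not open the files or verify that a `.mat` file actually contains the 'X' and 'Y' keys.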
📚 API Reference
🛠️ FSEval(output_dir="results", cv=5, avg_steps=10, eval_type="both", metrics=None, experiments=None)
Initializes the evaluation and benchmark object.
| Parameter | Default | Description |
|---|---|---|
| output_dir | results | Folder where CSV result files are saved. |
| cv | 5 | Cross-validation folds (supervised only). |
| avg_steps | 10 | Number of repetitions for stochastic methods. |
| supervised_iter | 5 | Number of classifier runs with different random seeds. |
| unsupervised_iter | 10 | Number of clustering runs with different random seeds. |
| eval_type | both | "supervised", "unsupervised", or "both". |
| metrics | ["CLSACC", "NMI", "ACC", "AUC"] | Evaluation metrics to calculate. |
| experiments | ["10Percent", "100Percent"] | Which feature ratio grids to evaluate. |
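The idea behind `avg_steps` can be sketched as follows: a stochastic method is run several times and its metric is averaged, so that one lucky or unlucky run does not decide the comparison. How FSEval seeds and aggregates its runs is not described here, so the seeding scheme and the names `average_stochastic_scores` and `noisy_metric` are assumptions for illustration.

```python
import random
from statistics import mean


def average_stochastic_scores(score_fn, n_repeats=10, base_seed=0):
    """Call score_fn(seed) n_repeats times with distinct seeds and average the results."""
    return mean(score_fn(base_seed + i) for i in range(n_repeats))


def noisy_metric(seed):
    """Hypothetical stand-in for 'run a stochastic selector, then evaluate it'."""
    rng = random.Random(seed)
    return 0.8 + rng.uniform(-0.05, 0.05)


print(round(average_stochastic_scores(noisy_metric, n_repeats=10), 3))
```

Deterministic methods (`'stochastic': False`) would not need this loop, since repeated runs return the same ranking.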
⚙️ run(datasets, methods, classifier=None)
Runs the standard benchmark, evaluating each method on each dataset across the configured feature subset sizes.
| Argument | Type | Description |
|---|---|---|
| datasets | List[str] | Dataset names loadable via load_dataset(). |
| methods | List[dict] | [{"name": str, "func": callable, "stochastic": bool}, ...] |
| classifier | sklearn classifier | Classifier for supervised eval (default: RandomForestClassifier). |
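Because each entry in `methods` is a plain dictionary, a typo in a key would otherwise only surface once the benchmark is under way. The sketch below checks the three documented keys up front. `validate_methods` and `column_variance` are hypothetical helpers written for this example, not part of FSEval, and the scoring function assumes, as in the Quick Example, that `func` receives the data matrix and returns one score per feature.

```python
def validate_methods(methods):
    """Check each entry has a str 'name', a callable 'func', and a bool 'stochastic'."""
    required = {"name", "func", "stochastic"}
    for i, entry in enumerate(methods):
        missing = required - set(entry)
        if missing:
            raise ValueError(f"methods[{i}] is missing keys: {sorted(missing)}")
        if not isinstance(entry["name"], str):
            raise TypeError(f"methods[{i}]['name'] must be a str")
        if not callable(entry["func"]):
            raise TypeError(f"methods[{i}]['func'] must be callable")
        if not isinstance(entry["stochastic"], bool):
            raise TypeError(f"methods[{i}]['stochastic'] must be a bool")
    return methods


def column_variance(X):
    """Population variance of each column of a row-major 2D sequence."""
    n_rows = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n_rows for col in columns]
    return [sum((v - mu) ** 2 for v in col) / n_rows
            for col, mu in zip(columns, means)]


methods = validate_methods([
    {"name": "Variance_Baseline", "stochastic": False, "func": column_variance},
])
print([m["name"] for m in methods])
```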
⚙️ timer(methods, vary_param='features', time_limit=3600)
Runs a runtime analysis on the methods.
| Argument | Type | Description |
|---|---|---|
| methods | List[dict] | [{"name": str, "func": callable, "stochastic": bool}, ...] |
| vary_param | str | "features", "instances", or "both" (default: "features"). |
| time_limit | int | Time limit in seconds (default: 3600). Testing of a method stops after recording the first run that exceeds this limit. |
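The time-cap behaviour described for `time_limit` can be pictured with the following sketch: problem sizes grow, each run is timed, and the loop stops after recording the first run that goes over the limit. This is an illustration under assumptions, not FSEval's implementation; `time_until_cap`, its `clock` parameter, and the example sizes are all hypothetical.

```python
import time


def time_until_cap(func, sizes, time_limit, clock=time.perf_counter):
    """Time func(size) for growing sizes; stop after the first run over time_limit."""
    timings = []
    for size in sizes:
        start = clock()
        func(size)
        elapsed = clock() - start
        timings.append((size, elapsed))  # the over-limit run is still recorded
        if elapsed > time_limit:
            break  # larger sizes are skipped for this method
    return timings


# Toy workload standing in for a feature selection method on synthetic data.
timings = time_until_cap(lambda n: sum(range(n)), [10**3, 10**4, 10**5], time_limit=3600)
print(len(timings))
```

Note that this sketch does not interrupt a call that is already running. It only decides whether to continue to the next size.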
Dashboard
There is a Feature Selection Evaluation Dashboard based on the benchmarks provided by FSEVAL, available on:
The dashboard offers a collection of useful analytic tools to provide comprehensive and comparative insights into the performance of your feature selection method(s).
Citation
If you use FSEVAL in your research, please cite the original paper:
CITATION WILL BE PROVIDED UPON PUBLICATION.