
Evaluation and Benchmark Tool for Feature Selection


FSEval – Feature Selection Evaluation Suite

FSEval is a lightweight, modular Python library designed to benchmark feature selection and feature ranking methods across multiple datasets using both supervised and unsupervised downstream evaluation protocols.

It helps researchers and practitioners answer the question:

"Which feature selection method actually works best for my type of data and task?"

FSEval automates:

  • Repeated training & evaluation at different feature subset sizes
  • Stochastic method averaging
  • Result persistence & incremental updates
  • Support for both classification and clustering-based evaluation

📦 Dependencies and Requirements

FSEval requires:

  • python>=3.8
  • numpy
  • pandas
  • scikit-learn
  • scipy
  • clustpy (only needed for unsupervised_clustering_accuracy)
  • pcametric (only needed for AAD)

💡 Installation

You can download the source code and use it directly, or install it from PyPI with pip:

pip install sdufseval

🚀 Quick Example

from sdufseval import FSEVAL
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_consistency_k5(X_orig, X_sub, y):
    """
    Calculates the average proportion of shared nearest neighbors (k=5) 
    between the original space and the feature-selected subspace.
    """
    k = min(5, X_orig.shape[0] - 1)  # cap k at the number of available neighbors
    
    def get_nn_indices(data, n_neighbors):
        nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1, algorithm='auto').fit(data)
        _, indices = nbrs.kneighbors(data)
        return indices[:, 1:]

    nn_orig = get_nn_indices(X_orig, k)
    nn_sub = get_nn_indices(X_sub, k)
    
    intersections = [len(np.intersect1d(nn_orig[i], nn_sub[i])) for i in range(len(nn_orig))]
    return np.mean(intersections) / k

if __name__ == "__main__":

    DATASETS_TO_RUN = ['colon', 'leukemia', 'prostate_GE']

    evaluator = FSEVAL(
        output_dir="benchmark_results", 
        avg_steps=5,
        eval_type=["supervised", "unsupervised", "model_agnostic", "custom"],
        custom_metrics={"SNN_K5": snn_consistency_k5}
    )

    methods_list = [
        {
            'name': 'Random',
            'type': 'unsupervised', 
            'stochastic': True, 
            'func': evaluator.random_baseline
        },
        {
            'name': 'Variance_Baseline',
            'type': 'unsupervised', 
            'stochastic': False, 
            'func': lambda X: np.var(X, axis=0)
        }
    ]
    
    print(">>> Starting Integrated Evaluation (Global & Local metrics)...")
    evaluator.run(DATASETS_TO_RUN, methods_list)

    print("\n>>> Starting Scalability Analysis...")
    evaluator.timer(
        methods=methods_list, 
        vary_param='both', 
        time_limit=3600 
    )

Data Loading

load_dataset(dataset_name, data_dir="datasets") supports:

  • Single .mat file with keys 'X' and 'Y'
  • Two CSV files: {name}_X.csv and {name}_y.csv
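For the two-CSV option, a sketch of preparing a dataset in the expected layout (the dataset name "toy" and the temporary directory are made up for this example; `load_dataset` itself ships with the library):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Write a toy dataset in the two-CSV layout load_dataset() expects:
# {name}_X.csv holds the feature matrix, {name}_y.csv the labels.
data_dir = tempfile.mkdtemp()
name = "toy"

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=20), name="y")

X.to_csv(os.path.join(data_dir, f"{name}_X.csv"), index=False)
y.to_csv(os.path.join(data_dir, f"{name}_y.csv"), index=False)

# load_dataset("toy", data_dir=data_dir) should now find both files.
print(sorted(os.listdir(data_dir)))
```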

📚 API Reference

🛠️ FSEVAL(output_dir="results", cv=5, avg_steps=10, supervised_iter=5, unsupervised_iter=10, eval_type=["supervised", "unsupervised", "model_agnostic"], metrics=None, custom_metrics={}, experiments=None, save_all=False)

Initializes the evaluation and benchmark object.

  • output_dir (default: "results"): Folder where CSV result files are saved.
  • cv (default: 5): Number of cross-validation folds (supervised evaluation only).
  • avg_steps (default: 10): Number of repetitions for stochastic methods.
  • supervised_iter (default: 5): Number of classifier runs with different random seeds.
  • unsupervised_iter (default: 10): Number of clustering runs with different random seeds.
  • eval_type (default: ["supervised", "unsupervised", "model_agnostic"]): Any combination of "supervised", "unsupervised", "model_agnostic", and "custom"; include "custom" to enable user-defined metrics.
  • metrics (default: ["CLSACC", "NMI", "ACC", "AUC", "AAD"]): Evaluation metrics to calculate.
  • custom_metrics (default: {}): User-defined custom evaluation metrics, as a {name: callable} mapping.
  • experiments (default: ["10Percent", "100Percent"]): Which feature-ratio grids to evaluate.
  • save_all (default: False): Save the results of each run of a stochastic method separately.
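Custom metrics use the same call signature as snn_consistency_k5 in the quick example: (X_orig, X_sub, y) -> float. As a self-contained illustration (the metric itself is invented for this sketch, not part of the library), a variance-retention score:

```python
import numpy as np

def variance_retained(X_orig, X_sub, y=None):
    """Fraction of total per-feature variance kept by the selected subset.

    Illustrative only: any callable with this signature can be registered
    via custom_metrics={"VarKept": variance_retained}.
    """
    total = np.var(X_orig, axis=0).sum()
    kept = np.var(X_sub, axis=0).sum()
    return float(kept / total) if total > 0 else 0.0

X = np.array([[1.0, 0.0, 2.0],
              [3.0, 0.0, 4.0],
              [5.0, 0.0, 6.0]])
# Selecting columns 0 and 2 keeps all the variance (column 1 is constant).
print(variance_retained(X, X[:, [0, 2]]))  # → 1.0
```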

⚙️ run(datasets, methods, classifier=None)

Runs the benchmark, evaluating every method on every dataset.

  • datasets (List[str]): Dataset names loadable via load_dataset().
  • methods (List[dict]): [{"name": str, "func": callable, "stochastic": bool}, ...]
  • classifier (scikit-learn classifier): Classifier for supervised evaluation (default: RandomForestClassifier).
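Each method's func is expected to return one relevance score per feature, as the Variance_Baseline entry in the quick example does. A sketch of a supervised entry built on scikit-learn's f_classif (the 'type': 'supervised' key mirrors the unsupervised entries above; whether run() passes y to supervised funcs is an assumption here):

```python
import numpy as np
from sklearn.feature_selection import f_classif

def anova_scores(X, y):
    """Return one relevance score per feature (higher = more relevant)."""
    scores, _ = f_classif(X, y)
    return np.nan_to_num(scores)

anova_method = {
    'name': 'ANOVA_F',
    'type': 'supervised',
    'stochastic': False,
    'func': anova_scores,
}

# Sanity check on toy data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 2.0]])
y = np.array([0, 0, 1, 1])
s = anova_method['func'](X, y)
print(s.argmax())  # feature 0 should get the highest score
```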

⚙️ timer(methods, vary_param='features', time_limit=3600)

Runs a runtime (scalability) analysis on the methods.

  • methods (List[dict]): [{"name": str, "func": callable, "stochastic": bool}, ...]
  • vary_param (str, default: 'features'): Which dimension to scale: "features", "instances", or "both".
  • time_limit (int, default: 3600): Terminate a method after recording the first time it exceeds this limit, in seconds.

Dashboard

A Feature Selection Evaluation Dashboard, built on the benchmarks produced by FSEVAL, is available at:

https://fseval.imada.sdu.dk/

The dashboard offers a collection of analytic tools that provide comprehensive, comparative insights into the performance of your feature selection method(s). Its source code is included in the dashboard folder of the GitHub repository, so you can also download it and run it locally.

Citation

If you use FSEVAL in your research, please cite the original paper:

CITATION WILL BE PROVIDED UPON PUBLICATION.
