Ensemble feature selection with bootstrapping, heterogeneous selectors, and stability analysis.

These details have not been verified by PyPI

Project links

Project description

pyensemblefs: a multi-threading Python library for ensemble feature selection.

This repository hosts pyensemblefs, a Python library for ensemble feature selection. Supports heterogeneous ensembles, bootstrapped evaluation, and stability analysis across feature selectors.
It assists researchers in feature selection tasks without requiring significant programming effort.

Installation and setup

pip install pyensemblefs

To download the source code, you can clone it from the GitHub repository:

git clone git@github.com:cdchushig/pyensemblefs.git

Requirements: Python ≥ 3.9, scikit-learn ≥ 1.2

Library Highlights

pyensemblefs automatically extracts relevant features in datasets using bootstrapping and ensemble aggregation.

Intuitive, reproducible workflows: compatible with scikit-learn pipelines.
Comprehensive documentation: each feature selection and aggregation method is fully described.
Extensible architecture: easily add custom selection or aggregation strategies.

Main Features

Bootstrap-based ensemble selection: assess variability across resamples.
Heterogeneous ensembles: combine different feature selectors (e.g., ANOVA, MI, Chi²).
Unified aggregator API: aggregate results from scores, ranks, or binary supports.
Visualization tools: plot selection frequency, consensus ranks, and stability matrices.
Stability metrics: compute indices such as Kuncheva, Jaccard, or Spearman correlation.
Extensible design: register custom selectors and aggregators via a single factory call.

Get started

Example using a built-in dataset and a simple configuration:

import pyensemblefs
from pyensemblefs.datasets import load_pima_dataset

# Load dataset
df = load_pima_dataset()

# Retrieve a pre-defined configuration (e.g., Relief filter)
cfg = pyensemblefs.get_config('relief', n_bootrap=100, fnc_aggregation='voting')

# Compute feature scores
df_feature_scores = pyensemblefs.compute_scores(cfg, df)

# Extract the most relevant features
df_filtered = pyensemblefs.extract_features(n_max_features=10)

Heterogeneous ensemble feature selection

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif, chi2
from pyensemblefs.ensemble.metabootstrapper import MetaBootstrapper

X, y = load_breast_cancer(return_X_y=True)

fs_methods = [
    SelectKBest(score_func=f_classif, k=10),
    SelectKBest(score_func=mutual_info_classif, k=10),
    SelectKBest(score_func=chi2, k=10),
]

# Assign higher weight to ANOVA
weights = {"SelectKBest": 2.0}

boot = MetaBootstrapper(
    fs_methods,
    n_bootstraps=20,
    n_jobs=4,
    random_state=42,
    strategy="random",
    normalize_scores=True,
    method_weights=weights,
    verbose=True
)

boot.fit(X, y)

print("First bootstrap method:", boot.results_[0]["method"])
print("Normalized + weighted scores:", boot.results_[0]["scores"][:10])

Visualization of frequency, top-k, and stability

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from pyensemblefs.ensemble.bootstrapper import Bootstrapper
from pyensemblefs.aggregators.rank import MeanRankAggregator
from pyensemblefs.aggregators.score import MeanAggregator
from pyensemblefs.viz.visualizer import Visualizer

X, y = load_breast_cancer(return_X_y=True)
fs = SelectKBest(score_func=mutual_info_classif, k=10)
boot = Bootstrapper(fs, n_bootstraps=25, n_jobs=2)
boot.fit(X, y)

# Aggregators
rank_agg = MeanRankAggregator(top_k=10).fit(boot.results_)
mean_agg = MeanAggregator(top_k=10).fit(boot.results_)

# Visualizations
Visualizer.feature_frequency(boot.results_, n_features=X.shape[1], top_k=15)
Visualizer.consensus_ranking(rank_agg.final_ranking_, top_k=10)
Visualizer.stability_heatmap(boot.results_, n_features=X.shape[1])
Visualizer.compare_aggregators({
    "RankAggregator": rank_agg.final_ranking_,
    "MeanAggregator": mean_agg.final_ranking_,
}, top_k=10)

All figures are automatically saved under ./images/.

Stability Metrics

Stability analysis quantifies how consistent the selected features remain across bootstrap samples.

from pyensemblefs.stats.stability import StabilityEvaluator

stab = StabilityEvaluator(metric="kuncheva")
stability_score = stab.compute(boot.results_binary_)
print("Stability (Kuncheva):", stability_score)

Available metrics include Jaccard,Dice, Ochiai, Hamming, Novovicova, Davis, Lustgartn, Phi, Kappa, Nogueira, Yu, and Zucknick. They can be directly compared between homogeneous and heterogeneous ensembles.

Usage examples (scikit-learn compatible)

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from src.ensemble.bootstrapper import Bootstrapper
from src.aggregators.score import MeanAggregator
from src.aggregators.rank import MeanRankAggregator

X, y = load_breast_cancer(return_X_y=True)

fs = SelectKBest(score_func=f_classif, k=10)
boot = Bootstrapper(fs, n_bootstraps=30, n_jobs=4, random_state=42)
boot.fit(X, y)

# Aggregate scores and ranks
mean_agg = MeanAggregator().fit(boot.results_)
rank_agg = MeanRankAggregator().fit(boot.results_)

print("Consensus scores:", mean_agg.scores_[:10])
print("Consensus ranks:", rank_agg.rank_[:10])

from sklearn.feature_selection import SelectKBest, chi2
from src.ensemble.bootstrapper import Bootstrapper
from src.aggregators.score import MeanAggregator

fs = SelectKBest(score_func=chi2, k=5)
boot = Bootstrapper(fs, n_bootstraps=15, n_jobs=2)
boot.fit(X, y)

mean_agg = MeanAggregator().fit(boot.results_)
print("Consensus Scores (Chi2):", mean_agg.scores_)

How It Fits Together

Data → Bootstrapper / MetaBootstrapper
↓
Aggregators (Score / Rank / Subset)
↓
Visualizer / StabilityEvaluator → Reports

Citation

If you use pyensemblefs in academic work, please cite:

@software{pyensemblefs2025,
author = {Chushig-Muzo, C.D. and collaborators},
title = {pyensemblefs: Ensemble Feature Selection Library},
year = {2025},
url = {https://github.com/cdchushig/pyensemblefs} }

License

This project is licensed under the MIT License – see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.3.13

Nov 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyensemblefs-0.3.13.tar.gz (63.0 kB view details)

Uploaded Nov 18, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pyensemblefs-0.3.13-py3-none-any.whl (80.6 kB view details)

Uploaded Nov 18, 2025 Python 3

File details

Details for the file pyensemblefs-0.3.13.tar.gz.

File metadata

Download URL: pyensemblefs-0.3.13.tar.gz
Upload date: Nov 18, 2025
Size: 63.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pyensemblefs-0.3.13.tar.gz
Algorithm	Hash digest
SHA256	`c2658209714596013c13824d3592b9715e7c38a70d338d855323e8e380faaade`
MD5	`ba6e59732b7a831a905502df2b4b3e78`
BLAKE2b-256	`c8ddc1c5e281736248c64c6ccf81ef1826fdc61a76f223a1666896b094bd689b`

See more details on using hashes here.

File details

Details for the file pyensemblefs-0.3.13-py3-none-any.whl.

File metadata

Download URL: pyensemblefs-0.3.13-py3-none-any.whl
Upload date: Nov 18, 2025
Size: 80.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pyensemblefs-0.3.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`02d5232edf841061841365a84942b9078df1d1940466e69674759ea70dc9c6c7`
MD5	`e8d12cce8ae1dc76a9985adb209fae89`
BLAKE2b-256	`da7bd2db9f739eb1d32dc5e1a884f9cb05b25e180cb70255ed77be3ed97a8ea8`

See more details on using hashes here.

pyensemblefs 0.3.13

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

pyensemblefs: a multi-threading Python library for ensemble feature selection.

Installation and setup

Library Highlights

Main Features

Get started

Heterogeneous ensemble feature selection

Visualization of frequency, top-k, and stability

Stability Metrics

Usage examples (scikit-learn compatible)

How It Fits Together

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes