

pyrobustfs

A robust feature selection library for Python, leveraging ensemble Minimum Redundancy Maximum Relevance (mRMR) with an optional refinement step. Designed for seamless integration into scikit-learn pipelines.

Features

  • Ensemble mRMR: Improves robustness and stability of feature selection by running mRMR on bootstrapped/subsampled data and aggregating results.
  • Scikit-learn Compatibility: Implements BaseEstimator and TransformerMixin for easy integration into scikit-learn pipelines, GridSearchCV, and RandomizedSearchCV.
  • Flexible Refinement: Allows for an optional second stage of model-specific feature selection using any scikit-learn compatible estimator (e.g., RFE, SelectFromModel).
  • Classification and Regression Support: Handles both task types, computing relevance with mutual information for classification targets or mutual information for regression targets, as appropriate.
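The ensemble mechanism can be sketched with plain NumPy and scikit-learn. Note this is a conceptual illustration under simplified assumptions (absolute Pearson correlation as the redundancy term, simple vote counting as the aggregation), not pyrobustfs's actual implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

def mrmr_rank(X, y, k, random_state=0):
    """One greedy mRMR pass: repeatedly pick the feature maximizing
    relevance (MI with y) minus redundancy (mean |corr| with picks so far)."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

def ensemble_mrmr(X, y, k=5, n_ensembles=10, random_state=42):
    """Run mRMR on bootstrap resamples and keep the k most-voted features."""
    rng = np.random.default_rng(random_state)
    votes = np.zeros(X.shape[1])
    for i in range(n_ensembles):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample
        for j in mrmr_rank(X[idx], y[idx], k, random_state=random_state + i):
            votes[j] += 1
    return sorted(np.argsort(votes)[::-1][:k].tolist())

X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           random_state=42)
stable = ensemble_mrmr(X, y)
print(f"Most stable features: {stable}")
```

Because each bootstrap run sees slightly different data, features that survive across runs are less sensitive to sampling noise than a single mRMR pass.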

Installation

You can install pyrobustfs directly from the source code:

  1. Clone the repository:

    git clone https://github.com/yourusername/pyrobustfs.git
    cd pyrobustfs
    
  2. Install in editable mode (for development) or standard mode:

    # For development (changes to code are immediately reflected)
    pip install -e .
    
    # For standard installation
    # pip install .
    

Usage

Basic Feature Selection

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from pyrobustfs.selectors import RobustMRMRSelector

# Generate synthetic classification data
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, n_redundant=5, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and fit the selector
# Select 5 features using 10 ensemble runs for a classification task
selector = RobustMRMRSelector(n_features_to_select=5, n_ensembles=10, classification=True, random_state=42)
selector.fit(X_train, y_train)

# Get the names of the selected features
selected_features = selector.get_feature_names_out()
print(f"Selected features: {selected_features}")

# Transform the data to keep only the selected features
X_train_selected = selector.transform(X_train)
X_test_selected = selector.transform(X_test)

print(f"Original X_train shape: {X_train.shape}")
print(f"Transformed X_train shape: {X_train_selected.shape}")

Using with a Refiner Estimator

You can provide an optional refiner_estimator for a second stage of feature selection. This is useful for model-specific refinement.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE # Recursive Feature Elimination
from pyrobustfs.selectors import RobustMRMRSelector

# Generate synthetic classification data
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, n_redundant=5, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define a refiner estimator (e.g., RFE with Logistic Regression)
# The refiner will operate on the features pre-selected by the ensemble mRMR.
refiner = RFE(estimator=LogisticRegression(solver='liblinear', random_state=42), n_features_to_select=3)

# Initialize RobustMRMRSelector with the refiner
selector_with_refiner = RobustMRMRSelector(
    n_features_to_select=5, # Pre-selection target for ensemble mRMR; the refiner narrows this further (to 3 here)
    n_ensembles=10,
    refiner_estimator=refiner,
    classification=True,
    random_state=42
)

selector_with_refiner.fit(X_train, y_train)
selected_features_refiner = selector_with_refiner.get_feature_names_out()
print(f"Selected features (with refiner): {selected_features_refiner}")
print(f"Number of features selected by refiner: {len(selected_features_refiner)}")

# Transform data
X_train_refined = selector_with_refiner.transform(X_train)
print(f"Transformed X_train shape (with refiner): {X_train_refined.shape}")
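The filter-then-refine design behind the refiner can also be expressed with plain scikit-learn pieces. The sketch below uses SelectKBest as a stand-in for the ensemble mRMR pre-selection so it runs without pyrobustfs installed; the point is that the wrapper method (RFE) only ever sees the pre-filtered features, which keeps the expensive model-based stage cheap:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                           random_state=0)

two_stage = Pipeline([
    # Stage 1: fast filter pre-selection (stand-in for ensemble mRMR).
    ('prefilter', SelectKBest(score_func=mutual_info_classif, k=10)),
    # Stage 2: model-specific refinement on the 10 surviving features only.
    ('refine', RFE(LogisticRegression(solver='liblinear', random_state=0),
                   n_features_to_select=3)),
])

X_refined = two_stage.fit_transform(X, y)
print(X_refined.shape)  # (300, 3)
```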

Integrating into a Scikit-learn Pipeline

RobustMRMRSelector can be seamlessly integrated into a scikit-learn Pipeline.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from pyrobustfs.selectors import RobustMRMRSelector

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, n_redundant=5, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a pipeline with feature selection and a classifier
pipeline = Pipeline([
    ('feature_selection', RobustMRMRSelector(n_features_to_select=5, n_ensembles=10, classification=True, random_state=42)),
    ('classifier', LogisticRegression(solver='liblinear', random_state=42))
])

# Fit the pipeline
pipeline.fit(X_train, y_train)

# Evaluate the pipeline
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline accuracy: {accuracy:.4f}")

# Access selected features from the pipeline step
selected_features_pipeline = pipeline.named_steps['feature_selection'].get_feature_names_out()
print(f"Selected features from pipeline: {selected_features_pipeline}")
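Because the selector exposes its constructor arguments in scikit-learn style, they can be tuned with GridSearchCV using the usual step__parameter naming (e.g. feature_selection__n_features_to_select). The sketch below demonstrates that pattern with scikit-learn's SelectKBest standing in for RobustMRMRSelector, so the snippet runs without pyrobustfs installed:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=42)

pipe = Pipeline([
    # Swap in RobustMRMRSelector(...) here; the tuning pattern is identical.
    ('feature_selection', SelectKBest(score_func=mutual_info_classif)),
    ('classifier', LogisticRegression(solver='liblinear', random_state=42)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {
    'feature_selection__k': [5, 10, 15],
    'classifier__C': [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Cross-validating the number of selected features this way avoids committing to an arbitrary cutoff up front.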

Development

To contribute or run tests, clone the repository and install in editable mode:

git clone https://github.com/yourusername/pyrobustfs.git
cd pyrobustfs
pip install -e .
pip install pytest
pytest

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

pyrobustfs-0.1.0.tar.gz (11.9 kB)

Built Distribution

pyrobustfs-0.1.0-py3-none-any.whl (9.3 kB)

File details

Details for the file pyrobustfs-0.1.0.tar.gz.

File metadata

  • Download URL: pyrobustfs-0.1.0.tar.gz
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

  • SHA256: 218a862cf0d9476c4a024b16f28a951c10c5d8617c2bad49b96236d812407fc5
  • MD5: 73dc6b5c627e317a37d9665c2079b02c
  • BLAKE2b-256: 18e754fb261e7f294cb798a5ed60e851d034830b5d54754d371c8ae2efbc4ef1

File details

Details for the file pyrobustfs-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyrobustfs-0.1.0-py3-none-any.whl
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

  • SHA256: 3f9e5f6b56c6146557b154dc442ebe1303f14579ef473c3720a8c5857cb51872
  • MD5: 6d6fc6a3a93712d3e29765c29e4a8e07
  • BLAKE2b-256: ed0a4d9fab688ae81b105c2f5301064667bfb4adbe59dc27efe7912d5552b38d
