A Sci-Kit Learn compatible Numba and CUDA-accelerated implementation of various feature selection algorithms.

These details have not been verified by PyPI

Project links

Project description

Fast-Select: Accelerated Feature Selection for Modern Datasets

A high-performance Python library powered by Numba and CUDA, offering accelerated algorithms for feature selection. Initially built to optimize the complete Relief family of algorithms, fast-select aims to expand and accelerate a wide range of feature selection methods to empower machine learning on large-scale datasets.

Key Features

Blazing Fast Performance: Leverages Numba for JIT compilation, Joblib for multi-core parallelism, and Numba CUDA for GPU acceleration, providing unmatched performance while scaling with modern hardware.
ML Pipeline Integration: Fully compatible with Scikit-Learn, making it easy to fit into any machine learning pipeline with a familiar .fit(), .transform(), .fit_transform() interface.
Flexible Backends: Offers dual execution modes for both CPU (Joblib) and GPU (CUDA). Automatically detects hardware with an easy-to-use backend parameter.
Feature-Rich Implementation: Provides lightning-fast implementations of ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, and TuRF—with plans to support additional feature selection algorithms in future releases.
Lightweight & Simple: Avoids heavy dependencies like TensorFlow or PyTorch while delivering state-of-the-art acceleration for feature selection workflows.

Installation
Quickstart
Backend Selection
Benchmarking Highlights
Algorithm Implementations
Future Directions
Contributing
License
How to Cite
Acknowledgments

Installation

Install fast-select directly from PyPI:

pip install fast-select

For development versions (with testing and documentation dependencies):

git clone https://github.com/GavinLynch04/FastSelect.git
cd fast-select
pip install -e .[dev]

Quickstart

Using fast-select is simple and seamless for anyone familiar with Scikit-Learn.

from fast_select.estimators import MultiSURF
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression  # Example classifier

# 1. Generate a synthetic dataset
X, y = make_classification(
    n_samples=500, 
    n_features=1000, 
    n_informative=20, 
    n_redundant=100, 
    random_state=42
)

# 2. Use the MultiSURF estimator to select the top 15 features
selector = MultiSURF(n_features_to_select=15)
X_selected = selector.fit_transform(X, y)
print(f"Original feature count: {X.shape[1]}")
print(f"Selected feature count: {X_selected.shape[1]}")
print(f"Top 15 feature indices: {selector.top_features_}")

# 3. Integrate into a Scikit-Learn Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selector', MultiSURF(n_features_to_select=10, backend='cpu')),
    ('classifier', LogisticRegression())
])

# Fit the pipeline (now featuring fast feature selection!)
# pipeline.fit(X, y)

Backend Selection (CPU vs. GPU)

You can control the computational backend with the backend parameter during initialization:

backend='auto': Automatically detects if an NVIDIA GPU is available. Falls back to CPU if a GPU is not available.
backend='gpu': Explicitly runs on GPU. Will raise a RuntimeError if no compatible GPU is found.
backend='cpu': Forces CPU computations, even if a GPU is available.

Example usage:

# Force CPU usage
cpu_selector = MultiSURF(n_features_to_select=10, backend='cpu')

# Force GPU usage
gpu_selector = MultiSURF(n_features_to_select=10, backend='gpu')

Benchmarking Highlights

Fast-Select delivers groundbreaking improvements in runtime and memory efficiency. Benchmarks show 50-100x speed-ups compared to scikit-rebate and R's CORElearn library, particularly on large datasets exceeding 10,000 samples and features. Benchmarking scripts are available in the repository for further testing.

Runtime vs. Number of Samples (n >> p)

Runtime Benchmark N-Dominant

Runtime vs. Number of Features (p >> n)

Memory Benchmark P-Dominant

Algorithm Implementations

Currently supported:

Relief-Family Algorithms:
- ReliefF
- SURF
- SURF*
- MultiSURF
- MultiSURF*
- TuRF

Future plans include additional feature selection algorithms, such as wrappers, embedded methods, and more filter-based approaches.

Contributing

Contributions are highly encouraged. Whether you're fixing bugs, improving performance, or proposing new algorithms, your work is invaluable. Please ensure your submissions include relevant test cases and documentation updates.

License

This project is licensed under the MIT License. See the LICENSE file for full details.

How to Cite

If you use fast-select in your research, please cite the software:

Citing the Software (Specific Version):

Use the version-specific DOI provided via Zenodo on the GitHub Release Page.

Acknowledgments

This library builds on the exceptional work of the following:

The Numba team for enabling JIT compilation and GPU acceleration.
The scikit-rebate authors for their inspiring Relief-based library.
The original researchers behind the Relief algorithms for their foundational contributions to feature selection.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.2.0

Aug 1, 2025

0.1.5

Jul 21, 2025

0.1.4

Jul 21, 2025

This version

0.1.3

Jul 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fast_select-0.1.3.tar.gz (22.6 kB view details)

Uploaded Jul 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fast_select-0.1.3-py3-none-any.whl (20.5 kB view details)

Uploaded Jul 21, 2025 Python 3

File details

Details for the file fast_select-0.1.3.tar.gz.

File metadata

Download URL: fast_select-0.1.3.tar.gz
Upload date: Jul 21, 2025
Size: 22.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for fast_select-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`d5851209a76fd7336b3fa54661b8083c655e5a0b3b7f2fa563afdd4f3616f132`
MD5	`fed91b51c460a499b509f2d3d26820d7`
BLAKE2b-256	`b0b619a596a0df203a8e6fa8295c0c8b747662370b88d51eb49535d3e946b7fa`

See more details on using hashes here.

File details

Details for the file fast_select-0.1.3-py3-none-any.whl.

File metadata

Download URL: fast_select-0.1.3-py3-none-any.whl
Upload date: Jul 21, 2025
Size: 20.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for fast_select-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`40166475005dd1d37fe983a8c250aa5f13949c9e318b407a0fea80bb6ff95022`
MD5	`c5d4540a292f1419b286a4438b4e55ee`
BLAKE2b-256	`b9c30c7c8a47e9851e1bde4d38224086e38fe24cec5452355b2331b4cf16fd6d`

See more details on using hashes here.

fast-select 0.1.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Fast-Select: Accelerated Feature Selection for Modern Datasets

Key Features

Table of Contents

Installation

Quickstart

Backend Selection (CPU vs. GPU)

Benchmarking Highlights

Runtime vs. Number of Samples (n >> p)

Runtime vs. Number of Features (p >> n)

Algorithm Implementations

Contributing

License

How to Cite

** Citing the Software (Specific Version):**

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Citing the Software (Specific Version):