Meta-estimators that combine feature selectors with classifiers and regressors
sklearn-selector-pipeline
A scikit-learn compatible package that provides meta-estimators for seamlessly combining feature selectors with classifiers and regressors into single pipeline components.
Features
- 🔧 Seamless Integration: Works with any sklearn-compatible feature selector and classifier/regressor
- 🚀 Full sklearn API: Supports fit, predict, predict_proba, decision_function, score, and transform
- 📊 Incremental Learning: Supports partial_fit for online learning scenarios
- 🎯 Parameter Forwarding: Forward fit parameters to selector and classifier/regressor using prefixes
- 🔄 Pipeline Compatible: Can be used inside sklearn pipelines
- 🧪 Extensively Tested: Comprehensive test suite ensuring reliability
- 📈 Dual Support: Separate classes for classification and regression tasks
Installation
pip install sklearn-selector-pipeline
For development installation:
pip install "sklearn-selector-pipeline[dev]"
Quick Start
Classification Example
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn_selector_pipeline import FeatureSelectorClassifier
# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create the meta-estimator
selector = SelectKBest(score_func=f_classif, k=10)
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
meta_clf = FeatureSelectorClassifier(feature_selector=selector, classifier=classifier)
# Fit and predict
meta_clf.fit(X_train, y_train)
predictions = meta_clf.predict(X_test)
probabilities = meta_clf.predict_proba(X_test)
accuracy = meta_clf.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")
print(f"Selected features shape: {meta_clf.transform(X_test).shape}")
Regression Example
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn_selector_pipeline import FeatureSelectorRegressor
# Generate sample data
X, y = make_regression(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create the meta-estimator
selector = SelectKBest(score_func=f_regression, k=10)
regressor = RandomForestRegressor(n_estimators=100, random_state=42)
meta_reg = FeatureSelectorRegressor(feature_selector=selector, regressor=regressor)
# Fit and predict
meta_reg.fit(X_train, y_train)
predictions = meta_reg.predict(X_test)
r2_score = meta_reg.score(X_test, y_test)
print(f"R² Score: {r2_score:.3f}")
print(f"Selected features shape: {meta_reg.transform(X_test).shape}")
Advanced Usage
Parameter Forwarding
Use prefixes to pass parameters specifically to the selector or classifier/regressor:
# sample_weights: array-like with one weight per row of X_train
# Classification
meta_clf.fit(X_train, y_train,
             selector__k=15,  # parameter for SelectKBest
             classifier__sample_weight=sample_weights)  # parameter for classifier
# Regression
meta_reg.fit(X_train, y_train,
             selector__k=8,  # parameter for SelectKBest
             regressor__sample_weight=sample_weights)  # parameter for regressor
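To make the prefix convention concrete, here is a minimal sketch of how `prefix__name` keyword arguments could be grouped by target. It is not the package's actual code: the helper name `split_fit_params` is hypothetical, and raising an error on unprefixed or unknown keys is an assumption, since the README does not say how such keys are handled.

```python
def split_fit_params(fit_params, prefixes=("selector", "classifier")):
    """Group ``prefix__name=value`` keyword arguments by prefix.

    Hypothetical helper for illustration; the real package may route
    parameters differently.
    """
    routed = {prefix: {} for prefix in prefixes}
    for key, value in fit_params.items():
        prefix, sep, name = key.partition("__")
        # Assumption: keys without a known prefix are rejected.
        if not sep or not name or prefix not in routed:
            raise ValueError(f"Unrecognised fit parameter: {key!r}")
        routed[prefix][name] = value
    return routed


routed = split_fit_params(
    {"selector__k": 15, "classifier__sample_weight": [1.0, 2.0, 0.5]}
)
print(routed["selector"])    # {'k': 15}
print(routed["classifier"])  # {'sample_weight': [1.0, 2.0, 0.5]}
```

For the regression class, the same idea would apply with `prefixes=("selector", "regressor")`.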
Partial Fit for Online Learning
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor
# data_batches: any iterable of (X_batch, y_batch) pairs
# Classification with online learning
selector = SelectKBest(k=10)
online_clf = SGDClassifier()
meta_clf = FeatureSelectorClassifier(selector, online_clf)
for X_batch, y_batch in data_batches:
    # classes must list every label that can appear in any batch
    meta_clf.partial_fit(X_batch, y_batch, classes=np.unique(y))
# Regression with online learning
# SelectKBest defaults to f_classif, so pass a regression score function
selector = SelectKBest(score_func=f_regression, k=10)
online_reg = SGDRegressor()
meta_reg = FeatureSelectorRegressor(selector, online_reg)
for X_batch, y_batch in data_batches:
    meta_reg.partial_fit(X_batch, y_batch)
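The loops above assume a `data_batches` iterable that the README does not define. One possible way to build it from in-memory arrays is sketched below; the helper name `iter_batches` and the toy data are illustrative assumptions, not part of the package.

```python
import numpy as np


def iter_batches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices of at most batch_size rows."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        yield X[start:stop], y[start:stop]


# Toy data: 10 samples, 3 features, binary labels.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.array([0, 1] * 5)

data_batches = list(iter_batches(X, y, batch_size=4))
classes = np.unique(y)  # compute once, reuse for every partial_fit call

print([len(X_batch) for X_batch, _ in data_batches])  # [4, 4, 2]
print(classes)  # [0 1]
```

Computing `classes` once from the full label set, rather than per batch, keeps the label list identical across calls even when a batch happens to miss a class.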
Usage in Pipelines
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Classification pipeline
clf_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_clf', FeatureSelectorClassifier(selector, classifier))
])
# Regression pipeline
reg_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_reg', FeatureSelectorRegressor(selector, regressor))
])
API Reference
FeatureSelectorClassifier
Parameters:
- feature_selector: Any sklearn-compatible feature selector
- classifier: Any sklearn-compatible classifier
Methods:
- fit(X, y, **fit_params): Fit the selector, then the classifier
- predict(X): Make predictions
- predict_proba(X): Predict class probabilities (if supported)
- decision_function(X): Get decision function values (if supported)
- transform(X): Transform features using the fitted selector
- score(X, y): Return accuracy score
- partial_fit(X, y, classes=None, **fit_params): Incremental fit
FeatureSelectorRegressor
Parameters:
- feature_selector: Any sklearn-compatible feature selector
- regressor: Any sklearn-compatible regressor
Methods:
- fit(X, y, **fit_params): Fit the selector, then the regressor
- predict(X): Make predictions
- transform(X): Transform features using the fitted selector
- score(X, y): Return R² score
- partial_fit(X, y, **fit_params): Incremental fit
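The fit order described for both classes (selector first, then the estimator on the reduced features) can be illustrated with a stripped-down stand-in built only on scikit-learn. The class `SketchSelectorClassifier` below is hypothetical and is not the package's implementation; in particular, cloning the wrapped estimators inside `fit` is an assumption borrowed from common scikit-learn practice.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression


class SketchSelectorClassifier(ClassifierMixin, BaseEstimator):
    """Illustrative only: fit a selector, then a classifier on its output."""

    def __init__(self, feature_selector, classifier):
        self.feature_selector = feature_selector
        self.classifier = classifier

    def fit(self, X, y):
        # Assumption: clone so the objects passed to __init__ stay unfitted.
        self.selector_ = clone(self.feature_selector).fit(X, y)
        X_selected = self.selector_.transform(X)
        self.classifier_ = clone(self.classifier).fit(X_selected, y)
        self.classes_ = self.classifier_.classes_
        return self

    def transform(self, X):
        # Keep only the columns chosen by the fitted selector.
        return self.selector_.transform(X)

    def predict(self, X):
        return self.classifier_.predict(self.transform(X))


X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
sketch = SketchSelectorClassifier(
    SelectKBest(f_classif, k=5), LogisticRegression(max_iter=1000)
)
sketch.fit(X, y)
print(sketch.transform(X).shape)    # (200, 5)
print(sketch.predict(X[:3]).shape)  # (3,)
```

Because selection happens inside `fit`, the selector only ever sees the data passed to that call, which is what lets such a wrapper be cross-validated without leaking information from held-out folds.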
Examples
Check out the examples/ directory for comprehensive examples:
- Basic classification and regression usage
- Genetic Algorithm feature selector example
- Evaluation on real datasets with visualization
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License
Citation
@software{sklearn_selector_pipeline,
  author = {Debajyati},
  title = {sklearn-selector-pipeline: Meta-estimators for combining feature selectors with classifiers and regressors},
  url = {https://github.com/Debajyati/sklearn-selector-pipeline},
  version = {0.1.2},
  year = {2025}
}