Meta-estimators that combine feature selectors with classifiers and regressors

sklearn-selector-pipeline

A scikit-learn compatible package providing meta-estimators that combine a feature selector with a classifier or regressor in a single estimator.

Features

  • 🔧 Seamless Integration: Works with any sklearn-compatible feature selector and classifier/regressor
  • 🚀 Full sklearn API: Supports fit, predict, predict_proba, decision_function, score, and transform
  • 📊 Incremental Learning: Supports partial_fit for online learning scenarios
  • 🎯 Parameter Forwarding: Forward fit parameters to selector and classifier/regressor using prefixes
  • 🔄 Pipeline Compatible: Can be used inside sklearn pipelines
  • 🧪 Extensively Tested: Comprehensive test suite ensuring reliability
  • 📈 Dual Support: Separate classes for classification and regression tasks

Installation

pip install sklearn-selector-pipeline

For development installation:

# Quote the requirement so shells such as zsh do not expand the brackets
pip install "sklearn-selector-pipeline[dev]"

Quick Start

Classification Example

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn_selector_pipeline import FeatureSelectorClassifier

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the meta-estimator
selector = SelectKBest(score_func=f_classif, k=10)
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
meta_clf = FeatureSelectorClassifier(feature_selector=selector, classifier=classifier)

# Fit and predict
meta_clf.fit(X_train, y_train)
predictions = meta_clf.predict(X_test)
probabilities = meta_clf.predict_proba(X_test)
accuracy = meta_clf.score(X_test, y_test)

print(f"Accuracy: {accuracy:.3f}")
print(f"Selected features shape: {meta_clf.transform(X_test).shape}")

Regression Example

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn_selector_pipeline import FeatureSelectorRegressor

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the meta-estimator
selector = SelectKBest(score_func=f_regression, k=10)
regressor = RandomForestRegressor(n_estimators=100, random_state=42)
meta_reg = FeatureSelectorRegressor(feature_selector=selector, regressor=regressor)

# Fit and predict
meta_reg.fit(X_train, y_train)
predictions = meta_reg.predict(X_test)
r2 = meta_reg.score(X_test, y_test)  # named r2 to avoid shadowing sklearn.metrics.r2_score

print(f"R² Score: {r2:.3f}")
print(f"Selected features shape: {meta_reg.transform(X_test).shape}")

Advanced Usage

Parameter Forwarding

Use prefixes to pass parameters specifically to the selector or classifier/regressor:

import numpy as np

# Example per-sample weights (uniform here); substitute your own
sample_weights = np.ones(len(y_train))

# Classification
meta_clf.fit(X_train, y_train,
             selector__k=15,  # parameter for SelectKBest
             classifier__sample_weight=sample_weights)  # parameter for classifier

# Regression
meta_reg.fit(X_train, y_train,
             selector__k=8,  # parameter for SelectKBest
             regressor__sample_weight=sample_weights)  # parameter for regressor
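
How the package parses these prefixes internally is not documented here. As a rough illustration of the idea, the sketch below shows one way a `selector__` / `classifier__` convention could be routed. `split_fit_params` is a hypothetical helper written for this example and is not part of the package's API.

```python
def split_fit_params(fit_params, prefixes=("selector", "classifier")):
    """Group keyword arguments by their ``<prefix>__`` marker.

    Hypothetical helper for illustration only; not the package's code.
    """
    routed = {prefix: {} for prefix in prefixes}
    for key, value in fit_params.items():
        prefix, sep, name = key.partition("__")
        # Reject keys without a known prefix or without a parameter name.
        if not sep or prefix not in routed or not name:
            raise ValueError(f"Unrecognized fit parameter: {key!r}")
        routed[prefix][name] = value
    return routed


routed = split_fit_params(
    {"selector__k": 15, "classifier__sample_weight": [1.0, 2.0]}
)
print(routed["selector"])    # parameters meant for the selector
print(routed["classifier"])  # parameters meant for the classifier
```

Raising on unknown prefixes, rather than silently dropping them, makes a misspelled key such as `clasifier__sample_weight` fail loudly in this sketch.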

Partial Fit for Online Learning

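import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, f_regression
from sklearn.linear_model import SGDClassifier, SGDRegressor

# data_batches: any iterable of (X_batch, y_batch) pairs
# y: the full label array, used to list every class up front

# Classification with online learning
selector = SelectKBest(score_func=f_classif, k=10)
online_clf = SGDClassifier()
meta_clf = FeatureSelectorClassifier(selector, online_clf)

for X_batch, y_batch in data_batches:
    meta_clf.partial_fit(X_batch, y_batch, classes=np.unique(y))

# Regression with online learning
# (use f_regression: the SelectKBest default, f_classif, expects class labels)
selector = SelectKBest(score_func=f_regression, k=10)
online_reg = SGDRegressor()
meta_reg = FeatureSelectorRegressor(selector, online_reg)

for X_batch, y_batch in data_batches:
    meta_reg.partial_fit(X_batch, y_batch)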
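
The loops above assume a `data_batches` iterable that the README does not define. One simple way to build it from in-memory data is sketched below. `iter_batches` is a hypothetical helper for this example, and it relies only on `len()` and slicing, so it should work with lists as well as NumPy arrays.

```python
def iter_batches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices of at most batch_size rows.

    Hypothetical helper for illustration only; not part of the package.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        yield X[start:stop], y[start:stop]


# Toy data: five rows with two features each.
X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 0]]
y = [0, 1, 0, 1, 0]

data_batches = list(iter_batches(X, y, batch_size=2))
print(len(data_batches))  # 3 batches, with 2, 2 and 1 rows
```

Because a single batch may not contain every label, scikit-learn classifiers such as `SGDClassifier` need the full set of classes on the first `partial_fit` call, which is why the classification loop passes `classes=np.unique(y)` computed from all labels.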

Usage in Pipelines

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Classification pipeline
clf_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_clf', FeatureSelectorClassifier(selector, classifier))
])

# Regression pipeline
reg_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_reg', FeatureSelectorRegressor(selector, regressor))
])

API Reference

FeatureSelectorClassifier

Parameters:

  • feature_selector: Any sklearn-compatible feature selector
  • classifier: Any sklearn-compatible classifier

Methods:

  • fit(X, y, **fit_params): Fit the selector then the classifier
  • predict(X): Make predictions
  • predict_proba(X): Predict class probabilities (if supported)
  • decision_function(X): Get decision function values (if supported)
  • transform(X): Transform features using the fitted selector
  • score(X, y): Return accuracy score
  • partial_fit(X, y, classes=None, **fit_params): Incremental fit
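
To make the method list concrete, here is a minimal, dependency-free sketch of the call order it describes: fit the selector, fit the classifier on the selected columns, and apply the same selection before every prediction. `SketchSelectorClassifier`, `FirstColumnSelector`, and `ThresholdClassifier` are toy stand-ins invented for this example; the real `FeatureSelectorClassifier` may differ in its details.

```python
class SketchSelectorClassifier:
    """Illustrative wrapper; not the package's implementation."""

    def __init__(self, feature_selector, classifier):
        self.feature_selector = feature_selector
        self.classifier = classifier

    def fit(self, X, y):
        # Step 1: fit the selector. Step 2: fit the classifier on selected columns.
        self.feature_selector.fit(X, y)
        self.classifier.fit(self.feature_selector.transform(X), y)
        return self

    def transform(self, X):
        return self.feature_selector.transform(X)

    def predict(self, X):
        return self.classifier.predict(self.transform(X))

    def predict_proba(self, X):
        # "If supported": delegate only when the wrapped classifier has the method.
        if not hasattr(self.classifier, "predict_proba"):
            raise AttributeError("the wrapped classifier has no predict_proba")
        return self.classifier.predict_proba(self.transform(X))


class FirstColumnSelector:
    """Toy selector that keeps only the first column."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [row[:1] for row in X]


class ThresholdClassifier:
    """Toy classifier: predicts 1 when the feature exceeds the training mean."""

    def fit(self, X, y):
        self.threshold_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold_) for row in X]


model = SketchSelectorClassifier(FirstColumnSelector(), ThresholdClassifier())
model.fit([[0, 9], [1, 9], [4, 9], [5, 9]], [0, 0, 1, 1])
print(model.predict([[0.5, 0], [4.5, 0]]))  # [0, 1]
print(model.transform([[7, 8]]))            # [[7]]
```

In this sketch, calling `predict_proba` raises `AttributeError` because the toy classifier does not define it, which mirrors the "if supported" note in the method list.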

FeatureSelectorRegressor

Parameters:

  • feature_selector: Any sklearn-compatible feature selector
  • regressor: Any sklearn-compatible regressor

Methods:

  • fit(X, y, **fit_params): Fit the selector then the regressor
  • predict(X): Make predictions
  • transform(X): Transform features using the fitted selector
  • score(X, y): Return R² score
  • partial_fit(X, y, **fit_params): Incremental fit

Examples

Check out the examples/ directory for comprehensive examples:

  • Basic classification and regression usage
  • Genetic Algorithm feature selector example
  • Evaluation on real datasets with visualization

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Citation

@software{sklearn_selector_pipeline,
  author = {Debajyati},
  title = {sklearn-selector-pipeline: Meta-estimators for combining feature selectors with classifiers and regressors},
  url = {https://github.com/Debajyati/sklearn-selector-pipeline},
  version = {0.1.0},
  year = {2025}
}
