A collection of lego bricks for scikit-learn pipelines

scikit-lego

We love scikit-learn, but very often we find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a single package that is held to a consistent standard of code quality and testing. This project started as a collaboration between multiple companies in the Netherlands but has since received contributions from around the globe. It was initiated by Matthijs Brouns and Vincent D. Warmerdam as a tool to teach people how to contribute to open source.

Note that we're not formally affiliated with the scikit-learn project at all, but we aim to strictly adhere to their standards.

The same holds for LEGO. LEGO® is a trademark of the LEGO Group of companies, which does not sponsor, authorize or endorse this project.

Installation

Install scikit-lego via pip with

python -m pip install scikit-lego

Via conda with

conda install -c conda-forge scikit-lego

Alternatively, to edit and contribute you can fork/clone and run:

python -m pip install -e ".[dev]"
python setup.py develop

Documentation

The documentation can be found here.

Usage

We offer custom metrics, models and transformers. You can import them just like you would in scikit-learn.

# the scikit-learn stuff we love
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# the scikit-lego stuff we add
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

...

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])

...

Features

Here's a list of features that this library currently offers:

sklego.datasets.load_abalone loads in the abalone dataset
sklego.datasets.load_arrests loads in a dataset with fairness concerns
sklego.datasets.load_chicken loads in the joyful chickweight dataset
sklego.datasets.load_heroes loads a Heroes of the Storm dataset
sklego.datasets.load_hearts loads a dataset about hearts
sklego.datasets.load_penguins loads a lovely dataset about penguins
sklego.datasets.fetch_creditcard fetches a fraud dataset from openml
sklego.datasets.make_simpleseries makes a simulated timeseries
sklego.pandas_utils.add_lags adds lag values in a pandas dataframe
sklego.pandas_utils.log_step a useful decorator to log your pipeline steps
sklego.dummy.RandomRegressor dummy benchmark that predicts random values
sklego.linear_model.DeadZoneRegressor experimental feature that has a deadzone in the cost function
sklego.linear_model.DemographicParityClassifier logistic classifier constrained on demographic parity
sklego.linear_model.EqualOpportunityClassifier logistic classifier constrained on equal opportunity
sklego.linear_model.ProbWeightRegression linear model that treats coefficients as probabilistic weights
sklego.linear_model.LowessRegression locally weighted linear regression
sklego.linear_model.LADRegression least absolute deviation regression
sklego.linear_model.QuantileRegression linear quantile regression, generalizes LADRegression
sklego.linear_model.ImbalancedLinearRegression punish over/under-estimation of a model directly
sklego.naive_bayes.GaussianMixtureNB classifies by training a 1D GMM per column per class
sklego.naive_bayes.BayesianGaussianMixtureNB classifies by training a Bayesian 1D GMM per column per class
sklego.mixture.BayesianGMMClassifier classifies by training a Bayesian GMM per class
sklego.mixture.BayesianGMMOutlierDetector detects outliers based on a trained Bayesian GMM
sklego.mixture.GMMClassifier classifies by training a GMM per class
sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
sklego.meta.ConfusionBalancer experimental feature that allows you to balance the confusion matrix
sklego.meta.DecayEstimator adds decay to the sample_weight that the model accepts
sklego.meta.EstimatorTransformer adds a model output as a feature
sklego.meta.OutlierClassifier turns outlier models into classifiers for gridsearch
sklego.meta.GroupedPredictor can split the data into groups and run a model on each
sklego.meta.GroupedTransformer can split the data into groups and run a transformer on each
sklego.meta.SubjectiveClassifier experimental feature to add a prior to your classifier
sklego.meta.Thresholder meta model that allows you to gridsearch over the threshold
sklego.meta.RegressionOutlierDetector meta model that finds outliers by adding a threshold to regression
sklego.meta.ZeroInflatedRegressor predicts zero or applies a regression based on a classifier
sklego.preprocessing.ColumnCapper limits extreme values of the model features
sklego.preprocessing.ColumnDropper drops a column from pandas
sklego.preprocessing.ColumnSelector selects columns based on column name
sklego.preprocessing.InformationFilter transformer that can de-correlate features
sklego.preprocessing.IdentityTransformer returns the same data, allows for concatenating pipelines
sklego.preprocessing.OrthogonalTransformer makes all features linearly independent
sklego.preprocessing.TypeSelector selects columns based on type
sklego.preprocessing.RandomAdder adds randomness in training
sklego.preprocessing.RepeatingBasisFunction repeating feature engineering, useful for timeseries
sklego.preprocessing.DictMapper assigns numeric values to categorical columns
sklego.preprocessing.OutlierRemover experimental method to remove outliers during training
sklego.preprocessing.MonotonicSplineTransformer re-uses SplineTransformer in an attempt to make monotonic features
sklego.model_selection.GroupTimeSeriesSplit timeseries Kfold for groups with different numbers of observations per group
sklego.model_selection.KlusterFoldValidation experimental feature that does K folds based on clustering
sklego.model_selection.TimeGapSplit timeseries Kfold with a gap between train/test
sklego.pipeline.DebugPipeline adds debug information to make debugging easier
sklego.pipeline.make_debug_pipeline shorthand function to create a debuggable pipeline
sklego.metrics.correlation_score calculates correlation between model output and feature
sklego.metrics.equal_opportunity_score calculates equal opportunity metric
sklego.metrics.p_percent_score proxy for model fairness with regard to a sensitive attribute
sklego.metrics.subset_score calculates a score on a subset of your data (meant for fairness tracking)

New Features

We want to be rather open in what we accept, but we do require three things before a feature is added to the project:

any new feature contributes towards a demonstrable real-world use case
any new feature passes standard unit tests (we use the ones from scikit-learn)
the feature has been discussed in the issue list beforehand

We automate all of our testing and use pre-commit hooks to keep the code working.

Release history

Version	Release date	Notes
0.9.5	Apr 30, 2025	this version
0.9.4	Dec 17, 2024
0.9.3	Nov 13, 2024
0.9.2	Oct 31, 2024
0.9.1	Jul 10, 2024
0.9.0	May 25, 2024
0.8.2	Apr 16, 2024
0.8.1	Mar 19, 2024
0.8.0	Mar 19, 2024
0.7.4	Jan 29, 2024
0.7.3	Jan 29, 2024	yanked: typo caused wheel to be broken
0.7.2	Jan 29, 2024	yanked: meta submodule is missing
0.7.1	Jan 29, 2024	yanked: meta submodule is missing
0.7.0	Dec 12, 2023
0.6.16	Oct 17, 2023
0.6.15	Jul 18, 2023
0.6.14	Nov 2, 2022
0.6.13	Sep 10, 2022
0.6.12	Jun 5, 2022
0.6.11	Apr 20, 2022
0.6.10	Mar 12, 2022
0.6.9	Dec 9, 2021
0.6.8	Jul 3, 2021
0.6.7	May 7, 2021
0.6.6	Apr 2, 2021
0.6.5	Mar 16, 2021
0.6.4	Mar 1, 2021
0.6.3	Jan 4, 2021
0.6.2	Oct 25, 2020
0.6.1	Sep 22, 2020
0.6.0	Sep 7, 2020
0.5.2	Jul 31, 2020
0.5.1	Jul 8, 2020
0.5.0	May 31, 2020
0.4.4	May 26, 2020
0.4.3	May 13, 2020
0.4.2	May 3, 2020
0.4.1	Apr 5, 2020
0.4.0	Feb 12, 2020
0.3.4	Jan 17, 2020
0.3.3	Oct 23, 2019
0.3.2	Oct 18, 2019
0.3.1	Sep 22, 2019
0.3.0	Aug 23, 2019
0.2.1	Jul 25, 2019
0.2.0	Jul 13, 2019
0.1.8	Jun 19, 2019
0.1.7	May 26, 2019
0.1.6	May 5, 2019
0.1.5	Apr 24, 2019
0.1.4	Apr 7, 2019
0.1.3	Apr 4, 2019
0.1.2	Mar 31, 2019
0.1.0	Feb 27, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit-lego-0.9.5.tar.gz (191.5 kB)

Uploaded Apr 30, 2025 (Source)

Built Distribution

scikit_lego-0.9.5-py2.py3-none-any.whl (224.8 kB)

Uploaded Apr 30, 2025 (Python 2, Python 3)

File details

Details for the file scikit-lego-0.9.5.tar.gz.

File metadata

Download URL: scikit-lego-0.9.5.tar.gz
Upload date: Apr 30, 2025
Size: 191.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for scikit-lego-0.9.5.tar.gz
Algorithm	Hash digest
SHA256	`dc2d7032fc994fd58897f9445d560cd2142beb36c5c8f1f2cfb925753cf0bef8`
MD5	`df597b955914582483dd516dbf7e8c1e`
BLAKE2b-256	`ed22d750001857884da420f5df40b3a4def6233326a5a171555e86be9f615188`

See more details on using hashes here.

File details

Details for the file scikit_lego-0.9.5-py2.py3-none-any.whl.

File metadata

Download URL: scikit_lego-0.9.5-py2.py3-none-any.whl
Upload date: Apr 30, 2025
Size: 224.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for scikit_lego-0.9.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`c1bbee7efca847dd0bcff5cb0dd36f07ac4cd4dccf321a55e8633cd28a4e2795`
MD5	`ffda1426196f0f1ef604aa980eef82af`
BLAKE2b-256	`9dbdff9f48d909d3a8b5c32bd1b743a3d4ca96c26bc6239ba3d20952e67ebc49`

See more details on using hashes here.
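As a sketch of how the digests listed above can be checked locally, the snippet below computes the SHA256 of a downloaded file with Python's standard `hashlib` module and compares it with the published value. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the sketch assumes the source archive was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for scikit-lego-0.9.5.tar.gz
EXPECTED_SHA256 = "dc2d7032fc994fd58897f9445d560cd2142beb36c5c8f1f2cfb925753cf0bef8"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


# Assumed download location; the check is skipped if the file is absent.
archive = Path("scikit-lego-0.9.5.tar.gz")
if archive.exists():
    print("hash matches:", matches_published_hash(archive))
```

The same helper works for the wheel by passing its SHA256 digest as `expected`.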