A collection of lego bricks for scikit-learn pipelines
scikit-lego
We love scikit-learn, but we often find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a single package that holds them to a shared standard of code quality and testing. This project started as a collaboration between multiple companies in the Netherlands but has since received contributions from around the globe. It was initiated by Matthijs Brouns and Vincent D. Warmerdam as a tool to teach people how to contribute to open source.
Note that we're not formally affiliated with the scikit-learn project at all, but we aim to strictly adhere to their standards.
The same holds for LEGO. LEGO® is a trademark of the LEGO Group of companies, which does not sponsor, authorize or endorse this project.
Installation
Install scikit-lego via pip with
python -m pip install scikit-lego
Via conda with
conda install -c conda-forge scikit-lego
Alternatively, to edit and contribute you can fork/clone and run:
python -m pip install -e ".[dev]"
python setup.py develop
Documentation
The documentation can be found here.
Usage
We offer custom metrics, models and transformers. You can import them just like you would in scikit-learn.
# the scikit-learn stuff we love
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# the scikit-lego stuff we add
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier
...
mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])
...
Features
Here's a list of features that this library currently offers:
- sklego.datasets.load_abalone: loads in the abalone dataset
- sklego.datasets.load_arrests: loads in a dataset with fairness concerns
- sklego.datasets.load_chicken: loads in the joyful chickweight dataset
- sklego.datasets.load_heroes: loads a heroes of the storm dataset
- sklego.datasets.load_hearts: loads a dataset about hearts
- sklego.datasets.load_penguins: loads a lovely dataset about penguins
- sklego.datasets.fetch_creditcard: fetch a fraud dataset from openml
- sklego.datasets.make_simpleseries: make a simulated timeseries
- sklego.pandas_utils.add_lags: adds lag values in a pandas dataframe
- sklego.pandas_utils.log_step: a useful decorator to log your pipeline steps
- sklego.dummy.RandomRegressor: dummy benchmark that predicts random values
- sklego.linear_model.DeadZoneRegressor: experimental feature that has a deadzone in the cost function
- sklego.linear_model.DemographicParityClassifier: logistic classifier constrained on demographic parity
- sklego.linear_model.EqualOpportunityClassifier: logistic classifier constrained on equal opportunity
- sklego.linear_model.ProbWeightRegression: linear model that treats coefficients as probabilistic weights
- sklego.linear_model.LowessRegression: locally weighted linear regression
- sklego.linear_model.LADRegression: least absolute deviation regression
- sklego.linear_model.QuantileRegression: linear quantile regression, generalizes LADRegression
- sklego.linear_model.ImbalancedLinearRegression: punish over/under-estimation of a model directly
- sklego.naive_bayes.GaussianMixtureNB: classifies by training a 1D GMM per column per class
- sklego.naive_bayes.BayesianGaussianMixtureNB: classifies by training a bayesian 1D GMM per class
- sklego.mixture.BayesianGMMClassifier: classifies by training a bayesian GMM per class
- sklego.mixture.BayesianGMMOutlierDetector: detects outliers based on a trained bayesian GMM
- sklego.mixture.GMMClassifier: classifies by training a GMM per class
- sklego.mixture.GMMOutlierDetector: detects outliers based on a trained GMM
- sklego.meta.ConfusionBalancer: experimental feature that allows you to balance the confusion matrix
- sklego.meta.DecayEstimator: adds decay to the sample_weight that the model accepts
- sklego.meta.EstimatorTransformer: adds a model output as a feature
- sklego.meta.OutlierClassifier: turns outlier models into classifiers for gridsearch
- sklego.meta.GroupedPredictor: can split the data into runs and run a model on each
- sklego.meta.GroupedTransformer: can split the data into runs and run a transformer on each
- sklego.meta.SubjectiveClassifier: experimental feature to add a prior to your classifier
- sklego.meta.Thresholder: meta model that allows you to gridsearch over the threshold
- sklego.meta.RegressionOutlierDetector: meta model that finds outliers by adding a threshold to regression
- sklego.meta.ZeroInflatedRegressor: predicts zero or applies a regression based on a classifier
- sklego.preprocessing.ColumnCapper: limits extreme values of the model features
- sklego.preprocessing.ColumnDropper: drops a column from pandas
- sklego.preprocessing.ColumnSelector: selects columns based on column name
- sklego.preprocessing.InformationFilter: transformer that can de-correlate features
- sklego.preprocessing.IdentityTransformer: returns the same data, allows for concatenating pipelines
- sklego.preprocessing.LinearEmbedder: reweight features using coefficients from a fitted linear model
- sklego.preprocessing.OrthogonalTransformer: makes all features linearly independent
- sklego.preprocessing.TypeSelector: selects columns based on type
- sklego.preprocessing.RandomAdder: adds randomness in training
- sklego.preprocessing.RepeatingBasisFunction: repeating feature engineering, useful for timeseries
- sklego.preprocessing.DictMapper: assign numeric values on categorical columns
- sklego.preprocessing.OutlierRemover: experimental method to remove outliers during training
- sklego.preprocessing.MonotonicSplineTransformer: re-uses SplineTransformer in an attempt to make monotonic features
- sklego.model_selection.GroupTimeSeriesSplit: timeseries Kfold for groups with different amount of observations per group
- sklego.model_selection.KlusterFoldValidation: experimental feature that does K folds based on clustering
- sklego.model_selection.TimeGapSplit: timeseries Kfold with a gap between train/test
- sklego.pipeline.DebugPipeline: adds debug information to make debugging easier
- sklego.pipeline.make_debug_pipeline: shorthand function to create a debugable pipeline
- sklego.metrics.correlation_score: calculates correlation between model output and feature
- sklego.metrics.equal_opportunity_score: calculates equal opportunity metric
- sklego.metrics.p_percent_score: proxy for model fairness with regards to sensitive attribute
- sklego.metrics.subset_score: calculate a score on a subset of your data (meant for fairness tracking)
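To illustrate the idea behind a step-logging decorator such as sklego.pandas_utils.log_step, here is a standard-library-only sketch. It is not the library's implementation: the name `log_step_sketch`, its `print_fn` parameter and the list-of-dicts data are all hypothetical, picked so the example runs without pandas.

```python
import functools
import time


def log_step_sketch(func=None, *, print_fn=print):
    """Hypothetical stand-in for a step-logging decorator (not sklego's API).

    Logs the step name, the number of rows returned and the elapsed time.
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = f(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print_fn(f"[{f.__name__}] n_rows={len(result)} time={elapsed:.4f}s")
            return result
        return wrapper

    # Support both @log_step_sketch and @log_step_sketch(print_fn=...).
    return decorator(func) if func is not None else decorator


messages = []  # collect log lines instead of printing them


@log_step_sketch(print_fn=messages.append)
def drop_missing(rows):
    # Keep only rows that have a weight.
    return [r for r in rows if r["weight"] is not None]


@log_step_sketch(print_fn=messages.append)
def add_weight_kg(rows):
    # Add a derived column without mutating the input rows.
    return [{**r, "weight_kg": r["weight"] / 1000} for r in rows]


rows = [{"weight": 1200}, {"weight": None}, {"weight": 800}]
clean = add_weight_kg(drop_missing(rows))
for line in messages:
    print(line)
```

Each decorated function stays a plain function, so steps can be chained in any order while the log shows how the row count changes after every step.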
New Features
We want to be rather open in what we accept, but we do require three things before a feature is added to the project:
- any new feature contributes towards a demonstrable real-world usecase
- any new feature passes standard unit tests (we use the ones from scikit-learn)
- the feature has been discussed in the issue list beforehand
We automate all of our testing and use pre-commit hooks to keep the code working.
Download files
Source Distribution
Built Distributions
File details
Details for the file scikit_lego-0.9.8.tar.gz.
File metadata
- Download URL: scikit_lego-0.9.8.tar.gz
- Upload date:
- Size: 193.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0b1bcc3a74c924cd83db3c39378a3fc18b171e3d4a6d753c824f1272a12940f |
| MD5 | 40a655cb49994e2d9f0123980975cd86 |
| BLAKE2b-256 | eb09e33289cc2ddb0e83d4453da453360970eb179cdea8e905377e1285ae9cdf |
File details
Details for the file scikit_lego-0.9.8-py3-none-any.whl.
File metadata
- Download URL: scikit_lego-0.9.8-py3-none-any.whl
- Upload date:
- Size: 227.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6af53354f52e7afb9d936d539fec6e26b29dd55f3305f2724bd94cad22e0628d |
| MD5 | b958587b3caf6a2fdc582ed2b64628aa |
| BLAKE2b-256 | 89df5027c5ec4fabf5a4f9056be4348345c7786f9dcbd372aa96ca0a35571c04 |
File details
Details for the file scikit_lego-0.9.8-py2.py3-none-any.whl.
File metadata
- Download URL: scikit_lego-0.9.8-py2.py3-none-any.whl
- Upload date:
- Size: 227.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86ad2450b771c5732654a5fec7e73a007f687d1ab8ac29f4689673e45877acad |
| MD5 | 03c94d85bd3c1bebc35256f932e255d3 |
| BLAKE2b-256 | bbda4715e16a7831cb05a4200f8c1539185f7802897127317620e30a660ddd2d |