

Project description


scikit-lego

We love scikit-learn, but we often find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a single package with proper code quality and test coverage. This project is a collaboration between multiple companies in the Netherlands. Note that we are not formally affiliated with the scikit-learn project.

Installation

Install scikit-lego via pip with

pip install scikit-lego

Alternatively, to edit and contribute, you can fork or clone the repository and run:

pip install -e ".[dev]"
python setup.py develop

Documentation

The documentation can be found here.

Usage

from sklego.transformers import RandomAdder
from sklego.mixture import GMMClassifier

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

...

mod = Pipeline([
    ("scale", StandardScaler()),      # standardise the features
    ("random_noise", RandomAdder()),  # add random noise during training
    ("model", GMMClassifier())        # fit one Gaussian mixture per class
])

...
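The following dependency-free sketch illustrates the idea behind `RandomAdder`, which adds randomness during training only. Several details are our own assumptions for illustration and are not sklego's implementation or API: the class name `TrainOnlyNoiseAdder`, the Gaussian noise scale, and the row-hashing trick used to recognise the training data.

```python
import random
from typing import List

Matrix = List[List[float]]


class TrainOnlyNoiseAdder:
    """Hypothetical sketch: adds Gaussian noise only to the data it was fitted on."""

    def __init__(self, noise: float = 1.0, random_state: int = 0) -> None:
        self.noise = noise
        self.random_state = random_state

    @staticmethod
    def _fingerprint(X: Matrix) -> int:
        # Assumption: hash the rows so the training data can be recognised later.
        return hash(tuple(tuple(row) for row in X))

    def fit(self, X: Matrix) -> "TrainOnlyNoiseAdder":
        self.train_fingerprint_ = self._fingerprint(X)
        return self

    def transform(self, X: Matrix) -> Matrix:
        if self._fingerprint(X) != self.train_fingerprint_:
            # Unseen data (e.g. at prediction time) passes through unchanged.
            return [list(row) for row in X]
        # A fresh seeded generator per call makes the noise reproducible.
        rng = random.Random(self.random_state)
        return [[value + rng.gauss(0.0, self.noise) for value in row] for row in X]


X_train = [[1.0, 2.0], [3.0, 4.0]]
X_test = [[5.0, 6.0]]

adder = TrainOnlyNoiseAdder(noise=0.5).fit(X_train)
print(adder.transform(X_train))  # noisy copy of the training rows
print(adder.transform(X_test))   # [[5.0, 6.0]], untouched
```

Unseen data passes through untouched. The noise can therefore act as a regulariser while the model is fitted, without distorting later predictions.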

Features

Here's a list of features that this library currently offers:

  • sklego.transformers.PatsyTransformer applies a patsy formula to the input data
  • sklego.transformers.RandomAdder adds random noise during training
  • sklego.transformers.EstimatorTransformer adds the output of a model as a feature
  • sklego.dummy.RandomRegressor is a benchmark regressor that predicts random values
  • sklego.mixture.GMMClassifier classifies by fitting a Gaussian mixture model (GMM) per class
  • sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
  • sklego.pandas_utils.log_step is a simple logging decorator for pandas pipeline steps
  • sklego.pandas_utils.add_lags adds lagged values of selected columns in pandas
  • sklego.preprocessing.PandasTypeSelector selects columns based on their pandas dtype
  • sklego.preprocessing.ColumnSelector selects columns by name
  • sklego.pipeline.DebugPipeline adds debug information to make pipelines easier to debug
  • sklego.meta.GroupedEstimator splits the data into groups and fits a separate model on each
  • sklego.datasets.load_chicken loads the joyful chickweight dataset
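The per-class idea behind `sklego.mixture.GMMClassifier` can be sketched without any dependencies: fit one density per class, then predict the class with the highest likelihood. This sketch makes two simplifying assumptions. Each class gets a single diagonal Gaussian (a one-component mixture) rather than a full GMM, and class frequencies are used as priors. The class `PerClassGaussianClassifier` is hypothetical and not part of sklego.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class PerClassGaussianClassifier:
    """Hypothetical sketch: one diagonal Gaussian per class, predict the most likely class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "PerClassGaussianClassifier":
        # Group the training rows by their class label.
        rows_by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)

        # Store (per-feature means, per-feature variances, class prior) per label.
        self.params_: Dict[str, Tuple[List[float], List[float], float]] = {}
        n_total = len(y)
        for label, rows in rows_by_class.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # A small floor on the variance avoids division by zero for constant features.
            variances = [
                max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                for col, m in zip(zip(*rows), means)
            ]
            self.params_[label] = (means, variances, n / n_total)
        return self

    def _log_density(self, row: Sequence[float], label: str) -> float:
        # Log prior plus the sum of independent per-feature Gaussian log densities.
        means, variances, prior = self.params_[label]
        log_p = math.log(prior)
        for x, m, var in zip(row, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return log_p

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        return [max(self.params_, key=lambda label: self._log_density(row, label)) for row in X]


X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [5.0, 5.2], [5.3, 4.9], [4.8, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]

clf = PerClassGaussianClassifier().fit(X, y)
print(clf.predict([[1.0, 1.0], [5.0, 5.0]]))  # ['a', 'b']
```

Swapping the single Gaussian for a multi-component mixture per class gives the GMM-per-class approach the list describes. The fit-per-class, predict-by-likelihood structure stays the same.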

New Features

We want to be open about what we accept, but we require three things before a new feature is added to the project:

  1. the feature contributes towards a demonstrable real-world use case
  2. the feature passes the standard unit tests (we have a few for transformers and predictors)
  3. the feature has been discussed in the issue list beforehand



Download files


Source Distribution

scikit-lego-0.1.5.tar.gz (19.9 kB)

Built Distribution

scikit_lego-0.1.5-py2.py3-none-any.whl (31.3 kB), for Python 2 and Python 3
