Skippa

SciKIt-learn Pipeline in PAndas

Installation

pip install skippa

Basic usage

Import the Skippa class and the columns helper:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

from skippa import Skippa, columns

Get some data

df = pd.DataFrame({
    'q': [0, 0, 0],
    'date': ['2021-11-29', '2021-12-01', '2021-12-03'],
    'x': ['a', 'b', 'c'],
    'x2': ['m', 'n', 'm'],
    'y': [1, 16, 1000],
    'z': [0.4, None, 8.7]
})
y = np.array([0, 0, 1])
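Before building the pipeline, it can help to check which columns count as numeric and where values are missing. The sketch below uses plain pandas only. That `columns(dtype_include='number')` selects the same columns is an assumption here, not something this README states.

```python
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    'q': [0, 0, 0],
    'date': ['2021-11-29', '2021-12-01', '2021-12-03'],
    'x': ['a', 'b', 'c'],
    'x2': ['m', 'n', 'm'],
    'y': [1, 16, 1000],
    'z': [0.4, None, 8.7],
})

# Columns with a numeric dtype (assumed to match dtype_include='number')
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Number of missing values per column; None in 'z' is stored as NaN
missing_counts = df.isna().sum()

print(numeric_cols)
print(missing_counts)
```

Note that the DataFrame column 'y' is a feature, distinct from the target array y passed to fit.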

Define your pipeline:

pipe = (
    Skippa()
        .impute(columns(dtype_include='number'), strategy='median')
        .scale(columns(dtype_include='number'), type='standard')
        .encode_date(columns(['date']))
        .onehot(columns(['x', 'x2']))
        .rename(columns(pattern='x_*'), lambda c: c.replace('x', 'prop'))
        .select(columns(['y', 'z']) + columns(pattern='prop*'))
        .model(LogisticRegression())
)
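For orientation, the first two steps (median imputation and standard scaling of numeric columns) can be approximated in plain scikit-learn and pandas. This is a rough sketch for comparison, not Skippa's actual implementation; the variable names are illustrative.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Numeric part of the example data
df = pd.DataFrame({
    'q': [0, 0, 0],
    'y': [1, 16, 1000],
    'z': [0.4, None, 8.7],
})
num_cols = df.select_dtypes(include='number').columns

# Impute missing values with the column median, then standardize
numeric_steps = make_pipeline(SimpleImputer(strategy='median'), StandardScaler())
values = numeric_steps.fit_transform(df[num_cols])

# scikit-learn returns a NumPy array by default, so the column names
# are re-attached by hand to get a DataFrame back
out = pd.DataFrame(values, columns=num_cols, index=df.index)
print(out)
```

The manual step of restoring column names is the kind of bookkeeping a DataFrame-in, DataFrame-out pipeline is meant to avoid.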

Then use it for fitting and predicting:

model_pipeline = pipe.fit(X=df, y=y)

predictions = model_pipeline.predict_proba(df)
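Assuming the pipeline's predict_proba passes through the result of the underlying estimator, the output follows scikit-learn's convention: one row per sample and one column per class, ordered as in classes_. A minimal sketch with a bare LogisticRegression on made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, purely illustrative
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)  # shape (n_samples, n_classes)

# Look up the column for class 1 via classes_ rather than hard-coding it
positive_col = list(clf.classes_).index(1)
positive = proba[:, positive_col]

print(proba.shape)
print(positive)
```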

If you want details on your model, use:

model = model_pipeline.get_model()
print(model.coef_)
print(model.intercept_)
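For a binary LogisticRegression, coef_ has shape (1, n_features) and intercept_ has shape (1,), so the coefficients can be paired with feature names. The sketch below uses a bare scikit-learn model on made-up data; the feature names are hypothetical stand-ins for the columns left after the select step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names and toy data, for illustration only
feature_names = ['y', 'z']
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
target = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, target)

# One coefficient per feature; row 0 is the only row in the binary case
weights = dict(zip(feature_names, model.coef_[0]))

print(weights)
print(model.intercept_)
```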

To Do

  • [ ] Validation of pipeline steps
  • [ ] Input validation in transformers
  • [ ] Support arbitrary transformer (if column-preserving)
  • [ ] Investigate if Skippa can directly extend sklearn's Pipeline

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.1 (2021-11-22)

  • Fixes and documentation.

0.1.0 (2021-11-19)

  • First release on PyPI.
