A library for data science teams to avoid code duplication in common tasks involving scikit-learn transformers and estimators.

Data Science Utils

A toolkit for day-to-day data science (DS) tasks, such as using custom scikit-learn transformers and estimators.

Found a bug or have a feature request? Open an issue!

Usage

First, import the library:

import pier_ds_utils as ds

Transformers

CustomDiscreteCategorizer

discrete_categorizer = ds.transformer.CustomDiscreteCategorizer(
    column="input_col_name",
    categories=[["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]],
    labels=["label_1", "label_2"],
    default_value="a-default-value",
    output_column="output_col_name",
)
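The README does not spell out the matching rules. Assuming that each inner list in `categories` maps to the label at the same position in `labels`, and that any value not listed falls back to `default_value`, the mapping can be sketched in plain Python. `categorize_discrete` is a hypothetical helper, not part of `pier_ds_utils`. Letting the first matching group win when a value appears twice is also an assumption.

```python
from typing import Hashable, Sequence


def categorize_discrete(
    values: Sequence[Hashable],
    categories: Sequence[Sequence[Hashable]],
    labels: Sequence[str],
    default_value: str,
) -> list[str]:
    """Hypothetical sketch: map each value to the label of the group that contains it."""
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")
    # Flat lookup table: category value -> label (the first group listing a value wins).
    lookup: dict = {}
    for group, label in zip(categories, labels):
        for value in group:
            lookup.setdefault(value, label)
    # Values that belong to no group receive the default label.
    return [lookup.get(value, default_value) for value in values]


print(categorize_discrete(
    ["my_category_value_1", "my_category_value_3", "unknown"],
    categories=[["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]],
    labels=["label_1", "label_2"],
    default_value="a-default-value",
))
# ['label_1', 'label_2', 'a-default-value']
```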

CustomIntervalCategorizer

import sys

interval_categorizer = ds.transformer.CustomIntervalCategorizer(
    column="price",
    intervals=[(6700000, sys.maxsize)],
    labels=["gt_67k"],
    default_value="lt_67k",
    output_column="cat_price",
)
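The README does not document how interval boundaries are handled either. The sketch below assumes half-open intervals `[low, high)`, that the first matching interval wins, and that values outside every interval get `default_value`. `categorize_interval` is a hypothetical helper, not the library's implementation.

```python
import sys
from typing import Sequence


def categorize_interval(
    values: Sequence[float],
    intervals: Sequence[tuple],
    labels: Sequence[str],
    default_value: str,
) -> list[str]:
    """Hypothetical sketch: label each value by the first interval [low, high) containing it."""
    if len(intervals) != len(labels):
        raise ValueError("intervals and labels must have the same length")
    result = []
    for value in values:
        label = default_value  # used when no interval matches
        for (low, high), candidate in zip(intervals, labels):
            if low <= value < high:
                label = candidate
                break
        result.append(label)
    return result


print(categorize_interval(
    [100, 6_700_000, 9_000_000],
    intervals=[(6700000, sys.maxsize)],
    labels=["gt_67k"],
    default_value="lt_67k",
))
# ['lt_67k', 'gt_67k', 'gt_67k']
```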

CustomIntervalCategorizerByCategory

import sys

interval_categorizer_by_category = ds.transformer.CustomIntervalCategorizerByCategory(
    category_column="category",
    interval_categorizers={
        "category_1": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(6700000, sys.maxsize)],
            labels=["gt_67k"],
            default_value="lt_67k",
            output_column="cat_price",
        ),
        "category_2": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(0, 1000000)],
            labels=["lt_1M"],
            default_value="gt_1M",
            output_column="cat_price",
        ),
    },
    output_column="cat_price",
)
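Judging by its arguments, `CustomIntervalCategorizerByCategory` appears to apply a different interval categorizer for each value of `category_column`. Below is a minimal sketch of that dispatch. It keeps the same `[low, high)` assumption and uses hypothetical names (`interval_rule`, `categorize_by_category`, `fallback`). The README does not say how the real class handles categories missing from `interval_categorizers`, so returning `fallback` for them is a guess.

```python
import sys
from typing import Any, Callable, Mapping, Optional

Row = Mapping[str, Any]


def interval_rule(low: float, high: float, label: str, default: str) -> Callable[[Row], str]:
    """Hypothetical helper: one-interval rule on the 'price' column, using [low, high)."""
    return lambda row: label if low <= row["price"] < high else default


def categorize_by_category(
    rows: list,
    category_column: str,
    categorizers: Mapping[str, Callable[[Row], str]],
    fallback: Optional[str] = None,
) -> list:
    """Hypothetical sketch: route each row to the rule registered for its category."""
    output = []
    for row in rows:
        rule = categorizers.get(row[category_column])
        # Assumption: rows whose category has no registered rule receive `fallback`.
        output.append(rule(row) if rule is not None else fallback)
    return output


categorizers = {
    "category_1": interval_rule(6_700_000, sys.maxsize, "gt_67k", "lt_67k"),
    "category_2": interval_rule(0, 1_000_000, "lt_1M", "gt_1M"),
}
rows = [
    {"category": "category_1", "price": 7_000_000},
    {"category": "category_2", "price": 500_000},
    {"category": "category_2", "price": 2_000_000},
    {"category": "other", "price": 10},
]
print(categorize_by_category(rows, "category", categorizers))
# ['gt_67k', 'lt_1M', 'gt_1M', None]
```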

LogTransformer

log_transformer = ds.transformer.LogTransformer()

BoundariesTransformer

# Using only the bounds
boundaries_transformer = ds.transformer.BoundariesTransformer(
    lower_bound=0,
    upper_bound=1000000,
)

# Also setting lower_value and upper_value
boundaries_transformer = ds.transformer.BoundariesTransformer(
    lower_bound=0,
    upper_bound=1000000,
    lower_value=10,
    upper_value=1200000,
)
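The two `BoundariesTransformer` calls suggest that values outside `[lower_bound, upper_bound]` are replaced: by the bounds themselves, unless `lower_value` and `upper_value` are given. That reading is an assumption. The plain-Python sketch below shows the behavior it implies. `apply_boundaries` is hypothetical, and it leaves values equal to a bound unchanged.

```python
from typing import Optional, Sequence


def apply_boundaries(
    values: Sequence[float],
    lower_bound: float,
    upper_bound: float,
    lower_value: Optional[float] = None,
    upper_value: Optional[float] = None,
) -> list:
    """Hypothetical sketch: replace out-of-range values with boundary replacements."""
    # Without explicit replacement values, this reduces to plain clipping.
    low_replacement = lower_bound if lower_value is None else lower_value
    high_replacement = upper_bound if upper_value is None else upper_value
    result = []
    for value in values:
        if value < lower_bound:
            result.append(low_replacement)
        elif value > upper_bound:
            result.append(high_replacement)
        else:
            result.append(value)
    return result


print(apply_boundaries([-5, 500, 2_000_000], lower_bound=0, upper_bound=1_000_000))
# [0, 500, 1000000]
print(apply_boundaries([-5, 500, 2_000_000], lower_bound=0, upper_bound=1_000_000,
                       lower_value=10, upper_value=1_200_000))
# [10, 500, 1200000]
```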

Estimators

glm_wrapper = ds.estimator.GLMWrapper(...)
predict_proba_selector = ds.estimator.PredictProbaSelector(...)

Predictors

predictor = ds.predictor.StaticGLM(...)

Example usage:

from pier_ds_utils.predictor import StaticGLM
import pandas as pd

glm = StaticGLM(
    coefficients_map={"feature1": 0.5, "feature2": 1.5},  # required
    constant=2.0,  # optional
    os_factor=1.0,  # optional
)

df = pd.DataFrame({"feature1": [1, 2], "feature2": [3, 4]})

# The prediction is equivalent to:
# y = (0.5 * feature1 + 1.5 * feature2 + constant) * os_factor
print(glm.predict(df))  # Output: [7. 9.]
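The formula in the comment above can be checked without the library. The sketch below is a plain-Python re-implementation of that formula, not `StaticGLM`'s actual code. The function name `static_glm_predict` is hypothetical, and so are the defaults `constant=0.0` and `os_factor=1.0`, since the README only says these two arguments are optional. It returns a Python list rather than the array printed by `glm.predict`.

```python
from typing import Mapping, Sequence


def static_glm_predict(
    rows: Sequence[Mapping[str, float]],
    coefficients_map: Mapping[str, float],
    constant: float = 0.0,  # assumed default
    os_factor: float = 1.0,  # assumed default
) -> list:
    """Hypothetical sketch of y = (sum(coef * feature) + constant) * os_factor per row."""
    predictions = []
    for row in rows:
        # Weighted sum of the features listed in coefficients_map.
        linear = sum(coef * row[name] for name, coef in coefficients_map.items())
        predictions.append((linear + constant) * os_factor)
    return predictions


rows = [{"feature1": 1, "feature2": 3}, {"feature1": 2, "feature2": 4}]
print(static_glm_predict(rows, {"feature1": 0.5, "feature2": 1.5}, constant=2.0, os_factor=1.0))
# [7.0, 9.0]
```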

Installation

pip install pier-ds-utils

# or

poetry add pier-ds-utils

For a specific version:

pip install pier-ds-utils==_version_

# or

poetry add pier-ds-utils@_version_

Contributing

Contributions are welcome! Please read the contributing guidelines first.

Download files

Download the file for your platform.

Source Distribution

pier_ds_utils-0.6.0.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

pier_ds_utils-0.6.0-py3-none-any.whl (8.0 kB)

Uploaded Python 3

File details

Details for the file pier_ds_utils-0.6.0.tar.gz.

File metadata

  • Download URL: pier_ds_utils-0.6.0.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.13 Linux/6.11.0-1018-azure

File hashes

Hashes for pier_ds_utils-0.6.0.tar.gz
  • SHA256: 110a799b878c3daf6093e89daad07a135c23aef33e9d8e147c332bd0a20f0764
  • MD5: b1bfbeaaa43c3ecaa34b61652737b5d7
  • BLAKE2b-256: 8f2792e97d29354eec9cfd050f753da02f1651fd446c39854a73e07a5aba0961

File details

Details for the file pier_ds_utils-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: pier_ds_utils-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.13 Linux/6.11.0-1018-azure

File hashes

Hashes for pier_ds_utils-0.6.0-py3-none-any.whl
  • SHA256: 6fd6377a750b86bf56a9c515e08b6fca69a6329678446091fe87a191d966e35c
  • MD5: 48e7b98c347de24701690d712b7f08ab
  • BLAKE2b-256: 47750c95c8d0e34290d8863c62d27ea85c5a5ef9df547de443e0854ad65dec32
