A library for data science teams to avoid code duplication in common tasks involving scikit-learn transformers and estimators.

Data Science Utils

A toolkit for day-to-day data science tasks, such as applying custom transformers and estimators.

Found a bug or have a feature request? Open an issue!

Usage

First, import the library:

import pier_ds_utils as ds

Transformers

CustomDiscreteCategorizer

discrete_categorizer = ds.transformer.CustomDiscreteCategorizer(
    column="input_col_name",
    categories=[["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]],
    labels=["label_1", "label_2"],
    default_value="a-default-value",
    output_column="output_col_name",
)
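To make the parameters above concrete, here is a minimal plain-Python sketch of the mapping they appear to describe: each inner list in `categories` is paired with the label at the same position, and unmatched values fall back to `default_value`. The helper `categorize_discrete` is hypothetical and is not part of `pier_ds_utils`; the library's actual behavior may differ.

```python
# Hypothetical helper, not part of pier_ds_utils: illustrates the assumed mapping only.
def categorize_discrete(value, categories, labels, default_value):
    """Return the label of the first category group that contains `value`."""
    for group, label in zip(categories, labels):
        if value in group:
            return label
    # Assumption: values found in no group receive the default.
    return default_value


categories = [["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]]
labels = ["label_1", "label_2"]

rows = ["my_category_value_2", "my_category_value_3", "something_else"]
result = [categorize_discrete(v, categories, labels, "a-default-value") for v in rows]
print(result)  # ['label_1', 'label_2', 'a-default-value']
```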

CustomIntervalCategorizer

import sys  # needed for sys.maxsize, used as an open-ended upper bound

interval_categorizer = ds.transformer.CustomIntervalCategorizer(
    column="price",
    intervals=[(6700000, sys.maxsize)],
    labels=["gt_67k"],
    default_value="lt_67k",
    output_column="cat_price",
)
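As a rough illustration of what the configuration above seems to express, the sketch below labels a value by the first interval that contains it and falls back to `default_value` otherwise. The helper `categorize_interval` is hypothetical, and the boundary handling (lower bound inclusive, upper bound exclusive) is an assumption; the README does not say how the library treats interval edges.

```python
import sys


# Hypothetical helper, not part of pier_ds_utils.
def categorize_interval(value, intervals, labels, default_value):
    """Return the label of the first (low, high) interval containing `value`."""
    for (low, high), label in zip(intervals, labels):
        # Assumption: intervals are treated as [low, high).
        if low <= value < high:
            return label
    return default_value


intervals = [(6700000, sys.maxsize)]
labels = ["gt_67k"]

prices = [1000000, 7000000]
cat_price = [categorize_interval(p, intervals, labels, "lt_67k") for p in prices]
print(cat_price)  # ['lt_67k', 'gt_67k']
```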

CustomIntervalCategorizerByCategory

import sys  # needed for sys.maxsize, used as an open-ended upper bound

# One CustomIntervalCategorizer per value of the category column.
interval_categorizer_by_category = ds.transformer.CustomIntervalCategorizerByCategory(
    category_column="category",
    interval_categorizers={
        "category_1": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(6700000, sys.maxsize)],
            labels=["gt_67k"],
            default_value="lt_67k",
            output_column="cat_price",
        ),
        "category_2": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(0, 1000000)],
            labels=["lt_1M"],
            default_value="gt_1M",
            output_column="cat_price",
        ),
    },
    output_column="cat_price",
)
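The per-category dispatch suggested by this configuration can be sketched in plain Python: look up the rule set for a row's category, then apply the interval logic to its price. Everything here (`categorize_interval`, `categorize_by_category`, the `rules` dictionary, the `[low, high)` boundary handling) is a hypothetical illustration, not the library's implementation. How the library handles a category with no configured categorizer is not stated in the README; this sketch simply raises `KeyError`.

```python
import sys


# Hypothetical helpers, not part of pier_ds_utils.
def categorize_interval(value, intervals, labels, default_value):
    for (low, high), label in zip(intervals, labels):
        if low <= value < high:  # assumption: [low, high)
            return label
    return default_value


def categorize_by_category(row, rules):
    """Pick the rule set for the row's category, then label its price."""
    rule = rules[row["category"]]  # raises KeyError for unknown categories
    return categorize_interval(row["price"], **rule)


rules = {
    "category_1": {
        "intervals": [(6700000, sys.maxsize)],
        "labels": ["gt_67k"],
        "default_value": "lt_67k",
    },
    "category_2": {
        "intervals": [(0, 1000000)],
        "labels": ["lt_1M"],
        "default_value": "gt_1M",
    },
}

rows = [
    {"category": "category_1", "price": 7000000},
    {"category": "category_2", "price": 7000000},
    {"category": "category_2", "price": 500000},
]
cat_price = [categorize_by_category(r, rules) for r in rows]
print(cat_price)  # ['gt_67k', 'gt_1M', 'lt_1M']
```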

LogTransformer

log_transformer = ds.transformer.LogTransformer()

BoundariesTransformer

boundaries_transformer = ds.transformer.BoundariesTransformer(
    lower_bound=0,
    upper_bound=1000000,
)
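The parameter names `lower_bound` and `upper_bound` suggest that values are clipped into that range. That reading is an assumption, since the README does not describe the transformer's behavior. Under that assumption, the effect would look like the following sketch, where `clip` is a hypothetical helper rather than a library function.

```python
# Hypothetical helper, not part of pier_ds_utils.
# Assumption: out-of-range values are clipped to the nearest bound.
def clip(value, lower_bound, upper_bound):
    return max(lower_bound, min(value, upper_bound))


values = [-50, 250000, 3000000]
clipped = [clip(v, 0, 1000000) for v in values]
print(clipped)  # [0, 250000, 1000000]
```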

Estimators

glm_wrapper = ds.estimator.GLMWrapper(...)
predict_proba_selector = ds.estimator.PredictProbaSelector(...)

Predictors

predictor = ds.predictor.StaticGLM(...)

Example usage:

from pier_ds_utils.predictor import StaticGLM
import pandas as pd

glm = StaticGLM(
    coefficients_map={"feature1": 0.5, "feature2": 1.5},  # required
    constant=2.0,  # optional
    os_factor=1.0,  # optional
)

df = pd.DataFrame({"feature1": [1, 2], "feature2": [3, 4]})

# predict() is equivalent to:
# y = (0.5 * feature1 + 1.5 * feature2 + constant) * os_factor
print(glm.predict(df))  # Output: [7. 9.]
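The formula in the comment above can be checked by hand without the library. The sketch below reproduces the arithmetic in plain Python; `static_glm_predict` is a hypothetical stand-in, and its defaults (`constant=0.0`, `os_factor=1.0`) are assumptions, since the README only says those arguments are optional.

```python
# Hypothetical stand-in for the arithmetic described above, not the library's code.
def static_glm_predict(rows, coefficients_map, constant=0.0, os_factor=1.0):
    """Compute (sum(coef * feature) + constant) * os_factor for each row."""
    predictions = []
    for row in rows:
        linear = sum(coef * row[name] for name, coef in coefficients_map.items())
        predictions.append((linear + constant) * os_factor)
    return predictions


rows = [{"feature1": 1, "feature2": 3}, {"feature1": 2, "feature2": 4}]
preds = static_glm_predict(
    rows,
    coefficients_map={"feature1": 0.5, "feature2": 1.5},
    constant=2.0,
    os_factor=1.0,
)
# Row 1: (0.5 * 1 + 1.5 * 3 + 2.0) * 1.0 = 7.0
# Row 2: (0.5 * 2 + 1.5 * 4 + 2.0) * 1.0 = 9.0
print(preds)  # [7.0, 9.0]
```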

Installation

pip install pier-ds-utils

# or

poetry add pier-ds-utils

For a specific version:

pip install pier-ds-utils==_version_

# or

poetry add pier-ds-utils@_version_
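After installing, one way to confirm which version ended up in the environment is to query the package metadata from Python. This sketch uses only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if pier-ds-utils is not installed.
print(installed_version("pier-ds-utils"))
```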

Contributing

Contributions are welcome! Please read the contributing guidelines first.
