A library for data science teams to avoid code duplication in common tasks involving scikit-learn transformers and estimators.
Data Science Utils
A toolkit for day-to-day DS tasks such as using custom transformers or estimators.
Found a bug or have a feature request? Open an issue!
Usage
First, import the library:
```python
import pier_ds_utils as ds
```
Transformers
CustomDiscreteCategorizer
```python
discrete_categorizer = ds.transformer.CustomDiscreteCategorizer(
    column="input_col_name",
    categories=[["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]],
    labels=["label_1", "label_2"],
    default_value="a-default-value",
    output_column="output_col_name",
)
```
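A minimal pandas sketch of the mapping this configuration appears to describe, assuming each inner list in `categories` is assigned the label at the same position in `labels` and every other value receives `default_value`. The function `discrete_categorize` is hypothetical; it illustrates the idea and is not the library's implementation.

```python
import pandas as pd


def discrete_categorize(df, column, categories, labels, default_value, output_column):
    """Hypothetical re-implementation: map groups of raw values to one label each."""
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")
    # Flat lookup: raw value -> label of the group it belongs to
    lookup = {value: label for group, label in zip(categories, labels) for value in group}
    out = df.copy()
    # Values missing from the lookup become NaN after .map, then receive the default
    out[output_column] = out[column].map(lookup).fillna(default_value)
    return out


df = pd.DataFrame({"input_col_name": ["my_category_value_1", "my_category_value_3", "other"]})
result = discrete_categorize(
    df,
    column="input_col_name",
    categories=[["my_category_value_1", "my_category_value_2"], ["my_category_value_3"]],
    labels=["label_1", "label_2"],
    default_value="a-default-value",
    output_column="output_col_name",
)
print(result["output_col_name"].tolist())  # ['label_1', 'label_2', 'a-default-value']
```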
CustomIntervalCategorizer
```python
import sys

interval_categorizer = ds.transformer.CustomIntervalCategorizer(
    column="price",
    intervals=[(6700000, sys.maxsize)],
    labels=["gt_67k"],
    default_value="lt_67k",
    output_column="cat_price",
)
```
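The interval version can be sketched the same way. This assumes each `(lower, upper)` tuple is lower-inclusive and upper-exclusive, that the first matching interval wins, and that unmatched values receive `default_value`; the boundary handling in `pier_ds_utils` may differ. `interval_categorize` is a hypothetical helper.

```python
import sys

import pandas as pd


def interval_categorize(df, column, intervals, labels, default_value, output_column):
    """Hypothetical re-implementation: label values by the first interval containing them."""
    if len(intervals) != len(labels):
        raise ValueError("intervals and labels must have the same length")
    # Start every row at the default label, then overwrite rows inside an interval
    result = pd.Series(default_value, index=df.index, dtype=object)
    assigned = pd.Series(False, index=df.index)
    for (lower, upper), label in zip(intervals, labels):
        # Assumption: lower bound inclusive, upper bound exclusive; first match wins
        mask = (df[column] >= lower) & (df[column] < upper) & ~assigned
        result[mask] = label
        assigned |= mask
    out = df.copy()
    out[output_column] = result
    return out


df = pd.DataFrame({"price": [500000, 6700000, 9000000]})
result = interval_categorize(
    df,
    column="price",
    intervals=[(6700000, sys.maxsize)],
    labels=["gt_67k"],
    default_value="lt_67k",
    output_column="cat_price",
)
print(result["cat_price"].tolist())  # ['lt_67k', 'gt_67k', 'gt_67k']
```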
CustomIntervalCategorizerByCategory
```python
import sys

interval_categorizer_by_category = ds.transformer.CustomIntervalCategorizerByCategory(
    category_column="category",
    interval_categorizers={
        "category_1": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(6700000, sys.maxsize)],
            labels=["gt_67k"],
            default_value="lt_67k",
            output_column="cat_price",
        ),
        "category_2": ds.transformer.CustomIntervalCategorizer(
            column="price",
            intervals=[(0, 1000000)],
            labels=["lt_1M"],
            default_value="gt_1M",
            output_column="cat_price",
        ),
    },
    output_column="cat_price",
)
```
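A sketch of the per-category dispatch, assuming rows are routed by the value in `category_column` to the categorizer registered for that value, and that rows from unregistered categories are left empty (`None`). Both `threshold_categorizer` and `categorize_by_category` are hypothetical stand-ins, not the library's classes.

```python
import sys

import pandas as pd


def threshold_categorizer(column, lower, upper, label, default):
    """Hypothetical stand-in for one per-category interval categorizer."""

    def categorize(rows):
        # Assumption: lower bound inclusive, upper bound exclusive
        inside = (rows[column] >= lower) & (rows[column] < upper)
        return [label if flag else default for flag in inside]

    return categorize


def categorize_by_category(df, category_column, categorizers, output_column):
    """Hypothetical re-implementation: route each category's rows to its own categorizer."""
    out = df.copy()
    # Assumption: rows whose category has no registered categorizer stay None
    out[output_column] = None
    for category, categorize in categorizers.items():
        mask = out[category_column] == category
        if mask.any():
            out.loc[mask, output_column] = categorize(out.loc[mask])
    return out


df = pd.DataFrame(
    {"category": ["category_1", "category_2", "category_3"], "price": [7000000, 2000000, 500]}
)
result = categorize_by_category(
    df,
    category_column="category",
    categorizers={
        "category_1": threshold_categorizer("price", 6700000, sys.maxsize, "gt_67k", "lt_67k"),
        "category_2": threshold_categorizer("price", 0, 1000000, "lt_1M", "gt_1M"),
    },
    output_column="cat_price",
)
print(result["cat_price"].tolist())  # ['gt_67k', 'gt_1M', None]
```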
LogTransformer
```python
log_transformer = ds.transformer.LogTransformer()
```
BoundariesTransformer
```python
# Lower and upper bounds only
boundaries_transformer = ds.transformer.BoundariesTransformer(
    lower_bound=0,
    upper_bound=1000000,
)

# Bounds plus explicit lower_value and upper_value
boundaries_transformer = ds.transformer.BoundariesTransformer(
    lower_bound=0,
    upper_bound=1000000,
    lower_value=10,
    upper_value=1200000,
)
```
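A minimal sketch of the two configurations, assuming values below `lower_bound` are replaced by `lower_value` and values above `upper_bound` by `upper_value`, with each replacement defaulting to its bound (plain clipping) when not given. `apply_boundaries` is hypothetical and the real transformer's semantics may differ.

```python
import pandas as pd


def apply_boundaries(values, lower_bound, upper_bound, lower_value=None, upper_value=None):
    """Hypothetical re-implementation of bounding with optional replacement values."""
    # Assumption: without explicit replacement values, out-of-range entries are clipped
    lower_value = lower_bound if lower_value is None else lower_value
    upper_value = upper_bound if upper_value is None else upper_value
    result = values.copy()
    # Masks are computed on the untouched input so the two replacements don't interact
    result[values < lower_bound] = lower_value
    result[values > upper_bound] = upper_value
    return result


prices = pd.Series([-5, 500000, 2000000])
print(apply_boundaries(prices, 0, 1000000).tolist())  # [0, 500000, 1000000]
print(apply_boundaries(prices, 0, 1000000, lower_value=10, upper_value=1200000).tolist())
# [10, 500000, 1200000]
```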
Estimators
```python
glm_wrapper = ds.estimator.GLMWrapper(...)
predict_proba_selector = ds.estimator.PredictProbaSelector(...)
```
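The `PredictProbaSelector` arguments are elided above. Judging only from its name, it likely turns a classifier's `predict_proba` output into a single prediction column; the sketch below shows that idea under that assumption. `ProbaColumnSelector`, its `class_index` parameter, and the toy `ConstantClassifier` are all hypothetical.

```python
import numpy as np


class ProbaColumnSelector:
    """Hypothetical sketch: expose one column of a classifier's predict_proba as predict()."""

    def __init__(self, estimator, class_index=1):
        self.estimator = estimator
        self.class_index = class_index  # assumed: which predict_proba column to return

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # predict_proba has shape (n_samples, n_classes); keep a single column
        return self.estimator.predict_proba(X)[:, self.class_index]


class ConstantClassifier:
    """Toy classifier for the example: always predicts 30% / 70%."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return np.tile([0.3, 0.7], (len(X), 1))


selector = ProbaColumnSelector(ConstantClassifier()).fit([[0], [1]], [0, 1])
print(selector.predict([[0], [1], [2]]))  # [0.7 0.7 0.7]
```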
Predictors
```python
predictor = ds.predictor.StaticGLM(...)
```
Example usage:
```python
import pandas as pd
from pier_ds_utils.predictor import StaticGLM

glm = StaticGLM(
    coefficients_map={"feature1": 0.5, "feature2": 1.5},  # required
    constant=2.0,  # optional
    os_factor=1.0,  # optional
)

df = pd.DataFrame({"feature1": [1, 2], "feature2": [3, 4]})

# predict() is equivalent to:
# y = (0.5 * feature1 + 1.5 * feature2 + constant) * os_factor
print(glm.predict(df))  # Output: [7. 9.]
```
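The documented formula can be reproduced with plain NumPy to sanity-check coefficients before building a `StaticGLM`. This sketch assumes `constant` defaults to `0.0` and `os_factor` to `1.0` when omitted; `static_glm_predict` is a hypothetical helper, not part of the library.

```python
import numpy as np
import pandas as pd


def static_glm_predict(df, coefficients_map, constant=0.0, os_factor=1.0):
    """Sketch of the documented formula: (sum(coef * feature) + constant) * os_factor."""
    # Assumed defaults: constant=0.0 and os_factor=1.0 when not supplied
    features = list(coefficients_map)
    coefs = np.array([coefficients_map[name] for name in features])
    # Select columns in the same order as the coefficient vector, then take the dot product
    linear = df[features].to_numpy() @ coefs + constant
    return linear * os_factor


df = pd.DataFrame({"feature1": [1, 2], "feature2": [3, 4]})
print(static_glm_predict(df, {"feature1": 0.5, "feature2": 1.5}, constant=2.0))  # [7. 9.]
```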
Installation
```shell
pip install pier-ds-utils
# or
poetry add pier-ds-utils
```
For a specific version:
```shell
pip install pier-ds-utils==_version_
# or
poetry add pier-ds-utils@_version_
```
Contributing
Contributions are welcome! Please read the contributing guidelines first.
Download files
File details
Details for the file pier_ds_utils-0.6.0.tar.gz.
File metadata
- Download URL: pier_ds_utils-0.6.0.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 110a799b878c3daf6093e89daad07a135c23aef33e9d8e147c332bd0a20f0764 |
| MD5 | b1bfbeaaa43c3ecaa34b61652737b5d7 |
| BLAKE2b-256 | 8f2792e97d29354eec9cfd050f753da02f1651fd446c39854a73e07a5aba0961 |
File details
Details for the file pier_ds_utils-0.6.0-py3-none-any.whl.
File metadata
- Download URL: pier_ds_utils-0.6.0-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fd6377a750b86bf56a9c515e08b6fca69a6329678446091fe87a191d966e35c |
| MD5 | 48e7b98c347de24701690d712b7f08ab |
| BLAKE2b-256 | 47750c95c8d0e34290d8863c62d27ea85c5a5ef9df547de443e0854ad65dec32 |