A Python package that makes ML processes easier, faster and less error prone

Bender 🤖

A Python package for faster, safer, and simpler ML processes.

Installation

pip install benderml

Usage:

from bender.importers import DataImporters

pre_processed_data = await DataImporters.csv("file/to/data.csv").process([...]).run()
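The `await` in the snippet above only works inside a coroutine or an async-aware REPL such as a notebook. In a plain script, one option is to wrap the call in `asyncio.run`. The sketch below uses a hypothetical stand-in class (`StubPipeline`, which is not part of bender) so that it runs without the package or a CSV file; with bender installed, the awaited expression would be the `DataImporters` chain instead.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in for a bender pipeline; not part of the package."""

    def __init__(self, rows):
        self.rows = rows

    async def run(self):
        # A real pipeline would do I/O here; the stub just yields control once.
        await asyncio.sleep(0)
        return self.rows


async def main():
    # With bender installed, this line would await the DataImporters chain.
    return await StubPipeline([{"x": 1}, {"x": 2}]).run()


result = asyncio.run(main())
print(result)
```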

Why use `bender`?

Bender makes your machine learning processes faster, safer, and simpler while keeping them flexible. It does this by providing a set of base components built around the core steps of an ML pipeline, along with type hints that suggest what your next move could be.

Pipeline Safety

The whole pipeline is built using generics from Python's typing system. This improves the developer experience, as a static type checker can tell whether your pipeline's logic makes sense before it has started.
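To illustrate the general idea, here is a minimal sketch of a generically typed pipeline. It is not bender's implementation; `Step`, `then`, `parse`, and `mean` are hypothetical names chosen for this example. The point is that each step declares what it consumes and produces, so a type checker can reject a chain whose pieces do not fit.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")
Next = TypeVar("Next")


@dataclass(frozen=True)
class Step(Generic[In, Out]):
    """A pipeline step that turns an `In` value into an `Out` value."""

    fn: Callable[[In], Out]

    def then(self, other: "Step[Out, Next]") -> "Step[In, Next]":
        # The next step must accept exactly what this step produces.
        return Step(lambda value: other.fn(self.fn(value)))

    def run(self, value: In) -> Out:
        return self.fn(value)


parse: Step[str, List[float]] = Step(lambda text: [float(p) for p in text.split(",")])
mean: Step[List[float], float] = Step(lambda xs: sum(xs) / len(xs))

pipeline = parse.then(mean)   # OK: List[float] feeds List[float]
print(pipeline.run("1,2,3"))  # 2.0

# mean.then(parse) would be flagged by a type checker such as mypy,
# because `parse` expects a str but `mean` produces a float.
```

Nothing is enforced at runtime here; the benefit comes from running a type checker or using an editor that understands the annotations.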

Load a data set

Bender makes most of the sklearn datasets available through the DataImporters.data_set(...) importer. You pass an enum value to select the dataset you want. It is also possible to load data from SQL, append different data sources, and cache the result. It is as simple as:

from bender.importers import DataImporters

# Predefined data set
DataImporters.data_set(DataSets.IRIS)

# Load SQL
DataImporters.sql("url", "SELECT ...")

# Cache a SQL import (parentheses allow the chain to span several lines)
(DataImporters.sql("url", "SELECT ...")
    .cached("path/to/cache")
    .append(
        # Add more data from a different source (with the same features)
        DataImporters.sql(...)
    ))

Processing

Once the data has been loaded, the next step is usually to process it in some way. bender therefore provides components that transform features, making it easier to keep your logic consistent across multiple projects.

import numpy as np
from bender.transformations import Transformations

(DataImporters.data_set(DataSets.IRIS)
    .process([
        # pl exp = e^(petal length)
        Transformations.exp_shift('petal length (cm)', output='pl exp'),

        # Alternative to `exp_shift`
        Transformations.compute('pl exp', lambda df: np.exp(df['petal length (cm)'])),

        # price = median of the JSON price values in purchases
        Transformations.unpack_json("purchases", key="price", output_feature="price", policy=UnpackPolicy.median_number()),

        ...
    ]))
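To make the comments above concrete, the following sketch reproduces the two computations with the standard library only. It does not use bender; the helper names (`exp_feature`, `unpack_json_median`) and the shape of the `purchases` JSON (a list of objects with a `price` key) are assumptions made for illustration.

```python
import json
import math
from statistics import median

# Hypothetical rows; the JSON layout of "purchases" is an assumption.
rows = [
    {"petal length (cm)": 1.0, "purchases": '[{"price": 10}, {"price": 30}, {"price": 20}]'},
    {"petal length (cm)": 2.0, "purchases": '[{"price": 5}, {"price": 15}]'},
]


def exp_feature(row, source, output):
    """Return a copy of `row` with output = e^(source)."""
    return {**row, output: math.exp(row[source])}


def unpack_json_median(row, source, key, output):
    """Return a copy of `row` with output = median of `key` across the JSON list."""
    values = [item[key] for item in json.loads(row[source])]
    return {**row, output: median(values)}


processed = [
    unpack_json_median(exp_feature(r, "petal length (cm)", "pl exp"), "purchases", "price", "price")
    for r in rows
]
print([r["price"] for r in processed])
```

Each helper returns a new row instead of mutating its input, which keeps the original data available for comparison.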

EDA

It is also possible to explore the data to see how it is distributed.

from bender.explorers import Explorers

await (DataImporters.data_set(DataSets.IRIS)
    .process([...])
    .explore([
        # Display all features in a histogram
        Explorers.histogram(target='target'),

        # Display the correlation matrix and log which features you could remove
        Explorers.correlation(input_features),

        # View how features relate in 2D
        Explorers.pair_plot('target'),
    ])
    .run())

Splitting into train and test sets

There are many ways to split data into train and test sets, so bender makes it easy to choose a strategy and switch between them.

from bender.split_strategies import SplitStrategies

await (DataImporters.data_set(DataSets.IRIS)
    .process([...])

    # Use 70% as train and 30% as test
    .split(SplitStrategies.ratio(0.7))

    # Alternative: 70% of each target group in train and the rest in test
    # .split(SplitStrategies.uniform_ratio("target", 0.7))

    # Alternative: sort by the key and take the first 70% as train
    # .split(SplitStrategies.sorted_ratio("target", 0.7))
    .run())
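The three strategies described in the comments above can be sketched in plain Python as follows. This is an illustration of the ideas only, not bender's code: the function names are hypothetical, and whether bender shuffles rows before a ratio split is not stated in this description, so the sketch simply keeps the input order.

```python
from collections import defaultdict


def ratio_split(rows, ratio):
    """First `ratio` share of the rows as train, the rest as test."""
    cut = round(len(rows) * ratio)
    return rows[:cut], rows[cut:]


def uniform_ratio_split(rows, key, ratio):
    """Apply the ratio inside each group of `key`, so every group is represented."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    train, test = [], []
    for group in groups.values():
        group_train, group_test = ratio_split(group, ratio)
        train.extend(group_train)
        test.extend(group_test)
    return train, test


def sorted_ratio_split(rows, key, ratio):
    """Sort by `key`, then take the first `ratio` share as train."""
    return ratio_split(sorted(rows, key=lambda row: row[key]), ratio)


# 20 hypothetical rows with two alternating target classes (10 rows each).
rows = [{"id": i, "target": i % 2} for i in range(20)]

train, test = uniform_ratio_split(rows, "target", 0.7)
print(len(train), len(test))
```

The per-group variant matters when classes are imbalanced or ordered: a sorted split of the same rows would leave the test set with only one class.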

Training

After you have split the data set into train and test sets, you can train a model as follows.

from bender.model_trainers import Trainers

await (DataImporters.data_set(DataSets.IRIS)
    .split(...)
    .train(
        # Train k-nearest neighbours on the train set
        Trainers.kneighbours(),
        input_features=[...],
        target_feature="target"
    )
    .run())

Evaluate

Once you have a model, it is wise to test how well it works.

from bender.evaluators import Evaluators

await (DataImporters.data_set(DataSets.IRIS)
    .split(...)
    .train(...)
    .evaluate([
        # Present the confusion matrix, ROC curve, and precision-recall results
        Evaluators.confusion_matrix(),
        Evaluators.roc_curve(),
        Evaluators.precision_recall(),
    ])
    .run())
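For readers unfamiliar with these evaluators, here is a small standalone sketch of what a confusion matrix and per-label precision and recall measure. It does not call bender; the function names and the sample labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def precision_recall(matrix, index):
    """Precision and recall for the label at `index`."""
    true_positive = matrix[index][index]
    predicted_positive = sum(row[index] for row in matrix)
    actual_positive = sum(matrix[index])
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical labels for six test rows.
actual = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 2, 2]

matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])
for row in matrix:
    print(row)
print(precision_recall(matrix, 1))
```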

Save model

Finally, you will need to store the model. You can select one of many exporters.

from bender.exporters import Exporters

await (DataImporters.data_set(DataSets.IRIS)
    .split(...)
    .train(...)
    .export_model(Exporters.aws_s3(...))
    .run())

Predict

(ModelLoaders
    .aws_s3("path/to/model", s3_config)
    .import_data(
        DataImporters.sql(sql_url, sql_query)
    )
    .predict())

Extract result

(ModelLoaders
    .aws_s3(...)
    .import_data(...)
    .predict()
    .extract(prediction_as="target", metadata=['entry_id'], exporter=DataExporters.disk("predictions.csv")))

Examples

An example using the IRIS data set that trains a model to perfection:

await (DataImporters.data_set(DataSets.IRIS)
    .process([
        Transformations.exp_shift('petal length (cm)', output='pl exp'),
        Transformations.exp_shift('petal width (cm)', output='pw exp'),
    ])
    .explore([
        Explorers.histogram(target='target'),
        Explorers.correlation(input_features),
        Explorers.pair_plot('target'),
    ])
    .split(SplitStrategies.uniform_ratio("target", 0.7))
    .train(Trainers.kneighbours(), input_features=input_features, target_feature="target")
    .evaluate([
        Evaluators.confusion_matrix()
    ])
    .metric(Metrics.log_loss())
    .run())

XGBoost Example

Below is a simple example of training an XGBoost tree model.

(DataImporters.sql(sql_url, sql_query)

    .process([ # Preprocess the data
        # Extract advanced information from JSON data
        Transformations.unpack_json("purchases", key="price", output_feature="price", policy=UnpackPolicy.median_number()),

        Transformations.log_normal_shift("y_values", "y_log"),

        # Get date values from a date feature
        Transformations.date_component("month", "date", output_feature="month_value"),
    ])
    .split(SplitStrategies.ratio(0.7))

    # Train an XGBoost tree model
    .train(
        Trainers.xgboost(),
        input_features=['y_log', 'price', 'month_value', 'country', ...],
        target_feature='did_buy_product_x'
    )
    .evaluate([
        Evaluators.roc_curve(),
        Evaluators.confusion_matrix(),
        Evaluators.precision_recall(
            # Overwrite where to export the evaluated result
            Exporter.disk("precision-recall.png")
        ),
    ]))

Predicting Example

Below, a model is loaded from an AWS S3 bucket, the data is preprocessed, and the output is predicted. This also makes sure that the features are valid before predicting.

(ModelLoaders
    # Fetch the model
    .aws_s3("path/to/model", s3_config)

    # Load data
    .import_data(
        DataImporters.sql(sql_url, sql_query)
            # Cache the import locally for 1 day
            .cached("cache/path")
    )
    # Preprocess the data
    .process([
        Transformations.unpack_json(...),
        ...
    ])
    # Predict the values
    .predict())

Project details

Release history

This version

0.1.19

Feb 25, 2022

0.1.18

Feb 22, 2022

0.1.17

Feb 18, 2022

0.1.16

Feb 18, 2022

0.1.15

Feb 18, 2022

0.1.14

Feb 17, 2022

0.1.13

Jan 26, 2022

0.1.12

Jan 26, 2022

0.1.11

Jan 25, 2022

0.1.10

Jan 19, 2022

0.1.9

Jan 17, 2022

0.1.8

Jan 16, 2022

0.1.7

Jan 7, 2022

0.1.6

Jan 3, 2022

0.1.5

Dec 27, 2021

0.1.4

Dec 22, 2021

0.1.3

Dec 19, 2021

0.1.2

Dec 17, 2021

0.1.1

Dec 17, 2021

0.1.0

Dec 17, 2021

0.1.0a1 pre-release

Dec 17, 2021

Download files

Source Distribution

benderml-0.1.19.tar.gz (36.6 kB view details)

Uploaded Feb 25, 2022 Source

Built Distribution

benderml-0.1.19-py3-none-any.whl (59.7 kB view details)

Uploaded Feb 25, 2022 Python 3

File details

Details for the file benderml-0.1.19.tar.gz.

File metadata

Download URL: benderml-0.1.19.tar.gz
Upload date: Feb 25, 2022
Size: 36.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.13 CPython/3.9.10 Linux/5.11.0-1028-azure

File hashes

Hashes for benderml-0.1.19.tar.gz
Algorithm	Hash digest
SHA256	`9beaed08dd1ae54df374d5eca293e74a1f636e68bc145bc0b476e6d357172d5d`
MD5	`af2b3e5df31c7ca16072f442fa64551d`
BLAKE2b-256	`75a6a4e784de5cfd8aaf1697f78725cb7df61c7ff915e58f5efa4223aee4cc30`

File details

Details for the file benderml-0.1.19-py3-none-any.whl.

File metadata

Download URL: benderml-0.1.19-py3-none-any.whl
Upload date: Feb 25, 2022
Size: 59.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.13 CPython/3.9.10 Linux/5.11.0-1028-azure

File hashes

Hashes for benderml-0.1.19-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8acb2a8e09d06df9cff50f94081560577578e526006ce199d02d983483279258`
MD5	`8c430db2c125cd83feb2d6068be55898`
BLAKE2b-256	`3c818952fb961561669ee4f8dbf3d97a2e5e36bce31afe9e2719f8782e071ef3`
