
Tabular ML Toolkit

A helper library to jumpstart your machine learning project based on tabular or structured data.

Install

pip install -U tabular_ml_toolkit

How to use

Start with your favorite model, then simply create an MLPipeline with one API call.

For example, here we use RandomForestRegressor from scikit-learn on the Melbourne home sale price data.

There is no need to install scikit-learn separately, as it comes preinstalled with Tabular_ML_Toolkit.

from tabular_ml_toolkit.MLPipeline import *
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# create scikit-learn ml model
scikit_model = RandomForestRegressor(n_estimators=200, random_state=42)

# create ml pipeline for scikit-learn model
sci_ml_pl = MLPipeline().prepare_data_for_training(
    train_file_path= "https://raw.githubusercontent.com/psmathur/tabular_ml_toolkit/master/input/home_data/train.csv",
    test_file_path= "https://raw.githubusercontent.com/psmathur/tabular_ml_toolkit/master/input/home_data/test.csv",
    idx_col="Id", target="SalePrice",
    model=scikit_model,
    random_state=42,
    valid_size=0.2)

# Now fit and predict
sci_ml_pl.scikit_pipeline.fit(sci_ml_pl.dataframeloader.X_train, sci_ml_pl.dataframeloader.y_train)

preds = sci_ml_pl.scikit_pipeline.predict(sci_ml_pl.dataframeloader.X_valid)
print('X_valid MAE:', mean_absolute_error(sci_ml_pl.dataframeloader.y_valid, preds))
X_valid MAE: 17676.01967465753
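
You can use the same fitted pipeline to predict on the held-out test file. Below is a minimal sketch, assuming the loader exposes the test features as dataframeloader.X_test (mirroring X_train and X_valid, as described further down) and keeps Id as the index:

import pandas as pd

# sketch: predict on the test set loaded from test_file_path
# (dataframeloader.X_test is assumed here, mirroring X_train/X_valid)
test_preds = sci_ml_pl.scikit_pipeline.predict(sci_ml_pl.dataframeloader.X_test)

# save a Kaggle-style submission, pairing each Id with its predicted SalePrice
submission = pd.DataFrame({"Id": sci_ml_pl.dataframeloader.X_test.index,
                           "SalePrice": test_preds})
submission.to_csv("submission.csv", index=False)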

You can also use MLPipeline with an XGBoost model; just make sure to install XGBoost first, depending on your OS.

After that, all steps remain the same. Here is an example using XGBRegressor with the Melbourne home sale price data.

#!pip install -U xgboost
from xgboost import XGBRegressor
# create xgb ml model
xgb_model = XGBRegressor(n_estimators=250,learning_rate=0.05, random_state=42)

# create ml pipeline for xgb model
xgb_ml_pl = MLPipeline().prepare_data_for_training(
    train_file_path= "input/home_data/train.csv",
    test_file_path= "input/home_data/test.csv",
    idx_col="Id",
    target="SalePrice",
    model=xgb_model,
    random_state=42,
    valid_size=0.2)

# Now fit and predict
xgb_ml_pl.scikit_pipeline.fit(xgb_ml_pl.dataframeloader.X_train, xgb_ml_pl.dataframeloader.y_train)
preds = xgb_ml_pl.scikit_pipeline.predict(xgb_ml_pl.dataframeloader.X_valid)
print('X_valid MAE:', mean_absolute_error(xgb_ml_pl.dataframeloader.y_valid, preds))
X_valid MAE: 15824.136571596746

Behind the scenes, the prepare_data_for_training method loads your input data into Pandas DataFrames, separates X (features) and y (target), and splits X into X_train, y_train, X_valid, and y_valid DataFrames. It then preprocesses all numerical and categorical columns found in these DataFrames, bundles the preprocessed data with your given model, and returns an MLPipeline object, so you can fit on X_train and y_train and predict on X_valid or X_test.
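
For intuition, here is a rough, plain scikit-learn sketch of those same steps (illustrative only; the toolkit's actual column handling, imputation, and encoding choices may differ):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# load data and separate X (features) and y (target)
df = pd.read_csv("input/home_data/train.csv", index_col="Id")
y = df["SalePrice"]
X = df.drop(columns=["SalePrice"])

# split into train and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42)

# preprocess numerical and categorical columns
numerical_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns
preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), numerical_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# bundle preprocessing with the model, much like scikit_pipeline above
pipeline = Pipeline([
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor(n_estimators=200, random_state=42)),
])
pipeline.fit(X_train, y_train)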

Here is the detailed documentation and source code.

# show_doc(MLPipeline.prepare_data_for_training)

If you want to customize the data loading and preprocessing steps, you can do so using the DataFrameLoader and PreProcessor classes. Check the detailed documentation for these classes for more options.

# show_doc(MLPipeline)

