A light package for automatic model tuning and stacking

auto-modelling

Auto-modelling is a convenient library for training and tuning machine learning models automatically.

Its main features include the following:

  1. Preprocess columns of all data types (numeric, categorical, text).
  2. Train machine learning models and tune their parameters automatically.
  3. Return the top n best models with optimized parameters.
  4. Apply stacking to combine the n best models returned by the library, or your own fitted models, for a potentially better result.
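
To illustrate features 2 and 3, here is a minimal sketch of tuning several candidate models and keeping the n best. It uses scikit-learn's GridSearchCV directly and is not this package's own implementation; the helper name top_n_models, the parameter grids, and the synthetic data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def top_n_models(candidates, X, y, n_best=2, cv=3):
    """Tune each (estimator, param_grid) pair and return the n best fitted models.

    Hypothetical helper for illustration; not part of auto-modelling.
    """
    results = []
    for estimator, param_grid in candidates:
        search = GridSearchCV(estimator, param_grid, cv=cv)
        search.fit(X, y)
        # best_estimator_ is refitted on all of X, y with the best parameters.
        results.append((search.best_score_, search.best_estimator_))
    # Sort by cross-validated score, highest first.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in results[:n_best]]


# Synthetic data and small grids, chosen only to keep the example fast.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
candidates = [
    (ExtraTreesClassifier(random_state=0), {"n_estimators": [20, 50]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [20, 50]}),
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
]
best_models = top_n_models(candidates, X, y, n_best=2)
```

Each returned model is already fitted, so it can be used for prediction directly or passed on to a stacking step.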

The machine learning models include the following:

  • Classification:
    • ExtraTreesClassifier
    • RandomForestClassifier
    • KNeighborsClassifier
    • LogisticRegression
    • XGBClassifier
  • Regression:
    • ExtraTreesRegressor
    • GradientBoostingRegressor
    • AdaBoostRegressor
    • DecisionTreeRegressor
    • RandomForestRegressor
    • XGBRegressor
  • Stack:
    • for classification: LogisticRegression
    • for regression: LinearRegression
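
As a rough picture of what stacking with a LogisticRegression meta-model looks like, the sketch below combines three of the classifiers listed above using scikit-learn's StackingClassifier. This is an assumption-laden illustration, not the package's Stack class: the dataset is synthetic and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real, already preprocessed dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 models: their cross-validated predictions become the
# input features of the level-1 model.
level_0 = [
    ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# Level-1 model: LogisticRegression, as listed for classification above.
stacked = StackingClassifier(
    estimators=level_0,
    final_estimator=LogisticRegression(),
    cv=3,
)
stacked.fit(X_train, y_train)
accuracy = stacked.score(X_test, y_test)
```

For regression, the analogous scikit-learn class is StackingRegressor with LinearRegression as the final estimator.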

reference: https://github.com/EpistasisLab/tpot/blob

Installation

pip install auto-modelling

Usage Example

from auto_modelling.classification import GoClassify
from auto_modelling.regression import GoRegress
from auto_modelling.preprocess import DataManager
from auto_modelling.stack import Stack

# preprocessing data
# (x_train, x_test, y_train, y_test are assumed to be loaded beforehand)
dm = DataManager(directory = 'preprocess_tools')
x_train, x_test = dm.drop_sparse_columns(x_train, x_test)
x_train, x_test = dm.process_data(x_train, x_test)
# the encoders are stored in the directory passed to DataManager ('preprocess_tools' here).

# use the same processing tools to process new data
predict_data = dm.process_predict_data(predict_x)
# predict_x should have the same format as the original x_train/x_test

# classification
clf = GoClassify(n_best=1)
best = clf.train(x_train, y_train)
y_pred = best.predict(x_test)

# regression
reg = GoRegress(n_best=1)
best = reg.train(x_train, y_train)
y_pred = best.predict(x_test)

# get top 3 best models
clf = GoClassify(n_best=3)
bests = clf.train(x_train, y_train)
y_preds = [m.predict(x_test) for m in bests]

# Stack top 3 best models
stack = Stack(n_models = 3)
level_0_models, level_1_model = stack.train(x_train, y_train, x_test, y_test)

The root directory of this package contains two example scripts, test.py and sample.py. Run them with python test.py or python sample.py.

Development Guide

  • Clone the repo

  • Create the virtual environment

mkvirtualenv auto
workon auto
pip install -r requirements.txt

If you have issues installing xgboost, refer to:
https://xgboost.readthedocs.io/en/latest/build.html#
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en

Note

  • TO DO: feature selection, evaluation metrics

Thoughts

  • Ideally, any dataframe passed to this library should be processed automatically.
  1. pre-processing

    • drop columns that have too many nulls (Done)
    • fill NA for both numeric and non-numeric values (Done)
    • encode non-numeric values (Done)
    • scale values if needed
    • balance the dataset if needed
  2. model-training

    • mode = classification, regression, auto (Done)
    • split the dataset
    • tune parameters and select models (Done)
    • feature selection
    • return a model with parameters, columns and a script to process x_test (Done)
    • stack with customized fitted models (Done)
  3. model-evaluation
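
The first three pre-processing steps in the list above (drop sparse columns, fill missing values, encode non-numeric values) can be sketched with pandas as follows. This is a simplified illustration under stated assumptions, not the code behind DataManager: the function name preprocess, the 50% null threshold, the median fill, and the integer category codes are all choices made for the example.

```python
import pandas as pd


def preprocess(df, max_null_ratio=0.5):
    """Drop sparse columns, fill missing values, integer-encode non-numeric columns.

    Hypothetical helper for illustration; not part of auto-modelling.
    """
    # 1. Drop columns whose share of nulls exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_null_ratio].copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # 2a. Numeric column: fill gaps with the column median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # 2b. Non-numeric column: fill gaps with a placeholder value,
            # 3.  then encode each distinct value as an integer code.
            df[col] = df[col].fillna("missing").astype("category").cat.codes
    return df


raw = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Oslo", "Paris", None, "Oslo"],
    "mostly_empty": [None, None, None, 1.0],
})
clean = preprocess(raw)
```

Note that this sketch derives the fill values and codes from whatever dataframe it is given. For consistent results on new data, the fill values and encoders would need to be fitted on the training set and reused, which is why the usage example stores its encoders in a directory.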

Other reference

Packaging your project
