A light package for automatic model tuning and stacking
auto-modelling
Auto-modelling is a convenient library for training and tuning machine learning models automatically.
Its main features include the following:
- Preprocess columns of all data types (numeric, categorical, text).
- Train machine learning models and tune their parameters automatically.
- Return the top n best models with optimized parameters.
- Apply stacking to combine the n best models returned by the package, or your own fitted models, to get a potentially better result.
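As a rough illustration of the preprocessing feature, the sketch below drops sparse columns, fills missing values and integer-encodes categorical columns with pandas. The function names, the 0.5 null-ratio threshold, the median fill and the `"missing"` placeholder are all assumptions made for this example, not the package's actual implementation, and text columns are not covered.

```python
import pandas as pd


def drop_sparse_columns(train, test, max_null_ratio=0.5):
    """Drop columns whose share of nulls in `train` exceeds max_null_ratio."""
    null_ratio = train.isna().mean()
    keep = null_ratio[null_ratio <= max_null_ratio].index
    return train[keep], test[keep]


def fill_and_encode(train, test):
    """Fill nulls and integer-encode non-numeric columns, fitting on train only."""
    train, test = train.copy(), test.copy()
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            # Numeric column: fill with the training median (an assumed choice).
            fill = train[col].median()
            train[col] = train[col].fillna(fill)
            test[col] = test[col].fillna(fill)
        else:
            train[col] = train[col].fillna("missing")
            test[col] = test[col].fillna("missing")
            # Categories come from train; unseen test values map to -1.
            codes = {v: i for i, v in enumerate(sorted(train[col].unique()))}
            train[col] = train[col].map(codes)
            test[col] = test[col].map(codes).fillna(-1).astype(int)
    return train, test


x_train = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Oslo", "Rome", None, "Oslo"],
    "mostly_empty": [None, None, None, 1.0],
})
x_test = pd.DataFrame({
    "age": [None, 52.0],
    "city": ["Paris", "Rome"],
    "mostly_empty": [2.0, None],
})

train, test = drop_sparse_columns(x_train, x_test)
train, test = fill_and_encode(train, test)
print(train)
print(test)
```

Fitting the fill values and category codes on the training frame only, then reusing them on the test frame, mirrors the idea of storing encoders and applying the same ones to new data.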
The machine learning models include the following:
- Classification:
  - ExtraTreesClassifier
  - RandomForestClassifier
  - KNeighborsClassifier
  - LogisticRegression
  - XGBClassifier
- Regression:
  - ExtraTreesRegressor
  - GradientBoostingRegressor
  - AdaBoostRegressor
  - DecisionTreeRegressor
  - RandomForestRegressor
  - XGBRegressor
- Stack:
  - for classification: LogisticRegression
  - for regression: LinearRegression
reference: https://github.com/EpistasisLab/tpot/blob
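To make "tune parameters and return the top n best models" concrete, here is a minimal sketch using scikit-learn's `GridSearchCV` over a few of the classifiers listed above. The `CANDIDATES` grids, the `top_n_models` helper and the accuracy scoring are hypothetical choices for illustration; the search strategy the package actually uses may differ.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical, deliberately tiny search spaces; the package's grids may differ.
CANDIDATES = [
    (ExtraTreesClassifier(random_state=0), {"n_estimators": [50, 100]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
]


def top_n_models(x, y, n_best=3, cv=3):
    """Tune every candidate with grid search and return the n best fitted models."""
    results = []
    for estimator, grid in CANDIDATES:
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(x, y)
        # best_estimator_ is refitted on the full data with the best parameters.
        results.append((search.best_score_, search.best_estimator_))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in results[:n_best]]


x, y = load_iris(return_X_y=True)
bests = top_n_models(x, y, n_best=2)
for model in bests:
    print(type(model).__name__)
```

Ranking by cross-validated score rather than training score keeps the comparison between model families fair.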
Installation
pip install auto-modelling
Usage Example
from auto_modelling.classification import GoClassify
from auto_modelling.regression import GoRegress
from auto_modelling.preprocess import DataManager
from auto_modelling.stack import Stack
# preprocessing data
dm = DataManager(directory = 'preprocess_tools')
train, test = dm.drop_sparse_columns(x_train, x_test)
train, test = dm.process_data(x_train, x_test)
# the fitted encoders are stored in the directory passed to DataManager ('preprocess_tools' here).
# use the same processing tools to process new data;
# predict_x should have the same format as x_train/x_test
predict_data = dm.process_predict_data(predict_x)
# classification
clf = GoClassify(n_best=1)
best = clf.train(x_train, y_train)
y_pred = best.predict(x_test)
# regression
reg = GoRegress(n_best=1)
best = reg.train(x_train, y_train)
y_pred = best.predict(x_test)
# get top 3 best models
clf = GoClassify(n_best=3)
bests = clf.train(x_train, y_train)
y_preds = [m.predict(x_test) for m in bests]
# Stack top 3 best models
stack = Stack(n_models = 3)
level_0_models, level_1_model = stack.train(x_train, y_train, x_test, y_test)
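For readers unfamiliar with stacking, the sketch below shows the general technique with scikit-learn's own `StackingClassifier`: several level-0 models feed their cross-validated predictions into a level-1 `LogisticRegression`. This is not the package's `Stack` class; the dataset, model choices and hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y
)

# Level-0 models: three classifiers from the list supported by the package.
level_0 = [
    ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Level-1 model: LogisticRegression trained on out-of-fold level-0 predictions.
stacked = StackingClassifier(
    estimators=level_0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacked.fit(x_train, y_train)
accuracy = stacked.score(x_test, y_test)
print(f"stacked accuracy: {accuracy:.3f}")
```

Using out-of-fold predictions (the `cv` argument) keeps the level-1 model from learning on predictions the level-0 models made for rows they were trained on.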
The example scripts test.py and sample.py are in the root directory of this package. Run python test.py or python sample.py.
Development Guide
- Clone the repo
- Create the virtual environment
  mkvirtualenv auto
  workon auto
  pip install -r requirements.txt
If you have issues installing xgboost, refer to:
https://xgboost.readthedocs.io/en/latest/build.html#
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en
Note
- TO DO: feature selection, evaluation metrics
Thoughts
- Ideally, any dataframe thrown into this repo should get processed.
- pre-processing
  - drop columns that have too many nulls (Done)
  - fill NA for both numeric and non-numeric values (Done)
  - encode non-numeric values (Done)
  - scale values if needed
  - balance the dataset if needed
- model-training
  - mode = classification, regression, auto (Done)
  - split the dataset
  - tune parameters and select models (Done)
  - feature selection
  - return a model with parameters, columns and a script to process x_test (Done)
  - stacking with customized fitted models (Done)
- model-evaluation
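One way the "auto" mode listed under model-training could decide between classification and regression is to inspect the target column. The sketch below uses scikit-learn's `type_of_target` for that; the `infer_mode` helper is hypothetical, and how the package actually picks the mode is not stated here.

```python
from sklearn.utils.multiclass import type_of_target


def infer_mode(y):
    """Guess the task type from the target values (hypothetical helper)."""
    kind = type_of_target(y)
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind == "continuous":
        return "regression"
    raise ValueError(f"unsupported target type: {kind}")


print(infer_mode([0, 1, 1, 0]))             # classification
print(infer_mode(["cat", "dog", "bird"]))   # classification
print(infer_mode([0.5, 1.7, 3.2]))          # regression
```

A caveat of this heuristic: integer-valued regression targets (for example counts) are reported as multiclass, so an explicit mode setting is still useful.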
Other reference