A fast, simple way to train machine learning algorithms

ML Automator

Author: Kevin Vecmanis

Machine Learning Automator (ML Automator) is an automation project that integrates Sequential Model-Based Optimization (SMBO) with the main learning algorithms from Python's scikit-learn library to produce a fast, automated tool for tuning machine learning algorithms. MLAutomator uses a library called Hyperopt to accomplish this.

What is SMBO?

SMBO is a form of hyperparameter tuning, like grid search and randomized search. In contrast to grid and randomized search, however, SMBO uses Bayesian optimization to build a probability model, through trial and error, that predicts which hyperparameters are likely to produce a better model. "Sequential" just means that multiple trials are run, one after another, each time testing more promising hyperparameters by applying Bayesian reasoning and updating the existing probability model.
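The loop described above can be sketched in plain Python. This is a toy illustration only, not Hyperopt's implementation: the objective, the kernel density helper, and the good/bad split are simplified assumptions, loosely modelled on the idea of rating candidates by how likely they are to belong to the better trials.

```python
import math
import random


def objective(x):
    """Toy loss standing in for one expensive training run (assumption)."""
    return (x - 2.0) ** 2


def kde(x, points, bandwidth=0.5):
    """Average Gaussian kernel density of x under a set of points."""
    if not points:
        return 1e-12
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / len(points) + 1e-12


def smbo_minimize(loss_fn, low, high, n_trials=30, n_startup=5, seed=0):
    rng = random.Random(seed)
    history = []  # (x, loss) pairs, one per trial
    for trial in range(n_trials):
        if trial < n_startup:
            # No model yet: explore at random.
            x = rng.uniform(low, high)
        else:
            # Update the model: split past trials into "good" and "bad".
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, len(ranked) // 4)
            good = [px for px, _ in ranked[:n_good]]
            bad = [px for px, _ in ranked[n_good:]]
            # Propose candidates near good points, then keep the one the
            # model rates most likely to be good rather than bad.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), 0.5)))
                for _ in range(20)
            ]
            x = max(candidates, key=lambda c: kde(c, good) / kde(c, bad))
        history.append((x, loss_fn(x)))
    return min(history, key=lambda pair: pair[1]), history


(best_x, best_loss), history = smbo_minimize(objective, -5.0, 5.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Each pass through the loop spends a little extra time ranking past trials and scoring candidates before it pays for the next call to the objective.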

The trade-off is that SMBO spends more time between iterations selecting the next set of hyperparameters. This is acceptable because the time taken to choose the next hyperparameters is typically significantly less than the time taken by each training iteration. In other words, SMBO typically results in:

Reduced time tuning hyperparameters compared to grid and random search methods.
Better scores on the testing set.
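As a back-of-the-envelope illustration of that trade-off, the sketch below compares total tuning time for the two approaches. Every number in it is hypothetical and chosen only to show the arithmetic; none of them are measurements of MLAutomator or Hyperopt.

```python
def total_tuning_seconds(n_trials, train_seconds, select_seconds=0.0):
    """Total time when every trial pays for selection plus training."""
    return n_trials * (train_seconds + select_seconds)


# Hypothetical numbers for illustration only, not measurements.
grid_search = total_tuning_seconds(n_trials=200, train_seconds=30.0)
smbo = total_tuning_seconds(n_trials=50, train_seconds=30.0, select_seconds=0.5)

print(f"grid search: {grid_search:.0f} s")
print(f"SMBO:        {smbo:.0f} s")
print(f"share of SMBO time spent selecting: {50 * 0.5 / smbo:.1%}")
```

Under these assumptions the selection overhead is a small part of the total, so the saving comes almost entirely from needing fewer training runs.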

Key features:

Optimizes across data pre-processing and feature selection in addition to hyperparameters.
Fast, intelligent scan of parameter space using Hyperopt.
Optimized parameter search permits scanning a larger cross section of algorithms in the same period of time.
An exceptional spot-checking algorithm.
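To make the first feature concrete, here is a minimal sketch of a joint search space in which the algorithm, the scaler, the number of selected features, and the hyperparameters are all sampled together. The field names (scaler, k_best, C, n_neighbors and so on) mirror the result printouts later in this description, but the option lists, the ranges, and the sample_configuration helper are assumptions for illustration and not MLAutomator's actual search space.

```python
import math
import random

# Hypothetical search space. Field names mirror the result printouts in this
# description; the option lists and ranges are illustrative assumptions.
SEARCH_SPACE = {
    "Logistic Regression": {
        "scaler": [None, "StandardScaler", "RobustScaler"],
        "k_best": [2, 4, 6, 8],
        "C": (0.001, 10.0),  # (low, high) tuple: sampled log-uniformly
        "penalty": ["l2"],
    },
    "K-Nearest Neighbors Classification": {
        "scaler": [None, "StandardScaler", "RobustScaler"],
        "k_best": [2, 4, 6, 8],
        "n_neighbors": [2, 3, 5, 8],
        "weights": ["uniform", "distance"],
    },
}


def sample_configuration(space, rng):
    """Draw one algorithm plus its pre-processing and hyperparameter choices."""
    algorithm = rng.choice(sorted(space))
    config = {"algorithm": algorithm}
    for name, options in space[algorithm].items():
        if isinstance(options, tuple):
            low, high = options
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.choice(options)
    return config


rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(SEARCH_SPACE, rng))
```

This sketch samples at random; an SMBO tool would use the results of earlier trials to decide which configuration to draw next.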

Usage

MLAutomator accepts a training dataset X and a target Y. The user can define their own functions for how these datasets are produced. Note that MLAutomator is designed to be a highly optimized spot-checking algorithm: you should make sure your data is free from errors and that any missing values have been dealt with.

MLAutomator will find ways of transforming and pre-processing your data to produce a superior model. Feel free to make your own transformations before passing the data to MLAutomator.
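One way to check for missing values before handing data over is sketched below, using only the standard library. The load_clean_rows helper and the sample rows are made up for illustration, and the sketch assumes the target sits in the last column; it is not part of MLAutomator.

```python
import csv
import io
import math


def load_clean_rows(csv_text, target_column=-1):
    """Parse numeric CSV text, dropping rows with missing or non-numeric cells.

    A header row, if present, is non-numeric and is therefore dropped too.
    """
    features, targets, dropped = [], [], 0
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            values = [float(cell) for cell in row]
        except ValueError:
            dropped += 1  # empty or non-numeric cell
            continue
        if not values or any(math.isnan(v) for v in values):
            dropped += 1  # blank line or explicit NaN
            continue
        targets.append(values.pop(target_column))
        features.append(values)
    return features, targets, dropped


# Made-up sample: the second row has a missing value.
sample = "6,148,72,1\n1,85,,0\n8,183,64,1\n"
x_rows, y_values, dropped = load_clean_rows(sample)
print(len(x_rows), "rows kept,", dropped, "dropped")
```

The resulting lists can then be converted with numpy.asarray to get the ndarray inputs that MLAutomator expects.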

Optional data utilities

I'm building a suite of data utility functions that can prepare most classification and regression datasets. These, however, are optional: MLAutomator only requires X and Y inputs in the form of NumPy ndarrays.

from data.utilities import clf_prep

x, y = clf_prep('pima-indians-diabetes.csv')

Once you have training and target data, this is the main call to MLAutomator:

Classification Example: 2-class

from mlautomator.mlautomator import MLAutomator

automator = MLAutomator(x, y, iterations = 25)
automator.find_best_algorithm()
automator.print_best_space()

MLAutomator can typically find a ~98th percentile solution in a fraction of the time of grid search or randomized search. Here it scanned the hyperparameters of 6 common machine learning algorithms and produced strong model performance on the classic Pima Indians Diabetes dataset.

Best Algorithm Configuration:
    Best algorithm: Logistic Regression
    Best accuracy : 77.73239917976761%
    C : 0.02341
    k_best : 6
    penalty : l2
    scaler : RobustScaler(
        copy=True, 
        quantile_range=(25.0, 75.0), 
        with_centering=True,
        with_scaling=True)
    solver : lbfgs
    Found best solution on iteration 132 of 150
    Validation used: 10-fold cross-validation
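The last line of the printout refers to 10-fold cross-validation: the data is split into ten parts, and each part takes one turn as the held-out test set while the other nine are used for training. The sketch below shows the index bookkeeping in plain Python. The k_fold_indices helper is hypothetical, and whether MLAutomator shuffles or stratifies its folds is not stated here, so treat this as an unshuffled illustration only.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(25, n_folds=10))
print([len(test) for _, test in folds])
```

The reported score is then typically the average of the per-fold scores, which is why it is less sensitive to one lucky or unlucky split.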

Classification Example: Multi-class

Here are the results for the classic iris dataset, a multi-class classification problem with three classes:

from data.utilities import from_sklearn
from mlautomator.mlautomator import MLAutomator

x, y = from_sklearn('iris')
automator = MLAutomator(x, y, iterations = 30, algo_type = 'classifier', score_metric = 'accuracy')
automator.find_best_algorithm()
automator.print_best_space()

Best Algorithm Configuration:
    Best algorithm: Bag of Support Vector Machine Classifiers
    Best accuracy : 96.67%
    C : 0.7064
    degree : 2
    gamma : auto
    k_best : 2
    kernel : rbf
    n_estimators : 9
    probability : True
    scaler : None
    Found best solution on iteration 3 of 30
    Validation used: 10-fold cross-validation

Regression Example

ML Automator supports regression problems as well. In this example we call the Boston Housing dataset from sklearn.datasets using one of our utility functions.

from data.utilities import from_sklearn
from mlautomator.mlautomator import MLAutomator

x, y = from_sklearn('boston')
automator = MLAutomator(x, y, iterations = 30, algo_type = 'regressor', score_metric = 'neg_mean_squared_error')
automator.find_best_algorithm()
automator.print_best_space()

Best Algorithm Configuration:
    Best algorithm: K-Neighbor Regressor
    Best neg_mean_squared_error : 10.41395782834094
    algorithm : kd_tree
    k_best : 11
    n_neighbors : 2
    scaler : StandardScaler(copy=True, with_mean=True, with_std=True)
    weights : distance
    Found best solution on iteration 24 of 30
    Validation used: 10-fold cross-validation
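A note on reading the score: in scikit-learn, the 'neg_mean_squared_error' scorer returns the negated mean squared error so that larger values are always better. The printout above shows a positive number, so the sign appears to have been flipped back for display (this is an assumption based on the output). The toy values below are made up and only show how the quantities relate.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

mse = mean_squared_error(y_true, y_pred)
neg_mse = -mse          # what a "greater is better" scorer would report
rmse = math.sqrt(mse)   # same units as the target

print(f"MSE = {mse:.4f}, negated = {neg_mse:.4f}, RMSE = {rmse:.4f}")
```

Taking the square root of the printed 10.41 gives an error of about 3.23 in the units of the target, which is often easier to interpret.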

Existing Algorithm Support

MLAutomator currently supports the following algorithms:

Classification:

XGBoost Classifier
Random Forest Classifier
Support Vector Machines
Naive Bayes Classifier
Stochastic Gradient Descent Classification (SGD)
K-Nearest Neighbors Classification
Logistic Regression

Regression:

XGBoost Regressor
Random Forest Regressor
Support Vector Machine Regression
SGD Regression
K-Nearest Neighbors Regression

Unless otherwise declared using the specific_algos argument, MLAutomator will scan all algorithms to find the best performer.
