A fast, simple way to train machine learning algorithms

ML Automator

Author: Kevin Vecmanis

Machine Learning Automator (ML Automator) is an automation project that integrates Sequential Model-Based Optimization (SMBO) with the main learning algorithms from Python's scikit-learn library to produce a fast, automated tool for tuning machine learning algorithms. MLAutomator uses a library called Hyperopt to accomplish this. See the Hyperopt documentation to read more about it.

What is SMBO?

SMBO is a form of hyperparameter tuning, like grid search and randomized search. In contrast to grid and randomized search, however, SMBO uses Bayesian optimization to build a probability model, through trial and error, that predicts which hyperparameters are likely to produce a better model. "Sequential" means that trials are run one after another, each time choosing more promising hyperparameters by applying Bayesian reasoning and updating the existing probability model.

The trade-off is that SMBO spends more time between iterations selecting the next set of hyperparameters. This is acceptable because the extra selection time is typically significantly less than the time each training iteration takes. In other words, SMBO typically results in:

  • Reduced time tuning hyperparameters compared to grid and random search methods.
  • Better scores on the testing set.
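
The sequential loop described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Hyperopt's actual algorithm: the objective function is a made-up stand-in for a cross-validated loss, and the "probability model" is just a pair of simple density estimates over the good and the bad trials seen so far. All function names here are hypothetical.

```python
import math
import random


def objective(c):
    """Hypothetical stand-in for a cross-validated loss; lowest at c = 2.0."""
    return (c - 2.0) ** 2


def density(x, points, bandwidth):
    """Average of Gaussian bumps centred on previously tried values."""
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / len(points)


def sequential_search(n_trials, low, high, rng, n_startup=5, n_candidates=20, bandwidth=1.0):
    """Toy sequential model-based search over one hyperparameter.

    n_startup must be at least 2 so that both groups below are non-empty.
    """
    history = []  # (loss, value) for every completed trial
    for trial in range(n_trials):
        if trial < n_startup:
            # No model yet: begin with a few purely random trials.
            value = rng.uniform(low, high)
        else:
            # "Model" step: split past trials into a good quarter and the rest.
            ranked = sorted(history)
            n_good = max(1, len(ranked) // 4)
            good = [v for _, v in ranked[:n_good]]
            bad = [v for _, v in ranked[n_good:]]
            # Propose candidates near good values, then keep the one that looks
            # most like the good group and least like the bad group.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            value = max(
                candidates,
                key=lambda x: density(x, good, bandwidth) / (density(x, bad, bandwidth) + 1e-12),
            )
        # The expensive step in real tuning: train and score a model.
        history.append((objective(value), value))
    return min(history)  # (best_loss, best_value)


best_loss, best_value = sequential_search(30, -10.0, 10.0, random.Random(0))
print(f"best loss {best_loss:.4f} at c = {best_value:.4f}")
```

Choosing each candidate costs a few density evaluations, which is the "extra time between iterations" mentioned above. In real tuning that cost is small next to training a model for every trial.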

Key features:

  • Optimizes across data pre-processing and feature selection in addition to hyperparameters.
  • Fast, intelligent scan of parameter space using Hyperopt.
  • Optimized parameter search permits scanning a larger cross section of algorithms in the same period of time.
  • An exceptional spot-checking algorithm.
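
Optimizing across pre-processing, feature selection, and hyperparameters amounts to treating the whole pipeline as one search space. The sketch below is only an assumption about how such a space could be laid out: it borrows parameter names from the example output on this page (scaler, k_best, C, penalty), the ranges are invented, and plain random sampling stands in for Hyperopt's samplers.

```python
import math
import random

# Hypothetical joint search space for a logistic-regression pipeline.
# Lists and ranges are categorical choices; a (low, high) tuple is sampled
# log-uniformly. The ranges are illustrative, not MLAutomator's own.
SEARCH_SPACE = {
    "scaler": ["StandardScaler", "RobustScaler", "MinMaxScaler", None],  # pre-processing
    "k_best": range(1, 9),   # feature selection: keep the k best of 8 assumed columns
    "C": (1e-3, 1e2),        # model hyperparameter
    "penalty": ["l2"],       # model hyperparameter
}


def sample_configuration(space, rng):
    """Draw one complete pipeline configuration from the joint space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.choice(list(domain))
    return config


print(sample_configuration(SEARCH_SPACE, random.Random(1)))
```

Because every trial draws the scaler, the number of features, and the model settings together, the optimizer can learn interactions between them instead of tuning each stage in isolation.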

Usage

MLAutomator accepts a training dataset X and a target Y. The user can define their own functions for how these datasets are produced. Note that MLAutomator is designed to be a highly optimized spot-checking algorithm: you should make sure your data is free from errors and that any missing values have been dealt with.

MLAutomator will find ways of transforming and pre-processing your data to produce a superior model. Feel free to make your own transformations before passing the data to MLAutomator.
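
Since clean inputs are your responsibility, a quick check before the call can catch problems early. The helper below is hypothetical and not part of MLAutomator. It is a minimal sketch that assumes X is a sequence of rows (a 2-D numpy ndarray or a list of lists) and Y is a flat sequence of the same length.

```python
import math


def is_missing(value):
    """True for None or NaN; non-numeric values such as strings are not flagged."""
    if value is None:
        return True
    try:
        return math.isnan(value)
    except TypeError:
        return False


def check_inputs(x, y):
    """Raise ValueError if x and y differ in length or contain missing values."""
    if len(x) != len(y):
        raise ValueError(f"x has {len(x)} rows but y has {len(y)} targets")
    for i, (row, target) in enumerate(zip(x, y)):
        if is_missing(target) or any(is_missing(v) for v in row):
            raise ValueError(f"missing value in sample {i}")


check_inputs([[1.0, 2.0], [3.0, 4.0]], [0, 1])  # passes silently
```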

Optional data utilities

I'm building a suite of data utility functions which can prepare most classification and regression datasets. These, however, are optional - MLAutomator only requires X and Y inputs in the form of a numpy ndarray.

from data.utilities import clf_prep

x, y = clf_prep('pima-indians-diabetes.csv')
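
If you prefer to prepare the data yourself, something along these lines is enough for a simple numeric file. This is a hypothetical stand-in, not the implementation of clf_prep: it assumes a headerless, all-numeric CSV whose last column is the target, and the two rows shown are made up.

```python
import csv
import io


def load_xy(csv_file, target_column=-1):
    """Read a headerless numeric CSV and split it into features and target."""
    x, y = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        y.append(values.pop(target_column))
        x.append(values)
    return x, y


# Made-up rows standing in for a real file opened with open(path, newline="").
sample = io.StringIO("1.0,2.0,3.0,0\n4.0,5.0,6.0,1\n")
x, y = load_xy(sample)
# Convert with numpy.asarray(x) and numpy.asarray(y) before passing them on,
# since MLAutomator expects numpy ndarrays.
```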

Once you have training and target data, the main call to MLAutomator looks like this:

Classification Example: 2-class

from mlautomator.mlautomator import MLAutomator

automator = MLAutomator(x, y, iterations = 25)
automator.find_best_algorithm()
automator.print_best_space()

MLAutomator can typically find a ~98th-percentile solution in a fraction of the time of grid search or randomized search. Here it scanned all hyperparameters for 6 common machine learning algorithms and produced strong model performance on the classic Pima Indians Diabetes dataset.

Best Algorithm Configuration:
    Best algorithm: Logistic Regression
    Best accuracy : 77.73239917976761%
    C : 0.02341
    k_best : 6
    penalty : l2
    scaler : RobustScaler(
        copy=True, 
        quantile_range=(25.0, 75.0), 
        with_centering=True,
        with_scaling=True)
    solver : lbfgs
    Found best solution on iteration 132 of 150
    Validation used: 10-fold cross-validation
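
The report above says each configuration was scored with 10-fold cross-validation. As a reminder of what that means, here is a minimal sketch of how sample indices can be divided into folds. It is not MLAutomator's own splitting code, which may shuffle or stratify the data, and the sample count of 768 is an assumed figure for illustration.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(768))  # 768 is an assumed sample count
for train, test in folds:
    # Fit on `train`, score on `test`; the reported accuracy is the mean of the 10 scores.
    pass
```

Every sample is used for testing exactly once, so the averaged score is less dependent on one lucky or unlucky split than a single train/test partition.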

Classification Example: Multi-class

Here are the results from the classic Iris dataset, a multi-class classification problem with three classes.

from data.utilities import from_sklearn
from mlautomator.mlautomator import MLAutomator

x, y = from_sklearn('iris')
automator = MLAutomator(x, y, iterations = 30, algo_type = 'classifier', score_metric = 'accuracy')
automator.find_best_algorithm()
automator.print_best_space()

Best Algorithm Configuration:
    Best algorithm: Bag of Support Vector Machine Classifiers
    Best accuracy : 96.67%
    C : 0.7064
    degree : 2
    gamma : auto
    k_best : 2
    kernel : rbf
    n_estimators : 9
    probability : True
    scaler : None
    Found best solution on iteration 3 of 30
    Validation used: 10-fold cross-validation

Regression Example

ML Automator supports regression problems as well. In this example we call the Boston Housing dataset from sklearn.datasets using one of our utility functions.

from data.utilities import from_sklearn
from mlautomator.mlautomator import MLAutomator

x, y = from_sklearn('boston')
automator = MLAutomator(x, y, iterations = 30, algo_type = 'regressor', score_metric = 'neg_mean_squared_error')
automator.find_best_algorithm()
automator.print_best_space()

Best Algorithm Configuration:
    Best algorithm: K-Neighbor Regressor
    Best neg_mean_squared_error : 10.41395782834094
    algorithm : kd_tree
    k_best : 11
    n_neighbors : 2
    scaler : StandardScaler(copy=True, with_mean=True, with_std=True)
    weights : distance
    Found best solution on iteration 24 of 30
    Validation used: 10-fold cross-validation
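
A note on reading the score: scikit-learn's 'neg_mean_squared_error' metric is the mean squared error with its sign flipped, so that a larger score is always better. The report above prints a positive number, which suggests the sign is flipped back for display, though that is an assumption based on the output alone. The sketch below shows the relationship in plain Python with made-up values.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mean_squared_error(y_true, y_pred):
    """Same quantity with the sign flipped, so that higher means better."""
    return -mean_squared_error(y_true, y_pred)


y_true = [1.0, 2.0, 3.0, 4.0]  # made-up targets
y_pred = [1.0, 2.0, 3.0, 6.0]  # made-up predictions
mse = mean_squared_error(y_true, y_pred)        # 1.0
score = neg_mean_squared_error(y_true, y_pred)  # -1.0

# Taking the square root puts an MSE back in the units of the target.
reported_rmse = math.sqrt(10.41395782834094)  # roughly 3.23 for the value printed above
```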

Existing Algorithm Support

MLAutomator currently supports the following algorithms:

Classification:

  • XGBoost Classifier
  • Random Forest Classifier
  • Support Vector Machines
  • Naive Bayes Classifier
  • Stochastic Gradient Descent Classification (SGD)
  • K-Nearest Neighbors Classification
  • Logistic Regression

Regression:

  • XGBoost Regressor
  • Random Forest Regressor
  • Support Vector Machine Regression
  • SGD Regression
  • K-Nearest Neighbors Regression

Unless otherwise declared using the specific_algos argument, MLAutomator will scan all algorithms to find the best performer.
