Bayesian Tuning and Bandits

BTB: An open source project from the Data to AI Lab at MIT.

A simple, extensible backend for developing auto-tuning systems.

Overview

Bayesian Tuning and Bandits is a simple, extensible backend for developing auto-tuning systems such as AutoML systems. It is currently being used in ATM (an AutoML system that allows tuning of classifiers) and in MIT's system for the DARPA Data-Driven Discovery of Models (D3M) program.

BTB is under active development. If you come across any issues, please report them on the project's GitHub issues page.

Install

Requirements

BTB has been developed and tested on Python 3.5, 3.6 and 3.7.

Also, although it is not strictly required, using a virtualenv is highly recommended in order to avoid interfering with other software installed on the system where BTB is run.

These are the minimum commands needed to create a virtualenv using python3.6 for BTB:

pip install virtualenv
virtualenv -p $(which python3.6) btb-venv

Afterwards, activate the virtualenv by executing this command:

source btb-venv/bin/activate

Remember to execute it every time you start a new console to work on BTB!

Install using Pip

After creating the virtualenv and activating it, we recommend using pip in order to install BTB:

pip install baytune

This will pull and install the latest stable release from PyPI.

Install from Source

With your virtualenv activated, you can clone the repository and install it from source by running make install on the stable branch:

git clone git@github.com:HDI-Project/BTB.git
cd BTB
git checkout stable
make install

Install for Development

If you want to contribute to the project, a few more steps are required to make the project ready for development.

Please head to the Contributing Guide for more details about this process.

Quickstart

Tuners

Tuners are specifically designed to speed up the process of selecting the optimal hyperparameter values for a specific machine learning algorithm.

btb.tuning.tuners defines Tuners: classes with a fit/predict/propose interface for suggesting sets of hyperparameters.

This is done by following a Bayesian Optimization approach and iteratively:

  • letting the tuner propose new sets of hyperparameters
  • fitting and scoring the model with the proposed hyperparameters
  • passing the obtained score back to the tuner

At each iteration, the tuner uses the information already obtained to propose the set of hyperparameters that it considers most likely to obtain the best results.

To instantiate a Tuner, all we need is a Tunable instance with a collection of hyperparameters.

>>> from btb.tuning import Tunable
>>> from btb.tuning.tuners import GPTuner
>>> from btb.tuning.hyperparams import IntHyperParam
>>> hyperparams = {
...     'n_estimators': IntHyperParam(min=10, max=500),
...     'max_depth': IntHyperParam(min=3, max=20),
... }
>>> tunable = Tunable(hyperparams)
>>> tuner = GPTuner(tunable)

Then we perform the following three steps in a loop.

  1. Let the Tuner propose a new set of parameters:

    >>> parameters = tuner.propose()
    >>> parameters
    {'n_estimators': 297, 'max_depth': 3}
    
  2. Fit and score a new model using these parameters:

    >>> from sklearn.ensemble import RandomForestClassifier
    >>> model = RandomForestClassifier(**parameters)
    >>> model.fit(X_train, y_train)
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                max_depth=3, max_features='auto', max_leaf_nodes=None,
                min_impurity_decrease=0.0, min_impurity_split=None,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, n_estimators=297, n_jobs=1,
                oob_score=False, random_state=None, verbose=0,
                warm_start=False)
    >>> score = model.score(X_test, y_test)
    >>> score
    0.77
    
  3. Pass the used parameters and the score obtained back to the tuner:

    >>> tuner.record(parameters, score)
    

At each iteration, the Tuner will use the information about the previous trials to propose the set of parameter values that it considers to have the highest probability of obtaining the best score.
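
Putting these steps together, here is a minimal, end-to-end sketch of the tuning loop. The dataset and model used (scikit-learn's breast cancer dataset and a RandomForestClassifier), as well as the number of iterations, are illustrative assumptions and not part of BTB itself:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from btb.tuning import Tunable
from btb.tuning.hyperparams import IntHyperParam
from btb.tuning.tuners import GPTuner

# Illustrative dataset; any train/test split works here.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tunable = Tunable({
    'n_estimators': IntHyperParam(min=10, max=500),
    'max_depth': IntHyperParam(min=3, max=20),
})
tuner = GPTuner(tunable)

best_score = None
best_parameters = None
for _ in range(10):
    # 1. let the tuner propose a new set of hyperparameters
    parameters = tuner.propose()

    # 2. fit and score a model with the proposed hyperparameters
    model = RandomForestClassifier(**parameters)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)

    # 3. pass the obtained score back to the tuner
    tuner.record(parameters, score)

    if best_score is None or score > best_score:
        best_score, best_parameters = score, parameters

print(best_score, best_parameters)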

Selectors

Selectors are intended to be used in combination with tuners in order to decide which model is likely to get the best results once it is properly fine-tuned.

In order to use a selector, we create a Tuner instance for each model that we want to try out, as well as a Selector instance.

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.svm import SVC
>>> from btb.selection import UCB1
>>> from btb.tuning.hyperparams import FloatHyperParam
>>> models = {
...     'RF': RandomForestClassifier,
...     'SVC': SVC
... }
>>> selector = UCB1(['RF', 'SVC'])
>>> rf_hyperparams = {
...     'n_estimators': IntHyperParam(min=10, max=500),
...     'max_depth': IntHyperParam(min=3, max=20)
... }
>>> rf_tunable = Tunable(rf_hyperparams)
>>> svc_hyperparams = {
...     'C': FloatHyperParam(min=0.01, max=10.0),
...     'gamma': FloatHyperParam(min=0.000000001, max=0.0000001)
... }
>>> svc_tunable = Tunable(svc_hyperparams)
>>> tuners = {
...     'RF': GPTuner(rf_tunable),
...     'SVC': GPTuner(svc_tunable)
... }

Then we perform the following steps in a loop.

  1. Pass all the obtained scores to the selector and let it decide which model to test.

    >>> next_choice = selector.select({
    ...     'RF': tuners['RF'].scores,
    ...     'SVC': tuners['SVC'].scores
    ... })
    >>> next_choice
    'RF'
    
  2. Obtain a new set of parameters from the indicated tuner and create a model instance.

    >>> parameters = tuners[next_choice].propose()
    >>> parameters
    {'n_estimators': 289, 'max_depth': 18}
    >>> model = models[next_choice](**parameters)
    
  3. Evaluate the score of the new model instance and pass it back to the tuner:

    >>> model.fit(X_train, y_train)
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                max_depth=18, max_features='auto', max_leaf_nodes=None,
                min_impurity_decrease=0.0, min_impurity_split=None,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, n_estimators=289, n_jobs=1,
                oob_score=False, random_state=None, verbose=0,
                warm_start=False)
    >>> score = model.score(X_test, y_test)
    >>> score
    0.89
    >>> tuners[next_choice].record(parameters, score)
    
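Putting the three steps together, a minimal sketch of the combined selection and tuning loop might look like this, reusing the models, tuners and selector defined above and the same illustrative train/test split as in the previous sketch:

for _ in range(10):
    # 1. let the selector decide which model to test next
    next_choice = selector.select({
        'RF': tuners['RF'].scores,
        'SVC': tuners['SVC'].scores
    })

    # 2. obtain a new set of parameters and build a model instance
    parameters = tuners[next_choice].propose()
    model = models[next_choice](**parameters)

    # 3. evaluate the model and pass the score back to its tuner
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    tuners[next_choice].record(parameters, score)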

What's next?

For more details about BTB and all its possibilities and features, please check the project documentation site!

Citing BTB

If you use BTB, please consider citing our related papers.

For the current design of BTB and its usage within the larger Machine Learning Bazaar project at the MIT Data To AI Lab, please see:

Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development." arXiv Preprint 1905.08942. 2019.

@article{smith2019mlbazaar,
  author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
  title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
  journal = {arXiv e-prints},
  year = {2019},
  eid = {arXiv:1905.08942},
  pages = {arXiv:1905.08942},
  archivePrefix = {arXiv},
  eprint = {1905.08942},
}

For the initial design of BTB, usage of Recommenders, and initial evaluation, please see:

Laura Gustafson. "Bayesian Tuning and Bandits: An Extensible, Open Source Library for AutoML." Master's thesis, MIT EECS, June 2018.

@mastersthesis{gustafson2018bayesian,
  author = {Gustafson, Laura},
  title = {Bayesian Tuning and Bandits: An Extensible, Open Source Library for AutoML},
  month = {May},
  year = {2018},
  url = {https://dai.lids.mit.edu/wp-content/uploads/2018/05/Laura_MEng_Final.pdf},
  type = {M. Eng Thesis},
  school = {Massachusetts Institute of Technology},
  address = {Cambridge, MA},
}

History

0.3.4 - 2019-12-24

With this release we introduce a BTBSession class. This class represents the process of selecting and tuning several tunables until the best possible configuration for a specific scorer is found. We have also improved the code and fixed some minor bugs (described in the issues below).

New Features

  • BTBSession that makes BTB more user friendly.

Internal Improvements

Improved unittests, removed old dependencies, added more MLChallenges and fixed an issue with the bound methods.

Resolved Issues

  • Issue #145: Implement BTBSession.
  • Issue #155: Setting default to None for CategoricalHyperParam is not possible.
  • Issue #157: Metamodel _MODEL_KWARGS_DEFAULT becomes mutable.
  • Issue #158: Remove mock dependency from the package.
  • Issue #160: Add more Machine Learning Challenges and more estimators.

0.3.3 - 2019-12-11

Fix a bug where creating an instance of a Tuner ended in an error.

Internal Improvements

Improve unittests to use spec_set in order to detect errors while mocking an object.

Resolved Issues

  • Issue #153: Bug with the tuner logger message that prevents creating the Tuner.

0.3.2 - 2019-12-10

With this release we add the new benchmark challenge MLChallenge, which allows users to perform benchmarking over datasets with machine learning estimators, as well as some new features to make the workflow easier.

New Features

  • New MLChallenge challenge that allows performing cross-validation over datasets and machine learning estimators.
  • New from_dict function for the Tunable class in order to instantiate it from a dictionary that contains information about its hyperparameters.
  • New default value for each hyperparameter type.

Resolved Issues

  • Issue #68: Remove btb.tuning.constants module.
  • Issue #120: Tuner repr not helpful.
  • Issue #121: HyperParameter repr not helpful.
  • Issue #141: Implement proper logging for the tuning section.
  • Issue #150: Implement Tunable from_dict.
  • Issue #151: Add default value for hyperparameters.
  • Issue #152: Support None as a choice in CategoricalHyperParameters.

0.3.1 - 2019-11-25

With this release we introduce a benchmark module for BTB which allows users to perform a benchmark over a series of challenges.

New Features

  • New benchmark module.
  • New submodule named challenges to work together with the benchmark module.

Resolved Issues

  • Issue #139: Implement a Benchmark for BTB.

0.3.0 - 2019-11-11

With this release we introduce an improved BTB that features a major reorganization of the project, with emphasis on an easier way of interacting with BTB and an easier way of developing, testing and contributing new acquisition functions, metamodels, tuners and hyperparameters.

New project structure

The major reorganization comes with the btb.tuning module. This module provides everything needed for the tuning process and comes with three new additions: Acquisition, Metamodel and Tunable. There is also an update to the Hyperparameters and Tuners. These changes are meant to help developers and contributors easily develop, test and contribute new Tuners.

New API

There is a slightly new way of using BTB, as the new Tunable class is introduced and is meant to be the only object required to instantiate a Tuner. This Tunable class represents a collection of HyperParams that need to be tuned as a whole, at once. Now, in order to create a Tuner, a Tunable instance must first be created with the hyperparameters of the objective function.

New Features

  • New Hyperparameters that allow easier interaction for the end user.
  • New Tunable class that manages a collection of Hyperparameters.
  • New Tuner class that is a Python mixin requiring Acquisition and Metamodel as parents. It also now works with a single Tunable object.
  • New Acquisition class, meant to implement an acquisition function to be inherited by a Tuner.
  • New Metamodel class, meant to implement everything that a certain model needs and to be inherited by the Tuner.
  • Reorganization of the selection module to follow an API similar to tuning.

Resolved Issues

  • Issue #131: Reorganize the project structure.
  • Issue #133: Implement Tunable class to control a list of hyperparameters.
  • Issue #134: Implementation of Tuners for the new structure.
  • Issue #140: Reorganize selectors.

0.2.5

Bug Fixes

  • Issue #115: HyperParameter subclass instantiation not working properly

0.2.4

Internal Improvements

  • Issue #62: Test for None in HyperParameter.cast instead of HyperParameter.__init__

Bug fixes

  • Issue #98: Categorical hyperparameters do not support None as input
  • Issue #89: Fix the computation of avg_rewards in BestKReward

0.2.3

Bug Fixes

  • Issue #84: Error in GP tuning when only one parameter is present
  • Issue #96: Fix pickling of HyperParameters
  • Issue #98: Fix implementation of the GPEi tuner

0.2.2

Internal Improvements

  • Updated documentation

Bug Fixes

  • Issue #94: Fix unicode param_type causing an error on Python 2.

0.2.1

Bug fixes

  • Issue #74: ParamTypes.STRING tunables do not work

0.2.0

New Features

  • New Recommendation module
  • New HyperParameter types
  • Improved documentation and examples
  • Fully tested Python 2.7, 3.4, 3.5 and 3.6 compatibility
  • HyperParameter copy and deepcopy support
  • Replace print statements with logging

Internal Improvements

  • Integrated with Travis-CI
  • Exhaustive unit testing
  • New implementation of HyperParameter
  • Tuner builds a grid of real values instead of indices
  • Resolve Issue #29: Make args explicit in __init__ methods
  • Resolve Issue #34: make all imports explicit

Bug Fixes

  • Fix error from mixing string/numerical hyperparameters
  • Inverse transform for categorical hyperparameter returns single item

0.1.2

  • Issue #47: Add missing requirements in v0.1.1 setup.py
  • Issue #46: Error on v0.1.1: 'GP' object has no attribute 'X'

0.1.1

  • First release.
