An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

The Oboe systems

This bundle of libraries, Oboe and TensorOboe, provides automated machine learning (AutoML) systems that use collaborative filtering to find good models for supervised learning tasks within a user-specified time limit. Further hyperparameter tuning can be performed afterwards.

The name comes from the musical instrument: in an orchestra, the oboe plays an initial note that the other instruments use to tune to the right frequency before the performance begins. Our Oboe systems play a similar role in AutoML: we use meta-learning to select a promising set of models or to build an ensemble for a new dataset. Users can either use the selected models directly or fine-tune their hyperparameters further.
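The collaborative-filtering idea can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the library's implementation: the error matrix is synthetic and exactly low rank, and the models evaluated on the new dataset are picked by hand here rather than by experiment design. All names in it (error_matrix, model_features, observed) are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical meta-training data: errors of 6 models on 20 datasets,
# generated as an exactly rank-2 matrix for illustration.
n_datasets, n_models, rank = 20, 6, 2
model_factor = rng.rand(rank, n_models)
error_matrix = rng.rand(n_datasets, rank) @ model_factor

# Offline: factor the error matrix and keep rank-k latent model features.
_, _, vt = np.linalg.svd(error_matrix, full_matrices=False)
model_features = vt[:rank]  # shape (rank, n_models)

# Online: a new dataset arrives. Its full error row is unknown in practice;
# it is simulated here so the prediction can be checked.
true_new_errors = rng.rand(rank) @ model_factor

# Only a few models are actually run on the new dataset (hand-picked here).
observed = [0, 3, 4]

# Infer the new dataset's latent features by least squares on observed entries.
dataset_features, *_ = np.linalg.lstsq(
    model_features[:, observed].T, true_new_errors[observed], rcond=None
)

# Predict the error of every model, including those never run.
predicted_errors = dataset_features @ model_features
best_model = int(np.argmin(predicted_errors))
print("predicted best model index:", best_model)
```

Because the synthetic matrix is exactly rank 2, three observed entries are enough to recover the whole row. On real data the recovery is only approximate.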

On a new dataset:

  • Oboe searches for promising estimators (supervised learners) using matrix factorization and classical experiment design. It requires a pre-processed dataset: categorical features must be one-hot encoded, and all features must then be standardized to zero mean and unit variance. For a complete description, refer to our paper OBOE: Collaborative Filtering for AutoML Model Selection at KDD 2019.

  • TensorOboe searches for promising pipelines, which here are directed graphs of learning components such as imputation, encoding, standardization, dimensionality reduction and estimation. It can therefore accept a raw dataset, possibly with missing entries, mixed feature types, uncentered features, etc. For a complete description, refer to our paper AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space at KDD 2020.
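The pre-processing that Oboe expects (one-hot encoding followed by standardization) could look roughly like the sketch below. It uses pandas and scikit-learn, and the toy data frame and its column names are made up for illustration. This step is not part of the oboe API.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one categorical and two numeric columns.
raw = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [10.0, 20.0, 15.0, 30.0],
})

# Step 1: one-hot encode the categorical column.
encoded = pd.get_dummies(raw, columns=["color"], dtype=float)

# Step 2: standardize every column, including the one-hot indicators,
# to zero mean and unit variance.
x = StandardScaler().fit_transform(encoded.to_numpy())

print(x.shape)
```

When a separate test set is involved, the scaler would normally be fitted on the training split only and then applied to the test split.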

This bundle of systems is still under development and subject to change. For any questions, please submit an issue. The authors will respond as soon as possible.

Installation

The easiest way is to install using pip:

pip install oboe

Dependencies with verified versions

The following libraries are required. The versions in brackets are the versions that are verified to work. Older versions may work but are not guaranteed.

  • Python (3.7.3)
  • numpy (1.16.4)
  • scipy (1.4.1)
  • pandas (0.24.2)
  • scikit-learn (0.22.1)
  • multiprocessing (>=0.70.5)
  • tensorly (0.4.4)
  • OpenML (0.9.0)
  • mkl (>=1.0.0)
  • re
  • os
  • json
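To compare an environment against the verified versions, a small helper such as the one below may be handy. It is a sketch, not something shipped with oboe. The import names sklearn and openml are the usual ones for scikit-learn and OpenML, the helper name installed_versions is hypothetical, and packages that do not expose a __version__ attribute are reported as "unknown".

```python
import importlib

# Verified versions from the list above, keyed by import name.
VERIFIED = {
    "numpy": "1.16.4",
    "scipy": "1.4.1",
    "pandas": "0.24.2",
    "sklearn": "0.22.1",
    "tensorly": "0.4.4",
    "openml": "0.9.0",
}


def installed_versions(names):
    """Map each import name to its version, "unknown", or None if not importable."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # The package is missing or failed to import.
            report[name] = None
    return report


report = installed_versions(VERIFIED)
for name, verified in VERIFIED.items():
    print("{}: installed={}, verified={}".format(name, report[name], verified))
```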

Examples

For more detailed examples, please refer to the Jupyter notebooks in the example folder. A basic classification example:

method = 'Oboe' # 'Oboe' or 'TensorOboe'
problem_type = 'classification'

from oboe import AutoLearner, error

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load the iris dataset and hold out 20% of it for testing
data = load_iris()
x = np.array(data['data'])
y = np.array(data['target'])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# fit within the given runtime limit, then predict on the held-out data
m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
m.fit(x_train, y_train)
y_predicted = m.predict(x_test)

# report the test error and the models that were selected
print("prediction error (balanced error rate): {}".format(error(y_test, y_predicted, problem_type)))
print("selected models: {}".format(m.get_models()))
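For readers unfamiliar with the metric printed above, the balanced error rate is commonly defined as the average of the per-class error rates. The sketch below implements that usual definition with NumPy. It is an assumption that oboe's error function follows the same definition in every detail, and the function name balanced_error_rate is hypothetical.

```python
import numpy as np


def balanced_error_rate(y_true, y_pred):
    """Average, over the classes in y_true, of the fraction misclassified."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class_error = [
        np.mean(y_pred[y_true == c] != c) for c in np.unique(y_true)
    ]
    return float(np.mean(per_class_error))


# Class 0 is misclassified half the time and class 1 never,
# so the balanced error rate is (0.5 + 0.0) / 2 = 0.25.
example = balanced_error_rate([0, 0, 1, 1], [0, 1, 1, 1])
print(example)
```

Unlike the plain misclassification rate, this average weights every class equally regardless of how many samples it has.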

References

[1] Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell. OBOE: Collaborative filtering for AutoML model selection. KDD 2019.

[2] Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell. AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space. KDD 2020.
