A wrapper toolbox that provides compatibility layers between TPOT, Auto-Sklearn and OpenML
Project description
Arbok (AutoML wrapper toolbox for OpenML compatibility) provides wrappers for TPOT and Auto-Sklearn, acting as a compatibility layer between these tools and OpenML.
Each wrapper extends scikit-learn's BaseSearchCV and exposes the internal attributes that OpenML needs, such as cv_results_, best_index_, best_params_, best_score_ and classes_.
Installation
pip install arbok
Simple example
import openml
from arbok import AutoSklearnWrapper, TPOTWrapper
task = openml.tasks.get_task(31)
dataset = task.get_dataset()
# Get the AutoSklearn wrapper and pass parameters like you would to AutoSklearn
clf = AutoSklearnWrapper(
    time_left_for_this_task=3600, per_run_time_limit=360
)

# Or get the TPOT wrapper and pass parameters like you would to TPOT
clf = TPOTWrapper(
    generations=100, population_size=100, verbosity=2
)
# Execute the task
run = openml.runs.run_model_on_task(task, clf)
run.publish()
print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))
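Because the wrapper extends BaseSearchCV, it can also be used like any other scikit-learn estimator outside of OpenML. A minimal sketch (X and y stand in for your own feature matrix and labels; the attributes are the ones listed above):

clf.fit(X, y)             # X, y: your own feature matrix and labels
print(clf.best_params_)   # parameters of the best pipeline found
print(clf.best_score_)    # score of that pipeline
print(clf.cv_results_)    # full results table in BaseSearchCV format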
Preprocessing data
To make the wrapper more robust, we need to preprocess the data. We can fill the missing values, and one-hot encode categorical data.
First, we get a mask that tells us whether a feature is a categorical feature or not.
dataset = task.get_dataset()
_, categorical = dataset.get_data(return_categorical_indicator=True)
categorical = categorical[:-1] # Remove last index (which is the class)
Next, we set up a pipeline for the preprocessing. We use a ConditionalImputer, an imputer that can apply different strategies to categorical (nominal) and numerical data.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from arbok import ConditionalImputer

preprocessor = make_pipeline(
    ConditionalImputer(
        categorical_features=categorical,
        strategy="mean",
        strategy_nominal="most_frequent"
    ),
    OneHotEncoder(
        categorical_features=categorical, handle_unknown="ignore", sparse=False
    )
)
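To see what the ConditionalImputer does on its own, here is a small illustrative sketch. The toy array and mask below are made up for this example, and it assumes the imputer follows the usual scikit-learn transformer interface with standard mean / most-frequent semantics:

import numpy as np
from arbok import ConditionalImputer

# Toy data: column 0 is numeric, column 1 is an integer-coded nominal feature,
# and both columns contain a missing value.
X_toy = np.array([
    [1.0, 0],
    [3.0, 0],
    [np.nan, 1],
    [5.0, np.nan],
])

imputer = ConditionalImputer(
    categorical_features=[False, True],  # mask: the second column is nominal
    strategy="mean",                     # numeric columns: fill with the column mean
    strategy_nominal="most_frequent"     # nominal columns: fill with the most frequent value
)

# With these strategies, the numeric NaN should become 3.0 (the mean of 1, 3 and 5)
# and the nominal NaN should become 0 (the most frequent category).
print(imputer.fit_transform(X_toy))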
And finally, we put everything together in one of the wrappers.
clf = AutoSklearnWrapper(
    preprocessor=preprocessor, time_left_for_this_task=3600, per_run_time_limit=360
)
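The combined estimator can then be run on the task exactly as in the simple example above:

run = openml.runs.run_model_on_task(task, clf)
run.publish()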
Limitations
Currently, only classifiers are implemented, so regression is not possible.
For TPOT, the config_dict parameter cannot be set, because this causes problems with the API.
Benchmarking
Installing the arbok package also installs the arbench CLI tool. First, we generate a JSON configuration file like this:
from arbok.bench import Benchmark

bench = Benchmark()

config_file = bench.create_config_file(
    # Wrapper parameters
    wrapper={"refit": True, "verbose": False, "retry_on_error": True},

    # TPOT parameters
    tpot={
        "max_time_mins": 6,       # Max total time in minutes
        "max_eval_time_mins": 1   # Max time per candidate in minutes
    },

    # Autosklearn parameters
    autosklearn={
        "time_left_for_this_task": 360,  # Max total time in seconds
        "per_run_time_limit": 60         # Max time per candidate in seconds
    }
)
And then, we can call arbench like this:
arbench --classifier autosklearn --task-id 31 --config config.json
Or by calling arbok as a Python module:
python -m arbok --classifier autosklearn --task-id 31 --config config.json
Running a benchmark on batch systems
To run a large-scale benchmark, we create a configuration file like the one above, then generate and submit jobs to a batch system as follows.
# We create a benchmark setup where we specify the headers, the interpreter we
# want to use, the directory where the jobs (.sh files) are stored, and the
# config file we created earlier.
bench = Benchmark(
    headers="#PBS -lnodes=1:cpu3\n#PBS -lwalltime=1:30:00",
    python_interpreter="python3",  # Path to interpreter
    root="/path/to/project/",
    jobs_dir="jobs",
    config_file="config.json",
    log_file="log.json"
)

# Create the config file like we did in the section above
config_file = bench.create_config_file(
    # Wrapper parameters
    wrapper={"refit": True, "verbose": False, "retry_on_error": True},

    # TPOT parameters
    tpot={
        "max_time_mins": 6,       # Max total time in minutes
        "max_eval_time_mins": 1   # Max time per candidate in minutes
    },

    # Autosklearn parameters
    autosklearn={
        "time_left_for_this_task": 360,  # Max total time in seconds
        "per_run_time_limit": 60         # Max time per candidate in seconds
    }
)
# Next, we load the tasks we want to benchmark on from OpenML.
# In this case, we load a list of task IDs from study 99.
tasks = openml.study.get_study(99).tasks
# Next, we create jobs for both tpot and autosklearn.
bench.create_jobs(tasks, classifiers=["tpot", "autosklearn"])
# And finally, we submit the jobs using qsub
bench.submit_jobs()
Preprocessing parameters
The ParamPreprocessor turns mixed-type data (numeric, boolean, nominal and mixed columns) into a purely numerical matrix; column types can either be specified manually or detected automatically.
from arbok import ParamPreprocessor
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline

X = np.array([
    [1, 2, True, "foo", "one"],
    [1, 3, False, "bar", "two"],
    [np.nan, "bar", None, None, "three"],
    [1, 7, 0, "zip", "four"],
    [1, 9, 1, "foo", "five"],
    [1, 10, 0.1, "zip", "six"]
], dtype=object)

# Manually specify types, or use types="detect" to automatically detect types
types = ["numeric", "mixed", "bool", "nominal", "nominal"]

pipeline = make_pipeline(ParamPreprocessor(types="detect"), VarianceThreshold())
pipeline.fit_transform(X)
Output:
[[-0.4472136  -0.4472136   1.41421356 -0.70710678 -0.4472136  -0.4472136   2.23606798 -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136  -0.85226648  1.        ]
 [-0.4472136   2.23606798 -0.70710678 -0.70710678 -0.4472136  -0.4472136  -0.4472136  -0.4472136  -0.4472136   2.23606798  0.4472136  -0.4472136  -0.5831297  -1.        ]
 [ 2.23606798 -0.4472136  -0.70710678 -0.70710678 -0.4472136  -0.4472136  -0.4472136  -0.4472136   2.23606798 -0.4472136  -2.23606798  2.23606798 -1.39054004 -1.        ]
 [-0.4472136  -0.4472136  -0.70710678  1.41421356 -0.4472136   2.23606798 -0.4472136  -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136   0.49341743 -1.        ]
 [-0.4472136  -0.4472136   1.41421356 -0.70710678  2.23606798 -0.4472136  -0.4472136  -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136   1.031691    1.        ]
 [-0.4472136  -0.4472136  -0.70710678  1.41421356 -0.4472136  -0.4472136  -0.4472136   2.23606798 -0.4472136  -0.4472136   0.4472136  -0.4472136   1.30082778  1.        ]]
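To control the column types yourself instead of relying on detection, pass the list defined above via the same types parameter. A minimal variation of the pipeline, assuming an explicit list is accepted in place of "detect":

# Use the manually specified types instead of automatic detection
pipeline = make_pipeline(ParamPreprocessor(types=types), VarianceThreshold())
pipeline.fit_transform(X)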