A wrapper toolbox that provides a compatibility layer between OpenML and the AutoML tools TPOT and Auto-Sklearn

Arbok (Automl wrapper toolbox for openml compatibility) provides wrappers for TPOT and Auto-Sklearn, as a compatibility layer between these tools and OpenML.

The wrapper extends scikit-learn's BaseSearchCV and exposes the fitted attributes that OpenML needs, such as cv_results_, best_index_, best_params_, best_score_ and classes_.
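To illustrate how these attributes relate to one another, here is a minimal, library-free sketch. It is not Arbok's implementation: `summarize_search` and its arguments are hypothetical names introduced for illustration, and the ranking is simplified (ties are not given a shared rank). It only mirrors the BaseSearchCV convention that best_index_ points into the arrays stored in cv_results_.

```python
from statistics import mean


def summarize_search(candidates, fold_scores, labels):
    """Derive BaseSearchCV-style attributes from raw search results.

    candidates  -- list of parameter dicts, one per evaluated configuration
    fold_scores -- list of per-fold score lists, aligned with candidates
    labels      -- training labels, used to derive classes_
    """
    mean_scores = [mean(scores) for scores in fold_scores]

    # Rank 1 is the best (highest) mean score; ties are broken by position.
    order = sorted(range(len(mean_scores)), key=lambda i: -mean_scores[i])
    ranks = [0] * len(mean_scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank

    best_index = order[0]
    return {
        "cv_results_": {
            "params": list(candidates),
            "mean_test_score": mean_scores,
            "rank_test_score": ranks,
        },
        "best_index_": best_index,
        "best_params_": candidates[best_index],
        "best_score_": mean_scores[best_index],
        "classes_": sorted(set(labels)),
    }


summary = summarize_search(
    candidates=[{"max_depth": 3}, {"max_depth": 5}],
    fold_scores=[[0.70, 0.74], [0.80, 0.82]],
    labels=["good", "bad", "good"],
)
print(summary["best_params_"], summary["classes_"])
```

The point of the sketch is the invariant that best_params_ and best_score_ are simply the entries of cv_results_ at position best_index_, which is the kind of consistency a consumer of these attributes would presumably rely on.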

Installation

pip install arbok

Simple example

import openml
from arbok import AutoSklearnWrapper, TPOTWrapper


task = openml.tasks.get_task(31)
dataset = task.get_dataset()

# Get the AutoSklearn wrapper and pass parameters like you would to AutoSklearn
clf = AutoSklearnWrapper(
    time_left_for_this_task=3600, per_run_time_limit=360
)

# Or get the TPOT wrapper and pass parameters like you would to TPOT
clf = TPOTWrapper(
    generations=100, population_size=100, verbosity=2
)

# Execute the task
run = openml.runs.run_model_on_task(task, clf)
run.publish()

print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))

Preprocessing data

To make the wrapper more robust, we can preprocess the data before the search starts: impute missing values and one-hot encode the categorical features.

First, we get a mask that tells us whether a feature is a categorical feature or not.

dataset = task.get_dataset()
_, categorical = dataset.get_data(return_categorical_indicator=True)
categorical = categorical[:-1]  # Remove last index (which is the class)
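As a small illustration of what such a mask looks like, the sketch below uses a made-up indicator list (the values are hypothetical, not taken from task 31) and assumes, as in the snippet above, that the class is the last column.

```python
# Hypothetical indicator for three features followed by the class column.
indicator = [True, False, True, True]

# Drop the entry for the class column, leaving one flag per feature.
feature_mask = indicator[:-1]

# Column positions per feature type, which is what the preprocessing steps need.
categorical_columns = [i for i, is_cat in enumerate(feature_mask) if is_cat]
numerical_columns = [i for i, is_cat in enumerate(feature_mask) if not is_cat]

print(categorical_columns)  # [0, 2]
print(numerical_columns)    # [1]
```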

Next, we set up a pipeline for the preprocessing. We use a ConditionalImputer, an imputer that can apply different strategies to categorical (nominal) and numerical data.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from arbok import ConditionalImputer

preprocessor = make_pipeline(

    ConditionalImputer(
        categorical_features=categorical,
        strategy="mean",
        strategy_nominal="most_frequent"
    ),

    OneHotEncoder(
        categorical_features=categorical, handle_unknown="ignore", sparse=False
    )
)
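The idea behind conditional imputation can be sketched in plain Python. This is only an illustration of the "mean" and "most_frequent" strategies used above, not Arbok's ConditionalImputer: the function name `conditional_impute` and the use of None as the missing-value marker are assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def conditional_impute(rows, categorical, missing=None):
    """Fill missing cells per column: mode for categorical, mean for numerical.

    rows        -- list of equally long rows (lists of cell values)
    categorical -- one boolean per column, True for categorical columns
    missing     -- marker object that denotes a missing cell
    """
    n_cols = len(categorical)
    fill_values = []
    for col in range(n_cols):
        # Only observed cells contribute to the fill value.
        # A column with no observed cells would raise an error in this sketch.
        observed = [row[col] for row in rows if row[col] is not missing]
        if categorical[col]:
            fill_values.append(Counter(observed).most_common(1)[0][0])
        else:
            fill_values.append(mean(observed))

    return [
        [fill_values[col] if row[col] is missing else row[col] for col in range(n_cols)]
        for row in rows
    ]


rows = [[1.0, "a"], [None, "b"], [3.0, "b"], [5.0, None]]
print(conditional_impute(rows, categorical=[False, True]))
```

Here the numerical column is filled with the mean of its observed values and the categorical column with its most frequent value, which is why the two feature types need separate strategies: a mean is not defined for nominal categories.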

Finally, we pass the preprocessor to one of the wrappers.

clf = AutoSklearnWrapper(
    preprocessor=preprocessor, time_left_for_this_task=3600, per_run_time_limit=360
)

Limitations

  • Currently, only classifiers are implemented, so regression tasks are not supported.

  • For TPOT, the config_dict parameter cannot be set, because this causes problems with the API.
