DeepPipe: Deep Pipeline Embeddings for AutoML
DeepPipe efficiently optimizes Machine Learning pipelines using meta-learning. For details, see our paper Deep Pipeline Embeddings for AutoML, accepted at KDD 2023. You can also read our blog post for an accessible overview of how the method works.
Installation
We present an API for optimizing pipelines in scikit-learn based on the TensorOboe search space. You can use it to search for accurate pipelines or for benchmarking your Machine Learning model on tabular data.
conda create -n deeppipe_env python==3.9
conda activate deeppipe_env
pip install deeppipe_api
Getting started
The example below uses an OpenML dataset, but the API works with any tabular data provided as a pandas DataFrame.
from deeppipe_api.deeppipe import load_data, openml, DeepPipe
task_id = 37
task = openml.tasks.get_task(task_id)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(
    n_iters=50,       # BO iterations
    time_limit=3600,  # in seconds
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)

# Evaluate on the test split
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)

# Print the best pipeline found
print(deep_pipe.model)
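To run the same workflow on your own table instead of an OpenML task, the data first needs to be split into train and test parts. The following is a minimal sketch, assuming pandas and scikit-learn are installed; the column names, the toy values, and the helpers `split_dataframe` and `run_deeppipe` are hypothetical and not part of deeppipe_api. It only uses the `DeepPipe` calls shown above, and it assumes the label column can be passed as a pandas Series.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataframe(df, target, test_size=0.25, seed=0):
    """Split a DataFrame into train/test features and labels (stratified)."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )


def run_deeppipe(X_train, X_test, y_train, y_test, n_iters=50, time_limit=3600):
    """Fit DeepPipe as in the example above and return the test score."""
    # Imported lazily so the data preparation runs without deeppipe_api installed.
    from deeppipe_api.deeppipe import DeepPipe

    deep_pipe = DeepPipe(n_iters=n_iters, time_limit=time_limit)
    deep_pipe.fit(X_train, y_train)
    return deep_pipe.score(X_test, y_test)


# Toy data standing in for your own table (hypothetical columns and values).
df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6],
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0],
    "label": [0, 0, 0, 1, 1, 0, 1, 1],
})
X_train, X_test, y_train, y_test = split_dataframe(df, target="label")

# Uncomment once deeppipe_api is installed and you have a realistically sized dataset:
# print("Test acc.:", run_deeppipe(X_train, X_test, y_train, y_test))
```

The DeepPipe call is left commented out because a table this small is only meant to show the data layout, not to produce a meaningful score.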
Ensemble of Pipelines
The best pipelines found during the search can be combined into an ensemble using a greedy approach.
from deeppipe_api.deeppipe import load_data, openml, DeepPipe
task = openml.tasks.get_task(task_id=37)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(
    n_iters=50,            # BO iterations
    time_limit=3600,       # in seconds
    create_ensemble=True,  # build an ensemble from the best pipelines
    ensemble_size=10,
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)
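To give an intuition for what a greedy ensembling approach can look like, here is a small, self-contained sketch of greedy ensemble selection with replacement on validation predictions. It is a generic illustration under stated assumptions (binary labels, averaged class-1 probabilities, accuracy as the metric), not DeepPipe's actual implementation; the pipeline names, the probabilities, and both functions are hypothetical.

```python
def ensemble_accuracy(members, val_probs, y_val):
    """Accuracy of the averaged class-1 probabilities of the selected members."""
    correct = 0
    for i, label in enumerate(y_val):
        mean_prob = sum(val_probs[m][i] for m in members) / len(members)
        correct += int((mean_prob >= 0.5) == bool(label))
    return correct / len(y_val)


def greedy_ensemble(val_probs, y_val, ensemble_size):
    """Greedily add the pipeline (with replacement) that maximizes validation accuracy."""
    members = []
    for _ in range(ensemble_size):
        best_name, best_acc = None, -1.0
        for name in sorted(val_probs):  # sorted for deterministic tie-breaking
            acc = ensemble_accuracy(members + [name], val_probs, y_val)
            if acc > best_acc:
                best_name, best_acc = name, acc
        members.append(best_name)
    return members


# Hypothetical validation labels and class-1 probabilities of three pipelines.
y_val = [1, 0, 1, 0, 1]
val_probs = {
    "pipe_a": [0.9, 0.2, 0.4, 0.1, 0.8],
    "pipe_b": [0.6, 0.6, 0.9, 0.3, 0.7],
    "pipe_c": [0.3, 0.7, 0.6, 0.6, 0.2],
}

members = greedy_ensemble(val_probs, y_val, ensemble_size=2)
print("Selected:", members)
print("Validation acc.:", ensemble_accuracy(members, val_probs, y_val))
```

In this toy setting each of the first two pipelines misclassifies one sample on its own, while their average classifies all five correctly, which is the effect a greedy selection tries to exploit.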
Advanced Usage
To meta-train DeepPipe or test other search spaces, refer to the folder src/deeppipe_api/experiments/.
Our Paper
If you use this repository/package, please cite our paper:
@article{arango2023deep,
title={Deep Pipeline Embeddings for AutoML},
author={Arango, Sebastian Pineda and Grabocka, Josif},
journal={arXiv preprint arXiv:2305.14009},
year={2023}
}