DeepPipe: Deep Pipeline Embeddings for AutoML
DeepPipe efficiently optimizes machine learning pipelines using meta-learning. For details, see our paper "Deep Pipeline Embeddings for AutoML", accepted at KDD 2023. Our blog post gives a more accessible overview of how the method works.
Installation
We provide an API for optimizing scikit-learn pipelines based on the TensorOboe search space. You can use it to search for accurate pipelines or to benchmark your machine learning model on tabular data.
conda create -n deeppipe_env python=3.9
conda activate deeppipe_env
pip install deeppipe_api
Getting started
The following example uses an OpenML dataset, but DeepPipe works with any tabular data stored as a pandas DataFrame.
from deeppipe_api.deeppipe import load_data, openml, DeepPipe
task_id = 37
task = openml.tasks.get_task(task_id)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(
    n_iters=50,       # number of Bayesian optimization (BO) iterations
    time_limit=3600,  # in seconds
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)

# Evaluate on the test split
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)

# Print the best pipeline found
print(deep_pipe.model)
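If your data does not come from OpenML, the same calls should apply to a DataFrame you build yourself. The sketch below is only an illustration: the toy table, its column names, and the helper `run_deeppipe` are made up, and passing the labels as a pandas Series is an assumption that the text above does not confirm.

```python
import pandas as pd

# Toy table standing in for your own data (hypothetical columns).
# Replace it with a real dataset; ten rows are far too few in practice.
df = pd.DataFrame({
    "age": [23, 45, 31, 52, 36, 28, 60, 41, 19, 33],
    "income": [31.0, 58.5, 42.0, 77.2, 50.1, 38.4, 81.0, 55.3, 22.9, 47.6],
    "label": [0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
})

# Hold out 20% of the rows for testing.
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]


def run_deeppipe(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit DeepPipe and return its test score.

    Needs deeppipe_api installed, so the import happens only when called.
    Assumes labels can be passed as a pandas Series.
    """
    from deeppipe_api.deeppipe import DeepPipe

    deep_pipe = DeepPipe(n_iters=50, time_limit=3600)
    deep_pipe.fit(X_train, y_train)
    return deep_pipe.score(X_test, y_test)
```

Calling `run_deeppipe(X_train, y_train, X_test, y_test)` then runs the search on your own split.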
Ensemble of Pipelines
You can also combine the best pipelines found during the search into an ensemble, which is built with a greedy approach.
from deeppipe_api.deeppipe import load_data, openml, DeepPipe
task = openml.tasks.get_task(task_id=37)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(
    n_iters=50,            # number of Bayesian optimization (BO) iterations
    time_limit=3600,       # in seconds
    create_ensemble=True,  # build an ensemble from the best pipelines
    ensemble_size=10,      # size of the ensemble
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)
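To give an intuition for what a greedy ensemble does, here is a minimal, library-free sketch of forward selection with replacement on validation predictions. It is an assumption about the general technique, not DeepPipe's actual implementation; the function names and toy probabilities are invented for illustration.

```python
from typing import List, Sequence

# One row of class probabilities per validation sample.
Probs = List[List[float]]


def accuracy(probs: Probs, labels: Sequence[int]) -> float:
    """Fraction of samples whose arg-max class equals the true label."""
    hits = sum(1 for row, y in zip(probs, labels) if row.index(max(row)) == y)
    return hits / len(labels)


def average(members: List[Probs]) -> Probs:
    """Element-wise mean of the members' probability tables."""
    n = len(members)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*members)]


def greedy_ensemble(candidates: List[Probs], labels: Sequence[int],
                    ensemble_size: int) -> List[int]:
    """Add one candidate per step (repeats allowed), always the one that
    gives the highest validation accuracy for the averaged ensemble."""
    chosen: List[int] = []
    for _ in range(ensemble_size):
        best_idx, best_acc = 0, -1.0
        for idx, cand in enumerate(candidates):
            trial = [candidates[i] for i in chosen] + [cand]
            acc = accuracy(average(trial), labels)
            if acc > best_acc:  # ties keep the earlier candidate
                best_idx, best_acc = idx, acc
        chosen.append(best_idx)
    return chosen


# Toy validation set: 4 samples, 2 classes, 3 candidate pipelines.
labels = [0, 1, 1, 0]
candidates = [
    [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.4, 0.6]],  # 75% accurate alone
    [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.9, 0.1]],  # 50% accurate alone
    [[0.3, 0.7], [0.3, 0.7], [0.8, 0.2], [0.7, 0.3]],  # 50% accurate alone
]

chosen = greedy_ensemble(candidates, labels, ensemble_size=2)
print(chosen)  # [0, 1]
```

In this toy case the second pipeline is weak on its own, yet adding it corrects the first pipeline's only mistake, which is why greedy selection scores the ensemble as a whole and not each member individually.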
Advanced Usage
To meta-train DeepPipe or test other search spaces, see the folder src/deeppipe_api/experiments/.
Our Paper
If you use this repository/package, please cite our paper:
@article{arango2023deep,
title={Deep Pipeline Embeddings for AutoML},
author={Arango, Sebastian Pineda and Grabocka, Josif},
journal={arXiv preprint arXiv:2305.14009},
year={2023}
}