DeepPipe: Deep Pipeline Embeddings for AutoML

DeepPipe efficiently optimizes machine learning pipelines using meta-learning. For details, see our paper "Deep Pipeline Embeddings for AutoML", accepted at KDD 2023. Our blog post gives a more accessible overview of how the method works.

[Figure: DeepPipe Architecture]

Installation

We provide an API for optimizing scikit-learn pipelines over the TensorOboe search space. You can use it to search for accurate pipelines or to benchmark your machine learning model on tabular data.

conda create -n deeppipe_env python=3.9
conda activate deeppipe_env
pip install deeppipe_api

Getting started

The example below uses an OpenML dataset, but the API works with any tabular data provided as a pandas DataFrame.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

# Load an OpenML task and take the train/test split of its first fold
task_id = 37
task = openml.tasks.get_task(task_id)
X_train, X_test, y_train, y_test = load_data(task, fold=0)

# Search for a pipeline under an iteration budget and a time budget
deep_pipe = DeepPipe(n_iters=50,       # BO iterations
                     time_limit=3600,  # in seconds
                     )
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)

# Evaluate on the test split
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)

# Print the best pipeline found
print(deep_pipe.model)

Ensemble of Pipelines

The best pipelines found during the search can also be combined into an ensemble, which is built with a greedy approach.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

task = openml.tasks.get_task(task_id=37)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(n_iters=50,            # BO iterations
                     time_limit=3600,       # in seconds
                     create_ensemble=True,  # combine the best pipelines
                     ensemble_size=10,
                     )
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)
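
To give an intuition for what a greedy ensembling approach can look like, here is a minimal pure-Python sketch of greedy ensemble selection with replacement. It is not taken from the DeepPipe code base and may differ from what the package actually does. The function names (accuracy, average, greedy_ensemble) and the toy validation data are hypothetical and exist only for this illustration.

```python
from typing import List, Sequence

# One (n_samples x n_classes) table of predicted class probabilities.
ProbTable = Sequence[Sequence[float]]


def accuracy(probs: ProbTable, labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-probability class equals the label."""
    hits = 0
    for row, label in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)


def average(members: List[ProbTable]) -> List[List[float]]:
    """Element-wise mean of several probability tables of equal shape."""
    n = len(members)
    return [[sum(values) / n for values in zip(*rows)] for rows in zip(*members)]


def greedy_ensemble(val_probs: List[ProbTable],
                    val_labels: Sequence[int],
                    ensemble_size: int) -> List[int]:
    """Greedily pick pipeline indices (repeats allowed) by validation accuracy."""
    chosen: List[int] = []
    for _ in range(ensemble_size):
        best_idx, best_score = -1, -1.0
        for idx, candidate in enumerate(val_probs):
            # Score the ensemble that would result from adding this candidate.
            trial = [val_probs[i] for i in chosen] + [candidate]
            score = accuracy(average(trial), val_labels)
            if score > best_score:
                best_idx, best_score = idx, score
        chosen.append(best_idx)
    return chosen


# Toy validation predictions of three hypothetical pipelines on four samples.
val_labels = [0, 1, 0, 1]
val_probs = [
    [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.3, 0.7]],  # 3 of 4 correct
    [[0.4, 0.6], [0.1, 0.9], [0.7, 0.3], [0.2, 0.8]],  # 3 of 4 correct
    [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.6, 0.4]],  # 1 of 4 correct
]
selected = greedy_ensemble(val_probs, val_labels, ensemble_size=2)
print(selected)  # prints [0, 1]
```

In this toy case the two selected pipelines make different mistakes, so their averaged predictions are correct on all four validation samples even though each pipeline alone gets only three right.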

Advanced Usage

To meta-train DeepPipe or test other search spaces, see the folder src/deeppipe_api/experiments/.

Our Paper

If you use this repository/package, please cite our paper:

@article{arango2023deep,
  title={Deep Pipeline Embeddings for AutoML},
  author={Arango, Sebastian Pineda and Grabocka, Josif},
  journal={arXiv preprint arXiv:2305.14009},
  year={2023}
}
