DeepPipe: Deep Pipeline Embeddings for AutoML

DeepPipe efficiently optimizes Machine Learning pipelines using meta-learning. For details, see our paper, Deep Pipeline Embeddings for AutoML, accepted at KDD 2023. Our blog post also gives an accessible overview of how the method works.

DeepPipe Architecture

Installation

The deeppipe_api package provides an API for optimizing scikit-learn pipelines over the TensorOboe search space. You can use it to search for accurate pipelines or to benchmark your own Machine Learning model on tabular data.

conda create -n deeppipe_env python==3.9
conda activate deeppipe_env
pip install deeppipe_api

Getting started

The example below uses an OpenML dataset, but the API works with any tabular data provided as a pandas DataFrame.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

# Load an OpenML task and its train/test split for fold 0
task_id = 37
task = openml.tasks.get_task(task_id)
X_train, X_test, y_train, y_test = load_data(task, fold=0)

deep_pipe = DeepPipe(
    n_iters=50,       # BO (Bayesian optimization) iterations
    time_limit=3600,  # in seconds
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)

# Evaluate on the test split
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)

# Print the best pipeline found
print(deep_pipe.model)
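
If your data does not come from OpenML, you need to produce the same four objects yourself. The sketch below is one way to do that with plain pandas. The helper train_test_frames, the toy columns and the 25% test fraction are all hypothetical, and it assumes (without confirmation from the package documentation) that DeepPipe accepts the labels as a pandas Series. The DeepPipe calls are left as comments so that the snippet runs without the package installed.

```python
import pandas as pd


def train_test_frames(df, target, test_frac=0.25, seed=0):
    """Randomly split a DataFrame into train/test features and labels.

    Hypothetical helper, not part of deeppipe_api. Assumes the index of
    `df` has no duplicate labels.
    """
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return (
        train.drop(columns=[target]),
        test.drop(columns=[target]),
        train[target],
        test[target],
    )


# Toy table standing in for your own data (for example, a CSV you loaded).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "income": [30.0, 42.5, 80.0, 65.0, 52.0, 35.5, 90.0, 58.0],
    "label": [0, 0, 1, 1, 0, 0, 1, 1],
})
X_train, X_test, y_train, y_test = train_test_frames(df, target="label")
print(X_train.shape, X_test.shape)

# With deeppipe_api installed, the rest mirrors the OpenML example:
# from deeppipe_api.deeppipe import DeepPipe
# deep_pipe = DeepPipe(n_iters=50, time_limit=3600)
# deep_pipe.fit(X_train, y_train)
# print("Test acc.:", deep_pipe.score(X_test, y_test))
```

The helper returns the splits in the same order as load_data (X_train, X_test, y_train, y_test), so the remaining calls stay unchanged.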

Ensemble of Pipelines

DeepPipe can also combine the best pipelines found during the search into an ensemble, which it builds with a greedy approach.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

task = openml.tasks.get_task(task_id=37)
X_train, X_test, y_train, y_test = load_data(task, fold=0)

deep_pipe = DeepPipe(
    n_iters=50,            # BO (Bayesian optimization) iterations
    time_limit=3600,       # in seconds
    create_ensemble=True,  # build an ensemble from the best pipelines
    ensemble_size=10,
)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)
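
The text above only says the ensemble is built greedily. To illustrate what such a procedure can look like, here is a minimal, self-contained sketch of one common scheme: forward selection with replacement over averaged class probabilities. This is an assumption about the general idea, not DeepPipe's actual implementation. All names (greedy_ensemble, val_probs, the pipe_* keys) and the toy numbers are hypothetical.

```python
def average_probs(member_probs):
    """Average the positive-class probabilities of the ensemble members."""
    return [sum(p) / len(p) for p in zip(*member_probs)]


def accuracy(y_true, probs):
    """Binary accuracy after thresholding the probabilities at 0.5."""
    preds = [1 if p > 0.5 else 0 for p in probs]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)


def greedy_ensemble(val_probs, y_val, ensemble_size):
    """Greedy forward selection with replacement.

    At each step, add the pipeline whose inclusion gives the highest
    validation accuracy for the averaged ensemble. Ties go to the name
    that sorts first, and a pipeline may be picked more than once.
    """
    chosen = []
    for _ in range(ensemble_size):
        best_name, best_acc = None, -1.0
        for name in sorted(val_probs):
            trial = [val_probs[n] for n in chosen + [name]]
            acc = accuracy(y_val, average_probs(trial))
            if acc > best_acc:
                best_name, best_acc = name, acc
        chosen.append(best_name)
    return chosen


# Toy validation labels and per-pipeline probabilities for class 1.
y_val = [1, 0, 1, 0]
val_probs = {
    "pipe_a": [0.9, 0.6, 0.4, 0.2],
    "pipe_b": [0.6, 0.1, 0.9, 0.7],
    "pipe_c": [0.2, 0.3, 0.7, 0.1],
}

chosen = greedy_ensemble(val_probs, y_val, ensemble_size=3)
final_acc = accuracy(y_val, average_probs([val_probs[n] for n in chosen]))
print(chosen, final_acc)
```

On this toy data the best single pipeline reaches a validation accuracy of 0.75, while the greedily selected ensemble reaches 1.0. Selecting with replacement lets a strong pipeline receive more weight in the average.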

Advanced Usage

For meta-training DeepPipe or testing other search spaces, you can refer to the folder src/deeppipe_api/experiments/.

Our Paper

If you use this repository/package, please cite our paper:

@article{arango2023deep,
  title={Deep Pipeline Embeddings for AutoML},
  author={Arango, Sebastian Pineda and Grabocka, Josif},
  journal={arXiv preprint arXiv:2305.14009},
  year={2023}
}
