unipipe
Unified pipeline library.
:warning: Experimental :warning:
- Build batch pipelines in Python that run anywhere: on your laptop, on a server, or in the cloud.
- Scale local experiments to the cloud without changing the pipeline code.
- Save time by writing each pipeline only once.
- Save money by paying only for the compute infrastructure you need.
About
`unipipe` makes it easy to build batch pipelines in Python, then run them either locally or in the cloud. It was originally created for machine learning workflows, but it works for any batch data processing pipeline.
Install
From PyPI:
# Minimal install
pip install unipipe
# With additional executors (e.g. 'docker', 'vertex')
pip install unipipe[vertex]
From source:
# Minimal install
pip install "unipipe @ git+ssh://git@github.com/fkodom/unipipe.git"
# With additional executors (e.g. 'docker', 'vertex')
pip install "unipipe[vertex] @ git+ssh://git@github.com/fkodom/unipipe.git"
If you'd like to contribute, install all dependencies and pre-commit hooks:
# Install all dependencies
pip install "unipipe[all] @ git+ssh://git@github.com/fkodom/unipipe.git"
# Setup pre-commit hooks
pre-commit install
Getting Started
Build a pipeline once using the `unipipe` DSL:
from unipipe import dsl

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def pipeline():
    say_hello(name="world")
Then, run the pipeline using any of the supported backends:
from unipipe import run
run(
    # Supported executors include:
    #   'python' --> runs in the current Python process
    #   'docker' --> runs each component in a separate Docker container
    #   'vertex' --> runs in GCP through Vertex, which in turn uses KFP
    executor="python",
    pipeline=pipeline(),
)
Expected output:
INFO:root:[say_hello-1603ae3e] - Hello, world!
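Because the executor is just a string argument, one script can target different backends without edits. The sketch below shows one possible way to pick the executor at launch time. The `UNIPIPE_EXECUTOR` environment variable and the helpers `choose_executor` and `run_selected` are hypothetical names invented for this illustration and are not part of `unipipe`; only `run(executor=..., pipeline=...)` and the three executor names come from the example above.

```python
import os

# Executor names taken from the example above.
SUPPORTED_EXECUTORS = ("python", "docker", "vertex")


def choose_executor(env=None, default="python"):
    """Return the executor named by the (hypothetical) UNIPIPE_EXECUTOR variable."""
    env = os.environ if env is None else env
    name = env.get("UNIPIPE_EXECUTOR", default).strip().lower()
    if name not in SUPPORTED_EXECUTORS:
        raise ValueError(
            f"Unknown executor {name!r}; expected one of {SUPPORTED_EXECUTORS}"
        )
    return name


def run_selected(pipeline, env=None):
    """Run an already-built pipeline on whichever executor was selected."""
    # Imported lazily so choose_executor() works without unipipe installed.
    from unipipe import run

    run(executor=choose_executor(env), pipeline=pipeline)
```

With this in place, a script could call `run_selected(pipeline())` and be launched locally with no variable set, or with `UNIPIPE_EXECUTOR=vertex` to target the cloud, assuming the matching extras are installed.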
More Examples
Link | Description |
---|---|
Hello World | Create/run your first unipipe pipeline |
Hello Pipeline | Create pipelines with multiple steps |
Multi-output Components | Build components that return more than one type-checked value |
Pipeline Arguments | Make pipelines reusable with dynamic inputs |
Dependency Management | Install and use other Python packages in your pipelines |
Hardware Specs | Request hardware (CPUs, Memory, GPUs) for your pipeline runs |
Nested Pipelines | Call existing pipelines from inside another pipeline |
Control Flow | Add conditional control flow to your pipelines |
Advanced Control Flow | Best practices for advanced control flow |
Private Dependencies | Using private Python packages |
Why `unipipe`?
`unipipe` was designed to mitigate issues with Kubeflow Pipelines (KFP).
- Kubeflow and KFP are often used by machine learning engineers to orchestrate training jobs, data preprocessing, and other computationally intensive tasks.
- KFP pipelines only run on Kubeflow.
- Kubeflow requires specialized knowledge and additional compute resources. It can be expensive and/or impractical for individuals and small teams.
- Managed, serverless platforms like Vertex (Google Cloud) exist, which automate all of that. But still, pipelines only run on KFP/Vertex -- not on your laptop.
- Why write the same pipeline twice?
- KFP developers often write multiple pipeline scripts: one for their laptop, and another for the cloud.
- TODO: Finish this section...
TODO
`unipipe` is still in early development, so there are lots of things to do. :sweat_smile: I won't list everything here -- just some of the larger, long-term goals.
- Add executor for KFP clusters, in addition to Vertex.
- Better up-front type checking (i.e. before running the pipeline).
- Apache Beam backend and executor (???)