unipipe
Unified pipeline library.
:warning: Experimental :warning:
- Build batch pipelines in Python that run anywhere -- on your laptop, on the server, and in the cloud.
- Easily scale local experiments to the cloud without any changes
- Save time by only writing each pipeline once
- Save money by only paying for the compute infrastructure you need
About
unipipe makes it easy to build batch pipelines in Python, then run them either locally or in the cloud. It was originally created for machine learning workflows, but it works for any batch data processing pipeline.
Install
From PyPI:
# Minimal install
pip install unipipe
# With additional executors (e.g. 'docker', 'vertex')
pip install unipipe[vertex]
From source:
# Minimal install
pip install "unipipe @ git+ssh://git@github.com/fkodom/unipipe.git"
# With additional executors (e.g. 'docker', 'vertex')
pip install "unipipe[vertex] @ git+ssh://git@github.com/fkodom/unipipe.git"
If you'd like to contribute, install all dependencies and pre-commit hooks:
# Install all dependencies
pip install "unipipe[all] @ git+ssh://git@github.com/fkodom/unipipe.git"
# Setup pre-commit hooks
pre-commit install
Getting Started
Build a pipeline once using the unipipe DSL:
from unipipe import dsl

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def pipeline():
    say_hello(name="world")
Then, run the pipeline using any of the supported backends:
from unipipe import run

run(
    # Supported executors include:
    #   'python' --> runs in the current Python process
    #   'docker' --> runs each component in a separate Docker container
    #   'vertex' --> runs in GCP through Vertex, which in turn uses KFP
    executor="python",
    pipeline=pipeline(),
)
Expected output:
INFO:root:[say_hello-1603ae3e] - Hello, world!
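The idea of defining steps once and choosing a backend at run time can be illustrated with a toy sketch that uses only the standard library. This is not unipipe's implementation. The names `Step`, `EXECUTORS`, `run_steps`, and the `inline` and `threads` backends are all hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# A "step" here is a function plus its keyword arguments (hypothetical structure).
Step = Tuple[Callable[..., Any], Dict[str, Any]]


def say_hello(name: str) -> str:
    return f"Hello, {name}!"


def run_inline(steps: List[Step]) -> List[Any]:
    """Run every step one after another in the current process."""
    return [fn(**kwargs) for fn, kwargs in steps]


def run_threaded(steps: List[Step]) -> List[Any]:
    """Submit every step to a thread pool and collect results in order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in steps]
        return [future.result() for future in futures]


# Toy registry: the step definitions never change, only the backend does.
EXECUTORS: Dict[str, Callable[[List[Step]], List[Any]]] = {
    "inline": run_inline,
    "threads": run_threaded,
}


def run_steps(executor: str, steps: List[Step]) -> List[Any]:
    """Look up the backend by name and hand it the same list of steps."""
    if executor not in EXECUTORS:
        raise ValueError(
            f"Unknown executor {executor!r}; choose from {sorted(EXECUTORS)}"
        )
    return EXECUTORS[executor](steps)


steps = [(say_hello, {"name": "world"})]
print(run_steps("inline", steps))
print(run_steps("threads", steps))
```

Both calls receive the same `steps` list; only the `executor` string differs, which mirrors how the `executor` argument is the only thing that changes in the `run(...)` call above.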
Run Any Python Script
Or scale any Python script to the cloud using the unipipe CLI:
# Same choices of executors as above.
unipipe run-script \
    --executor vertex \
    --pipeline-root "gs://bucket-name/artifact-root/" \
    ./examples/ex01_hello_world.py
This makes experimentation easy. unipipe will automatically compose your script into a pipeline, and launch it with your chosen executor. See this example for more details.
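How unipipe composes a script into a pipeline is not described here. As a rough mental model only, a whole script can be treated as a single step by wrapping it in a callable. The sketch below does this with the standard-library `runpy` module; the helper `script_as_step` and the temporary `hello.py` script are hypothetical and are not part of unipipe.

```python
import runpy
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def script_as_step(path: str) -> Callable[[], Dict[str, Any]]:
    """Wrap a script file in a zero-argument callable (hypothetical helper)."""

    def step() -> Dict[str, Any]:
        # run_path executes the file much like `python <path>` would,
        # and returns the script's resulting global namespace.
        return runpy.run_path(path, run_name="__main__")

    return step


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in script written to a temporary directory for the demo.
    script = Path(tmp) / "hello.py"
    script.write_text('message = "Hello, world!"\nprint(message)\n')

    step = script_as_step(str(script))
    namespace = step()
    greeting = namespace["message"]
```

Once a script is a plain callable like `step`, it can be handed to whichever backend runs the other steps, which is the general shape of "script in, pipeline out".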
More Examples
Link | Description |
---|---|
Hello World | Create/run your first unipipe pipeline |
Hello Pipeline | Create pipelines with multiple steps |
Multi-output Components | Build components that return more than one type-checked value |
Pipeline Arguments | Make pipelines reusable with dynamic inputs |
Dependency Management | Install and use other Python packages in your pipelines |
Hardware Specs | Request hardware (CPUs, Memory, GPUs) for your pipeline runs |
Nested Pipelines | Call existing pipelines from inside another pipeline |
Control Flow | Add conditional control flow to your pipelines |
Advanced Control Flow | Best practices for advanced control flow |
Private Dependencies | Using private Python packages |
Run Any Python Script | Run any Python script using unipipe |
Why unipipe?
unipipe was designed to mitigate issues with Kubeflow Pipelines (KFP).
- Kubeflow and KFP are often used by machine learning engineers to orchestrate training jobs, data preprocessing, and other computationally intensive tasks.
- KFP pipelines only run on Kubeflow.
- Kubeflow requires specialized knowledge and additional compute resources. It can be expensive and/or impractical for individuals and small teams.
- Managed, serverless platforms like Vertex (Google Cloud) exist, which automate all of that. But still, pipelines only run on KFP/Vertex -- not on your laptop.
- Why write the same pipeline twice?
- KFP developers often write multiple pipeline scripts: one for their laptop, and another for the cloud.
- TODO: Finish this section...
TODO
- Add executor for KFP clusters, in addition to Vertex.
- Better up-front type checking (in progress).