
unipipe

Unified pipeline library.

:warning: Experimental :warning:

  • Build batch pipelines in Python that run anywhere -- on your laptop, on a server, or in the cloud.
  • Easily scale local experiments to the cloud without any changes.
  • Save time by only writing each pipeline once.
  • Save money by only paying for the compute infrastructure you need.

About

unipipe makes it easy to build batch pipelines in Python, then run them either locally or in the cloud. It was originally created for machine learning workflows, but it works for any batch data processing pipeline.

Install

From PyPI:

# Minimal install
pip install unipipe

# With additional executors (e.g. 'docker', 'vertex')
pip install unipipe[vertex]

From source:

# Minimal install
pip install "unipipe @ git+ssh://git@github.com/fkodom/unipipe.git"

# With additional executors (e.g. 'docker', 'vertex')
pip install "unipipe[vertex] @ git+ssh://git@github.com/fkodom/unipipe.git"

If you'd like to contribute, install all dependencies and pre-commit hooks:

# Install all dependencies
pip install "unipipe[all] @ git+ssh://git@github.com/fkodom/unipipe.git"
# Setup pre-commit hooks
pre-commit install

Getting Started

Build a pipeline once using the unipipe DSL:

from unipipe import dsl

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def pipeline():
    say_hello(name="world")

Then, run the pipeline using any of the supported executors:

from unipipe import run

run(
    # Supported executors include:
    #   'python' --> runs in the current Python process
    #   'docker' --> runs each component in a separate Docker container
    #   'vertex' --> runs in GCP through Vertex, which in turn uses KFP
    executor="python",
    pipeline=pipeline(),
)

Expected output:

INFO:root:[say_hello-1603ae3e] - Hello, world!
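
Components can also be chained, with the output of one step feeding the next. Here is a minimal sketch, assuming that component outputs can be passed directly as arguments to downstream components inside a pipeline (the Hello Pipeline example linked below shows the canonical version):

from unipipe import dsl, run

@dsl.component
def get_name() -> str:
    return "world"

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def hello_pipeline():
    # Pass the output of 'get_name' into 'say_hello'.
    say_hello(name=get_name())

run(executor="python", pipeline=hello_pipeline())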

More Examples

  • Hello World: Create/run your first unipipe pipeline
  • Hello Pipeline: Create pipelines with multiple steps
  • Multi-output Components: Build components that return more than one type-checked value
  • Pipeline Arguments: Make pipelines reusable with dynamic inputs (see the sketch after this list)
  • Dependency Management: Install and use other Python packages in your pipelines
  • Hardware Specs: Request hardware (CPUs, Memory, GPUs) for your pipeline runs
  • Nested Pipelines: Call existing pipelines from inside another pipeline
  • Control Flow: Add conditional control flow to your pipelines
  • Advanced Control Flow: Best practices for advanced control flow
  • Private Dependencies: Using private Python packages
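
To give a flavor of the Pipeline Arguments example, pipelines can accept ordinary function parameters so the same definition can be reused with different inputs. A minimal sketch, assuming @dsl.pipeline functions simply forward their arguments to the components they call (see the linked example for the exact API):

from unipipe import dsl, run

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def greeting_pipeline(name: str):
    # The pipeline argument is forwarded to the component call.
    say_hello(name=name)

# Re-run the same pipeline with different inputs by changing the argument.
run(executor="python", pipeline=greeting_pipeline(name="unipipe"))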

Why unipipe?

  1. unipipe was designed to mitigate issues with Kubeflow Pipelines (KFP).
    • Kubeflow and KFP are often used by machine learning engineers to orchestrate training jobs, data preprocessing, and other computationally intensive tasks.
  2. KFP pipelines only run on Kubeflow.
    • Kubeflow requires specialized knowledge and additional compute resources. It can be expensive and/or impractical for individuals and small teams.
    • Managed, serverless platforms like Vertex (Google Cloud) exist, which automate all of that. But still, pipelines only run on KFP/Vertex -- not on your laptop.
  3. Why write the same pipeline twice?
    • KFP developers often end up writing multiple versions of the same pipeline: one for their laptop, and another for the cloud.
    • unipipe removes that duplication -- write the pipeline once, then choose an executor when you run it.

TODO

unipipe is still in early development, so there are lots of things to do. :sweat_smile: I won't list everything here -- just some of the larger, long-term goals.

  1. Add executor for KFP clusters, in addition to Vertex.
  2. Better up-front type checking (i.e. before running the pipeline).
  3. Apache Beam backend and executor (???)


