A simple framework to build and run flows

Project description

A pipeline development framework that makes it easy to experiment with and compare different pipelines, and quick to deploy them to workflow orchestration tools.


Most current workflow orchestrators focus on executing already-developed pipelines in production. This library focuses on the pipeline development process. It aims to make pipelines easy to develop, and once the user reaches a good pipeline, easy to export to other production-grade workflow orchestrators. Notable features:

  • Manage pipeline experiments: store pipeline run outputs, compare pipelines & visualize.
  • Support pipeline as code: allows complex customization.
  • Support pipeline as configuration: suitable for plug-and-play once the pipeline is stable.
  • Fast pipeline execution: auto-caches and runs from cache when necessary.
  • Allow version control of artifacts with git-lfs/dvc/god...
  • Export pipelines to compatible workflow orchestration tools (e.g. Argo Workflows, Airflow, Kubeflow...).

Install

pip install theflow

Quick start

(A code walk-through of this section is stored in examples/10-minutes-quick-start.ipynb. You can run it with Google Colab (TODO - attach the link).)

A pipeline can be defined as code. You declare the ops as class attributes (supplying them through the constructor) and route them in self.run.

from theflow import Function

# Define some operations used inside the pipeline
# Operation 1: normal class-based Python object
class IncrementBy(Function):

  x: int

  def run(self, y):
    return self.x + y

# Operation 2: normal Python function
def decrement_by_5(x):
  return x - 5

# Declare flow
class MathFlow(Function):

  increment: Function
  decrement: Function

  def run(self, x):
    # Route the operations in the flow
    y = self.increment(x)
    y = self.decrement(y)
    y = self.increment(y)
    return y

flow = MathFlow(increment=IncrementBy(x=10), decrement=decrement_by_5)

You run the pipeline by calling it directly. The output is the same object returned by self.run.

output = flow(x=5)
print(f"{output=}, {type(output)=}")      # output=20, type(output)=<class 'int'>
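
The flow above routes the ops as increment → decrement → increment, so flow(x=5) computes (5 + 10) - 5 + 10 = 20. As a sanity check, here is a plain-Python sketch of the same composition (no theflow required); the helper names here are illustrative, not part of the library:

```python
# Plain-Python sketch of the MathFlow routing: increment -> decrement -> increment.
# `increment_by` and `math_flow` are illustrative helpers, not theflow APIs.
def increment_by(x):
    def op(y):
        return x + y
    return op

def decrement_by_5(x):
    return x - 5

increment = increment_by(10)

def math_flow(x):
    y = increment(x)       # 5 + 10 = 15
    y = decrement_by_5(y)  # 15 - 5 = 10
    y = increment(y)       # 10 + 10 = 20
    return y

print(math_flow(5))  # 20
```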

You can investigate the pipeline's last run through the last_run property.

flow.last_run.id()                        # id of the last run
flow.last_run.logs()                      # list all information of each step
# [TODO] flow.last_run.visualize(path="vis.png")   # export the graph in `vis.png` file

Future features

  • Arguments management
  • Cache
    • cache by runs, organized by root task, allowing reproducibility
    • specify the files
    • the keys work like lru_cache, taking the original input as key, but the cache should be file-backed so it persists from run to run.
    • cli command to manipulate cache
  • Compare pipeline in a result folder
  • Dynamically create reproducible config
  • Support pipeline branching and merging
  • Support single process or multi-processing pipeline running
  • Can synchronize changes in the workflow, allowing logs from different runs to be compatible with each other
  • Compare different runs
    • Same cache directory
    • Compare evaluation result based on kwargs
  • CLI List runs
  • CLI Delete unnecessary runs
  • Add coverage, pre-commit, CI...

License

MIT License.
