chkpt
A tiny pipeline builder
What
chkpt is a zero-dependency, 100-line library that makes it easy to define and execute checkpointed pipelines.
It features:
- Fluent pipeline construction
- Transparent caching of expensive operations
- JSON serialization
How
Defining a Stage
Stages are the atomic units of work in chkpt and correspond to single Python functions. Existing functions need only use the decorator @chkpt.Stage.wrap() to be used as a Stage:
@chkpt.Stage.wrap()
def stage1():
    return "123"

# stage1 is now a Stage instance
assert isinstance(stage1, chkpt.Stage)
# but the original function is still accessible
assert stage1.func() == "123"
Stages can also accept parameters to be provided by other Stages in the final Pipeline:
@chkpt.Stage.wrap()
def stage2(stage1_input):
    return [stage1_input, "456"]
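To make the decorator pattern above concrete, here is a minimal sketch of how a wrap()-style decorator factory can turn a function into an object while keeping the original callable reachable. ToyStage, toy_stage1, and toy_stage2 are hypothetical names invented for this illustration; this is not chkpt's actual implementation, only one plausible way such a wrapper could be built.

```python
from typing import Any, Callable


class ToyStage:
    """Hypothetical stand-in for a stage object (not chkpt's own class)."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func            # keep the original function reachable
        self.name = func.__name__   # a stable name, e.g. for cache keys

    @classmethod
    def wrap(cls) -> Callable[[Callable[..., Any]], "ToyStage"]:
        # Returning a decorator (instead of being one) is what allows
        # the call syntax with parentheses: @ToyStage.wrap()
        def decorator(func: Callable[..., Any]) -> "ToyStage":
            return cls(func)

        return decorator


@ToyStage.wrap()
def toy_stage1():
    return "123"


@ToyStage.wrap()
def toy_stage2(stage1_input):
    return [stage1_input, "456"]
```

Because the decorator replaces the function name with a ToyStage instance, the undecorated function is only available through the func attribute, which mirrors the stage1.func() access shown above.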
Defining a Pipeline
Pipelines define the execution graph of Stages to be run. Stages are combined with shift operators (<< and >>) to direct the dataflow:
# Each defines a pipeline calculating `stage1` and passing its output to `stage2`.
pipeline = stage1 >> stage2
pipeline = stage2 << stage1
pipeline = stage2 << (stage1,)
pipeline = (stage1,) >> stage2
pipeline = () >> stage1 >> stage2
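The five spellings above can all be supported with Python's shift-operator hooks. The following is a rough sketch under the assumption that a tuple on the left-hand side is handled through the reflected operator __rrshift__ (tuples define no >> of their own, so Python falls back to the right operand). Node and its methods are hypothetical and are not taken from chkpt's source.

```python
class Node:
    """Hypothetical pipeline node: a function plus its upstream nodes."""

    def __init__(self, func, inputs=()):
        self.func = func
        self.inputs = tuple(inputs)

    def _feed(self, upstream):
        # Accept either a single node or a tuple of nodes as inputs.
        ups = upstream if isinstance(upstream, tuple) else (upstream,)
        return Node(self.func, ups)

    def __lshift__(self, upstream):      # self << upstream
        return self._feed(upstream)

    def __rshift__(self, downstream):    # self >> downstream
        return downstream._feed(self)

    def __rrshift__(self, upstream):     # (a, b) >> self, or () >> self
        return self._feed(upstream)

    def __call__(self):
        # Evaluate upstream nodes first, then pass their results along.
        return self.func(*(node() for node in self.inputs))


a = Node(lambda: "123")
b = Node(lambda x: [x, "456"])

variants = [
    a >> b,
    b << a,
    b << (a,),
    (a,) >> b,
    () >> a >> b,
]
results = [pipeline() for pipeline in variants]
```

In this sketch every operator returns a new Node rather than mutating its operands, so the same stage can be reused in several pipelines without the definitions interfering with each other.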
More complex pipelines should be defined from the leaves down:
result1 = (stage1, stage2) >> stage3
result2 = (result1, stage1) >> stage4
pipeline = result2 >> stage5
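In the graph above, stage1 feeds both stage3 and stage4. As an illustration of how such a graph can be evaluated so that a shared stage runs only once per execution, here is a small self-contained sketch. The dictionary layout, the evaluate helper, and the stage bodies are all invented for this example; whether chkpt itself deduplicates shared stages within a run is not stated in this document.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical graph layout: stage name -> (function, upstream stage names)
Graph = Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]]

calls = []  # records how often the shared stage actually runs


def toy_stage1():
    calls.append("stage1")
    return 1


graph: Graph = {
    "stage1": (toy_stage1, ()),
    "stage2": (lambda: 2, ()),
    "stage3": (lambda a, b: a + b, ("stage1", "stage2")),
    "stage4": (lambda r1, a: r1 * 10 + a, ("stage3", "stage1")),
    "stage5": (lambda r2: f"result={r2}", ("stage4",)),
}


def evaluate(graph: Graph, target: str,
             results: Optional[Dict[str, Any]] = None) -> Any:
    """Depth-first evaluation that memoizes each stage's result."""
    results = {} if results is None else results
    if target not in results:
        func, deps = graph[target]
        args = [evaluate(graph, dep, results) for dep in deps]
        results[target] = func(*args)
    return results[target]


final = evaluate(graph, "stage5")
```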
Executing a Pipeline
Pipelines can be executed directly, which uses the default config settings:
result = pipeline()
The defaults can be configured by passing a Config instance:
# Will store all stage results and attempt to load already-stored results, if present.
result = pipeline(chkpt.Config(store=True, load=True, dir='/tmp'))
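The store and load flags describe a familiar checkpointing pattern: write a stage's result to disk after computing it, and reuse the stored result on later runs. The sketch below shows that pattern with JSON files in a directory. The run_cached helper, its parameters, and the one-file-per-stage naming scheme are assumptions made for illustration and do not describe how chkpt lays out its files.

```python
import json
import tempfile
from pathlib import Path


def run_cached(name, func, directory, store=True, load=True):
    """Run func, optionally loading from or storing to <directory>/<name>.json."""
    path = Path(directory) / f"{name}.json"  # hypothetical naming scheme
    if load and path.exists():
        # A stored result exists: skip the computation entirely.
        return json.loads(path.read_text())
    result = func()
    if store:
        path.write_text(json.dumps(result))
    return result


calls = []  # counts how many times the expensive function really runs


def expensive():
    calls.append(1)
    return {"rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    first = run_cached("expensive", expensive, tmp)   # computes and stores
    second = run_cached("expensive", expensive, tmp)  # loads from disk
```

One consequence of JSON-based storage is that results must be JSON-serializable, and some types do not round-trip exactly (tuples, for example, come back as lists).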
Examples
For detailed usage, see the examples/ directory.
The following is a brief example pipeline:
import chkpt

@chkpt.Stage.wrap()
def make_dataset1():
    ...

@chkpt.Stage.wrap()
def big_download2():
    ...

@chkpt.Stage.wrap()
def work_in_progress_analysis(dataset1, dataset2):
    ...

pipeline = (make_dataset1, big_download2) >> work_in_progress_analysis
# Work-intensive inputs only run once, caching on reruns.
result = pipeline(chkpt.Config(load=[make_dataset1, big_download2]))