Pipeline library that simplifies the creation of pipelines that run on top of Hail Batch and other compute environments
step-pipeline
Python library for defining and executing pipelines that run in container or VM execution environments like Hail Batch, Terra, SGE, etc., and that localize/delocalize input/output files to cloud storage services like Google Cloud Storage buckets. Currently, it supports only Hail Batch and Google Cloud Storage.
The main goal of this library is to reduce repetitive code in your Python pipeline definition script (PPDS) by taking care of common pipeline needs, such as skipping execution steps that don't need to run because their output files already exist.
The things it takes care of include:
- before submitting your pipeline for execution, it
  a) checks pipeline input files and throws an error if any are missing.
  b) skips steps whose outputs already exist and are newer than the inputs.
- adds command-line args to your PPDS for skipping some steps and/or forcing re-execution of others
- provides a simplified API for localizing input files and delocalizing output files using different strategies (copy, gcsfuse, etc.)
- notifies you via Slack or email when the pipeline completes
- optionally, records profiling info by starting a background process within the job execution container to record CPU and memory usage at regular intervals while your commands are running
- optionally, generates an image of the pipeline DAG
NOTE: some features only work if specific tools are installed inside the docker container or local environment that's executing your pipeline.
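To make the pre-submission checks concrete, here is a minimal sketch of the "missing inputs" and "outputs newer than inputs" logic described above. It is not step-pipeline's actual implementation or API: the function names `missing_inputs`, `outputs_are_up_to_date`, and `should_run_step` are hypothetical, and it assumes local file paths and modification times as a stand-in for cloud storage objects.

```python
import os


def missing_inputs(input_paths):
    """Return the input paths that do not exist (hypothetical helper)."""
    return [p for p in input_paths if not os.path.exists(p)]


def outputs_are_up_to_date(input_paths, output_paths):
    """Return True if every output exists and the oldest output is newer
    than the newest input. Assumption: local mtimes stand in for the
    timestamps a real backend would read from cloud storage."""
    if not output_paths or not all(os.path.exists(p) for p in output_paths):
        return False
    newest_input = max((os.path.getmtime(p) for p in input_paths), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in output_paths)
    return oldest_output > newest_input


def should_run_step(input_paths, output_paths, force=False):
    """Decide whether a step needs to run.

    Raises FileNotFoundError if any input is missing, runs unconditionally
    when `force` is set (analogous to a force-re-execution flag), and
    otherwise skips steps whose outputs are already up to date.
    """
    missing = missing_inputs(input_paths)
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    if force:
        return True
    return not outputs_are_up_to_date(input_paths, output_paths)
```

In a sketch like this, a script would call `should_run_step(inputs, outputs, force=args.force)` for each step before submitting it, where `args.force` is an assumed command-line flag rather than one documented here.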
Installation
To install the step-pipeline library, run:
python3 -m pip install step-pipeline
Hashes for step_pipeline-0.2.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bea2c4b46737c13ceb3bcbabf6000b4dabe778610fcb4656124bf1eccf531130
MD5 | 08057bf222583dfb2eb9ecc30a55962a
BLAKE2b-256 | ad8fcecc2bad33c3fd9ad4da435eab310c5ffde5a757ef4fa102bb86b1acb378