Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your laptop!
Motivation
Would you like fully reproducible research or reusable workflows that seamlessly run on HPC clusters?
Tired of writing and managing large Slurm submission scripts? Do you have to comment out large parts of your pipeline once their results have been generated? Hate YAML?
Don't waste your precious time! awflow allows you to describe complex pipelines directly in Python and run them on your personal computer and on large HPC clusters alike.
```python
import glob
import numpy as np
import os

from awflow import after, ensure, job, schedule

n = 10000
tasks = 10

@ensure(lambda i: os.path.exists(f'pi-{i}.npy'))
@job(cpus='4', memory='4GB', array=tasks)
def estimate(i: int):
    print(f'Executing task {i + 1} / {tasks}.')
    x = np.random.random(n)
    y = np.random.random(n)
    pi_estimate = (x**2 + y**2 <= 1)
    np.save(f'pi-{i}.npy', pi_estimate)

@after(estimate)
@ensure(lambda: os.path.exists('pi.npy'))
@job(cpus='4')
def merge():
    files = glob.glob('pi-*.npy')
    stack = np.vstack([np.load(f) for f in files])
    pi_estimate = stack.sum() / (n * tasks) * 4
    print('π ≅', pi_estimate)
    np.save('pi.npy', pi_estimate)

merge.prune()  # Prune jobs whose postconditions have been satisfied
schedule(merge, backend='local')  # Executes merge and its dependencies
```
Executing this Python program (`python examples/pi.py --backend slurm`) on a Slurm HPC cluster will launch the following jobs.
```
1803299         all     merge  username PD 0:00 1 (Dependency)
1803298_[6-9]   all  estimate  username PD 0:00 1 (Resources)
1803298_3       all  estimate  username  R 0:01 1 compute-xx
1803298_4       all  estimate  username  R 0:01 1 compute-xx
1803298_5       all  estimate  username  R 0:01 1 compute-xx
```
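For reference, the quantity this pipeline computes is a plain Monte Carlo estimate of π: points drawn uniformly in the unit square fall inside the quarter circle with probability π/4. A minimal sketch without awflow (the loop stands in for the Slurm job array; the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

n, tasks = 10000, 10
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

hits = []
for i in range(tasks):  # stands in for the `estimate` job array
    x = rng.random(n)
    y = rng.random(n)
    hits.append(x**2 + y**2 <= 1)  # inside the quarter circle?

# The `merge` step: fraction of hits, scaled by 4
pi_estimate = np.vstack(hits).sum() / (n * tasks) * 4
print(pi_estimate)  # close to 3.14159
```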
The following example shows how workflow graphs can be dynamically allocated:
```python
from awflow import after, job, schedule, terminal_nodes

@job(cpus='2', memory='4GB', array=5)
def generate(i: int):
    print(f'Generating data block {i}.')

@after(generate)
@job(cpus='1', memory='2GB', array=5)
def postprocess(i: int):
    print(f'Postprocessing data block {i}.')

def do_experiment(parameter):
    r"""This method allocates a `fit` and `make_plot` job
    based on the specified parameter."""

    @after(postprocess)
    @job(name=f'fit_{parameter}')  # By default, the name is equal to the function name
    def fit():
        print(f'Fit {parameter}.')

    @after(fit)
    @job(name=f'plt_{parameter}')  # Simplifies the identification of the logfile
    def make_plot():
        print(f'Plot {parameter}.')

# Programmatically build the workflow
for parameter in [0.1, 0.2, 0.3, 0.4, 0.5]:
    do_experiment(parameter)

leafs = terminal_nodes(generate, prune=True)  # Find the terminal nodes of the workflow graph
schedule(*leafs, backend='local')
```
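The idea behind `terminal_nodes` is to locate the leaves of the workflow DAG, so that scheduling them pulls in every upstream dependency. A toy sketch of that idea on a plain adjacency list (this is not awflow's implementation, and the helper name `find_terminal_nodes` is made up for illustration):

```python
def find_terminal_nodes(graph):
    # A terminal node is one with no dependents (no outgoing edges).
    return [node for node, dependents in graph.items() if not dependents]

# Mirrors one branch of the example: generate -> postprocess -> fit -> plot
workflow = {
    'generate': ['postprocess'],
    'postprocess': ['fit_0.1'],
    'fit_0.1': ['plt_0.1'],
    'plt_0.1': [],
}
print(find_terminal_nodes(workflow))  # ['plt_0.1']
```

awflow's real `terminal_nodes` additionally takes a node of the workflow and a `prune` flag, as shown in the example above.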
Check the examples directory to explore the functionality.
Usage
TODO
Available backends
Currently, `awflow.schedule` supports only the `local` and `slurm` backends.
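Conceptually, a local backend just runs each job after its dependencies complete, i.e. in topological order, whereas the Slurm backend expresses the same ordering through `--dependency` constraints. A toy sketch of the local case using the standard library (not awflow's implementation):

```python
from graphlib import TopologicalSorter

# Map each job to the set of jobs it depends on,
# mirroring the pi example: merge runs after estimate.
dependencies = {'merge': {'estimate'}, 'estimate': set()}

# static_order yields dependencies before their dependents.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['estimate', 'merge']
```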
Installation
The awflow package is available on PyPI, which means it is installable via pip:

```
you@local:~ $ pip install awflow
```

If you would like the latest features, you can install it from the Git repository:

```
you@local:~ $ pip install git+https://github.com/JoeriHermans/awflow
```

If you would like to run the examples as well, be sure to install the optional example dependencies:

```
you@local:~ $ pip install 'awflow[examples]'
```
Roadmap and TODO
- Should `schedule` return metadata of jobs and workflows?
- Check for cyclic dependencies.
- More examples and documentation.
- Utilities to clean up generated metadata and crashed jobs for the Slurm backend.
- Can jobs submit jobs on both the local and Slurm backends?
Contributing
See CONTRIBUTING.md.
License
As described in the LICENSE file.
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file awflow-0.1.0.tar.gz.
File metadata
- Download URL: awflow-0.1.0.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `820780cdabc4d08c0aaa53517a40af7063dc10cf6578d84fbea3da948d7cbb4c` |
| MD5 | `958aa7a52141810b12ef95b55f27d709` |
| BLAKE2b-256 | `7f9071ea2752ec98d93d3b4ea92591df041f0e5f75f6e71c6ad538373fb2efcd` |
File details
Details for the file awflow-0.1.0-py3-none-any.whl.
File metadata
- Download URL: awflow-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `93a846172994faf2bd79b55a005d151f234d52683c532b07769f6521edbdb94f` |
| MD5 | `77f53dde16bd77c8f14294bc16b75619` |
| BLAKE2b-256 | `648b2018c836342c99664c7d4867111a281c75da764789d8938eef5fc26a40e7` |