
Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed them on your machine!

Motivation

Would you like fully reproducible research or reusable workflows that run seamlessly on HPC clusters? Tired of writing and managing large Slurm submission scripts? Do you have to comment out large parts of your pipeline whenever their results have already been generated? Hate YAML? Don't waste your precious time! awflow lets you describe complex pipelines directly in Python and run them both on your personal computer and on large HPC clusters.

import glob
import numpy as np
import os

from awflow import after, ensure, job, schedule

n = 10000
tasks = 10

@ensure(lambda i: os.path.exists(f'pi-{i}.npy'))
@job(cpus='4', memory='4GB', array=tasks)
def estimate(i: int):
    print(f'Executing task {i + 1} / {tasks}.')
    x = np.random.random(n)
    y = np.random.random(n)
    pi_estimate = (x**2 + y**2 <= 1)  # Boolean mask of samples inside the quarter circle
    np.save(f'pi-{i}.npy', pi_estimate)

@after(estimate)
@ensure(lambda: os.path.exists('pi.npy'))
@job(cpus='4')
def merge():
    files = glob.glob('pi-*.npy')
    stack = np.vstack([np.load(f) for f in files])
    pi_estimate = stack.sum() / (n * tasks) * 4
    print('π ≅', pi_estimate)
    np.save('pi.npy', pi_estimate)

merge.prune()  # Prune jobs whose postconditions have been satisfied

schedule(merge, backend='local')  # Executes merge and its dependencies
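The merge job recovers π because a point (x, y) drawn uniformly from the unit square lands inside the quarter circle x² + y² ≤ 1 with probability π/4, so four times the observed fraction estimates π. A standalone sketch of the same estimator, using only the standard library:

```python
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi: four times the fraction of uniform
    points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # close to 3.14159 for large sample counts
```

Splitting the samples across an `array` of jobs, as the workflow above does, simply parallelizes this loop; the merge step then pools the per-task counts.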

Executing this program on a Slurm HPC cluster (python examples/pi.py --backend slurm) launches jobs like the following:

           1803299       all    merge username PD       0:00      1 (Dependency)
     1803298_[6-9]       all estimate username PD       0:00      1 (Resources)
         1803298_3       all estimate username  R       0:01      1 compute-xx
         1803298_4       all estimate username  R       0:01      1 compute-xx
         1803298_5       all estimate username  R       0:01      1 compute-xx

The following example shows how workflow graphs can be dynamically allocated:

from awflow import after, job, schedule, terminal_nodes

@job(cpus='2', memory='4GB', array=5)
def generate(i: int):
    print(f'Generating data block {i}.')

@after(generate)
@job(cpus='1', memory='2GB', array=5)
def postprocess(i: int):
    print(f'Postprocessing data block {i}.')

def do_experiment(parameter):
    r"""This method allocates a `fit` and `make_plot` job
    based on the specified parameter."""

    @after(postprocess)
    @job(name=f'fit_{parameter}')  # By default, the name is equal to the function name
    def fit():
        print(f'Fit {parameter}.')

    @after(fit)
    @job(name=f'plt_{parameter}')  # Simplifies the identification of the logfile
    def make_plot():
        print(f'Plot {parameter}.')

# Programmatically build workflow
for parameter in [0.1, 0.2, 0.3, 0.4, 0.5]:
    do_experiment(parameter)

leafs = terminal_nodes(generate, prune=True)  # Find terminal nodes of workflow graph
schedule(*leafs, backend='local')
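terminal_nodes collects the leaves of the workflow graph, i.e., the jobs no other job depends on, so scheduling the leaves pulls in every upstream dependency. A simplified standalone sketch of that idea (illustrative only; not awflow's actual implementation):

```python
def find_terminal_nodes(dependents: dict[str, list[str]]) -> set[str]:
    """Given a mapping from job name to the jobs that depend on it,
    return the jobs that nothing depends on (the graph's leaves)."""
    return {job for job, deps in dependents.items() if not deps}

# Mirrors one branch of the example above: generate -> postprocess -> fit -> plot.
graph = {
    'generate': ['postprocess'],
    'postprocess': ['fit_0.1'],
    'fit_0.1': ['plt_0.1'],
    'plt_0.1': [],
}
print(find_terminal_nodes(graph))  # {'plt_0.1'}
```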

Check the examples directory to explore the functionality.

Usage

TODO

Available backends

Currently, awflow.schedule supports only the local and slurm backends.
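The local backend runs jobs in the current process, while the slurm backend submits them to the cluster scheduler, as shown above. Conceptually, a backend is just selected by name; a minimal dispatch sketch (illustrative, not awflow's code):

```python
from typing import Callable

def run_local(job: Callable[[], None]) -> None:
    # Execute the job directly in the current process.
    job()

def run_slurm(job: Callable[[], None]) -> None:
    # A real backend would generate a submission script and call sbatch.
    raise NotImplementedError('Slurm submission is out of scope for this sketch')

BACKENDS = {'local': run_local, 'slurm': run_slurm}

def schedule_sketch(job: Callable[[], None], backend: str = 'local') -> None:
    if backend not in BACKENDS:
        raise ValueError(f'unknown backend: {backend!r}')
    BACKENDS[backend](job)

results = []
schedule_sketch(lambda: results.append('ran'), backend='local')
print(results)  # ['ran']
```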

Installation

The awflow package is available on PyPI and can be installed with pip.

you@local:~ $ pip install awflow

If you would like the latest features, you can install it directly from the Git repository.

you@local:~ $ pip install git+https://github.com/JoeriHermans/awflow

If you would also like to run the examples, be sure to install the optional example dependencies.

you@local:~ $ pip install 'awflow[examples]'

Roadmap and TODO

  • Should schedule return metadata of jobs and workflow?
  • Check for cyclic dependencies.
  • More examples and documentation.
  • Utilities to cleanup generated metadata and crashed jobs for the Slurm backend.
  • Can jobs submit jobs on both the local and Slurm backends?

Contributing

See CONTRIBUTING.md.

License

As described in the LICENSE file.

