DAG interface for SLURM schedulers

Project description

Slurmflow

Slurmflow is a job dependency manager built to interact with Slurm clusters. It serves as an abstraction between your data processing and the implementation details of Slurm. Slurmflow represents your tasks as a Directed Acyclic Graph (DAG), where each node is an sbatch script to be called. When you "run" the DAG, Slurmflow recursively traverses the graph from the sinks (nodes with no outgoing edges) to the sources (nodes with no incoming edges) and submits each script to Slurm using the --dependency flag.
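
Conceptually, that traversal reduces to chained sbatch calls. Below is a minimal sketch of the idea in plain Python; it is not Slurmflow's actual internals, and the submit helper is hypothetical, but sbatch's --parsable and --dependency flags are standard Slurm:

import subprocess

def submit(script, dep_ids=()):
    # --parsable makes sbatch print just the job ID
    cmd = ["sbatch", "--parsable"]
    if dep_ids:
        # afterok: start only after all listed jobs have finished successfully
        cmd.append("--dependency=afterok:" + ":".join(dep_ids))
    cmd.append(script)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

# task1 and task2 have no upstream jobs; task3 waits on both.
j1 = submit("task1.sh")
j2 = submit("task2.sh")
j3 = submit("task3.sh", dep_ids=(j1, j2))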

Example

Here is a simple example with three scripts (t1, t2, t3), where t3 requires the outputs of the previous two tasks. When creating each Job, all that's needed is a name for the node and the filename of an sbatch script to call. Once your jobs are created, you can use the bitshift operators (<<, >>) to set dependencies: t1 >> t2 denotes that t1 must finish before t2 can run, while t1 << t2 denotes the opposite. If your pipeline allows certain steps to run in parallel, you can group them in a list [] and apply the bitshift operators to the list. For a more conventional syntax, you can also use the set_upstream or set_downstream methods, like so: t1.set_downstream(t2), which is equivalent to t1 >> t2. The snippet below collects these equivalent forms.
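
Assuming the three jobs from the example below have been created, each line in the first group declares the same ordering, and the list form expresses a fan-in:

# Equivalent ways to declare "t1 must finish before t2 starts":
t1 >> t2
t2 << t1
t1.set_downstream(t2)
t2.set_upstream(t1)

# Fan-in: t1 and t2 may run in parallel; t3 waits for both.
[t1, t2] >> t3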

Workflow.py

from slurmflow import Job
from slurmflow import DAG

env = {}  # environment variables forwarded to each job (placeholder; define as needed)

with DAG('simple_slurm_workflow', env=env) as d:
    t1 = Job('Task1', 'task1.sh', dag=d)
    t2 = Job('Task2', 'task2.sh', dag=d)
    t3 = Job('Task3', 'task3.sh', dag=d)


[t1, t2] >> t3

d.run()
d.plot()
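
Given the traversal described above, run() submits task1.sh and task2.sh with no dependencies and task3.sh with a --dependency referencing the job IDs of the first two, so t3 only starts once both upstream jobs have finished; plot() presumably renders the DAG so you can sanity-check the dependency structure before or after submission.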

Scripts

task1.sh

#!/bin/bash
#SBATCH --job-name=slurmflow_test_1
#SBATCH --output=slurmflow_test_1.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G

cat data1.txt > data4.txt  # produce the intermediate data4.txt from data1.txt

env  # print the job environment for debugging

echo "task 1 complete"

task2.sh

#!/bin/bash
#SBATCH --job-name=slurmflow_test_2
#SBATCH --output=slurmflow_test_2.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G

cat data2.txt > data3.txt  # produce the intermediate data3.txt from data2.txt

env

echo "task 2 complete"

task3.sh

#!/bin/bash
#SBATCH --job-name=slurmflow_test_3
#SBATCH --output=slurmflow_test_3.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G

cat data3.txt >> final.txt  # append task2's output first
cat data4.txt >> final.txt  # then task1's output
rm data3.txt data4.txt      # remove the intermediates

env

echo "temporary files removed"

Data Files

data1.txt

1
2
3
4
5

data2.txt

10
11
12
13
14
15

final.txt

10
11
12
13
14
15
1
2
3
4
5

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

slurmflow-0.2.2.tar.gz (17.2 kB)

Uploaded Source

Built Distribution

slurmflow-0.2.2-py3-none-any.whl (17.5 kB)

Uploaded Python 3

File details

Details for the file slurmflow-0.2.2.tar.gz.

File metadata

  • Download URL: slurmflow-0.2.2.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11

File hashes

Hashes for slurmflow-0.2.2.tar.gz
  • SHA256: b44dcf717f475b26c233e140a9124863dd02bd78d13d7033e6ebb325ee1c8cc2
  • MD5: db18c501ff76569e6c0efa9d6f9d5c04
  • BLAKE2b-256: 5ef9b396a40fc6dda455810b1f5ca56ed451d2c676d158c3c6182e81ec6d1f6b


File details

Details for the file slurmflow-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: slurmflow-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11

File hashes

Hashes for slurmflow-0.2.2-py3-none-any.whl
  • SHA256: abaafdc39b2ced4dfb1871c9e67f8cb9fdd6fa0d2f52bfaf9fa7af90ee2d16ca
  • MD5: 776a9f5beba0a9430b3d50f95ae21f98
  • BLAKE2b-256: 3bf003c3c83b63ebd3e5262dac829800164291da582666d09c9a2aab0fc46c1b

