DAG interface for SLURM schedulers
Project description
Slurmflow
Slurmflow is a job dependency manager built to interact with Slurm clusters! It serves as an abstraction between your data processing and the implementation details of Slurm. Slurmflow represents your tasks as a directed acyclic graph (DAG), where each node is an sbatch script to be invoked. When you "run" the DAG, Slurmflow recursively traverses the graph from the sinks (nodes with no outgoing edges) back to the sources (nodes with no incoming edges) and submits each script to Slurm using sbatch's --dependency flag.
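Concretely, a fan-in like the one in the example below boils down to plain sbatch calls, where each child's --dependency flag lists its parents' job IDs. Here is a minimal sketch of that submission step; submit_with_deps is an illustrative helper, not part of the Slurmflow API:

import subprocess

def submit_with_deps(script, parent_ids):
    """Submit `script` with sbatch, gated on its parents via afterok."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print just the job ID
    if parent_ids:
        cmd.append("--dependency=afterok:" + ":".join(parent_ids))
    cmd.append(script)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Sources are submitted first, then the sink is gated on both parents:
j1 = submit_with_deps("task1.sh", [])
j2 = submit_with_deps("task2.sh", [])
j3 = submit_with_deps("task3.sh", [j1, j2])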
Example
Here is a simple example with three scripts (t1, t2, t3), where t3 requires the outputs of the previous two tasks. When creating each Job, all that's needed is a name for the node and the filename of an sbatch script to call. Once your jobs are created, you can use the bitshift operators (<<, >>) to set dependencies: t1 >> t2 denotes that t1 must finish before t2 can run, while t1 << t2 denotes the opposite. If your pipeline allows certain steps to run in parallel, you can group them into a list [] and apply the bitshift operators to the list. For a more conventional syntax, you can also use the set_upstream or set_downstream methods, e.g. t1.set_downstream(t2), which is equivalent to t1 >> t2.
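To make the operator semantics concrete, the declarations below all express the same ordering, using the t1, t2, t3 jobs defined in the workflow that follows (and assuming set_upstream mirrors set_downstream with the direction reversed):

# Each of these declares that t1 must finish before t3:
t1 >> t3
t3 << t1
t1.set_downstream(t3)
t3.set_upstream(t1)

# Fan-in: t1 and t2 must both finish before t3 starts.
[t1, t2] >> t3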
Workflow.py
from slurmflow import Job
from slurmflow import DAG

# Environment variables to expose to every job; an empty dict here just keeps the example runnable.
env = {}

with DAG('simple_slurm_workflow', env=env) as d:
    t1 = Job('Task1', 'task1.sh', dag=d)
    t2 = Job('Task2', 'task2.sh', dag=d)
    t3 = Job('Task3', 'task3.sh', dag=d)

    [t1, t2] >> t3  # t3 runs only after both t1 and t2 succeed

    d.run()   # submit the jobs to Slurm
    d.plot()  # render the DAG
Scripts
task1.sh
#!/bin/bash
#SBATCH --job-name=slurmflow_test_1
#SBATCH --output=slurmflow_test_1.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G
cat data1.txt > data4.txt  # copy data1.txt to data4.txt for task 3 to consume
env                        # print the job's environment
echo "task 1 complete"
task2.sh
#!/bin/bash
#SBATCH --job-name=slurmflow_test_2
#SBATCH --output=slurmflow_test_2.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G
cat data2.txt > data3.txt  # copy data2.txt to data3.txt for task 3 to consume
env                        # print the job's environment
echo "task 2 complete"
task3.sh
#!/bin/bash
#SBATCH --job-name=slurmflow_test_3
#SBATCH --output=slurmflow_test_3.out
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1G
cat data3.txt >> final.txt  # append task 2's output
cat data4.txt >> final.txt  # append task 1's output
rm data3.txt data4.txt      # clean up the intermediate files
env                         # print the job's environment
echo "temporary files removed"
Data Files
data1.txt
1
2
3
4
5
data2.txt
10
11
12
13
14
15
final.txt
10
11
12
13
14
15
1
2
3
4
5
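Note that task 2's numbers appear first in final.txt: task3.sh appends data3.txt (written by task 2) before data4.txt (written by task 1).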
File details
Details for the file slurmflow-0.2.2.tar.gz.
File metadata
- Download URL: slurmflow-0.2.2.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | b44dcf717f475b26c233e140a9124863dd02bd78d13d7033e6ebb325ee1c8cc2
MD5 | db18c501ff76569e6c0efa9d6f9d5c04
BLAKE2b-256 | 5ef9b396a40fc6dda455810b1f5ca56ed451d2c676d158c3c6182e81ec6d1f6b
File details
Details for the file slurmflow-0.2.2-py3-none-any.whl.
File metadata
- Download URL: slurmflow-0.2.2-py3-none-any.whl
- Upload date:
- Size: 17.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | abaafdc39b2ced4dfb1871c9e67f8cb9fdd6fa0d2f52bfaf9fa7af90ee2d16ca
MD5 | 776a9f5beba0a9430b3d50f95ae21f98
BLAKE2b-256 | 3bf003c3c83b63ebd3e5262dac829800164291da582666d09c9a2aab0fc46c1b