Slurm job workflow management
srunx
A unified CLI, web dashboard, and Python API for SLURM job management.
Stop juggling sbatch scripts, squeue loops, and SSH sessions.
- Submit & manage SLURM jobs from CLI, browser, or Python
- Orchestrate multi-step workflows with YAML and dependency graphs
- Monitor GPU availability and job states with Slack notifications
- SSH remote — submit jobs, sync files, and browse remote clusters from your laptop
- Container-native — Pyxis, Apptainer, and Singularity support built in
Installation
Requires Python 3.12+ and access to a SLURM cluster (local or via SSH).
uv add srunx
For the web dashboard:
uv add "srunx[web]"
Quick Start
# Submit a job
srunx submit python train.py --name training --gpus-per-node 2 --conda ml_env
# Check status and resources
srunx list --show-gpus
srunx resources
# Run a YAML workflow
srunx flow run workflow.yaml
CLI
| Command | Description |
|---|---|
| srunx submit | Submit a SLURM job |
| srunx status | Check job status |
| srunx list | List jobs in queue |
| srunx cancel | Cancel a job |
| srunx logs | View / stream job logs |
| srunx resources | Display GPU availability |
| srunx monitor | Monitor jobs, resources, or cluster |
| srunx flow | Run / validate YAML workflows |
| srunx ssh | Remote SLURM operations over SSH |
| srunx history | Show job execution history |
| srunx report | Generate job execution report |
| srunx config | Manage configuration |
| srunx template | Manage job templates |
| srunx ui | Launch the web dashboard |
Web Dashboard
A dashboard for visual cluster management. Connect to your SLURM cluster over SSH and manage jobs, workflows, and resources from a browser.
srunx ui # -> http://127.0.0.1:8000
srunx ui --port 3000 # custom port
- Jobs: browse, search, filter, and cancel jobs.
- Workflow DAG: visualize job dependencies and run workflows directly from the UI.
- Resources: GPU and node availability per partition.
- Explorer: browse remote files via SSH mounts. Shell scripts can be submitted as sbatch jobs directly from the file tree.
Workflow Orchestration
Define pipelines in YAML with dependency graphs and Jinja2-parameterized variables:
name: experiment
args:
  model: "bert-base-uncased"
  output_dir: "/outputs/{{ model }}"
jobs:
  - name: preprocess
    command: ["python", "preprocess.py"]
    nodes: 1
  - name: train
    command: ["python", "train.py", "--model", "{{ model }}"]
    depends_on: [preprocess]
    gpus_per_node: 2
    conda: ml_env
  - name: evaluate
    command: ["python", "eval.py", "--output", "{{ output_dir }}"]
    depends_on: [train]
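To make the `args` substitution concrete, the following stdlib-only sketch shows how `{{ model }}` and `{{ output_dir }}` in the workflow above would resolve. It is a simplified stand-in for Jinja2 that only handles bare `{{ name }}` placeholders, and the `render` helper is hypothetical rather than part of srunx. It also assumes that args are resolved in declaration order, which the example suggests but does not state.

```python
import re

# Matches a bare {{ name }} placeholder; real Jinja2 supports far more syntax.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text: str, variables: dict[str, str]) -> str:
    """Replace each {{ name }} in text with variables[name] (hypothetical helper)."""
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], text)


# Values copied from the `args` block of the workflow above.
raw_args = {
    "model": "bert-base-uncased",
    "output_dir": "/outputs/{{ model }}",
}

# Assumption: args resolve in declaration order, so later args can
# reference earlier ones (output_dir uses model).
resolved: dict[str, str] = {}
for key, value in raw_args.items():
    resolved[key] = render(value, resolved)

train_cmd = [render(p, resolved) for p in ["python", "train.py", "--model", "{{ model }}"]]
eval_cmd = [render(p, resolved) for p in ["python", "eval.py", "--output", "{{ output_dir }}"]]

print(train_cmd)
print(eval_cmd)
```

Changing only the `model` value changes both the training command and the evaluation output path, which is what makes the pipeline reusable.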
Jobs run as soon as their dependencies complete — independent branches execute in parallel automatically.
- args with Jinja2 templates for reusable, parameterized pipelines
- Retry support with configurable delay
- Dry-run mode and partial execution (--from, --to, --job)
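The dependency-driven ordering described above can be illustrated with Python's standard `graphlib` module. This is a sketch of the general technique, not srunx's scheduler. The `fetch_baseline` job is hypothetical and was added only to show an independent branch. Grouping jobs into "waves" is also a simplification: a real scheduler can start a job the moment its own dependencies finish.

```python
from graphlib import TopologicalSorter

# job name -> jobs it waits for (the depends_on lists from the workflow above).
depends_on = {
    "preprocess": [],
    "train": ["preprocess"],
    "evaluate": ["train"],
    "fetch_baseline": [],  # hypothetical job, independent of the other three
}


def execution_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave has all dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


for number, wave in enumerate(execution_waves(depends_on), start=1):
    print(f"wave {number}: {wave}")
```

Here `preprocess` and `fetch_baseline` land in the same wave because neither depends on the other, while `train` and `evaluate` have to wait their turn.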
Monitoring
# Monitor a job until completion
srunx monitor jobs 12345
# Wait for GPUs, then submit
srunx monitor resources --min-gpus 4
srunx submit python train.py --gpus-per-node 4
# Periodic cluster reports to Slack
srunx monitor cluster --schedule 1h --notify $SLACK_WEBHOOK
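The "wait for GPUs, then submit" sequence above can also be driven from a Python script. The sketch below only shells out to the two CLI commands shown in this section. It assumes that `srunx monitor resources` blocks until the threshold is met and then exits with status 0, as the comment above implies. The function names are hypothetical.

```python
import subprocess


def wait_then_submit_commands(min_gpus: int, script: str) -> list[list[str]]:
    """Build the two CLI invocations from this section as argv lists."""
    return [
        ["srunx", "monitor", "resources", "--min-gpus", str(min_gpus)],
        ["srunx", "submit", "python", script, "--gpus-per-node", str(min_gpus)],
    ]


def wait_then_submit(min_gpus: int, script: str) -> None:
    """Run both commands in order and stop if the first one fails."""
    for argv in wait_then_submit_commands(min_gpus, script):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the submit step is skipped if monitoring is interrupted.
        subprocess.run(argv, check=True)
```

Calling `wait_then_submit(4, "train.py")` on a machine with srunx installed would reproduce the shell sequence above.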
Remote SSH
Keep your local editor workflow while running on the cluster:
# Submit to remote cluster
srunx ssh submit train.py --host dgx-server
# Manage connection profiles
srunx ssh profile add myserver --ssh-host dgx1
# Map local directories to remote and sync with rsync
srunx ssh profile mount add myserver workspace \
    --local ~/projects/ml-exp --remote /home/user/ml-exp
srunx ssh sync
- SSH config hosts, saved profiles, and proxy jump support
- Environment variable passthrough (--env KEY=VALUE, --env-local WANDB_API_KEY)
- File sync via rsync, with the profile auto-detected from the current directory
Slack Notifications
srunx flow run workflow.yaml --slack
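For context, a Slack incoming webhook accepts a JSON body with a `text` field sent via HTTP POST. The stdlib sketch below shows that mechanism in isolation. It does not reproduce srunx's own message format, and the function names and the example message are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With a real webhook URL in the `SLACK_WEBHOOK` environment variable, `send_slack_message(os.environ["SLACK_WEBHOOK"], "workflow finished")` would post a plain-text message (after `import os`).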
Python API
from srunx import Job, JobResource, JobEnvironment, Slurm

job = Job(
    name="training",
    command=["python", "train.py"],
    resources=JobResource(nodes=1, gpus_per_node=2, time_limit="4:00:00"),
    environment=JobEnvironment(conda="ml_env"),
)

client = Slurm()
completed = client.run(job)  # submit and wait for completion
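Building on the example above, a small parameter sweep might look like the sketch below. It uses only the names shown in this section (`Job`, `JobResource`, `JobEnvironment`, `Slurm`, `client.run`) with the same arguments. The `--lr` flag for `train.py` and both function names are hypothetical.

```python
def sweep_commands(learning_rates: list[float]) -> list[list[str]]:
    """One train.py command per learning rate (the --lr flag is hypothetical)."""
    return [["python", "train.py", "--lr", str(lr)] for lr in learning_rates]


def run_sweep(learning_rates: list[float]) -> list:
    """Submit one job per learning rate and collect the completed jobs."""
    # Imported lazily so sweep_commands works without srunx installed.
    from srunx import Job, JobEnvironment, JobResource, Slurm

    client = Slurm()
    results = []
    for index, command in enumerate(sweep_commands(learning_rates)):
        job = Job(
            name=f"training-{index}",
            command=command,
            resources=JobResource(nodes=1, gpus_per_node=2, time_limit="4:00:00"),
            environment=JobEnvironment(conda="ml_env"),
        )
        # client.run submits and waits, so the jobs run one after another.
        results.append(client.run(job))
    return results
```

Because `client.run` waits for completion, this loop is sequential. For jobs that should run side by side, the YAML workflow format described earlier is likely the better fit.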
Why srunx?
Tools like submitit and simple-slurm handle job submission, and workflow engines like Snakemake or Nextflow handle pipelines. srunx covers both — plus monitoring, SSH remote access, a web dashboard, and container support — in a single, lightweight package. If you want one tool that covers the full SLURM workflow without heavyweight infrastructure, srunx is a good fit.
Documentation
Full documentation at ksterx.github.io/srunx.
Development
git clone https://github.com/ksterx/srunx.git
cd srunx
uv sync --dev
uv run pytest
License