
jobmanager

A command-line tool to manage SLURM jobs via sacct, squeue, sbatch, and scancel.

Requirements

  • Python 3.7+
  • SLURM (sacct, squeue, scontrol, scancel)
  • PyYAML (installed automatically as a dependency; or pip install pyyaml)

Installation

pip install slurm-jobmanager

Or install directly from the repository:

git clone https://github.com/Dannyzimmer/SlurmJobManager
cd SlurmJobManager
pip install .

Intended workflow

[!Important] jobmanager is designed to run only one job at a time for a given directory. This encourages an organized workflow where each job has its own directory with associated metadata and logs. Running multiple jobs in the same directory is not supported and may lead to incorrect behavior.

Each job lives in its own directory containing a metadata.json file (auto-generated via sacct -j <jobid> --json > metadata.json). The typical layout looks like this:

jobs/
├── run_001/
│   ├── metadata.json
│   └── job_logs/
│       ├── LOG_run_001_6552064.log
│       └── LOG_run_001_6552081.log
└── run_002/
    ├── metadata.json
    └── job_logs/
        ├── LOG_run_002_6552092.log
        └── LOG_run_002_6552101.log
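
If metadata.json is missing or stale, it can be regenerated by hand with the same sacct call jobmanager uses (job ID 6552064 is taken from the example layout above):

```
# Regenerate the metadata file for an existing job
sacct -j 6552064 --json > metadata.json
```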

Navigate to the job directory and run commands without extra flags — jobmanager picks up metadata.json automatically:

cd jobs/run_001
jobmanager status
jobmanager watch
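
Putting it together, a fresh job directory can go from creation to a monitored run with a handful of commands (template and submit are described under Commands below; run_003 is just an example name):

```
mkdir -p jobs/run_003 && cd jobs/run_003
jobmanager template --quick   # generate run.sh (1 node, 1 task, 2 CPUs, 4 GB, 1 h)
# edit run.sh to add the actual workload, then:
jobmanager submit             # validate and sbatch run.sh
jobmanager watch              # stream stdout until a terminal state
```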

Job identification

All job-specific commands resolve the target job in this order:

  1. --jobid ID — fetch metadata from SLURM on the fly
  2. --metadata FILE — use an explicit metadata file
  3. (default) — use metadata.json in the current directory
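
For example, jobmanager status can target the same job in each of the three ways:

```
jobmanager status --jobid 6552064                         # 1. query SLURM directly
jobmanager status --metadata jobs/run_001/metadata.json   # 2. explicit metadata file
cd jobs/run_001 && jobmanager status                      # 3. default: ./metadata.json
```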

Commands

list

List your active jobs (wraps squeue).

jobmanager list
jobmanager list --user alice

status

Print the current state of a job (RUNNING, PENDING, COMPLETED, …).

jobmanager status

summary

Print a full summary: partition, CPUs, memory, elapsed time, state, working directory, etc.

jobmanager summary

elapsed

Print the elapsed wall-clock time of a job.

jobmanager elapsed

watch

Stream the job's stdout in real time until it reaches a terminal state.

jobmanager watch
jobmanager watch --interval 10   # poll every 10 s (default: 5)

stop

Cancel a job (scancel). Prompts for confirmation unless --yes is passed.

jobmanager stop
jobmanager stop --yes

print-request

Print a SLURM batch script header reproducing the resource requests of the job. Redirect to a file to reuse it.

jobmanager print-request > job.sh
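
The directives mirror the original request. A hypothetical header for a modest job (all values here are illustrative, not output from a real run):

```
#!/bin/bash
#SBATCH --job-name=run_001
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
```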

resubmit

Resubmit a job using the original submission command recovered from sacct. If the current job is still active, prompts to cancel it first.

jobmanager resubmit

template

Create a run.sh SLURM batch script in the current directory. Job name is set automatically from the directory name.

jobmanager template            # blank placeholders (must be filled before submitting)
jobmanager template --test     # 1 node, 1 task, 1 CPU, 1 GB, 1 min
jobmanager template --quick    # 1 node, 1 task, 2 CPUs, 4 GB, 1 h
jobmanager template --medium   # 1 node, 4 tasks, 8 CPUs, 32 GB, 24 h

If a requirements.txt is present in the current directory, jobmanager template automatically appends a virtualenv setup block to the script:

# module load python/3.11   ← only if module_loads is set in config
python3 -m venv .venv         # or ${TMPDIR}/.venv_${SLURM_JOBID} if tmp_dir_var is set
source .venv/bin/activate
pip install -r requirements.txt

config

Manage the configuration file (~/.config/jobmanager/config.yaml).

jobmanager config init    # create the config file with documented defaults
jobmanager config edit    # open the config file in an interactive editor

Available settings:

Key            Default    Description
tmp_dir_var    (unset)    Environment variable holding the node's local temp dir (e.g. TMPDIR, LOCAL_SCRATCH). Used for venv placement.
module_loads   []         List of environment modules to load before creating a virtualenv.
log_dir        job_logs   Directory for SLURM log files (relative to the working directory).
sacct_retries  6          Number of times to retry sacct when a job is not yet registered in the accounting DB.
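
A filled-in config.yaml using the keys above could look like this (the module name is illustrative):

```yaml
# ~/.config/jobmanager/config.yaml
tmp_dir_var: TMPDIR      # venvs go to ${TMPDIR}/.venv_${SLURM_JOBID}
module_loads:
  - python/3.11          # loaded before the virtualenv is created
log_dir: job_logs
sacct_retries: 6
```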

submit

Validate and submit a batch script via sbatch. Checks for zero-value resource directives and warns if the current directory's job is still active.

jobmanager submit              # uses run.sh in the current directory
jobmanager submit path/to/script.sh

Download files

Source distribution: slurm_jobmanager-0.2.1.tar.gz (12.9 kB)
Built distribution: slurm_jobmanager-0.2.1-py3-none-any.whl (12.2 kB)

File details

slurm_jobmanager-0.2.1.tar.gz

  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

Algorithm   Hash digest
SHA256      28fd3c55ba2155dea781e39bd2e386eb5bdad84ad6754feee75679a45ca962c0
MD5         179ddbd642d9a0d7e3862670ca7d2346
BLAKE2b-256 3e637f41baa98c5ebaa22e3fbec26e18ab6e92dca3af62719be794b8db58ba65

File details

slurm_jobmanager-0.2.1-py3-none-any.whl

  • Size: 12.2 kB
  • Uploaded: Python 3

Algorithm   Hash digest
SHA256      93888add1d486291e824ad9d5f2802e554b6e5aba6e4f2323d8ecef9c7ef6488
MD5         8f3fa27c4ada70dd93a415ca0445c97d
BLAKE2b-256 bfe9bd979bb1b0e42a58dc017b5f6f418557c1898b624ecfea207f231d8170f8
