
prefect-submitit

A Prefect 3 TaskRunner that submits tasks to SLURM clusters via submitit.

Features

  • sbatch submission -- submit individual tasks as SLURM jobs and task.map() calls as job arrays, with automatic chunking when arrays exceed cluster limits
  • srun submission -- run tasks as srun steps within an existing allocation (salloc), avoiding per-task scheduling overhead
  • Batched execution -- group multiple items per SLURM job with units_per_worker to reduce scheduling overhead (works with both sbatch and srun modes)
  • Local mode -- swap to local execution for development without changing your flow code
  • Prefect UI integration -- task run names include SLURM job IDs for easy cross-referencing with squeue/sacct

Requirements

  • Python >= 3.12
  • Prefect >= 3.6, < 4.0
  • SLURM cluster (for sbatch/srun modes) or Docker (for local development)

Installation

pip install prefect-submitit

With pixi:

pixi add prefect-submitit

With conda:

conda install -c conda-forge prefect-submitit

Quick start

from prefect import flow, task
from prefect_submitit import SlurmTaskRunner


@task
def add(x: int, y: int) -> int:
    return x + y


@flow(task_runner=SlurmTaskRunner(partition="cpu", time_limit="00:10:00"))
def my_flow():
    # Single task
    future = add.submit(1, 2)
    print(future.result())  # 3

    # Map over inputs (submitted as a SLURM job array)
    futures = add.map([1, 2, 3], [4, 5, 6])
    print([f.result() for f in futures])  # [5, 7, 9]


if __name__ == "__main__":
    my_flow()

Execution modes

The runner supports three execution modes, selected via the execution_mode parameter or the SLURM_TASKRUNNER_BACKEND environment variable (slurm, srun, or local):

| Mode  | Dispatch | Requires                         | Best for                  |
|-------|----------|----------------------------------|---------------------------|
| slurm | sbatch   | SLURM access                     | Batch workloads           |
| srun  | srun     | Active allocation (SLURM_JOB_ID) | Interactive / low-latency |
| local | None     | Nothing                          | Development and testing   |

slurm (default)

Each .submit() becomes a SLURM job via sbatch. Each .map() becomes a job array with automatic chunking when the array exceeds cluster limits.

SlurmTaskRunner(execution_mode="slurm", partition="gpu", gpus_per_node=1)
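The chunking itself is internal to the runner; as an illustration of the idea (the function name below is hypothetical), a map over N items is split into job arrays no larger than the cluster's MaxArraySize:

```python
def chunk_array(n_items: int, max_array_size: int) -> list[range]:
    """Split n_items mapped inputs into job-array chunks of at most max_array_size."""
    return [
        range(start, min(start + max_array_size, n_items))
        for start in range(0, n_items, max_array_size)
    ]

# 2500 mapped items on a cluster with MaxArraySize=1001 -> three arrays
chunks = chunk_array(2500, 1001)
print([len(c) for c in chunks])  # [1001, 1001, 498]
```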

srun

Runs tasks as srun steps inside an existing SLURM allocation. Requires a prior salloc or sbatch session with SLURM_JOB_ID set. Avoids per-task scheduling overhead, making it well suited to workloads with many small tasks sharing a single allocation.

salloc -N2 --mem=32G --time=02:00:00 -- python my_flow.py
SlurmTaskRunner(execution_mode="srun", mem_gb=4, cpus_per_task=2)

local

Runs tasks locally via submitit's LocalExecutor. SLURM parameters are ignored. For development and testing without a cluster.

SlurmTaskRunner(execution_mode="local")

Or via environment variable:

export SLURM_TASKRUNNER_BACKEND=local

Configuration

| Parameter               | Default        | Description |
|-------------------------|----------------|-------------|
| partition               | "cpu"          | SLURM partition |
| time_limit              | "01:00:00"     | Wall time (HH:MM:SS) |
| mem_gb                  | 4              | Memory per job in GB |
| gpus_per_node           | 0              | GPUs per job |
| cpus_per_task           | 1              | CPUs per task |
| units_per_worker        | 1              | Items per SLURM job (>1 enables batched execution) |
| slurm_array_parallelism | 1000           | Max concurrent array tasks |
| execution_mode          | None           | "slurm", "srun", or "local". Falls back to SLURM_TASKRUNNER_BACKEND env var, then "slurm" |
| poll_interval           | mode-dependent | Seconds between status checks (slurm=5.0, srun=0.5, local=1.0) |
| max_poll_time           | None           | Max seconds to poll before timing out. Default: time_limit × 2 |
| log_folder              | "slurm_logs"   | Directory for submitit logs |
| fail_on_error           | True           | Raise on SLURM job failure |
| max_array_size          | None           | Override auto-detected cluster MaxArraySize |
| srun_launch_concurrency | 128            | Max concurrent srun steps (srun mode only) |

Additional keyword arguments are passed through to submitit (e.g. slurm_gres="gpu:a100:1").
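The max_poll_time default of time_limit × 2 implies converting the HH:MM:SS wall-time string into seconds; a minimal sketch of that arithmetic (not the library's actual parser):

```python
def time_limit_seconds(time_limit: str) -> int:
    """Convert an HH:MM:SS wall-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Default poll timeout for a one-hour job: 2 * 3600 = 7200 seconds
print(2 * time_limit_seconds("01:00:00"))  # 7200
```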

Examples

The examples/ directory contains Jupyter notebooks demonstrating each feature on a real SLURM cluster:

| Notebook                           | Covers |
|------------------------------------|--------|
| 01_single_task_submission          | Submitting individual tasks as SLURM jobs |
| 02_job_arrays_with_map             | task.map() with automatic job array chunking |
| 03_batched_execution               | Grouping items per job with units_per_worker |
| 04_error_handling_and_cancellation | Failure propagation and job cancellation |
| 05_local_mode_and_development      | Local execution mode for dev/testing |
| slurm_submit_and_run.py            | Minimal script for the Docker SLURM environment |

To run the notebooks: install dependencies, register the Jupyter kernel, and start the Prefect server (see Development below), then open any notebook and select the Prefect-Submitit kernel.

Prefect server

The repo includes a prefect-server CLI to run a local Prefect server backed by PostgreSQL (handles SLURM concurrency better than SQLite). The server uses UID-based port allocation to avoid conflicts on shared nodes.

pixi run prefect-start   # Start in background (PostgreSQL + Prefect)
pixi run prefect-stop    # Stop the server

The CLI automatically:

  • Initializes PostgreSQL on first run (stored in ~/.prefect-submitit/postgres/)
  • Picks UID-based ports: Prefect on even ports (4200--5798), PostgreSQL on odd ports (5433--7031)
  • Uses the node's FQDN so SLURM workers can reach it (falls back to IP if FQDN is unresolvable)
  • Writes a discovery file to ~/.prefect-submitit/server.json
  • Tunes connection pool sizes for high-concurrency SLURM workloads
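The exact port mapping is internal to the CLI; one hypothetical way to derive deterministic per-user ports consistent with the stated ranges (even Prefect ports 4200--5798, odd PostgreSQL ports 5433--7031) would be:

```python
def ports_for_uid(uid: int) -> tuple[int, int]:
    """Hypothetical UID-based port allocation matching the documented ranges."""
    slot = uid % 800                 # 800 distinct per-user slots
    prefect_port = 4200 + 2 * slot   # even: 4200, 4202, ..., 5798
    postgres_port = 5433 + 2 * slot  # odd:  5433, 5435, ..., 7031
    return prefect_port, postgres_port

print(ports_for_uid(1000))  # (4600, 5833)
```

Any scheme of this shape keeps two users' servers from colliding on a shared login node without coordination.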

Direct CLI

prefect-server start [--bg] [--sqlite] [--restart] [--port N] [--pg-port N]
prefect-server stop [-f]
prefect-server status
prefect-server init-db [--reset]

  • --sqlite uses SQLite instead of PostgreSQL
  • --restart stops any existing server before starting
  • start is idempotent -- skips if the server is already healthy

Server discovery

Workers resolve the Prefect API URL in this order: PREFECT_SUBMITIT_SERVER env var → PREFECT_API_URL env var → discovery file (auto-written by prefect-server start).

Development

Requires pixi:

pixi install
pixi run -e dev fmt       # Format and lint
pixi run -e dev test      # Run unit tests

Docker SLURM environment

A containerized single-node SLURM cluster is included for development and integration testing without access to a real cluster:

pixi run slurm-build     # Build the Docker image
pixi run slurm-up        # Start the SLURM container
pixi run slurm-shell     # Shell into the running container
pixi run slurm-down      # Stop and remove the container

See docker/README.md for details.

Integration tests

Integration tests submit real SLURM jobs and are gated behind --run-slurm. Tests are split by submission mode:

# sbatch tests (standard SLURM submission)
pixi run -e dev test-sbatch          # On a real cluster
pixi run -e dev test-sbatch-docker   # In the Docker environment

# srun tests (within an allocation)
pixi run -e dev test-srun            # On a real cluster (wraps in salloc)
pixi run -e dev test-srun-docker     # In the Docker environment (wraps in salloc)

Tests cover single submission, job arrays, batched execution, cancellation, failure handling, polling, and environment propagation.

IDE setup (VS Code)

Python interpreter: Set the pixi environment as your VS Code Python interpreter:

pixi run which python
# Example: /home/user/prefect-submitit/.pixi/envs/default/bin/python

In VS Code: Ctrl+Shift+P → "Python: Select Interpreter" → paste the path.

Jupyter kernel: Register the pixi environment as a Jupyter kernel so notebooks use the correct packages:

pixi run install-kernel

In VS Code: open a .ipynb file → click "Select Kernel" → choose Prefect-Submitit.

License

BSD 3-Clause. See LICENSE for details.
