
prefect-submitit

A Prefect 3 TaskRunner that submits tasks to SLURM clusters via submitit.

Features

  • Single task submission -- submit individual Prefect tasks as SLURM jobs
  • Job arrays -- submit task.map() calls as SLURM job arrays with automatic chunking when the array exceeds cluster limits
  • Batched execution -- group multiple items per SLURM job with units_per_worker to reduce scheduling overhead
  • Local mode -- swap to local execution for testing without changing your flow code (execution_mode="local" or SLURM_TASKRUNNER_BACKEND=local)
  • Prefect UI integration -- task run names include SLURM job IDs for easy cross-referencing with squeue/sacct
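The automatic chunking of oversized job arrays can be sketched in a few lines. This is an illustrative, stdlib-only sketch of the idea, not the library's actual implementation; the function name `chunk_array` and the splitting strategy are assumptions.

```python
def chunk_array(n_items: int, max_array_size: int) -> list[range]:
    """Split n_items mapped inputs into job arrays no larger than
    max_array_size -- one plausible chunking strategy."""
    return [
        range(start, min(start + max_array_size, n_items))
        for start in range(0, n_items, max_array_size)
    ]

# 2500 mapped inputs on a cluster capped at 1000 array tasks
chunks = chunk_array(2500, 1000)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

A `task.map()` call that exceeds the cluster's array limit would then be submitted as several SLURM arrays rather than failing outright.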

Installation

pip install prefect-submitit

Or, once the first conda-forge release is available, with conda:

conda install -c conda-forge prefect-submitit

Quick Start

from prefect import flow, task
from prefect_submitit import SlurmTaskRunner


@task
def add(x: int, y: int) -> int:
    return x + y


@flow(task_runner=SlurmTaskRunner(partition="cpu", time_limit="00:10:00"))
def my_flow():
    # Single task
    future = add.submit(1, 2)
    print(future.result())  # 3

    # Map over inputs (submitted as a SLURM job array)
    futures = add.map([1, 2, 3], [4, 5, 6])
    print([f.result() for f in futures])  # [5, 7, 9]


if __name__ == "__main__":
    my_flow()

Configuration

SlurmTaskRunner(
    partition="gpu",  # SLURM partition
    time_limit="04:00:00",  # Wall time (HH:MM:SS)
    mem_gb=16,  # Memory per job
    gpus_per_node=1,  # GPUs per job
    units_per_worker=10,  # Items per SLURM job (batched mode)
    execution_mode="slurm",  # "slurm" or "local"
    slurm_array_parallelism=100,  # Max concurrent array tasks
    log_folder="slurm_logs",  # Where submitit writes logs
    fail_on_error=True,  # Raise on SLURM job failure
    poll_interval=2.0,  # Seconds between job status checks
    max_poll_time=3600,  # Max seconds to poll before timing out
    max_array_size=1000,  # Override auto-detected max SLURM array size
)

Any additional keyword arguments are passed through to submitit (e.g. slurm_gres="gpu:a100:1").
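For example, a GPU configuration combining the runner's own options with a raw submitit parameter might look like the sketch below; the exact `slurm_*` parameter names your submitit version accepts may differ.

```python
from prefect_submitit import SlurmTaskRunner

# slurm_gres is not a SlurmTaskRunner option; it is forwarded
# unchanged to submitit to request a specific GPU type.
runner = SlurmTaskRunner(
    partition="gpu",
    time_limit="02:00:00",
    mem_gb=32,
    slurm_gres="gpu:a100:1",
)
```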

Examples

The examples/ directory contains Jupyter notebooks that demonstrate each feature end-to-end on a real SLURM cluster:

  • 01_single_task_submission -- submitting individual tasks as SLURM jobs
  • 02_job_arrays_with_map -- task.map() with automatic job array chunking
  • 03_batched_execution -- grouping items per job with units_per_worker
  • 04_error_handling_and_cancellation -- failure propagation and job cancellation
  • 05_local_mode_and_development -- local execution mode for dev/testing

To run them, install the dependencies, register the Jupyter kernel, and start the Prefect server (see Development below); then open any notebook and select the Prefect-Submitit kernel.

Integration Tests

The test suite includes SLURM integration tests that submit real jobs to verify the runner works on your cluster. Use these to validate a new deployment:

pixi run -e dev test-slurm

Tests cover single submission, job arrays, batched execution, cancellation, failure handling, polling, and environment propagation. They are marked with @pytest.mark.slurm and skipped unless --run-slurm is passed.

Local Testing

Set the environment variable to skip SLURM entirely:

export SLURM_TASKRUNNER_BACKEND=local

Or pass it directly:

SlurmTaskRunner(execution_mode="local")
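How the environment variable and the constructor argument might interact can be sketched as below. The precedence shown (env var overrides the constructor argument) and the helper name `resolve_backend` are assumptions for illustration, not part of the package API.

```python
import os


def resolve_backend(execution_mode: str = "slurm") -> str:
    """Pick the execution backend: SLURM_TASKRUNNER_BACKEND, when set,
    is assumed here to override the constructor argument."""
    return os.environ.get("SLURM_TASKRUNNER_BACKEND", execution_mode)


os.environ["SLURM_TASKRUNNER_BACKEND"] = "local"
print(resolve_backend())  # local
```

An env-var override like this is convenient because CI or a laptop can force local execution without touching flow code.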

Development

Requires Pixi:

pixi install
pixi run -e dev fmt
pixi run -e dev test

Prefect Server

The repo includes a prefect-server CLI to run a local Prefect server backed by PostgreSQL (handles SLURM concurrency better than SQLite). The server uses a UID-based port to avoid conflicts on shared nodes.

pixi run prefect-start   # Start in background (PostgreSQL + Prefect)
pixi run prefect-stop    # Stop the server

The CLI automatically:

  • Initializes PostgreSQL on first run (stored in ~/.prefect-submitit/postgres/)
  • Picks a UID-based port (range 4200-4999) to avoid conflicts
  • Uses the node's FQDN so SLURM workers can reach it (falls back to IP if FQDN is unresolvable)
  • Writes a discovery file to ~/.prefect-submitit/server.json
  • Tunes connection pool sizes for high-concurrency SLURM workloads

Direct CLI

prefect-server start [--bg] [--sqlite] [--restart] [--port N] [--pg-port N]
prefect-server stop [-f]
prefect-server status
prefect-server init-db [--reset]
  • --sqlite uses SQLite instead of PostgreSQL
  • --restart stops any existing server before starting
  • start is idempotent — skips if the server is already healthy

Server discovery

Workers resolve the Prefect API URL in this order: PREFECT_SUBMITIT_SERVER env var → PREFECT_API_URL env var → discovery file (auto-written by prefect-server start).

IDE Setup (VSCode)

Python Interpreter: Set the Pixi environment as your VSCode Python interpreter:

pixi run which python
# Example output: /home/user/prefect-submitit/.pixi/envs/default/bin/python

In VSCode: Ctrl+Shift+P → "Python: Select Interpreter" → paste the path.

Jupyter Kernel: Register the Pixi environment as a Jupyter kernel so notebooks use the correct packages:

pixi run install-kernel

In VSCode: open a .ipynb file → click "Select Kernel" → choose Prefect-Submitit.

Tip: If the Jupyter kernel takes 30+ seconds to start in VS Code, the Python Environments extension (ms-python.vscode-python-envs) is likely the cause. Uninstall it — the core Python extension works fine without it. Tracked upstream: microsoft/vscode-python#25804

License

BSD 3-Clause. See LICENSE for details.
