Pythonic SDK for Slurm.
Python Slurm SDK
A developer-friendly SDK to define, submit, and manage Slurm jobs in Python with native array job support, fluent dependency APIs, and container packaging.
Quick Start
Simple Task with Dependencies
```python
from slurm import Cluster, task


@task(time="00:01:00", mem="1G")
def preprocess(dataset: str) -> str:
    return f"processed_{dataset}"


@task(time="00:05:00", mem="4G", cpus_per_task=4)
def train_model(data: str, lr: float) -> dict:
    return {"model": f"trained_on_{data}", "lr": lr, "accuracy": 0.95}


@task(time="00:01:00", mem="1G")
def evaluate(model: dict) -> str:
    return f"Model accuracy: {model['accuracy']}"


if __name__ == "__main__":
    # Local backend for offline development
    with Cluster(backend_type="local") as cluster:
        # Fluent dependency API with .after()
        prep_job = preprocess("dataset.csv")
        train_job = train_model.after(prep_job)(data=prep_job, lr=0.001)
        eval_job = evaluate.after(train_job)(model=train_job)
        eval_job.wait()
        print(eval_job.get_result())
```
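On a real cluster, a dependency such as `.after(prep_job)` corresponds to Slurm's `--dependency` option for `sbatch`. The sketch below renders that option from a list of upstream job IDs. It assumes, without confirmation from this README, that the SDK uses the `afterok` type (start only after every upstream job has finished successfully); `dependency_flag` is a hypothetical helper, and the local backend used above may not invoke `sbatch` at all.

```python
from typing import Iterable, Union

# Slurm dependency types that take a list of job IDs (a subset of the sbatch docs).
_DEPENDENCY_KINDS = {"after", "afterany", "afterok", "afternotok"}


def dependency_flag(job_ids: Iterable[Union[int, str]], kind: str = "afterok") -> str:
    """Render an sbatch --dependency option for the given upstream job IDs.

    Hypothetical helper: it shows standard Slurm syntax only, not how the SDK
    actually implements .after().
    """
    if kind not in _DEPENDENCY_KINDS:
        raise ValueError(f"unsupported dependency type: {kind!r}")
    ids = [str(job_id) for job_id in job_ids]
    if not ids:
        raise ValueError("at least one upstream job ID is required")
    # Several IDs for the same dependency type are joined with ':'
    return f"--dependency={kind}:{':'.join(ids)}"


if __name__ == "__main__":
    print(dependency_flag([1001]))            # --dependency=afterok:1001
    print(dependency_flag(["1001", "1002"]))  # --dependency=afterok:1001:1002
```

Swapping in `afterany` would start the downstream job regardless of the upstream exit status, which can suit cleanup steps.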
Workflow Orchestration
Workflows run on the cluster and orchestrate multiple tasks, enabling complex pipelines that survive preemption and scheduling delays:
```python
from slurm import Cluster, task, workflow
from slurm.workflow import WorkflowContext


@task(time="00:05:00", mem="2G")
def train_epoch(epoch: int, data_path: str) -> dict:
    return {"epoch": epoch, "loss": 1.0 / (epoch + 1), "checkpoint": f"ckpt_{epoch}.pt"}


@task(time="00:02:00", mem="1G")
def evaluate(checkpoint: str) -> float:
    return 0.85 + (int(checkpoint.split("_")[1].split(".")[0]) * 0.02)


@workflow(time="01:00:00", mem="512M")
def training_pipeline(epochs: int, data_path: str, ctx: WorkflowContext) -> dict:
    """Multi-epoch training with evaluation after each epoch."""
    results = []
    for epoch in range(epochs):
        # Submit training job
        train_job = train_epoch(epoch=epoch, data_path=data_path)
        train_result = train_job.get_result()

        # Submit eval after training completes
        eval_job = evaluate.after(train_job)(checkpoint=train_result["checkpoint"])
        accuracy = eval_job.get_result()

        results.append({"epoch": epoch, "loss": train_result["loss"], "accuracy": accuracy})

    return {"results": results, "best_accuracy": max(r["accuracy"] for r in results)}


if __name__ == "__main__":
    cluster = Cluster(
        backend_type="ssh",
        hostname="login.hpc.example.com",
        username="myuser",
    )
    with cluster:
        # The workflow itself runs as a job on the cluster
        job = training_pipeline(epochs=3, data_path="/data/train")
        job.wait()
        print(job.get_result())
```
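Because the task bodies are plain Python, the pipeline's control flow can be checked without a cluster by calling undecorated copies of the functions directly. This is a minimal sketch: `training_pipeline_dry_run` is a hypothetical name, and it skips job submission, dependency tracking and `WorkflowContext` entirely, so it says nothing about how the real workflow handles preemption.

```python
def train_epoch(epoch: int, data_path: str) -> dict:
    return {"epoch": epoch, "loss": 1.0 / (epoch + 1), "checkpoint": f"ckpt_{epoch}.pt"}


def evaluate(checkpoint: str) -> float:
    # Parse the epoch number out of "ckpt_<epoch>.pt"
    return 0.85 + (int(checkpoint.split("_")[1].split(".")[0]) * 0.02)


def training_pipeline_dry_run(epochs: int, data_path: str) -> dict:
    """Same loop as training_pipeline, but with direct calls instead of jobs (hypothetical)."""
    results = []
    for epoch in range(epochs):
        train_result = train_epoch(epoch=epoch, data_path=data_path)
        accuracy = evaluate(checkpoint=train_result["checkpoint"])
        results.append({"epoch": epoch, "loss": train_result["loss"], "accuracy": accuracy})
    return {"results": results, "best_accuracy": max(r["accuracy"] for r in results)}


if __name__ == "__main__":
    print(training_pipeline_dry_run(epochs=3, data_path="/data/train"))
```

With `epochs=3` the losses are 1.0, 0.5 and about 0.333, and accuracy rises by 0.02 per epoch from 0.85, so `best_accuracy` comes out at about 0.89. With `epochs=0` the final `max()` raises `ValueError`, in the real workflow as well as in this dry run.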
Array Jobs for Parallel Processing
```python
from slurm import Cluster, task


@task(time="00:02:00", mem="2G", cpus_per_task=2)
def process_chunk(chunk_id: int, start: int, end: int) -> dict:
    return {"chunk": chunk_id, "sum": sum(range(start, end))}


@task(time="00:01:00", mem="1G")
def aggregate(results: list) -> int:
    return sum(r["sum"] for r in results)


if __name__ == "__main__":
    with Cluster(backend_type="local") as cluster:
        # Map the task over the items to create a native Slurm array job
        chunks = [
            {"chunk_id": i, "start": i * 1000, "end": (i + 1) * 1000}
            for i in range(10)
        ]
        array_job = process_chunk.map(chunks)

        # Aggregate results from the array job
        final = aggregate.after(array_job)(results=array_job.get_results())
        final.wait()
        print(f"Total: {final.get_result()}")
```
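Slurm runs an array job as a single submission with `--array=<range>`, and each task finds its position through the `SLURM_ARRAY_TASK_ID` environment variable. The sketch below shows that general pattern. How `.map()` actually assigns `chunks` to array indices is an assumption here, not something this README documents, and `array_spec` and `item_for_this_task` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def array_spec(n_items: int, max_concurrent: Optional[int] = None) -> str:
    """Build an sbatch --array option covering indices 0..n_items-1."""
    if n_items < 1:
        raise ValueError("an array job needs at least one item")
    spec = f"0-{n_items - 1}"
    if max_concurrent is not None:
        # A "%N" suffix caps how many array tasks Slurm runs at once
        spec += f"%{max_concurrent}"
    return f"--array={spec}"


def item_for_this_task(items: Sequence[T], env: Optional[Mapping[str, str]] = None) -> T:
    """Return the item whose position matches this task's SLURM_ARRAY_TASK_ID."""
    environ = os.environ if env is None else env
    index = int(environ["SLURM_ARRAY_TASK_ID"])
    return items[index]


if __name__ == "__main__":
    chunks = [{"chunk_id": i, "start": i * 1000, "end": (i + 1) * 1000} for i in range(10)]
    print(array_spec(len(chunks)))  # --array=0-9
    # Simulate array task 3 by passing the environment explicitly
    print(item_for_this_task(chunks, {"SLURM_ARRAY_TASK_ID": "3"}))
```

Assuming `get_results()` returns one dict per chunk, the array example above should print `Total: 49995000`: the ten chunks together cover `range(10_000)`, and `sum(range(10_000))` is 49,995,000.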
SSH Backend with Container Packaging
```python
from slurm import Cluster, task


@task(
    time="00:10:00",
    mem="8G",
    packaging="container",
    packaging_platform="linux/amd64",
    packaging_push=True,
    packaging_registry="myregistry.io/myproject/",
)
def compute_intensive_task(n: int) -> float:
    import numpy as np

    return np.mean(np.random.random(n))


if __name__ == "__main__":
    cluster = Cluster(
        backend_type="ssh",
        hostname="login.hpc.example.com",
        username="myuser",
    )
    with cluster:
        job = compute_intensive_task(1_000_000)
        job.wait()
        print(job.get_result())
```
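The resource arguments used throughout these examples (`time`, `mem`, `cpus_per_task`) mirror standard `sbatch` options. The sketch below renders them as `#SBATCH` directives and converts a Slurm time limit to seconds. The one-to-one mapping is an assumption about what the SDK generates, not something this README states; `sbatch_directives` and `slurm_time_to_seconds` are hypothetical helpers, and the accepted time formats follow the `sbatch --time` documentation.

```python
from typing import Dict, List, Union

# Assumed mapping from decorator keyword arguments to sbatch long options.
SBATCH_OPTIONS: Dict[str, str] = {
    "time": "--time",
    "mem": "--mem",
    "cpus_per_task": "--cpus-per-task",
}


def sbatch_directives(**resources: Union[str, int]) -> List[str]:
    """Render resource keyword arguments as #SBATCH lines, in the order given."""
    lines = []
    for key, value in resources.items():
        option = SBATCH_OPTIONS.get(key)
        if option is None:
            raise KeyError(f"no sbatch mapping for {key!r}")
        lines.append(f"#SBATCH {option}={value}")
    return lines


def slurm_time_to_seconds(value: str) -> int:
    """Convert a Slurm time limit string to seconds.

    Formats: "minutes", "minutes:seconds", "hours:minutes:seconds",
    "days-hours", "days-hours:minutes", "days-hours:minutes:seconds".
    Special values such as "UNLIMITED" are not handled in this sketch.
    """
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in value.split(":")]
        if len(parts) > 3:
            raise ValueError(f"too many fields in time limit: {value!r}")
        # After "days-", fields are hours[:minutes[:seconds]]; pad missing ones with 0
        hours, minutes, seconds = parts + [0] * (3 - len(parts))
    else:
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"too many fields in time limit: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for line in sbatch_directives(time="00:10:00", mem="8G"):
        print(line)
    print(slurm_time_to_seconds("00:10:00"))  # 600
```

Packaging arguments such as `packaging_registry` are not `sbatch` options, so this sketch rejects them with a `KeyError` rather than guessing how the SDK uses them.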
Documentation
- Contributing Guide - Development setup, testing, and contribution guidelines
- AGENTS.md - Guide for AI agents working with this codebase
- Changelog - Release history and changes
- examples/ - Complete working examples including parallelization patterns
Running on an ARM Mac (Apple Silicon)
You can build and publish x86_64 (`linux/amd64`) images from an ARM Mac by
enabling QEMU emulation inside your container runtime. The example below shows
the required one-time setup for Podman; Docker Desktop users can follow a
similar flow using `docker buildx` (no additional configuration is usually needed).
- Install QEMU locally. This provides the binary emulators that Podman will load into its virtual machine:

  ```bash
  brew install qemu
  ```

- Make sure the Podman machine is created and running. If you have not initialised it yet:

  ```bash
  podman machine init --now
  ```

- Register the QEMU static binaries inside the Podman machine so it can run amd64 containers. This step reboots the VM and only needs to be performed once (repeat if you recreate the machine):

  ```bash
  podman machine ssh sudo rpm-ostree install qemu-user-static
  podman machine ssh sudo systemctl reboot
  ```

- After the machine restarts, confirm cross-building works:

  ```bash
  podman build --platform linux/amd64 -t test-amd64 .
  podman run --rm --platform linux/amd64 test-amd64 uname -m
  ```

  The final command should print `x86_64` even though you are on an ARM host.
With QEMU registered you can now build the example container in this project
using the provided Slurmfile (platform = "linux/amd64") and push it to your
registry before submitting jobs to an x86 cluster.
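A build script can check up front whether the host architecture differs from the target platform, and therefore whether the QEMU setup above is needed. This is a minimal sketch: `needs_emulation` and `normalise_arch` are hypothetical helpers, not part of the SDK, and the alias table only covers the common names for x86-64 and 64-bit ARM.

```python
import platform
from typing import Optional

# Common spellings of the same CPU architecture, normalised to OCI platform names.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalise_arch(machine: str) -> str:
    """Map a machine name such as "x86_64" or "aarch64" to its OCI name."""
    name = machine.lower()
    return _ARCH_ALIASES.get(name, name)


def needs_emulation(target_platform: str, host_machine: Optional[str] = None) -> bool:
    """Return True if building for target_platform (e.g. "linux/amd64") needs emulation here."""
    parts = target_platform.split("/")
    if len(parts) < 2:
        raise ValueError(f"expected os/arch[/variant], got {target_platform!r}")
    target_arch = normalise_arch(parts[1])  # any variant such as "v8" is ignored
    host_arch = normalise_arch(host_machine or platform.machine())
    return target_arch != host_arch


if __name__ == "__main__":
    print(needs_emulation("linux/amd64"))
```

On an Apple Silicon Mac with a native Python, `platform.machine()` reports `arm64`, so the check returns `True` for `linux/amd64`; a Python running under Rosetta reports `x86_64` instead, which would make this check misleading.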
Download files
File details
Details for the file slurm_sdk-0.4.5.tar.gz.
File metadata
- Download URL: slurm_sdk-0.4.5.tar.gz
- Upload date:
- Size: 452.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fef4e2edc312b4435d8298e1346da655f4c36618fed27e8bfb17c33e37f6e2da |
| MD5 | 5b74a294b7f1c9725fd7dea3de782f3a |
| BLAKE2b-256 | 63e11b05c082991c68cdd04ba679f11c4f6e3db85a0639686852aa981a79fd68 |
Provenance
The following attestation bundles were made for slurm_sdk-0.4.5.tar.gz:
Publisher: publish.yml on ville-k/slurm_sdk

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slurm_sdk-0.4.5.tar.gz
- Subject digest: fef4e2edc312b4435d8298e1346da655f4c36618fed27e8bfb17c33e37f6e2da
- Sigstore transparency entry: 924248149
- Sigstore integration time:
- Permalink: ville-k/slurm_sdk@100e1c69161621ddbbc868bd182b6c0d7de70b8e
- Branch / Tag: refs/tags/v0.4.5
- Owner: https://github.com/ville-k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@100e1c69161621ddbbc868bd182b6c0d7de70b8e
- Trigger Event: release
File details
Details for the file slurm_sdk-0.4.5-py3-none-any.whl.
File metadata
- Download URL: slurm_sdk-0.4.5-py3-none-any.whl
- Upload date:
- Size: 295.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bb1e44719b62a91e2faf2e059fa4266d2951eeadca7207d29d20442a17bcc20 |
| MD5 | 81e3f227716bf214240151fdb7341d02 |
| BLAKE2b-256 | 4066c9f3622355c66ed60621b6a019ee94200c0ad959d9ccb2cabda1d9c41eec |
Provenance
The following attestation bundles were made for slurm_sdk-0.4.5-py3-none-any.whl:
Publisher: publish.yml on ville-k/slurm_sdk

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slurm_sdk-0.4.5-py3-none-any.whl
- Subject digest: 3bb1e44719b62a91e2faf2e059fa4266d2951eeadca7207d29d20442a17bcc20
- Sigstore transparency entry: 924248244
- Sigstore integration time:
- Permalink: ville-k/slurm_sdk@100e1c69161621ddbbc868bd182b6c0d7de70b8e
- Branch / Tag: refs/tags/v0.4.5
- Owner: https://github.com/ville-k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@100e1c69161621ddbbc868bd182b6c0d7de70b8e
- Trigger Event: release