Yeet code at Slurm clusters. A Modal-like abstraction for multi-cluster Slurm job submission.
yeet — Yeet Code at Slurm Clusters
A Modal-like abstraction for submitting jobs to multiple Slurm clusters over SSH.
Problem
Managing 3+ Slurm clusters with different filesystems, GPUs, partitions, and configurations is painful. You want to write code and yeet it at a cluster without caring about sbatch scripts, SSH sessions, or rsync incantations.
Design Principles
- Submit and forget: fire off a function or script, get results back later.
- Resource-aware routing: say what you need (GPU type, memory), yeet picks the right cluster.
- Explicit override: always allow forcing a specific cluster.
- No serialization magic: send source code, not pickled objects. Avoids import/path hell.
- uv-native: auto-sync pyproject.toml + uv.lock, run uv sync on remote before execution.
- Volume abstraction: name cluster-local paths (datasets, checkpoints), reference by logical name.
- Direct cluster-to-cluster sync: for large data, rsync directly between clusters when possible.
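The resource-aware routing and explicit-override principles can be sketched roughly as follows. This is not yeet's actual router: the Partition dataclass, the parsing helpers, and the first-match policy are assumptions made for illustration, using the hint names (gpu, memory, time, cluster) from the API examples in this document.

```python
from dataclasses import dataclass, field


def parse_memory_gb(text: str) -> float:
    """Convert strings such as '32G' or '512M' to gigabytes."""
    units = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}
    return float(text[:-1]) * units[text[-1].upper()]


def parse_time_s(text: str) -> int:
    """Convert 'H:MM:SS' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Partition:
    # Hypothetical flattened view of one partition on one cluster.
    cluster: str
    name: str
    gpus: list = field(default_factory=list)
    max_memory: str = "0G"
    max_time: str = "0:00:00"


def route(partitions, gpu=None, memory="0G", time="0:00:00", cluster=None):
    """Return the first partition that satisfies every hint, or raise."""
    for p in partitions:
        if cluster is not None and p.cluster != cluster:
            continue  # explicit override: ignore all other clusters
        if gpu is not None and gpu not in p.gpus:
            continue
        if parse_memory_gb(memory) > parse_memory_gb(p.max_memory):
            continue
        if parse_time_s(time) > parse_time_s(p.max_time):
            continue
        return p
    raise LookupError("no cluster/partition satisfies the requested resources")


partitions = [
    Partition("sprint", "cpu", [], "128G", "168:00:00"),
    Partition("sprint", "gpu", ["a100"], "256G", "72:00:00"),
]
chosen = route(partitions, gpu="a100", memory="32G", time="4:00:00")
print(chosen.cluster, chosen.name)
```

A real router would likely rank candidates (for example by queue length) instead of taking the first match; that ranking is left out here.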
Architecture
Built on top of SlurmPilot for SSH, sbatch generation, and multi-cluster support. yeet adds:
- Decorator + explicit submission APIs
- Resource-aware cluster routing
- Volume abstraction for data paths
- Function source extraction (no pickle)
- Auto uv environment sync
- Cross-cluster volume sync with smart routing
- Rich CLI with progress bars
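To make the volume abstraction concrete, here is a minimal sketch of a path-like Volume object that resolves a logical name at runtime. How yeet actually passes volume paths to the job is not stated here; the YEET_VOLUME_<NAME> environment variables below are a purely hypothetical mechanism.

```python
import os
from pathlib import Path


class Volume:
    """Logical name for a cluster-local directory (illustrative sketch)."""

    def __init__(self, name: str):
        self.name = name

    def resolve(self) -> Path:
        # Assumption: the job wrapper exports YEET_VOLUME_<NAME> on the cluster.
        key = f"YEET_VOLUME_{self.name.upper()}"
        try:
            return Path(os.environ[key])
        except KeyError:
            raise LookupError(f"volume {self.name!r} is not defined here") from None

    def __truediv__(self, other) -> Path:
        # Supports Volume("datasets") / "imagenet".
        return self.resolve() / other

    def __fspath__(self) -> str:
        # Lets open() and os.path functions accept a Volume directly.
        return str(self.resolve())


os.environ["YEET_VOLUME_DATASETS"] = "/scratch/dariush/datasets"
print(Volume("datasets") / "imagenet")
```

Resolving lazily, at the moment the path is used, means the same function source can run unchanged on clusters whose filesystems differ.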
Package Structure
yeet/
├── __init__.py # Public API exports
├── config.py # Cluster config loading (~/.yeet/clusters/*.yaml)
├── decorator.py # @run decorator → RemoteFunction
├── job.py # Job class: status, logs, download, cancel
├── router.py # Match resource hints → best cluster
├── remote.py # Wrapper around SlurmPilot for submission
├── serializer.py # Function source extraction → remote script
├── sync.py # rsync with progress (local↔remote and cluster↔cluster)
├── volume.py # Volume path resolution
├── cli.py # CLI commands
└── py.typed
tests/
├── test_config.py
├── test_serializer.py
├── test_router.py
└── test_volume.py
pyproject.toml
Cluster Configuration
Each cluster is defined in ~/.yeet/clusters/<name>.yaml:
name: sprint
host: sprint.uni.de
user: dariush
partitions:
  gpu:
    gpus: [a100]
    max_memory: 256G
    max_time: "72:00:00"
  cpu:
    gpus: []
    max_memory: 128G
    max_time: "168:00:00"
volumes:
  datasets: /scratch/dariush/datasets
  checkpoints: /scratch/dariush/checkpoints
  models: /scratch/dariush/models
remote_dir: /scratch/dariush/yeet_jobs
python: uv
setup_commands:
  - "module load cuda/12.1"
# Which other clusters this cluster can SSH to (aliases from its ~/.ssh/config)
reachable:
  cispa: cispa
  jureca: jureca
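As a rough illustration of how such a file might be validated once parsed, the sketch below turns a dict (the shape PyYAML's yaml.safe_load would return for the file above) into typed objects. The class names and the choice of required keys are assumptions, not yeet's actual config module.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionConfig:
    gpus: list
    max_memory: str
    max_time: str


@dataclass
class ClusterConfig:
    name: str
    host: str
    user: str
    remote_dir: str
    partitions: dict
    volumes: dict = field(default_factory=dict)
    setup_commands: list = field(default_factory=list)
    reachable: dict = field(default_factory=dict)
    python: str = "uv"

    @classmethod
    def from_dict(cls, raw: dict) -> "ClusterConfig":
        # Assumption: these five keys are mandatory; the rest have defaults.
        missing = {"name", "host", "user", "remote_dir", "partitions"} - raw.keys()
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        partitions = {n: PartitionConfig(**p) for n, p in raw["partitions"].items()}
        return cls(**{**raw, "partitions": partitions})


raw = {
    "name": "sprint",
    "host": "sprint.uni.de",
    "user": "dariush",
    "remote_dir": "/scratch/dariush/yeet_jobs",
    "partitions": {
        "gpu": {"gpus": ["a100"], "max_memory": "256G", "max_time": "72:00:00"},
    },
    "volumes": {"datasets": "/scratch/dariush/datasets"},
    "reachable": {"cispa": "cispa"},
}
config = ClusterConfig.from_dict(raw)
print(config.name, config.partitions["gpu"].gpus)
```

Unknown keys raise a TypeError from the dataclass constructor, which catches typos in a config file early.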
API
Decorator API (Modal-style)
from yeetjobs import run, Volume

@run(gpu="a100", memory="32G", time="4:00:00")
def train(lr: float = 0.001):
    import torch
    data = Volume("datasets") / "imagenet"
    out = Volume("checkpoints")
    # ... training ...
    torch.save(model, out / "model.pt")

job = train.submit(lr=0.0003)  # auto-routes to cluster with A100s
job = train.submit(lr=0.0003, cluster="sprint")  # explicit cluster
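For readers curious how a decorator like this can capture resource hints without running anything, here is a minimal stand-in. It is not the yeetjobs implementation: this RemoteFunction only records the request it would send, and its field names are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RemoteFunction:
    func: Callable[..., Any]
    hints: dict

    def submit(self, cluster: Optional[str] = None, **kwargs) -> dict:
        # A real implementation would route, sync code, and submit via sbatch;
        # this sketch only returns a description of the request.
        return {
            "function": self.func.__name__,
            "hints": self.hints,
            "cluster": cluster,
            "kwargs": kwargs,
        }


def run(**hints) -> Callable[[Callable[..., Any]], RemoteFunction]:
    """Decorator factory: store the hints, wrap the function."""
    def wrap(func: Callable[..., Any]) -> RemoteFunction:
        return RemoteFunction(func, hints)
    return wrap


@run(gpu="a100", memory="32G", time="4:00:00")
def train(lr: float = 0.001):
    return lr


request = train.submit(lr=0.0003, cluster="sprint")
print(request)
```

Because cluster is a reserved keyword of submit in this sketch, a decorated function could not itself take an argument named cluster.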
Explicit API (for scripts)
from yeetjobs import submit
job = submit(
    "train.py",
    args=["--lr", "0.001"],
    gpu="a100",
    sync_dir="./src",
    time="4:00:00",
)
Job Management
job.status() # PENDING / RUNNING / COMPLETED / FAILED
job.logs() # stdout + stderr
job.download("checkpoints", "*.pt", "./results/") # rsync artifacts back
job.cancel()
Volume Sync Between Clusters
from yeetjobs import sync
sync(from_cluster="sprint", to_cluster="cispa", volume="checkpoints", pattern="run_42/")
Sync logic:
- If source can reach destination → SSH into source, push via rsync
- If destination can reach source → SSH into destination, pull via rsync
- If neither → relay through local machine (download + upload)
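The three-way decision above can be written as a small pure function. The function name and the shape of the reachable mapping (cluster name to the set of clusters it can SSH to, as derived from each config's reachable section) are assumptions for illustration.

```python
def choose_sync_route(source: str, dest: str, reachable: dict) -> str:
    """Return 'push', 'pull', or 'relay' for a cluster-to-cluster sync."""
    if dest in reachable.get(source, set()):
        return "push"   # SSH into source, rsync towards destination
    if source in reachable.get(dest, set()):
        return "pull"   # SSH into destination, rsync from source
    return "relay"      # download to the local machine, then upload


reachable = {
    "sprint": {"cispa", "jureca"},
    "cispa": set(),
    "jureca": set(),
}
print(choose_sync_route("sprint", "cispa", reachable))
print(choose_sync_route("cispa", "sprint", reachable))
print(choose_sync_route("cispa", "jureca", reachable))
```

Push is checked first, so when both clusters can reach each other the transfer is started from the source side.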
CLI
yeet ls # all jobs across all clusters
yeet status <job_id> # job status
yeet logs <job_id> # stdout/stderr
yeet cancel <job_id> # cancel job
yeet clusters # show clusters + capabilities
yeet upload <local_path> <volume> --cluster X # upload data to cluster
yeet download <job_id> <remote_path> <local> # download artifacts
yeet sync --from X --to Y --volume V [--pattern P]
How It Works Under the Hood
- @run decorator → creates a RemoteFunction capturing resource hints
- .submit() → router checks hints against cluster configs, picks best match
- Code sync → rsyncs project dir to {remote_dir}/{job_name}/ on chosen cluster
- uv sync → rsyncs pyproject.toml + uv.lock, runs uv sync in sbatch preamble
- Script generation → extracts function source, writes wrapper .py with Volume resolution and argument injection
- Submission → SlurmPilot handles SSH → sbatch → returns job ID
- Monitoring → Job object wraps SlurmPilot's status/log retrieval over SSH
- Artifacts → job.download() rsyncs files back; yeet sync moves between clusters
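The script-generation step can be approximated with the standard ast module. The sketch below is an assumption about how source extraction without pickle might work, not yeet's serializer: it takes the function's source text (which inspect.getsource could supply), strips the decorator, and appends a call with the submitted arguments. Volume resolution is omitted, and only arguments with a valid Python literal repr are supported.

```python
import ast


def build_wrapper_script(source: str, func_name: str, kwargs: dict) -> str:
    """Return a standalone script that defines func_name and calls it."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            node.decorator_list = []  # drop @run so the plain function runs remotely
            body = ast.unparse(node)  # requires Python 3.9+
            break
    else:
        raise ValueError(f"function {func_name!r} not found in source")
    # Argument injection: embed the keyword arguments as a literal dict.
    call = f"{func_name}(**{kwargs!r})"
    return f"{body}\n\nif __name__ == '__main__':\n    {call}\n"


SOURCE = '''
@run(gpu="a100", memory="32G", time="4:00:00")
def train(lr: float = 0.001):
    print(f"training with lr={lr}")
'''

script = build_wrapper_script(SOURCE, "train", {"lr": 0.0003})
print(script)
```

Sending source text like this is why imports belong inside the function body, as in the decorator example: module-level imports of the original file are not part of the extracted function.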
Implementation Order
| # | Step | Complexity |
|---|---|---|
| 1 | Project scaffolding (pyproject.toml, package structure) | Low |
| 2 | Config system (YAML loading, validation, cluster registry) | Medium |
| 3 | Volume (path-like object, runtime resolution) | Low |
| 4 | Sync — local↔remote (rsync wrapper with rich progress) | Medium |
| 5 | Sync — cluster↔cluster (direct rsync via SSH, with fallback) | Medium |
| 6 | Router (match gpu/memory/time hints to cluster+partition) | Medium |
| 7 | Serializer (function source extraction → executable script) | Medium |
| 8 | Remote (SlurmPilot wrapper: configure, submit, status, logs) | Medium |
| 9 | Decorator API (@run → RemoteFunction → .submit()) | Medium |
| 10 | Explicit submit API (submit script with args) | Low |
| 11 | Job class (status, logs, download, cancel) | Medium |
| 12 | CLI (click-based, all commands) | Medium |
| 13 | Tests (config, serializer, router, volume, sync logic) | Medium |
Dependencies
- slurmpilot: SSH, sbatch generation, multi-cluster, job status
- click: CLI framework
- pyyaml: config parsing
- rich: progress bars, nice terminal output
Not in v0.1
- Multi-GPU / multi-node jobs
- Auto-retry on preemption / checkpointing
- Job arrays / hyperparameter sweeps
- Web dashboard
- Continuous sync / file watching
- Async job waiting / callbacks