
Render + policy library for SLURM JobSpecs (Mila + DRAC clusters)


salvo

Policy and render library for SLURM. salvo turns a JobSpec into byte-stable sbatch text and decides what to do when a job hits OOM or gets preempted. It does not move code to the cluster, does not open SSH connections, and is not a runner. For transport, pair it with cluv or invoke sbatch directly.

Install

pip install pysalvo

Two CLI commands ship for diagnostics and one-shot rendering. Everything else is meant to be imported.

salvo doctor                                 # topology, ssh alias hygiene, manifest freshness
salvo render spec.yaml --cluster mila        # JobSpec YAML to sbatch text on stdout

Render

from salvo import JobSpec, render

spec = JobSpec(
    name="train",
    cmd=["python", "train.py"],
    gpus=1, cpus=8, mem="32G", time="2h",
    on_oom=["bump_mem(1.5x, max=128G)", "fail"],
)

sbatch_text = render(
    spec,
    cluster_id="mila",
    account="mila",
    partition="unkillable",
)  # str, byte-stable, no side effects

Same JobSpec, same cluster, same inputs, same bytes every run. Useful when you need to audit what was actually submitted six months later.
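To make the byte-stability point concrete, here is a toy renderer. It uses hypothetical names (render_sbatch, audit_digest) and is not salvo's internals. It emits #SBATCH directives in a fixed order with fixed formatting, so identical inputs always produce identical bytes. A SHA-256 digest of that text can then be logged next to the job ID for later audits. The sketch assumes the time is already in SLURM's HH:MM:SS form.

```python
import hashlib
import shlex

# Hypothetical toy renderer, NOT salvo's render(). It only illustrates why a
# fixed directive order and fixed formatting give byte-identical output.
def render_sbatch(name: str, cmd: list[str], gpus: int, cpus: int, mem: str,
                  time: str, account: str, partition: str) -> str:
    directives = [  # fixed order: never build this from an unordered set
        ("job-name", name),
        ("account", account),
        ("partition", partition),
        ("gres", f"gpu:{gpus}"),
        ("cpus-per-task", str(cpus)),
        ("mem", mem),
        ("time", time),  # already HH:MM:SS in this sketch
    ]
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in directives]
    lines.append(" ".join(shlex.quote(arg) for arg in cmd))  # shell-safe command line
    return "\n".join(lines) + "\n"  # always exactly one trailing newline


def audit_digest(sbatch_text: str) -> str:
    """SHA-256 of the exact bytes submitted; store it alongside the job ID."""
    return hashlib.sha256(sbatch_text.encode("utf-8")).hexdigest()


args = dict(name="train", cmd=["python", "train.py"], gpus=1, cpus=8,
            mem="32G", time="02:00:00", account="mila", partition="unkillable")
first = render_sbatch(**args)
second = render_sbatch(**args)
print(audit_digest(first) == audit_digest(second))  # True
```

Six months later, re-rendering the stored JobSpec and comparing digests tells you whether what you have on disk matches what was submitted.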

If you leave account and partition off, salvo picks them via salvo.dispatch by shelling out to squeue for live capacity. That path only works on a SLURM login node; library callers (cluv, xgenius) should pass both explicitly to keep render() pure.
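The split between a live lookup and a pure render can be sketched like this. The squeue call (no header, pending jobs only, partition column) is the impure edge that needs a login node. The decision itself is a pure function over that text, so it can be tested anywhere. The "fewest pending jobs wins" heuristic and both function names are hypothetical and are not salvo.dispatch's actual logic. Jobs queued on several partitions print as a comma-joined name and are simply counted under that string here.

```python
import subprocess
from collections import Counter


def fetch_pending_partitions() -> str:
    """Impure edge: only works where squeue is on PATH (a SLURM login node)."""
    # -h drops the header, -t PENDING keeps queued jobs, -o %P prints the partition.
    return subprocess.run(
        ["squeue", "-h", "-t", "PENDING", "-o", "%P"],
        capture_output=True, text=True, check=True,
    ).stdout


def pick_partition(squeue_output: str, candidates: list[str]) -> str:
    """Pure decision: the candidate with the fewest pending jobs.

    Hypothetical heuristic. On ties, min() returns the earliest candidate,
    so the order of `candidates` doubles as a preference order.
    """
    pending = Counter(line.strip() for line in squeue_output.splitlines() if line.strip())
    return min(candidates, key=lambda name: pending[name])


sample = "main\nmain\nlong\nmain\n"  # stand-in for fetch_pending_partitions()
print(pick_partition(sample, ["main", "long", "unkillable"]))  # unkillable
```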

OOM policy

from salvo.policy import parse, apply_oom, OomContext

steps = parse(["bump_mem(1.5x, max=128G)", "escalate_partition", "fail"])

# prev_spec is the JobSpec whose job was just OOM-killed
new_spec, action = apply_oom(prev_spec, OomContext(kind="cpu", max_rss_mb=33_500))
# new_spec is a fresh JobSpec with bumped mem, or None if the policy says fail

The DSL is intentionally small. Steps run in order until one applies:

  • bump_mem(<factor>x, max=<size>): multiply the memory request by <factor>, capped at <size>
  • escalate_partition: clear the partition so the next render picks a larger tier
  • fail: terminal; no new spec is produced
  • bump_gpus(...), callback(...): parsed today, executed in a later release
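The "run in order until one applies" rule can be sketched as a tiny interpreter. Spec and apply_steps are hypothetical stand-ins for salvo's JobSpec and policy engine, and only the three executable steps are handled. One behavior is an assumption: escalate_partition is treated as not applying when the partition is already cleared, so a repeated OOM falls through to fail.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Spec:  # hypothetical stand-in for salvo's JobSpec
    mem_mb: int
    partition: Optional[str]


_BUMP = re.compile(r"bump_mem\((?P<factor>[\d.]+)x,\s*max=(?P<max>\d+)G\)")


def apply_steps(steps: list[str], spec: Spec) -> tuple[Optional[Spec], str]:
    """Return (new spec or None, the step that fired)."""
    for step in steps:
        bump = _BUMP.fullmatch(step)
        if bump:
            cap_mb = int(bump["max"]) * 1024
            if spec.mem_mb < cap_mb:
                new_mem = min(int(spec.mem_mb * float(bump["factor"])), cap_mb)
                return replace(spec, mem_mb=new_mem), step
            continue  # cap reached: this step does not apply, try the next one
        if step == "escalate_partition":
            if spec.partition is not None:
                return replace(spec, partition=None), step
            continue  # already cleared (assumption: counts as "does not apply")
        if step == "fail":
            return None, step
        raise ValueError(f"step not executable in this sketch: {step!r}")
    return None, "exhausted"


STEPS = ["bump_mem(1.5x, max=128G)", "escalate_partition", "fail"]
s1, a1 = apply_steps(STEPS, Spec(mem_mb=110_592, partition="main"))  # bump to the cap
s2, a2 = apply_steps(STEPS, s1)  # cap reached, so escalate
s3, a3 = apply_steps(STEPS, s2)  # nothing left but fail
print(a1, "|", a2, "|", a3)
```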

Cluster topology

Five presets ship in salvo/topology/presets/: mila, rorqual, narval, beluga, cedar. More DRAC clusters land as YAMLs are contributed. Each YAML lists accounts, partitions, and capacity rules.

from salvo.topology import load_preset, list_presets

list_presets()                      # ['beluga', 'cedar', 'mila', 'narval', 'rorqual']
cluster = load_preset("mila")       # ClusterTopology

Contributing a new cluster is one YAML file. See CONTRIBUTING.md.
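As a rough mental model of what a preset encodes, here is an in-memory sketch of accounts plus partitions ordered by tier, with a simple capacity rule ("smallest tier whose limits fit the request"). This is also the kind of lookup that escalate_partition relies on. Every field name, partition name, and limit below is made up for illustration. The real preset schema lives in salvo/topology/presets/ and CONTRIBUTING.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partition:  # hypothetical fields, not salvo's ClusterTopology schema
    name: str
    max_mem_gb: int
    max_time_h: int


@dataclass(frozen=True)
class Topology:
    accounts: tuple[str, ...]
    partitions: tuple[Partition, ...]  # ordered smallest tier first

    def smallest_fit(self, mem_gb: int, time_h: int) -> Optional[Partition]:
        """First (smallest) partition whose limits cover the request, else None."""
        for part in self.partitions:
            if mem_gb <= part.max_mem_gb and time_h <= part.max_time_h:
                return part
        return None


topo = Topology(
    accounts=("my-account",),  # placeholder account name
    partitions=(
        Partition("small", max_mem_gb=64, max_time_h=12),
        Partition("medium", max_mem_gb=256, max_time_h=48),
        Partition("large", max_mem_gb=1024, max_time_h=168),
    ),
)
print(topo.smallest_fit(32, 2).name)   # small
print(topo.smallest_fit(128, 2).name)  # medium
```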

Decorator (optional, Python-native)

If you'd rather write a function than a YAML spec:

from salvo import cluster

@cluster.submit(gpus=1, cpus=8, mem="32G", time="2h",
                on_oom=["bump_mem(1.5x, max=128G)", "fail"])
def train(seed: int):
    ...

handle = train.submit(seed=42)

train.submit(...) runs the local-sbatch convenience path: it renders, calls sbatch, and returns a handle. Skip the decorator if you're embedding salvo inside another tool's submit flow; just call render() and let that tool do the submission.
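The decorator pattern itself is small. The sketch below uses a hypothetical slurm_job decorator, which is not salvo's cluster.submit. It attaches a .submit() method that renders a script and hands it to a runner. By default the runner pipes the script to sbatch on stdin and uses --parsable so only the job ID comes back. Because the runner is injectable, the render step stays separate from the submission step, which is the same split the paragraph above recommends for embedding. The generated python -c line assumes the decorated function lives in a module importable on the compute node.

```python
import shlex
import subprocess
from typing import Callable


def run_sbatch(script: str) -> str:
    # sbatch reads the job script from stdin when no file is given;
    # --parsable prints just the job ID (with ";cluster" on multi-cluster setups).
    done = subprocess.run(["sbatch", "--parsable"], input=script,
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()


def slurm_job(*, mem: str, time: str, runner: Callable[[str], str] = run_sbatch):
    """Hypothetical decorator that attaches .submit() to a plain function."""
    def decorate(fn):
        def submit(**kwargs) -> str:
            # Re-import and call fn on the compute node with the same keyword args.
            call = f"from {fn.__module__} import {fn.__name__}; {fn.__name__}(**{kwargs!r})"
            script = (
                "#!/bin/bash\n"
                f"#SBATCH --job-name={fn.__name__}\n"
                f"#SBATCH --mem={mem}\n"
                f"#SBATCH --time={time}\n"
                f"python -c {shlex.quote(call)}\n"
            )
            return runner(script)  # render first, then hand off
        fn.submit = submit  # fn itself stays callable locally
        return fn
    return decorate


submitted: list[str] = []


def fake_runner(script: str) -> str:
    submitted.append(script)  # stand-in for sbatch so the example runs anywhere
    return "12345"


@slurm_job(mem="32G", time="02:00:00", runner=fake_runner)
def train(seed: int) -> None:
    ...


print(train.submit(seed=42))  # 12345
```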

Use with cluv

cluv handles SSH, code sync, and the sbatch call. salvo handles the policy. The natural composition is cluv importing salvo's policy library when a user opts in:

# in your project's pyproject.toml
[tool.cluv.retry]
on_oom = ["bump_mem(1.5x, max=128G)", "fail"]
max_hops = 5

This is a proposal, not a shipped feature in cluv. An upstream issue is in draft.

Use with anything that calls sbatch

salvo has no SSH or async dependencies. It is a pydantic + stdlib library. Any tool that runs sbatch can import salvo.render, salvo.policy.apply_oom, and salvo.topology.load_preset independently. Two runnable examples under examples/ show the wiring end-to-end: cluv_integration.py (policy + render) and xgenius_integration.py (policy-only, when the host tool keeps its own renderer).
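On the host-tool side, the wiring usually reduces to a bounded resubmit loop. This sketch is generic and hypothetical. submit stands for whatever runs sbatch and waits for a final state. on_oom stands for a thin wrapper around the policy (for example, around salvo.policy.apply_oom) that returns a new spec or None. OUT_OF_MEMORY is the SLURM job state the loop reacts to. Counting max_hops as the number of resubmissions is an assumption.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # whatever spec type the host tool passes around


def run_with_retries(spec: S,
                     submit: Callable[[S], str],
                     on_oom: Callable[[S], Optional[S]],
                     max_hops: int) -> tuple[str, int]:
    """Resubmit on OUT_OF_MEMORY until success, policy failure, or max_hops."""
    hops = 0
    while True:
        state = submit(spec)  # e.g. "COMPLETED", "FAILED", "OUT_OF_MEMORY"
        if state != "OUT_OF_MEMORY" or hops >= max_hops:
            return state, hops
        next_spec = on_oom(spec)
        if next_spec is None:  # the policy said fail
            return state, hops
        spec, hops = next_spec, hops + 1


# Toy stand-ins: the "spec" is just a memory request in GB.
def fake_submit(mem_gb: int) -> str:
    return "COMPLETED" if mem_gb >= 64 else "OUT_OF_MEMORY"


def double_until_128(mem_gb: int) -> Optional[int]:
    return None if mem_gb >= 128 else min(mem_gb * 2, 128)


print(run_with_retries(32, fake_submit, double_until_128, max_hops=5))  # ('COMPLETED', 1)
```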

License

MIT. See LICENSE.



Download files


Source Distribution

pysalvo-0.2.0.tar.gz (87.2 kB)

Uploaded Source

Built Distribution


pysalvo-0.2.0-py3-none-any.whl (32.3 kB)

Uploaded Python 3

File details

Details for the file pysalvo-0.2.0.tar.gz.

File metadata

  • Download URL: pysalvo-0.2.0.tar.gz
  • Upload date:
  • Size: 87.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pysalvo-0.2.0.tar.gz
Algorithm Hash digest
SHA256 233f6e4fc3081a3985c0a8780963310065323156dc3ff0f76f99c48b21191d18
MD5 dc98df20a67332551fcb4bcabba49770
BLAKE2b-256 c6023e2522e27fd32cb772c7c795b45151c542a9dfcb9f9b64ca4a198ae64fd7
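To check a downloaded file against the published SHA256 digest by hand, streaming the file through hashlib is enough. The helper names below are illustrative. For installs, pip's hash-checking mode (--require-hashes with a pysalvo==0.2.0 --hash=sha256:<digest> line in a requirements file) does the same check automatically.

```python
import hashlib
from pathlib import Path

SDIST_SHA256 = "233f6e4fc3081a3985c0a8780963310065323156dc3ff0f76f99c48b21191d18"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, after downloading the sdist into the current directory:
#   verify(Path("pysalvo-0.2.0.tar.gz"), SDIST_SHA256)
```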


Provenance

The following attestation bundles were made for pysalvo-0.2.0.tar.gz:

Publisher: release.yml on wietzesuijker/salvo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pysalvo-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pysalvo-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 32.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pysalvo-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 90f4ed70898ed71be5ae6e8a49a68077896b4b6597ba115b44842e801056798f
MD5 0e58701bdcb8caa5b54120c7b25502a3
BLAKE2b-256 96c8e5c6e05827c8ce6b338fc5735b56d42be7636f79ab91f58093b70919eaaa


Provenance

The following attestation bundles were made for pysalvo-0.2.0-py3-none-any.whl:

Publisher: release.yml on wietzesuijker/salvo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
