Simple objective optimization orchestration for Slurm clusters

slurptuna – Run Optuna on Slurm (HPC hyperparameter optimization made simple)

Run Optuna hyperparameter optimization on Slurm clusters without writing sbatch scripts or managing distributed workers.

Running Optuna on a Slurm (HPC) cluster is not straightforward. In practice, it usually means:

  • writing and managing sbatch job arrays
  • coordinating distributed Optuna trials
  • aggregating results across workers

While Optuna supports distributed optimization, integrating it with Slurm typically requires custom orchestration.

slurptuna removes that overhead by handling job submission, parallel execution, and result aggregation automatically.
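
For a sense of the manual work this replaces, here is a minimal sketch of rendering a Slurm job-array script by hand. The helpers `slurm_time` and `render_array_script` and the worker script name `worker.py` are hypothetical and not part of slurptuna; only the `#SBATCH` directives and the `SLURM_ARRAY_TASK_ID` variable are standard Slurm.

```python
from datetime import timedelta


def slurm_time(limit: timedelta) -> str:
    """Format a timedelta as a Slurm HH:MM:SS time limit."""
    total = int(limit.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def render_array_script(n_tasks: int, limit: timedelta, worker: str = "worker.py") -> str:
    """Render a job-array script; each task passes its array index to the worker."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=manual-optuna",
        f"#SBATCH --array=0-{n_tasks - 1}",
        f"#SBATCH --time={slurm_time(limit)}",
        "",
        f'python {worker} "$SLURM_ARRAY_TASK_ID"',  # worker.py is a hypothetical script
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_array_script(20, timedelta(hours=2)))
```

Even this small script still leaves trial coordination and result collection to you, which is the part slurptuna is meant to take over.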

Install

pip install slurptuna

Or with uv:

uv add slurptuna

Usage

Here is a minimal example of running Optuna on Slurm using slurptuna:

(1) Write your loss function in a script

# my_model.py
from datetime import timedelta
from slurptuna import execution_mode, loss, optimize_run

@loss(
    name="my_model",
    description="Fit alpha/beta",
    parameter_space={"alpha": (0.0, 1.0), "beta": (0.0, 1.0)},
)
def my_model(params, seed):
    return abs(params["alpha"] - 0.3) + abs(params["beta"] - 0.7)

if __name__ == "__main__":
    result = optimize_run(
        my_model,
        mode=execution_mode("distributed"),
        n_trials=20,
        n_seeds=400,
        chunk_size=20,
        worker_time_limit=timedelta(hours=2),
        slurm_qos="short",
    )
    print(result.best_params)
    # best params and best value are also written to runs/my_model_v0001/summary.json
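
The summary file mentioned in the comment above can be read back with the standard library. This is a minimal sketch: `load_summary` is a hypothetical helper, and since the key names inside `summary.json` are not documented here, it returns the whole parsed document instead of assuming any fields.

```python
import json
from pathlib import Path


def load_summary(run_dir):
    """Return the parsed summary.json of a run directory, or None if it is absent."""
    path = Path(run_dir) / "summary.json"
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    summary = load_summary("runs/my_model_v0001")
    if summary is None:
        print("no summary.json found; the run may still be in progress")
    else:
        # Key names are an unknown here, so print everything that was written.
        print(json.dumps(summary, indent=2, sort_keys=True))
```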

Distributed defaults are tuned for common short-queue clusters:

  • worker_time_limit=timedelta(hours=2)
  • slurm_qos="short"

If your cluster uses a different QoS (or none), set slurm_qos accordingly. slurptuna automatically retries submission without --qos if the specified QoS is rejected.
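
The retry behaviour described above can be pictured roughly as follows. This is a sketch under assumptions, not slurptuna's implementation: `sbatch_command` and `submit_with_qos_fallback` are hypothetical names, the `run` callable is injected so the logic can be exercised without a cluster, and any non-zero exit code is treated as a rejection, which is cruder than inspecting the actual error.

```python
from typing import Callable, List, Optional, Sequence


def sbatch_command(script: str, qos: Optional[str]) -> List[str]:
    """Build an sbatch command line, adding --qos only when one is given."""
    cmd = ["sbatch"]
    if qos:
        cmd.append(f"--qos={qos}")
    cmd.append(script)
    return cmd


def submit_with_qos_fallback(
    script: str,
    qos: Optional[str],
    run: Callable[[Sequence[str]], int],
) -> List[str]:
    """Submit with --qos first; on failure, retry once without it.

    Returns the command that succeeded. `run` must return the exit code.
    """
    first = sbatch_command(script, qos)
    if run(first) == 0:
        return first
    if not qos:
        raise RuntimeError("sbatch submission failed")
    fallback = sbatch_command(script, None)
    if run(fallback) != 0:
        raise RuntimeError("sbatch submission failed with and without --qos")
    return fallback


if __name__ == "__main__":
    # Stand-in runner that rejects every command carrying a --qos flag.
    def fake_run(cmd):
        return 1 if any(arg.startswith("--qos") for arg in cmd) else 0

    print(submit_with_qos_fallback("job.sh", "short", fake_run))
```

On a real cluster, `run` could be `lambda cmd: subprocess.run(cmd).returncode`.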

Parameter space

Loss functions may accept either (params, seed) or (params, seed, context). Most users can ignore context; it is reserved for framework-provided metadata.
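
To illustrate how a framework can support both signatures, the sketch below inspects the function's arity and passes `context` only when the function declares a third positional parameter. `call_loss` is a hypothetical helper and the `context` contents are made up for the example; how slurptuna actually dispatches is not specified here.

```python
import inspect


def call_loss(fn, params, seed, context=None):
    """Call fn as (params, seed) or (params, seed, context), based on its arity."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    n_positional = sum(
        1
        for p in inspect.signature(fn).parameters.values()
        if p.kind in positional_kinds
    )
    if n_positional >= 3:
        return fn(params, seed, context)
    return fn(params, seed)


def two_arg(params, seed):
    return params["alpha"] + seed


def three_arg(params, seed, context):
    # "offset" is an illustrative key, not a documented context field.
    return params["alpha"] + seed + context["offset"]


if __name__ == "__main__":
    print(call_loss(two_arg, {"alpha": 1.0}, 2))
    print(call_loss(three_arg, {"alpha": 1.0}, 2, {"offset": 10}))
```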

Tuple shorthand is interpreted as (min, max):

parameter_space={"alpha": (0.0, 1.0)}

You can also use explicit specs when needed:

from slurptuna import search_param

parameter_space={
    "alpha": search_param(range=(0.0, 1.0)),
    "steps": search_param(range=(1, 10), dtype="int"),
    "mode": search_param(allowed=["fast", "slow"]),
}
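
As a rough mental model of what these specs describe, the sketch below draws one value per parameter. It is an illustration only: plain dicts stand in for `search_param(...)`, uniform random sampling stands in for Optuna's sampler, and treating integer ranges as inclusive on both ends is an assumption. `sample_param` and `sample_space` are hypothetical helpers.

```python
import random


def sample_param(spec, rng):
    """Draw one value from a (min, max) tuple or an explicit spec dict."""
    if isinstance(spec, tuple):
        spec = {"range": spec}  # tuple shorthand: (min, max), float by default
    if "allowed" in spec:
        return rng.choice(spec["allowed"])  # categorical choice
    low, high = spec["range"]
    if spec.get("dtype") == "int":
        return rng.randint(low, high)  # assumed inclusive on both ends
    return rng.uniform(low, high)


def sample_space(space, seed):
    """Sample every parameter in the space with a seeded generator."""
    rng = random.Random(seed)
    return {name: sample_param(spec, rng) for name, spec in space.items()}


if __name__ == "__main__":
    space = {
        "alpha": (0.0, 1.0),
        "steps": {"range": (1, 10), "dtype": "int"},
        "mode": {"allowed": ["fast", "slow"]},
    }
    print(sample_space(space, seed=0))
```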

(2) Submit your script as a long-running controller job on Slurm:

sbatch run_controller.sh my_model.py

run_controller.sh:

#!/bin/bash
#SBATCH --job-name=slurptuna-controller
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

source .venv/bin/activate
python "$1"

The controller submits and monitors the chunk and reduce array jobs automatically; you only need to wait for the result.
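
The chunk/reduce pattern can be sketched locally in a few lines. The sketch assumes that `chunk_size` means seeds per chunk task and that a trial's score is the mean loss over all seeds; neither is confirmed here, and `make_chunks`, `run_chunk`, and `reduce_chunks` are hypothetical names. The loss is the one from the usage example with a small seeded noise term added so that seeds matter.

```python
import random


def my_model(params, seed):
    """Toy seeded loss: the usage example's objective plus seed-dependent noise."""
    noise = random.Random(seed).gauss(0.0, 0.01)
    return abs(params["alpha"] - 0.3) + abs(params["beta"] - 0.7) + noise


def make_chunks(n_seeds, chunk_size):
    """Split seeds 0..n_seeds-1 into consecutive ranges of at most chunk_size."""
    return [
        range(start, min(start + chunk_size, n_seeds))
        for start in range(0, n_seeds, chunk_size)
    ]


def run_chunk(params, seeds):
    """Work of one array task: partial sum and count over its seeds."""
    values = [my_model(params, s) for s in seeds]
    return sum(values), len(values)


def reduce_chunks(partials):
    """Reduce step: combine the partial sums into a single mean loss."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    params = {"alpha": 0.25, "beta": 0.75}
    chunks = make_chunks(n_seeds=400, chunk_size=20)
    partials = [run_chunk(params, chunk) for chunk in chunks]  # parallel on Slurm
    print(len(chunks), "chunks, mean loss:", reduce_chunks(partials))
```

Carrying the sum and count per chunk, instead of a per-chunk mean, keeps the reduce step correct when the last chunk is shorter than the others.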

Dynamic script styles (sys.argv) in distributed mode

slurptuna forwards the launcher's sys.argv to worker imports by default, so argv-driven dynamic loss naming works across Slurm jobs.

Important: the loss still must be registered when the module is imported. A loss defined only inside if __name__ == "__main__": is not visible to workers.
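
The import-time requirement is easiest to see with a toy registry. In this sketch `REGISTRY`, `register`, and `define_losses` are hypothetical stand-ins for slurptuna's `@loss` machinery: the argv-driven name is computed and registered at module level, so any process that imports the module with the same `sys.argv` sees the same loss.

```python
import sys

REGISTRY = {}


def register(name):
    """Toy decorator that records a loss function under the given name."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


def define_losses(argv):
    """Register an argv-named loss; returns the variant that was used."""
    variant = argv[1] if len(argv) > 1 else "default"

    @register(f"my_model_{variant}")
    def my_model(params, seed):
        return abs(params["alpha"] - 0.3)

    return variant


# Runs on import, so a worker importing this module registers the loss too.
define_losses(sys.argv)

if __name__ == "__main__":
    # Code here runs only in the launcher; a loss defined only in this block
    # would never reach REGISTRY in a worker process.
    print(sorted(REGISTRY))
```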

Performance

On a real Slurm cluster (short QoS, 4-CPU tasks), seed throughput scaled from about 1,200 seeds/s on a single worker to about 100,000 seeds/s across 100 distributed tasks:

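| Case                                 | Seeds      | Wall time | Seeds/s | Speedup |
|--------------------------------------|------------|-----------|---------|---------|
| single, 1 worker (baseline)          | 100,000    | 84 s      | 1,191   | -       |
| single, 4 processes                  | 400,000    | 89 s      | 4,494   | 3.8×    |
| distributed, 100 tasks × 4 processes | 40,000,000 | 399 s     | 100,251 | 84×     |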

The distributed case evaluates 40 million seeds in about 7 minutes, work that would take roughly 9 hours sequentially at the baseline rate.
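
The speedup and sequential-time figures follow directly from the seed counts and wall times. The sketch below recomputes them from the rounded wall times in the table, so the baseline comes out near 1,190 seeds/s where the table reports 1,191.

```python
# (seeds evaluated, wall time in seconds), copied from the benchmark table
CASES = {
    "single, 1 worker": (100_000, 84),
    "single, 4 processes": (400_000, 89),
    "distributed, 100 tasks x 4 processes": (40_000_000, 399),
}


def throughput(seeds, wall_seconds):
    """Seeds evaluated per second of wall time."""
    return seeds / wall_seconds


baseline = throughput(*CASES["single, 1 worker"])
speedups = {name: throughput(*case) / baseline for name, case in CASES.items()}

# Time the distributed workload would need at the single-worker rate.
sequential_hours = CASES["distributed, 100 tasks x 4 processes"][0] / baseline / 3600

if __name__ == "__main__":
    for name, case in CASES.items():
        print(f"{name}: {throughput(*case):,.0f} seeds/s, {speedups[name]:.1f}x")
    print(f"sequential estimate: {sequential_hours:.1f} h")
```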

See benchmark/ for the full benchmark setup and loss function.

Docs

younesstrittmatter.github.io/slurptuna
