Simple objective optimization orchestration for Slurm clusters

slurptuna – Run Optuna on Slurm (HPC hyperparameter optimization made simple)

Run Optuna hyperparameter optimization on Slurm clusters without writing sbatch job arrays or managing distributed workers. The only batch script you need is a short controller launcher.

Running Optuna on a Slurm cluster (HPC) is not straightforward. slurptuna provides a simple way to run Optuna on Slurm with minimal setup.

In practice, running Optuna on Slurm clusters usually means:

  • writing and managing sbatch job arrays
  • coordinating distributed Optuna trials
  • aggregating results across workers

While Optuna supports distributed optimization, integrating it with Slurm typically requires custom orchestration.

slurptuna removes that overhead by handling job submission, parallel execution, and result aggregation automatically.

Install

pip install slurptuna

Or with uv:

uv add slurptuna

Usage

Here is a minimal example of running Optuna on Slurm using slurptuna:

(1) Write your loss function in a script:

# my_model.py
from datetime import timedelta
from slurptuna import execution_mode, loss, optimize_run

@loss(
    name="my_model",
    description="Fit alpha/beta",
    parameter_space={"alpha": (0.0, 1.0), "beta": (0.0, 1.0)},
)
def my_model(params, seed):
    return abs(params["alpha"] - 0.3) + abs(params["beta"] - 0.7)

if __name__ == "__main__":
    result = optimize_run(
        my_model,
        mode=execution_mode("distributed"),
        n_trials=20,
        n_seeds=400,
        chunk_size=20,
        worker_time_limit=timedelta(hours=2),
        slurm_qos="short",
    )
    print(result.best_params)
    # best params and best value are also written to runs/my_model_v0001/summary.json

Distributed defaults are tuned for common short-queue clusters:

  • worker_time_limit=timedelta(hours=2)
  • slurm_qos="short"

If your cluster uses a different QoS (or none), set slurm_qos accordingly. slurptuna automatically retries submission without --qos if the specified QoS is rejected.
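
The QoS fallback described above can be pictured roughly as follows. This is a minimal sketch, not slurptuna's actual implementation: `build_sbatch_command`, `submit_with_qos_fallback`, and the injectable `run` callable are hypothetical names. The sketch assumes a rejected QoS shows up as a non-zero sbatch exit code, and it retries on any failure, whereas a real implementation would likely inspect the error message first.

```python
import subprocess
from typing import Callable, List, Optional


def build_sbatch_command(script: str, time_limit: str, qos: Optional[str]) -> List[str]:
    """Assemble an sbatch command line; --qos is added only when a QoS is given."""
    cmd = ["sbatch", f"--time={time_limit}"]
    if qos:
        cmd.append(f"--qos={qos}")
    cmd.append(script)
    return cmd


def _run_subprocess(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def submit_with_qos_fallback(
    script: str,
    time_limit: str = "02:00:00",
    qos: Optional[str] = "short",
    run: Callable[[List[str]], int] = _run_subprocess,
) -> List[str]:
    """Submit with the requested QoS; on failure retry once without --qos.

    Returns the command that was accepted. Assumption: any non-zero exit
    code is treated as a rejection.
    """
    cmd = build_sbatch_command(script, time_limit, qos)
    if run(cmd) == 0:
        return cmd
    if qos:
        fallback = build_sbatch_command(script, time_limit, None)
        if run(fallback) == 0:
            return fallback
    raise RuntimeError("sbatch submission failed")


if __name__ == "__main__":
    # Fake runner standing in for a cluster that rejects every --qos flag.
    def reject_qos(cmd: List[str]) -> int:
        return 1 if any(arg.startswith("--qos=") for arg in cmd) else 0

    print(submit_with_qos_fallback("worker.sh", run=reject_qos))
```

Passing the runner as an argument keeps the retry logic testable on a machine without Slurm installed.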

Parameter space

Loss functions may accept either (params, seed) or (params, seed, context). Most users can ignore context; it is reserved for framework-provided metadata.
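
One way a framework could support both signatures is to inspect the function's positional parameters before calling it. The following is a minimal sketch under that assumption; `call_loss` and the contents of the `context` dictionary are hypothetical and not taken from slurptuna.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def call_loss(
    loss_fn: Callable[..., float],
    params: Dict[str, Any],
    seed: int,
    context: Optional[Dict[str, Any]] = None,
) -> float:
    """Call loss_fn as (params, seed) or (params, seed, context) based on its arity."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    n_positional = sum(
        1
        for p in inspect.signature(loss_fn).parameters.values()
        if p.kind in positional_kinds
    )
    if n_positional >= 3:
        return loss_fn(params, seed, context)
    return loss_fn(params, seed)


def two_arg_loss(params, seed):
    return abs(params["alpha"] - 0.3)


def three_arg_loss(params, seed, context):
    # Hypothetical use of context: an offset supplied by the caller.
    return abs(params["alpha"] - 0.3) + context["offset"]


if __name__ == "__main__":
    print(call_loss(two_arg_loss, {"alpha": 0.5}, seed=0))
    print(call_loss(three_arg_loss, {"alpha": 0.5}, seed=0, context={"offset": 1.0}))
```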

Tuple shorthand is interpreted as (min, max):

parameter_space={"alpha": (0.0, 1.0)}

You can also use explicit specs when needed:

from slurptuna import search_param

parameter_space={
    "alpha": search_param(range=(0.0, 1.0)),
    "steps": search_param(range=(1, 10), dtype="int"),
    "mode": search_param(allowed=["fast", "slow"]),
}
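
To illustrate how such a parameter space might translate into Optuna calls, here is a minimal sketch. It is not slurptuna's code: plain dictionaries with `range`, `dtype`, and `allowed` keys stand in for `search_param` objects, `suggest_params` is a hypothetical helper, and `RandomTrial` is a tiny stand-in so the block runs without Optuna installed. The three `suggest_*` methods mirror those of `optuna.trial.Trial`, so a real trial object could be passed instead.

```python
import random
from typing import Any, Dict


def suggest_params(trial: Any, space: Dict[str, Any]) -> Dict[str, Any]:
    """Draw one value per parameter using Optuna-style suggest_* methods."""
    params: Dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            # Tuple shorthand (min, max): treated here as a float range.
            low, high = spec
            params[name] = trial.suggest_float(name, float(low), float(high))
        elif "allowed" in spec:
            params[name] = trial.suggest_categorical(name, spec["allowed"])
        elif spec.get("dtype") == "int":
            low, high = spec["range"]
            params[name] = trial.suggest_int(name, low, high)
        else:
            low, high = spec["range"]
            params[name] = trial.suggest_float(name, low, high)
    return params


class RandomTrial:
    """Stand-in exposing the three trial methods used above (uniform sampling)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def suggest_float(self, name: str, low: float, high: float) -> float:
        return self._rng.uniform(low, high)

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return self._rng.randint(low, high)  # inclusive bounds, like Optuna

    def suggest_categorical(self, name: str, choices: list) -> Any:
        return self._rng.choice(choices)


if __name__ == "__main__":
    space = {
        "alpha": (0.0, 1.0),
        "steps": {"range": (1, 10), "dtype": "int"},
        "mode": {"allowed": ["fast", "slow"]},
    }
    print(suggest_params(RandomTrial(seed=0), space))
```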

(2) Submit your script as a long-running controller job on Slurm:

sbatch run_controller.sh my_model.py

run_controller.sh:

#!/bin/bash
#SBATCH --job-name=slurptuna-controller
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

source .venv/bin/activate
python "$1"

The controller submits and monitors the chunk and reduce array jobs automatically; you only need to wait for the result.
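
The chunk/reduce pattern itself can be sketched in plain Python. This is an illustration only, under two assumptions that the text above does not confirm: that `chunk_size` counts seeds per chunk, and that the reduce step averages the loss over all seeds. The names `split_seeds`, `run_chunk`, `reduce_chunks`, and `noisy_loss` are hypothetical.

```python
import random
from typing import Any, Callable, Dict, Iterable, List, Tuple


def split_seeds(n_seeds: int, chunk_size: int) -> List[range]:
    """Partition seeds 0..n_seeds-1 into consecutive chunks of at most chunk_size."""
    return [
        range(start, min(start + chunk_size, n_seeds))
        for start in range(0, n_seeds, chunk_size)
    ]


def run_chunk(
    loss_fn: Callable[[Dict[str, Any], int], float],
    params: Dict[str, Any],
    seeds: Iterable[int],
) -> Tuple[float, int]:
    """Work of one array task: return (sum of losses, number of seeds)."""
    values = [loss_fn(params, seed) for seed in seeds]
    return sum(values), len(values)


def reduce_chunks(partials: Iterable[Tuple[float, int]]) -> float:
    """Combine per-chunk partial sums into the mean loss over all seeds."""
    partials = list(partials)
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def noisy_loss(params: Dict[str, Any], seed: int) -> float:
    """Example loss with seed-dependent noise around the alpha/beta optimum."""
    rng = random.Random(seed)
    return abs(params["alpha"] - 0.3) + abs(params["beta"] - 0.7) + rng.gauss(0.0, 0.01)


if __name__ == "__main__":
    params = {"alpha": 0.4, "beta": 0.6}
    chunks = split_seeds(n_seeds=400, chunk_size=20)
    partials = [run_chunk(noisy_loss, params, chunk) for chunk in chunks]
    print(len(chunks), reduce_chunks(partials))
```

Returning a sum and a count from each chunk, rather than a per-chunk mean, keeps the final average correct even when the last chunk is shorter than the others.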

Performance

On a real Slurm cluster (short QoS, 4-CPU tasks), slurptuna raised seed throughput by up to 84× over a single worker:

Case                                    Seeds        Wall time   Seeds/s   Speedup
single, 1 worker (baseline)             100,000      84 s        1,191     -
single, 4 processes                     400,000      89 s        4,494     3.8×
distributed, 100 tasks × 4 processes    40,000,000   399 s       100,251   84×

The distributed case evaluates 40 million seeds in about 7 minutes, work that would take roughly 9 hours on a single worker at the baseline rate.
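
The throughput and speedup figures follow from the seed counts and wall times, as the short calculation below shows. It uses only the numbers from the table; the variable and function names are illustrative. Because the wall times are rounded to whole seconds, recomputed rates can differ from the table in the last digit (the baseline comes out near 1,190 rather than 1,191 seeds/s).

```python
# (seeds evaluated, wall time in seconds) for each benchmark case
cases = {
    "single, 1 worker": (100_000, 84),
    "single, 4 processes": (400_000, 89),
    "distributed, 100 tasks x 4 processes": (40_000_000, 399),
}


def throughput(seeds: int, wall_seconds: float) -> float:
    """Seeds evaluated per second of wall time."""
    return seeds / wall_seconds


baseline_rate = throughput(*cases["single, 1 worker"])

for name, (seeds, wall_seconds) in cases.items():
    rate = throughput(seeds, wall_seconds)
    speedup = rate / baseline_rate
    print(f"{name}: {rate:,.0f} seeds/s, speedup {speedup:.1f}x")

# Time a single baseline worker would need for the distributed workload.
sequential_hours = 40_000_000 / baseline_rate / 3600
print(f"sequential estimate: {sequential_hours:.1f} hours")
```

At the baseline rate, 40 million seeds take 33,600 seconds, or about 9.3 hours, which matches the rough 9-hour figure quoted above.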

See benchmark/ for the full benchmark setup and loss function.

Docs

younesstrittmatter.github.io/slurptuna
