HyperHerd
Launch and monitor hyperparameter optimization job arrays on SLURM.
Hyperparameter sweeps on SLURM, run by an autonomous agent. Declare your search in YAML, hand over a one-line launcher script, and walk away — `herd monitor` submits trials in stages, diagnoses failures, retries the ones SLURM can fix, and pings you on Discord only when it can't.
📖 Full documentation: allenwlynch.github.io/hyperherd
What you get
- One-command sweeps. Write a YAML, run `herd run`, and that's it — no sbatch boilerplate, no manual resubmits.
- An agent that operates the sweep for you. `herd monitor` ramps trials in stages, diagnoses failures, bumps memory or wall-time when that's the right fix, and only interrupts you when it isn't.
- Two-way Discord control. A dedicated channel per sweep with deterministic slash commands (`/status`, `/run`, `/cancel`, `/tail`, …) and free-form mentions for the agent.
- Resume from anywhere. Pull the plug, edit the sweep, re-run — completed trials stick, failed ones go back to the queue.
- Edit mid-run. Bump a learning-rate range or add a value; the next `herd run` appends the new trials without disturbing the ones already running (see the sketch after this list).
- Configs you don't have to memorize. A bundled Claude Code skill writes `hyperherd.yaml` for you from a one-paragraph description.
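For instance, editing mid-run is just a rerun of the same command. A minimal sketch, with an illustrative workspace name:

```bash
# Widen a range or add a value in the sweep config...
$EDITOR my_experiment/hyperherd.yaml
# ...then rerun: completed trials are kept, only the new combinations are submitted.
herd run my_experiment
```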
Hydra is the recommended trainer harness (its CLI consumes the override format natively), but the launcher is free-form bash — parse the arguments however you want.
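Because the contract is free-form bash, a Hydra-based launcher can be tiny. A hypothetical sketch, assuming HyperHerd passes each trial's hyperparameters as Hydra-style `key=value` overrides in the script's arguments (the launcher script docs define the actual contract):

```bash
#!/usr/bin/env bash
# launch.sh (sketch): forward whatever arguments HyperHerd passes to the trainer.
# Assumes they arrive as Hydra overrides, e.g. optimizer.lr=3e-4 model.depth=4.
set -euo pipefail
source ~/.venvs/myproj/bin/activate   # or: module load / conda activate / apptainer exec
python train.py "$@"                  # Hydra's CLI consumes key=value overrides natively
```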
Quick start
```bash
# Install (Python 3.8+ for the base CLI)
pip install hyperherd

# Install the Claude Code skill for authoring sweep configs
herd install-skill

# Scaffold a workspace
herd init my_experiment

# Edit my_experiment/hyperherd.yaml and my_experiment/launch.sh, then:
herd run my_experiment --dry-run   # preview
herd run my_experiment             # submit
herd status my_experiment          # one-shot status
```
To run the autonomous monitor (Python ≥ 3.10 plus a Discord bot — see Discord setup):

```bash
pip install 'hyperherd[monitor]'
herd monitor my_experiment
```
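The monitor is a long-running process, so on a cluster login node you'll likely want it to survive your SSH session. A generic pattern, not a HyperHerd feature:

```bash
# Run the monitor detached in tmux; nohup or a user systemd unit work too.
tmux new-session -d -s hyperherd-monitor 'herd monitor my_experiment'
```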
Have Claude Code set you up
Open Claude Code in your project directory and paste this — it'll walk you through install, config authoring, validation, and (if you want it) the autonomous monitor end-to-end:
```text
Help me set up HyperHerd. Read the setup guide at
https://raw.githubusercontent.com/AllenWLynch/hyperherd/main/docs/setup-help.md
and follow it — start with the Phase 0 interview, then drive the rest.
```
The full guide is also browsable at allenwlynch.github.io/hyperherd/setup-help.
Documentation
- Getting started — install, scaffold, run your first sweep
- Autonomous monitor — the agent daemon: setup, Discord control, failure triage
- Discord setup — one-time bot creation walkthrough
- Sweep config reference — every field in `hyperherd.yaml`
- Conditions — filter or modify parameter combinations
- Launcher script — contract + examples (Apptainer, conda, non-Hydra)
- Command reference — every `herd` subcommand
- Claude Code skill — authoring configs by asking Claude
Requirements
- Python ≥ 3.8 for the base `herd` CLI
- Python ≥ 3.10 for the `[monitor]` extras (the autonomous monitor — Discord, Claude Agent SDK)
- A SLURM cluster with `sbatch`, `sacct`, `squeue`, and `scancel` on the submission host (a quick check appears below)
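To confirm a submission host meets these requirements, a plain-shell check (not a HyperHerd command):

```bash
# Check the Python floor and the SLURM client tools herd shells out to.
python3 -c 'import sys; sys.exit(sys.version_info < (3, 8))' || echo "Python 3.8+ required"
for cmd in sbatch sacct squeue scancel; do
  command -v "$cmd" >/dev/null || echo "missing SLURM tool: $cmd"
done
```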
Releasing
Versions are derived from git tags via setuptools_scm — there's nothing to bump in pyproject.toml. To cut a release:
```bash
git tag -a v0.1.0 -m "first release"
git push origin v0.1.0
```
The Publish to PyPI GitHub Action picks up the tag, builds an sdist + wheel with version 0.1.0, verifies the built version matches the tag, and uploads to PyPI via Trusted Publisher (OIDC — no API token in repo secrets).
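To preview locally what version setuptools_scm will derive before pushing a tag, its module entry point prints it (assuming setuptools_scm is installed in your environment):

```bash
# Prints the version derived from the current git metadata, e.g. 0.1.0
python -m setuptools_scm
```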
One-time PyPI setup before the first release:
- Create the `hyperherd` project on PyPI.
- In the project's "Publishing" settings, add a Trusted Publisher: owner `AllenWLynch`, repo `hyperherd`, workflow `publish.yml`, environment `pypi`.