
Launch and monitor hyperparameter optimization job arrays on SLURM


HyperHerd

Hyperparameter sweeps on SLURM, run by an autonomous agent. Declare your search in YAML, hand over a one-line launcher script, and walk away — herd monitor submits trials in stages, diagnoses failures, retries the ones SLURM can fix, and pings you on Discord only when it can't.

📖 Full documentation: allenwlynch.github.io/hyperherd

What you get

  • One-command sweeps. Write a YAML, run herd run, and that's it — no sbatch boilerplate, no manual resubmits.
  • An agent that operates the sweep for you. herd monitor ramps trials in stages, diagnoses failures, bumps memory or wall-time when that's the right fix, and only interrupts you when it isn't.
  • Two-way Discord control. A dedicated channel per sweep with deterministic slash commands (/status, /run, /cancel, /tail, …) and free-form mentions for the agent.
  • Resume from anywhere. Pull the plug, edit the sweep, re-run — completed trials stick, failed ones go back to the queue.
  • Edit mid-run. Bump a learning-rate range or add a value; the next herd run appends the new trials without disturbing the ones already running.
  • Configs you don't have to memorize. A bundled Claude Code skill writes hyperherd.yaml for you from a one-paragraph description.

Hydra is the recommended trainer harness (its CLI consumes the override format natively), but the launcher is free-form bash — parse the arguments however you want.
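A minimal launcher might look like the sketch below. The script name, train.py, and the example overrides are illustrative assumptions, not part of HyperHerd — the only contract is that the trial's hyperparameters arrive as CLI arguments:

```shell
#!/usr/bin/env bash
# Hypothetical launch.sh sketch: HyperHerd invokes the launcher once per trial
# with that trial's hyperparameters as arguments; a Hydra CLI can consume
# them verbatim as overrides. 'train.py' is an assumed training entry point.
set -eu

overrides="$*"                        # e.g. "optimizer.lr=3e-4 model.depth=6"
echo "launching trial: ${overrides:-<no overrides>}"
# exec python train.py "$@"           # forward straight to Hydra
```

Because the launcher is plain bash, non-Hydra harnesses can translate the arguments into whatever flag format their trainer expects.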

Quick start

# Install (Python 3.8+ for the base CLI)
pip install hyperherd

# Install the Claude Code skill for authoring sweep configs
herd install-skill

# Scaffold a workspace
herd init my_experiment

# Edit my_experiment/hyperherd.yaml and my_experiment/launch.sh, then:
herd run my_experiment --dry-run    # preview
herd run my_experiment              # submit
herd status my_experiment           # one-shot status

To run the autonomous monitor (Python ≥ 3.10 + a Discord bot — see Discord setup):

pip install 'hyperherd[monitor]'
herd monitor my_experiment

Have Claude Code set you up

Open Claude Code in your project directory and paste this — it'll walk you through install, config authoring, validation, and (if you want it) the autonomous monitor end-to-end:

Help me set up HyperHerd. Read the setup guide at
https://raw.githubusercontent.com/AllenWLynch/hyperherd/main/docs/setup-help.md
and follow it — start with the Phase 0 interview, then drive the rest.

The full guide is also browsable at allenwlynch.github.io/hyperherd/setup-help.

Documentation

Requirements

  • Python ≥ 3.8 for the base herd CLI
  • Python ≥ 3.10 for the [monitor] extras (the autonomous monitor — Discord, Claude Agent SDK)
  • A SLURM cluster with sbatch, sacct, squeue, scancel on the submission host
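A quick preflight on the submission host (generic shell, not a HyperHerd command) can confirm those SLURM CLIs are available before the first herd run:

```shell
# Check that the SLURM commands HyperHerd shells out to are on PATH.
missing=""
for cmd in sbatch sacct squeue scancel; do
  command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
done
if [ -z "$missing" ]; then
  echo "all SLURM tools found"
else
  echo "missing:$missing"
fi
```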

Releasing

Versions are derived from git tags via setuptools_scm — there's nothing to bump in pyproject.toml. To cut a release:

git tag -a v0.1.0 -m "first release"
git push origin v0.1.0

The Publish to PyPI GitHub Action picks up the tag, builds an sdist + wheel with version 0.1.0, verifies the built version matches the tag, and uploads to PyPI via Trusted Publisher (OIDC — no API token in repo secrets).
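The tag-to-version mapping is worth knowing before you push: setuptools_scm's default scheme strips a leading v from the tag, so the check the workflow performs amounts to the comparison below (the tag value is just an example):

```shell
tag="v0.1.0"                # the tag you are about to push (example value)
version="${tag#v}"          # version setuptools_scm derives from it
echo "tag $tag -> version $version"
```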

One-time PyPI setup before the first release:

  1. Create the hyperherd project on PyPI.
  2. In the project's "Publishing" settings, add a Trusted Publisher: owner AllenWLynch, repo hyperherd, workflow publish.yml, environment pypi.
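For reference, a Trusted Publisher workflow matching those settings typically looks like the sketch below, built on the official pypa publish action. This is an illustrative outline, not the repo's actual publish.yml; the verify-version step mentioned above is omitted:

```yaml
# Illustrative publish.yml sketch (assumed, may differ from the real workflow)
name: Publish to PyPI
on:
  push:
    tags: ["v*"]
jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi         # must match the Trusted Publisher environment
    permissions:
      id-token: write         # required for OIDC Trusted Publishing
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0      # full history so setuptools_scm sees the tag
      - run: pipx run build   # sdist + wheel into dist/
      - uses: pypa/gh-action-pypi-publish@release/v1
```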
