HyperHerd

Launch and monitor hyperparameter optimization job arrays on SLURM

⚠️ Pre-release / actively developed. HyperHerd is in soft launch — the YAML schema, CLI flags, and Python API may change without notice between versions. Pin to an exact version (hyperherd==X.Y.Z) if you build on top of it, and expect breaking changes until a tagged 1.0.

Hyperparameter sweeps on SLURM, run by an autonomous agent. Declare your search in YAML, hand over a one-line launcher script, and walk away — herd monitor submits trials in stages, diagnoses failures, retries the ones SLURM can fix, and pings you on Discord only when it can't.

📖 Full documentation: allenwlynch.github.io/hyperherd

What you get

  • One-command sweeps. Write a YAML, run herd run, and that's it — no sbatch boilerplate, no manual resubmits.
  • An agent that operates the sweep for you. herd monitor ramps trials in stages, diagnoses failures, bumps memory or wall-time when that's the right fix, and only interrupts you when it isn't.
  • Two-way Discord control. A dedicated channel per sweep with deterministic slash commands (/status, /run, /cancel, /tail, …) and free-form mentions for the agent.
  • Resume from anywhere. Pull the plug, edit the sweep, re-run — completed trials stick, failed ones go back to the queue.
  • Edit mid-run. Bump a learning-rate range or add a value; the next herd run appends the new trials without disturbing the ones already running.
  • Configs you don't have to memorize. A bundled Claude Code skill writes hyperherd.yaml for you from a one-paragraph description.
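For orientation, a sweep config might look something like the sketch below. Every field name here is hypothetical — the real schema is pre-1.0 and documented at allenwlynch.github.io/hyperherd, so treat the generated or documented config as authoritative, not this sketch.

```yaml
# Illustrative sketch only — field names are made up for this example
# and are NOT the actual hyperherd.yaml schema.
sweep:
  name: my_experiment
  launcher: launch.sh
parameters:
  optimizer.lr:
    type: loguniform
    low: 1.0e-4
    high: 1.0e-1
  model.hidden_dim:
    values: [128, 256, 512]
slurm:
  partition: gpu
  mem: 16G
  time: "04:00:00"
```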

Hydra is the recommended trainer harness (its CLI consumes the override format natively), but the launcher is free-form bash — parse the arguments however you want.
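As a concrete illustration of the free-form option, a launcher could parse the overrides itself instead of forwarding them to Hydra. This is a hypothetical sketch, assuming trials arrive as Hydra-style key=value arguments; the parameter names and defaults (lr, batch) are ours, not part of HyperHerd:

```shell
#!/usr/bin/env bash
# Hypothetical launch.sh sketch: parse key=value overrides in bash
# rather than handing them to a Hydra CLI.
parse_overrides() {
  local -A params=([lr]=0.001 [batch]=32)  # illustrative defaults
  local arg
  for arg in "$@"; do
    params["${arg%%=*}"]="${arg#*=}"   # split on the first '='
  done
  # A real launcher would exec the trainer with these values;
  # we print them so the parsing is visible.
  echo "lr=${params[lr]} batch=${params[batch]}"
}

parse_overrides "$@"
```

With Hydra as the harness, the whole script can instead collapse to a single `python train.py "$@"` line, since Hydra consumes the override format directly.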

Quick start

# Install (Python 3.8+ for the base CLI)
pip install hyperherd

# Install the Claude Code skill for authoring sweep configs
herd install-skill

# Scaffold a workspace
herd init my_experiment

# Edit my_experiment/hyperherd.yaml and my_experiment/launch.sh, then:
herd run my_experiment --dry-run    # preview
herd run my_experiment              # submit
herd status my_experiment           # one-shot status

To run the autonomous monitor (Python ≥ 3.10 + a Discord bot — see Discord setup):

pip install 'hyperherd[monitor]'
herd monitor my_experiment

Have Claude Code set you up

Open Claude Code in your project directory and paste this — it'll walk you through install, config authoring, validation, and (if you want it) the autonomous monitor end-to-end:

Help me set up HyperHerd. Read the setup guide at
https://raw.githubusercontent.com/AllenWLynch/hyperherd/main/docs/setup-help.md
and follow it — start with the Phase 0 interview, then drive the rest.

The full guide is also browsable at allenwlynch.github.io/hyperherd/setup-help.

Documentation

Requirements

  • Python ≥ 3.8 for the base herd CLI
  • Python ≥ 3.10 for the [monitor] extras (the autonomous monitor — Discord, Claude Agent SDK)
  • A SLURM cluster with sbatch, sacct, squeue, scancel on the submission host
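A quick preflight before the first herd run is to confirm those SLURM client tools are actually on PATH on the submission host. A minimal sketch (the helper function is ours, not part of the herd CLI):

```shell
#!/usr/bin/env bash
# Preflight sketch: report any SLURM client tools missing on this host.
check_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null || echo "missing: $cmd"
  done
}

check_tools sbatch sacct squeue scancel
```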
