Hydra launcher plugin that runs jobs on Modal (modal.com), with config-driven sandbox customisation.
hydra-modal-launcher
A Hydra launcher plugin that ships multirun jobs to Modal. Inspired by hydra-submitit-launcher and hydra-ray-launcher.
Each Hydra job runs as one invocation of a Modal function. The image and function spec (GPU, CPU, memory, secrets, volumes, timeout, parallelism) are configured from YAML. An image_builder escape hatch lets you produce a fully custom modal.Image in Python when YAML isn't enough.
Contents
- Install
- Quick start
- Common recipes
- Configuration reference
- Per-job outputs
- How the user's code reaches the container
- How it works
- Project structure
- Gotchas
- Limitations
- Troubleshooting
- Testing
Install
pip install hydra-modal-launcher
# or, from a checkout:
pip install -e ".[dev]"
Requires python>=3.10, hydra-core>=1.3, and modal>=1.4. The host that runs the sweep also needs Modal credentials configured, for example by running modal token new.
Quick start
# my_app.py
import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> float:
    return float(cfg.lr) * float(cfg.epochs)


if __name__ == "__main__":
    main()
# conf/config.yaml
defaults:
  - _self_
  - override hydra/launcher: modal

lr: 0.01
epochs: 10

hydra:
  launcher:
    parallelism: 3  # 1 = serial, N = cap, -1 = unlimited
    image:
      pip_packages: [hydra-core, omegaconf]
    function:
      cpu: 1
      memory: 1024
      timeout: 300
Launch a sweep:
# Dry-run: log resolved spec without calling Modal
uv run my_app.py --multirun hydra.launcher.dry_run=true lr=0.001,0.01,0.1
# Real run (Modal credentials in env)
uv run my_app.py --multirun lr=0.001,0.01,0.1
Common recipes
These go under hydra.launcher in your config (or as --multirun overrides).
GPU job
hydra:
  launcher:
    parallelism: 4
    function:
      gpu: "L40S"  # or "A100", "H100", "L40S:2" for 2x
      memory: 16384
      timeout: 3600
    image:
      pip_packages: [torch]
Use a Modal Secret
hydra:
  launcher:
    function:
      secrets:
        - my-wandb-key  # resolved via modal.Secret.from_name("my-wandb-key")
Mount a Modal Volume at the sweep dir
hydra:
  launcher:
    function:
      volumes:
        ${hydra.sweep.dir}: my-sweeps  # mount-path → volume-name
Custom image builder (full Python control)
# custom_image.py at the project root (or anywhere on your sys.path)
import modal


def build_image(image_cfg) -> modal.Image:
    return (
        modal.Image.from_registry("ghcr.io/myorg/base:latest")
        .pip_install("torch==2.5.0", "lightning")
        .run_commands("git clone https://github.com/myorg/data.git /data")
    )
hydra:
  launcher:
    image:
      image_builder: custom_image.build_image  # every other image.* field is ignored
Install deps from a requirements file or pyproject
For projects with more than a handful of pins, point the launcher at your existing dependency manifest instead of duplicating entries in pip_packages. The two fields can be combined with each other, and both are additive with pip_packages (which still wins on name collision with the auto-pinned runtime deps).
hydra:
  launcher:
    image:
      pip_requirements: requirements.txt  # passed to Image.pip_install_from_requirements

hydra:
  launcher:
    image:
      pip_pyproject: pyproject.toml        # passed to Image.pip_install_from_pyproject
      pip_pyproject_extras: [training]     # → optional_dependencies=[...]
      pip_packages: [extra-debug-tool]     # still merged on top
Install layers are emitted heavy-first (pip_pyproject → pip_requirements → pip_packages), so editing pip_packages between runs doesn't invalidate the large transitive-dep layer.
Path resolution
pip_pyproject and pip_requirements accept both absolute and relative paths. Paths are resolved in this order:
- Absolute paths are used as-is.
- Relative paths that exist relative to CWD are passed through unchanged (Modal's default behaviour).
- Otherwise, the launcher walks up from CWD looking for the nearest pyproject.toml/setup.py/setup.cfg/.git, and if the file exists there, uses that absolute path. The resolution is logged.
- No match anywhere: the path is handed to Modal unchanged so the resulting FileNotFoundError surfaces at build time.
This means you can invoke uv run scripts/train.py from any subdir and pip_pyproject: pyproject.toml will still find the project's root pyproject — same DWIM the launcher already does for source mounting.
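The resolution order above can be sketched in a few lines of standard-library Python. This is only an illustration of the documented behaviour, not the plugin's actual code: resolve_manifest_path and PROJECT_MARKERS are hypothetical names, and logging is omitted.

```python
from pathlib import Path

# Marker set copied from the list above; the constant name is hypothetical.
PROJECT_MARKERS = ("pyproject.toml", "setup.py", "setup.cfg", ".git")


def resolve_manifest_path(path: str, cwd: Path) -> str:
    """Hypothetical helper mirroring the documented resolution order."""
    candidate = Path(path)
    # 1. Absolute paths are used as-is.
    if candidate.is_absolute():
        return path
    # 2. Relative paths that exist under CWD pass through unchanged.
    if (cwd / candidate).exists():
        return path
    # 3. Walk up from CWD to the nearest directory holding a project marker.
    for directory in (cwd, *cwd.parents):
        if any((directory / marker).exists() for marker in PROJECT_MARKERS):
            rooted = directory / candidate
            if rooted.exists():
                return str(rooted)
            break  # only the nearest project root is consulted
    # 4. No match: return the path unchanged so the error surfaces at build time.
    return path
```

Calling resolve_manifest_path("pyproject.toml", Path.cwd()) from a scripts/ subdirectory would return the absolute path of the root pyproject.toml, provided no file of that name exists in scripts/ itself.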
Pin extra deps without losing the auto-pinned runtime deps
The plugin auto-adds hydra-core==<host_version> and cloudpickle==<host_version> to every built image. Your pip_packages entries are merged with these; on a name collision, your pin wins.
hydra:
  launcher:
    image:
      pip_packages:
        - "torch==2.5.0"
        - "transformers>=4.50"
        - "hydra-core>=1.3.0,<2"  # overrides the auto-pin
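The merge rule ("your pin wins on a name collision") can be illustrated with a small sketch. The helper names requirement_name and merge_pins are hypothetical, the name parsing is deliberately simplified, and the plugin's real implementation may differ.

```python
import re


def requirement_name(spec: str) -> str:
    """Return the normalised distribution name at the start of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    # PEP 503 style normalisation: lowercase, runs of -, _ and . become a single -
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def merge_pins(auto_pins: list[str], user_pins: list[str]) -> list[str]:
    """Merge auto-pinned and user pins; a user pin replaces an auto-pin of the same name."""
    merged = {requirement_name(pin): pin for pin in auto_pins}
    merged.update({requirement_name(pin): pin for pin in user_pins})
    # Sorted output keeps the install layer stable between runs.
    return sorted(merged.values())


# Example host versions below are made up for illustration.
print(merge_pins(
    ["hydra-core==1.3.2", "cloudpickle==3.0.0"],
    ["torch==2.5.0", "hydra-core>=1.3.0,<2"],
))
```

Here the user's hydra-core range replaces the auto-pinned version, while the cloudpickle auto-pin survives untouched.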
Configuration reference
hydra.launcher.image (ModalImageConf)
| Field | Default | Notes |
|---|---|---|
| python_version | null | If unset, matches the host's major.minor at launch time. Cross-version cloudpickle of __main__ functions can SIGSEGV the container; keep these aligned. |
| base | "debian_slim" | or "from_registry" |
| base_image | null | required when base="from_registry" |
| pip_packages | [] | sorted before install for cache stability; merged with auto-pinned hydra-core + cloudpickle |
| pip_requirements | null | path to a requirements file; passed to Image.pip_install_from_requirements. Relative paths are resolved against the nearest project root (see Path resolution). |
| pip_pyproject | null | path to a pyproject.toml; passed to Image.pip_install_from_pyproject. Same resolution rules as pip_requirements. |
| pip_pyproject_extras | [] | extras keys for pip_pyproject, forwarded as optional_dependencies=[...] |
| apt_packages | [] | |
| run_commands | [] | extra RUN lines |
| env | {} | env vars baked into the image |
| local_python_modules | [] | importable module names; passed to Image.add_local_python_source |
| local_dirs | [] | list of {local_path, remote_path, ignore} for Image.add_local_dir |
| image_builder | null | dotted path to (image_cfg) -> modal.Image. Overrides every other field in image. |
hydra.launcher.function (ModalFunctionConf)
| Field | Default | Notes |
|---|---|---|
| gpu | null | e.g. "L40S", "A100:2" |
| cpu | null | float, fractional cores |
| memory | null | MB |
| timeout | 3600 | seconds |
| secrets | [] | names resolved via modal.Secret.from_name |
| volumes | {} | mount_path -> volume_name, resolved via modal.Volume.from_name |
| retries | 0 | |
| region | null | |
hydra.launcher (top-level)
| Field | Default | Notes |
|---|---|---|
| app_name | "hydra-modal-launcher" | passed to modal.App(...) |
| parallelism | -1 | 1 = serial, N caps concurrent containers via max_containers=N, -1 = unbounded |
| dry_run | false | log resolved spec and skip app.run() |
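As a rough illustration of the parallelism semantics in the table, the mapping to a container cap might look like the sketch below. The function name is hypothetical, and representing "unbounded" as None (no cap passed) and rejecting other values are assumptions rather than documented behaviour.

```python
from typing import Optional


def max_containers_for(parallelism: int) -> Optional[int]:
    """Translate the launcher's parallelism setting into a container cap.

    Assumption: -1 (unbounded) is modelled as None, meaning no cap is set.
    """
    if parallelism == -1:
        return None
    if parallelism < 1:
        # Assumption: 0 and other negative values are treated as config errors.
        raise ValueError(f"parallelism must be -1 or >= 1, got {parallelism}")
    # 1 means serial execution; N > 1 caps concurrency at N containers.
    return parallelism
```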
Per-job outputs
Jobs run remotely on ephemeral Modal containers; Hydra's per-job working directory written by run_job lives on that container, not on your laptop. The launcher:
- Always writes minimal local .hydra/{config,hydra,overrides}.yaml stubs into ${hydra.sweep.dir}/<job_num>/ from the parent process so downstream tooling and humans see the expected layout.
- Optionally mounts a Modal Volume on the remote container via hydra.launcher.function.volumes. If you want real artifact persistence, point a volume at the sweep dir and pull it down after the run.
Each job's Python return value is captured in JobReturn._return_value. Failures are mapped to JobReturn(status=FAILED, _return_value=<exception>).
How the user's code reaches the container
Modal does not auto-mount your CWD. The launcher inspects your task_function's module and:
- Importable package (__module__ == "myproject.scripts.train"): adds the top-level package via Image.add_local_python_source("myproject").
- __main__ (e.g. python scripts/train.py): walks up from the script's directory looking for pyproject.toml/setup.py/setup.cfg/.git. If found, mounts the whole project root via Image.add_local_dir(<root>, "/root") with default ignores (.venv/, .git/, __pycache__/, node_modules/, multirun/, outputs/, etc.). This handles the common research-repo layout where scripts/ is a sibling of the package:

  myproject/
  ├── pyproject.toml     ← project root marker
  ├── myproject/         ← package
  │   └── lib.py
  ├── scripts/
  │   └── train.py       ← @hydra.main entrypoint
  └── conf/
      └── config.yaml

- __main__ with no project markers anywhere up-tree: mounts only the script's directory and warns that sibling packages will be unreachable.
Override either path by setting image.local_python_modules, image.local_dirs (with custom ignore globs per mount), or by taking full control with image.image_builder.
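The three cases above amount to a small decision procedure. The sketch below models it with plain tuples instead of real Modal calls; plan_source_mount, find_project_root, and the strategy labels are hypothetical names chosen for illustration, and the ignore globs are left out.

```python
from pathlib import Path
from typing import Optional

# Marker set taken from the description above; the constant name is hypothetical.
PROJECT_MARKERS = ("pyproject.toml", "setup.py", "setup.cfg", ".git")


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that holds a project marker."""
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in PROJECT_MARKERS):
            return directory
    return None


def plan_source_mount(module_name: str, script_path: Optional[Path] = None) -> tuple[str, str]:
    """Decide how the task function's code would be shipped (illustrative only)."""
    if module_name != "__main__":
        # Importable package: ship the top-level package by name.
        return ("python_source", module_name.split(".")[0])
    if script_path is None:
        raise ValueError("script_path is required when module_name is '__main__'")
    script_dir = script_path.resolve().parent
    root = find_project_root(script_dir)
    if root is not None:
        # Script run directly inside a project: mount the whole project root.
        return ("project_root", str(root))
    # No markers anywhere up-tree: only the script's directory is mounted.
    return ("script_dir", str(script_dir))
```

For example, plan_source_mount("myproject.scripts.train") yields ("python_source", "myproject"), which corresponds to the first bullet.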
How it works
parent process modal cloud
────────────── ───────────
@hydra.main(main)
│
▼
ModalLauncher.launch(overrides, idx0)
│
│ 1. configure_log + Singleton.get_state()
│ 2. _resolve_sweep_configs(overrides) ┐ done on parent —
│ 3. _write_local_job_stubs(sweep_configs) │ the user's conf/ dir
│ 4. cloudpickle.dumps(launcher) │ is local-only and
│ 5. build_modal_app(launcher_cfg) ┘ doesn't exist remotely
│
▼
with modal.enable_output(), app.run():
fn.starmap(payloads, return_exceptions=True) ────► spawns N containers
│
▼
_worker.modal_entry(sweep_config, num, state, launcher_pickled)
│
│ cloudpickle.loads(launcher_pickled)
│ Singleton.set_state + setup_globals
│ HydraConfig.instance().set_config(sweep_config)
│ open_dict: hydra.job.id = modal call id
│ run_job(task_function, sweep_config, ...)
│
▼
returns JobReturn
│
▼
[JobReturn, JobReturn, ...] ────► back to Hydra's sweeper
Sweep configs are pre-resolved on the parent so the worker never needs to read the local conf/ dir from inside a Modal container. The cloudpickled launcher carries task_function and hydra_context.callbacks; the singleton snapshot is shipped separately and restored on the worker so HydraConfig.instance() resolves correctly.
Project structure
hydra-modal-launcher/
├── hydra_plugins/hydra_modal_launcher/ # the plugin (PEP 420 namespace — no __init__ on hydra_plugins/)
│ ├── config.py # dataclasses + ConfigStore registration
│ ├── modal_launcher.py # ModalLauncher(Launcher)
│ ├── _modal_app.py # pure + impure builders for modal.App / Image / Function
│ └── _worker.py # ships to the Modal container
├── example/ # Layout-A demo (entry: `uv run example/my_app.py`)
├── tests/ # pure unit tests, no Modal account required
├── AGENTS.md # ← read this if you're an AI agent
└── CHANGELOG.md
For deeper conventions and invariants — what's pure vs impure, where Modal can be imported, how to add a config field — see AGENTS.md.
Gotchas
- Host/container Python version must match. Cloudpickle ships __main__-scoped functions by value (bytecode + cells); deserializing across Python minor versions can segfault the container. The default python_version=null auto-detects the host's major.minor and uses that, so you generally don't need to set it.
- hydra-core and cloudpickle are added to every built image automatically, pinned to your host's installed versions. User-supplied version pins for the same package win on name collision.
- Modal logs stream to your terminal during a sweep via modal.enable_output(). Local Hydra logs and remote container stdout are interleaved.
Limitations
- No checkpoint / preemption support: Modal has no equivalent of SLURM's signal protocol.
- No automatic sync of remote working dirs back to your laptop. Use volumes if you need it.
- Ephemeral apps only (with app.run():). Pre-deployed apps via Function.from_name are out of scope.
- Image is rebuilt once per launch() call. Modal caches build layers so subsequent runs are fast.
Troubleshooting
Runner segmentation fault (SIGSEGV) on container startup
Host and container Python versions don't match. Cloudpickle ships __main__ functions by value (bytecode); deserializing across minor versions crashes the container. Verify hydra.launcher.image.python_version is null (the default — it auto-matches your host) or set explicitly to your host's major.minor.
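A quick way to check which major.minor the launcher would auto-detect is to ask the host interpreter directly. The helper names below are hypothetical and this is only a sanity-check sketch, not part of the plugin.

```python
import sys
from typing import Optional


def host_python_version() -> str:
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def versions_match(configured: Optional[str]) -> bool:
    """True if the configured python_version is null or matches the host's major.minor."""
    if configured is None:
        return True  # null means auto-match the host
    # Compare only major.minor, so a value like "3.11.4" is reduced to "3.11".
    return ".".join(configured.split(".")[:2]) == host_python_version()


print(host_python_version(), versions_match(None))
```

If versions_match returns False for the value in your config, either remove the override or set it to the printed host version.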
ModuleNotFoundError: No module named 'mypkg' on the remote
The auto-mount didn't pick up your package. If you ran the script directly (python scripts/train.py), the launcher looks for pyproject.toml / setup.py / setup.cfg / .git to mount the whole project root. If none exist, only the script's directory is mounted. Fix by either:
- adding the missing markers (an empty pyproject.toml is fine), or
- explicitly setting image.local_python_modules: ["mypkg"] or image.local_dirs: in your config.
Primary config directory not found on the remote
You're on a stale build of the plugin. v0.1.0+ pre-resolves sweep configs on the parent — the worker should never call load_sweep_config. Upgrade.
Input aborted - exceeded limit of 8 retries
Container is crashing during input deserialization. Usual causes:
- Python-version mismatch (SIGSEGV, see above).
- Out-of-memory at import time. Bump function.memory. Importing hydra-core + omegaconf alone takes ~150 MB; 256 MB is too tight.
- Cloudpickle version drift. Should be auto-pinned to your host. Verify with hydra.launcher.dry_run=true and check cloudpickle==X.Y.Z is in pip_packages.
Modal container logs aren't showing up
You're probably running an old build. v0.1.0+ wraps the sweep in with modal.enable_output(). Upgrade.
Dry-run for everything
Add hydra.launcher.dry_run=true to any sweep. The launcher logs the resolved image spec + function kwargs and returns without calling Modal. Useful for validating config and image deps before paying for a build.
Testing
uv sync --extra dev
uv run pytest tests/
uv.lock is committed, so the sync is reproducible. Unit tests don't require a Modal account; the orchestration is pure functions where possible.