# Chester

**ML experiment launcher for local, SLURM, and SSH environments.**

Chester (`chester-ml` on PyPI) is a Python experiment launcher for ML workflows. Define your training function and parameter sweep — Chester handles dispatching jobs to local subprocesses, SSH servers, or SLURM clusters, with Singularity container support, code syncing, and reproducibility snapshots baked in.
## Installation

```shell
pip install chester-ml
# or
uv add chester-ml
```
## Quick Start

**1. Create `.chester/config.yaml` in your project root:**

```yaml
log_dir: data
package_manager: uv

backends:
  local:
    type: local
    prepare: .chester/backends/local/prepare.sh

  myserver:
    type: ssh
    host: myserver  # SSH alias from ~/.ssh/config
    remote_dir: /home/user/myproject
    prepare: .chester/backends/myserver/prepare.sh

  mycluster:
    type: slurm
    host: mycluster
    remote_dir: /home/user/myproject
    prepare: .chester/backends/mycluster/prepare.sh
    slurm:
      partition: gpu
      time: "24:00:00"
      gpus: 1
      cpus_per_gpu: 8
      mem_per_gpu: 32G
```
**2. Write a launcher:**

```python
from chester.run_exp import run_experiment_lite, VariantGenerator, detect_local_gpus, flush_backend

def run_task(variant, log_dir, exp_name):
    print(f"lr={variant['lr']}, batch={variant['batch_size']}")
    # ... your training code ...

vg = VariantGenerator()
vg.add('lr', [1e-3, 1e-4])
vg.add('batch_size', [32, 64])

for v in vg.variants():
    run_experiment_lite(
        stub_method_call=run_task,
        variant=v,
        mode='local',  # or 'myserver', 'mycluster'
        exp_prefix='sweep',
        max_num_processes=max(1, len(detect_local_gpus())),
    )

flush_backend('local')  # no-op for local; required after the loop in batch SSH mode
```
**3. Run:**

```shell
python launcher.py              # local
python launcher.py myserver     # SSH
python launcher.py mycluster    # SLURM
```
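The sweep above is the cartesian product of the parameter lists, so two learning rates times two batch sizes launch four jobs. A plain-Python sketch of the variants the loop iterates over (Chester itself is not needed to see this):

```python
from itertools import product

# The parameter grid from the launcher above.
grid = {'lr': [1e-3, 1e-4], 'batch_size': [32, 64]}

# Cartesian product: one dict per variant, 2 * 2 = 4 in total.
variants = [dict(zip(grid, values)) for values in product(*grid.values())]

for v in variants:
    print(v)
```

Each dict is what `run_task` receives as `variant` for one job.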
## Features

- **Three backend types**: local subprocess, SSH (`nohup`), SLURM (`sbatch`)
- **Singularity on all backends**: GPU passthrough, persistent overlays, per-container `prepare.sh`
- **VariantGenerator**: cartesian product sweeps, dependent parameters, `order="serial"` (multi-step single job) and `order="dependent"` (chained SLURM jobs)
- **Hydra integration**: pass parameters as `key=value` overrides with OmegaConf interpolation support
- **Git snapshot**: saves `git_info.json` + `git_diff.patch` per run for full reproducibility
- **Submodule commit pinning**: pin specific submodule commits per job via remote git worktrees
- **SSH batch-GPU mode**: accumulate jobs across variants, fire one per GPU on `flush_backend()`
- **Extra sync dirs**: rsync additional paths (datasets, checkpoints) to remote before submission
- **Per-experiment SLURM overrides**: tune `time`, `gpus`, `mem_per_gpu`, etc. per `run_experiment_lite()` call
- **Graceful Ctrl+C**: local kills subprocesses and stops the queue; remote detaches and lets jobs keep running
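Dependent parameters mean one value is derived from another, so it varies with the sweep without enlarging the grid. Chester's exact `VariantGenerator` API for this is documented under Parameter Sweeps; as a plain-Python illustration of the idea (a stand-in, not Chester's API):

```python
from itertools import product

# Sweep two parameters, then derive a third from one of them.
grid = {'lr': [1e-3, 1e-4], 'batch_size': [32, 64]}
variants = [dict(zip(grid, values)) for values in product(*grid.values())]

for v in variants:
    # 'warmup_steps' is derived from 'batch_size', not swept independently:
    # still 4 variants, not 8.
    v['warmup_steps'] = 32000 // v['batch_size']
```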
## Documentation

Full reference in `docs/`:
| Doc | What it covers |
|---|---|
| Configuration | .chester/config.yaml — all fields, global singularity block, YAML anchors |
| Backends | Local, SSH, SLURM — all options, batch-GPU, extra sync dirs |
| Singularity | Mounts, overlays, PID namespace, fakeroot, runtime override |
| Parameter Sweeps | VariantGenerator, serial/dependent ordering, derive, flush_backend |
| Hydra | hydra_enabled, flags, OmegaConf interpolations |
| Git Snapshot | git_info.json, git_diff.patch, submodule tracking, recovery |
| Submodule Pinning | Per-job submodule commit pinning via worktrees |
| Examples | Annotated real-world config patterns |
## Example Configs

See `docs/examples/` for annotated configs:

- `simple.yaml` — local + SSH + SLURM, no Singularity
- `singularity-slurm.yaml` — production SLURM + Singularity with NFS mounts
- `multi-gpu-ssh.yaml` — multi-GPU SSH workstation with batch mode
## Project Layout

```
myproject/
├── .chester/
│   ├── config.yaml              # Main config
│   └── backends/
│       ├── local/
│       │   └── prepare.sh       # Local env setup
│       ├── mycluster/
│       │   └── prepare.sh       # Cluster setup (modules, paths)
│       └── myserver/
│           └── prepare.sh       # SSH server setup
├── launchers/
│   └── launch_sweep.py
└── src/
```

Chester searches for `.chester/config.yaml` upward from the current directory, stopping at the `.git` root. Override with `$CHESTER_CONFIG_PATH`.
## License

MIT