Experiment orchestration toolkit for Slurm-based training and evaluation workflows.
Project description
slurmforge
slurmforge is a Slurm-native stage-batch system for AI training and evaluation workflows.
It focuses on a small CLI surface:
sforge init
sforge init --list-templates
sforge init --template train-eval --output ./demo --force
cd demo
sforge validate --config experiment.yaml
sforge estimate --config experiment.yaml
sforge plan train --config experiment.yaml --dry-run=full --output plan.audit.json
sforge plan eval --config experiment.yaml --checkpoint /path/to/model.pt --input-name model_input
sforge plan run --config experiment.yaml
sforge train --config experiment.yaml --dry-run=full
sforge eval --config experiment.yaml --checkpoint /path/to/model.pt
sforge run --config experiment.yaml
sforge status --from /path/to/root --reconcile
sforge resubmit --from /path/to/root --stage eval --query state=failed
Install
python -m venv .venv
source .venv/bin/activate
python -m pip install -e '.[dev]'
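After installing, a quick sanity check is to confirm that the sforge command is visible on PATH from the active virtual environment. The sketch below uses only the standard library; cli_available is an illustrative helper name, not part of slurmforge.

```python
import shutil


def cli_available(name: str = "sforge") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if cli_available():
        print("sforge found at", shutil.which("sforge"))
    else:
        print("sforge not found; is the virtual environment activated?")
```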
Start
Create a starter project instead of writing YAML from scratch:
sforge init
For scripts or CI, choose a template explicitly:
sforge init --template train-eval --output ./demo --force
This writes ./demo/experiment.yaml, ./demo/CONFIG.sforge.md,
./demo/README.sforge.md, and the template's stage scripts.
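In a script or CI job it can help to verify that the starter files landed where expected before going further. The following is a minimal sketch that checks only the three file names listed above; missing_starter_files is a hypothetical helper, and the template's stage scripts are not checked because their names depend on the template.

```python
from pathlib import Path

# File names taken from the description above; stage scripts vary by
# template and are deliberately not checked here.
EXPECTED_FILES = ("experiment.yaml", "CONFIG.sforge.md", "README.sforge.md")


def missing_starter_files(project_dir: str) -> list[str]:
    """Return the expected starter files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_starter_files("./demo")
    print("missing:", missing if missing else "none")
```

An empty list means all three files are present.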
Available starter templates:
- train-eval: train produces a checkpoint; eval consumes the upstream output.
- train-only: one train stage with a checkpoint output.
- eval-checkpoint: one eval stage that consumes an explicit checkpoint path.
The generated train.py and eval.py are structured as integration scaffolds:
- SECTION A - SlurmForge contract: injected CLI args and environment contract.
- SECTION B - Your model code: model construction, data loading, training, and eval logic to replace.
- SECTION C - Output contract: checkpoint and metrics files declared by the YAML.
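To make the three-section layout concrete, here is a rough sketch of how a train script could be organized along those lines. It is not the generated train.py: the flag names (--checkpoint-out, --metrics-out, --epochs) are placeholders invented for illustration, the environment contract is not modeled, and the "training" is a stand-in loop. The real injected arguments are whatever the generated scaffold and CONFIG.sforge.md specify.

```python
import argparse
import json
from pathlib import Path


# SECTION A - contract. All flags below are hypothetical placeholders;
# the generated scaffold defines the real injected arguments.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Illustrative train stage")
    parser.add_argument("--checkpoint-out", default="checkpoint.pt")
    parser.add_argument("--metrics-out", default="metrics.json")
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args(argv)


# SECTION B - your model code. Stand-in logic: a fake loss that halves
# every epoch, to be replaced by real model, data, and training code.
def train(epochs: int) -> dict:
    loss = 1.0
    for _ in range(epochs):
        loss *= 0.5
    return {"epochs": epochs, "final_loss": loss}


# SECTION C - output contract. Write the checkpoint and metrics files
# at the paths the stage was told to use.
def write_outputs(metrics: dict, checkpoint_out: str, metrics_out: str) -> None:
    Path(checkpoint_out).write_bytes(b"placeholder checkpoint")
    Path(metrics_out).write_text(json.dumps(metrics))


def main(argv=None) -> dict:
    args = parse_args(argv)
    metrics = train(args.epochs)
    write_outputs(metrics, args.checkpoint_out, args.metrics_out)
    return metrics
```

Keeping argument parsing and output writing in their own functions means only the middle section changes when the model code is swapped in.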
Minimal Workflow
sforge validate --config experiment.yaml
sforge run --config experiment.yaml --dry-run=full
sforge run --config experiment.yaml --emit-only
sforge run --config experiment.yaml
sforge status --from ./runs/<project>/<experiment>/<pipeline-root> --reconcile
Use sforge train for train-only configs and sforge eval --checkpoint /path/to/model.pt for eval-only configs.
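The first four workflow commands can be chained from a small driver that stops at the first failing step. This is a sketch under two assumptions: that sforge signals failure with a non-zero exit code (the usual CLI convention, not confirmed here), and that the helper names run_workflow and WORKFLOW are illustrative only. The final status command is left out because its --from path depends on the run.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the workflow above, in order.
WORKFLOW = (
    ("sforge", "validate", "--config", "experiment.yaml"),
    ("sforge", "run", "--config", "experiment.yaml", "--dry-run=full"),
    ("sforge", "run", "--config", "experiment.yaml", "--emit-only"),
    ("sforge", "run", "--config", "experiment.yaml"),
)


def _default_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_workflow(
    steps: Sequence[Sequence[str]] = WORKFLOW,
    runner: Callable[[Sequence[str]], int] = _default_runner,
) -> int:
    """Run steps in order; return how many completed before the first failure."""
    completed = 0
    for cmd in steps:
        if runner(cmd) != 0:  # assumed: non-zero exit code means failure
            break
        completed += 1
    return completed
```

Calling run_workflow() from the project directory returns 4 when every step succeeds. The runner parameter exists so the sequencing can be exercised without a Slurm cluster.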
Docs
Development
ruff check src tests
pytest -q
Download files
File details
Details for the file slurmforge-1.2.0.tar.gz.
File metadata
- Download URL: slurmforge-1.2.0.tar.gz
- Upload date:
- Size: 167.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f9e280b12bf51cad25ef32da34c864bc73cae78eebc6f977aa293f4beb5099a |
| MD5 | 79a16856ba766b7ff7268455cceaf17e |
| BLAKE2b-256 | fa385a44c828f93373c15adbabae103569f685ee7196aa3623e5c593b8026762 |
Provenance
The following attestation bundles were made for slurmforge-1.2.0.tar.gz:
Publisher: publish.yml on Sean-XinLi/slurmforge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slurmforge-1.2.0.tar.gz
- Subject digest: 2f9e280b12bf51cad25ef32da34c864bc73cae78eebc6f977aa293f4beb5099a
- Sigstore transparency entry: 1413122132
- Permalink: Sean-XinLi/slurmforge@4be0d3b2dbe4bfc16e70c2d8a60611f2f11b3f13
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Sean-XinLi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4be0d3b2dbe4bfc16e70c2d8a60611f2f11b3f13
- Trigger Event: release
File details
Details for the file slurmforge-1.2.0-py3-none-any.whl.
File metadata
- Download URL: slurmforge-1.2.0-py3-none-any.whl
- Upload date:
- Size: 293.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa2c796d4005d3c2869576c8a3cf93f14cf45c14202d37451c6c4c6a28d0d6bc |
| MD5 | cefebaa05280da41ce467e1b846c551d |
| BLAKE2b-256 | af0e2ffd1a492f7d674341ccd0bf30d07d0a7f6e8a0ab38da9fc1d1c1453f2e9 |
Provenance
The following attestation bundles were made for slurmforge-1.2.0-py3-none-any.whl:
Publisher: publish.yml on Sean-XinLi/slurmforge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slurmforge-1.2.0-py3-none-any.whl
- Subject digest: fa2c796d4005d3c2869576c8a3cf93f14cf45c14202d37451c6c4c6a28d0d6bc
- Sigstore transparency entry: 1413122295
- Permalink: Sean-XinLi/slurmforge@4be0d3b2dbe4bfc16e70c2d8a60611f2f11b3f13
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Sean-XinLi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4be0d3b2dbe4bfc16e70c2d8a60611f2f11b3f13
- Trigger Event: release