Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
alloc
Find and fix training bottlenecks. Zero code changes.
pip install alloc
alloc run python train.py
alloc v0.0.2 — Calibrate
Run Summary
Peak VRAM 31.2 GB / 40.0 GB (A100)
VRAM used 78.0%
Avg GPU util 72.3%
Avg power 287 W
Duration 24.1s (auto-stopped: metrics stable at 18.2s)
Step time 148.5 ms (p50) / 152.1 ms (p90)
Throughput 42.3 samples/sec
Artifact: alloc_artifact.json.gz
That's it. No decorators, no config files, no code changes. Alloc wraps your command, profiles GPU usage, and tells you what's wrong.
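The run summary reports p50/p90 step time and says the run auto-stopped once metrics were stable. The README does not say how stability is judged. The sketch below shows one plausible approach under an assumption of our own: the run counts as stable once the coefficient of variation (standard deviation / mean) of the last `window` step times falls below a threshold. `StabilityMonitor`, `window`, and `cv_threshold` are hypothetical names, not part of alloc.

```python
import math
import statistics
from collections import deque


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


class StabilityMonitor:
    """Hypothetical auto-stop rule: stable once recent step times stop varying much."""

    def __init__(self, window=20, cv_threshold=0.05):
        self.window = window
        self.cv_threshold = cv_threshold
        self.recent = deque(maxlen=window)  # only the last `window` step times are kept

    def record(self, step_seconds):
        """Add one step duration; return True once the window is full and stable."""
        self.recent.append(step_seconds)
        if len(self.recent) < self.window:
            return False
        cv = statistics.pstdev(self.recent) / statistics.fmean(self.recent)
        return cv < self.cv_threshold

    def summary(self, batch_size):
        """p50/p90 step time (ms) and throughput over the stable window only."""
        times = list(self.recent)
        return {
            "p50_ms": percentile(times, 50) * 1000,
            "p90_ms": percentile(times, 90) * 1000,
            "samples_per_sec": batch_size / statistics.fmean(times),
        }


# Simulated step times in seconds: slow warm-up steps, then a steady state.
steps = [0.40, 0.25, 0.16, 0.150, 0.149, 0.148, 0.151, 0.150]
monitor = StabilityMonitor(window=5, cv_threshold=0.05)
stable_at = None
for i, t in enumerate(steps):
    if monitor.record(t):
        stable_at = i  # auto-stop point
        break
print(f"stable after step {stable_at}", monitor.summary(batch_size=6))
```

Summarizing only the stable window keeps warm-up steps out of the percentiles; whether alloc does the same is not documented here.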
What you get
alloc diagnose reads your training script and tells you exactly what to change:
alloc diagnose train.py
alloc diagnose — 3 findings in train.py
CRITICAL DL005 — DataLoader running in main thread
train.py:47 num_workers=0 → num_workers=8
num_workers=0 loads data in the main thread, blocking GPU computation entirely.
Expected impact: ~30-50% faster training with parallel data loading
WARNING PREC002 — Using fp16, consider bf16
train.py:56 dtype: float16 → dtype: bfloat16
H100 supports bf16 natively — eliminates loss scaling overhead.
Expected impact: ~5-10% speedup, eliminates GradScaler complexity
INFO THRU001 — cudnn.benchmark not enabled
Add: torch.backends.cudnn.benchmark = True
Expected impact: ~5-10% speedup for fixed-size inputs
Summary: 1 critical, 1 warning, 1 info
Run with --diff to generate patches | --json for CI output
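The command table describes `alloc diagnose` as AST analysis. As a rough illustration of how a rule like DL005 can be checked statically, here is a minimal sketch using Python's standard `ast` module. It is not alloc's implementation, and `find_zero_num_workers` is a hypothetical name. It flags `DataLoader(...)` calls whose `num_workers` is the literal `0` or is omitted (PyTorch's default is 0, which loads data in the main process).

```python
import ast


def find_zero_num_workers(source, filename="train.py"):
    """Return sorted (line, message) pairs for DataLoader calls that load data in-process."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match `DataLoader(...)` and attribute calls like `torch.utils.data.DataLoader(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DataLoader":
            continue
        # Keyword arguments only; positional num_workers and `**kwargs` are ignored here.
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        value = kwargs.get("num_workers")
        if value is None:
            findings.append((node.lineno, "num_workers not set (PyTorch default is 0)"))
        elif isinstance(value, ast.Constant) and type(value.value) is int and value.value == 0:
            findings.append((node.lineno, "num_workers=0 loads data in the main process"))
    return sorted(findings)


sample = """
from torch.utils.data import DataLoader
train_loader = DataLoader(train_ds, batch_size=32, num_workers=0)
val_loader = DataLoader(val_ds, batch_size=32, num_workers=8)
"""
for line, message in find_zero_num_workers(sample):
    print(f"train.py:{line} {message}")
```

A production checker would also need to resolve variables (`num_workers=n`), positional arguments, and aliased imports, which this sketch ignores.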
alloc ghost estimates VRAM before you launch:
alloc ghost train.py --dtype bf16
Ghost Scan — 7.0B params (bf16)
Model weights 13.04 GB
Gradients 13.04 GB
Optimizer (Adam) 78.23 GB
Activations (est.) 0.50 GB
Buffer (10%) 10.48 GB
Total VRAM 115.28 GB
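The Ghost Scan rows can be reproduced by hand. Assume bf16 weights and gradients at 2 bytes per parameter each, and Adam state at 12 bytes per parameter (an fp32 master copy plus two fp32 moments, which is our interpretation). Under those assumptions, 7.0B parameters give 13.04, 13.04 and 78.23. Those figures only match if "GB" here means GiB (2^30 bytes). The sketch below takes the 0.50 GB activation figure as given and applies the 10% buffer to the subtotal. `ghost_estimate` is a hypothetical helper, not alloc's code.

```python
GIB = 1024 ** 3  # the ghost-scan figures only match if "GB" means 2**30 bytes


def ghost_estimate(num_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=12,
                   activations_gib=0.50, buffer_frac=0.10):
    """Rough training-memory breakdown in GiB (hypothetical helper, not alloc's code)."""
    parts = {
        "weights": num_params * weight_bytes / GIB,    # bf16 weights
        "gradients": num_params * grad_bytes / GIB,    # bf16 gradients
        # Assumed Adam state: fp32 master weights + two fp32 moments = 3 x 4 bytes.
        "optimizer": num_params * optimizer_bytes / GIB,
        "activations": activations_gib,                 # taken as given, not derived
    }
    subtotal = sum(parts.values())
    parts["buffer"] = subtotal * buffer_frac
    parts["total"] = subtotal + parts["buffer"]
    return parts


for name, gib in ghost_estimate(7.0e9).items():
    print(f"{name:<12}{gib:8.2f} GiB")
```

This yields a total of 115.29 rather than 115.28. The last-digit difference presumably comes from the displayed 0.50 GB activation estimate being rounded.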
alloc scan ranks GPU configurations remotely, so no local GPU is needed:
alloc scan --model llama-3-70b --gpu H100-80GB --num-gpus 8
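The README does not say how `alloc scan` ranks configurations. A quick back-of-envelope check can still show whether a 70B model's training state fits on one 8×H100-80GB node. The sketch below reuses the per-parameter byte counts that reproduce the Ghost Scan figures: 2 bytes each for bf16 weights and gradients, plus 12 bytes of Adam state, for 16 in total. It also assumes perfectly even sharding across GPUs (FSDP/ZeRO-3 style), treats the model as 70e9 parameters and an "80GB" card as 80e9 bytes (both approximations), keeps 10% headroom, and ignores activations. The function names are hypothetical.

```python
GB = 10 ** 9  # decimal GB; treating an "80GB" card as 80e9 bytes is an approximation


def per_gpu_state_gb(num_params, num_gpus, bytes_per_param=16):
    """Per-GPU memory for model/training state if it is sharded evenly; activations ignored."""
    return num_params * bytes_per_param / num_gpus / GB


def min_gpus_to_fit(num_params, gpu_mem_gb, bytes_per_param=16, headroom=0.9):
    """Smallest GPU count whose evenly sharded state fits within `headroom` of each card."""
    num_gpus = 1
    while per_gpu_state_gb(num_params, num_gpus, bytes_per_param) > gpu_mem_gb * headroom:
        num_gpus += 1
    return num_gpus


params = 70e9  # approximate size of a 70B model
print("train state, 8 GPUs:", per_gpu_state_gb(params, 8), "GB per GPU")   # 16 B/param
print("bf16 weights, 8 GPUs:", per_gpu_state_gb(params, 8, 2), "GB per GPU")  # 2 B/param
print("min 80GB GPUs for training state:", min_gpus_to_fit(params, 80))
```

Under these assumptions, full training state needs about 140 GB per GPU on 8 cards, so it would not fit on a single node without more GPUs or offloading. The bf16 weights alone need about 17.5 GB per GPU. Activations and communication buffers would push real numbers higher.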
Works with everything
Alloc wraps your launch command. No framework-specific setup required.
alloc run python train.py
alloc run torchrun --nproc_per_node=4 train.py
alloc run accelerate launch train.py
alloc run srun python train.py # Slurm
alloc run ray job submit -- python train.py
Multi-GPU detection is automatic (discovers all GPUs in the process tree).
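The README does not specify how the process tree is discovered. On Linux, one plausible approach is to build a PID → parent-PID map (for example from `/proc/<pid>/stat`). That map can then be intersected with the GPU compute processes reported by `nvidia-smi --query-compute-apps=pid,gpu_uuid --format=csv`. The sketch below shows only the tree-walk logic. Those inputs are passed in as plain data so it runs without a GPU. `descendants` and `gpus_in_tree` are hypothetical names.

```python
from collections import defaultdict


def descendants(root_pid, parent_of):
    """All PIDs in the tree rooted at root_pid (inclusive), given a pid -> parent pid map."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    seen, stack = set(), [root_pid]
    while stack:  # iterative depth-first walk
        pid = stack.pop()
        if pid not in seen:
            seen.add(pid)
            stack.extend(children[pid])
    return seen


def gpus_in_tree(root_pid, parent_of, gpu_procs):
    """Sorted GPU ids used by any process in the tree; gpu_procs holds (pid, gpu_id) pairs."""
    tree = descendants(root_pid, parent_of)
    return sorted({gpu for pid, gpu in gpu_procs if pid in tree})


# Made-up example: torchrun (pid 100) with 4 workers; pid 300 is an unrelated job on GPU 7.
parent_of = {100: 1, 101: 100, 102: 100, 103: 100, 104: 100, 300: 1}
gpu_procs = [(101, 0), (102, 1), (103, 2), (104, 3), (300, 7)]
print(gpus_in_tree(100, parent_of, gpu_procs))
```

Walking from the launcher's PID rather than listing every GPU process is what keeps unrelated jobs on a shared node out of the measurement.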
Deeper signals (optional)
Add a one-line callback for step-level timing:
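# HuggingFace
from transformers import Trainer
from alloc import HuggingFaceCallback
trainer = Trainer(..., callbacks=[HuggingFaceCallback()])  # "..." = your usual Trainer arguments
# Lightning
from lightning.pytorch import Trainer  # older installs: from pytorch_lightning import Trainer
from alloc import LightningCallback
trainer = Trainer(..., callbacks=[LightningCallback()])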
This unlocks step time p50/p90, throughput, and dataloader bottleneck detection.
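The callback internals are not documented here. As a framework-agnostic sketch of what step-level timing involves, the hypothetical `StepTimer` below records the time inside each step and the gap between steps. A large gap share is one rough signal of a dataloader bottleneck. A real adapter could forward Hugging Face `TrainerCallback.on_step_begin`/`on_step_end` or Lightning `on_train_batch_start`/`on_train_batch_end` to these methods. That wiring is our assumption, not alloc's code.

```python
import time


class StepTimer:
    """Hypothetical step timer: time inside each step and the gap between steps."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.step_times = []  # seconds from step begin to step end
        self.wait_times = []  # seconds from one step's end to the next step's begin
        self._begin = None
        self._last_end = None

    def on_step_begin(self):
        now = self.clock()
        if self._last_end is not None:
            self.wait_times.append(now - self._last_end)
        self._begin = now

    def on_step_end(self):
        now = self.clock()
        self.step_times.append(now - self._begin)
        self._last_end = now

    def wait_fraction(self):
        """Share of wall time spent between steps; a high value hints at input stalls."""
        busy, wait = sum(self.step_times), sum(self.wait_times)
        return wait / (busy + wait) if busy + wait else 0.0


# Deterministic fake clock so the example is reproducible: each call returns the next tick.
ticks = iter([0.0, 0.1, 0.3, 0.4, 0.6, 0.7])
timer = StepTimer(clock=lambda: next(ticks))
for _ in range(3):
    timer.on_step_begin()
    # ... forward, backward, optimizer.step() would run here ...
    timer.on_step_end()
print(f"time between steps: {timer.wait_fraction():.0%} of wall time")
```

The gap also includes logging and checkpointing, so it overstates pure data-loading time; it is a hint, not a diagnosis.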
All commands
| Command | What it does |
|---|---|
| alloc run <cmd> | Profile a training run (auto-stops when stable) |
| alloc diagnose <script> | AST analysis with specific fix suggestions |
| alloc ghost <script> | Estimate VRAM before launching |
| alloc scan --model <name> | Rank GPU configs remotely (no GPU needed) |
| alloc catalog list | Browse 13 GPUs with specs and pricing |
| alloc init | Configure GPU fleet and budget (.alloc.yaml) |
| alloc login | Authenticate for dashboard + auto-upload |
Every command supports --json for CI/CD integration.
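A common use of machine-readable output is a CI gate that fails the build on serious findings. The sketch below assumes a JSON layout (a `findings` list whose items carry a `severity` field). That layout is a placeholder, not documented alloc output, so inspect the real `--json` report before adapting it. `count_by_severity` and `gate` are hypothetical helpers.

```python
import json

# Hypothetical CI gate. The report layout ("findings" list, "severity" field) is an
# assumption for illustration; inspect the real `--json` output before relying on it.


def count_by_severity(report_json):
    """Count findings per (lower-cased) severity in a JSON report string."""
    report = json.loads(report_json)
    counts = {}
    for finding in report.get("findings", []):
        severity = str(finding.get("severity", "unknown")).lower()
        counts[severity] = counts.get(severity, 0) + 1
    return counts


def gate(report_json, fail_on=("critical",)):
    """Return a process exit code: 1 if any finding has a blocking severity, else 0."""
    counts = count_by_severity(report_json)
    return 1 if any(counts.get(s, 0) for s in fail_on) else 0


# Made-up report mirroring the diagnose example: 1 critical, 1 warning, 1 info.
sample = json.dumps({"findings": [
    {"rule": "DL005", "severity": "CRITICAL"},
    {"rule": "PREC002", "severity": "WARNING"},
    {"rule": "THRU001", "severity": "INFO"},
]})
print(count_by_severity(sample), "exit code:", gate(sample))
```

In a pipeline, the report string would come from something like `alloc diagnose train.py --json` (flag placement assumed), and the job would end with `sys.exit(gate(report))`.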
Dashboard
Log in to get team visibility, budget tracking, and optimization proposals:
alloc login --browser
alloc run python train.py # auto-uploads when logged in
The dashboard is at alloclabs.com.
Design principles
- Zero config: alloc run python train.py works out of the box
- Never crash training: all Alloc failures are caught silently
- No monkey-patching: external monitoring only; deeper signals are opt-in
- Local-first: works in air-gapped environments, no internet required
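The "never crash training" and "external monitoring only" principles fit together when the profiler runs as a separate process from the job it watches. A minimal sketch under that assumption follows. It is not alloc's implementation, and `run_wrapped` and `probe` are hypothetical names. It launches the command as a child process, samples metrics on a background thread, swallows any probe error, and returns the child's exit code untouched.

```python
import subprocess
import sys
import threading


def run_wrapped(cmd, probe, interval=1.0):
    """Run `cmd` as a child process and sample `probe()` every `interval` seconds.

    Probe failures are swallowed so they can never affect the training process,
    and the child's exit code is returned unchanged.
    """
    samples = []
    stop = threading.Event()

    def poll():
        while not stop.is_set():
            try:
                samples.append(probe())
            except Exception:
                pass  # monitoring failed: drop this sample, leave training untouched
            stop.wait(interval)

    proc = subprocess.Popen(cmd)
    sampler = threading.Thread(target=poll, daemon=True)
    sampler.start()
    try:
        returncode = proc.wait()
    finally:
        stop.set()
        sampler.join(timeout=5)  # don't let a hung probe block shutdown
    return returncode, samples


def flaky_probe():
    # Hypothetical stand-in for a GPU query that fails, e.g. nvidia-smi is missing.
    raise FileNotFoundError("nvidia-smi not found")


rc, samples = run_wrapped([sys.executable, "-c", "print('training step')"], flaky_probe, interval=0.1)
print("exit code:", rc, "samples collected:", len(samples))
```

Because the training command is never imported or patched, the worst a broken probe can do here is lose samples.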
Download files
File details
Details for the file alloc-0.0.7.tar.gz.
File metadata
- Download URL: alloc-0.0.7.tar.gz
- Size: 164.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d40b4ce9f2682cfe6015bf654461a630fa4480eef91e7a758e1062ea72a28549 |
| MD5 | dc39d080dca368b82c193e156fe3d05a |
| BLAKE2b-256 | 64ed088574e4738ca011720012b1cff9e7c69d2a2a83961d7d2f7b8d0e2e8795 |
Provenance
The following attestation bundles were made for alloc-0.0.7.tar.gz:
Publisher: publish-pypi.yml on alloc-labs/platform
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloc-0.0.7.tar.gz
- Subject digest: d40b4ce9f2682cfe6015bf654461a630fa4480eef91e7a758e1062ea72a28549
- Sigstore transparency entry: 1100958320
- Permalink: alloc-labs/platform@fdd188ad7266838bafb36515a0ee035deef8cdbf
- Branch / Tag: refs/tags/alloc-v0.0.7
- Owner: https://github.com/alloc-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@fdd188ad7266838bafb36515a0ee035deef8cdbf
- Trigger Event: push
File details
Details for the file alloc-0.0.7-py3-none-any.whl.
File metadata
- Download URL: alloc-0.0.7-py3-none-any.whl
- Size: 119.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 583afd03a10c0510eeb2bb9783b017b3248b7e16a52dbbc22648e12b3c4c000f |
| MD5 | f59a336aee722ca83550561515df79db |
| BLAKE2b-256 | f78bb2c2f1544354d7b97976fbfaf2e852f758df51a112a7266551321ba06577 |
Provenance
The following attestation bundles were made for alloc-0.0.7-py3-none-any.whl:
Publisher: publish-pypi.yml on alloc-labs/platform
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloc-0.0.7-py3-none-any.whl
- Subject digest: 583afd03a10c0510eeb2bb9783b017b3248b7e16a52dbbc22648e12b3c4c000f
- Sigstore transparency entry: 1100958322
- Permalink: alloc-labs/platform@fdd188ad7266838bafb36515a0ee035deef8cdbf
- Branch / Tag: refs/tags/alloc-v0.0.7
- Owner: https://github.com/alloc-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@fdd188ad7266838bafb36515a0ee035deef8cdbf
- Trigger Event: push