Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
Project description
alloc (by Alloc Labs)
Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
Built by Alloc Labs: reduce ML training costs with better pre-flight decisions and faster feedback loops.
What Alloc Does
Most ML teams waste spend because resource decisions are guesswork and feedback arrives too late. Alloc gives you a progressive workflow:
- Pre-flight: estimate VRAM fit and rank feasible configs by objective (
alloc scan,alloc ghost) - Calibration run: measure peak VRAM + utilization (and optionally step timing) from a short run (
alloc run) - Run history: upload artifacts for team visibility and budget-aware proposals (
alloc upload)
Alloc is launcher-first. It works with python, torchrun, accelerate, and cluster entrypoints (Slurm, Ray, Kubernetes) because it does not require framework-specific wrappers for baseline value.
Who This Is For
- Solo engineers who want a fast sanity check before burning GPU time
- ML teams who need repeatable right-sizing and bottleneck visibility
- Platform/infra leads who want budget-aware controls without rewriting training code
Why It Is Low Friction
- No code changes required for baseline value (
alloc run) - Optional deeper integration via callbacks when you want richer timing signals
- Local-first artifacts so users still get value without cloud connectivity
- Progressive adoption from local CLI to team workflows and governance
Install
pip install alloc
# With GPU monitoring support (NVML via pynvml)
pip install alloc[gpu]
Notes:
allocdoes not depend on torch. If you wantalloc ghost train.pyto infer param counts from a script, torch must be installed in that environment, otherwise use--param-count-b.alloc runwill still execute your command withoutalloc[gpu], but it cannot collect GPU metrics.
Commands
alloc scan: Remote Ghost Scan (no GPU needed)
alloc scan --model llama-3-70b --gpu A100-80GB
alloc scan --model mistral-7b --gpu A10G --strategy fsdp --num-gpus 4
alloc scan --param-count-b 13.0 --gpu H100-80GB --dtype bf16
# Objective + budget constraints
alloc scan --model llama-3-70b --gpu H100-80GB --objective fastest_within_budget --max-budget-hourly 12
# Topology hints (optional, improves planner quality)
alloc scan --param-count-b 70 --gpu H100-80GB --num-gpus 64 --num-nodes 8 --gpus-per-node 8 --interconnect infiniband
alloc ghost: Local VRAM estimation
alloc ghost train.py --dtype bf16 --batch-size 32
alloc ghost train.py --param-count-b 7.0 # manual override
Analyzes your training script to discover model parameters and computes a VRAM breakdown. Uses a three-method fallback: (1) --param-count-b manual override, (2) subprocess execution to find nn.Module classes and count parameters, (3) AST parsing for from_pretrained() calls.
alloc run: Training with GPU monitoring
alloc run python train.py # calibrate and exit (default)
alloc run --full python train.py # monitor full training run
alloc run torchrun --nproc_per_node=4 train.py
alloc run -- python train.py --epochs 10
Wraps your command, monitors GPU memory/utilization/power via pynvml, and writes an artifact.
Default: calibrate-and-exit. Auto-stops when GPU metrics stabilize, prints a verdict with bottleneck classification and a top recommendation, then exits. Use --timeout N to adjust max calibration time (default 120s). Use --full to monitor the entire run.
Multi-GPU: Automatically discovers all GPUs used by the process tree (works with torchrun, accelerate launch, etc.).
Hardware context: Captures driver version, CUDA version, and SM compute capability from NVML.
alloc login: Authenticate with dashboard
alloc login
# Prompts for email + password, stores token + refresh_token in ~/.alloc/config.json
alloc login --token <ACCESS_TOKEN>
# Paste an access token from the dashboard (no password prompt)
alloc whoami: Show current auth + org context
alloc whoami
alloc whoami --json
Prints the current identity (when logged in), plus objective, effective budget cap, and fleet counts.
alloc logout: Clear local session
alloc logout
Clears saved token/refresh_token from ~/.alloc/config.json.
alloc upload: Upload artifact to dashboard
alloc upload alloc_artifact.json.gz
Uploads a previously saved .json.gz artifact to the dashboard via POST /runs/ingest. Requires authentication (alloc login first).
If your session token has expired and a refresh_token is available (password login flow), alloc upload refreshes once and retries automatically.
alloc catalog: Browse GPU hardware catalog
alloc catalog list # list all 13 GPUs (sorted by VRAM)
alloc catalog list --sort cost # sort by $/hr
alloc catalog list --sort tflops # sort by BF16 TFLOPS
alloc catalog show H100 # detailed specs for H100
alloc catalog show nvidia-a100-sxm-80gb # lookup by stable ID
Offline reference for GPU specs, interconnect details, and cloud pricing. Supports aliases (H100, A100, T4) and stable IDs.
alloc init: Configure GPU fleet and budget
alloc init # interactive wizard
alloc init --yes # non-interactive defaults (full catalog, 50/50 priority)
alloc init --from-org --yes # pull fleet/budget/objective from your org (requires alloc login)
Creates a .alloc.yaml file in the current directory with your GPU fleet, explore list, budget, and priority weights. When present, ghost, run, and scan automatically use fleet context for recommendations. Use --no-config on any command to skip it.
alloc version
alloc version
Python API
import alloc
# Static VRAM analysis (never crashes your training)
report = alloc.ghost(model)
print(report.total_gb) # e.g., 115.42
# Or from param count (no torch needed)
report = alloc.ghost(param_count_b=7.0, dtype="bf16")
Framework Callbacks
Optional callbacks for deeper profiling. Captures step-level timing, throughput, and dataloader wait estimates.
# HuggingFace Transformers
from alloc import HuggingFaceCallback
trainer = Trainer(..., callbacks=[HuggingFaceCallback()])
# PyTorch Lightning
from alloc import LightningCallback
trainer = Trainer(..., callbacks=[LightningCallback()])
Callbacks write a .alloc_callback.json sidecar with step time (p50/p90), samples/sec, and estimated dataloader wait %. This unlocks higher confidence analysis and dataloader bottleneck detection.
Configuration
Alloc works with zero config. You can optionally configure it with environment variables and/or a .alloc.yaml in your repo.
| Variable | Default | Description |
|---|---|---|
ALLOC_API_URL |
https://alloc-production-ffc2.up.railway.app |
API endpoint for remote scans |
ALLOC_TOKEN |
(empty) | Auth token for API calls |
ALLOC_UPLOAD |
false |
Upload results to dashboard (alloc run --upload also works) |
ALLOC_OUT |
alloc_artifact.json.gz |
Artifact output path |
ALLOC_GPU_COUNT_CANDIDATES |
(empty) | Override GPU-count candidates for ranking (comma-separated ints) |
Architecture
| Module | Purpose |
|---|---|
ghost.py |
VRAM estimation from parameter count. Computes weights + gradients + optimizer + activations + buffer breakdown. |
model_extractor.py |
Three-method model discovery: subprocess execution (nn.Module finder), AST parsing (from_pretrained), manual override. |
probe.py |
External GPU monitoring via pynvml. Process-tree aware multi-GPU discovery. Captures hardware context (driver, CUDA, SM version). |
stability.py |
Multi-signal stability detection for calibrate-and-exit (VRAM plateau + util std dev + power std dev). |
catalog/ |
Bundled GPU hardware catalog (13 GPUs) with specs and pricing. Powers alloc catalog commands. |
context.py |
Context autodiscovery: git (SHA, branch, repo), container (Docker/Podman), Ray (job ID, cluster). |
artifact_writer.py |
Artifact Writer: writes alloc_artifact.json.gz with probe, ghost, hardware, and context sections. |
cli.py |
Typer CLI with ghost, run, scan, login, upload, init, catalog, version commands. |
yaml_config.py |
.alloc.yaml parser: fleet, explore, priority, budget. Loaded automatically by ghost, run, scan. |
callbacks.py |
Framework callbacks: HuggingFace TrainerCallback and Lightning Callback with step timing (p50/p90), throughput, and dataloader wait estimation. |
upload.py |
Artifact uploader: POSTs .json.gz to POST /runs/ingest. |
display.py |
Rich terminal formatting for reports. |
config.py |
Env-var-only configuration (API URL, Supabase URL, token storage). |
Design Principles
- Zero config:
alloc run python train.pyworks out of the box - No monkey-patching: External monitoring only; deeper signals are opt-in
- Never crash user's training: All Alloc failures are caught and training continues
- Progressive disclosure: Individual use first, team governance later
Telemetry Levels
Alloc intentionally starts non-invasive and adds richer signals only when you opt in.
- NVML (today): peak VRAM, GPU utilization, power draw, basic hardware context (driver/CUDA/SM), multi-GPU discovery from the process tree.
- Framework timing (today, opt-in): step time p50/p90, samples/sec, estimated dataloader wait percentage via HF/Lightning callbacks.
- Distributed timing (planned, opt-in): per-rank timing skew, communication overhead, stronger interconnect-aware recommendations.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file alloc-0.0.1.tar.gz.
File metadata
- Download URL: alloc-0.0.1.tar.gz
- Upload date:
- Size: 68.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2a43139e98a8e6293c2ff5028fb7b435e5ec2a66132f9393ab5648bb94fea7ef
|
|
| MD5 |
c0c77a12d2b06532cc53ba273cf41d14
|
|
| BLAKE2b-256 |
20f218c8fe4e43c372cbe2ae2eb16cd88070c9bff99661786291eb2a102d45cc
|
Provenance
The following attestation bundles were made for alloc-0.0.1.tar.gz:
Publisher:
publish-pypi.yml on alloc-labs/platform
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
alloc-0.0.1.tar.gz -
Subject digest:
2a43139e98a8e6293c2ff5028fb7b435e5ec2a66132f9393ab5648bb94fea7ef - Sigstore transparency entry: 975682919
- Sigstore integration time:
-
Permalink:
alloc-labs/platform@34ba261a1bffd5c8daf5a9817da56704217d60b1 -
Branch / Tag:
refs/tags/alloc-v0.0.1 - Owner: https://github.com/alloc-labs
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@34ba261a1bffd5c8daf5a9817da56704217d60b1 -
Trigger Event:
push
-
Statement type:
File details
Details for the file alloc-0.0.1-py3-none-any.whl.
File metadata
- Download URL: alloc-0.0.1-py3-none-any.whl
- Upload date:
- Size: 53.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
057e11a349dec801c5889de45f094c74174747d546f9b02a02c9857b71143b27
|
|
| MD5 |
954dd708bfdc2b0dd79010b9ad53ee75
|
|
| BLAKE2b-256 |
d41c059d014ede3210ac8fe873b51bbd24c63e180d4ce01786fb9f814663507a
|
Provenance
The following attestation bundles were made for alloc-0.0.1-py3-none-any.whl:
Publisher:
publish-pypi.yml on alloc-labs/platform
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
alloc-0.0.1-py3-none-any.whl -
Subject digest:
057e11a349dec801c5889de45f094c74174747d546f9b02a02c9857b71143b27 - Sigstore transparency entry: 975682927
- Sigstore integration time:
-
Permalink:
alloc-labs/platform@34ba261a1bffd5c8daf5a9817da56704217d60b1 -
Branch / Tag:
refs/tags/alloc-v0.0.1 - Owner: https://github.com/alloc-labs
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@34ba261a1bffd5c8daf5a9817da56704217d60b1 -
Trigger Event:
push
-
Statement type: