Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.

alloc (by Alloc Labs)

Built by Alloc Labs: reduce ML training costs with better pre-flight decisions and faster feedback loops.

What Alloc Does

Most ML teams waste spend because resource decisions are guesswork and feedback arrives too late. Alloc gives you a progressive workflow:

1. Pre-flight: estimate VRAM fit and rank feasible configs by objective (`alloc scan`, `alloc ghost`)
2. Calibration run: measure peak VRAM and utilization (and optionally step timing) from a short run (`alloc run`)
3. Run history: upload artifacts for team visibility and budget-aware proposals (`alloc upload`)

Alloc is launcher-first. It works with python, torchrun, accelerate, and cluster entrypoints (Slurm, Ray, Kubernetes) because it does not require framework-specific wrappers for baseline value.

Who This Is For

- Solo engineers who want a fast sanity check before burning GPU time
- ML teams who need repeatable right-sizing and bottleneck visibility
- Platform/infra leads who want budget-aware controls without rewriting training code

Why It Is Low Friction

- No code changes required for baseline value (`alloc run`)
- Optional deeper integration via callbacks when you want richer timing signals
- Local-first artifacts, so users still get value without cloud connectivity
- Progressive adoption from local CLI to team workflows and governance

Install

pip install alloc

# With GPU monitoring support (NVML via pynvml)
pip install "alloc[gpu]"   # quoted so shells such as zsh do not glob the brackets

Notes:

- `alloc` does not depend on `torch`. For `alloc ghost train.py` to infer parameter counts from a script, `torch` must be installed in that environment; otherwise, pass `--param-count-b`.
- Without `alloc[gpu]`, `alloc run` still executes your command but cannot collect GPU metrics.

Commands

`alloc scan`: Remote Ghost Scan (no GPU needed)

alloc scan --model llama-3-70b --gpu A100-80GB
alloc scan --model mistral-7b --gpu A10G --strategy fsdp --num-gpus 4
alloc scan --param-count-b 13.0 --gpu H100-80GB --dtype bf16

# Objective + budget constraints
alloc scan --model llama-3-70b --gpu H100-80GB --objective fastest_within_budget --max-budget-hourly 12

# Topology hints (optional, improves planner quality)
alloc scan --param-count-b 70 --gpu H100-80GB --num-gpus 64 --num-nodes 8 --gpus-per-node 8 --interconnect infiniband
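
To make the `fastest_within_budget` objective concrete, here is a minimal sketch of budget-constrained ranking. It is not Alloc's planner: the `Candidate` fields, the tie-breaking rule, and every number are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One feasible GPU configuration (all fields are illustrative)."""
    name: str
    hourly_cost: float      # USD per hour for the whole config
    est_step_time_s: float  # estimated seconds per training step


def fastest_within_budget(
    candidates: List[Candidate], max_budget_hourly: float
) -> Optional[Candidate]:
    """Return the fastest candidate whose hourly cost fits the cap, or None."""
    affordable = [c for c in candidates if c.hourly_cost <= max_budget_hourly]
    if not affordable:
        return None
    # Fastest first; break ties by picking the cheaper config.
    return min(affordable, key=lambda c: (c.est_step_time_s, c.hourly_cost))


# Placeholder numbers, not real pricing or benchmark results.
options = [
    Candidate("config-a", hourly_cost=8.0, est_step_time_s=1.9),
    Candidate("config-b", hourly_cost=12.0, est_step_time_s=1.2),
    Candidate("config-c", hourly_cost=20.0, est_step_time_s=0.7),
]
best = fastest_within_budget(options, max_budget_hourly=12)
print(best.name if best else "no config fits the budget")
```

With these placeholder values and a cap of 12 per hour, the faster `config-c` is filtered out and `config-b` is selected.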

`alloc ghost`: Local VRAM estimation

alloc ghost train.py --dtype bf16 --batch-size 32
alloc ghost train.py --param-count-b 7.0   # manual override

Analyzes your training script to discover model parameters and computes a VRAM breakdown. Uses a three-method fallback: (1) --param-count-b manual override, (2) subprocess execution to find nn.Module classes and count parameters, (3) AST parsing for from_pretrained() calls.
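
The third fallback, AST parsing, can be illustrated with the standard-library `ast` module. This is a sketch of the general technique and not the code in `model_extractor.py`; the function name `find_from_pretrained_ids` is hypothetical, and it only handles a string literal passed as the first positional argument.

```python
import ast
from typing import List


def find_from_pretrained_ids(source: str) -> List[str]:
    """Return string literals passed as the first argument to *.from_pretrained()."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match attribute calls such as AutoModel.from_pretrained(...).
        if not (isinstance(func, ast.Attribute) and func.attr == "from_pretrained"):
            continue
        if not node.args:
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append(first.value)
    return found


example = '''
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
'''
print(find_from_pretrained_ids(example))
```

Because the script is parsed rather than executed, this approach needs neither `torch` nor `transformers` installed, which is presumably why it sits last in the fallback chain.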

`alloc run`: Training with GPU monitoring

alloc run python train.py                # calibrate and exit (default)
alloc run --full python train.py         # monitor full training run
alloc run torchrun --nproc_per_node=4 train.py
alloc run -- python train.py --epochs 10

Wraps your command, monitors GPU memory/utilization/power via pynvml, and writes an artifact.

Default: calibrate-and-exit. `alloc run` stops automatically once GPU metrics stabilize, prints a verdict with a bottleneck classification and a top recommendation, then exits. Use --timeout N to set the maximum calibration time in seconds (default: 120). Use --full to monitor the entire run.
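
One way to decide that metrics have stabilized is to check a sliding window for a VRAM plateau and low utilization variance. The sketch below assumes that approach; the window size and both thresholds are made-up defaults, not the values Alloc uses, and the power-draw signal is left out for brevity.

```python
from statistics import pstdev
from typing import Sequence


def is_stable(
    vram_mb: Sequence[float],
    util_pct: Sequence[float],
    window: int = 10,
    vram_tol_mb: float = 64.0,
    util_std_max: float = 5.0,
) -> bool:
    """True when the last `window` samples show a VRAM plateau and steady utilization."""
    if len(vram_mb) < window or len(util_pct) < window:
        return False  # not enough samples yet
    recent_vram = vram_mb[-window:]
    recent_util = util_pct[-window:]
    vram_plateau = max(recent_vram) - min(recent_vram) <= vram_tol_mb
    util_steady = pstdev(recent_util) <= util_std_max
    return vram_plateau and util_steady


warmup = [1000.0 + 500.0 * i for i in range(10)]  # memory still growing
steady = [6000.0] * 10                            # memory flat
print(is_stable(warmup, [50.0] * 10))
print(is_stable(warmup + steady, [50.0] * 10 + [92.0, 94.0] * 5))
```

The first call reports `False` because memory is still climbing; the second reports `True` once the most recent window is flat.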

Multi-GPU: Automatically discovers all GPUs used by the process tree (works with torchrun, accelerate launch, etc.).

Hardware context: Captures driver version, CUDA version, and SM compute capability from NVML.
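
For readers curious what external NVML monitoring looks like, the following sketch takes a single memory/utilization/power sample per visible GPU using `pynvml`. It is an illustration only: the `sample_gpus` name and the returned field names are assumptions, and it does not attempt the process-tree discovery described above. When `pynvml` or an NVIDIA driver is missing it returns an empty list instead of raising.

```python
from typing import Dict, List

try:
    import pynvml  # provided by the alloc[gpu] extra, per the Install notes
except ImportError:
    pynvml = None


def sample_gpus() -> List[Dict[str, float]]:
    """Take one sample per visible GPU; return [] if NVML is unusable."""
    if pynvml is None:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no GPU: degrade quietly
    try:
        samples = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            samples.append({
                "index": index,
                "vram_used_mb": mem.used / (1024 ** 2),
                "gpu_util_pct": float(util.gpu),
                # NVML reports power in milliwatts.
                "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
            })
        return samples
    except pynvml.NVMLError:
        return []  # e.g. a metric is unsupported on this device
    finally:
        pynvml.nvmlShutdown()


print(sample_gpus())
```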

`alloc login`: Authenticate with dashboard

alloc login
# Prompts for email + password, stores token + refresh_token in ~/.alloc/config.json

alloc login --token <ACCESS_TOKEN>
# Paste an access token from the dashboard (no password prompt)

`alloc whoami`: Show current auth + org context

alloc whoami
alloc whoami --json

Prints the current identity (when logged in), plus objective, effective budget cap, and fleet counts.

`alloc logout`: Clear local session

alloc logout

Clears saved token/refresh_token from ~/.alloc/config.json.

`alloc upload`: Upload artifact to dashboard

alloc upload alloc_artifact.json.gz

Uploads a previously saved .json.gz artifact to the dashboard via POST /runs/ingest. Requires authentication (alloc login first).

If your session token has expired and a refresh_token is available (password login flow), alloc upload refreshes once and retries automatically.
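
The refresh-once-and-retry behavior can be sketched independently of any HTTP client. Everything here is hypothetical scaffolding (the `Unauthorized` exception, the `upload` and `refresh` callables, and the fake upload function); it only illustrates the control flow, not Alloc's uploader.

```python
from typing import Callable, Optional


class Unauthorized(Exception):
    """Raised by the (hypothetical) upload callable when the token is rejected."""


def upload_with_refresh(
    upload: Callable[[str], str],
    refresh: Optional[Callable[[], str]],
    token: str,
) -> str:
    """Try the upload; on an auth failure, refresh the token once and retry once."""
    try:
        return upload(token)
    except Unauthorized:
        if refresh is None:
            raise  # token-only login: no refresh_token to fall back on
        return upload(refresh())  # a second failure propagates to the caller


# Stand-in for a real HTTP call, so the sketch runs offline.
def fake_upload(token: str) -> str:
    if token != "fresh":
        raise Unauthorized("expired token")
    return "uploaded"


print(upload_with_refresh(fake_upload, lambda: "fresh", "stale"))
```

Retrying exactly once keeps the failure mode simple: if the refreshed token is also rejected, the caller sees the error immediately instead of looping.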

`alloc catalog`: Browse GPU hardware catalog

alloc catalog list                           # list all 13 GPUs (sorted by VRAM)
alloc catalog list --sort cost               # sort by $/hr
alloc catalog list --sort tflops             # sort by BF16 TFLOPS
alloc catalog show H100                      # detailed specs for H100
alloc catalog show nvidia-a100-sxm-80gb      # lookup by stable ID

Offline reference for GPU specs, interconnect details, and cloud pricing. Supports aliases (H100, A100, T4) and stable IDs.

`alloc init`: Configure GPU fleet and budget

alloc init                     # interactive wizard
alloc init --yes               # non-interactive defaults (full catalog, 50/50 priority)
alloc init --from-org --yes    # pull fleet/budget/objective from your org (requires alloc login)

Creates a .alloc.yaml file in the current directory with your GPU fleet, explore list, budget, and priority weights. When present, ghost, run, and scan automatically use fleet context for recommendations. Use --no-config on any command to skip it.

`alloc version`

alloc version

Python API

import alloc

# Static VRAM analysis (never crashes your training).
# `model` is assumed here to be your instantiated torch nn.Module.
report = alloc.ghost(model)
print(report.total_gb)  # e.g., 115.42

# Or from param count (no torch needed)
report = alloc.ghost(param_count_b=7.0, dtype="bf16")
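
As a rough cross-check on a parameter-count estimate, the static part of training memory can be worked out by hand. The sketch below is a back-of-envelope calculation, not the formula `alloc.ghost` uses: it assumes Adam with mixed precision (two fp32 moments plus an fp32 master copy of the weights), counts 1 GB as 1e9 bytes, and ignores activations and buffers because those depend on batch size and sequence length.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def estimate_static_vram_gb(param_count_b: float, dtype: str = "bf16") -> dict:
    """Rough weights + gradients + Adam-state estimate in GB (1 GB = 1e9 bytes)."""
    params = param_count_b * 1e9
    width = BYTES_PER_PARAM[dtype]
    weights = params * width
    gradients = params * width
    # Adam keeps two fp32 moments (8 bytes per parameter); mixed precision
    # adds an fp32 master copy of the weights (4 more bytes per parameter).
    optimizer = params * (8 + (4 if dtype != "fp32" else 0))
    parts = {
        "weights_gb": weights / 1e9,
        "gradients_gb": gradients / 1e9,
        "optimizer_gb": optimizer / 1e9,
    }
    parts["total_gb"] = sum(parts.values())
    return parts


print(estimate_static_vram_gb(7.0, "bf16"))
```

For 7B parameters in bf16, these assumptions give 14 + 14 + 84 = 112 GB before activations and buffers are added.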

Framework Callbacks

Optional callbacks provide deeper profiling. They capture step-level timing, throughput, and dataloader wait estimates.

# HuggingFace Transformers
from alloc import HuggingFaceCallback
trainer = Trainer(..., callbacks=[HuggingFaceCallback()])

# PyTorch Lightning
from alloc import LightningCallback
trainer = Trainer(..., callbacks=[LightningCallback()])

Callbacks write a .alloc_callback.json sidecar with step time (p50/p90), samples/sec, and estimated dataloader wait %. This sidecar enables higher-confidence analysis and dataloader bottleneck detection.
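
To show how step-time percentiles and throughput can be derived from raw per-step durations, here is a small sketch. The output keys are hypothetical and do not describe the sidecar's actual schema, and the nearest-rank percentile is one of several common definitions.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_steps(step_times_s: Sequence[float], batch_size: int) -> Dict[str, float]:
    """Summarize per-step wall-clock durations into p50/p90 and throughput."""
    total = sum(step_times_s)
    return {
        "step_time_p50_s": percentile(step_times_s, 50),
        "step_time_p90_s": percentile(step_times_s, 90),
        "samples_per_sec": batch_size * len(step_times_s) / total,
    }


# Illustrative durations with one slow step (for example, a dataloader stall).
durations = [0.20, 0.21, 0.19, 0.22, 0.20, 0.45, 0.21, 0.20, 0.19, 0.23]
print(summarize_steps(durations, batch_size=32))
```

Reporting p90 alongside p50 is what makes occasional stalls visible: a single slow step barely moves the median but widens the gap between the two.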

Configuration

Alloc works with zero config. You can optionally configure it with environment variables and/or a .alloc.yaml in your repo.

Variable	Default	Description
`ALLOC_API_URL`	`https://alloc-production-ffc2.up.railway.app`	API endpoint for remote scans
`ALLOC_TOKEN`	(empty)	Auth token for API calls
`ALLOC_UPLOAD`	`false`	Upload results to dashboard (`alloc run --upload` also works)
`ALLOC_OUT`	`alloc_artifact.json.gz`	Artifact output path
`ALLOC_GPU_COUNT_CANDIDATES`	(empty)	Override GPU-count candidates for ranking (comma-separated ints)
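
The sketch below shows one way such variables could be read and validated. It is not the contents of `config.py`: the function names are hypothetical, and the set of strings treated as true for `ALLOC_UPLOAD` is an assumption, since the table only documents the `false` default.

```python
import os
from typing import List, Mapping, Optional


def parse_gpu_count_candidates(raw: Optional[str]) -> List[int]:
    """Parse a comma-separated list of positive ints; empty or missing means no override."""
    if not raw or not raw.strip():
        return []
    counts = [int(part) for part in raw.split(",") if part.strip()]
    if any(c <= 0 for c in counts):
        raise ValueError("GPU counts must be positive integers")
    return counts


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    return {
        "token": env.get("ALLOC_TOKEN", ""),
        # Assumption: these spellings count as true; anything else is false.
        "upload": env.get("ALLOC_UPLOAD", "false").strip().lower() in {"1", "true", "yes"},
        "out": env.get("ALLOC_OUT", "alloc_artifact.json.gz"),
        "gpu_count_candidates": parse_gpu_count_candidates(
            env.get("ALLOC_GPU_COUNT_CANDIDATES")
        ),
    }


print(load_settings({"ALLOC_UPLOAD": "true", "ALLOC_GPU_COUNT_CANDIDATES": "1, 2,4,8"}))
```

Passing a plain dict instead of `os.environ` keeps the parsing logic easy to test without touching the real environment.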

Architecture

Module	Purpose
`ghost.py`	VRAM estimation from parameter count. Computes weights + gradients + optimizer + activations + buffer breakdown.
`model_extractor.py`	Three-method model discovery: subprocess execution (`nn.Module` finder), AST parsing (`from_pretrained`), manual override.
`probe.py`	External GPU monitoring via `pynvml`. Process-tree aware multi-GPU discovery. Captures hardware context (driver, CUDA, SM version).
`stability.py`	Multi-signal stability detection for calibrate-and-exit (VRAM plateau + util std dev + power std dev).
`catalog/`	Bundled GPU hardware catalog (13 GPUs) with specs and pricing. Powers `alloc catalog` commands.
`context.py`	Context autodiscovery: git (SHA, branch, repo), container (Docker/Podman), Ray (job ID, cluster).
`artifact_writer.py`	Writes `alloc_artifact.json.gz` with probe, ghost, hardware, and context sections.
`cli.py`	Typer CLI with `ghost`, `run`, `scan`, `login`, `whoami`, `logout`, `upload`, `init`, `catalog`, `version` commands.
`yaml_config.py`	`.alloc.yaml` parser: fleet, explore, priority, budget. Loaded automatically by `ghost`, `run`, `scan`.
`callbacks.py`	Framework callbacks: HuggingFace `TrainerCallback` and Lightning `Callback` with step timing (p50/p90), throughput, and dataloader wait estimation.
`upload.py`	Artifact uploader: sends the `.json.gz` artifact via `POST /runs/ingest`.
`display.py`	Rich terminal formatting for reports.
`config.py`	Env-var-only configuration (API URL, Supabase URL, token storage).
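
Since the artifact is gzip-compressed JSON, it can be written and inspected with the standard library alone. The sketch below round-trips a placeholder artifact; only the four top-level section names come from the table above, while the fields inside each section and the helper names are assumptions, because the real schema is not documented here.

```python
import gzip
import json
import os
import tempfile


def write_artifact(path: str, sections: dict) -> None:
    """Serialize sections as JSON and gzip them to `path`."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(sections, handle, indent=2, sort_keys=True)


def read_artifact(path: str) -> dict:
    """Load a gzip-compressed JSON artifact back into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Section names follow the architecture table; inner fields are placeholders.
artifact = {
    "probe": {"peak_vram_mb": None},
    "ghost": {"total_gb": None},
    "hardware": {"driver_version": None},
    "context": {"git_sha": None},
}
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "alloc_artifact.json.gz")
    write_artifact(target, artifact)
    print(sorted(read_artifact(target)))
```

The same `read_artifact` pattern is a convenient way to look inside a saved artifact before running `alloc upload`.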

Design Principles

- Zero config: `alloc run python train.py` works out of the box
- No monkey-patching: external monitoring only; deeper signals are opt-in
- Never crash user's training: all Alloc failures are caught and training continues
- Progressive disclosure: individual use first, team governance later
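
The "never crash user's training" principle is commonly implemented by wrapping every monitoring call so that failures are logged and swallowed. The decorator below is a generic sketch of that pattern under that assumption; `never_raise` and `read_metrics` are hypothetical names, not part of Alloc.

```python
import functools
import logging
from typing import Any, Callable

log = logging.getLogger("monitoring_sketch")


def never_raise(default: Any = None) -> Callable:
    """Build a decorator that logs any exception and returns `default` instead."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Deliberately broad: a monitoring bug must not stop training.
                log.warning("monitoring step %r failed", fn.__name__, exc_info=True)
                return default
        return wrapper
    return decorate


@never_raise(default={})
def read_metrics() -> dict:
    raise RuntimeError("simulated NVML failure")


print(read_metrics())
```

Catching `Exception` rather than `BaseException` still lets `KeyboardInterrupt` and `SystemExit` through, so the user can stop the run normally.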

Telemetry Levels

Alloc intentionally starts non-invasive and adds richer signals only when you opt in.

- NVML (today): peak VRAM, GPU utilization, power draw, basic hardware context (driver/CUDA/SM), multi-GPU discovery from the process tree.
- Framework timing (today, opt-in): step time p50/p90, samples/sec, estimated dataloader wait percentage via HF/Lightning callbacks.
- Distributed timing (planned, opt-in): per-rank timing skew, communication overhead, stronger interconnect-aware recommendations.

Release history

Version	Released
0.0.16	Mar 20, 2026
0.0.15	Mar 19, 2026
0.0.14	Mar 17, 2026
0.0.13	Mar 17, 2026
0.0.12	Mar 17, 2026
0.0.11	Mar 17, 2026
0.0.10	Mar 16, 2026
0.0.9	Mar 14, 2026
0.0.8	Mar 14, 2026
0.0.7	Mar 14, 2026
0.0.6	Mar 14, 2026
0.0.5	Mar 9, 2026
0.0.4	Mar 9, 2026
0.0.3	Feb 22, 2026
0.0.1 (this version)	Feb 21, 2026

Download files

File	Type	Size	Uploaded
alloc-0.0.1.tar.gz	Source Distribution	68.4 kB	Feb 21, 2026
alloc-0.0.1-py3-none-any.whl	Built Distribution (Python 3)	53.2 kB	Feb 21, 2026

File details

Details for the file alloc-0.0.1.tar.gz.

File metadata

Download URL: alloc-0.0.1.tar.gz
Upload date: Feb 21, 2026
Size: 68.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for alloc-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`2a43139e98a8e6293c2ff5028fb7b435e5ec2a66132f9393ab5648bb94fea7ef`
MD5	`c0c77a12d2b06532cc53ba273cf41d14`
BLAKE2b-256	`20f218c8fe4e43c372cbe2ae2eb16cd88070c9bff99661786291eb2a102d45cc`

Provenance

The following attestation bundles were made for alloc-0.0.1.tar.gz:

Publisher: publish-pypi.yml on alloc-labs/platform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloc-0.0.1.tar.gz
- Subject digest: 2a43139e98a8e6293c2ff5028fb7b435e5ec2a66132f9393ab5648bb94fea7ef
- Sigstore transparency entry: 975682919
- Sigstore integration time: Feb 21, 2026
Source repository:
- Permalink: alloc-labs/platform@34ba261a1bffd5c8daf5a9817da56704217d60b1
- Branch / Tag: refs/tags/alloc-v0.0.1
- Owner: https://github.com/alloc-labs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@34ba261a1bffd5c8daf5a9817da56704217d60b1
- Trigger Event: push

File details

Details for the file alloc-0.0.1-py3-none-any.whl.

File metadata

Download URL: alloc-0.0.1-py3-none-any.whl
Upload date: Feb 21, 2026
Size: 53.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for alloc-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`057e11a349dec801c5889de45f094c74174747d546f9b02a02c9857b71143b27`
MD5	`954dd708bfdc2b0dd79010b9ad53ee75`
BLAKE2b-256	`d41c059d014ede3210ac8fe873b51bbd24c63e180d4ce01786fb9f814663507a`

Provenance

The following attestation bundles were made for alloc-0.0.1-py3-none-any.whl:

Publisher: publish-pypi.yml on alloc-labs/platform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: alloc-0.0.1-py3-none-any.whl
- Subject digest: 057e11a349dec801c5889de45f094c74174747d546f9b02a02c9857b71143b27
- Sigstore transparency entry: 975682927
- Sigstore integration time: Feb 21, 2026
Source repository:
- Permalink: alloc-labs/platform@34ba261a1bffd5c8daf5a9817da56704217d60b1
- Branch / Tag: refs/tags/alloc-v0.0.1
- Owner: https://github.com/alloc-labs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@34ba261a1bffd5c8daf5a9817da56704217d60b1
- Trigger Event: push
