GPU Auto-Optimizer: automatically finds the fastest stable batch/precision configuration for ML scripts and vLLM.
autovram — GPU Auto-Optimizer
Stop guessing batch size and precision. autovram automatically searches for the fastest stable configuration (batch size, microbatch, precision, accumulation, vLLM knobs) for your machine and exports it as a reusable config.
- Works offline (no telemetry, no paid APIs)
- macOS/Linux first, Windows best-effort
- NVIDIA CUDA supported; Apple MPS / AMD ROCm best-effort detection + helpful messaging
- CLI + library API
Status: MVP (alpha). Contributions welcome.
Quickstart
Install
Recommended (isolated):
pipx install autovram
Editable (for development):
git clone https://github.com/your-org/autovram
cd autovram
pipx install -e .
1) Inspect your system
autovram info
Expected output (example):
System summary
──────────────
OS: Linux (x86_64)
Python: 3.11.7
PyTorch: 2.4.0
Compute: CUDA
GPU: NVIDIA RTX 4090
VRAM: 24564 MiB
2) Tune an ML script (script mode)
autovram runs your script repeatedly, changing the environment variables that carry each trial's configuration.
autovram tune --cmd "python examples/torch_train_demo.py" --timeout 60
Example output (GIF-friendly):
System summary
──────────────
OS: Linux (x86_64)
Python: 3.11.7
PyTorch: 2.4.0
Compute: CUDA
GPU: NVIDIA RTX 4090
VRAM: 24564 MiB
Tuning (engine=heuristic, metric=it_per_s)
────────────────────────────────────────
Trial 1 cfg=batch_size=1 precision=fp16 → OK it_per_s=21.4
Trial 2 cfg=batch_size=2 precision=fp16 → OK it_per_s=39.7
Trial 3 cfg=batch_size=4 precision=fp16 → OOM
Binary search between 2 and 4...
Trial 4 cfg=batch_size=3 precision=fp16 → OK it_per_s=55.2
Best stable config
──────────────────
{
"batch_size": 3,
"micro_batch": 3,
"grad_accum": 1,
"precision": "fp16"
}
Exported: .autovram/config.json
Run directory: .autovram/runs/2026-02-12_15-56-02
3) Export the config
autovram export --format dotenv --out .env.autovram
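To make the dotenv export concrete, here is a minimal sketch of turning a config like the one above into `KEY=value` lines. The `AUTOVRAM_` prefix, the upper-cased key names, and the `to_dotenv` helper are all assumptions made for illustration; the real exporter may name and format variables differently.

```python
def to_dotenv(config: dict, prefix: str = "AUTOVRAM_") -> str:
    """Render a flat config dict as dotenv text, one KEY=value per line.

    Hypothetical helper: the prefix and key naming are assumed, not taken
    from autovram itself.
    """
    lines = [f"{prefix}{key.upper()}={value}" for key, value in config.items()]
    return "\n".join(lines) + "\n"


best = {"batch_size": 3, "micro_batch": 3, "grad_accum": 1, "precision": "fp16"}
print(to_dotenv(best), end="")
```

A file in this shape can be loaded by most dotenv tooling or sourced from a shell before launching a training run.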
Why this exists
VRAM tuning is still mostly folklore:
- “Try batch size 8… now 16… oops OOM.”
- “Maybe bf16 is faster… unless it isn’t.”
- “vLLM knobs are confusing and GPU-specific.”
autovram makes it mechanical: run a short benchmark loop, push to the edge of OOM safely, and keep the best throughput.
How it works (high-level)
For a primary knob (e.g. batch_size):
- Start conservative.
- Exponentially increase until instability/OOM.
- Binary search between last-good and first-bad.
- For each candidate: run a short benchmark window and parse a metric.
- Save all trials to a local run directory; export the best configuration.
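The search steps above can be sketched as follows. This is an illustration of exponential growth followed by binary search, not autovram's actual implementation: `find_largest_stable`, its parameters, and the `is_stable` callback are hypothetical names, and the sketch assumes stability is monotonic (if a value fails, every larger value fails too). It only locates the stable edge; a real tuner would also compare the throughput of each passing trial.

```python
from typing import Callable, List, Optional


def find_largest_stable(
    is_stable: Callable[[int], bool],
    start: int = 1,
    limit: int = 4096,
) -> Optional[int]:
    """Return the largest stable value found, or None if `start` already fails."""
    if not is_stable(start):
        return None

    last_good = start
    first_bad = None

    # Exponential phase: double until a trial fails or the limit is passed.
    candidate = start * 2
    while candidate <= limit:
        if is_stable(candidate):
            last_good = candidate
            candidate *= 2
        else:
            first_bad = candidate
            break

    # No failure seen below the limit: report the largest value that passed.
    if first_bad is None:
        return last_good

    # Binary phase: narrow the gap between last-good and first-bad.
    while first_bad - last_good > 1:
        mid = (last_good + first_bad) // 2
        if is_stable(mid):
            last_good = mid
        else:
            first_bad = mid
    return last_good


# Fake benchmark: pretend any batch size above 3 runs out of memory.
tried: List[int] = []


def fake_trial(batch_size: int) -> bool:
    tried.append(batch_size)
    return batch_size <= 3


print(find_largest_stable(fake_trial), tried)
```

With the fake trial, the order of attempts (1, 2, 4, then 3) mirrors the example tuning output earlier in this README.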
OOM/stability detection is based on:
- non-zero exit code
- timeout/hang
- stderr OOM patterns (CUDA OOM, cuBLAS alloc failures, etc.)
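A minimal sketch of how the three signals above might be combined for a single trial, using only the standard library. The `classify_run` helper, its return labels, and the specific stderr patterns are assumptions for illustration; autovram's real pattern list and result types may differ.

```python
import re
import subprocess
import sys
from typing import List

# Illustrative patterns only: common PyTorch/cuBLAS out-of-memory messages.
OOM_PATTERNS = re.compile(
    r"out of memory|CUBLAS_STATUS_ALLOC_FAILED",
    re.IGNORECASE,
)


def classify_run(cmd: List[str], timeout_s: float) -> str:
    """Run one trial and label it "ok", "oom", "failed", or "timeout"."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; treat a hang as unstable.
        return "timeout"
    # Conservative choice: an OOM message counts even if the exit code is 0.
    if OOM_PATTERNS.search(proc.stderr):
        return "oom"
    if proc.returncode != 0:
        return "failed"
    return "ok"


if __name__ == "__main__":
    demo = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('CUDA out of memory'); sys.exit(1)",
    ]
    print(classify_run(demo, timeout_s=30))
```

Checking the stderr pattern before the exit code lets the caller distinguish an OOM (worth backing off the batch size) from an unrelated crash (worth surfacing to the user).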
Integrating with your script
In script mode, autovram communicates through environment variables.
Minimal integration
from autovram.runtime import get_runtime_config, print_metric
cfg = get_runtime_config() # reads env vars with safe defaults
# Use cfg.batch_size, cfg.precision, etc.
# ... run a tiny benchmark window ...
print_metric(it_per_s=12.34)
Your script should print metric lines like:
AUTOVRAM_METRIC it_per_s=12.34
See: examples/torch_train_demo.py.
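For readers curious how such a line could be consumed on the tuner side, here is a small parsing sketch. The `AUTOVRAM_METRIC key=value` shape comes from the example above; the `parse_metrics` function, its support for several pairs on one line, and the "last value wins" rule are assumptions, not documented autovram behaviour.

```python
from typing import Dict

METRIC_PREFIX = "AUTOVRAM_METRIC"


def parse_metrics(stdout: str) -> Dict[str, float]:
    """Collect key=value floats from lines that start with the metric prefix.

    If a key appears more than once, the last occurrence wins.
    """
    metrics: Dict[str, float] = {}
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith(METRIC_PREFIX + " "):
            continue  # ordinary log output, not a metric line
        for pair in line[len(METRIC_PREFIX):].split():
            key, sep, value = pair.partition("=")
            if not sep or not key:
                continue  # token without "=" or with an empty key
            try:
                metrics[key] = float(value)
            except ValueError:
                continue  # non-numeric value, skip it
    return metrics


sample = "epoch 0 step 10\nAUTOVRAM_METRIC it_per_s=12.34\ndone\n"
print(parse_metrics(sample))
```

Keeping the metric on a dedicated, prefixed line means the script can log anything else to stdout without confusing the parser.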
Engines (plugins)
autovram keeps the core minimal and supports optional engines.
Built-in:
- heuristic (default): conservative heuristics + safe search loop
- vllm: vLLM-oriented knobs (requires vllm installed to actually run)
Optional:
llm-autobatch: integrates https://github.com/fabriziopfannl/llm-autobatch
Install the optional engine:
pip install "autovram[llm-autobatch]"
Roadmap
- Script mode autotuning (batch size + precision)
- Offline, no telemetry
- Config export (json / dotenv, yaml optional)
- Engine abstraction + optional llm-autobatch
- Multi-GPU support (data parallel)
- Better vLLM benchmarking (client-driven QPS)
- Torch compile / cudagraphs tuning
- More robust VRAM headroom estimation
Contributing
See CONTRIBUTING.md.
License
Apache-2.0. See LICENSE.