# GPU Memory Guard
A CLI utility that checks available GPU VRAM before you load AI models. Prevents the out-of-memory (OOM) crashes that can force a full system reboot.
## Why?
If you run local inference on consumer GPUs, you know the pain:
| Without gpu-memory-guard | With gpu-memory-guard |
|---|---|
| Load 70B model on 24GB card | Check VRAM before loading |
| System freezes, GPU hangs | Get a clear warning in terminal |
| Force reboot, lose unsaved work | Pick a smaller model or free memory |
| Repeat next week | Zero OOM crashes |
One command saves you from constant reboots.
## Quick Start

```bash
git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .

# Check current GPU status
gpu-guard

# Check if an 18GB model fits with a 2GB safety buffer
gpu-guard --model-size 18 --buffer 2
```
Example output:
```
GPU 0: NVIDIA GeForce RTX 5090
Total: 32.00 GB
Used: 4.12 GB
Available: 27.88 GB
Model size: 18.00 GB (buffer: 2.00 GB)
Status: OK - model fits with 7.88 GB to spare
```
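The status line is simple arithmetic: the model fits when available VRAM covers the model size plus the safety buffer; here, 27.88 - (18.00 + 2.00) = 7.88 GB to spare. As a sketch of the rule (not necessarily the tool's exact internals):

```python
# Decision rule behind the status line above
available_gb = 27.88
model_gb, buffer_gb = 18.0, 2.0

spare_gb = available_gb - (model_gb + buffer_gb)
print("OK" if spare_gb >= 0 else "FAIL", f"- {spare_gb:.2f} GB to spare")
# -> OK - 7.88 GB to spare
```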
## Installation

### From source (recommended)

```bash
git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .
```
### Requirements

- Python 3.8+
- NVIDIA GPU with `nvidia-smi` installed, or the `pynvml` Python package (`pip install pynvml`)
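For context, here is roughly what a VRAM query through `pynvml` looks like. This is a standalone sketch of the underlying NVML calls, not this tool's internal code:

```python
import pynvml  # pip install pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # .total / .used / .free, in bytes
        print(f"GPU {i}: {mem.free / 1024**3:.2f} GB free of {mem.total / 1024**3:.2f} GB")
finally:
    pynvml.nvmlShutdown()
```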
## Usage

### CLI

```bash
# Basic VRAM check
gpu-guard

# Check if a model fits (size in GB)
gpu-guard --model-size 13

# Custom safety buffer (default: 1GB)
gpu-guard --model-size 18 --buffer 2

# JSON output for scripting
gpu-guard --model-size 13 --json

# Quiet mode: exit code only (0 = fits, 1 = doesn't)
gpu-guard --model-size 7 --quiet
```
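The `--json` output is meant for piping into `jq` or other tooling. The exact schema isn't shown here, so the shape below is illustrative only; every field name is an assumption:

```json
{
  "device_id": 0,
  "total_gb": 32.0,
  "used_gb": 4.12,
  "available_gb": 27.88,
  "model_size_gb": 13.0,
  "fits": true
}
```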
### As a Python library

```python
from gpu_guard import check_vram, can_load_model, get_gpu_info

# Check current VRAM
gpu_info = get_gpu_info()
for gpu in gpu_info:
    print(f"GPU {gpu.device_id}: {gpu.available_memory_gb:.2f}GB available")

# Check if a model fits
result = can_load_model(model_size_gb=13.0, buffer_gb=2.0)
if result.fits:
    print("Safe to load")
else:
    print(f"Need {result.shortage_gb:.2f}GB more VRAM")
```
### Scripting example

```bash
# Pre-check before launching inference
if gpu-guard --model-size 13 --quiet; then
    python run_inference.py --model llama-13b
else
    echo "Not enough VRAM, switching to 7B model"
    python run_inference.py --model llama-7b
fi
```
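The same fallback works from Python via the exit code; a minimal sketch reusing the model names from the shell example:

```python
import subprocess
import sys

# --quiet exits 0 when the model fits, 1 when it doesn't
fits = subprocess.run(["gpu-guard", "--model-size", "13", "--quiet"]).returncode == 0
model = "llama-13b" if fits else "llama-7b"
subprocess.run([sys.executable, "run_inference.py", "--model", model], check=True)
```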
## Common model sizes (approximate VRAM)
| Model | FP16 | Q4 (GGUF) |
|---|---|---|
| 7B params | ~14 GB | ~4 GB |
| 13B params | ~26 GB | ~7 GB |
| 33B params | ~66 GB | ~18 GB |
| 70B params | ~140 GB | ~35 GB |
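These figures follow the usual rule of thumb: about 2 bytes per parameter at FP16 and roughly 0.5-0.6 bytes per parameter at Q4, weights only; the KV cache and activations need headroom on top, which is what `--buffer` is for. A quick estimator along those lines (the byte counts are approximations, not exact for any particular quantization scheme):

```python
def estimate_weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only VRAM estimate in GB; KV cache and activations come on top."""
    return params_billions * bytes_per_param

print(f"{estimate_weights_gb(13, 2.0):.0f} GB")   # FP16: ~26 GB, matches the table
print(f"{estimate_weights_gb(13, 0.55):.0f} GB")  # Q4:   ~7 GB
```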
## Roadmap

- AMD ROCm support
- Memory estimation by model architecture
- Multi-GPU split recommendations
- PyPI package (`pip install gpu-memory-guard`)
- Integration with Ollama and vLLM
## Contributing
PRs welcome. If you want to add AMD ROCm support or model-specific memory estimation, open an issue first so we can discuss the approach.
## License
MIT