
CLI tool to check GPU VRAM before loading AI models

Project description

GPU Memory Guard

Python 3.8+ · License: MIT · GitHub stars

A CLI utility that checks available GPU VRAM before you load AI models, helping you avoid out-of-memory (OOM) crashes that can hang the GPU and force a full system reboot.

Why?

If you run local inference on consumer GPUs, you know the pain:

| Without gpu-memory-guard | With gpu-memory-guard |
| --- | --- |
| Load 70B model on 24GB card | Check VRAM before loading |
| System freezes, GPU hangs | Get a clear warning in terminal |
| Force reboot, lose unsaved work | Pick a smaller model or free memory |
| Repeat next week | Zero OOM crashes |

One command saves you from constant reboots.

Quick Start

git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .
# Check current GPU status
gpu-guard

# Check if an 18GB model fits with 2GB safety buffer
gpu-guard --model-size 18 --buffer 2

Example output:

GPU 0: NVIDIA GeForce RTX 5090
  Total:     32.00 GB
  Used:       4.12 GB
  Available: 27.88 GB

Model size: 18.00 GB (buffer: 2.00 GB)
Status: OK - model fits with 7.88 GB to spare
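
(The reported headroom is simply available minus model size minus buffer: 27.88 − 18.00 − 2.00 = 7.88 GB.)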

Installation

From source (recommended)

git clone https://github.com/CastelDazur/gpu-memory-guard.git
cd gpu-memory-guard
pip install -e .

Requirements

  • Python 3.8+
  • NVIDIA GPU with nvidia-smi installed, OR
  • pynvml Python package (pip install pynvml)
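
Either backend answers the same question: how much VRAM is free right now. For reference, here is a minimal sketch of a direct pynvml query (these are standard pynvml calls, not necessarily the exact code path gpu-guard takes):

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Memory info is reported in bytes: .total, .free, .used
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.free / 1024**3:.2f} GB free of {mem.total / 1024**3:.2f} GB")
finally:
    pynvml.nvmlShutdown()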

Usage

CLI

# Basic VRAM check
gpu-guard

# Check if a model fits (size in GB)
gpu-guard --model-size 13

# Custom safety buffer (default: 1GB)
gpu-guard --model-size 18 --buffer 2

# JSON output for scripting
gpu-guard --model-size 13 --json

# Quiet mode: exit code only (0 = fits, 1 = doesn't)
gpu-guard --model-size 7 --quiet
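
The --json flag is meant for consuming the result from other programs. A minimal parsing sketch, assuming the JSON output carries a boolean fits field (the exact schema isn't documented here, so the field name is an assumption):

import json
import subprocess

# Run the check and parse its JSON output (assumed shape: {"fits": bool, ...}).
proc = subprocess.run(
    ["gpu-guard", "--model-size", "13", "--json"],
    capture_output=True, text=True,
)
result = json.loads(proc.stdout)
if result.get("fits"):
    print("Safe to launch inference")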

As a Python library

from gpu_guard import check_vram, can_load_model, get_gpu_info

# Check current VRAM
gpu_info = get_gpu_info()
for gpu in gpu_info:
    print(f"GPU {gpu.device_id}: {gpu.available_memory_gb:.2f}GB available")

# Check if a model fits
result = can_load_model(model_size_gb=13.0, buffer_gb=2.0)
if result.fits:
    print("Safe to load")
else:
    print(f"Need {result.shortage_gb:.2f}GB more VRAM")

Scripting example

# Pre-check before launching inference
if gpu-guard --model-size 13 --quiet; then
    python run_inference.py --model llama-13b
else
    echo "Not enough VRAM, switching to 7B model"
    python run_inference.py --model llama-7b
fi

Common model sizes (approximate VRAM)

| Model | FP16 | Q4 (GGUF) |
| --- | --- | --- |
| 7B params | ~14 GB | ~4 GB |
| 13B params | ~26 GB | ~7 GB |
| 33B params | ~66 GB | ~18 GB |
| 70B params | ~140 GB | ~35 GB |
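
These figures follow the usual bytes-per-parameter rule of thumb: roughly 2 bytes per parameter for FP16 and roughly 0.5 for Q4, before activation and KV-cache overhead. A back-of-the-envelope helper (these are estimates, not something gpu-guard computes for you yet; see the roadmap):

def approx_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rule-of-thumb weight memory: parameters x bytes per parameter."""
    return params_billions * bytes_per_param

print(approx_vram_gb(13, 2.0))   # FP16: ~26 GB
print(approx_vram_gb(13, 0.55))  # Q4 (GGUF): ~7 GB, matching the table above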

Roadmap

  • AMD ROCm support
  • Memory estimation by model architecture
  • Multi-GPU split recommendations
  • PyPI package (pip install gpu-memory-guard)
  • Integration with Ollama and vLLM

Contributing

PRs welcome. If you want to add AMD ROCm support or model-specific memory estimation, open an issue first so we can discuss the approach.

License

MIT

Download files

Download the file for your platform.

Source Distribution

gpu_memory_guard-0.1.0.tar.gz (6.2 kB)

Uploaded Source

Built Distribution


gpu_memory_guard-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file gpu_memory_guard-0.1.0.tar.gz.

File metadata

  • Download URL: gpu_memory_guard-0.1.0.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gpu_memory_guard-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 27da21a8a5806631f9ec63ec483ec88490c929eaf83b66ce10fc28809add754f |
| MD5 | 0e173dca3714eec71bab205c12224005 |
| BLAKE2b-256 | 144a85b524549ee4c00fa09f195067ed8a629f0bb75d35cbde62027d031fc4ab |
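
To check a downloaded file against one of these digests, a minimal sketch using the SHA256 value above:

import hashlib

expected = "27da21a8a5806631f9ec63ec483ec88490c929eaf83b66ce10fc28809add754f"
with open("gpu_memory_guard-0.1.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()
print("OK" if actual == expected else "MISMATCH")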


Provenance

The following attestation bundles were made for gpu_memory_guard-0.1.0.tar.gz:

Publisher: publish.yml on CastelDazur/gpu-memory-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gpu_memory_guard-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for gpu_memory_guard-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0f1148ec1debe8a2280d70052ece86704c58f1701059a8889a3828f7d4bb103e |
| MD5 | 2ea251bc507e229ad7ae1fd35c6a2830 |
| BLAKE2b-256 | f66c12882ec407db71275428795e438a61259d451bd35083cba9dd4af217b5cb |


Provenance

The following attestation bundles were made for gpu_memory_guard-0.1.0-py3-none-any.whl:

Publisher: publish.yml on CastelDazur/gpu-memory-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
