Intelligent GPU-accelerated ML package installer for PyTorch, CuPy, JAX, and llama-cpp-python

Smart GPU Package Installer

An intelligent, autonomous installer that detects your system, GPU, and CUDA/ROCm configuration and installs the optimal ML ecosystem (PyTorch, CuPy, JAX, llama-cpp-python) with zero guesswork. It can also optionally provision the underlying system GPU stack (CUDA toolkit + cuDNN, or ROCm) for you. I built it from https://gist.github.com/kipavy/2c5acabbff81a410b340464a667a12c4/93a56a0cc6bcca4868bcb7724e0f7f3331b32f02 and https://pypi.org/project/torch-installer-coff33ninja/.

✨ Features

🔍 Automatic hardware detection — identifies GPU vendor (NVIDIA / AMD / Apple Silicon), driver, and CUDA/ROCm version with no flags.
🧩 Synchronized version matrix — pins torch, torchvision, and torchaudio to compatible triplets, eliminating ABI mismatches.
🌐 Cross-platform — Linux (CUDA & ROCm), Windows (CUDA, with optional auto-install), macOS (MPS), and CPU-only fallback.
📦 Multi-package — installs PyTorch, CuPy, JAX, and llama-cpp-python individually or all at once, with GPU-aware skipping.
🛠️ System stack auto-install (opt-in) — installs the GPU userspace stack (CUDA toolkit + cuDNN, or ROCm) from official vendor repos via apt/dnf/winget/chocolatey. Never touches kernel drivers, and previews the full plan before any sudo step.
🗑️ Clean uninstall — --uninstall removes the selected ecosystem, including stray nvidia-* fat-wheels, for a sterile environment.
🧹 Sterile cross-grades — purges stale nvidia-* fat-wheels and CPU-only builds before reinstalling.
✅ Isolated verification — pre- and post-install checks run in a subprocess to defeat sys.modules caching, with retry/backoff for linker sync lag.
♻️ Smart, idempotent reinstall — skips packages already working and only acts when the environment actually needs it.
🚀 Zero-install usage — run on the fly via uvx, or embed programmatically through the dependency-free ensure() API.
⚡ Faster installs with uv — automatically uses uv when available for significantly faster installs, falling back to pip transparently.
🔬 Dry-run & diagnostics — preview exact commands, inspect detected GPU info, and list supported CUDA wheel versions without touching your environment.

🚀 Quick Start

The fastest path — no install, no manual steps. With uv installed, one command fetches the tool on the fly and installs the right GPU wheels into your active environment:

```bash
# Detect GPU/CUDA and install the full ecosystem (torch + cupy + jax + llamacpp)
uvx gpu-installer

# Just PyTorch, or a space- or comma-separated subset
uvx gpu-installer torch
uvx gpu-installer torch,cupy

# Preview only: print the exact commands without running them
uvx gpu-installer --dry-run

# To use the latest unreleased version from GitLab instead of PyPI:
# uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer
```

Even when launched through uvx (which only isolates the tool itself), gpu-installer installs into the currently active venv/conda env. If none is active, add --python /path/to/python so it knows where to install.

Prefer a traditional install? Install the CLI once, then call it directly:

```bash
pip install gpu-installer   # or: uv pip install gpu-installer

gpu-installer                 # full auto-detect + install
gpu-installer --cpu-only      # force CPU-only builds
gpu-installer torch cupy      # a subset (space- or comma-separated)
gpu-installer --dry-run       # preview without executing
```

Why This Exists

Installing GPU-accelerated ML packages is notoriously fragile. The wrong wheel, a stale nvidia-* package, or a uv/pip index race can silently install a CPU-only build or brick your environment. This script handles all of that:

Detects your GPU vendor (NVIDIA / AMD / Apple Silicon) and driver version automatically
Pins torch, torchvision, and torchaudio to a synchronized release matrix, so torch and torchvision can no longer end up ABI-mismatched
Purges stale nvidia-* fat-wheel residue before any cross-grade
Runs pre- and post-install verification in an isolated subprocess to defeat sys.modules caching
Retries post-install verification with backoff to ride out dynamic linker sync lag after fast (uv) installs

Supported Platforms

Platform	GPU Backend	Notes
Linux	NVIDIA CUDA	Auto-detected via `nvcc` / `nvidia-smi`; optional toolkit+cuDNN auto-install via apt/dnf
Linux	AMD ROCm	Auto-detected via `rocminfo` / `rocm-smi`; optional ROCm userspace auto-install via apt/dnf
Windows	NVIDIA CUDA	Optional auto-install via `winget` / `chocolatey`
macOS	Apple MPS (Metal)	Auto-detected, no CUDA needed
Any	CPU-only	Use `--cpu-only`

Requirements

Python 3.8+
uv (recommended, falls back to pip automatically)
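NVIDIA drivers (or the AMD ROCm stack) already installed for GPU builds; --auto-install-system can provision the CUDA toolkit or ROCm userspace, but never kernel drivers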

Shell Completion (optional)

Tab-completion for package names and flags is provided via argcomplete. It's an optional extra — the core tool stays dependency-free.

```bash
# Install with the completion extra
pip install "gpu-installer[completion]"

# Activate for the current shell (bash / zsh)
eval "$(register-python-argcomplete gpu-installer)"

# Fish
register-python-argcomplete --shell fish gpu-installer | source
```

Add the eval line to your ~/.bashrc / ~/.zshrc to make it permanent. For fish, write it to a completions file instead:

```bash
register-python-argcomplete --shell fish gpu-installer > ~/.config/fish/completions/gpu-installer.fish
```

Then gpu-installer <TAB> completes torch, cupy, jax, and llamacpp.

Colored Help (optional)

Colorized --help output is provided via rich-argparse. Like completion, it's an optional extra — the core tool stays dependency-free and falls back to plain help when it isn't installed.

```bash
pip install "gpu-installer[color]"
```

Use in Another Project

Other projects can install their GPU dependencies through gpu-installer without hooking into pip install (wheels have no reliable install-time hook). Trigger it explicitly — from a setup command, a first-run guard, or your app's startup.

Zero permanent dependency (recommended). Shell out via uvx, passing your own interpreter so the target is unambiguous. gpu-installer is fetched and run on the fly — never added to your dependency tree:

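```python
import subprocess
import sys

# Fetch and run gpu-installer on the fly, targeting this interpreter's environment.
subprocess.run(
    ["uvx", "gpu-installer", "--python", sys.executable, "torch", "cupy"],
    check=True,
)
```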
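
One way to implement the first-run guard mentioned above is to check whether each package's module can be found before shelling out. This is a minimal sketch, not part of gpu-installer: `ensure_gpu_deps`, `missing_packages`, and the `MODULES` mapping are hypothetical names, and "importable" is only a cheap proxy for "installed", since it cannot tell a CPU-only torch from a CUDA build.

```python
import importlib.util
import subprocess
import sys
from typing import Callable, List, Sequence

# Hypothetical mapping from gpu-installer package names to the module each provides.
MODULES = {"torch": "torch", "cupy": "cupy", "jax": "jax", "llamacpp": "llama_cpp"}


def missing_packages(packages: Sequence[str]) -> List[str]:
    """Return the packages whose top-level module cannot be found (nothing is imported)."""
    return [p for p in packages if importlib.util.find_spec(MODULES[p]) is None]


def _run_uvx(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def ensure_gpu_deps(
    packages: Sequence[str], runner: Callable[[List[str]], object] = _run_uvx
) -> List[str]:
    """First-run guard: call gpu-installer via uvx only for packages not importable yet."""
    todo = missing_packages(packages)
    if todo:
        runner(["uvx", "gpu-installer", "--python", sys.executable, *todo])
    return todo
```

Because gpu-installer already skips packages that work, the guard mainly saves launching uvx on every startup; the injectable `runner` keeps it testable without network access.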

Programmatic API (adds one small dependency that itself has no dependencies). Add gpu-installer to your deps and call ensure(). It is idempotent (it skips packages that already work) and returns a structured result:

```python
from gpu_installer import ensure

result = ensure(["torch", "cupy"])   # installs into the active environment
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.failed}")
```

ensure() accepts a list or a comma-separated string, plus the keyword arguments python=, cpu_only=, force_cuda=, force_reinstall=, dry_run=, and quiet=. It returns an EnsureResult with target_python, cuda, installed, skipped, failed, errors (a {package: reason} map for anything that failed, carrying the installer's captured error text or the GPU probe's real failure message rather than just a category), and an ok flag.

Pass quiet=True to keep gpu-installer's own status output off your stdout — it's routed to the gpu_installer logger instead, so configure that logger to capture it (the underlying uv/pip install still streams its progress):

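```python
import logging

from gpu_installer import ensure

logger = logging.getLogger("gpu_installer")
logger.addHandler(logging.StreamHandler())
# A logger with no level inherits WARNING from the root logger; lower it in case
# status lines are logged at INFO.
logger.setLevel(logging.INFO)

result = ensure(["torch", "cupy"], quiet=True)
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.errors}")
```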

CUDA / PyTorch Compatibility Matrix

The script uses a strict pinned matrix to prevent rolling-release index desyncs where torch and torchvision resolve to incompatible builds:

CUDA Version	PyTorch	TorchVision	TorchAudio
11.6	2.1.2	0.16.2	2.1.2
11.7	2.2.2	0.17.2	2.2.2
11.8	2.5.1	0.20.1	2.5.1
12.1	2.5.1	0.20.1	2.5.1
12.4	2.5.1	0.20.1	2.5.1
12.6	2.11.0	0.26.0	2.11.0
12.8	2.11.0	0.26.0	2.11.0
13.0	2.11.0	0.26.0	2.11.0

If your exact CUDA version isn't listed, the script automatically picks the highest compatible version below yours.
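
The fallback rule above (use the highest listed CUDA version that does not exceed yours) amounts to comparing (major, minor) tuples. The sketch below copies the table into a dict; `pick_cuda` and `torch_specs` are illustrative names, not gpu-installer functions, and the real tool may apply extra rules (for example around major-version boundaries).

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]

# CUDA (major, minor) -> (torch, torchvision, torchaudio), copied from the table above.
MATRIX: Dict[Version, Tuple[str, str, str]] = {
    (11, 6): ("2.1.2", "0.16.2", "2.1.2"),
    (11, 7): ("2.2.2", "0.17.2", "2.2.2"),
    (11, 8): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 1): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 4): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 6): ("2.11.0", "0.26.0", "2.11.0"),
    (12, 8): ("2.11.0", "0.26.0", "2.11.0"),
    (13, 0): ("2.11.0", "0.26.0", "2.11.0"),
}


def pick_cuda(detected: Version) -> Optional[Version]:
    """Return the highest listed CUDA version that is <= the detected one."""
    candidates = [v for v in MATRIX if v <= detected]  # tuples compare major, then minor
    return max(candidates) if candidates else None


def torch_specs(detected: Version) -> List[str]:
    """Pinned requirement strings for the synchronized triplet."""
    cuda = pick_cuda(detected)
    if cuda is None:
        raise ValueError(f"CUDA {detected} is older than every entry in the matrix")
    torch, vision, audio = MATRIX[cuda]
    return [f"torch=={torch}", f"torchvision=={vision}", f"torchaudio=={audio}"]
```

For a detected CUDA 12.5, `pick_cuda((12, 5))` falls back to `(12, 4)`, and `torch_specs` yields the `torch==2.5.1`, `torchvision==0.20.1`, `torchaudio==2.5.1` triplet.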

CLI Reference

Install Options

Argument	Default	Description
`[PACKAGE ...]`	all	Packages to install, space- or comma-separated (`torch`, `cupy`, `jax`, `llamacpp`)
`-p`, `--python PATH`	active env	Target interpreter to install into (active venv/conda env by default)
`--cpu-only`	off	Force CPU-only builds for all packages
`--force-cuda VER`	auto	Override detected CUDA version (e.g. `121`, `cu12.1`)
`--force-reinstall`	off	Force reinstall even if already detected as working
`-u`, `--uninstall`	off	Uninstall the selected packages' distributions (default: all). Mutually exclusive with `--force-reinstall`
`-n`, `--dry-run`	off	Print all commands that would run, without executing
`--build`	off	Compile `llama-cpp-python` from source (CUDA/Metal). Other packages ignore it. Auto-detects the GPU's CUDA arch and forces cuBLAS
`--cmake-args STR`	none	Extra cmake defines appended to a `--build` (e.g. `"-DGGML_CUDA_FORCE_MMQ=on"`); a pre-set `CMAKE_ARGS` env var is also honored
`-l`, `--log`	off	Tee all output to a timestamped log file
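
`--force-cuda` accepts both a compact wheel-tag form (`121`) and a dotted form (`cu12.1`). A parser for those spellings might look like the sketch below; the exact grammar gpu-installer accepts is an assumption here, and `parse_force_cuda` is a hypothetical name. Compact forms are read with the last digit as the minor version, which is unambiguous for the three-digit tags in the compatibility matrix (`116` through `130`).

```python
import re
from typing import Tuple

# Optional "cu" prefix, a number, and an optional ".minor" part.
_CUDA_RE = re.compile(r"(?:cu)?(\d+)(?:\.(\d+))?", re.IGNORECASE)


def parse_force_cuda(value: str) -> Tuple[int, int]:
    """Normalize '121', 'cu121', '12.1' or 'cu12.1' to (major, minor) = (12, 1)."""
    m = _CUDA_RE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognized CUDA version: {value!r}")
    major, minor = m.group(1), m.group(2)
    if minor is None:
        # Compact wheel-tag form: the last digit is the minor version (121 -> 12.1).
        if len(major) < 3:
            raise ValueError(f"ambiguous CUDA version: {value!r}")
        major, minor = major[:-1], major[-1]
    return int(major), int(minor)
```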

Diagnostic / Info Commands

Flag	Description
`--gpu-info`	Show detected GPU model, VRAM, CUDA version, and upgrade guidance
`-d`, `--doctor`	Diagnose the active environment — run GPU verification probes on each package
`--list-cuda`	List all supported CUDA wheel versions
`--show-matching`	Show which wheel version your detected CUDA maps to

System Stack Auto-Install (opt-in)

Installs the GPU userspace stack from official vendor repos. Never touches kernel drivers. Privileged steps run via sudo after showing the full plan.

Flag	Description
`--auto-install-system`	Install the system stack: CUDA toolkit + cuDNN (NVIDIA) or ROCm (AMD) on Linux; CUDA via `winget`/`chocolatey` on Windows
`--cuda-version VER`	Toolkit version for auto-install (e.g. `12.4`)
`--yes` / `-y`	Skip the confirmation prompt (scripted / CI use)

Linux coverage: apt (Ubuntu/Debian) and dnf (RHEL/Fedora/Rocky). Always preview with --dry-run first.
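
The preview-then-confirm flow described above (print the whole plan, stop on --dry-run, ask unless --yes) can be sketched as follows. `run_plan` and its parameters are hypothetical, and the steps are plain argv lists (for example `["sudo", "apt-get", "update"]`); this is not gpu-installer's actual plan format.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence


def _run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def run_plan(
    steps: Sequence[Sequence[str]],
    dry_run: bool = False,
    assume_yes: bool = False,
    ask: Callable[[str], str] = input,
    runner: Callable[[List[str]], object] = _run,
) -> bool:
    """Show every command first; execute only after confirmation. Returns True if run."""
    print("Planned system changes:")
    for step in steps:
        print("  " + shlex.join(step))
    if dry_run:
        return False  # preview only, like --dry-run
    if not assume_yes and ask("Proceed? [y/N] ").strip().lower() not in ("y", "yes"):
        return False  # anything but an explicit yes aborts
    for step in steps:
        runner(list(step))
    return True
```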

Packages Installed

PyTorch Ecosystem (`torch`)

Installs torch, torchvision, and torchaudio as a pinned, synchronized triplet from the official PyTorch wheel index. Handles CUDA, ROCm, MPS, and CPU targets.

CuPy (`cupy`)

Installs the appropriate cupy-cuda{major}x wheel based on your detected CUDA major version. Skipped automatically on CPU-only or MPS systems.

JAX (`jax`)

Installs jax[cuda12] or jax[cuda13] keyed to your detected CUDA major version. The wheels bundle their own CUDA redistributables, so only a recent NVIDIA driver is required. JAX is fully standalone: it does not depend on your installed torch. Plain jax is a real CPU build, so --cpu-only (and macOS, where JAX has no Metal backend) installs that CPU build, as with llama-cpp, instead of skipping it. CUDA wheels are Linux-only, so the GPU path is skipped on Windows (use WSL2 or --cpu-only) and on ROCm (experimental and local-build-only upstream).
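
The CuPy and JAX selection rules above reduce to a small mapping from the detected backend to a requirement string. This is a sketch under stated assumptions: the backend names (`"cuda"`, `"rocm"`, `"mps"`, `"cpu"`) and both function names are hypothetical, and ROCm CuPy builds are deliberately not covered because the text does not say which wheel the tool picks there.

```python
import platform
from typing import Optional


def cupy_spec(backend: str, cuda_major: Optional[int]) -> Optional[str]:
    """CuPy wheel for the detected backend, or None when CuPy is skipped."""
    if backend != "cuda" or cuda_major is None:
        return None  # CPU-only and MPS skip CuPy; ROCm handling is not sketched here
    return f"cupy-cuda{cuda_major}x"


def jax_spec(
    backend: str, cuda_major: Optional[int], system: Optional[str] = None
) -> Optional[str]:
    """JAX requirement string following the rules above, or None to skip."""
    system = system or platform.system()
    if backend in ("cpu", "mps"):
        return "jax"  # plain jax is a real CPU build, so install it rather than skip
    if backend == "cuda" and system == "Linux" and cuda_major in (12, 13):
        return f"jax[cuda{cuda_major}]"
    return None  # CUDA on Windows, ROCm, or another CUDA major: skip the GPU path
```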

Llama-CPP-Python (`llamacpp`)

Installs llama-cpp-python with GPU offload support via the abetlen pre-built wheel index. Falls back to a CPU build if no CUDA/ROCm is detected.

When a prebuilt wheel is broken on your machine (a common symptom is a clean install that segfaults on the GPU check), pass --build to compile from source instead. The build auto-detects your GPU's CUDA architecture (-DCMAKE_CUDA_ARCHITECTURES), forces cuBLAS, and sets FORCE_CMAKE=1; it needs nvcc and a C++ compiler on PATH. Add extra cmake defines with --cmake-args or a pre-set CMAKE_ARGS environment variable. Even without --build, a prebuilt wheel that fails GPU verification is automatically rebuilt from source when the toolchain is available.
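
The `--build` path can be approximated as: read the GPU's compute capability, turn it into `CMAKE_CUDA_ARCHITECTURES`, and export `CMAKE_ARGS` plus `FORCE_CMAKE=1` for the source build. This sketch assumes a driver new enough to support `nvidia-smi --query-gpu=compute_cap`, and it assumes current llama.cpp option names (`GGML_CUDA`, `GGML_CUDA_FORCE_CUBLAS`); the flags gpu-installer actually passes may differ. `detect_cuda_arch` and `build_env` are hypothetical helpers.

```python
import os
import re
import shutil
import subprocess
from typing import Dict, Mapping, Optional


def detect_cuda_arch() -> Optional[str]:
    """Return the first GPU's compute capability as a CMake arch, e.g. '8.6' -> '86'."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = proc.stdout.strip().splitlines()
    if proc.returncode != 0 or not lines:
        return None
    m = re.fullmatch(r"(\d+)\.(\d+)", lines[0].strip())
    return m.group(1) + m.group(2) if m else None


def build_env(
    arch: Optional[str], extra_cmake: str = "", base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Environment for a from-source llama-cpp-python build."""
    env = dict(os.environ if base is None else base)
    parts = ["-DGGML_CUDA=on", "-DGGML_CUDA_FORCE_CUBLAS=on"]  # assumed option names
    if arch:
        parts.append(f"-DCMAKE_CUDA_ARCHITECTURES={arch}")
    # A pre-set CMAKE_ARGS, then --cmake-args extras, go last so they can override defaults.
    parts += [env.get("CMAKE_ARGS", ""), extra_cmake]
    env["CMAKE_ARGS"] = " ".join(p for p in parts if p)
    env["FORCE_CMAKE"] = "1"
    return env
```

The resulting environment would then be passed to a from-source install, for example `subprocess.run([sys.executable, "-m", "pip", "install", "--no-cache-dir", "--no-binary", "llama-cpp-python", "llama-cpp-python"], env=build_env(detect_cuda_arch()), check=True)`.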

Adding a Package

Each GPU package is a GpuPackage subclass in src/gpu_installer/packages/. To add one:

Create src/gpu_installer/packages/<name>.py with a class implementing install(self, plan) -> Outcome and verify(self, python=None) -> bool (optionally override preflight and declare a purge_names list).
Register it in the _PACKAGES tuple in packages/__init__.py.

That's the whole change — the CLI, ensure(), and the purge logic pick it up automatically.
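
A new package module could look roughly like the sketch below. Only the method names and signatures (`install(self, plan) -> Outcome`, `verify(self, python=None) -> bool`, the optional `preflight`, and `purge_names`) come from the steps above. `Outcome`, `Plan`, and the `GpuPackage` base are minimal stand-ins defined here so the example runs on its own, and the Numba package is a hypothetical example, not something gpu-installer ships.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# --- Stand-ins so the sketch runs alone; these are NOT gpu_installer's real types. ---
@dataclass
class Outcome:
    ok: bool
    detail: str = ""


@dataclass
class Plan:
    python: str = sys.executable
    dry_run: bool = False
    commands: List[List[str]] = field(default_factory=list)


class GpuPackage:
    name = ""
    purge_names: Tuple[str, ...] = ()

    def preflight(self, plan: Plan) -> Optional[str]:
        return None  # stand-in convention: None = proceed, a string = reason to skip

    def install(self, plan: Plan) -> Outcome:
        raise NotImplementedError

    def verify(self, python: Optional[str] = None) -> bool:
        raise NotImplementedError


# --- Hypothetical new package: Numba's CUDA target ---
class Numba(GpuPackage):
    name = "numba"
    purge_names = ("numba",)

    def install(self, plan: Plan) -> Outcome:
        cmd = [plan.python, "-m", "pip", "install", "numba"]
        if plan.dry_run:
            plan.commands.append(cmd)  # record instead of executing
            return Outcome(ok=True, detail="dry run")
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return Outcome(ok=proc.returncode == 0, detail=proc.stderr[-500:])

    def verify(self, python: Optional[str] = None) -> bool:
        # Probe in a fresh interpreter, mirroring the isolated checks described later.
        probe = "from numba import cuda; raise SystemExit(0 if cuda.is_available() else 1)"
        return subprocess.run([python or sys.executable, "-c", probe]).returncode == 0
```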

How Detection Works

```text
nvidia-smi / nvcc          →  CUDA version
rocminfo / rocm-smi        →  ROCm version
platform.system() Darwin   →  Apple MPS
(none found)               →  CPU-only
```

GPU model and VRAM are also parsed from nvidia-smi / rocminfo output for upgrade guidance.
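
The detection order above translates into a short chain of `PATH` lookups. This sketch is an approximation: `detect_backend` and `cuda_from_smi` are hypothetical names, the lookup functions are injectable for testing, and it only checks that the tools exist rather than running them. Note that the `CUDA Version` in the `nvidia-smi` banner is the highest version the driver supports, which can differ from an installed toolkit reported by `nvcc --version`.

```python
import platform
import re
import shutil
from typing import Callable, Optional, Tuple


def cuda_from_smi(text: str) -> Optional[Tuple[int, int]]:
    """Parse the 'CUDA Version: 12.4' field printed in the nvidia-smi banner."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def detect_backend(
    which: Callable[[str], Optional[str]] = shutil.which,
    system: Callable[[], str] = platform.system,
) -> str:
    """Mirror the order shown above: CUDA, then ROCm, then MPS, else CPU."""
    if which("nvidia-smi") or which("nvcc"):
        return "cuda"
    if which("rocminfo") or which("rocm-smi"):
        return "rocm"
    if system() == "Darwin":
        return "mps"
    return "cpu"
```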

Smart Reinstall Logic

The script avoids unnecessary reinstalls by checking the current state before acting:

Detected State	Action
Not installed	Install
CPU-only build, GPU requested	Auto-purge + upgrade
CUDA broken / ABI mismatch	Auto-purge + reinstall
Already working (CUDA tensor test passes)	Skip
`--force-reinstall` passed	Always purge + reinstall
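
The reinstall table can be expressed as a pure decision function. The state names and action strings are hypothetical, and one case the table leaves open is filled in by assumption: a CPU-only build when no GPU is requested is treated as working and skipped.

```python
from enum import Enum


class State(Enum):
    NOT_INSTALLED = "not installed"
    CPU_ONLY = "cpu-only build"
    BROKEN = "cuda broken / abi mismatch"
    WORKING = "working"


def decide(state: State, gpu_requested: bool, force_reinstall: bool = False) -> str:
    """Map the detected state to an action, following the table above."""
    if force_reinstall:
        return "purge+reinstall"  # --force-reinstall always wins
    if state is State.NOT_INSTALLED:
        return "install"
    if state is State.CPU_ONLY and gpu_requested:
        return "purge+upgrade"
    if state is State.BROKEN:
        return "purge+reinstall"
    return "skip"  # working, or a CPU build when CPU was requested (assumption)
```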

The purge step removes torch, torchvision, torchaudio, cupy, jax, jaxlib (plus its jax-cuda* plugins), llama-cpp-python, and all nvidia-* fat-wheel packages scraped from pip freeze to ensure a sterile environment before reinstalling.
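
Scraping `pip freeze` for purge targets mostly comes down to extracting and normalizing each distribution name. This sketch applies PEP 503 name normalization and matches the names listed above; treating `cupy-cuda*` and `jax-cuda*` as prefixes is an assumption about how the CUDA-suffixed distributions are caught, and `purge_targets` is a hypothetical name.

```python
import re
from typing import List

# Fixed names from the paragraph above, plus prefixes for CUDA-suffixed distributions.
FIXED = {"torch", "torchvision", "torchaudio", "cupy", "jax", "jaxlib", "llama-cpp-python"}
PREFIXES = ("nvidia-", "jax-cuda", "cupy-cuda")


def _normalize(name: str) -> str:
    # PEP 503: lowercase, and collapse runs of "-", "_" and "." into a single "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def purge_targets(freeze_output: str) -> List[str]:
    """Return installed distributions that should be removed before a reinstall."""
    targets = set()
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e ", "--")):
            continue  # comments, editable installs and options are ignored
        raw = re.split(r"\s*(?:===|==|@|>=|<=|~=)\s*", line, maxsplit=1)[0]
        name = _normalize(raw)
        if name in FIXED or name.startswith(PREFIXES):
            targets.add(name)
    return sorted(targets)
```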

Post-Install Verification

After installation, the script runs isolated subprocess checks for each package:

PyTorch: imports torch, checks torch.cuda.is_available(), and runs a .cuda() tensor allocation test. It makes up to 4 attempts, backing off 3s / 5s / 8s between them, to handle dynamic linker sync lag after uv installs.
CuPy: imports cupy, calls cupy.cuda.runtime.getDeviceCount().
JAX: imports jax, inspects jax.devices(), and — when a gpu backend is present — runs a small device computation (block_until_ready()) to confirm acceleration (a CPU build passes too). Retries with backoff like PyTorch/CuPy to ride out linker sync lag.
Llama-CPP: calls llama_supports_gpu_offload() to confirm hardware acceleration.
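
The isolated-subprocess check with backoff can be sketched as follows. `verify_isolated` and `TORCH_PROBE` are hypothetical names; the probe mirrors the PyTorch steps listed above, and the delays assume "up to 4 attempts" means the first try plus three retries spaced 3s, 5s, and 8s apart.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, Sequence

# Mirrors the PyTorch checks above; any exception gives a non-zero exit code.
TORCH_PROBE = (
    "import torch\n"
    "if not torch.cuda.is_available():\n"
    "    raise SystemExit(1)\n"
    "torch.ones(1).cuda()\n"
)


def verify_isolated(
    probe: str,
    python: Optional[str] = None,
    delays: Sequence[float] = (3, 5, 8),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `probe` in a fresh interpreter, retrying after each delay while it fails."""
    cmd = [python or sys.executable, "-c", probe]
    for attempt in range(len(delays) + 1):
        if subprocess.run(cmd, capture_output=True).returncode == 0:
            return True
        if attempt < len(delays):
            sleep(delays[attempt])  # give the dynamic linker time to settle
    return False
```

Running the probe in a fresh interpreter is what sidesteps `sys.modules` caching: a module imported (or a failed import) earlier in the calling process cannot influence the result.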

Examples

```bash
# Check what GPU and CUDA are detected
gpu-installer --gpu-info

# Diagnose the environment without reinstalling anything
gpu-installer --doctor

# Install only PyTorch and Llama-CPP, forcing CUDA 12.1
gpu-installer torch llamacpp --force-cuda 121

# Simulate a full install for the detected backend (e.g. AMD ROCm) without executing
gpu-installer --dry-run

# Force a clean reinstall of everything with logging
gpu-installer --force-reinstall --log

# Uninstall just PyTorch (torch, torchvision, torchaudio)
gpu-installer --uninstall torch

# Preview removal of the whole ecosystem without executing
gpu-installer --uninstall --dry-run

# Auto-install the system GPU stack (preview first!)
gpu-installer --auto-install-system --dry-run
gpu-installer --auto-install-system --cuda-version 12.4

# Then install the Python packages
gpu-installer
```

Troubleshooting

"CUDA available but tensor test failed"

This is usually a dynamic linker sync issue immediately after a uv install. The verifier retries automatically. If it persists, run gpu-installer --doctor a few seconds later; it will re-verify without reinstalling.

Llama-CPP installs but the GPU check segfaults

Some prebuilt llama-cpp-python wheels crash on specific GPUs. Run gpu-installer llamacpp --build to compile from source with your GPU's CUDA architecture and cuBLAS. If the build needs extra cmake flags, pass them with --cmake-args "...". The installer also attempts this rebuild automatically when a prebuilt wheel fails verification and a compiler toolchain is present.

"CPU-only version detected"

Your installed torch was built without CUDA. Run gpu-installer --force-reinstall to trigger an auto-upgrade.

CuPy skipped on a CUDA machine

CuPy requires NVIDIA or AMD GPUs. It is intentionally skipped on macOS/MPS and CPU-only environments.

uv not found

The script falls back to pip automatically. Install uv for significantly faster installs: pip install uv.

Project details

Release history

Version	Release date
4.5.2 (this version)	Jun 10, 2026
4.5.1	Jun 10, 2026
4.5.0	Jun 10, 2026

Download files

Download the file for your platform.

Source Distribution

gpu_installer-4.5.2.tar.gz (139.6 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

gpu_installer-4.5.2-py3-none-any.whl (53.8 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file gpu_installer-4.5.2.tar.gz.

File metadata

Download URL: gpu_installer-4.5.2.tar.gz
Upload date: Jun 10, 2026
Size: 139.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 (uv publish, CI, Debian GNU/Linux 13 "trixie")

File hashes

Hashes for gpu_installer-4.5.2.tar.gz
Algorithm	Hash digest
SHA256	`fb64224f7cb907dd88e01add4b770f11054b9cc2c9d71effec47b57ee3c5d380`
MD5	`5c9c0cdf3f28fae0a5f53384c613c4df`
BLAKE2b-256	`87396cca76a0a4242c85a756dbb068cdd64b52f385f47358a37b620895ea748c`

File details

Details for the file gpu_installer-4.5.2-py3-none-any.whl.

File metadata

Download URL: gpu_installer-4.5.2-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 53.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 (uv publish, CI, Debian GNU/Linux 13 "trixie")

File hashes

Hashes for gpu_installer-4.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`83d26b31e890c7a54fe86a5dea9efc782ac31a4c53d6a1295c88c5d8aa8f4846`
MD5	`5a430dbb195f779807d415eef7653a81`
BLAKE2b-256	`64a4a4a8fc88a09394328995c4b5b8cf8dbd3fba96a97efd3a0c8b3575ca4e62`
