Intelligent GPU-accelerated ML package installer for PyTorch, CuPy, and llama-cpp-python
Smart GPU Package Installer
An intelligent, autonomous installer that detects your system, GPU, and CUDA/ROCm configuration and installs the optimal ML ecosystem (PyTorch, CuPy, JAX, llama-cpp-python) with zero guesswork. It can also optionally provision the underlying system GPU stack (CUDA toolkit + cuDNN, or ROCm) for you. This project builds on https://gist.github.com/kipavy/2c5acabbff81a410b340464a667a12c4/93a56a0cc6bcca4868bcb7724e0f7f3331b32f02 and https://pypi.org/project/torch-installer-coff33ninja/
✨ Features
- 🔍 Automatic hardware detection: identifies GPU vendor (NVIDIA / AMD / Apple Silicon), driver, and CUDA/ROCm version with no flags.
- 🧩 Synchronized version matrix: pins torch, torchvision, and torchaudio to compatible triplets, eliminating ABI mismatches.
- 🌐 Cross-platform: Linux (CUDA & ROCm), Windows (CUDA, with optional auto-install), macOS (MPS), and CPU-only fallback.
- 📦 Multi-package: installs PyTorch, CuPy, JAX, and llama-cpp-python individually or all at once, with GPU-aware skipping.
- 🛠️ System stack auto-install (opt-in): installs the GPU userspace stack (CUDA toolkit + cuDNN, or ROCm) from official vendor repos via apt / dnf / winget / chocolatey. Never touches kernel drivers, and previews the full plan before any sudo step.
- 🗑️ Clean uninstall: --uninstall removes the selected ecosystem, including stray nvidia-* fat-wheels, for a sterile environment.
- 🧹 Sterile cross-grades: purges stale nvidia-* fat-wheels and CPU-only builds before reinstalling.
- ✅ Isolated verification: pre- and post-install checks run in a subprocess to defeat sys.modules caching, with retry/backoff for linker sync lag.
- ♻️ Smart, idempotent reinstall: skips packages already working and only acts when the environment actually needs it.
- 🚀 Zero-install usage: run on the fly via uvx, or embed programmatically through the dependency-free ensure() API.
- ⚡ Faster installs with uv: automatically uses uv when available for significantly faster installs, falling back to pip transparently.
- 🔬 Dry-run & diagnostics: preview exact commands, inspect detected GPU info, and list supported CUDA wheel versions without touching your environment.
🚀 Quick Start
The fastest path — no install, no manual steps. With uv installed, one command fetches the tool on the fly and installs the right GPU wheels into your active environment:
# Detect GPU/CUDA and install the full ecosystem (torch + cupy + jax + llamacpp)
uvx gpu-installer
# Just PyTorch — space- or comma-separated for a subset
uvx gpu-installer torch
uvx gpu-installer torch,cupy
# Preview only — print the exact commands without running them
uvx gpu-installer --dry-run
# To use the latest unreleased version from GitLab instead of PyPI:
# uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer
uvx installs into the currently active venv/conda env. If none is active, add --python /path/to/python so it knows where to install.
Prefer a traditional install? Install the CLI once, then call it directly:
pip install gpu-installer # or: uv pip install gpu-installer
gpu-installer # full auto-detect + install
gpu-installer --cpu-only # force CPU-only builds
gpu-installer torch cupy # a subset (space- or comma-separated)
gpu-installer --dry-run # preview without executing
Why This Exists
Installing GPU-accelerated ML packages is notoriously fragile. The wrong wheel, a stale nvidia-* package, or a uv/pip index race can silently install a CPU-only build or brick your environment. This script handles all of that:
- Detects your GPU vendor (NVIDIA / AMD / Apple Silicon) and driver version automatically
- Pins torch, torchvision, and torchaudio to a synchronized release matrix, so torch and torchvision no longer hit ABI mismatches
- Purges stale nvidia-* fat-wheel residue before any cross-grade
- Runs pre- and post-install verification in an isolated subprocess to defeat sys.modules caching
- Retries verification with backoff after fast installs (uv) to ride out dynamic linker sync lag
Supported Platforms
| Platform | GPU Backend | Notes |
|---|---|---|
| Linux | NVIDIA CUDA | Auto-detected via nvcc / nvidia-smi; optional toolkit+cuDNN auto-install via apt/dnf |
| Linux | AMD ROCm | Auto-detected via rocminfo / rocm-smi; optional ROCm userspace auto-install via apt/dnf |
| Windows | NVIDIA CUDA | Optional auto-install via winget / chocolatey |
| macOS | Apple MPS (Metal) | Auto-detected, no CUDA needed |
| Any | CPU-only | Use --cpu-only |
Requirements
- Python 3.8+
- uv (recommended, falls back to pip automatically)
- NVIDIA drivers / ROCm stack already installed for GPU builds
Shell Completion (optional)
Tab-completion for package names and flags is provided via
argcomplete. It's an optional
extra — the core tool stays dependency-free.
# Install with the completion extra
pip install "gpu-installer[completion]"
# Activate for the current shell (bash / zsh)
eval "$(register-python-argcomplete gpu-installer)"
# Fish
register-python-argcomplete --shell fish gpu-installer | source
Add the eval line to your ~/.bashrc / ~/.zshrc to make it permanent. For
fish, write it to a completions file instead:
register-python-argcomplete --shell fish gpu-installer > ~/.config/fish/completions/gpu-installer.fish
Then gpu-installer <TAB> completes torch, cupy, jax, and llamacpp.
Colored Help (optional)
Colorized --help output is provided via
rich-argparse. Like completion,
it's an optional extra — the core tool stays dependency-free and falls back to
plain help when it isn't installed.
pip install "gpu-installer[color]"
Use in Another Project
Other projects can install their GPU dependencies through gpu-installer without hooking into pip install (wheels have no reliable install-time hook). Trigger it explicitly — from a setup command, a first-run guard, or your app's startup.
Zero permanent dependency (recommended). Shell out via uvx, passing your own interpreter so the target is unambiguous. gpu-installer is fetched and run on the fly — never added to your dependency tree:
import subprocess, sys
subprocess.run(
    [
        "uvx", "gpu-installer", "--python", sys.executable, "torch", "cupy",
    ],
    check=True,
)
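A first-run guard of the kind mentioned above could be sketched as follows. The function name ensure_gpu_packages and its parameters are illustrative and not part of gpu-installer; only the uvx command line is taken from the example above. Note that importlib.util.find_spec only tells you a module is importable, not that its GPU backend works.

```python
import importlib.util
import subprocess
import sys


def ensure_gpu_packages(modules=("torch",), packages=("torch",), run=subprocess.run):
    """Shell out to gpu-installer only when an expected module is missing.

    `modules` are import names, `packages` are gpu-installer package names.
    Returns True if the installer was invoked, False if nothing was missing.
    The helper itself is hypothetical; `run` is injectable for testing.
    """
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if not missing:
        return False
    # Same command shape as the uvx example above, targeting this interpreter.
    run(["uvx", "gpu-installer", "--python", sys.executable, *packages], check=True)
    return True
```

Calling ensure_gpu_packages() at application startup keeps the common path cheap: when torch is already importable, no subprocess is spawned.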
Programmatic API (if you accept the tiny zero-dep dependency). Add gpu-installer to your deps and call ensure() — it is idempotent (skips packages already working) and returns a structured result:
from gpu_installer import ensure
result = ensure(["torch", "cupy"]) # installs into the active environment
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.failed}")
ensure() accepts a list or comma string, plus python=, cpu_only=, force_cuda=, force_reinstall=, dry_run=, and quiet=. It returns an EnsureResult with target_python, cuda, installed, skipped, failed, errors (a {package: reason} map for anything that failed — carrying the installer's captured error text or the GPU probe's real failure message, not just a category), and an ok flag.
Pass quiet=True to keep gpu-installer's own status output off your stdout — it's routed to the gpu_installer logger instead, so configure that logger to capture it (the underlying uv/pip install still streams its progress):
import logging
from gpu_installer import ensure
logging.getLogger("gpu_installer").addHandler(logging.StreamHandler())
result = ensure(["torch", "cupy"], quiet=True)
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.errors}")
CUDA / PyTorch Compatibility Matrix
The script uses a strict pinned matrix to prevent rolling-release index desyncs where torch and torchvision resolve to incompatible builds:
| CUDA Version | PyTorch | TorchVision | TorchAudio |
|---|---|---|---|
| 11.6 | 2.1.2 | 0.16.2 | 2.1.2 |
| 11.7 | 2.2.2 | 0.17.2 | 2.2.2 |
| 11.8 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.1 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.4 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.6 | 2.11.0 | 0.26.0 | 2.11.0 |
| 12.8 | 2.11.0 | 0.26.0 | 2.11.0 |
| 13.0 | 2.11.0 | 0.26.0 | 2.11.0 |
If your exact CUDA version isn't listed, the script automatically picks the highest compatible version below yours.
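The fallback rule can be illustrated with a small lookup over the table above. This is a sketch of the described behavior, not the package's actual code: the names PINNED and resolve_pins are made up for illustration, and returning None for CUDA versions older than 11.6 is an assumption, since the README does not say what happens there.

```python
# Pinned triplets copied from the table above:
# (CUDA major, minor) -> (torch, torchvision, torchaudio)
PINNED = {
    (11, 6): ("2.1.2", "0.16.2", "2.1.2"),
    (11, 7): ("2.2.2", "0.17.2", "2.2.2"),
    (11, 8): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 1): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 4): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 6): ("2.11.0", "0.26.0", "2.11.0"),
    (12, 8): ("2.11.0", "0.26.0", "2.11.0"),
    (13, 0): ("2.11.0", "0.26.0", "2.11.0"),
}


def resolve_pins(cuda):
    """Return (matched CUDA version, pins) for the highest entry <= `cuda`.

    `cuda` is a dotted string such as "12.5". Returns None when every
    matrix entry is newer than the detected version (assumed behavior).
    """
    parts = cuda.split(".")
    detected = (int(parts[0]), int(parts[1]) if len(parts) > 1 else 0)
    candidates = [version for version in PINNED if version <= detected]
    if not candidates:
        return None
    best = max(candidates)
    return f"{best[0]}.{best[1]}", PINNED[best]
```

For example, a machine reporting CUDA 12.5 resolves to the 12.4 row, and 12.9 resolves to the 12.8 row.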
CLI Reference
Install Options
| Argument | Default | Description |
|---|---|---|
| [PACKAGE ...] | all | Packages to install, space- or comma-separated (torch, cupy, jax, llamacpp) |
| -p, --python PATH | active env | Target interpreter to install into (active venv/conda env by default) |
| --cpu-only | off | Force CPU-only builds for all packages |
| --force-cuda VER | auto | Override detected CUDA version (e.g. 121, cu12.1) |
| --force-reinstall | off | Force reinstall even if already detected as working |
| -u, --uninstall | off | Uninstall the selected packages' distributions (default: all). Mutually exclusive with --force-reinstall |
| -n, --dry-run | off | Print all commands that would run, without executing |
| --build | off | Compile llama-cpp-python from source (CUDA/Metal). Other packages ignore it. Auto-detects the GPU's CUDA arch and forces cuBLAS |
| --cmake-args STR | none | Extra cmake defines appended to a --build (e.g. "-DGGML_CUDA_FORCE_MMQ=on"); a pre-set CMAKE_ARGS env var is also honored |
| -l, --log | off | Tee all output to a timestamped log file |
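The --force-cuda examples above (121, cu12.1) suggest that both compact and dotted spellings are accepted. How gpu-installer actually parses them is not documented here, so the following is only a sketch of one plausible normalization; the function name normalize_cuda and the restriction to three-digit compact forms are assumptions.

```python
import re


def normalize_cuda(value):
    """Normalize strings such as "121", "cu121", or "cu12.1" to "12.1".

    Hypothetical helper: assumes compact forms are exactly three digits
    (two-digit major, one-digit minor), e.g. "118" -> "11.8".
    """
    text = value.strip().lower()
    if text.startswith("cu"):
        text = text[2:]
    if re.fullmatch(r"\d+\.\d+", text):
        return text
    if re.fullmatch(r"\d{3}", text):
        return f"{text[:2]}.{text[2]}"
    raise ValueError(f"unrecognized CUDA version: {value!r}")
```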
Diagnostic / Info Commands
| Flag | Description |
|---|---|
| --gpu-info | Show detected GPU model, VRAM, CUDA version, and upgrade guidance |
| -d, --doctor | Diagnose the active environment by running GPU verification probes on each package |
| --list-cuda | List all supported CUDA wheel versions |
| --show-matching | Show which wheel version your detected CUDA maps to |
System Stack Auto-Install (opt-in)
Installs the GPU userspace stack from official vendor repos. Never touches
kernel drivers. Privileged steps run via sudo after showing the full plan.
| Flag | Description |
|---|---|
| --auto-install-system | Install the system stack: CUDA toolkit + cuDNN (NVIDIA) or ROCm (AMD) on Linux; CUDA via winget/chocolatey on Windows |
| --cuda-version VER | Toolkit version for auto-install (e.g. 12.4) |
| --yes / -y | Skip the confirmation prompt (scripted / CI use) |
Linux coverage: apt (Ubuntu/Debian) and dnf (RHEL/Fedora/Rocky). Always
preview with --dry-run first.
Packages Installed
PyTorch Ecosystem (torch)
Installs torch, torchvision, and torchaudio as a pinned, synchronized triplet from the official PyTorch wheel index. Handles CUDA, ROCm, MPS, and CPU targets.
CuPy (cupy)
Installs the appropriate cupy-cuda{major}x wheel based on your detected CUDA major version. Skipped automatically on CPU-only or MPS systems.
JAX (jax)
Installs jax[cuda12] or jax[cuda13] keyed to your detected CUDA major version (the wheels bundle their own CUDA redistributables, so only a recent NVIDIA driver is required). Fully standalone — it does not depend on your installed torch. Plain jax is a real CPU build, so --cpu-only (and macOS, where JAX has no Metal backend) installs it like llama-cpp rather than skipping. CUDA wheels are Linux-only, so the GPU path is skipped on Windows (use WSL2 or --cpu-only) and on ROCm (experimental, local-build-only upstream).
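The naming rules for the CuPy and JAX requirements can be written down in a few lines. This is a sketch based only on the descriptions above: the function names are illustrative, the Windows and ROCm skips are left out, and returning None for CUDA majors other than 12 and 13 is an assumption, since only those two JAX extras are described.

```python
def cupy_wheel(cuda_major):
    """Build the CuPy wheel name for a CUDA major version, e.g. cupy-cuda12x."""
    return f"cupy-cuda{cuda_major}x"


def jax_requirement(cuda_major, cpu_only=False):
    """Pick the JAX requirement string for the detected CUDA major version.

    Plain "jax" is the CPU build. None means no GPU extra is described
    for that CUDA major (assumed handling, not confirmed by the source).
    """
    if cpu_only or cuda_major is None:
        return "jax"
    if cuda_major in (12, 13):
        return f"jax[cuda{cuda_major}]"
    return None
```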
Llama-CPP-Python (llamacpp)
Installs llama-cpp-python with GPU offload support via the abetlen pre-built wheel index. Falls back to a CPU build if no CUDA/ROCm is detected.
When a prebuilt wheel is broken on your machine (a common symptom is a clean install that segfaults on the GPU check), pass --build to compile from source instead. The build auto-detects your GPU's CUDA architecture (-DCMAKE_CUDA_ARCHITECTURES), forces cuBLAS, and sets FORCE_CMAKE=1; it needs nvcc and a C++ compiler on PATH. Add extra cmake defines with --cmake-args or a pre-set CMAKE_ARGS environment variable. Even without --build, a prebuilt wheel that fails GPU verification is automatically rebuilt from source when the toolchain is available.
Adding a Package
Each GPU package is a GpuPackage subclass in src/gpu_installer/packages/.
To add one:
- Create src/gpu_installer/packages/<name>.py with a class implementing install(self, plan) -> Outcome and verify(self, python=None) -> bool (optionally override preflight and declare a purge_names list).
- Register it in the _PACKAGES tuple in packages/__init__.py.
That's the whole change — the CLI, ensure(), and the purge logic pick it up
automatically.
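To make the shape of such a class concrete, here is a self-contained sketch. Only the method signatures and the purge_names attribute come from the description above. The Outcome, Plan, and GpuPackage definitions below are stand-ins so the example runs on its own, and the package name example-gpu-lib is invented. The real types in gpu_installer may look quite different.

```python
import subprocess
import sys
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):  # stand-in; the real Outcome type is not shown in this README
    INSTALLED = "installed"
    FAILED = "failed"
    DRY_RUN = "dry-run"


@dataclass
class Plan:  # stand-in; field names are hypothetical
    python: str = sys.executable
    dry_run: bool = True


class GpuPackage:  # stand-in for the real base class
    name = ""
    purge_names = []


class ExamplePackage(GpuPackage):
    name = "example"
    purge_names = ["example-gpu-lib"]  # distributions to remove on purge

    def install(self, plan):
        cmd = [plan.python, "-m", "pip", "install", "example-gpu-lib"]
        if plan.dry_run:
            print(" ".join(cmd))  # preview only, nothing is executed
            return Outcome.DRY_RUN
        ok = subprocess.run(cmd).returncode == 0
        return Outcome.INSTALLED if ok else Outcome.FAILED

    def verify(self, python=None):
        # Import in a fresh interpreter so cached modules cannot mask a failure.
        probe = [python or sys.executable, "-c", "import example_gpu_lib"]
        return subprocess.run(probe, capture_output=True).returncode == 0
```

In the real project the class would then be added to the _PACKAGES tuple in packages/__init__.py, as described in the second step.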
How Detection Works
nvidia-smi / nvcc → CUDA version
rocminfo / rocm-smi → ROCm version
platform.system() Darwin → Apple MPS
(none found) → CPU-only
GPU model and VRAM are also parsed from nvidia-smi / rocminfo output for upgrade guidance.
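The detection order above can be approximated with standard-library calls. This is a simplified sketch, not the installer's code: it only checks whether the tools exist on PATH and does not parse their output for versions, GPU model, or VRAM. The function name detect_backend and the injectable which/system parameters are illustrative.

```python
import platform
import shutil


def detect_backend(which=shutil.which, system=platform.system):
    """Return "cuda", "rocm", "mps", or "cpu", following the order above."""
    if which("nvidia-smi") or which("nvcc"):
        return "cuda"
    if which("rocminfo") or which("rocm-smi"):
        return "rocm"
    if system() == "Darwin":
        return "mps"
    return "cpu"
```

Passing which and system as parameters makes the order easy to test without the hardware present.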
Smart Reinstall Logic
The script avoids unnecessary reinstalls by checking the current state before acting:
| Detected State | Action |
|---|---|
| Not installed | Install |
| CPU-only build, GPU requested | Auto-purge + upgrade |
| CUDA broken / ABI mismatch | Auto-purge + reinstall |
| Already working (CUDA tensor test passes) | Skip |
| --force-reinstall passed | Always purge + reinstall |
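The table reads naturally as a small decision function. The sketch below mirrors its rows; the State enum, the action strings, and the function name plan_action are illustrative, and skipping a CPU-only build when no GPU was requested is an assumption the table does not spell out.

```python
from enum import Enum


class State(Enum):
    NOT_INSTALLED = "not installed"
    CPU_ONLY = "cpu-only build"
    BROKEN = "cuda broken / abi mismatch"
    WORKING = "working"


def plan_action(state, gpu_requested=True, force_reinstall=False):
    """Map a detected state to the action listed in the table above."""
    if force_reinstall:
        return "purge+reinstall"
    if state is State.NOT_INSTALLED:
        return "install"
    if state is State.CPU_ONLY and gpu_requested:
        return "purge+upgrade"
    if state is State.BROKEN:
        return "purge+reinstall"
    return "skip"  # already working, or CPU-only with no GPU requested (assumed)
```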
The purge step removes torch, torchvision, torchaudio, cupy, jax, jaxlib (plus its jax-cuda* plugins), llama-cpp-python, and all nvidia-* fat-wheel packages scraped from pip freeze to ensure a sterile environment before reinstalling.
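Scraping the extra packages out of pip freeze output might look like the sketch below. It is an illustration of the idea rather than the installer's implementation: the function name scraped_purge_targets is made up, and the parsing only handles the common name==version and name @ url line shapes.

```python
import re


def scraped_purge_targets(freeze_output):
    """Collect nvidia-* and jax-cuda* distribution names from `pip freeze` text."""
    found = set()
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e")):
            continue
        # Keep the part before any version specifier, URL marker, or extra.
        name = re.split(r"[=<>!~ @;\[]", line, maxsplit=1)[0]
        name = name.lower().replace("_", "-")
        if name.startswith(("nvidia-", "jax-cuda")):
            found.add(name)
    return sorted(found)
```

The result would be appended to the fixed list (torch, torchvision, torchaudio, cupy, jax, jaxlib, llama-cpp-python) before uninstalling.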
Post-Install Verification
After installation, the script runs isolated subprocess checks for each package:
- PyTorch: imports torch, checks cuda.is_available(), runs a .cuda() tensor allocation test. Retries up to 4 times with backoff (3s / 5s / 8s) to handle dynamic linker sync lag after uv installs.
- CuPy: imports cupy, calls cupy.cuda.runtime.getDeviceCount().
- JAX: imports jax, inspects jax.devices(), and, when a gpu backend is present, runs a small device computation (block_until_ready()) to confirm acceleration (a CPU build passes too). Retries with backoff like PyTorch/CuPy to ride out linker sync lag.
- Llama-CPP: calls llama_supports_gpu_offload() to confirm hardware acceleration.
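The isolated-subprocess-plus-retry pattern described above can be sketched as follows. Running the probe in a fresh interpreter means nothing cached in the parent's sys.modules can influence the result. The function name verify_isolated and the probe string are illustrative; the 3s / 5s / 8s delays are taken from the PyTorch bullet, giving up to four attempts.

```python
import subprocess
import sys
import time

# Example probe matching the PyTorch check described above (illustrative).
TORCH_PROBE = "import torch; assert torch.cuda.is_available(); torch.zeros(1).cuda()"


def verify_isolated(snippet, python=sys.executable, delays=(3, 5, 8), sleep=time.sleep):
    """Run `snippet` in a fresh interpreter, retrying after each delay.

    Makes len(delays) + 1 attempts and returns True on the first one
    that exits with status 0, otherwise False.
    """
    for attempt in range(len(delays) + 1):
        proc = subprocess.run([python, "-c", snippet], capture_output=True, text=True)
        if proc.returncode == 0:
            return True
        if attempt < len(delays):
            sleep(delays[attempt])  # give the dynamic linker time to settle
    return False
```

A call such as verify_isolated(TORCH_PROBE) would then report whether a CUDA tensor can actually be allocated in the target environment.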
Examples
# Check what GPU and CUDA are detected
gpu-installer --gpu-info
# Diagnose the environment without reinstalling anything
gpu-installer --doctor
# Install only PyTorch and Llama-CPP, forcing CUDA 12.1
gpu-installer torch llamacpp --force-cuda 121
# Preview a full install without executing anything (for example, on an AMD ROCm system)
gpu-installer --dry-run
# Force a clean reinstall of everything with logging
gpu-installer --force-reinstall --log
# Uninstall just PyTorch (torch, torchvision, torchaudio)
gpu-installer --uninstall torch
# Preview removal of the whole ecosystem without executing
gpu-installer --uninstall --dry-run
# Auto-install the system GPU stack (preview first!)
gpu-installer --auto-install-system --dry-run
gpu-installer --auto-install-system --cuda-version 12.4
# Then install the Python packages
gpu-installer
Troubleshooting
"CUDA available but tensor test failed"
This is usually a dynamic linker sync issue immediately after a uv install. The verifier retries automatically. If it persists, run gpu-installer --doctor a few seconds later — it will re-verify without reinstalling.
Llama-CPP installs but the GPU check segfaults
Some prebuilt llama-cpp-python wheels crash on specific GPUs. Run gpu-installer llamacpp --build to compile from source with your GPU's CUDA architecture and cuBLAS. If the build needs extra cmake flags, pass them with --cmake-args "...". The installer also attempts this rebuild automatically when a prebuilt wheel fails verification and a compiler toolchain is present.
"CPU-only version detected"
Your installed torch was built without CUDA. Run gpu-installer --force-reinstall to trigger an auto-upgrade.
CuPy skipped on a CUDA machine
CuPy requires NVIDIA or AMD GPUs. It is intentionally skipped on macOS/MPS and CPU-only environments.
uv not found
The script falls back to pip automatically. Install uv for significantly faster installs: pip install uv.