Intelligent GPU-accelerated ML package installer for PyTorch, CuPy, and llama-cpp-python

Smart GPU Package Installer

An intelligent, autonomous installer that detects your system, GPU, and CUDA/ROCm configuration and installs the matching ML ecosystem (PyTorch, CuPy, JAX, llama-cpp-python) with zero guesswork. It can also optionally provision the underlying system GPU stack (CUDA toolkit + cuDNN, or ROCm) for you.

This project is derived from https://gist.github.com/kipavy/2c5acabbff81a410b340464a667a12c4/93a56a0cc6bcca4868bcb7724e0f7f3331b32f02 and https://pypi.org/project/torch-installer-coff33ninja/


✨ Features

  • 🔍 Automatic hardware detection — identifies GPU vendor (NVIDIA / AMD / Apple Silicon), driver, and CUDA/ROCm version with no flags.
  • 🧩 Synchronized version matrix — pins torch, torchvision, and torchaudio to compatible triplets, eliminating ABI mismatches.
  • 🌐 Cross-platform — Linux (CUDA & ROCm), Windows (CUDA, with optional auto-install), macOS (MPS), and CPU-only fallback.
  • 📦 Multi-package — installs PyTorch, CuPy, JAX, and llama-cpp-python individually or all at once, with GPU-aware skipping.
  • 🛠️ System stack auto-install (opt-in) — installs the GPU userspace stack (CUDA toolkit + cuDNN, or ROCm) from official vendor repos via apt/dnf/winget/chocolatey. Never touches kernel drivers, and previews the full plan before any sudo step.
  • 🗑️ Clean uninstall: --uninstall removes the selected ecosystem, including stray nvidia-* fat-wheels, for a sterile environment.
  • 🧹 Sterile cross-grades — purges stale nvidia-* fat-wheels and CPU-only builds before reinstalling.
  • Isolated verification — pre- and post-install checks run in a subprocess to defeat sys.modules caching, with retry/backoff for linker sync lag.
  • ♻️ Smart, idempotent reinstall — skips packages already working and only acts when the environment actually needs it.
  • 🚀 Zero-install usage — run on the fly via uvx, or embed programmatically through the dependency-free ensure() API.
  • Faster installs with uv — automatically uses uv when available for significantly faster installs, falling back to pip transparently.
  • 🔬 Dry-run & diagnostics — preview exact commands, inspect detected GPU info, and list supported CUDA wheel versions without touching your environment.

🚀 Quick Start

The fastest path — no install, no manual steps. With uv installed, one command fetches the tool on the fly and installs the right GPU wheels into your active environment:

# Detect GPU/CUDA and install the full ecosystem (torch + cupy + jax + llamacpp)
uvx gpu-installer

# Just PyTorch — space- or comma-separated for a subset
uvx gpu-installer torch
uvx gpu-installer torch,cupy

# Preview only — print the exact commands without running them
uvx gpu-installer --dry-run

# To use the latest unreleased version from GitLab instead of PyPI:
# uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer

uvx installs into the currently active venv/conda env. If none is active, add --python /path/to/python so it knows where to install.

Prefer a traditional install? Install the CLI once, then call it directly:

pip install gpu-installer   # or: uv pip install gpu-installer

gpu-installer                 # full auto-detect + install
gpu-installer --cpu-only      # force CPU-only builds
gpu-installer torch cupy      # a subset (space- or comma-separated)
gpu-installer --dry-run       # preview without executing

Why This Exists

Installing GPU-accelerated ML packages is notoriously fragile. The wrong wheel, a stale nvidia-* package, or a uv/pip index race can silently install a CPU-only build or brick your environment. This script handles all of that:

  • Detects your GPU vendor (NVIDIA / AMD / Apple Silicon) and driver version automatically
  • Pins torch, torchvision, and torchaudio to a release matrix of versions known to work together, so torch and torchvision ABI mismatches cannot creep in
  • Purges stale nvidia-* fat-wheel residue before any cross-grade
  • Runs pre- and post-install verification in an isolated subprocess to defeat sys.modules caching
  • Retries verification with backoff after fast installs (uv), giving the dynamic linker time to sync

Supported Platforms

| Platform | GPU | Backend | Notes |
|---|---|---|---|
| Linux | NVIDIA | CUDA | Auto-detected via nvcc / nvidia-smi; optional toolkit+cuDNN auto-install via apt/dnf |
| Linux | AMD | ROCm | Auto-detected via rocminfo / rocm-smi; optional ROCm userspace auto-install via apt/dnf |
| Windows | NVIDIA | CUDA | Optional auto-install via winget / chocolatey |
| macOS | Apple | MPS (Metal) | Auto-detected, no CUDA needed |
| Any | | CPU-only | Use --cpu-only |

Requirements

  • Python 3.8+
  • uv (recommended, falls back to pip automatically)
  • NVIDIA drivers / ROCm stack already installed for GPU builds

Shell Completion (optional)

Tab-completion for package names and flags is provided via argcomplete. It's an optional extra — the core tool stays dependency-free.

# Install with the completion extra
pip install "gpu-installer[completion]"

# Activate for the current shell (bash / zsh)
eval "$(register-python-argcomplete gpu-installer)"

# Fish
register-python-argcomplete --shell fish gpu-installer | source

Add the eval line to your ~/.bashrc / ~/.zshrc to make it permanent. For fish, write it to a completions file instead:

register-python-argcomplete --shell fish gpu-installer > ~/.config/fish/completions/gpu-installer.fish

Then gpu-installer <TAB> completes torch, cupy, jax, and llamacpp.

Colored Help (optional)

Colorized --help output is provided via rich-argparse. Like completion, it's an optional extra — the core tool stays dependency-free and falls back to plain help when it isn't installed.

pip install "gpu-installer[color]"

Use in Another Project

Other projects can install their GPU dependencies through gpu-installer without hooking into pip install (wheels have no reliable install-time hook). Trigger it explicitly — from a setup command, a first-run guard, or your app's startup.

Zero permanent dependency (recommended). Shell out via uvx, passing your own interpreter so the target is unambiguous. gpu-installer is fetched and run on the fly — never added to your dependency tree:

import subprocess, sys

subprocess.run(
    [
        "uvx", "gpu-installer", "--python", sys.executable, "torch", "cupy",
    ],
    check=True,
)
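
A first-run guard can wrap the same call so that it only shells out when something is missing. The sketch below is one possible shape and is not part of gpu-installer: the helper names build_command, missing_packages, and ensure_gpu_deps are hypothetical. It assumes the importable module name equals the package name (true for torch and cupy, not for llamacpp), and an import check alone does not prove that the installed build is GPU-enabled.

```python
import importlib.util
import subprocess
import sys


def build_command(packages, python=sys.executable):
    """Build the uvx command line shown above for the given packages."""
    return ["uvx", "gpu-installer", "--python", python, *packages]


def missing_packages(packages):
    """Return the packages whose top-level module cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def ensure_gpu_deps(packages=("torch", "cupy")):
    """Hypothetical first-run guard: install only when something is missing."""
    missing = missing_packages(packages)
    if missing:
        # check=True raises CalledProcessError if the installer exits non-zero.
        subprocess.run(build_command(missing), check=True)
    return missing
```

Calling ensure_gpu_deps() at application startup returns the list of packages it tried to install, which is empty on every run after the first successful one.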

Programmatic API (if you accept one small dependency, which itself has no dependencies). Add gpu-installer to your deps and call ensure(). It is idempotent (skips packages already working) and returns a structured result:

from gpu_installer import ensure

result = ensure(["torch", "cupy"])   # installs into the active environment
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.failed}")

ensure() accepts a list or comma string, plus python=, cpu_only=, force_cuda=, force_reinstall=, dry_run=, and quiet=. It returns an EnsureResult with target_python, cuda, installed, skipped, failed, errors (a {package: reason} map for anything that failed — carrying the installer's captured error text or the GPU probe's real failure message, not just a category), and an ok flag.

Pass quiet=True to keep gpu-installer's own status output off your stdout — it's routed to the gpu_installer logger instead, so configure that logger to capture it (the underlying uv/pip install still streams its progress):

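import logging

from gpu_installer import ensure

logger = logging.getLogger("gpu_installer")
logger.addHandler(logging.StreamHandler())
# Assumption: status messages are emitted at INFO or above. Without an explicit
# level the logger inherits the root default (WARNING) and would drop them.
logger.setLevel(logging.INFO)

result = ensure(["torch", "cupy"], quiet=True)
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.errors}")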

CUDA / PyTorch Compatibility Matrix

The script uses a strict pinned matrix to prevent rolling-release index desyncs where torch and torchvision resolve to incompatible builds:

| CUDA Version | PyTorch | TorchVision | TorchAudio |
|---|---|---|---|
| 11.6 | 2.1.2 | 0.16.2 | 2.1.2 |
| 11.7 | 2.2.2 | 0.17.2 | 2.2.2 |
| 11.8 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.1 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.4 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.6 | 2.11.0 | 0.26.0 | 2.11.0 |
| 12.8 | 2.11.0 | 0.26.0 | 2.11.0 |
| 13.0 | 2.11.0 | 0.26.0 | 2.11.0 |

If your exact CUDA version isn't listed, the script automatically picks the highest listed version below yours. For example, CUDA 12.5 maps to the 12.4 row.
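
The fallback rule can be sketched as a lookup over the table. This is an illustration under assumptions, not the installer's actual code: the names PINNED and resolve_pins are hypothetical, and "compatible" is approximated here by plain numeric ordering of (major, minor).

```python
# Pinned versions copied from the table above:
# CUDA (major, minor) -> (torch, torchvision, torchaudio)
PINNED = {
    (11, 6): ("2.1.2", "0.16.2", "2.1.2"),
    (11, 7): ("2.2.2", "0.17.2", "2.2.2"),
    (11, 8): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 1): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 4): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 6): ("2.11.0", "0.26.0", "2.11.0"),
    (12, 8): ("2.11.0", "0.26.0", "2.11.0"),
    (13, 0): ("2.11.0", "0.26.0", "2.11.0"),
}


def resolve_pins(detected):
    """Return (cuda, pins) for the highest listed CUDA version <= detected.

    `detected` is a string such as "12.5"; a missing minor part counts as 0.
    """
    parts = detected.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Tuples compare element by element, so (12, 4) <= (12, 5) holds.
    candidates = [version for version in PINNED if version <= (major, minor)]
    if not candidates:
        raise ValueError(f"no pinned wheels for CUDA {detected}")
    best = max(candidates)
    return f"{best[0]}.{best[1]}", PINNED[best]
```

With this sketch, resolve_pins("12.5") selects the 12.4 row, and a version older than 11.6 raises ValueError rather than guessing.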


CLI Reference

Install Options

| Argument | Default | Description |
|---|---|---|
| [PACKAGE ...] | all | Packages to install, space- or comma-separated (torch, cupy, jax, llamacpp) |
| -p, --python PATH | active env | Target interpreter to install into (active venv/conda env by default) |
| --cpu-only | off | Force CPU-only builds for all packages |
| --force-cuda VER | auto | Override detected CUDA version (e.g. 121, cu12.1) |
| --force-reinstall | off | Force reinstall even if already detected as working |
| -u, --uninstall | off | Uninstall the selected packages' distributions (default: all). Mutually exclusive with --force-reinstall |
| -n, --dry-run | off | Print all commands that would run, without executing |
| --build | off | Compile llama-cpp-python from source (CUDA/Metal). Other packages ignore it. Auto-detects the GPU's CUDA arch and forces cuBLAS |
| --cmake-args STR | none | Extra cmake defines appended to a --build (e.g. "-DGGML_CUDA_FORCE_MMQ=on"); a pre-set CMAKE_ARGS env var is also honored |
| -l, --log | off | Tee all output to a timestamped log file |

Diagnostic / Info Commands

| Flag | Description |
|---|---|
| --gpu-info | Show detected GPU model, VRAM, CUDA version, and upgrade guidance |
| -d, --doctor | Diagnose the active environment: run GPU verification probes on each package |
| --list-cuda | List all supported CUDA wheel versions |
| --show-matching | Show which wheel version your detected CUDA maps to |

System Stack Auto-Install (opt-in)

Installs the GPU userspace stack from official vendor repos. Never touches kernel drivers. Privileged steps run via sudo after showing the full plan.

| Flag | Description |
|---|---|
| --auto-install-system | Install the system stack: CUDA toolkit + cuDNN (NVIDIA) or ROCm (AMD) on Linux; CUDA via winget/chocolatey on Windows |
| --cuda-version VER | Toolkit version for auto-install (e.g. 12.4) |
| --yes / -y | Skip the confirmation prompt (scripted / CI use) |

Linux coverage: apt (Ubuntu/Debian) and dnf (RHEL/Fedora/Rocky). Always preview with --dry-run first.


Packages Installed

PyTorch Ecosystem (torch)

Installs torch, torchvision, and torchaudio as a pinned, synchronized triplet from the official PyTorch wheel index. Handles CUDA, ROCm, MPS, and CPU targets.

CuPy (cupy)

Installs the appropriate cupy-cuda{major}x wheel based on your detected CUDA major version. Skipped automatically on CPU-only or MPS systems.
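
The wheel name follows directly from the CUDA major version. A minimal sketch of that mapping, with a hypothetical helper name (cupy_wheel_name) and None standing in for "no CUDA detected":

```python
def cupy_wheel_name(cuda_version):
    """Map a CUDA version string to the cupy-cuda{major}x wheel name.

    Returns None when no CUDA version is given (CPU-only or MPS),
    which mirrors the "skipped automatically" behavior described above.
    """
    if cuda_version is None:
        return None
    major = cuda_version.split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unrecognized CUDA version: {cuda_version!r}")
    return f"cupy-cuda{major}x"
```

For example, a detected CUDA 12.4 yields cupy-cuda12x.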

JAX (jax)

Installs jax[cuda12] or jax[cuda13] keyed to your detected CUDA major version (the wheels bundle their own CUDA redistributables, so only a recent NVIDIA driver is required). Fully standalone — it does not depend on your installed torch. Plain jax is a real CPU build, so --cpu-only (and macOS, where JAX has no Metal backend) installs it like llama-cpp rather than skipping. CUDA wheels are Linux-only, so the GPU path is skipped on Windows (use WSL2 or --cpu-only) and on ROCm (experimental, local-build-only upstream).

Llama-CPP-Python (llamacpp)

Installs llama-cpp-python with GPU offload support via the abetlen pre-built wheel index. Falls back to a CPU build if no CUDA/ROCm is detected.

When a prebuilt wheel is broken on your machine (a common symptom is a clean install that segfaults on the GPU check), pass --build to compile from source instead. The build auto-detects your GPU's CUDA architecture (-DCMAKE_CUDA_ARCHITECTURES), forces cuBLAS, and sets FORCE_CMAKE=1; it needs nvcc and a C++ compiler on PATH. Add extra cmake defines with --cmake-args or a pre-set CMAKE_ARGS environment variable. Even without --build, a prebuilt wheel that fails GPU verification is automatically rebuilt from source when the toolchain is available.


Adding a Package

Each GPU package is a GpuPackage subclass in src/gpu_installer/packages/. To add one:

  1. Create src/gpu_installer/packages/<name>.py with a class implementing install(self, plan) -> Outcome and verify(self, python=None) -> bool (optionally override preflight and declare a purge_names list).
  2. Register it in the _PACKAGES tuple in packages/__init__.py.

That's the whole change — the CLI, ensure(), and the purge logic pick it up automatically.


How Detection Works

nvidia-smi / nvcc          →  CUDA version
rocminfo / rocm-smi        →  ROCm version
platform.system() Darwin   →  Apple MPS
(none found)               →  CPU-only

GPU model and VRAM are also parsed from nvidia-smi / rocminfo output for upgrade guidance.
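
The probe order can be approximated in a few lines. This is a simplified sketch, not the installer's implementation: detect_backend is a hypothetical name, it only checks whether the tools are on PATH instead of parsing their output for a version, and it treats any Darwin host as MPS without confirming Apple Silicon.

```python
import platform
import shutil


def detect_backend(which=shutil.which, system=platform.system):
    """Return "cuda", "rocm", "mps", or "cpu" using the probe order above.

    `which` and `system` are injectable so the logic can be exercised
    without real GPU tooling installed.
    """
    if which("nvidia-smi") or which("nvcc"):
        return "cuda"
    if which("rocminfo") or which("rocm-smi"):
        return "rocm"
    if system() == "Darwin":
        return "mps"
    return "cpu"
```

Calling detect_backend() with no arguments probes the current machine.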


Smart Reinstall Logic

The script avoids unnecessary reinstalls by checking the current state before acting:

| Detected State | Action |
|---|---|
| Not installed | Install |
| CPU-only build, GPU requested | Auto-purge + upgrade |
| CUDA broken / ABI mismatch | Auto-purge + reinstall |
| Already working (CUDA tensor test passes) | Skip |
| --force-reinstall passed | Always purge + reinstall |
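
Read as code, the table is a short chain of checks. The function below is a hypothetical restatement (choose_action and its parameters are illustrative names, and the precedence of the checks is an assumption), not the installer's own logic:

```python
def choose_action(installed, gpu_build, gpu_works,
                  gpu_requested=True, force_reinstall=False):
    """Return the action from the table for one package's detected state."""
    if force_reinstall:
        return "purge + reinstall"
    if not installed:
        return "install"
    if gpu_requested and not gpu_build:
        # A CPU-only build is present but a GPU build was asked for.
        return "purge + upgrade"
    if gpu_requested and not gpu_works:
        # A GPU build is present but the tensor test fails.
        return "purge + reinstall"
    return "skip"
```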

The purge step removes torch, torchvision, torchaudio, cupy, jax, jaxlib (plus its jax-cuda* plugins), llama-cpp-python, and all nvidia-* fat-wheel packages scraped from pip freeze to ensure a sterile environment before reinstalling.
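
Collecting the nvidia-* names from pip freeze output could look like the sketch below. The helper name nvidia_packages is hypothetical, and the parsing only handles the two common freeze line shapes (name==version and name @ location); the installer's real scraping may differ.

```python
import re


def nvidia_packages(freeze_output):
    """Extract nvidia-* distribution names from `pip freeze` text."""
    names = []
    for line in freeze_output.splitlines():
        # Keep only the distribution name: the part before "==" or " @ ".
        name = re.split(r"==|\s@\s", line.strip(), maxsplit=1)[0]
        if name.lower().startswith("nvidia-"):
            names.append(name)
    return names
```

The resulting names would then be appended to the fixed list of packages before the uninstall command runs.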


Post-Install Verification

After installation, the script runs isolated subprocess checks for each package:

  • PyTorch: imports torch, checks cuda.is_available(), runs a .cuda() tensor allocation test. Retries up to 4 times with backoff (3s / 5s / 8s) to handle dynamic linker sync lag after uv installs.
  • CuPy: imports cupy, calls cupy.cuda.runtime.getDeviceCount().
  • JAX: imports jax, inspects jax.devices(), and — when a gpu backend is present — runs a small device computation (block_until_ready()) to confirm acceleration (a CPU build passes too). Retries with backoff like PyTorch/CuPy to ride out linker sync lag.
  • Llama-CPP: calls llama_supports_gpu_offload() to confirm hardware acceleration.
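
The isolation-plus-retry pattern behind these checks can be sketched as follows. This is an assumption-laden illustration, not the project's verifier: verify_isolated and TORCH_PROBE are hypothetical names, and the delays simply reuse the 3s / 5s / 8s figures quoted for PyTorch.

```python
import subprocess
import sys
import time

# Example probe: runs in a fresh interpreter, so nothing cached in the
# parent's sys.modules can mask a broken install.
TORCH_PROBE = (
    "import torch\n"
    "assert torch.cuda.is_available()\n"
    "torch.zeros(1).cuda()\n"
)


def verify_isolated(code, python=sys.executable, delays=(3, 5, 8),
                    sleep=time.sleep):
    """Run `code` in a subprocess; on failure wait and retry.

    Three delays give up to four attempts in total.
    """
    attempts = len(delays) + 1
    for attempt in range(attempts):
        result = subprocess.run(
            [python, "-c", code], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True
        if attempt < len(delays):
            sleep(delays[attempt])
    return False
```

verify_isolated(TORCH_PROBE) would return True only once a child interpreter can import torch and allocate a CUDA tensor.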

Examples

# Check what GPU and CUDA are detected
gpu-installer --gpu-info

# Diagnose the environment without reinstalling anything
gpu-installer --doctor

# Install only PyTorch and Llama-CPP, forcing CUDA 12.1
gpu-installer torch llamacpp --force-cuda 121

# Simulate a full install for the detected system, e.g. an AMD ROCm machine (dry run)
gpu-installer --dry-run

# Force a clean reinstall of everything with logging
gpu-installer --force-reinstall --log

# Uninstall just PyTorch (torch, torchvision, torchaudio)
gpu-installer --uninstall torch

# Preview removal of the whole ecosystem without executing
gpu-installer --uninstall --dry-run

# Auto-install the system GPU stack (preview first!)
gpu-installer --auto-install-system --dry-run
gpu-installer --auto-install-system --cuda-version 12.4

# Then install the Python packages
gpu-installer

Troubleshooting

"CUDA available but tensor test failed"
This is usually a dynamic linker sync issue immediately after a uv install. The verifier retries automatically. If it persists, run gpu-installer --doctor a few seconds later; it will re-verify without reinstalling.

Llama-CPP installs but the GPU check segfaults
Some prebuilt llama-cpp-python wheels crash on specific GPUs. Run gpu-installer llamacpp --build to compile from source with your GPU's CUDA architecture and cuBLAS. If the build needs extra cmake flags, pass them with --cmake-args "...". The installer also attempts this rebuild automatically when a prebuilt wheel fails verification and a compiler toolchain is present.

"CPU-only version detected"
Your installed torch was built without CUDA. Run gpu-installer --force-reinstall to trigger an auto-upgrade.

CuPy skipped on a machine without a supported GPU
CuPy requires NVIDIA or AMD GPUs. It is intentionally skipped on macOS/MPS and CPU-only environments.

uv not found
The script falls back to pip automatically. Install uv for significantly faster installs: pip install uv.
