Intelligent GPU-accelerated ML package installer for PyTorch, CuPy, and llama-cpp-python

Smart GPU Package Installer

An intelligent, autonomous installer that detects your system, GPU, and CUDA/ROCm configuration to install the optimal ML ecosystem (PyTorch, CuPy, JAX, llama-cpp-python) with zero guesswork. It can also optionally provision the underlying system GPU stack (CUDA toolkit + cuDNN, or ROCm) for you. It builds on the gist at https://gist.github.com/kipavy/2c5acabbff81a410b340464a667a12c4/93a56a0cc6bcca4868bcb7724e0f7f3331b32f02 and on the torch-installer-coff33ninja package (https://pypi.org/project/torch-installer-coff33ninja/).


✨ Features

  • 🔍 Automatic hardware detection — identifies GPU vendor (NVIDIA / AMD / Apple Silicon), driver, and CUDA/ROCm version with no flags.
  • 🧩 Synchronized version matrix — pins torch, torchvision, and torchaudio to compatible triplets, eliminating ABI mismatches.
  • 🌐 Cross-platform — Linux (CUDA & ROCm), Windows (CUDA, with optional auto-install), macOS (MPS), and CPU-only fallback.
  • 📦 Multi-package — installs PyTorch, CuPy, JAX, and llama-cpp-python individually or all at once, with GPU-aware skipping.
  • 🛠️ System stack auto-install (opt-in) — installs the GPU userspace stack (CUDA toolkit + cuDNN, or ROCm) from official vendor repos via apt/dnf/winget/chocolatey. Never touches kernel drivers, and previews the full plan before any sudo step.
  • 🗑️ Clean uninstall: --uninstall removes the selected ecosystem, including stray nvidia-* fat-wheels, for a sterile environment.
  • 🧹 Sterile cross-grades — purges stale nvidia-* fat-wheels and CPU-only builds before reinstalling.
  • Isolated verification — pre- and post-install checks run in a subprocess to defeat sys.modules caching, with retry/backoff for linker sync lag.
  • ♻️ Smart, idempotent reinstall — skips packages already working and only acts when the environment actually needs it.
  • 🚀 Zero-install usage — run on the fly via uvx, or embed programmatically through the dependency-free ensure() API.
  • Faster installs with uv — automatically uses uv when available for significantly faster installs, falling back to pip transparently.
  • 🔬 Dry-run & diagnostics — preview exact commands, inspect detected GPU info, and list supported CUDA wheel versions without touching your environment.

🚀 Quick Start

The fastest path — no install, no manual steps. With uv installed, one command fetches the tool on the fly and installs the right GPU wheels into your active environment:

# Detect GPU/CUDA and install the full ecosystem (torch + cupy + jax + llamacpp)
uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer

# Just PyTorch — space- or comma-separated for a subset
uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer torch
uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer torch,cupy

# Preview only — print the exact commands without running them
uvx --from git+https://gitlab.com/yoanncure/gpu_installer gpu-installer --dry-run

Not on PyPI yet, hence --from git+…. Once published this shortens to just uvx gpu-installer torch. uvx installs into the currently active venv/conda env. If none is active, add --python /path/to/python so it knows where to install.

Prefer a traditional install? Install the CLI once, then call it directly:

pip install git+https://gitlab.com/yoanncure/gpu_installer   # or: uv pip install git+…

gpu-installer                 # full auto-detect + install
gpu-installer --cpu-only      # force CPU-only builds
gpu-installer torch cupy      # a subset (space- or comma-separated)
gpu-installer --dry-run       # preview without executing

Why This Exists

Installing GPU-accelerated ML packages is notoriously fragile. The wrong wheel, a stale nvidia-* package, or a uv/pip index race can silently install a CPU-only build or brick your environment. This script handles all of that:

  • Detects your GPU vendor (NVIDIA / AMD / Apple Silicon) and driver version automatically
  • Pins torch, torchvision, and torchaudio to a synchronized release matrix, so torch and torchvision no longer end up with mismatched ABIs
  • Purges stale nvidia-* fat-wheel residue before any cross-grade
  • Runs pre- and post-install verification in an isolated subprocess to defeat sys.modules caching
  • Retries dynamic linker sync after fast installs (uv) with exponential backoff
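
The isolated-subprocess idea in the list above can be sketched as follows. This is a minimal illustration, not the tool's own code: probe_import is a hypothetical helper, and it only tests importability rather than GPU support.

```python
import subprocess
import sys


def probe_import(module: str, python: str = sys.executable, timeout: float = 60.0) -> bool:
    """Return True if `module` imports cleanly in a brand-new interpreter."""
    # A fresh process starts with an empty sys.modules, so a module that was
    # imported (or failed to import) earlier in the caller cannot skew the result.
    try:
        proc = subprocess.run(
            [python, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0
```

Passing a different interpreter path as python checks another environment without importing anything into the calling process.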

Supported Platforms

| Platform | GPU | Backend | Notes |
|---|---|---|---|
| Linux | NVIDIA | CUDA | Auto-detected via nvcc / nvidia-smi; optional toolkit+cuDNN auto-install via apt/dnf |
| Linux | AMD | ROCm | Auto-detected via rocminfo / rocm-smi; optional ROCm userspace auto-install via apt/dnf |
| Windows | NVIDIA | CUDA | Optional auto-install via winget / chocolatey |
| macOS | Apple | MPS (Metal) | Auto-detected, no CUDA needed |
| Any | - | CPU-only | Use --cpu-only |

Requirements

  • Python 3.8+
  • uv (recommended, falls back to pip automatically)
  • NVIDIA drivers / ROCm stack already installed for GPU builds

Shell Completion (optional)

Tab-completion for package names and flags is provided via argcomplete. It's an optional extra — the core tool stays dependency-free.

# Install with the completion extra
pip install "gpu-installer[completion]"

# Activate for the current shell (bash / zsh)
eval "$(register-python-argcomplete gpu-installer)"

# Fish
register-python-argcomplete --shell fish gpu-installer | source

Add the eval line to your ~/.bashrc / ~/.zshrc to make it permanent. For fish, write it to a completions file instead:

register-python-argcomplete --shell fish gpu-installer > ~/.config/fish/completions/gpu-installer.fish

Then gpu-installer <TAB> completes torch, cupy, jax, and llamacpp.

Colored Help (optional)

Colorized --help output is provided via rich-argparse. Like completion, it's an optional extra — the core tool stays dependency-free and falls back to plain help when it isn't installed.

pip install "gpu-installer[color]"

Use in Another Project

Other projects can install their GPU dependencies through gpu-installer without hooking into pip install (wheels have no reliable install-time hook). Trigger it explicitly — from a setup command, a first-run guard, or your app's startup.

Zero permanent dependency (recommended). Shell out via uvx, passing your own interpreter so the target is unambiguous. gpu-installer is fetched and run on the fly — never added to your dependency tree:

import subprocess, sys

subprocess.run(
    [
        "uvx", "--from", "git+https://gitlab.com/yoanncure/gpu_installer",
        "gpu-installer", "--python", sys.executable, "torch", "cupy",
    ],
    check=True,
)
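
A first-run guard around that call might look like the sketch below. The function names are hypothetical, and the guard only checks that each package is importable, which does not prove GPU support. It also assumes the CLI name matches the import name, which holds for torch, cupy, and jax but not for llamacpp (imported as llama_cpp).

```python
import importlib.util
import subprocess
import sys

REPO = "git+https://gitlab.com/yoanncure/gpu_installer"


def build_command(packages, python=sys.executable):
    """Build the uvx invocation shown above for the given package names."""
    return ["uvx", "--from", REPO, "gpu-installer", "--python", python, *packages]


def ensure_on_first_run(packages=("torch", "cupy")):
    """Hypothetical guard: shell out only when a requested package is missing."""
    # find_spec returns None for a top-level module that cannot be found.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    if missing:
        subprocess.run(build_command(missing), check=True)
    return missing
```

Because gpu-installer itself skips packages that already work, the guard is only an optimization that avoids launching uvx on every startup.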

Programmatic API (if you accept the tiny zero-dep dependency). Add gpu-installer to your deps and call ensure() — it is idempotent (skips packages already working) and returns a structured result:

from gpu_installer import ensure

result = ensure(["torch", "cupy"])   # installs into the active environment
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.failed}")

ensure() accepts a list or a comma-separated string of package names, plus the keyword arguments python=, cpu_only=, force_cuda=, force_reinstall=, dry_run=, and quiet=. It returns an EnsureResult with target_python, cuda, installed, skipped, failed, errors, and an ok flag. errors is a {package: reason} map for anything that failed; each reason carries the installer's captured error text or the GPU probe's real failure message, not just a category.

Pass quiet=True to keep gpu-installer's own status output off your stdout — it's routed to the gpu_installer logger instead, so configure that logger to capture it (the underlying uv/pip install still streams its progress):

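import logging

from gpu_installer import ensure

# Attach a handler so messages routed to the gpu_installer logger stay visible
logging.getLogger("gpu_installer").addHandler(logging.StreamHandler())

result = ensure(["torch", "cupy"], quiet=True)
if not result.ok:
    raise RuntimeError(f"GPU deps failed: {result.errors}")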

CUDA / PyTorch Compatibility Matrix

The script uses a strict pinned matrix to prevent rolling-release index desyncs where torch and torchvision resolve to incompatible builds:

| CUDA Version | PyTorch | TorchVision | TorchAudio |
|---|---|---|---|
| 11.6 | 2.1.2 | 0.16.2 | 2.1.2 |
| 11.7 | 2.2.2 | 0.17.2 | 2.2.2 |
| 11.8 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.1 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.4 | 2.5.1 | 0.20.1 | 2.5.1 |
| 12.6 | 2.11.0 | 0.26.0 | 2.11.0 |
| 12.8 | 2.11.0 | 0.26.0 | 2.11.0 |
| 13.0 | 2.11.0 | 0.26.0 | 2.11.0 |

If your exact CUDA version isn't listed, the script automatically picks the highest listed version that does not exceed yours.
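
The fallback rule can be illustrated with a small lookup over the matrix. This is a sketch under the assumption that "compatible" simply means "not newer than the detected version"; MATRIX, parse_cuda, and pick_pins are hypothetical names, not the tool's API.

```python
# Copied from the matrix above: CUDA version -> (torch, torchvision, torchaudio).
MATRIX = {
    (11, 6): ("2.1.2", "0.16.2", "2.1.2"),
    (11, 7): ("2.2.2", "0.17.2", "2.2.2"),
    (11, 8): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 1): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 4): ("2.5.1", "0.20.1", "2.5.1"),
    (12, 6): ("2.11.0", "0.26.0", "2.11.0"),
    (12, 8): ("2.11.0", "0.26.0", "2.11.0"),
    (13, 0): ("2.11.0", "0.26.0", "2.11.0"),
}


def parse_cuda(version: str) -> tuple:
    """Turn "12.4" (or "12.4.1") into the comparable tuple (12, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_pins(detected: str):
    """Return (cuda_key, pins) for the highest listed version <= detected."""
    target = parse_cuda(detected)
    candidates = [key for key in MATRIX if key <= target]
    if not candidates:
        return None  # detected CUDA is older than every listed version
    best = max(candidates)
    return best, MATRIX[best]
```

For example, a detected CUDA 12.5 is not in the table, so the lookup falls back to the 12.4 row.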


CLI Reference

Install Options

| Argument | Default | Description |
|---|---|---|
| [PACKAGE ...] | all | Packages to install, space- or comma-separated (torch, cupy, jax, llamacpp) |
| -p, --python PATH | active env | Target interpreter to install into (active venv/conda env by default) |
| --cpu-only | off | Force CPU-only builds for all packages |
| --force-cuda VER | auto | Override detected CUDA version (e.g. 121, cu12.1) |
| --force-reinstall | off | Force reinstall even if already detected as working |
| -u, --uninstall | off | Uninstall the selected packages' distributions (default: all). Mutually exclusive with --force-reinstall |
| -n, --dry-run | off | Print all commands that would run, without executing |
| -l, --log | off | Tee all output to a timestamped log file |
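
How the positional [PACKAGE ...] argument might be normalized is sketched below. parse_packages is a hypothetical helper; rejecting unknown names and dropping duplicates are assumptions about reasonable behavior, not documented features.

```python
KNOWN = ("torch", "cupy", "jax", "llamacpp")


def parse_packages(args):
    """Flatten space- or comma-separated names; an empty selection means all."""
    names = []
    for arg in args:
        for name in arg.split(","):
            name = name.strip()
            if not name:
                continue  # tolerate stray commas such as "torch,"
            if name not in KNOWN:
                raise ValueError(f"unknown package: {name!r}")
            if name not in names:
                names.append(name)
    return names or list(KNOWN)
```

Both gpu-installer torch cupy and gpu-installer torch,cupy would then resolve to the same selection.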

Diagnostic / Info Commands

| Flag | Description |
|---|---|
| --gpu-info | Show detected GPU model, VRAM, CUDA version, and upgrade guidance |
| -d, --doctor | Diagnose the active environment: run GPU verification probes on each package |
| --list-cuda | List all supported CUDA wheel versions |
| --show-matching | Show which wheel version your detected CUDA maps to |

System Stack Auto-Install (opt-in)

Installs the GPU userspace stack from official vendor repos. Never touches kernel drivers. Privileged steps run via sudo after showing the full plan.

| Flag | Description |
|---|---|
| --auto-install-system | Install the system stack: CUDA toolkit + cuDNN (NVIDIA) or ROCm (AMD) on Linux; CUDA via winget/chocolatey on Windows |
| --cuda-version VER | Toolkit version for auto-install (e.g. 12.4) |
| --yes / -y | Skip the confirmation prompt (scripted / CI use) |

Linux coverage: apt (Ubuntu/Debian) and dnf (RHEL/Fedora/Rocky). Always preview with --dry-run first.


Packages Installed

PyTorch Ecosystem (torch)

Installs torch, torchvision, and torchaudio as a pinned, synchronized triplet from the official PyTorch wheel index. Handles CUDA, ROCm, MPS, and CPU targets.

CuPy (cupy)

Installs the appropriate cupy-cuda{major}x wheel based on your detected CUDA major version. Skipped automatically on CPU-only or MPS systems.

JAX (jax)

Installs jax[cuda12] or jax[cuda13] keyed to your detected CUDA major version (the wheels bundle their own CUDA redistributables, so only a recent NVIDIA driver is required). Fully standalone — it does not depend on your installed torch. Plain jax is a real CPU build, so --cpu-only (and macOS, where JAX has no Metal backend) installs it like llama-cpp rather than skipping. CUDA wheels are Linux-only, so the GPU path is skipped on Windows (use WSL2 or --cpu-only) and on ROCm (experimental, local-build-only upstream).
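
The CuPy and JAX selection rules described above can be condensed into two small functions. This is a sketch: the function names and the backend labels ("cuda", "rocm", "mps", "cpu") are hypothetical, and treating any CUDA major other than 12 or 13 as "skip" is an assumption, since the text only names those two extras.

```python
def cupy_requirement(cuda_version: str) -> str:
    """Map a CUDA version such as "12.4" to the cupy-cuda{major}x wheel name."""
    major = cuda_version.split(".")[0]
    return f"cupy-cuda{major}x"


def jax_requirement(system: str, backend: str, cuda_major=None, cpu_only=False):
    """Return a pip requirement for JAX, or None when the GPU path is skipped."""
    if cpu_only or backend in ("cpu", "mps"):
        return "jax"  # plain jax is a real CPU build
    if backend == "rocm":
        return None  # ROCm path is skipped
    if backend == "cuda" and system == "Windows":
        return None  # CUDA wheels are Linux-only
    if backend == "cuda" and cuda_major in (12, 13):
        return f"jax[cuda{cuda_major}]"
    return None  # assumption: other CUDA majors have no matching extra
```

A None result stands for "skip the GPU path", after which the caller can suggest WSL2 or --cpu-only as the text describes.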

Llama-CPP-Python (llamacpp)

Installs llama-cpp-python with GPU offload support via the abetlen pre-built wheel index. Falls back to a CPU build if no CUDA/ROCm is detected.


Adding a Package

Each GPU package is a GpuPackage subclass in src/gpu_installer/packages/. To add one:

  1. Create src/gpu_installer/packages/<name>.py with a class implementing install(self, plan) -> Outcome and verify(self, python=None) -> bool (optionally override preflight and declare a purge_names list).
  2. Register it in the _PACKAGES tuple in packages/__init__.py.

That's the whole change — the CLI, ensure(), and the purge logic pick it up automatically.


How Detection Works

nvidia-smi / nvcc          →  CUDA version
rocminfo / rocm-smi        →  ROCm version
platform.system() Darwin   →  Apple MPS
(none found)               →  CPU-only

GPU model and VRAM are also parsed from nvidia-smi / rocminfo output for upgrade guidance.
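
The lookup order above can be sketched in a few lines. This version only checks whether the tools exist on PATH; parsing the CUDA/ROCm version, GPU model, and VRAM from their output is left out. detect_backend and its injectable which/system parameters are hypothetical names.

```python
import platform
import shutil


def detect_backend(which=shutil.which, system=platform.system) -> str:
    """Return "cuda", "rocm", "mps", or "cpu" using the order listed above."""
    if which("nvidia-smi") or which("nvcc"):
        return "cuda"
    if which("rocminfo") or which("rocm-smi"):
        return "rocm"
    if system() == "Darwin":
        return "mps"
    return "cpu"
```

Taking which and system as parameters keeps the function testable on machines that have none of these tools installed.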


Smart Reinstall Logic

The script avoids unnecessary reinstalls by checking the current state before acting:

| Detected State | Action |
|---|---|
| Not installed | Install |
| CPU-only build, GPU requested | Auto-purge + upgrade |
| CUDA broken / ABI mismatch | Auto-purge + reinstall |
| Already working (CUDA tensor test passes) | Skip |
| --force-reinstall passed | Always purge + reinstall |
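
The same decision table can be written as a small function. The State enum and decide are hypothetical names, and skipping a CPU-only build when no GPU build was requested is an assumption the table does not spell out.

```python
from enum import Enum


class State(Enum):
    NOT_INSTALLED = "not installed"
    CPU_ONLY = "cpu-only build"
    BROKEN = "cuda broken / abi mismatch"
    WORKING = "already working"


def decide(state: State, gpu_requested: bool = True, force_reinstall: bool = False) -> str:
    """Map a detected state to the action listed in the table above."""
    if force_reinstall:
        return "purge + reinstall"  # the flag overrides every detected state
    if state is State.NOT_INSTALLED:
        return "install"
    if state is State.BROKEN:
        return "purge + reinstall"
    if state is State.CPU_ONLY and gpu_requested:
        return "purge + upgrade"
    return "skip"
```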

The purge step removes torch, torchvision, torchaudio, cupy, jax, jaxlib (plus its jax-cuda* plugins), llama-cpp-python, and all nvidia-* fat-wheel packages scraped from pip freeze to ensure a sterile environment before reinstalling.
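
Collecting purge targets from pip freeze output might look like the sketch below. purge_targets is a hypothetical helper and the name-matching rules are simplified from the description above; in particular, matching installed cupy-cuda* distributions by prefix is an assumption.

```python
import re

BASE_NAMES = {
    "torch", "torchvision", "torchaudio",
    "cupy", "jax", "jaxlib", "llama-cpp-python",
}
# "nvidia-" and "jax-cuda" come from the text; "cupy-cuda" is an assumption.
PREFIXES = ("nvidia-", "jax-cuda", "cupy-cuda")


def purge_targets(freeze_output: str) -> list:
    """Return the distribution names from `pip freeze` that a purge would remove."""
    names = []
    for line in freeze_output.splitlines():
        # Lines look like "name==1.2.3" or "name @ file:///..."; editable
        # installs start with "-e" and are ignored by this pattern.
        match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)", line.strip())
        if not match:
            continue
        name = match.group(1).lower().replace("_", "-")
        if name in BASE_NAMES or name.startswith(PREFIXES):
            names.append(name)
    return names
```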


Post-Install Verification

After installation, the script runs isolated subprocess checks for each package:

  • PyTorch: imports torch, checks cuda.is_available(), runs a .cuda() tensor allocation test. Retries up to 4 times with backoff (3s / 5s / 8s) to handle dynamic linker sync lag after uv installs.
  • CuPy: imports cupy, calls cupy.cuda.runtime.getDeviceCount().
  • JAX: imports jax, inspects jax.devices(), and — when a gpu backend is present — runs a small device computation (block_until_ready()) to confirm acceleration (a CPU build passes too). Retries with backoff like PyTorch/CuPy to ride out linker sync lag.
  • Llama-CPP: calls llama_supports_gpu_offload() to confirm hardware acceleration.
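
The retry behavior can be sketched as a loop around any verification probe. This reads "up to 4 times with backoff (3s / 5s / 8s)" as four attempts with three waits between them, which is an interpretation; verify_with_retry and its parameters are hypothetical names.

```python
import time


def verify_with_retry(probe, delays=(3, 5, 8), sleep=time.sleep) -> bool:
    """Call probe() up to len(delays) + 1 times, sleeping between failures."""
    if probe():
        return True
    for delay in delays:
        sleep(delay)  # give the dynamic linker time to see the new libraries
        if probe():
            return True
    return False
```

The probe is any zero-argument callable returning True on success, such as a function that launches the subprocess check for one package.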

Examples

# Check what GPU and CUDA are detected
gpu-installer --gpu-info

# Diagnose the environment without reinstalling anything
gpu-installer --doctor

# Install only PyTorch and Llama-CPP, forcing CUDA 12.1
gpu-installer torch llamacpp --force-cuda 121

# Preview a full install without executing anything (works on any backend, including AMD ROCm)
gpu-installer --dry-run

# Force a clean reinstall of everything with logging
gpu-installer --force-reinstall --log

# Uninstall just PyTorch (torch, torchvision, torchaudio)
gpu-installer --uninstall torch

# Preview removal of the whole ecosystem without executing
gpu-installer --uninstall --dry-run

# Auto-install the system GPU stack (preview first!)
gpu-installer --auto-install-system --dry-run
gpu-installer --auto-install-system --cuda-version 12.4

# Then install the Python packages
gpu-installer

Troubleshooting

"CUDA available but tensor test failed": This is usually a dynamic linker sync issue immediately after a uv install. The verifier retries automatically. If it persists, run gpu-installer --doctor a few seconds later; it re-verifies without reinstalling.

"CPU-only version detected": Your installed torch was built without CUDA. Run gpu-installer --force-reinstall to trigger an auto-upgrade.

CuPy skipped on a non-CUDA machine: CuPy requires NVIDIA or AMD GPUs. It is intentionally skipped on macOS/MPS and CPU-only environments.

uv not found: The script falls back to pip automatically. Install uv for significantly faster installs: pip install uv.
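
The uv-to-pip fallback can be sketched as a command builder. This is an illustration, not the tool's exact invocation: install_command is a hypothetical helper, and any extra flags the installer passes (index URLs, for example) are omitted.

```python
import shutil
import sys


def install_command(requirements, python=sys.executable, which=shutil.which):
    """Prefer `uv pip install` when uv is on PATH, otherwise use pip."""
    if which("uv"):
        # --python tells uv which interpreter's environment to install into.
        return ["uv", "pip", "install", "--python", python, *requirements]
    return [python, "-m", "pip", "install", *requirements]
```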
