gpumod
GPU Service Manager for ML workloads on Linux/NVIDIA systems.
gpumod manages vLLM, llama.cpp, FastAPI, and Docker-based inference services on NVIDIA GPUs. It tracks VRAM allocation, supports mode-based service switching, provides VRAM simulation before deployment, and exposes an MCP server for AI assistant integration.
Features
- Service Management -- Register, start, stop, and monitor GPU services with support for vLLM, llama.cpp, FastAPI, and Docker drivers
- Mode Switching -- Define named modes (e.g., "chat", "coding") that bundle services together and switch between them
- VRAM Simulation -- Simulate VRAM usage for any configuration before deployment, with alternative suggestions when capacity is exceeded
- Model Registry -- Track ML models with metadata from HuggingFace Hub or GGUF files, with automatic VRAM estimation
- MCP Server -- Expose GPU management as an MCP server for Claude Code, Cursor, Claude Desktop, and other MCP-compatible AI assistants
- Template Engine -- Generate and install systemd unit files from Jinja2 templates, customized per driver type
- AI Planning -- LLM-assisted VRAM allocation suggestions (advisory only)
- Interactive TUI -- Terminal dashboard with live GPU status
- Rich CLI -- Beautiful output with tables, VRAM bar charts, and JSON mode
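The automatic VRAM estimation mentioned above can be approximated by hand. Below is a generic back-of-envelope sketch (quantized weights plus KV cache), not gpumod's actual estimator; all parameter names and defaults are illustrative assumptions:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     ctx_len: int = 8192, n_layers: int = 32,
                     kv_dim: int = 4096, kv_bytes: int = 2) -> float:
    """Rough VRAM estimate: quantized weights + KV cache.

    A generic rule of thumb, not gpumod's internal estimator.
    """
    # Weights: parameter count (billions) at the given quantization width
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: two tensors (K and V) per layer, per context token
    kv_gb = 2 * n_layers * ctx_len * kv_dim * kv_bytes / 1e9
    return weights_gb + kv_gb

# e.g. a 7B model at 4.5 bits/weight with an 8k context
print(round(estimate_vram_gb(7, 4.5), 1))  # ~8.2 GB
```

Real estimates also depend on activation memory and runtime overhead, which is why simulating before deployment is useful.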
Installation
Requires uv, Python >= 3.12, Linux with an NVIDIA GPU, and nvidia-smi in PATH.
```bash
git clone https://github.com/jaigouk/gpumod.git
cd gpumod
uv sync

# Install globally so `gpumod` is always on your PATH
uv tool install -e .
```
Quick Start
```bash
# Initialize database and load presets
gpumod init

# Check GPU status
gpumod status

# List services
gpumod service list
```
Deploying a Service
gpumod auto-generates systemd unit files from presets; no manual unit files needed.
```bash
# Enable user-level systemd lingering (one-time setup)
sudo loginctl enable-linger $USER

# Preview the generated unit file
gpumod template generate vllm-chat

# Install it to ~/.config/systemd/user/
gpumod template install vllm-chat --yes

# Start the service (uses systemctl --user, no sudo needed)
gpumod service start vllm-chat
```
See the Getting Started guide for full setup instructions.
Mode Switching
Modes bundle services together and fit them within your VRAM budget.
```bash
# Simulate VRAM usage before switching
gpumod simulate mode coding-mode

# Switch modes (starts/stops services automatically)
gpumod mode switch coding-mode

# Launch interactive TUI
gpumod tui
```
MCP Integration
gpumod exposes 16 tools and 8 resources via the Model Context Protocol. Add it to your IDE to let AI assistants query GPU status, simulate VRAM, switch modes, discover models on HuggingFace, and consult an RLM-based reasoning engine for complex questions like "Can I run Qwen3-235B on 24GB?".
```json
{
  "mcpServers": {
    "gpumod": {
      "command": "uv",
      "args": ["--directory", "/path/to/gpumod", "run", "python", "-m", "gpumod.mcp_main"],
      "env": {
        "OTEL_SDK_DISABLED": "true"
      }
    }
  }
}
```
Important: gpumod depends on opentelemetry. Without `OTEL_SDK_DISABLED=true`, the SDK may print a startup message to stdout, which corrupts the JSON-RPC stream and causes MCP clients (Hermes, Claude Code, etc.) to fail with `Failed to parse JSONRPC message from server`.
See MCP Integration for setup instructions for Claude Code, Cursor, Claude Desktop, and Antigravity.
Configuration
All settings are configurable via environment variables with the GPUMOD_ prefix. A .env.example file is included in the repository root; copy it to .env and uncomment the variables you want to override.
Key settings include preflight thresholds (RAM/VRAM), LLM backend configuration, database path, and MCP rate limits. See Configuration for the full list.
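For illustration, a .env override might look like the following. The variable names below are hypothetical examples chosen to match the categories listed above; consult .env.example for the actual names:

```bash
# Hypothetical example values; see .env.example for the real variable names
GPUMOD_DB_PATH=~/.local/share/gpumod/gpumod.db
GPUMOD_VRAM_THRESHOLD=0.9
```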
Security
Input validation at every boundary, error sanitization, rate limiting,
parameterized queries, sandboxed templates, and no shell=True. See
Security for the full threat model.
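The "no shell=True" point refers to the standard Python pitfall: passing a command string through a shell lets metacharacters in user input become commands, while an argument list does not. A generic illustration (not gpumod code):

```python
import subprocess

user_input = "hello; touch /tmp/pwned"

# Unsafe pattern: a shell would interpret ';' as a command separator
# subprocess.run(f"echo {user_input}", shell=True)

# Safe pattern: argument list, no shell involved, so the whole
# string is passed to the program as one literal argument
out = subprocess.run(["echo", user_input], capture_output=True, text=True)
print(out.stdout.strip())  # prints "hello; touch /tmp/pwned" as plain text
```

With the list form, `echo` receives the hostile string as data rather than the shell executing `touch` as a second command.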
Documentation
| Document | Description |
|---|---|
| CLI Reference | All commands: status, service, mode, simulate, model, template, plan, tui |
| MCP Integration | MCP server setup for Claude Code, Cursor, Claude Desktop, Antigravity |
| Configuration | Environment variables, LLM backends, settings |
| AI Planning | LLM-assisted VRAM allocation planning |
| Architecture | System design and component overview |
| Security | Threat model, input validation, security controls |
| Benchmarks | LLM benchmark framework and results |
| Contributing | Development setup, tests, code quality, PR process |
License
Apache License 2.0. See LICENSE for details.
Copyright 2026 Jaigouk Kim