gpu-container

Model-aware inference memory-placement planner for single-GPU rigs — profile, plan, prove.


Project description

A GPU-enabled container exposes the device. A model-aware runtime decides what lives in VRAM, pinned RAM, and NVMe.

Run the largest useful local model your machine can honestly support, with explicit placement plans, benchmark receipts, and an outright refusal when a plan would thrash.

Architecture

Windows / WSL2 / Linux host
  └─ GPU-enabled Docker container
      └─ Inference runtime
          ├─ VRAM: hot weights, active layers, activations, KV working set
          ├─ pinned RAM: CPU-offloaded weights, MoE experts, KV spill/reuse
          └─ NVMe: mmap shards, disk offload, cold experts, cold KV
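The tier split above can be made concrete with a minimal greedy sketch. This is an illustration, not gpu-container's actual planner. Tensors flagged as hot (a hypothetical `hot` attribute) are placed first, largest first, into the fastest tier that still has room, and anything that fits in neither VRAM nor pinned RAM stays memory-mapped on NVMe. The `Tensor` type and `place` function are hypothetical. The sketch also treats NVMe as unbounded and ignores the VRAM headroom that activations and the KV working set need; both are simplifying assumptions.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

@dataclass
class Tensor:
    name: str
    size_bytes: int
    hot: bool  # hypothetical flag: touched on every token (attention, shared layers)

def place(tensors, vram_bytes, ram_bytes):
    """Greedy placement: hot tensors first, biggest first, into the fastest tier with room."""
    free = {"vram": vram_bytes, "pinned_ram": ram_bytes}  # tiers in speed order
    plan = {}
    for t in sorted(tensors, key=lambda t: (not t.hot, -t.size_bytes)):
        for tier, room in free.items():
            if t.size_bytes <= room:
                free[tier] = room - t.size_bytes  # reserve the space in this tier
                plan[t.name] = tier
                break
        else:
            plan[t.name] = "nvme"  # no room in memory: leave it mmap'd on disk
    return plan

plan = place(
    [
        Tensor("attn", 4 * GIB, hot=True),
        Tensor("experts_0", 6 * GIB, hot=False),
        Tensor("experts_1", 6 * GIB, hot=False),
        Tensor("embed", 1 * GIB, hot=True),
    ],
    vram_bytes=8 * GIB,
    ram_bytes=6 * GIB,
)
print(plan)
```

With 8 GiB of VRAM and 6 GiB of pinned RAM, both hot tensors land in VRAM, the first expert block fills pinned RAM, and the second falls through to NVMe.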

Product Boundary

Docker           = packaging + GPU exposure
CUDA/runtime     = compute backend
Planner          = memory law
Inference engine = execution

Core Features

Hardware profiler — Detect VRAM, RAM, GPU type, WSL/native Linux, NVMe speed, CUDA availability
Model profiler — Detect dense vs MoE, largest layer, total weights, quantization, KV growth by context length
Runtime planner — Generate launch plans for llama.cpp, vLLM, Accelerate, TensorRT-LLM, or DeepSpeed-style offload
Placement receipt — Show what is in VRAM, what is in RAM, what is on disk, expected bottleneck, measured tokens/sec
MoE-specialized path — Keep always-active layers on GPU, route experts to CPU/RAM, NVMe for cold fallback
Routing de-risk — Measure whether a model's MoE routing is skewed enough for a per-expert cache to help, before investing in building one (gpu-container-concentration)
Rig-safety watchdog — Poll GPU power, temperature, and VRAM plus host memory against configurable thresholds, so an AI agent or autonomous loop can abort a run before it endangers the machine (gpu-container-watchdog)
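One way to frame the routing de-risk question is to ask how much of the routing traffic the k most popular experts absorb, compared with uniform routing. The sketch below uses that framing. It is a hypothetical statistic, not necessarily the one gpu-container-concentration computes (see docs/derisk-concentration.md). The input is a flat trace of selected expert IDs; with top-k routing, each token contributes several entries. The function names and the 2.0 cutoff are illustrative assumptions.

```python
from collections import Counter

def top_k_coverage(expert_ids, k):
    """Fraction of routing decisions that land on the k most frequently chosen experts."""
    if not expert_ids:
        raise ValueError("empty routing trace")
    counts = Counter(expert_ids)
    return sum(c for _, c in counts.most_common(k)) / len(expert_ids)

def concentration_lift(expert_ids, num_experts, k):
    """Top-k coverage divided by what uniform routing would give (k / num_experts).

    1.0 means no skew; larger values mean a k-slot expert cache would absorb
    proportionally more of the routing traffic.
    """
    return top_k_coverage(expert_ids, k) / (k / num_experts)

def cache_worth_building(expert_ids, num_experts, k, min_lift=2.0):
    # Hypothetical gate: pursue a per-expert cache only if the top-k experts
    # carry at least min_lift times their uniform share of the traffic.
    return concentration_lift(expert_ids, num_experts, k) >= min_lift

trace = [0, 0, 0, 0, 1, 1, 2, 3]  # toy trace over 8 experts
print(top_k_coverage(trace, 2), concentration_lift(trace, 8, 2))
```

For the toy trace, the two busiest experts serve 6 of 8 selections, three times their uniform share, so a 2-slot cache would pass this illustrative gate.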

Key Constraint

On Windows/WSL, CUDA Unified Memory oversubscription is not an option. CUDA's Unified Memory support on Windows/WSL is limited: there is no fine-grained GPU page-fault migration and no oversubscription of GPU memory beyond physical VRAM. This product does explicit inference memory placement; it is not "Docker VRAM overflow."
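Because nothing can spill past physical VRAM, a plan has to be checked against a hard budget before launch. The sketch below is illustrative. It sizes the KV cache with the standard formula for full-context attention at batch size 1: 2 (K and V) × layers × KV heads × head dimension × context length × bytes per element. It then refuses any plan whose GPU-resident total exceeds physical VRAM minus a headroom margin. The function names, the activation estimate passed in by the caller, and the 0.5 GiB default headroom are assumptions. Sliding-window or latent-attention models cache less than this formula gives.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size for one sequence: K and V (the factor 2) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def vram_slack(weights_on_gpu, kv_on_gpu, activations, physical_vram, headroom=GIB // 2):
    """Return spare VRAM in bytes, or raise if the plan exceeds the usable budget.

    There is no oversubscription fallback: the GPU cannot page beyond physical
    VRAM, so an over-budget plan is refused instead of launched.
    """
    need = weights_on_gpu + kv_on_gpu + activations
    budget = physical_vram - headroom
    if need > budget:
        raise ValueError(
            f"plan needs {need / GIB:.2f} GiB of VRAM, usable budget is {budget / GIB:.2f} GiB"
        )
    return budget - need

# 32 layers, 8 KV heads, head_dim 128, 8192-token context, fp16 cache: exactly 1 GiB.
kv = kv_cache_bytes(32, 8, 128, 8192)
print(vram_slack(9 * GIB, kv, GIB // 2, physical_vram=12 * GIB) / GIB)
```

Raising instead of returning a negative number keeps the refusal explicit: the caller has to move weights or KV to a slower tier, or shrink the context, before anything launches.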

Status

Built and working today: gpu-container-profile, gpu-container-plan, gpu-container-receipt (with the recalibration loop), gpu-container-concentration (routing de-risk), and gpu-container-watchdog (supervise a GPU job safely). llama.cpp is the integrated backend; the placement math is backend-agnostic. Start with the quickstart.
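Because the placement math is backend-agnostic, a plan eventually has to be rendered into one backend's launch flags. Here is a hedged sketch for llama.cpp. `-m`, `-c`, `-ngl`, and `--override-tensor` are real `llama-server` options, but the function, its parameters, and this mapping from a placement decision to flags are hypothetical. Flag availability also varies between llama.cpp builds, so check `llama-server --help` for yours. The override pattern pins GGUF tensors whose names match `.ffn_*_exps.` (the MoE expert FFN weights) to CPU memory, which matches the MoE-specialized path described above.

```python
import shlex

def llama_server_args(model_path, ctx_size, n_gpu_layers, experts_on_cpu=False):
    """Render a (hypothetical) placement decision as llama-server arguments."""
    args = [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),        # context length the KV cache is sized for
        "-ngl", str(n_gpu_layers),  # number of layers offloaded to the GPU
    ]
    if experts_on_cpu:
        # Keep MoE expert FFN tensors (e.g. blk.0.ffn_up_exps.weight) in host RAM
        # while attention and shared weights stay on the GPU.
        args += ["--override-tensor", r"\.ffn_.*_exps\.=CPU"]
    return args

# A large -ngl value is a common way to ask for every layer on the GPU.
print(shlex.join(llama_server_args("model.gguf", 8192, 99, experts_on_cpu=True)))
```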

Privacy & safety

gpu-container is a local, offline tool — it makes no network calls and collects no telemetry, by default or otherwise. It reads GPU metrics (nvidia-smi / NVML) and host memory (psutil), the model config.json you supply, and the JSON files you point it at; it writes only to the output paths you specify. It does not read or transmit model weights, credentials, or tokens. Host-level actions (wsl --shutdown, docker stop, kill) run only when you explicitly opt in via the watchdog's --on-breach; the defaults never touch your machine beyond the job they supervise. Full policy: SECURITY.md.
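The watchdog's core decision can be sketched as a pure threshold check over one sample of readings. In this illustration, the sample's field names and the default limits are hypothetical. In practice the readings would come from nvidia-smi / NVML and psutil, as described above, and the limits should be chosen for your own GPU and cooling. The function only reports breaches. Choosing the response is left to the caller, which mirrors the opt-in design where host-level actions need `--on-breach`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    # Illustrative defaults only; tune for your own hardware.
    max_gpu_temp_c: float = 83.0
    max_gpu_power_w: float = 300.0
    max_vram_used_frac: float = 0.97
    min_host_free_bytes: int = 2 * 1024 ** 3

def breaches(sample, limits=Limits()):
    """Return the names of every threshold the sample violates (empty list means healthy)."""
    found = []
    if sample["gpu_temp_c"] > limits.max_gpu_temp_c:
        found.append("gpu_temp_c")
    if sample["gpu_power_w"] > limits.max_gpu_power_w:
        found.append("gpu_power_w")
    if sample["vram_used_bytes"] / sample["vram_total_bytes"] > limits.max_vram_used_frac:
        found.append("vram_used")
    if sample["host_free_bytes"] < limits.min_host_free_bytes:
        found.append("host_free")
    return found

healthy = {
    "gpu_temp_c": 70.0,
    "gpu_power_w": 200.0,
    "vram_used_bytes": 10 * 1024 ** 3,
    "vram_total_bytes": 12 * 1024 ** 3,
    "host_free_bytes": 8 * 1024 ** 3,
}
print(breaches(healthy))
```

Keeping the check free of side effects makes it easy to poll in a loop and to test without a GPU.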

Documentation

docs/quickstart.md — end-to-end walkthrough: profile → plan → launch under the watchdog → receipt → recalibrate
docs/cli.md — the five commands: synopsis, flags, exit codes, worked examples
docs/architecture.md — memory-tier model, data flow, MoE expert routing, the recalibration loop
docs/features.md — the seven core features in depth
docs/moe-lane-architecture.md — the flagship MoE lane in depth
docs/derisk-concentration.md — the per-expert-cache de-risk gate (routing concentration)
docs/decisions/0001-per-expert-cache-build-vs-upstream.md — ADR-0001: consume the cache mechanism, contribute the policy
docs/constraints.md — non-goals + the Windows/WSL CUDA Unified-Memory correction
docs/prior-art.md — runtimes we orchestrate, and the gap this product fills
docs/feasibility.md — feasibility assessment, research grounding, and what's confirmed live

Built by MCP Tool Shop · MIT Licensed


Release history

0.1.3 (this version): Jun 4, 2026
0.1.2: Jun 4, 2026
0.1.1: Jun 4, 2026
0.1.0: Jun 4, 2026

Download files


Source Distribution

gpu_container-0.1.3.tar.gz (1.3 MB)

Uploaded Jun 4, 2026 Source

Built Distribution


gpu_container-0.1.3-py3-none-any.whl (65.4 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file gpu_container-0.1.3.tar.gz.

File metadata

Download URL: gpu_container-0.1.3.tar.gz
Upload date: Jun 4, 2026
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gpu_container-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`2267f0b127b040a04c2bb783b1302754c584339e27c33473116a13b7a75f081f`
MD5	`4fc3d1acc7e772702b4fa768c98d4325`
BLAKE2b-256	`fa065c2b1f98b1537a22a3f6990a8db6d89af0649134ff74d071cbb30209c879`
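To check a downloaded sdist against the SHA256 digest listed above, stream the file through Python's standard `hashlib`. The helper names here are illustrative. The same pattern works for the wheel with its own digest.

```python
import hashlib

# SHA256 digest published above for gpu_container-0.1.3.tar.gz.
EXPECTED_SDIST_SHA256 = "2267f0b127b040a04c2bb783b1302754c584339e27c33473116a13b7a75f081f"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large downloads never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Call `verify("gpu_container-0.1.3.tar.gz")` after downloading. To enforce the same check at install time, you can use pip's hash-checking mode. Add a requirements line of the form `gpu-container==0.1.3 --hash=sha256:<digest>`, and list both the sdist and wheel digests so either file is accepted.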


Provenance

The following attestation bundles were made for gpu_container-0.1.3.tar.gz:

Publisher: release.yml on mcp-tool-shop-org/gpu-container

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gpu_container-0.1.3.tar.gz
- Subject digest: 2267f0b127b040a04c2bb783b1302754c584339e27c33473116a13b7a75f081f
- Sigstore transparency entry: 1724724957
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: mcp-tool-shop-org/gpu-container@d16a0229b78b2e8dbeebd8a0f3280bc440fb1ad5
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/mcp-tool-shop-org
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d16a0229b78b2e8dbeebd8a0f3280bc440fb1ad5
- Trigger Event: release

File details

Details for the file gpu_container-0.1.3-py3-none-any.whl.

File metadata

Download URL: gpu_container-0.1.3-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 65.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gpu_container-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c546471c3af00c36c30daca9d20c0ff2564dd8956014e8a0eaab94b7998cb021`
MD5	`c60074dd231923e971c2e8994521b971`
BLAKE2b-256	`cd11635b0294168e4409410affdd9416ea241b1aa9e4564d1ad48c809eb953ce`


Provenance

The following attestation bundles were made for gpu_container-0.1.3-py3-none-any.whl:

Publisher: release.yml on mcp-tool-shop-org/gpu-container

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gpu_container-0.1.3-py3-none-any.whl
- Subject digest: c546471c3af00c36c30daca9d20c0ff2564dd8956014e8a0eaab94b7998cb021
- Sigstore transparency entry: 1724725055
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: mcp-tool-shop-org/gpu-container@d16a0229b78b2e8dbeebd8a0f3280bc440fb1ad5
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/mcp-tool-shop-org
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d16a0229b78b2e8dbeebd8a0f3280bc440fb1ad5
- Trigger Event: release
