llama-wrangler
Lightweight web admin panel for llama.cpp server management.
Features
- Model Browser: Scan a local directory for `.gguf` files and view name, size, and modified time
- Model Download: Search HuggingFace for GGUF models and download them with progress tracking
- Server Lifecycle: Start, stop, and restart the `llama-server` subprocess from the browser
- Parameter Config: Visual editor for llama-server flags (context size, GPU layers, batch size, flash attention, etc.)
- System Monitoring: Real-time GPU (VRAM, temperature, utilization, power), CPU, RAM, and disk usage
- Log Viewer: Stream llama-server stdout/stderr via Server-Sent Events
- Health Monitoring: Poll the `/health` endpoint and show a status badge
- i18n: English and Chinese interface, switchable at runtime
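To make the Model Browser feature concrete, here is a minimal sketch of scanning a directory for GGUF files and collecting the name, size, and modified time. It is not taken from the llama-wrangler source: the `ModelFile` class, the `scan_models` function, and the choice to search subdirectories recursively are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List


@dataclass
class ModelFile:
    """Hypothetical record for one model shown in the browser."""
    name: str
    size_bytes: int
    modified: datetime


def scan_models(models_dir: str) -> List[ModelFile]:
    """Return every .gguf file under models_dir, sorted by file name."""
    found = []
    # rglob searches subdirectories too; this is an assumption, the real
    # tool may only look at the top level.
    for path in Path(models_dir).expanduser().rglob("*.gguf"):
        if not path.is_file():
            continue
        stat = path.stat()
        found.append(ModelFile(
            name=path.name,
            size_bytes=stat.st_size,
            modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        ))
    return sorted(found, key=lambda m: m.name)
```

Calling `scan_models("/path/to/models")` would return one `ModelFile` per GGUF file, which a frontend could render as a table.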
Install
pip install llama-wrangler
Prerequisites
llama-wrangler manages a llama-server process on the host machine. Make sure you have:
- llama.cpp compiled with the `llama-server` binary (see the llama.cpp build instructions)
- NVIDIA GPU driver installed (for GPU inference and monitoring)
Quick Start
# Start the admin panel
llama-wrangler --host 0.0.0.0 --port 7860
# With custom config
llama-wrangler --config /path/to/config.json
Then open http://localhost:7860 in your browser.
Configuration
Config is stored at ~/.config/llama-wrangler/config.json:
{
  "llama_server_path": "/path/to/llama-server",
  "models_dir": "/path/to/models",
  "default_args": {
    "host": "0.0.0.0",
    "port": 8080,
    "n_gpu_layers": 99,
    "ctx_size": 8192,
    "flash_attn": true,
    "batch_size": 2048,
    "ubatch_size": 512,
    "threads": 0,
    "parallel": 1,
    "cont_batching": true,
    "metrics": true
  }
}
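One plausible way to turn `default_args` into a `llama-server` command line is sketched below. The mapping is an assumption, not the project's documented behavior: each key becomes a long flag with underscores replaced by hyphens, `true` becomes a bare switch, and `false` is dropped. The function names `load_config` and `build_command` are hypothetical.

```python
import json
from pathlib import Path
from typing import List

DEFAULT_CONFIG = Path("~/.config/llama-wrangler/config.json")


def load_config(path: Path = DEFAULT_CONFIG) -> dict:
    """Read the JSON config file from disk."""
    with path.expanduser().open(encoding="utf-8") as fh:
        return json.load(fh)


def build_command(config: dict) -> List[str]:
    """Translate the config into an argv list (assumed mapping, see above)."""
    cmd = [config["llama_server_path"]]
    for key, value in config.get("default_args", {}).items():
        flag = "--" + key.replace("_", "-")
        # bool must be checked first because bool is a subclass of int.
        if isinstance(value, bool):
            if value:
                cmd.append(flag)  # true -> bare switch, false -> omitted
        else:
            cmd.extend([flag, str(value)])
    return cmd
```

With the config shown above, this would yield an argv starting with `/path/to/llama-server --host 0.0.0.0 --port 8080 --n-gpu-layers 99`. Some llama.cpp builds expect an explicit value for certain switches, so check `llama-server --help` for your build before relying on the bare-switch convention.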
Docker
Host prerequisites
The following must be set up on the host machine before running the container:
- NVIDIA GPU driver: install from NVIDIA or your distro's package manager
- NVIDIA Container Toolkit: required for the `--gpus` flag to work:

      # Ubuntu/Debian
      curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
        sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
      curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
      sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
      sudo nvidia-ctk runtime configure --runtime=docker
      sudo systemctl restart docker

  See the official install guide for other distros.
- llama.cpp compiled on the host with the `llama-server` binary
- Verify everything works:

      docker run --rm --gpus all ubuntu nvidia-smi
Build and run
# Build
docker build -t llama-wrangler .
# Run
docker run --gpus all -p 7860:7860 \
-v /path/to/models:/mnt/data/models \
-v /path/to/llama-server:/opt/llama-server:ro \
-v /sys:/sys:ro \
-v ~/.config/llama-wrangler:/root/.config/llama-wrangler \
llama-wrangler
Volume mounts explained:
| Mount | Purpose |
|---|---|
| `-v /path/to/models:/mnt/data/models` | GGUF model files (read/write for downloads) |
| `-v /path/to/llama-server:/opt/llama-server:ro` | llama-server binary from host |
| `-v /sys:/sys:ro` | Sensor data (disk/NVMe temperatures via psutil) |
| `-v ~/.config/llama-wrangler:...` | Persist configuration across restarts |
| `--gpus all` | GPU access (nvidia-smi, CUDA for llama-server) |
Note: CPU and RAM metrics work out of the box in Docker because psutil reads `/proc`, which is shared from the host. GPU monitoring requires `--gpus all` via nvidia-container-toolkit.
Without GPU
llama-wrangler works without a GPU (CPU-only inference). Simply omit --gpus all:
docker run -p 7860:7860 \
-v /path/to/models:/mnt/data/models \
-v /path/to/llama-server:/opt/llama-server:ro \
llama-wrangler
The GPU section on the dashboard will be hidden automatically.
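How a dashboard might decide whether GPU data is available can be sketched as follows. This is an illustration under assumptions, not llama-wrangler's actual detection logic: it treats a missing or failing `nvidia-smi` as "no GPU" and parses the CSV output of `nvidia-smi --query-gpu`. The names `QUERY_FIELDS`, `parse_gpu_line`, and `read_gpus` are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Fields requested from nvidia-smi, in output column order.
QUERY_FIELDS = [
    "memory.used",
    "memory.total",
    "temperature.gpu",
    "utilization.gpu",
    "power.draw",
]


def parse_gpu_line(line: str) -> Dict[str, Optional[float]]:
    """Parse one CSV row; values nvidia-smi cannot report become None."""
    result: Dict[str, Optional[float]] = {}
    for field, raw in zip(QUERY_FIELDS, line.split(",")):
        try:
            result[field] = float(raw.strip())
        except ValueError:  # e.g. "[N/A]" on unsupported hardware
            result[field] = None
    return result


def read_gpus() -> List[Dict[str, Optional[float]]]:
    """Return one dict per GPU, or an empty list when no GPU is usable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=" + ",".join(QUERY_FIELDS),
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]
```

An empty list from `read_gpus()` would then be the signal to hide the GPU section, which matches the behavior described for containers started without `--gpus all`.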
Tech Stack
- Backend: Python asyncio (zero-framework, vendored HTTP server)
- Frontend: Single-file vanilla HTML/CSS/JS
- Dependencies: `huggingface-hub` and `psutil` only
- Not used: Flask, FastAPI, React, npm, or a database
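Because the backend is plain asyncio and the log viewer uses Server-Sent Events, the framing step can be sketched with the standard library alone. The code below is an assumption about how such a stream could be built, not the project's vendored server: `format_sse`, `stream_logs`, and the `log` event name are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> bytes:
    """Frame a payload as one Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" field per line.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


async def stream_logs(reader: asyncio.StreamReader) -> AsyncIterator[bytes]:
    """Yield one SSE message per line read from a subprocess pipe."""
    while True:
        raw = await reader.readline()
        if not raw:  # EOF: the subprocess closed its pipe
            break
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        yield format_sse(text, event="log")
```

A handler would write each yielded chunk to a response served with `Content-Type: text/event-stream`, and the browser side would consume it with `EventSource`.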
License
MIT