CLI wrapper for llama.cpp providing an ollama-like experience

llama-buddy

A friendly CLI wrapper for llama.cpp

Manage, download, and serve local LLMs with a single command. Think of it as an ollama-like experience built on top of llama-server.

Features

  • Background server — start/stop/restart llama-server as a daemon
  • Multi-model routing — preset-based configuration with automatic model load/unload
  • Interactive downloads — search HuggingFace, pick a quant, download with progress and resume
  • Rich terminal UI — tables, panels, interactive selectors, and live search
  • GGUF inspector — view model metadata, architecture, and sampling parameters
  • Server props — inspect active sampling parameters on loaded models
  • Sampling sync — automatically applies GGUF-recommended sampling params to your preset
  • Per-model settings — context size, GPU layers, flash attention, and more
  • Idle model unloading — a background watchdog automatically unloads models after a configurable idle timeout
  • VRAM tracking — automatically parses server logs to show memory usage per model
  • Auto-sync — preset file stays in sync with the llama.cpp cache automatically

Screenshots

  • Model listing (llb models)
  • Interactive download, including quantization selection (llb download)
  • Model info (llb info)

Installation

pipx install llama-buddy

Or with uv:

uv tool install llama-buddy

This installs the llb command into an isolated environment and adds it to your PATH.

Prerequisites

  • Python 3.10+
  • llama.cpp installed and llama-server on your PATH

Quick start

# Download a model (interactive search)
llb download

# Or specify directly
llb download mistralai/Ministral-3-3B-Instruct-2512-GGUF:Q4_K_M

# Start the server
llb start

# List all models
llb models

# Chat with a model (uses llama-cli)
llb chat

# Inspect model metadata
llb info

# Show active sampling params for a loaded model
llb props

# Apply GGUF-recommended sampling params to all models
llb info --apply-sampling

# Configure settings (interactive TUI)
llb settings

# Open the web UI in your browser
llb open

# Stop the server
llb stop

Commands

Command                              Description
llb start                            Start llama-server in the background. Extra args are forwarded.
llb stop                             Stop the running server.
llb restart                          Restart the server.
llb status                           Show whether the server is running.
llb models                           List all models with status, size, VRAM usage, and grouping. Supports --sort size.
llb download [model]                 Download a model. Interactive HF search when no model is given.
llb remove [model]                   Remove a model after a confirmation dialog. Pass --keep-files to preserve the GGUF files.
llb info [model]                     Show GGUF metadata. Interactive selector when no model is given.
llb info --apply-sampling [model]    Write GGUF sampling params into the preset. Applies to all models when no model is given.
llb props [model]                    Show active server sampling params for a loaded model.
llb settings                         Interactive editor for global and per-model settings.
llb chat [model]                     Interactive chat via llama-cli. Model selector when no model is given.
llb open                             Open the llama-server web UI in your browser.
llb logs                             Tail the server log file.

Configuration

Config files live in ~/.config/llama/:

File             Purpose
models.ini       Model preset file: sections are HF repo IDs, auto-synced with cache
settings.json    Global server settings (port, context size, GPU layers, etc.)
vram.json        Cached per-model VRAM usage (parsed from server logs)
server.pid       PID of the running server
server.log       Server stdout/stderr
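
A PID file such as server.pid is commonly used to answer "is the server running?". The sketch below shows one conventional way to do that check on POSIX systems (macOS, Linux); it is an assumption about the general approach, not a description of how llb status is implemented, and `server_running` is a hypothetical name.

```python
import os
from pathlib import Path


def server_running(pid_file: Path) -> bool:
    """Return True if pid_file names a live process (POSIX only)."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or contents are not a number
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False  # stale PID file: no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


print(server_running(Path.home() / ".config" / "llama" / "server.pid"))
```

Note that a live PID does not prove the process is llama-server, since PIDs can be reused after a crash; a more careful check would also inspect the process name.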

Per-model settings

Run llb settings and select Model Settings to configure per-model overrides:

  • Context size, GPU layers, flash attention
  • Custom aliases
  • Any llama-server parameter
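
Since models.ini is an INI file whose sections are HF repo IDs, per-model overrides can be read with Python's standard configparser. The sketch below is illustrative only: the key names `ctx-size` and `n-gpu-layers` are assumptions modelled on llama-server's long option names, and `load_presets` is a hypothetical helper, not part of llama-buddy.

```python
import configparser

# Hypothetical preset content; the real keys written by llb may differ.
SAMPLE = """\
[mistralai/Ministral-3-3B-Instruct-2512-GGUF]
ctx-size = 8192
n-gpu-layers = 99
"""


def load_presets(text: str) -> dict[str, dict[str, str]]:
    """Parse INI text into {section: {key: value}} with values kept as strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


for repo_id, options in load_presets(SAMPLE).items():
    print(repo_id, options)
```

Interpolation is disabled so that a literal `%` in a value is not treated as a substitution marker.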

Development

# Clone and install
git clone https://github.com/thilomichael/llama-buddy.git
cd llama-buddy
uv sync

# Run
uv run llb <command>

# Test
uv run pytest

# Lint
uv run ruff check src/ tests/

License

MIT
