CLI wrapper for llama.cpp providing an ollama-like experience

llama-buddy

A friendly CLI wrapper for llama.cpp

Manage, download, and serve local LLMs with a single command. Think of it as an ollama-like experience built on top of llama-server.

Features

Background server — start/stop/restart llama-server as a daemon
Multi-model routing — preset-based configuration with automatic model load/unload
Interactive downloads — search HuggingFace, pick a quant, download with progress and resume
Rich terminal UI — tables, panels, interactive selectors, and live search
GGUF inspector — view model metadata, architecture, and sampling parameters
Server props — inspect active sampling parameters on loaded models
Sampling sync — automatically applies GGUF-recommended sampling params to your preset
Per-model settings — context size, GPU layers, flash attention, and more
Idle model unloading — a background watchdog automatically unloads models after a configurable idle timeout
VRAM tracking — automatically parses server logs to show memory usage per model
Auto-sync — preset file stays in sync with the llama.cpp cache automatically
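
To make the idle-unloading idea concrete, here is a minimal sketch of the bookkeeping such a watchdog might do. It is not llama-buddy's implementation: `IdleTracker`, its method names, and the injectable clock are all hypothetical, and the actual unload call is left out.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Hypothetical helper: tracks last-use times and reports idle models."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_used: Dict[str, float] = {}

    def touch(self, model: str) -> None:
        """Record that a model just served a request."""
        self.last_used[model] = self.clock()

    def expired(self) -> List[str]:
        """Return models idle for at least timeout_s and stop tracking them."""
        now = self.clock()
        idle = [m for m, t in self.last_used.items() if now - t >= self.timeout_s]
        for m in idle:
            del self.last_used[m]
        return idle
```

A watchdog loop would call `touch()` on each request and periodically ask `expired()` which models to unload. Passing the clock in as a parameter keeps the logic testable without sleeping.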

Screenshots

[Screenshot: model listing (llb models)]
[Screenshot: interactive download (llb download)]
[Screenshot: quantization selection in llb download]
[Screenshot: model info (llb info)]

Installation

pipx install llama-buddy

Or with uv:

uv tool install llama-buddy

This installs the llb command into an isolated environment and adds it to your PATH.

Prerequisites

Python 3.10+
llama.cpp installed and llama-server on your PATH

Quick start

# Download a model (interactive search)
llb download

# Or specify directly
llb download mistralai/Ministral-3-3B-Instruct-2512-GGUF:Q4_K_M

# Start the server
llb start

# List all models
llb models

# Chat with a model (uses llama-cli)
llb chat

# Inspect model metadata
llb info

# Show active sampling params for a loaded model
llb props

# Apply GGUF-recommended sampling params to all models
llb info --apply-sampling

# Configure settings (interactive TUI)
llb settings

# Open the web UI in your browser
llb open

# Stop the server
llb stop
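
The direct form of llb download shown above names a model as a Hugging Face repo ID, a colon, and a quantization tag. The sketch below shows one way such a spec could be split. `parse_model_spec` and `ModelSpec` are hypothetical names, not part of llama-buddy, and treating the quant tag as optional is an assumption.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    repo_id: str
    quant: Optional[str]


def parse_model_spec(spec: str) -> ModelSpec:
    """Split 'owner/repo:QUANT' into a repo ID and an optional quant tag."""
    repo_id, sep, quant = spec.partition(":")
    # A repo ID is expected to be exactly 'owner/name' with both parts non-empty.
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError(f"expected 'owner/repo[:quant]', got {spec!r}")
    return ModelSpec(repo_id, quant if sep and quant else None)
```

For the Quick start example, this yields the repo ID `mistralai/Ministral-3-3B-Instruct-2512-GGUF` and the quant tag `Q4_K_M`.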

Commands

Command	Description
`llb start`	Start `llama-server` in the background. Extra args are forwarded.
`llb stop`	Stop the running server.
`llb restart`	Restart the server.
`llb status`	Show whether the server is running.
`llb models`	List all models with status, size, VRAM usage, and grouping. Supports `--sort size`.
`llb download [model]`	Download a model. Interactive HF search when no model given.
`llb remove [model]`	Remove a model after a confirmation prompt. Pass `--keep-files` to keep the GGUF files.
`llb info [model]`	Show GGUF metadata. Interactive selector when no model given.
`llb info --apply-sampling [model]`	Write GGUF sampling params into the preset. All models when no model given.
`llb props [model]`	Show active server sampling params for a loaded model.
`llb settings`	Interactive editor for global and per-model settings.
`llb chat [model]`	Interactive chat via `llama-cli`. Model selector when no model given.
`llb open`	Open the `llama-server` web UI in your browser.
`llb logs`	Tail the server log file.

Configuration

Config files live in ~/.config/llama/:

File	Purpose
`models.ini`	Model preset file — sections are HF repo IDs, auto-synced with cache
`settings.json`	Global server settings (port, context size, GPU layers, etc.)
`vram.json`	Cached per-model VRAM usage (parsed from server logs)
`server.pid`	PID of the running server
`server.log`	Server stdout/stderr
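
As an illustration of how the PID file can be used, the sketch below checks whether the recorded process is still alive. It assumes that server.pid holds a single decimal PID and that the system is POSIX. `server_is_running` is a hypothetical helper, not the code behind llb status.

```python
import os
from pathlib import Path

# Path from the table above; the file content (a bare decimal PID) is an assumption.
PID_FILE = Path.home() / ".config" / "llama" / "server.pid"


def server_is_running(pid_file: Path = PID_FILE) -> bool:
    """Return True if the PID stored in pid_file is a live process (POSIX only)."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or it does not contain a number
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False  # stale PID file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A stale PID file (the server crashed without cleaning up) is reported as not running, which is usually what a status command wants.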

Per-model settings

Run llb settings and select Model Settings to configure per-model overrides:

Context size, GPU layers, flash attention
Custom aliases
Any llama-server parameter
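
Conceptually, a per-model override takes precedence over the matching global setting. The sketch below illustrates that layering with an INI preset whose sections are repo IDs. The key names, the sample values, and `effective_settings` are hypothetical and do not reflect llama-buddy's actual schema or code.

```python
import configparser

# Hypothetical preset contents; key names are illustrative only.
PRESET_TEXT = """
[mistralai/Ministral-3-3B-Instruct-2512-GGUF]
ctx-size = 8192
"""

# Hypothetical global defaults standing in for settings.json.
GLOBAL_SETTINGS = {"ctx-size": "4096", "n-gpu-layers": "99"}


def effective_settings(preset_text: str, model: str, defaults: dict) -> dict:
    """Overlay one model's preset section on top of the global defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(preset_text)
    merged = dict(defaults)
    if parser.has_section(model):
        merged.update(parser[model])  # per-model values win over globals
    return merged
```

With these sample values, the Ministral model ends up with a context size of 8192 while inheriting the global GPU layer count. A model without a section simply gets the global defaults.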

Development

# Clone and install
git clone https://github.com/thilomichael/llama-buddy.git
cd llama-buddy
uv sync

# Run
uv run llb <command>

# Test
uv run pytest

# Lint
uv run ruff check src/ tests/

License

MIT

Release history

Version	Release date
0.1.13	Apr 27, 2026
0.1.12 (this version)	Apr 27, 2026
0.1.11	Apr 17, 2026
0.1.10	Apr 9, 2026
0.1.9	Apr 9, 2026
0.1.8	Apr 8, 2026
0.1.7	Apr 7, 2026
0.1.6	Mar 23, 2026
0.1.5	Mar 23, 2026
0.1.4	Mar 23, 2026
0.1.3	Mar 23, 2026
0.1.2	Mar 22, 2026
0.1.1	Mar 20, 2026
0.1.0	Mar 20, 2026

Download files

Source Distribution

llama_buddy-0.1.12.tar.gz (32.5 kB), uploaded Apr 27, 2026, Source

Built Distribution

llama_buddy-0.1.12-py3-none-any.whl (40.4 kB), uploaded Apr 27, 2026, Python 3

File details

Details for the file llama_buddy-0.1.12.tar.gz.

File metadata

Download URL: llama_buddy-0.1.12.tar.gz
Upload date: Apr 27, 2026
Size: 32.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.7 (uv publish, macOS)

File hashes

Hashes for llama_buddy-0.1.12.tar.gz
Algorithm	Hash digest
SHA256	`7fe07a9d23c3e20c61a3e6f6ae9412d50ad4f78fa7c44843a948d5eb30a0272c`
MD5	`82bc29829736afecfa1735d122a2429f`
BLAKE2b-256	`d7824541a6efa4b88ca35e23a2202d1e1fed6f54068458ec6cc771317409dca1`

File details

Details for the file llama_buddy-0.1.12-py3-none-any.whl.

File metadata

Download URL: llama_buddy-0.1.12-py3-none-any.whl
Upload date: Apr 27, 2026
Size: 40.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.7 (uv publish, macOS)

File hashes

Hashes for llama_buddy-0.1.12-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4648d61ca813884e1ce3cb6809991e3b929f76571364392c0a16293b066c27d`
MD5	`4465f8586561ddde0a435b6f79ceb5c9`
BLAKE2b-256	`ca63e6c0dbb0329adaefa47b332408411582f298d9e56a1094c065f5d6fc317c`
