Ollama-like CLI wrapper around llama.cpp

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

joeyjiaojg

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Libraries :: Python Modules

Project description

llamacpp-cli

Ollama-like CLI wrapper around llama.cpp. Provides a simple command-line interface that mirrors Ollama's subcommands but powered by llama.cpp as the backend inference engine.

Features

pull - Download GGUF models from Hugging Face
run - Run models interactively using llama.cpp
serve - Start the llama.cpp server
lb-proxy - Multi-backend load balancer proxy (NEW!)
list - List downloaded models
ps - Show running llama.cpp processes
rm - Remove a downloaded model
search - Search Hugging Face for GGUF models
install - Install/update llama.cpp binaries

Installation

From PyPI

pip install llamacpp-cli

From Source

pip install -e .

Quick Start

1. Install llama.cpp binaries

llamacpp install

This downloads the latest llama.cpp release to ~/.llamacpp/bin/.

2. Pull a model

llamacpp pull unsloth/gemma-3-270m-it-GGUF:Q4_K_M

Or use a short alias:

llamacpp pull gemma3:270m

3. Run interactively

llamacpp run gemma3:270m

4. Start the server

llamacpp serve -m gemma3:270m

The server runs at http://0.0.0.0:8080 with OpenAI-compatible API.

CPU-Optimized Presets

For CPU-only servers, use presets optimized for different workloads:

# Code tasks (default): 16K context, 2-4 parallel requests
llamacpp serve --preset code

# Chat/conversational: 8K context, 4-6 parallel requests
llamacpp serve --preset chat

# Fast queries: 4K context, 6-8 parallel requests
llamacpp serve --preset fast

# Large codebases: 32K context, 1 parallel request (slower)
llamacpp serve --preset max-context

See CPU_OPTIMIZATION.md for detailed tuning guide.

Commands

llamacpp pull <model>      Download GGUF model from Hugging Face
llamacpp run <model>       Run a model interactively
llamacpp serve             Start the llama.cpp server
llamacpp lb-proxy          Start multi-backend load balancer (see LB_PROXY.md)
llamacpp list              List downloaded models
llamacpp ps                Show running processes
llamacpp rm <model>        Remove a model
llamacpp search <query>    Search for models on Hugging Face
llamacpp install           Install/update llama.cpp binaries

Load Balancer Proxy

For distributing requests across multiple machines, use the load balancer:

# Auto-discover backends on your network
llamacpp lb-proxy --discover-subnet 192.168.1.0/24

# Or specify backends manually
llamacpp lb-proxy -b http://machine1:8000 -b http://machine2:8000

See LB_PROXY.md for detailed documentation on:

Model-aware routing
Least-connections load balancing
Auto-discovery and health checks
Configuration options

Model Names

Model names can be specified in multiple ways:

Full Hugging Face path: unsloth/gemma-3-270m-it-GGUF:Q4_K_M
Short format: namespace/model:quantization (e.g., gemma3:270m)
Short name: gemma3:270m, qwen3, llama3:8b

Alias support is planned for future releases.

Configuration

Models are stored in ~/.llamacpp/models/
Binaries are installed to ~/.llamacpp/bin/
Database (SQLite) is at ~/.llamacpp/llamacpp.db

Environment Variables

Variable	Description	Default
`LLAMACPP_BIN_DIR`	Directory for llama.cpp binaries	`~/.llamacpp/bin`
`LLAMACPP_MODEL_DIR`	Directory for models	`~/.llamacpp/models`

Usage with LLM CLI

This package also registers as an LLM plugin for the llm CLI:

# Install the plugin (requires llm and llama-cpp-python)
pip install llm-llama-cpp llama-cpp-python

# Register a model
llm llama-cpp add-model ~/.llamacpp/models/gemma-3-270m-it-Q4_K_M.gguf --alias gemma3:270m

# Use with llm
llm -m gemma3:270m "Your prompt here"

Development

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run a single test file
pytest tests/test_foo.py

# Lint
ruff check .

# Format
ruff format .

Publishing to PyPI

Prerequisites

Create a PyPI account at https://pypi.org/
Install build tools:

pip install build twine

Build and Publish

Update version in pyproject.toml:

[project]
version = "0.1.0"

Build the package:

python -m build

This creates distributable archives in dist/.

Upload to PyPI:

twine upload dist/*

You'll be prompted for your PyPI username and password.

For Test PyPI (testing first):

twine upload --repository testpypi dist/*

Using uv (Alternative)

# Install uv if not already
pip install uv

# Build
uv build

# Publish to PyPI
uv publish

# Or Test PyPI
uv publish --test

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

joeyjiaojg

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Software Development :: Libraries :: Python Modules

Release history Release notifications | RSS feed

This version

0.1.8

May 29, 2026

0.1.7

May 29, 2026

0.1.6

May 29, 2026

0.1.5

May 28, 2026

0.1.4

May 27, 2026

0.1.3

May 27, 2026

0.1.2

Apr 27, 2026

0.1.1

Apr 25, 2026

0.1.0

Apr 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llamacpp_cli-0.1.8.tar.gz (127.1 kB view details)

Uploaded May 29, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llamacpp_cli-0.1.8-py3-none-any.whl (89.2 kB view details)

Uploaded May 29, 2026 Python 3

File details

Details for the file llamacpp_cli-0.1.8.tar.gz.

File metadata

Download URL: llamacpp_cli-0.1.8.tar.gz
Upload date: May 29, 2026
Size: 127.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llamacpp_cli-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`5e9d7c10373f11132475b34a9e1b1347389e81c60a0b98797af3af98178c6c30`
MD5	`23cd7520163817eba851d823287a58cd`
BLAKE2b-256	`024dac91ab14d1e8370c725940181015d3856c5f152e8c6c9548875a6db49116`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llamacpp_cli-0.1.8.tar.gz:

Publisher: publish.yml on joeyjiaojg/llamacpp-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llamacpp_cli-0.1.8.tar.gz
- Subject digest: 5e9d7c10373f11132475b34a9e1b1347389e81c60a0b98797af3af98178c6c30
- Sigstore transparency entry: 1671070564
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: joeyjiaojg/llamacpp-cli@ff63053290b69bf009ceff698f812e5fe49e0eb1
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/joeyjiaojg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ff63053290b69bf009ceff698f812e5fe49e0eb1
- Trigger Event: push

File details

Details for the file llamacpp_cli-0.1.8-py3-none-any.whl.

File metadata

Download URL: llamacpp_cli-0.1.8-py3-none-any.whl
Upload date: May 29, 2026
Size: 89.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llamacpp_cli-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`baafc5c432e4757c1a7ae70cb1b525f98a2bf1e69a235e79f6e8cc8e8d7096ff`
MD5	`ca43f9eee60c21b8313fa6dc903b8521`
BLAKE2b-256	`0797706b7621d44cc09b81d637fc796ad9e6f2b0261f9caad1e180b0bf554525`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llamacpp_cli-0.1.8-py3-none-any.whl:

Publisher: publish.yml on joeyjiaojg/llamacpp-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llamacpp_cli-0.1.8-py3-none-any.whl
- Subject digest: baafc5c432e4757c1a7ae70cb1b525f98a2bf1e69a235e79f6e8cc8e8d7096ff
- Sigstore transparency entry: 1671070673
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: joeyjiaojg/llamacpp-cli@ff63053290b69bf009ceff698f812e5fe49e0eb1
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/joeyjiaojg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ff63053290b69bf009ceff698f812e5fe49e0eb1
- Trigger Event: push

llamacpp-cli 0.1.8

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

llamacpp-cli

Features

Installation

From PyPI

From Source

Quick Start

1. Install llama.cpp binaries

2. Pull a model

3. Run interactively

4. Start the server

CPU-Optimized Presets

Commands

Load Balancer Proxy

Model Names

Configuration

Environment Variables

Usage with LLM CLI

Development

Publishing to PyPI

Prerequisites

Build and Publish

Using uv (Alternative)

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance