ollama-client
A lightweight CLI client for a remote Ollama server. Designed to be a drop-in replacement for the ollama CLI when you need to talk to a server running on a different machine — output formats, column names, and flag names match the official client character-for-character wherever possible.
ollama-client --host http://my-server:11434 list
ollama-client -H http://my-server:11434 run qwen3.5:4b
- Download size for ollama (which includes the server): ~1.9 GB
- Download size for ollama-client (no server): < 50 KB
Why use this instead of the full ollama?
The official ollama binary bundles a complete inference server — CUDA runtimes, model management daemons, the works. That makes sense if you want to run models locally. It's overkill if you don't.
You already have a server. If Ollama runs on a desktop with a GPU, a home server, or a remote machine, every other device — laptops, CI runners, WSL terminals, Raspberry Pis — only needs the client. There's no reason to download 1.9 GB of server software onto machines that will never serve a model.
Constrained environments. Bandwidth-limited machines, shared CI infrastructure, network-restricted environments, and minimal containers all benefit from a < 50 KB install that has no compiled extensions and no native dependencies.
Faster iteration. uv tool install ollama-client completes in seconds and works in a plain Python environment — no separate install script, no PATH surgery, no sudo. Useful in scripts and automation where you want the tool available without ceremony.
Separation of concerns. Keeping the server on dedicated hardware and the client on workstations is a cleaner architecture: one place to manage models, one place to restart the service, one place to monitor GPU memory. The client follows the server's address wherever it moves.
Compatible static output. Tabular output (list, ps, show), size formatting, relative time strings, and error prefixes are verified to match the official client character-for-character — so scripts and tooling that parse ollama output work without changes. Terminal animations (spinners, pull progress) look similar but are not identical.
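For example, a script that parses the table output works identically against either binary. A minimal sketch, assuming the NAME/ID/SIZE/MODIFIED columns listed in the compatibility table further down; the whitespace-splitting logic here is illustrative and not part of the package:

```python
import re
import subprocess

# Parse `ollama-client list` the same way a script would parse `ollama list`:
# columns are NAME, ID, SIZE, MODIFIED, separated by runs of spaces.
output = subprocess.run(
    ["ollama-client", "list"], capture_output=True, text=True, check=True
).stdout

lines = output.strip().splitlines()
header = re.split(r"\s{2,}", lines[0])          # ["NAME", "ID", "SIZE", "MODIFIED"]
for line in lines[1:]:
    row = dict(zip(header, re.split(r"\s{2,}", line)))
    print(row["NAME"], row["SIZE"])
```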
Quick Start
Get started quickly with these common operations:
# Install the client
uv tool install ollama-client
# Point at your server (--host / -H, or set OLLAMA_HOST)
export OLLAMA_HOST=http://my-server:11434
# List available models on your Ollama server
ollama-client list
ollama-client -H http://my-server:11434 list # one-off override
# Run a model interactively (REPL)
ollama-client run qwen3.5:4b
# Run a single prompt
ollama-client run qwen3.5:4b "What is the capital of France?"
# Pull a model from the registry
ollama-client pull qwen3.5:4b
Installation
Requires Python 3.11+.
uv tool install ollama-client
Or with pip:
pip install ollama-client
Once installed, the ollama-client command is available globally. To run directly from source:
uv sync
uv run ollama-client
Usage
ollama-client [command] [flags]
Commands:
run Run a model
pull Pull a model from a registry
list List models
ps List running models
stop Stop a running model
show Show information for a model
rm Remove a model
cp Copy a model
launch Launch an AI integration backed by this server
signin Sign in to ollama.com
signout Sign out of ollama.com
help Help about any command
run
# Interactive REPL (maintains conversation history)
ollama-client run qwen3.5:4b
# One-shot generation
ollama-client run qwen3.5:4b "Why is the sky blue?"
# With a system prompt
ollama-client run qwen3.5:4b "Summarise this." --system "You are a concise assistant."
# Show timing and token statistics after the response
ollama-client run qwen3.5:4b "Hello" --verbose
The REPL accepts /help, /clear, and /bye. Ctrl+D and Ctrl+C also exit cleanly.
launch
Configure and launch an AI coding tool backed by the configured Ollama server.
# Interactive menu — pick integration and model
ollama-client launch
# Launch directly
ollama-client launch claude
ollama-client launch claude --model qwen3.5:4b
# Configure only (write config files / print env vars, don't launch)
ollama-client launch codex --config --model qwen3.5:4b
# Pass extra arguments through to the integration
ollama-client launch codex -- --sandbox workspace-write
Supported integrations:
| Name | Description | Configuration method |
|---|---|---|
| `claude` | Claude Code | Environment variables (`ANTHROPIC_BASE_URL`, model overrides) |
| `copilot` | Copilot CLI | Environment variables (`COPILOT_PROVIDER_BASE_URL`) |
| `codex` | Codex | `~/.codex/config.toml` (merged, not overwritten) |
| `hermes` | Hermes Agent | `~/.hermes/config.yaml` (merged, legacy ollama provider removed) |
| `opencode` | OpenCode | `OPENCODE_CONFIG_CONTENT` environment variable |
| `pi` | Pi | `~/.pi/agent/models.json` and `~/.pi/agent/settings.json` |
| `vscode` | VS Code | Launches `code` (no Ollama config; use the Ollama VS Code extension) |
Aliases: copilot-cli → copilot, code → vscode.
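For the environment-variable integrations, the effect is roughly the sketch below. The variable name comes from the table above; the exact value written and any model-override variables are assumptions for illustration, not a description of the package's actual behaviour:

```python
import os
import subprocess

# Roughly what an env-var based `launch` does: point the tool at the
# resolved Ollama host, then start it with the augmented environment.
env = dict(os.environ)
env["ANTHROPIC_BASE_URL"] = "http://my-server:11434"   # resolved host (illustrative value)
# ...model-override variables would be set here as well...

subprocess.run(["claude"], env=env)
```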
help
# Show top-level help
ollama-client help
# Show help for a specific command
ollama-client help run
ollama-client help launch
pull / list / ps / stop / show / rm / cp
These match the official client's invocation exactly:
ollama-client pull qwen3.5:4b
ollama-client list
ollama-client ps
ollama-client stop qwen3.5:4b
ollama-client show qwen3.5:4b
ollama-client rm qwen3.5:4b
ollama-client cp qwen3.5:4b my-qwen
Setting the host
The host is resolved in this order, stopping at the first match (a code sketch follows the list):
1. `--host` flag — `ollama-client --host http://192.168.1.10:11434 list` (also accepted as `-H` on any subcommand)
2. `OLLAMA_CLIENT_HOST` environment variable — `export OLLAMA_CLIENT_HOST=http://192.168.1.10:11434` (useful when you want to target a different remote host while leaving `OLLAMA_HOST` unchanged)
3. `OLLAMA_HOST` environment variable — `export OLLAMA_HOST=http://192.168.1.10:11434` (`0.0.0.0` is automatically rewritten to `localhost`, so a server-side bind address works as-is)
4. Config file — `~/.config/ollama-client/config.toml`
5. Default — `http://localhost:11434`
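A rough sketch of that resolution order, with illustrative function names (this is not the package's actual code, but Python 3.11's `tomllib` makes the config-file step trivial):

```python
import os
import tomllib
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "ollama-client" / "config.toml"

def normalise(host: str) -> str:
    host = host if "://" in host else f"http://{host}"   # bare host:port accepted
    return host.replace("0.0.0.0", "localhost")          # server bind address works as-is

def resolve_host(cli_host: str | None = None) -> str:
    """Return the first host found: flag, env vars, config file, default."""
    if cli_host:                                          # 1. --host / -H
        return normalise(cli_host)
    for var in ("OLLAMA_CLIENT_HOST", "OLLAMA_HOST"):     # 2. and 3.
        if value := os.environ.get(var):
            return normalise(value)
    if CONFIG_PATH.exists():                              # 4. config file
        host = tomllib.loads(CONFIG_PATH.read_text()).get("ollama", {}).get("host")
        if host:
            return normalise(host)
    return "http://localhost:11434"                       # 5. default
```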
Config file
Create ~/.config/ollama-client/config.toml:
[ollama]
host = "http://192.168.1.10:11434"
The http:// scheme is optional — bare hostname:port is accepted and normalised automatically.
Compatibility with the official ollama CLI
The goal is that output piped from ollama-client is indistinguishable from ollama output. The following have been verified to match character-for-character:
| Feature | Compatible |
|---|---|
| `list` column names and spacing (NAME, ID, SIZE, MODIFIED) | Yes |
| `ps` column names and spacing (NAME, ID, SIZE, PROCESSOR, UNTIL) | Yes |
| `show` section layout (Model, Capabilities, Parameters, License) | Yes |
| `pull` success message (`success`) | Yes |
| Size formatting (SI units, ÷1000) | Yes |
| Relative time strings ("2 hours ago", "3 days ago") | Yes |
| `--verbose` stats layout (total/load/eval duration, token rates) | Yes |
| Error prefix format (`Error: ...` to stderr) | Yes |
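The SI size row above is worth spelling out because it trips up parsers: sizes divide by 1000, not 1024, so 4,683,075,271 bytes prints as "4.7 GB" rather than "4.4 GiB". A sketch of the idea (not the package's actual formatting code):

```python
def format_size(num_bytes: int) -> str:
    """SI formatting: divide by 1000 at each step, like `ollama list` output."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if value < 1000 or unit == "TB":
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1000

print(format_size(4_683_075_271))   # 4.7 GB
```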
Divergences
These are deliberate omissions or differences:
Missing commands
- `serve` — this tool is a client only; it does not start an Ollama server.
- `create` — building new models from a Modelfile is not supported.
- `push` — pushing models to a registry is not supported.
launch differences
The official launch command includes a full TUI, model capability detection (vision, reasoning, context length), auto-install of integrations, cloud model support, and --yes auto-confirmation. This client implements the core configuration and launch flow only — no TUI, no capability probing, no auto-install.
Flags on run
| ollama flag | Status |
|---|---|
| `--format FORMAT` | Implemented |
| `--nowordwrap` | Implemented |
| `--keepalive DURATION` | Implemented |
| `--think [VALUE]` | Implemented (true/false or high/medium/low) |
| `--verbose` | Implemented |
| Image path argument (`run llava image.jpg`) | Not implemented |
REPL differences
The official REPL supports extended slash commands (/save, /load, /show, /set, /unset) and multiline input via """. This client supports only /help, /clear, and /bye.
The REPL uses a custom input handler on Windows so that Ctrl+D behaves as EOF (the standard input() call does not support this on Windows).
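On Windows, `input()` treats Ctrl+D as an ordinary character, so EOF never arrives. A simplified sketch of the kind of handler this implies, using the stdlib `msvcrt` module (illustrative, not the package's actual code):

```python
import msvcrt  # Windows-only stdlib module

def read_line(prompt: str) -> str:
    """Read a line key by key so Ctrl+D (0x04) raises EOFError, as on POSIX."""
    print(prompt, end="", flush=True)
    chars: list[str] = []
    while True:
        ch = msvcrt.getwch()
        if ch == "\x04":              # Ctrl+D -> EOF
            raise EOFError
        if ch == "\x03":              # Ctrl+C -> interrupt
            raise KeyboardInterrupt
        if ch in ("\r", "\n"):        # Enter ends the line
            print()
            return "".join(chars)
        if ch == "\x08":              # Backspace
            if chars:
                chars.pop()
                print("\b \b", end="", flush=True)
            continue
        chars.append(ch)
        print(ch, end="", flush=True)
```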
run MODEL PROMPT vs REPL
A one-shot run MODEL PROMPT call uses /api/generate. The interactive REPL uses /api/chat with full message history. This matches the official client's behaviour.
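In terms of raw HTTP, the two paths look roughly like this (these are the standard Ollama API endpoints; the minimal payloads below are illustrative, not a transcript of what the client sends):

```python
import httpx

HOST = "http://my-server:11434"

# One-shot `run MODEL PROMPT`: a single stateless generation request.
httpx.post(f"{HOST}/api/generate", json={
    "model": "qwen3.5:4b",
    "prompt": "Why is the sky blue?",
    "stream": False,
})

# Interactive REPL: /api/chat, resending the accumulated message history each turn.
httpx.post(f"{HOST}/api/chat", json={
    "model": "qwen3.5:4b",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,
})
```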
Dependencies
This package has three runtime dependencies. httpx provides the HTTP transport for talking to the server. The other two:
rich — used for all terminal output: the list/ps tables, the pull progress bar, live streaming text, and the --verbose stats block. Replicating that output with stdlib print calls would require hundreds of lines of manual ANSI escape handling to match the column alignment and formatting that the official client produces. Rich handles it in a few declarative calls and stays out of the way when stdout is not a TTY.
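To illustrate why rich earns its place, the whole `list` table can be declared in a few lines. A sketch only: the real code tunes box and padding settings to reproduce the official spacing, and the row values here are made up:

```python
from rich.console import Console
from rich.table import Table

# Declarative table with the same columns as `ollama list`.
table = Table(box=None, pad_edge=False)
for column in ("NAME", "ID", "SIZE", "MODIFIED"):
    table.add_column(column)
table.add_row("qwen3.5:4b", "a1b2c3d4e5f6", "2.6 GB", "2 hours ago")

Console().print(table)   # no colour or styling when stdout is not a TTY
```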
pyyaml — used only by the launch hermes subcommand, which must read, merge into, and write back ~/.hermes/config.yaml without destroying the user's existing configuration. Python's stdlib has no YAML parser; a hand-rolled round-trip would be riskier than the dependency.
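The merge-then-write-back step is the part that justifies the dependency. In spirit it looks like the sketch below; the `providers`, `base_url`, and `model` keys are placeholders, not Hermes's real configuration schema:

```python
from pathlib import Path
import yaml

CONFIG = Path.home() / ".hermes" / "config.yaml"

# Read the user's existing config, merge in our provider entry, write it back.
existing = yaml.safe_load(CONFIG.read_text()) if CONFIG.exists() else None
existing = existing or {}

providers = existing.setdefault("providers", {})
providers["ollama-client"] = {
    "base_url": "http://my-server:11434",   # resolved host
    "model": "qwen3.5:4b",
}
providers.pop("ollama", None)               # drop the legacy ollama provider

CONFIG.parent.mkdir(parents=True, exist_ok=True)
CONFIG.write_text(yaml.safe_dump(existing, sort_keys=False))
```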
Development
uv sync
uv run pytest # unit tests (fast, no server required)
uv run mypy src/ollama_client
uv run ruff check src/ollama_client
Compatibility tests
The compat suite runs the client against a live Ollama server and verifies output matches the real ollama CLI. It requires:
- A running Ollama server (default `http://localhost:11434`)
- The `ollama` binary on `PATH`
- `tmux` for terminal interaction tests
# Run with the default model (rnj-1:latest)
uv run pytest -m compat
# Run with a specific model
OLLAMA_COMPAT_MODEL=qwen3.5:4b uv run pytest -m compat
# Run only the help-text flag checks (no server or tmux needed, just the ollama binary)
uv run pytest -m compat tests/test_compat.py::TestHelp
Download files
Source Distribution: ollama_client-0.1.0.tar.gz
Built Distribution: ollama_client-0.1.0-py3-none-any.whl
File details
Details for the file ollama_client-0.1.0.tar.gz.
File metadata
- Download URL: ollama_client-0.1.0.tar.gz
- Upload date:
- Size: 79.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `cfce1e21312301ad639ea07448ccc067d4f9b74d8ce12941376aa24aa2705468` |
| MD5 | `70d2964daa0dd38fc8ec09db01a6fbc5` |
| BLAKE2b-256 | `cb802290b8b2c19e5a36d00de6d4111e84746f4cfdabae43d3e4165cc0b55824` |
Provenance
The following attestation bundles were made for ollama_client-0.1.0.tar.gz:
Publisher: pypi.yaml on rhiza-fr/ollama-client
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ollama_client-0.1.0.tar.gz
- Subject digest: cfce1e21312301ad639ea07448ccc067d4f9b74d8ce12941376aa24aa2705468
- Sigstore transparency entry: 1408748642
- Permalink: rhiza-fr/ollama-client@d20c283ed3e8dcf24e2ede567f11e0a649c5b861
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rhiza-fr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@d20c283ed3e8dcf24e2ede567f11e0a649c5b861
- Trigger Event: release
File details
Details for the file ollama_client-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ollama_client-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e1ba38b6b60e09918ced35bdcef981c30e2faf14189cef93acee2c3b80aea927` |
| MD5 | `4ca91a8c2a01c459ab3b3120e11c6bef` |
| BLAKE2b-256 | `6621798a2f1f13b3f77fb53288ca87aabb18bc5d37aff08fd00329fd09c2ced6` |
Provenance
The following attestation bundles were made for ollama_client-0.1.0-py3-none-any.whl:
Publisher: pypi.yaml on rhiza-fr/ollama-client
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ollama_client-0.1.0-py3-none-any.whl
- Subject digest: e1ba38b6b60e09918ced35bdcef981c30e2faf14189cef93acee2c3b80aea927
- Sigstore transparency entry: 1408748750
- Permalink: rhiza-fr/ollama-client@d20c283ed3e8dcf24e2ede567f11e0a649c5b861
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rhiza-fr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@d20c283ed3e8dcf24e2ede567f11e0a649c5b861
- Trigger Event: release