

ollama-client

A lightweight CLI client for a remote Ollama server. Designed to be a drop-in replacement for the ollama CLI when you need to talk to a server running on a different machine — output formats, column names, and flag names match the official client character-for-character wherever possible.

ollama-client --host http://my-server:11434 list
ollama-client -H http://my-server:11434 run qwen3.5:4b
  • Download size for ollama (which includes the server): ~1.9 GB
  • Download size for ollama-client (no server): < 50 KB

Why use this instead of the full ollama?

The official ollama binary bundles a complete inference server — CUDA runtimes, model management daemons, the works. That makes sense if you want to run models locally. It's overkill if you don't.

You already have a server. If Ollama runs on a desktop with a GPU, a home server, or a remote machine, every other device — laptops, CI runners, WSL terminals, Raspberry Pis — only needs the client. There's no reason to download 1.9 GB of server software onto machines that will never serve a model.

Constrained environments. Bandwidth-limited machines, shared CI infrastructure, network-restricted environments, and minimal containers all benefit from a < 50 KB install that has no compiled extensions and no native dependencies.

Faster iteration. uv tool install ollama-client completes in seconds and works in a plain Python environment — no separate install script, no PATH surgery, no sudo. Useful in scripts and automation where you want the tool available without ceremony.

Separation of concerns. Keeping the server on dedicated hardware and the client on workstations is a cleaner architecture: one place to manage models, one place to restart the service, one place to monitor GPU memory. The client follows the server's address wherever it moves.

Compatible static output. Tabular output (list, ps, show), size formatting, relative time strings, and error prefixes are verified to match the official client character-for-character — so scripts and tooling that parse ollama output work without changes. Terminal animations (spinners, pull progress) look similar but are not identical.
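
For example, a script that pulls model names out of list output keeps working when you swap binaries:

# extract model names, skipping the header row (identical output from `ollama list`)
ollama-client list | awk 'NR > 1 { print $1 }'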

Quick Start

Get started quickly with these common operations:

# Install the client
uv tool install ollama-client

# Point at your server (--host / -H, or set OLLAMA_HOST)
export OLLAMA_HOST=http://my-server:11434

# List available models on your Ollama server
ollama-client list
ollama-client -H http://my-server:11434 list   # one-off override

# Run a model interactively (REPL)
ollama-client run qwen3.5:4b

# Run a single prompt
ollama-client run qwen3.5:4b "What is the capital of France?"

# Pull a model from the registry
ollama-client pull qwen3.5:4b

Installation

Requires Python 3.11+.

uv tool install ollama-client

Or with pip:

pip install ollama-client

Once installed, the ollama-client command is available globally. To run directly from source:

uv sync
uv run ollama-client

Usage

ollama-client [command] [flags]

Commands:
  run     Run a model
  pull    Pull a model from a registry
  list    List models
  ps      List running models
  stop    Stop a running model
  show    Show information for a model
  rm      Remove a model
  cp      Copy a model
  launch  Launch an AI integration backed by this server
  signin  Sign in to ollama.com
  signout Sign out of ollama.com
  help    Help about any command

run

# Interactive REPL (maintains conversation history)
ollama-client run qwen3.5:4b

# One-shot generation
ollama-client run qwen3.5:4b "Why is the sky blue?"

# With a system prompt
ollama-client run qwen3.5:4b "Summarise this." --system "You are a concise assistant."

# Show timing and token statistics after the response
ollama-client run qwen3.5:4b "Hello" --verbose

The REPL accepts /help, /clear, and /bye. Ctrl+D and Ctrl+C also exit cleanly.

launch

Configure and launch an AI coding tool backed by your Ollama server.

# Interactive menu — pick integration and model
ollama-client launch

# Launch directly
ollama-client launch claude
ollama-client launch claude --model qwen3.5:4b

# Configure only (write config files / print env vars, don't launch)
ollama-client launch codex --config --model qwen3.5:4b

# Pass extra arguments through to the integration
ollama-client launch codex -- --sandbox workspace-write

Supported integrations:

  • claude (Claude Code): environment variables (ANTHROPIC_BASE_URL, model overrides)
  • copilot (Copilot CLI): environment variables (COPILOT_PROVIDER_BASE_URL)
  • codex (Codex): ~/.codex/config.toml (merged, not overwritten)
  • hermes (Hermes Agent): ~/.hermes/config.yaml (merged, legacy ollama provider removed)
  • opencode (OpenCode): OPENCODE_CONFIG_CONTENT environment variable
  • pi (Pi): ~/.pi/agent/models.json and ~/.pi/agent/settings.json
  • vscode (VS Code): launches code (no Ollama config; use the Ollama VS Code extension)

Aliases: copilot-cli → copilot, code → vscode.
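
Aliases resolve before dispatch, so either spelling works:

# both forms configure and launch the same integration
ollama-client launch copilot-cli
ollama-client launch copilot

# `code` is shorthand for the vscode integration
ollama-client launch code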

help

# Show top-level help
ollama-client help

# Show help for a specific command
ollama-client help run
ollama-client help launch

pull / list / ps / stop / show / rm / cp

These match the official client's invocation exactly:

ollama-client pull qwen3.5:4b
ollama-client list
ollama-client ps
ollama-client stop qwen3.5:4b
ollama-client show qwen3.5:4b
ollama-client rm qwen3.5:4b
ollama-client cp qwen3.5:4b my-qwen

Setting the host

The host is resolved in this order, stopping at the first match:

  1. --host flag: ollama-client --host http://192.168.1.10:11434 list (also accepted as -H on any subcommand; see the example below)
  2. OLLAMA_CLIENT_HOST environment variable: export OLLAMA_CLIENT_HOST=http://192.168.1.10:11434 (useful when you want to target a different remote host while leaving OLLAMA_HOST unchanged)
  3. OLLAMA_HOST environment variable: export OLLAMA_HOST=http://192.168.1.10:11434 (0.0.0.0 is automatically rewritten to localhost, so a server-side bind address works as-is)
  4. Config file: ~/.config/ollama-client/config.toml
  5. Default: http://localhost:11434
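
For example, a flag set on the command line wins over an exported variable:

# OLLAMA_HOST is the session default; --host overrides it for a single invocation
export OLLAMA_HOST=http://my-server:11434
ollama-client --host http://192.168.1.10:11434 list   # talks to 192.168.1.10
ollama-client list                                     # falls back to my-server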

Config file

Create ~/.config/ollama-client/config.toml:

[ollama]
host = "http://192.168.1.10:11434"

The http:// scheme is optional — bare hostname:port is accepted and normalised automatically.

Compatibility with the official ollama CLI

The goal is that output piped from ollama-client is indistinguishable from ollama output. The following have been verified to match character-for-character:

  • list column names and spacing (NAME, ID, SIZE, MODIFIED)
  • ps column names and spacing (NAME, ID, SIZE, PROCESSOR, UNTIL)
  • show section layout (Model, Capabilities, Parameters, License)
  • pull success message (success)
  • Size formatting (SI units, ÷1000)
  • Relative time strings ("2 hours ago", "3 days ago")
  • --verbose stats layout (total/load/eval duration, token rates)
  • Error prefix format (Error: ... written to stderr)

Divergences

These are deliberate omissions or differences:

Missing commands

  • serve — this tool is a client only; it does not start an Ollama server.
  • create — building new models from a Modelfile is not supported.
  • push — pushing models to a registry is not supported.

launch differences

The official launch command includes a full TUI, model capability detection (vision, reasoning, context length), auto-install of integrations, cloud model support, and --yes auto-confirmation. This client implements the core configuration and launch flow only — no TUI, no capability probing, no auto-install.

run flag coverage

  • --format FORMAT: implemented
  • --nowordwrap: implemented
  • --keepalive DURATION: implemented
  • --think [VALUE]: implemented (true/false or high/medium/low)
  • --verbose: implemented
  • Image path argument (run llava image.jpg): not implemented
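
For example, using the implemented flags (values shown are illustrative):

# structured output for scripting
ollama-client run qwen3.5:4b "List three colours" --format json

# keep the model loaded for ten minutes after the response
ollama-client run qwen3.5:4b "Hello" --keepalive 10m

# request a reasoning level on a thinking-capable model
ollama-client run qwen3.5:4b "Plan a weekend trip" --think high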

REPL differences

The official REPL supports extended slash commands (/save, /load, /show, /set, /unset) and multiline input via """. This client supports only /help, /clear, and /bye.

The REPL uses a custom input handler on Windows so that Ctrl+D behaves as EOF (the standard input() call does not support this on Windows).

run MODEL PROMPT vs REPL

A one-shot run MODEL PROMPT call uses /api/generate. The interactive REPL uses /api/chat with full message history. This matches the official client's behaviour.
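
In terms of the raw Ollama API, the two calls look roughly like this (request bodies abbreviated):

# one-shot `run MODEL PROMPT` → POST /api/generate
curl http://my-server:11434/api/generate -d '{"model": "qwen3.5:4b", "prompt": "Why is the sky blue?"}'

# REPL turn → POST /api/chat with the accumulated message history
curl http://my-server:11434/api/chat -d '{"model": "qwen3.5:4b", "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'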

Dependencies

This package has three runtime dependencies. httpx is self-evidently required. The other two:

rich — used for all terminal output: the list/ps tables, the pull progress bar, live streaming text, and the --verbose stats block. Replicating that output with stdlib print calls would require hundreds of lines of manual ANSI escape handling to match the column alignment and formatting that the official client produces. Rich handles it in a few declarative calls and stays out of the way when stdout is not a TTY.

pyyaml — used only by the launch hermes subcommand, which must read, merge into, and write back ~/.hermes/config.yaml without destroying the user's existing configuration. Python's stdlib has no YAML parser; a hand-rolled round-trip would be riskier than the dependency.

Development

uv sync
uv run pytest          # unit tests (fast, no server required)
uv run mypy src/ollama_client
uv run ruff check src/ollama_client

Compatibility tests

The compat suite runs the client against a live Ollama server and verifies output matches the real ollama CLI. It requires:

  • A running Ollama server (default http://localhost:11434)
  • The ollama binary on PATH
  • tmux for terminal interaction tests

# Run with the default model (rnj-1:latest)
uv run pytest -m compat

# Run with a specific model
OLLAMA_COMPAT_MODEL=qwen3.5:4b uv run pytest -m compat

# Run only the help-text flag checks (no server or tmux needed, just the ollama binary)
uv run pytest -m compat tests/test_compat.py::TestHelp

