Speak the Ollama API protocol — route requests to OpenAI, Gemini, Anthropic, AWS Bedrock, Azure, Groq, xAI, Mistral, DeepSeek, Together, Perplexity, Kimi, and any OpenAI-compatible server.

ollama-proxy

A lightweight server that speaks the Ollama API protocol and routes every request to a cloud or self-hosted LLM backend.

Distributed on PyPI as ollama-proxy-plus because the ollama-proxy name was already taken. Install with pip install ollama-proxy-plus. The distribution name is the only difference: the CLI command (ollama-proxy), the import path (ollama_proxy), and the endpoints are unaffected.

Any tool that already works with Ollama — Open WebUI, Continue, Cursor, GitHub Copilot, LangChain, LlamaIndex, the ollama Python SDK — connects with zero code changes.

Your App  ---->  POST /api/chat            (Ollama protocol)
         ---->  POST /v1/chat/completions  (OpenAI protocol)
                        |
                 ollama-proxy
                        |  routes by model-name prefix
       +----------------+----------------+------- ... -------+
  openai/*        gemini/*        anthropic/*           custom/*
  OpenAI API    Gemini API      Anthropic API     your vLLM / Ray server

The model name prefix is the only routing key. The proxy speaks two wire formats simultaneously — the Ollama protocol on /api/* and the OpenAI protocol on /v1/* — so it works with both ecosystems.
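As a rough illustration of prefix routing, the sketch below splits a model name into its routing prefix and the model ID passed upstream. The function name split_model_name is hypothetical, and splitting at the first slash only is an assumption inferred from names such as myvllm/meta-llama/Llama-3.1-8B-Instruct, where the model ID itself contains a slash. It is not the proxy's actual code.

```python
def split_model_name(model: str) -> tuple[str, str]:
    """Split '<prefix>/<model-id>' at the first slash only (assumed behavior)."""
    prefix, sep, model_id = model.partition("/")
    if not sep or not prefix or not model_id:
        raise ValueError(f"expected '<prefix>/<model-id>', got {model!r}")
    return prefix, model_id


# The prefix selects the provider; the remainder is the upstream model ID.
print(split_model_name("openai/gpt-4o-mini"))
print(split_model_name("myvllm/meta-llama/Llama-3.1-8B-Instruct"))
```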

Install

# Core (OpenAI / Gemini / xAI / Groq / Mistral / DeepSeek / Together / Perplexity / Kimi / vLLM / etc.)
pip install ollama-proxy-plus

# With Anthropic SDK
pip install "ollama-proxy-plus[anthropic]"

# With AWS Bedrock support
pip install "ollama-proxy-plus[bedrock]"

# Everything
pip install "ollama-proxy-plus[all]"

Or run without installing using uv:

uvx ollama-proxy-plus

Quick start

# 1. Set API keys for the providers you'll use
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=AIza...

# 2. Start the proxy on port 11434 (Ollama's default)
ollama-proxy

# 3. Use any Ollama-compatible client pointed at http://localhost:11434
curl http://localhost:11434/api/chat \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hi"}]}'
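The same request can be made from Python with only the standard library. This is a minimal sketch: build_chat_request and send_chat are hypothetical helper names, and "stream": False is added on the assumption that the proxy, like Ollama itself, streams /api/chat responses unless told otherwise.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object, not a chunk stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request) -> str:
    """Send the request and return the assistant text (needs a running proxy)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


request = build_chat_request("openai/gpt-4o-mini", "Hi")
print(request.get_method(), request.full_url)
```

Calling send_chat(request) performs the actual round trip once the proxy is running and the provider key is set.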

Optional: copy .env.example to .env and put keys there instead of exporting them.

CLI

ollama-proxy [serve]    # run the proxy (default subcommand)
ollama-proxy doctor     # check config + provider connectivity
ollama-proxy list-models
ollama-proxy test openai/gpt-4o-mini --prompt "Say hello"

serve flags:

--host 0.0.0.0 (default)
--port 11434 (default)
--debug — verbose request/response logging
--reload — auto-reload on code changes (dev)
--config path.yaml — load YAML config
--workers N — number of worker processes

python -m ollama_proxy ... also works.

Adding models

Three ways, in increasing permanence:

1. Direct call (no registration)

Any <prefix>/<model-id> works as long as the prefix is configured. The model just won't appear in /api/tags discovery.

curl http://localhost:11434/api/chat -d '{
  "model": "openai/gpt-5-preview",
  "messages": [{"role": "user", "content": "Hi"}]
}'

2. `EXTRA_MODELS` env var

export EXTRA_MODELS="openai/gpt-5,gemini/gemini-3-pro,kimi/kimi-k2.7"
ollama-proxy

3. YAML config file

# ollama-proxy.yaml
models:
  - openai/gpt-5
  - gemini/gemini-3-pro

ollama-proxy --config ollama-proxy.yaml

Adding providers

OpenAI-compatible (most providers, including self-hosted)

Option A — environment variables (auto-discovered):

export PROVIDERS_VLLM_API_KEY=EMPTY
export PROVIDERS_VLLM_BASE_URL=http://localhost:8000/v1
ollama-proxy

Use vllm/<model-id> in your client. No code changes needed.
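To make the naming convention concrete, here is one way such auto-discovery could work. It is a sketch, not the proxy's implementation: discover_providers is a hypothetical name, and the rule that the provider name is a single upper-case token (no underscores) that becomes the lower-case prefix is an assumption drawn from the PROVIDERS_VLLM_* example above.

```python
import re

# PROVIDERS_<NAME>_API_KEY or PROVIDERS_<NAME>_BASE_URL (NAME assumed to have no underscores)
_PATTERN = re.compile(r"^PROVIDERS_([A-Z0-9]+)_(API_KEY|BASE_URL)$")


def discover_providers(env: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group matching environment variables by lower-cased provider name."""
    providers: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        match = _PATTERN.match(key)
        if match is None:
            continue
        name = match.group(1).lower()    # PROVIDERS_VLLM_* -> "vllm"
        field = match.group(2).lower()   # "api_key" or "base_url"
        providers.setdefault(name, {})[field] = value
    return providers


env = {
    "PROVIDERS_VLLM_API_KEY": "EMPTY",
    "PROVIDERS_VLLM_BASE_URL": "http://localhost:8000/v1",
    "OPENAI_API_KEY": "sk-...",  # ignored: does not match the pattern
}
print(discover_providers(env))
```

In practice the mapping passed in would be os.environ; a plain dict keeps the example deterministic.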

Option B — YAML config:

providers:
  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1
  ray:
    api_key: ${RAY_API_KEY}
    base_url: http://ray-head:8000/v1
    retries: 2
    fallback: openai/gpt-4o-mini

Non-compatible provider (Cohere, Vertex AI, etc.)

For wire formats incompatible with OpenAI:

1. Subclass BaseProvider in ollama_proxy/providers/myprovider_provider.py.
2. Implement chat(), chat_stream(), and embed(); optionally add chat_full() and chat_stream_raw() for tool support.
3. Add a routing branch in ollama_proxy/server.py::get_provider().
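The shape of such a subclass might look like the sketch below. The BaseProvider defined here is a stand-in so the example runs on its own; the real class in ollama_proxy/providers/base.py may use different signatures (it may well be async), so treat the parameter lists and return types as assumptions. EchoProvider is a hypothetical toy that never calls an external API.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class BaseProvider(ABC):
    """Stand-in for the proxy's BaseProvider; real signatures may differ."""

    @abstractmethod
    def chat(self, model: str, messages: list[dict]) -> str: ...

    @abstractmethod
    def chat_stream(self, model: str, messages: list[dict]) -> Iterator[str]: ...


class EchoProvider(BaseProvider):
    """Toy provider that echoes the last message instead of calling an API."""

    def chat(self, model: str, messages: list[dict]) -> str:
        return f"[{model}] {messages[-1]['content']}"

    def chat_stream(self, model: str, messages: list[dict]) -> Iterator[str]:
        # Yield the reply word by word to mimic token streaming.
        for word in self.chat(model, messages).split():
            yield word + " "


provider = EchoProvider()
msgs = [{"role": "user", "content": "Hi there"}]
print(provider.chat("demo", msgs))
print("".join(provider.chat_stream("demo", msgs)).strip())
```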

Built-in providers

OpenAI-compatible (one adapter, many providers)

Prefix	Provider	Base URL	Env Variable
`openai/`	OpenAI	`https://api.openai.com/v1`	`OPENAI_API_KEY`
`gemini/`	Google Gemini	`https://generativelanguage.googleapis.com/v1beta/openai/`	`GEMINI_API_KEY`
`xai/`	xAI / Grok	`https://api.x.ai/v1`	`XAI_API_KEY`
`groq/`	Groq	`https://api.groq.com/openai/v1`	`GROQ_API_KEY`
`mistral/`	Mistral AI	`https://api.mistral.ai/v1`	`MISTRAL_API_KEY`
`deepseek/`	DeepSeek	`https://api.deepseek.com`	`DEEPSEEK_API_KEY`
`together/`	Together AI	`https://api.together.xyz/v1`	`TOGETHER_API_KEY`
`perplexity/`	Perplexity	`https://api.perplexity.ai`	`PERPLEXITY_API_KEY`
`kimi/`	Kimi / Moonshot	`https://api.moonshot.ai/v1`	`KIMI_API_KEY`

Endpoints can change. Verify against each provider's docs before relying on them in production.

Self-hosted servers

Any OpenAI-compatible REST server works the same way (typical defaults):

Server	Base URL
vLLM, Ray Serve (vLLM backend)	`http://localhost:8000/v1`
Hugging Face TGI, LocalAI, Llamafile	`http://localhost:8080/v1`
LM Studio	`http://localhost:1234/v1`
Ollama (real instance)	`http://localhost:11434/v1`
Jan	`http://localhost:1337/v1`

Native SDK providers

Prefix	Provider	Auth	Install
`anthropic/`	Anthropic / Claude	`ANTHROPIC_API_KEY`	`pip install "ollama-proxy-plus[anthropic]"`
`azure/`	Azure OpenAI	`AZURE_OPENAI_*` vars	(core)
`bedrock/`	AWS Bedrock	AWS credential chain	`pip install "ollama-proxy-plus[bedrock]"`

Azure OpenAI

AZURE_OPENAI_API_KEY=<key>
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-08-01-preview

The part after azure/ is your deployment name. For multi-resource setups, populate AZURE_DEPLOYMENTS in config.py.

AWS Bedrock

Bedrock uses the standard AWS credential chain instead of an API key. Configure credentials through environment variables, a named profile, or an IAM role, and set the region:

AWS_REGION=us-east-1
# plus access keys, profile, or IAM role

Use the model ID or cross-region inference profile after bedrock/:

bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
bedrock/us.amazon.nova-pro-v1:0
bedrock/us.meta.llama3-1-70b-instruct-v1:0

Enable model access in the Bedrock console first.

YAML configuration

The proxy reads --config <file> or $PROXY_CONFIG_FILE. See ollama-proxy.example.yaml. Highlights:

providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    retries: 2                            # transient-error retries
    fallback: groq/llama-3.3-70b-versatile  # used after retries exhausted

  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1

models:
  - openai/gpt-4o
  - myvllm/meta-llama/Llama-3.1-8B-Instruct

${ENV_VAR} references are expanded. YAML config layers on top of env-var defaults.
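A minimal sketch of ${ENV_VAR} expansion follows. expand_env is a hypothetical helper, and leaving references to unset variables untouched is an assumption; the proxy may instead raise an error or substitute an empty string.

```python
import re

_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: dict[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; unknown names are left as-is."""
    return _REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


env = {"OPENAI_API_KEY": "sk-test"}
print(expand_env("${OPENAI_API_KEY}", env))   # sk-test
print(expand_env("${MISSING}", env))          # ${MISSING}
```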

Retry & fallback

Any provider can have retries and fallback. On transient failures (429, 5xx, timeouts, connection errors) the proxy retries with exponential backoff, then falls back to the configured model:

providers:
  kimi:
    retries: 2
    fallback: openai/gpt-4o-mini

Both settings are per provider. If the primary is still failing once its retries are exhausted, the client receives a successful response from the fallback model instead of an error.
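The retry-then-fallback flow can be sketched as follows. This is an illustration under stated assumptions, not the code in retry.py: call_with_fallback and TransientError are hypothetical names, retries: 2 is read as two retries after the first attempt, and the 0.5-second base delay is invented for the example.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Stands in for 429, 5xx, timeout, and connection errors."""


def call_with_fallback(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try primary once plus `retries` retries, then call fallback."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except TransientError:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)  # delay doubles each retry
    return fallback()


calls: list[str] = []
delays: list[float] = []


def flaky() -> str:
    calls.append("kimi")
    raise TransientError("HTTP 429")


# Record delays instead of sleeping so the demo runs instantly.
result = call_with_fallback(flaky, lambda: "reply from openai/gpt-4o-mini",
                            sleep=delays.append)
print(result, len(calls), delays)
```

Injecting sleep as a parameter is only a testing convenience here; with the default time.sleep the function waits between attempts.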

Health checks

Endpoint	Description
`/health`	Lightweight: status, version, configured providers, model count
`/health/providers`	Pings each OpenAI-compat provider's `/models` endpoint

Useful for monitoring, load-balancer probes, and quickly seeing which provider is down right now.

Logging & observability

Every request gets a unique x-request-id (returned in response headers and included in every log line for that request).

# Verbose request/response logs
ollama-proxy --debug

# JSON-formatted logs (for log aggregators)
PROXY_LOG_JSON=1 ollama-proxy --debug

Sample debug output:

2026-06-04 12:30:15 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: REQ /v1/chat/completions model=gemini/gemini-2.5-pro stream=True messages=10 tools=74 format=None
2026-06-04 12:30:18 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: RES /v1/chat/completions status=200 finish=stop
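Because the request ID appears on every line, debug logs are easy to correlate with a small parser. The sketch below is written against the two sample lines above only; parse_debug_line is a hypothetical helper, and the real log format may contain fields this pattern does not anticipate.

```python
import re

_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) \[(?P<level>\w+)\] \[(?P<request_id>\w+)\] "
    r"ollama-proxy: (?P<kind>REQ|RES) (?P<path>\S+)(?P<rest>.*)$"
)


def parse_debug_line(line: str) -> dict[str, str]:
    """Split a debug line into fixed fields plus its key=value pairs."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    record = match.groupdict()
    rest = record.pop("rest")
    # Remaining tokens are space-separated key=value pairs.
    record.update(tok.split("=", 1) for tok in rest.split() if "=" in tok)
    return record


line = ("2026-06-04 12:30:15 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: "
        "REQ /v1/chat/completions model=gemini/gemini-2.5-pro stream=True "
        "messages=10 tools=74 format=None")
record = parse_debug_line(line)
print(record["request_id"], record["kind"], record["model"])
```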

Run ollama-proxy doctor to validate your configuration and check upstream connectivity in one shot.

Endpoints

Ollama protocol

Endpoint	Notes
`GET /`	"Ollama is running"
`GET /api/version`	Reports 0.6.5
`GET /api/tags`	All registered models
`GET /api/ps`	Always empty (no VRAM concept)
`POST /api/chat`	Streaming + non-streaming, `tools`, `format`
`POST /api/generate`	Streaming + non-streaming, `format`
`POST /api/embed` / `/api/embeddings`	Both formats supported
`POST /api/show`	Model details
`POST /api/pull`	Mocked (3 progress events)
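For streaming /api/chat calls, Ollama's protocol sends one JSON object per line, each carrying a fragment in message.content and a done flag on the last chunk. The sketch below reassembles such a stream; collect_chat_stream is a hypothetical helper, and it assumes the proxy reproduces that chunk shape, which the table implies but does not spell out.

```python
import json
from typing import Iterable


def collect_chat_stream(lines: Iterable[str]) -> str:
    """Concatenate message.content from Ollama-style NDJSON chat chunks."""
    parts: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_chat_stream(sample))  # Hello
```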

OpenAI protocol (Copilot, OpenAI SDK, Continue)

Endpoint	Notes
`POST /v1/chat/completions`	Streaming SSE + non-streaming, `tools`, `response_format`
`GET /v1/models`	Lists registered models

Health

Endpoint	Notes
`GET /health`	Basic status
`GET /health/providers`	Connectivity check across all OpenAI-compat providers

Usage examples

ollama Python SDK

import ollama
client = ollama.Client(host="http://localhost:11434")
resp = client.chat(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi"}])
print(resp["message"]["content"])

OpenAI SDK

from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

Open WebUI

Settings → Connections → Ollama URL: http://localhost:11434. All registered models appear in the dropdown.

GitHub Copilot

In VS Code settings, configure the Ollama provider with http://localhost:11434 and pick any registered model.

Development

git clone https://github.com/skamalj/ollama-proxy
cd ollama-proxy
uv sync --extra all
uv run ollama-proxy --debug

Run tests:

uv run pytest

Project structure

ollama_proxy/
├── __init__.py              # Package version + app re-export
├── __main__.py              # python -m ollama_proxy
├── cli.py                   # CLI entry point
├── server.py                # FastAPI app + endpoint handlers
├── config.py                # Config loading (env + YAML + auto-discovery)
├── logging_config.py        # Structured logging + correlation IDs
├── retry.py                 # Retry / fallback orchestration
├── commands/                # CLI subcommands
│   ├── doctor.py
│   ├── list_models.py
│   └── test.py
└── providers/
    ├── base.py
    ├── openai_compat_provider.py   # OpenAI, Gemini, Groq, Kimi, vLLM, ...
    ├── anthropic_provider.py       # Anthropic / Claude
    ├── azure_provider.py           # Azure OpenAI
    └── bedrock_provider.py         # AWS Bedrock (Converse API)

License

MIT — see LICENSE.

This version

0.1.0

Jun 4, 2026

