Speak the Ollama API protocol — route requests to OpenAI, Gemini, Anthropic, AWS Bedrock, Azure, Groq, xAI, Mistral, DeepSeek, Together, Perplexity, Kimi, and any OpenAI-compatible server.
ollama-proxy
A lightweight server that speaks the Ollama API protocol and routes every request to a cloud or self-hosted LLM backend.
Distributed on PyPI as ollama-proxy-plus (the ollama-proxy name was taken). Install with pip install ollama-proxy-plus. Everything else (import path, CLI command, endpoints) stays as ollama-proxy.
Any tool that already works with Ollama — Open WebUI, Continue, Cursor,
GitHub Copilot, LangChain, LlamaIndex, the ollama Python SDK — connects with
zero code changes.
Your App ----> POST /api/chat             (Ollama protocol)
         ----> POST /v1/chat/completions  (OpenAI protocol)
                        |
                   ollama-proxy
                        |  routes by model-name prefix
       +----------------+----------------+------- ... -------+
   openai/*          gemini/*       anthropic/*           custom/*
   OpenAI API        Gemini API     Anthropic API     your vLLM / Ray server
The model name prefix is the only routing key. The proxy speaks two wire
formats simultaneously — the Ollama protocol on /api/* and the OpenAI
protocol on /v1/* — so it works with both ecosystems.
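As a rough illustration of prefix routing, the sketch below splits a model name on its first slash only, so model IDs that themselves contain slashes (such as myvllm/meta-llama/Llama-3.1-8B-Instruct) keep their full ID. The function name split_model_name is hypothetical and not part of the package; the proxy's actual routing code may differ.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split '<prefix>/<model-id>' on the first slash only (hypothetical helper)."""
    prefix, sep, model_id = name.partition("/")
    if not sep or not prefix or not model_id:
        raise ValueError(f"expected '<prefix>/<model-id>', got {name!r}")
    return prefix, model_id


print(split_model_name("openai/gpt-4o-mini"))
# ('openai', 'gpt-4o-mini')
print(split_model_name("myvllm/meta-llama/Llama-3.1-8B-Instruct"))
# ('myvllm', 'meta-llama/Llama-3.1-8B-Instruct')
```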
Install
# Core (OpenAI / Gemini / xAI / Groq / Mistral / DeepSeek / Together / Perplexity / Kimi / vLLM / etc.)
pip install ollama-proxy-plus
# With Anthropic SDK
pip install "ollama-proxy-plus[anthropic]"
# With AWS Bedrock support
pip install "ollama-proxy-plus[bedrock]"
# Everything
pip install "ollama-proxy-plus[all]"
Or run without installing using uv:
uvx ollama-proxy-plus
Quick start
# 1. Set API keys for the providers you'll use
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=AIza...
# 2. Start the proxy on port 11434 (Ollama's default)
ollama-proxy
# 3. Use any Ollama-compatible client pointed at http://localhost:11434
curl http://localhost:11434/api/chat \
-d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hi"}]}'
Optional: copy .env.example to .env and put keys there instead of exporting them.
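The same request can be sent from Python using only the standard library. This is a minimal sketch: build_chat_request and send are illustrative helpers, not part of ollama-proxy, and the "stream": False field is included on the assumption that the proxy follows Ollama's convention of streaming by default.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST /api/chat request (illustrative helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # assumption: ask for a single JSON reply instead of a stream
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a running proxy."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


# With the proxy running on port 11434:
# reply = send(build_chat_request("openai/gpt-4o-mini", "Hi"))
# print(reply["message"]["content"])
```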
CLI
ollama-proxy [serve] # run the proxy (default subcommand)
ollama-proxy doctor # check config + provider connectivity
ollama-proxy list-models
ollama-proxy test openai/gpt-4o-mini --prompt "Say hello"
serve flags:
- --host 0.0.0.0 (default)
- --port 11434 (default)
- --debug: verbose request/response logging
- --reload: auto-reload on code changes (dev)
- --config path.yaml: load YAML config
- --workers N: number of worker processes
Running python -m ollama_proxy ... works too.
Adding models
Three ways, in increasing permanence:
1. Direct call (no registration)
Any <prefix>/<model-id> works as long as the prefix is configured. The model
just won't appear in /api/tags discovery.
curl http://localhost:11434/api/chat -d '{
"model": "openai/gpt-5-preview",
"messages": [{"role": "user", "content": "Hi"}]
}'
2. EXTRA_MODELS env var
export EXTRA_MODELS="openai/gpt-5,gemini/gemini-3-pro,kimi/kimi-k2.7"
ollama-proxy
3. YAML config file
# ollama-proxy.yaml
models:
  - openai/gpt-5
  - gemini/gemini-3-pro
ollama-proxy --config ollama-proxy.yaml
Adding providers
OpenAI-compatible (most providers, including self-hosted)
Option A — environment variables (auto-discovered):
export PROVIDERS_VLLM_API_KEY=EMPTY
export PROVIDERS_VLLM_BASE_URL=http://localhost:8000/v1
ollama-proxy
Use vllm/<model-id> in your client. No code changes needed.
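To make the naming convention concrete, here is a sketch of how PROVIDERS_<NAME>_API_KEY and PROVIDERS_<NAME>_BASE_URL variables could be grouped into provider entries. The function discover_providers and the rule that provider names contain only letters and digits are assumptions for illustration; the package's real discovery logic lives in config.py and may behave differently.

```python
import re

# Assumption: provider names are uppercase letters/digits with no underscores.
_PATTERN = re.compile(r"^PROVIDERS_([A-Z0-9]+)_(API_KEY|BASE_URL)$")


def discover_providers(env: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group PROVIDERS_<NAME>_* variables by lowercase provider prefix."""
    providers: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        match = _PATTERN.match(key)
        if match:
            name = match.group(1).lower()    # becomes the model-name prefix
            field = match.group(2).lower()   # "api_key" or "base_url"
            providers.setdefault(name, {})[field] = value
    return providers


example_env = {
    "PROVIDERS_VLLM_API_KEY": "EMPTY",
    "PROVIDERS_VLLM_BASE_URL": "http://localhost:8000/v1",
    "PATH": "/usr/bin",  # unrelated variables are ignored
}
print(discover_providers(example_env))
```

In practice the mapping passed in would be os.environ.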
Option B — YAML config:
providers:
  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1
  ray:
    api_key: ${RAY_API_KEY}
    base_url: http://ray-head:8000/v1
    retries: 2
    fallback: openai/gpt-4o-mini
Non-compatible provider (Cohere, Vertex AI, etc.)
For wire formats incompatible with OpenAI:
- Subclass BaseProvider in ollama_proxy/providers/myprovider_provider.py
- Implement chat(), chat_stream(), optionally chat_full() and chat_stream_raw() for tool support, and embed()
- Add a routing branch in ollama_proxy/server.py::get_provider()
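The shape of such a subclass might look like the sketch below. Everything about the signatures is assumed: the BaseProvider here is a local stand-in, not the real class from ollama_proxy/providers/base.py, and the real methods may be async or take different arguments. EchoProvider is a toy that only shows which methods need implementing.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class BaseProvider(ABC):
    """Local stand-in for the package's BaseProvider; signatures are assumed."""

    @abstractmethod
    def chat(self, model: str, messages: list[dict]) -> str: ...

    @abstractmethod
    def chat_stream(self, model: str, messages: list[dict]) -> Iterator[str]: ...

    @abstractmethod
    def embed(self, model: str, texts: list[str]) -> list[list[float]]: ...


class EchoProvider(BaseProvider):
    """Toy provider that echoes the last message instead of calling an API."""

    def chat(self, model, messages):
        return f"[{model}] " + messages[-1]["content"]

    def chat_stream(self, model, messages):
        # Yield the full reply word by word to mimic incremental output.
        for word in self.chat(model, messages).split(" "):
            yield word + " "

    def embed(self, model, texts):
        # Placeholder "embedding": one number per text (its length).
        return [[float(len(text))] for text in texts]


provider = EchoProvider()
print(provider.chat("demo", [{"role": "user", "content": "Hi"}]))
```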
Built-in providers
OpenAI-compatible (one adapter, many providers)
| Prefix | Provider | Base URL | Env Variable |
|---|---|---|---|
| openai/ | OpenAI | https://api.openai.com/v1 | OPENAI_API_KEY |
| gemini/ | Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai/ | GEMINI_API_KEY |
| xai/ | xAI / Grok | https://api.x.ai/v1 | XAI_API_KEY |
| groq/ | Groq | https://api.groq.com/openai/v1 | GROQ_API_KEY |
| mistral/ | Mistral AI | https://api.mistral.ai/v1 | MISTRAL_API_KEY |
| deepseek/ | DeepSeek | https://api.deepseek.com | DEEPSEEK_API_KEY |
| together/ | Together AI | https://api.together.xyz/v1 | TOGETHER_API_KEY |
| perplexity/ | Perplexity | https://api.perplexity.ai | PERPLEXITY_API_KEY |
| kimi/ | Kimi / Moonshot | https://api.moonshot.ai/v1 | KIMI_API_KEY |
Endpoints can change. Verify against each provider's docs before relying on them in production.
Self-hosted servers
Any OpenAI-compatible REST server works the same way (typical defaults):
| Server | Base URL |
|---|---|
| vLLM, Ray Serve (vLLM backend) | http://localhost:8000/v1 |
| Hugging Face TGI, LocalAI, Llamafile | http://localhost:8080/v1 |
| LM Studio | http://localhost:1234/v1 |
| Ollama (real instance) | http://localhost:11434/v1 |
| Jan | http://localhost:1337/v1 |
Native SDK providers
| Prefix | Provider | Auth | Install |
|---|---|---|---|
| anthropic/ | Anthropic / Claude | ANTHROPIC_API_KEY | pip install ollama-proxy-plus[anthropic] |
| azure/ | Azure OpenAI | AZURE_OPENAI_* vars | (core) |
| bedrock/ | AWS Bedrock | AWS credential chain | pip install ollama-proxy-plus[bedrock] |
Azure OpenAI
AZURE_OPENAI_API_KEY=<key>
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-08-01-preview
The part after azure/ is your deployment name. For multi-resource setups,
populate AZURE_DEPLOYMENTS in config.py.
AWS Bedrock
Bedrock uses the standard AWS credential chain, so no API key is needed. Configure AWS credentials via environment variables, a profile, or an IAM role:
AWS_REGION=us-east-1
# plus access keys, profile, or IAM role
Use the model ID or cross-region inference profile after bedrock/:
bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
bedrock/us.amazon.nova-pro-v1:0
bedrock/us.meta.llama3-1-70b-instruct-v1:0
Enable model access in the Bedrock console first.
YAML configuration
The proxy reads --config <file> or $PROXY_CONFIG_FILE. See
ollama-proxy.example.yaml. Highlights:
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    retries: 2 # transient-error retries
    fallback: groq/llama-3.3-70b-versatile # used after retries exhausted
  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1
models:
  - openai/gpt-4o
  - myvllm/meta-llama/Llama-3.1-8B-Instruct
${ENV_VAR} references are expanded. YAML config layers on top of env-var
defaults.
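One plausible way to expand ${ENV_VAR} references is a regular-expression substitution over each string value, sketched below. The helper expand_env is hypothetical, and leaving unknown variables untouched is an assumption; the proxy may instead raise an error or substitute an empty string.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace ${NAME} with env[NAME]; unknown names are left as-is (assumed)."""
    env = os.environ if env is None else env
    return _REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


demo_env = {"RAY_API_KEY": "secret", "HOST": "ray-head"}
print(expand_env("${RAY_API_KEY}", demo_env))            # secret
print(expand_env("http://${HOST}:8000/v1", demo_env))    # http://ray-head:8000/v1
```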
Retry & fallback
Any provider can have retries and fallback. On transient failures
(429, 5xx, timeouts, connection errors) the proxy retries with exponential
backoff, then falls back to the configured model:
providers:
  kimi:
    retries: 2
    fallback: openai/gpt-4o-mini
Both settings are per provider. If the primary is degraded, the client still sees a successful response, served by the fallback model.
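The retry-then-fallback flow can be sketched as follows. This is not the code in retry.py: TransientError, call_with_fallback, and the 0.5-second base delay are illustrative assumptions, with TransientError standing in for 429, 5xx, timeout, and connection failures.

```python
import time


class TransientError(Exception):
    """Stand-in for 429, 5xx, timeouts, and connection errors."""


def call_with_fallback(primary, fallback=None, retries=2,
                       base_delay=0.5, sleep=time.sleep):
    """Try primary up to retries + 1 times with exponential backoff, then fallback."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except TransientError:
            if attempt < retries:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    if fallback is None:
        raise TransientError("primary failed and no fallback is configured")
    return fallback()


def always_down():
    raise TransientError("upstream returned 503")


# sleep is stubbed out here so the demo finishes instantly.
print(call_with_fallback(always_down, lambda: "reply from fallback",
                         sleep=lambda seconds: None))
```

Passing sleep as a parameter keeps the backoff schedule easy to test without real waiting.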
Health checks
| Endpoint | Description |
|---|---|
| /health | Lightweight: status, version, configured providers, model count |
| /health/providers | Pings each OpenAI-compat provider's /models endpoint |
Useful for monitoring, load-balancer probes, and quickly seeing which provider is down right now.
Logging & observability
Every request gets a unique x-request-id (returned in response headers and
included in every log line for that request).
# Verbose request/response logs
ollama-proxy --debug
# JSON-formatted logs (for log aggregators)
PROXY_LOG_JSON=1 ollama-proxy --debug
Sample debug output:
2026-06-04 12:30:15 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: REQ /v1/chat/completions model=gemini/gemini-2.5-pro stream=True messages=10 tools=74 format=None
2026-06-04 12:30:18 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: RES /v1/chat/completions status=200 finish=stop
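Because each line carries the request ID, request and response entries can be paired with a small parser. The sketch below infers the line layout purely from the two sample lines above, so the regular expression is an assumption and may need adjusting for other log lines or for JSON-formatted logs.

```python
import re

# Layout inferred from the sample debug output; treat it as an assumption.
_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+) \[(?P<level>\w+)\] \[(?P<request_id>[0-9a-f]+)\] "
    r"ollama-proxy: (?P<kind>REQ|RES) (?P<path>\S+)(?P<rest>.*)$"
)


def parse_debug_line(line: str):
    """Return a dict for a REQ/RES debug line, or None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    rest = record.pop("rest")
    # Trailing key=value pairs such as model=... or status=... become a dict.
    record["fields"] = dict(
        pair.split("=", 1) for pair in rest.split() if "=" in pair
    )
    return record


sample = (
    "2026-06-04 12:30:18 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: "
    "RES /v1/chat/completions status=200 finish=stop"
)
print(parse_debug_line(sample))
```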
Run ollama-proxy doctor to validate your configuration and check
upstream connectivity in one shot.
Endpoints
Ollama protocol
| Endpoint | Notes |
|---|---|
| GET / | "Ollama is running" |
| GET /api/version | Reports 0.6.5 |
| GET /api/tags | All registered models |
| GET /api/ps | Always empty (no VRAM concept) |
| POST /api/chat | Streaming + non-streaming, tools, format |
| POST /api/generate | Streaming + non-streaming, format |
| POST /api/embed / /api/embeddings | Both formats supported |
| POST /api/show | Model details |
| POST /api/pull | Mocked (3 progress events) |
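For clients that consume the streaming form of /api/chat directly, the Ollama protocol sends one JSON object per line, each carrying a piece of the message and a done flag. The sketch below assembles such a stream into the full reply; collect_ollama_stream is an illustrative helper, and it assumes the proxy emits chunks in that standard Ollama shape.

```python
import json
from typing import Iterable


def collect_ollama_stream(lines: Iterable[str]) -> str:
    """Join message.content from newline-delimited JSON chunks until done is true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


sample_stream = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_ollama_stream(sample_stream))  # Hello
```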
OpenAI protocol (Copilot, OpenAI SDK, Continue)
| Endpoint | Notes |
|---|---|
| POST /v1/chat/completions | Streaming SSE + non-streaming, tools, response_format |
| GET /v1/models | Lists registered models |
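On the OpenAI side, streamed completions arrive as server-sent events: lines beginning with data: that carry JSON chunks, ending with data: [DONE]. The following sketch collects the text deltas from such a stream. collect_openai_sse is an illustrative helper, and the chunk shape is assumed to match the standard OpenAI streaming format.

```python
import json
from typing import Iterable


def collect_openai_sse(lines: Iterable[str]) -> str:
    """Join choices[].delta.content from SSE data lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # delta.content can be missing or null on role/finish chunks.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample_sse = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_openai_sse(sample_sse))  # Hello
```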
Health
| Endpoint | Notes |
|---|---|
| GET /health | Basic status |
| GET /health/providers | Connectivity check across all OpenAI-compat providers |
Usage examples
ollama Python SDK
import ollama
client = ollama.Client(host="http://localhost:11434")
resp = client.chat(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi"}])
print(resp["message"]["content"])
OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)
Open WebUI
Settings → Connections → Ollama URL: http://localhost:11434. All registered
models appear in the dropdown.
GitHub Copilot
In VS Code settings, configure the Ollama provider with
http://localhost:11434 and pick any registered model.
Development
git clone https://github.com/skamalj/ollama-proxy
cd ollama-proxy
uv sync --extra all
uv run ollama-proxy --debug
Run tests:
uv run pytest
Project structure
ollama_proxy/
├── __init__.py # Package version + app re-export
├── __main__.py # python -m ollama_proxy
├── cli.py # CLI entry point
├── server.py # FastAPI app + endpoint handlers
├── config.py # Config loading (env + YAML + auto-discovery)
├── logging_config.py # Structured logging + correlation IDs
├── retry.py # Retry / fallback orchestration
├── commands/ # CLI subcommands
│   ├── doctor.py
│   ├── list_models.py
│   └── test.py
└── providers/
    ├── base.py
    ├── openai_compat_provider.py # OpenAI, Gemini, Groq, Kimi, vLLM, ...
    ├── anthropic_provider.py # Anthropic / Claude
    ├── azure_provider.py # Azure OpenAI
    └── bedrock_provider.py # AWS Bedrock (Converse API)
License
MIT — see LICENSE.