
Speak the Ollama API protocol — route requests to OpenAI, Gemini, Anthropic, AWS Bedrock, Azure, Groq, xAI, Mistral, DeepSeek, Together, Perplexity, Kimi, and any OpenAI-compatible server.


ollama-proxy


A lightweight server that speaks the Ollama API protocol and routes every request to a cloud or self-hosted LLM backend.

Distributed on PyPI as ollama-proxy-plus (the ollama-proxy name was taken). Install with pip install ollama-proxy-plus. Only the distribution name differs: the CLI command is still ollama-proxy, the Python import path is ollama_proxy, and the endpoints are unchanged.

Any tool that already works with Ollama — Open WebUI, Continue, Cursor, GitHub Copilot, LangChain, LlamaIndex, the ollama Python SDK — connects with zero code changes.

Your App  ---->  POST /api/chat            (Ollama protocol)
         ---->  POST /v1/chat/completions  (OpenAI protocol)
                        |
                 ollama-proxy
                        |  routes by model-name prefix
       +----------------+----------------+------- ... -------+
  openai/*        gemini/*        anthropic/*           custom/*
  OpenAI API    Gemini API      Anthropic API     your vLLM / Ray server

The model name prefix is the only routing key. The proxy speaks two wire formats simultaneously — the Ollama protocol on /api/* and the OpenAI protocol on /v1/* — so it works with both ecosystems.
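Conceptually, the router only needs the text before the first /; everything after it is passed upstream as the model ID. That is why IDs that themselves contain slashes or colons, such as myvllm/meta-llama/Llama-3.1-8B-Instruct or Bedrock's us.amazon.nova-pro-v1:0, still route correctly. A minimal sketch of that split, assuming the proxy splits on the first slash only; split_model_name is a hypothetical helper, not part of the package.

```python
def split_model_name(model: str) -> tuple[str, str]:
    """Split "<prefix>/<model-id>" on the FIRST slash only (hypothetical helper)."""
    prefix, sep, model_id = model.partition("/")
    if not sep or not prefix or not model_id:
        raise ValueError(f"expected '<prefix>/<model-id>', got {model!r}")
    return prefix, model_id


print(split_model_name("openai/gpt-4o-mini"))                       # ('openai', 'gpt-4o-mini')
print(split_model_name("myvllm/meta-llama/Llama-3.1-8B-Instruct"))  # ('myvllm', 'meta-llama/Llama-3.1-8B-Instruct')
print(split_model_name("bedrock/us.amazon.nova-pro-v1:0"))          # ('bedrock', 'us.amazon.nova-pro-v1:0')
```

The prefix picks the backend; the remainder is forwarded untouched, so the proxy never has to understand each provider's model-naming scheme.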


Install

# Core (OpenAI / Gemini / xAI / Groq / Mistral / DeepSeek / Together / Perplexity / Kimi / vLLM / etc.)
pip install ollama-proxy-plus

# With Anthropic SDK
pip install "ollama-proxy-plus[anthropic]"

# With AWS Bedrock support
pip install "ollama-proxy-plus[bedrock]"

# Everything
pip install "ollama-proxy-plus[all]"

Or run without installing using uv:

uvx --from ollama-proxy-plus ollama-proxy

Quick start

# 1. Set API keys for the providers you'll use
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=AIza...

# 2. Start the proxy on port 11434 (Ollama's default)
ollama-proxy

# 3. Use any Ollama-compatible client pointed at http://localhost:11434
curl http://localhost:11434/api/chat \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role":"user","content":"Hi"}]}'

Optional: copy .env.example to .env and put keys there instead of exporting them.


CLI

ollama-proxy [serve]    # run the proxy (default subcommand)
ollama-proxy doctor     # check config + provider connectivity
ollama-proxy list-models
ollama-proxy test openai/gpt-4o-mini --prompt "Say hello"

serve flags:

  • --host 0.0.0.0 (default)
  • --port 11434 (default)
  • --debug — verbose request/response logging
  • --reload — auto-reload on code changes (dev)
  • --config path.yaml — load YAML config
  • --workers N — number of worker processes

You can also invoke the CLI as python -m ollama_proxy ... instead of ollama-proxy.


Adding models

Three ways, in increasing permanence:

1. Direct call (no registration)

Any <prefix>/<model-id> works as long as the prefix is configured. The model just won't appear in /api/tags discovery.

curl http://localhost:11434/api/chat -d '{
  "model": "openai/gpt-5-preview",
  "messages": [{"role": "user", "content": "Hi"}]
}'

2. EXTRA_MODELS env var

export EXTRA_MODELS="openai/gpt-5,gemini/gemini-3-pro,kimi/kimi-k2.7"
ollama-proxy

3. YAML config file

# ollama-proxy.yaml
models:
  - openai/gpt-5
  - gemini/gemini-3-pro

ollama-proxy --config ollama-proxy.yaml

Adding providers

OpenAI-compatible (most providers, including self-hosted)

Option A — environment variables (auto-discovered):

export PROVIDERS_VLLM_API_KEY=EMPTY
export PROVIDERS_VLLM_BASE_URL=http://localhost:8000/v1
ollama-proxy

Use vllm/<model-id> in your client. No code changes needed.
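The variable names suggest a simple convention: each PROVIDERS_<NAME>_API_KEY / PROVIDERS_<NAME>_BASE_URL pair defines a provider whose prefix is <name> in lower case. The sketch below illustrates that convention over any mapping (such as os.environ). It is an assumption about how auto-discovery behaves, not the code in config.py, and discover_providers is a hypothetical name.

```python
import os
from collections.abc import Mapping

PREFIX = "PROVIDERS_"
# Env-var suffix -> config field name (assumed convention).
FIELDS = {"_API_KEY": "api_key", "_BASE_URL": "base_url"}


def discover_providers(env: Mapping[str, str]) -> dict[str, dict[str, str]]:
    """Group PROVIDERS_<NAME>_<FIELD> variables by lower-cased provider name (sketch)."""
    providers: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        rest = key[len(PREFIX):]
        for suffix, field in FIELDS.items():
            if rest.endswith(suffix) and len(rest) > len(suffix):
                name = rest[: -len(suffix)].lower()  # VLLM -> vllm, used as the routing prefix
                providers.setdefault(name, {})[field] = value
                break
    return providers


if __name__ == "__main__":
    print(discover_providers(os.environ))
```

With the two exports above, this would yield {'vllm': {'api_key': 'EMPTY', 'base_url': 'http://localhost:8000/v1'}}, matching the vllm/ prefix used by clients.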

Option B — YAML config:

providers:
  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1
  ray:
    api_key: ${RAY_API_KEY}
    base_url: http://ray-head:8000/v1
    retries: 2
    fallback: openai/gpt-4o-mini

Non-compatible provider (Cohere, Vertex AI, etc.)

For wire formats incompatible with OpenAI:

  1. Subclass BaseProvider in ollama_proxy/providers/myprovider_provider.py
  2. Implement chat(), chat_stream(), optionally chat_full() and chat_stream_raw() for tool support, and embed()
  3. Add a routing branch in ollama_proxy/server.py::get_provider()
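The real method signatures live in ollama_proxy/providers/base.py and are not documented here. To show only the shape of step 2, the sketch below defines a stand-in abstract base class with guessed, simplified signatures (message dicts in, text out) and a toy EchoProvider that never calls a network API. Treat every signature, and the names StandInBaseProvider and EchoProvider, as hypothetical; check base.py before writing a real provider.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator

Message = dict[str, str]  # e.g. {"role": "user", "content": "Hi"}


class StandInBaseProvider(ABC):
    """Hypothetical stand-in for ollama_proxy.providers.base.BaseProvider."""

    @abstractmethod
    def chat(self, model: str, messages: list[Message]) -> str: ...

    @abstractmethod
    def chat_stream(self, model: str, messages: list[Message]) -> Iterator[str]: ...

    def embed(self, model: str, inputs: list[str]) -> list[list[float]]:
        raise NotImplementedError("embeddings not supported by this provider")


class EchoProvider(StandInBaseProvider):
    """Toy provider: echoes the last message instead of calling a real API."""

    def chat(self, model: str, messages: list[Message]) -> str:
        return f"[{model}] {messages[-1]['content']}"

    def chat_stream(self, model: str, messages: list[Message]) -> Iterator[str]:
        # A real provider would translate the upstream wire format chunk by chunk here.
        for word in self.chat(model, messages).split(" "):
            yield word + " "
```

Non-streaming and streaming paths are separate methods because a non-OpenAI wire format usually needs its own chunk translation; step 3 then maps the new prefix to the class in get_provider().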

Built-in providers

OpenAI-compatible (one adapter, many providers)

| Prefix | Provider | Base URL | Env variable |
|---|---|---|---|
| openai/ | OpenAI | https://api.openai.com/v1 | OPENAI_API_KEY |
| gemini/ | Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai/ | GEMINI_API_KEY |
| xai/ | xAI / Grok | https://api.x.ai/v1 | XAI_API_KEY |
| groq/ | Groq | https://api.groq.com/openai/v1 | GROQ_API_KEY |
| mistral/ | Mistral AI | https://api.mistral.ai/v1 | MISTRAL_API_KEY |
| deepseek/ | DeepSeek | https://api.deepseek.com | DEEPSEEK_API_KEY |
| together/ | Together AI | https://api.together.xyz/v1 | TOGETHER_API_KEY |
| perplexity/ | Perplexity | https://api.perplexity.ai | PERPLEXITY_API_KEY |
| kimi/ | Kimi / Moonshot | https://api.moonshot.ai/v1 | KIMI_API_KEY |

Endpoints can change. Verify against each provider's docs before relying on them in production.

Self-hosted servers

Any OpenAI-compatible REST server works the same way (typical defaults):

| Server | Base URL |
|---|---|
| vLLM, Ray Serve (vLLM backend) | http://localhost:8000/v1 |
| Hugging Face TGI, LocalAI, Llamafile | http://localhost:8080/v1 |
| LM Studio | http://localhost:1234/v1 |
| Ollama (real instance) | http://localhost:11434/v1 (run the proxy on another --port, since both default to 11434) |
| Jan | http://localhost:1337/v1 |

Native SDK providers

| Prefix | Provider | Auth | Install |
|---|---|---|---|
| anthropic/ | Anthropic / Claude | ANTHROPIC_API_KEY | pip install "ollama-proxy-plus[anthropic]" |
| azure/ | Azure OpenAI | AZURE_OPENAI_* variables | (core) |
| bedrock/ | AWS Bedrock | AWS credential chain | pip install "ollama-proxy-plus[bedrock]" |

Azure OpenAI

AZURE_OPENAI_API_KEY=<key>
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-08-01-preview

The part after azure/ is your deployment name. For multi-resource setups, populate AZURE_DEPLOYMENTS in config.py.

AWS Bedrock

Uses the standard AWS credential chain, so no API key is needed. Configure AWS credentials through environment variables, a named profile, or an IAM role:

AWS_REGION=us-east-1
# plus access keys, profile, or IAM role

Use the model ID or cross-region inference profile after bedrock/:

bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
bedrock/us.amazon.nova-pro-v1:0
bedrock/us.meta.llama3-1-70b-instruct-v1:0

Enable model access in the Bedrock console first.


YAML configuration

The proxy reads --config <file> or $PROXY_CONFIG_FILE. See ollama-proxy.example.yaml. Highlights:

providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    retries: 2                            # transient-error retries
    fallback: groq/llama-3.3-70b-versatile  # used after retries exhausted

  myvllm:
    api_key: EMPTY
    base_url: http://localhost:8000/v1

models:
  - openai/gpt-4o
  - myvllm/meta-llama/Llama-3.1-8B-Instruct

${ENV_VAR} references are expanded. YAML config layers on top of env-var defaults.
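${ENV_VAR} expansion amounts to a string substitution over the parsed config, for example the dict returned by PyYAML's yaml.safe_load. A standard-library sketch; restricting it to the ${NAME} form and leaving unset variables untouched are assumptions, since the proxy's behaviour for a missing variable is not documented here.

```python
import os
import re
from collections.abc import Mapping
from typing import Any

# Matches ${NAME} where NAME is a conventional environment-variable identifier.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} with env[NAME]; unset names are left as-is (assumed behaviour)."""
    return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def expand_tree(node: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively expand every string in a parsed YAML structure; other values pass through."""
    if isinstance(node, str):
        return expand_env(node, env)
    if isinstance(node, dict):
        return {key: expand_tree(val, env) for key, val in node.items()}
    if isinstance(node, list):
        return [expand_tree(item, env) for item in node]
    return node


config = {"providers": {"openai": {"api_key": "${OPENAI_API_KEY}", "retries": 2}}}
print(expand_tree(config, {"OPENAI_API_KEY": "sk-test"}))
# {'providers': {'openai': {'api_key': 'sk-test', 'retries': 2}}}
```

Expanding after parsing (rather than on the raw file text) keeps non-string values such as retries: 2 as integers.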


Retry & fallback

Any provider can have retries and fallback. On transient failures (429, 5xx, timeouts, connection errors) the proxy retries with exponential backoff, then falls back to the configured model:

providers:
  kimi:
    retries: 2
    fallback: openai/gpt-4o-mini

Retries and fallback are configured per provider. If the primary is still failing after its retries, the client receives a successful response served by the fallback model.
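The retry-then-fallback flow described above can be sketched in a few lines. This is an illustrative model, not the code in retry.py: TransientError, call_with_retry, the 0.5 s base delay and the doubling factor are all assumptions, and the upstream calls are plain callables.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stands in for 429s, 5xx responses, timeouts and connection errors."""


def call_with_retry(
    primary: Callable[[], T],
    retries: int = 2,
    fallback: Optional[Callable[[], T]] = None,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call primary up to retries + 1 times with exponential backoff, then use fallback."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except TransientError:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1.0 s, 2.0 s, ...
    # Non-transient exceptions never reach here: they propagate immediately.
    if fallback is None:
        raise TransientError(f"primary failed after {retries + 1} attempts")
    return fallback()
```

Passing sleep as a parameter keeps the sketch testable without real delays: with retries=2, a permanently failing primary is called three times, sleeping 0.5 s and then 1.0 s, before the fallback answers.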


Health checks

| Endpoint | Description |
|---|---|
| /health | Lightweight: status, version, configured providers, model count |
| /health/providers | Pings each OpenAI-compatible provider's /models endpoint |

Useful for monitoring, load-balancer probes, and quickly seeing which provider is down right now.
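For a cron job or quick script, the standard library is enough to poll both endpoints. This sketch checks only for HTTP 200 and assumes nothing about the JSON body, whose fields are not documented here; whether /health/providers returns a non-200 status when a single upstream is down is also unspecified, so inspect its body for per-provider detail. probe is a hypothetical helper and the URL assumes the default port.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <url> answers HTTP 200; False on any error or non-200 status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx), refused connections and timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    for path in ("/health", "/health/providers"):
        ok = probe(f"http://localhost:11434{path}")
        print(f"{path}: {'ok' if ok else 'FAIL'}")
```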


Logging & observability

Every request gets a unique x-request-id (returned in response headers and included in every log line for that request).

# Verbose request/response logs
ollama-proxy --debug

# JSON-formatted logs (for log aggregators)
PROXY_LOG_JSON=1 ollama-proxy --debug

Sample debug output:

2026-06-04 12:30:15 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: REQ /v1/chat/completions model=gemini/gemini-2.5-pro stream=True messages=10 tools=74 format=None
2026-06-04 12:30:18 [DEBUG] [a3f2b1c4d5e6] ollama-proxy: RES /v1/chat/completions status=200 finish=stop
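The bracketed value in each line (a3f2b1c4d5e6 above) is the request ID. A common way to stamp one ID on every log line of a request is a contextvars.ContextVar read by a logging.Filter. The sketch shows that general standard-library technique; it is an assumption that logging_config.py works similarly. The 12-hex-character format is copied from the sample output, and request_id_var, RequestIdFilter and handle_request are hypothetical names.

```python
import contextvars
import logging
import uuid

# Current request's ID; each asyncio task runs with its own copy of the context.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record as record.request_id."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s [%(levelname)s] [%(request_id)s] %(name)s: %(message)s")
)
log = logging.getLogger("ollama-proxy-sketch")
log.addHandler(handler)
log.setLevel(logging.DEBUG)


def handle_request(path: str) -> str:
    """Hypothetical handler: assign an ID, log with it, return it for the x-request-id header."""
    request_id = uuid.uuid4().hex[:12]
    token = request_id_var.set(request_id)
    try:
        log.debug("REQ %s", path)
        return request_id
    finally:
        request_id_var.reset(token)  # later log lines outside the request fall back to "-"
```

Because the ID lives in the context rather than being passed as an argument, helper functions deep in the call stack log with the right ID without any signature changes.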

Run ollama-proxy doctor to validate your configuration and check upstream connectivity in one shot.


Endpoints

Ollama protocol

| Endpoint | Notes |
|---|---|
| GET / | Returns "Ollama is running" |
| GET /api/version | Reports version 0.6.5 |
| GET /api/tags | All registered models |
| GET /api/ps | Always empty (no VRAM concept) |
| POST /api/chat | Streaming + non-streaming, tools, format |
| POST /api/generate | Streaming + non-streaming, format |
| POST /api/embed, POST /api/embeddings | Both request formats supported |
| POST /api/show | Model details |
| POST /api/pull | Mocked (emits 3 progress events) |
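With streaming enabled (Ollama's default for /api/chat), the response is newline-delimited JSON in Ollama's format: one object per line, each carrying a message.content fragment, with done set to true on the last object. A standard-library client sketch, assuming the proxy mirrors that format and listens on the default port; iter_ollama_chunks and stream_chat are hypothetical helpers.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_ollama_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield message.content fragments from an Ollama NDJSON stream until done is true."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        chunk = json.loads(line)
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break


def stream_chat(model: str, prompt: str, base_url: str = "http://localhost:11434") -> Iterator[str]:
    """POST a streaming /api/chat request and yield the reply piece by piece."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # iterating the response yields raw lines
        yield from iter_ollama_chunks(resp)
```

For example, for piece in stream_chat("openai/gpt-4o-mini", "Say hello"): print(piece, end="", flush=True) prints the reply as it arrives. Splitting the parser from the HTTP call lets the parser be reused with any client.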

OpenAI protocol (Copilot, OpenAI SDK, Continue)

| Endpoint | Notes |
|---|---|
| POST /v1/chat/completions | Streaming SSE + non-streaming, tools, response_format |
| GET /v1/models | Lists registered models |
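With "stream": true, /v1/chat/completions answers with Server-Sent Events in OpenAI's framing: data: {json} lines whose choices[0].delta.content holds the next fragment, terminated by data: [DONE]. The OpenAI SDK parses this for you; the sketch below is a standard-library parser for clients that cannot use it, assuming the proxy emits the same framing as OpenAI. iter_sse_deltas is a hypothetical name.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield choices[0].delta.content fragments from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        choices = event.get("choices") or []
        if choices:
            # The first chunk often carries only the role, so content may be missing or None.
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content
```

Feed it decoded lines from any client that reads the body line by line, for example (raw.decode() for raw in resp) over a urllib.request.urlopen response.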

Health

| Endpoint | Notes |
|---|---|
| GET /health | Basic status |
| GET /health/providers | Connectivity check across all OpenAI-compatible providers |

Usage examples

ollama Python SDK

import ollama
client = ollama.Client(host="http://localhost:11434")
resp = client.chat(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi"}])
print(resp["message"]["content"])

OpenAI SDK

from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="kimi/kimi-k2.6",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

Open WebUI

Settings → Connections → Ollama URL: http://localhost:11434. All registered models appear in the dropdown.

GitHub Copilot

In VS Code settings, configure the Ollama provider with http://localhost:11434 and pick any registered model.


Development

git clone https://github.com/skamalj/ollama-proxy
cd ollama-proxy
uv sync --extra all
uv run ollama-proxy --debug

Run tests:

uv run pytest

Project structure

ollama_proxy/
├── __init__.py              # Package version + app re-export
├── __main__.py              # python -m ollama_proxy
├── cli.py                   # CLI entry point
├── server.py                # FastAPI app + endpoint handlers
├── config.py                # Config loading (env + YAML + auto-discovery)
├── logging_config.py        # Structured logging + correlation IDs
├── retry.py                 # Retry / fallback orchestration
├── commands/                # CLI subcommands
│   ├── doctor.py
│   ├── list_models.py
│   └── test.py
└── providers/
    ├── base.py
    ├── openai_compat_provider.py   # OpenAI, Gemini, Groq, Kimi, vLLM, ...
    ├── anthropic_provider.py       # Anthropic / Claude
    ├── azure_provider.py           # Azure OpenAI
    └── bedrock_provider.py         # AWS Bedrock (Converse API)

License

MIT — see LICENSE.



Download files


Source Distribution

ollama_proxy_plus-0.1.0.tar.gz (24.6 kB)

Built Distribution


ollama_proxy_plus-0.1.0-py3-none-any.whl (30.4 kB)

File details

Details for the file ollama_proxy_plus-0.1.0.tar.gz.

File metadata

  • Download URL: ollama_proxy_plus-0.1.0.tar.gz
  • Upload date:
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.14 (uv publish, Ubuntu 24.04)

File hashes

Hashes for ollama_proxy_plus-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24364000f48aa5b6885b42699057f80fa0e8a3d2412df974a0c927da6c592094 |
| MD5 | bc04a4ff1011f571a092096c012514e0 |
| BLAKE2b-256 | 7188d689035eb23e425f2492b6aec285f80fda807b0fdc1a8cabfa4a1d37745d |


File details

Details for the file ollama_proxy_plus-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ollama_proxy_plus-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.14 (uv publish, Ubuntu 24.04)

File hashes

Hashes for ollama_proxy_plus-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93c174e1819ecd46870c195d271192571a8ac3fcee0cd336a3372990fb899b16 |
| MD5 | ba0e073944766eed14c56b30879854e5 |
| BLAKE2b-256 | d77cde85f5f60d6a01b3197dbed7c0ecacf870a9e917631047f40d0c1295a7ff |

