OpenAI- and Anthropic-compatible API server that routes through Claude Code

claude-relay

License: MIT | Python 3.10+

Drop-in OpenAI and Anthropic API server that routes through Claude Code.

Why

You have tools that speak the OpenAI or Anthropic API. You have Claude Code with its tools, MCP servers, and agentic capabilities. claude-relay bridges the two — point any compatible client at it and every request flows through claude -p under the hood.

  • Use Claude Code from any OpenAI or Anthropic client — Cursor, Continue, aider, LangChain, custom scripts
  • Keep Claude Code's superpowers — tool use, MCP servers, file access, shell execution
  • Zero config — if claude works on your machine, so does this
  • Real token usage — reports actual token counts from Claude (not zeros)
  • Token-level streaming — uses --include-partial-messages for true real-time deltas

Install

# With uv (recommended)
uvx claude-relay serve

# Or install globally
uv tool install claude-relay
claude-relay serve

# Or from source
git clone https://github.com/npow/claude-relay.git
cd claude-relay
uv sync
uv run claude-relay serve

Quick start

claude-relay serve
# Server starts on http://localhost:18082

Run as background service (macOS)

# Install and auto-start on login
claude-relay service install

The installer offers to add the following exports to your ~/.zshrc (or ~/.bashrc) so that every SDK and agent picks up the relay automatically:

export ANTHROPIC_BASE_URL="http://127.0.0.1:18082"
export OPENAI_BASE_URL="http://127.0.0.1:18082/v1"

# Check status
claude-relay service status

# Update
uv tool upgrade claude-relay
claude-relay service restart

# Stop and remove
claude-relay service uninstall

Point any OpenAI-compatible client at the running server:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18082/v1", api_key="unused")

# Streaming
for chunk in client.chat.completions.create(
    model="sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    # Skip chunks that carry no choices (some servers end with a usage-only chunk)
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")

# Non-streaming
resp = client.chat.completions.create(
    model="sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Anthropic SDK

import anthropic

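# Pass base_url explicitly, or export ANTHROPIC_BASE_URL=http://localhost:18082
# and omit it; the SDK reads that variable automatically.
# The SDK needs an API key unless ANTHROPIC_API_KEY is set; "unused" is a
# placeholder, as in the OpenAI example (assumes the relay ignores it).
client = anthropic.Anthropic(base_url="http://localhost:18082", api_key="unused")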

# Streaming
with client.messages.stream(
    model="sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

# Non-streaming
resp = client.messages.create(
    model="sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.content[0].text)

LangChain

from langchain_anthropic import ChatAnthropic

# export ANTHROPIC_BASE_URL=http://localhost:18082
llm = ChatAnthropic(model="sonnet")
print(llm.invoke("Hello!").content)

curl

# OpenAI format
curl http://localhost:18082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","messages":[{"role":"user","content":"Hello"}],"stream":true}'

# OpenAI Responses format
curl http://localhost:18082/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","input":"Hello"}'

# Anthropic format
curl http://localhost:18082/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

Configuration

claude-relay serve [--host HOST] [--port PORT]

Flag     Default   Description
--host   0.0.0.0   Bind address
--port   18082     Bind port

API

Endpoint               Method   Description
/v1/chat/completions   POST     Chat completions (OpenAI-compatible)
/v1/responses          POST     Responses API (OpenAI-compatible)
/v1/messages           POST     Messages (Anthropic-compatible)
/v1/models             GET      List available models
/health                GET      Server and CLI status

All endpoints also work without the /v1 prefix. CORS is enabled for all origins.
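
As a rough illustration of the endpoint table above, a client can build endpoint URLs and probe /health before sending traffic. This is a minimal standard-library sketch; the helper names endpoint_url and relay_is_up are hypothetical, and because this README does not describe the /health response body, the probe only checks for an HTTP 200.

```python
import urllib.error
import urllib.request


def endpoint_url(base_url: str, path: str, use_v1_prefix: bool = True) -> str:
    """Join a relay base URL and an endpoint path such as 'models'.

    The README states that endpoints work with or without the /v1
    prefix, so the prefix is optional here.
    """
    base = base_url.rstrip("/")
    prefix = "/v1" if use_v1_prefix else ""
    return f"{base}{prefix}/{path.lstrip('/')}"


def relay_is_up(base_url: str = "http://127.0.0.1:18082", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling relay_is_up() before creating an SDK client gives a quick way to fail early when the server is not running.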

Supported features

Feature                    Status
Streaming (SSE)            Yes
System messages            Yes (via --system-prompt)
Multi-turn conversations   Yes
Multimodal (text parts)    Yes
Model selection            Yes
Token usage reporting      Yes
CORS                       Yes
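
The table says system messages are passed via --system-prompt and that multi-turn conversations are supported, but this README does not show how a message list becomes a single claude -p prompt. One plausible approach is sketched below. The function name split_messages and the "Role: text" transcript layout are assumptions made for illustration, not the relay's actual format.

```python
def split_messages(messages: list[dict]) -> tuple[str, str]:
    """Split OpenAI-style messages into (system_prompt, prompt).

    System messages are joined into one string (a candidate value for
    --system-prompt); the remaining turns are rendered as a transcript.
    Assumes each message's content is already a plain string.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    if len(turns) == 1 and turns[0]["role"] == "user":
        # Single-turn request: send the user text unchanged.
        prompt = turns[0]["content"]
    else:
        # Hypothetical layout: one labelled block per turn.
        prompt = "\n\n".join(
            f"{m['role'].capitalize()}: {m['content']}" for m in turns
        )

    return "\n\n".join(system_parts), prompt
```

Because each request is stateless, the client has to resend the full history on every call; a flattening step like this is one way that history could reach a fresh claude process.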

Models

Pass any model name — it goes directly to claude --model:

Model    Description
opus     Most capable
sonnet   Balanced (default)
haiku    Fastest

Limitations

  • temperature, max_tokens, top_p, and other sampling parameters are ignored (Claude Code CLI does not expose them)
  • No tool/function calling passthrough (Claude Code uses its own tools internally, but they aren't exposed via the OpenAI tool-calling protocol)
  • Each request spawns a new claude process (~2-3s overhead on top of API latency)
  • No image/audio content forwarding — only text parts of multimodal messages are extracted
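
The last limitation can be made concrete with a small sketch of text-part extraction. The function name extract_text and the choice to join parts with a newline are assumptions; the README only states that non-text parts are not forwarded.

```python
def extract_text(content) -> str:
    """Return only the text of a message's content.

    Accepts a plain string or a list of content parts. Parts whose
    "type" is not "text" (images, audio, ...) are dropped, mirroring
    the limitation listed above.
    """
    if isinstance(content, str):
        return content
    texts = []
    for part in content or []:
        if isinstance(part, dict) and part.get("type") == "text":
            texts.append(part.get("text", ""))
    # Joining with a newline is an illustrative choice, not documented behavior.
    return "\n".join(texts)
```

A message that contains only an image would therefore reach Claude Code as an empty string under this sketch, which is worth keeping in mind when pointing vision-oriented clients at the relay.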

How it works

OpenAI client     ─┐
                    ├→  claude-relay  →  claude -p  →  Anthropic API
Anthropic client  ─┘     (FastAPI)      (stream-json)

Each request spawns a claude -p process with --output-format stream-json --include-partial-messages. The proxy translates between the OpenAI or Anthropic wire format and Claude Code's streaming JSON protocol. Requests are stateless — no conversation history bleeds between calls.

Development

uv sync
uv run pytest tests/ -v

License

MIT
