
OpenAI- and Anthropic-compatible API server that routes through agent CLIs

Project description

agent-relay

License: MIT · Python 3.10+

Drop-in OpenAI and Anthropic API server that routes through agent CLIs (currently Claude Code).

Compatibility note: claude-relay remains available as an alias for both the package and the command.

Why

You have tools that speak the OpenAI or Anthropic API. You have Claude Code with its tools, MCP servers, and agentic capabilities. agent-relay bridges the two — point any compatible client at it and every request flows through claude -p under the hood.

  • Use Claude Code from any OpenAI or Anthropic client — Cursor, Continue, aider, LangChain, custom scripts
  • Keep Claude Code's superpowers — tool use, MCP servers, file access, shell execution
  • Zero config — if claude works on your machine, so does this
  • Real token usage — reports actual token counts from Claude (not zeros)
  • Token-level streaming — uses --include-partial-messages for true real-time deltas

Install

# With uv (recommended)
uvx agent-relay serve

# Or install globally
uv tool install agentrelay-cli
agent-relay serve

# Or from source
git clone https://github.com/npow/claude-relay.git
cd claude-relay
uv sync
uv run agent-relay serve

Quick start

agent-relay serve
# Server starts on http://localhost:18082

Run as background service (macOS)

# Install and auto-start on login
agent-relay service install

The installer will offer to add these to your ~/.zshrc (or ~/.bashrc) so every SDK and agent picks up the relay automatically:

export ANTHROPIC_BASE_URL="http://127.0.0.1:18082"
export OPENAI_BASE_URL="http://127.0.0.1:18082/v1"

# Check status
agent-relay service status

# Update
uv tool upgrade agentrelay-cli
agent-relay service restart

# Stop and remove
agent-relay service uninstall

Point any OpenAI-compatible client at it:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18082/v1", api_key="unused")

# Streaming
for chunk in client.chat.completions.create(
    model="sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")

# Non-streaming
resp = client.chat.completions.create(
    model="sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Anthropic SDK

import anthropic

# The SDK reads ANTHROPIC_BASE_URL from the environment automatically:
#   export ANTHROPIC_BASE_URL=http://localhost:18082
# or pass the base URL explicitly:
client = anthropic.Anthropic(base_url="http://localhost:18082", api_key="unused")

# Streaming
with client.messages.stream(
    model="sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

# Non-streaming
resp = client.messages.create(
    model="sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.content[0].text)

LangChain

from langchain_anthropic import ChatAnthropic

# export ANTHROPIC_BASE_URL=http://localhost:18082
llm = ChatAnthropic(model="sonnet", anthropic_api_key="unused")
print(llm.invoke("Hello!").content)

curl

# OpenAI format
curl http://localhost:18082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","messages":[{"role":"user","content":"Hello"}],"stream":true}'

# OpenAI Responses format
curl http://localhost:18082/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","input":"Hello"}'

# Anthropic format
curl http://localhost:18082/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"sonnet","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

Configuration

agent-relay serve [--host HOST] [--port PORT]
| Flag | Default | Description |
|--------|---------|--------------|
| --host | 0.0.0.0 | Bind address |
| --port | 18082   | Bind port    |

API

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/chat/completions | POST | Chat completions (OpenAI-compatible) |
| /v1/responses        | POST | Responses API (OpenAI-compatible)    |
| /v1/messages         | POST | Messages (Anthropic-compatible)      |
| /v1/models           | GET  | List available models                |
| /health              | GET  | Server and CLI status                |

All endpoints also work without the /v1 prefix. CORS is enabled for all origins.
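The /v1-optional routing amounts to treating the prefixed and unprefixed paths as the same route. A minimal sketch of that idea (a hypothetical helper for illustration, not the server's actual implementation):

```python
def normalize_path(path: str) -> str:
    """Map an unprefixed API path onto its canonical /v1 form,
    so "/chat/completions" and "/v1/chat/completions" resolve to
    the same endpoint. Illustrative only."""
    if path == "/v1" or path.startswith("/v1/"):
        return path
    return "/v1" + path

print(normalize_path("/chat/completions"))  # /v1/chat/completions
print(normalize_path("/v1/messages"))       # /v1/messages
```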

Supported features

| Feature | Status |
|---------|--------|
| Streaming (SSE)          | Yes |
| System messages          | Yes (via --system-prompt) |
| Multi-turn conversations | Yes |
| Multimodal (text parts)  | Yes |
| Model selection          | Yes |
| Token usage reporting    | Yes |
| CORS                     | Yes |

Models

Pass any model name — it goes directly to claude --model:

| Model | Description |
|--------|--------------------|
| opus   | Most capable       |
| sonnet | Balanced (default) |
| haiku  | Fastest            |

Limitations

  • temperature, max_tokens, top_p, and other sampling parameters are ignored (Claude Code CLI does not expose them)
  • No tool/function calling passthrough (Claude Code uses its own tools internally, but they aren't exposed via the OpenAI tool-calling protocol)
  • Each request spawns a new claude process (~2-3s overhead on top of API latency)
  • No image/audio content forwarding — only text parts of multimodal messages are extracted

How it works

OpenAI client     ─┐
                    ├→  agent-relay  →  claude -p  →  Anthropic API
Anthropic client  ─┘     (FastAPI)     (stream-json)

Each request spawns a claude -p process with --output-format stream-json --include-partial-messages. The proxy translates between the OpenAI or Anthropic wire format and Claude Code's streaming JSON protocol. Requests are stateless — no conversation history bleeds between calls.
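The translation step can be pictured roughly like this, assuming the CLI's stream-json lines wrap Anthropic-style content_block_delta events (the exact field names here are an assumption, not confirmed by this project's source):

```python
import json

def text_deltas(lines):
    """Yield text fragments from claude -p --output-format stream-json
    output, assuming partial-message lines carry Anthropic-style
    content_block_delta events. Field names are illustrative."""
    for line in lines:
        event = json.loads(line)
        inner = event.get("event", {})
        if inner.get("type") == "content_block_delta":
            delta = inner.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")

sample = [
    '{"type":"stream_event","event":{"type":"content_block_delta","delta":{"type":"text_delta","text":"Hel"}}}',
    '{"type":"stream_event","event":{"type":"content_block_delta","delta":{"type":"text_delta","text":"lo"}}}',
]
print("".join(text_deltas(sample)))  # Hello
```

The proxy would re-emit each fragment in the caller's wire format, as an OpenAI chat-completion chunk or an Anthropic SSE event.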

Development

uv sync
uv run pytest tests/ -v

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

agentrelay_cli-0.5.3.tar.gz (19.2 kB)

Uploaded Source

Built Distribution


agentrelay_cli-0.5.3-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file agentrelay_cli-0.5.3.tar.gz.

File metadata

  • Download URL: agentrelay_cli-0.5.3.tar.gz
  • Upload date:
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentrelay_cli-0.5.3.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 921c799d543082956d3f75e95c71bb5356c2694ca32b77af7ff66a2f5d51f253 |
| MD5 | d8c1ca4fb899744d30623089cc5cfd99 |
| BLAKE2b-256 | 9a460593d89a182164266863e5e47183a727388b20ccdb1ab98f4807156678c9 |


Provenance

The following attestation bundles were made for agentrelay_cli-0.5.3.tar.gz:

Publisher: publish.yml on npow/claude-relay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentrelay_cli-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: agentrelay_cli-0.5.3-py3-none-any.whl
  • Upload date:
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentrelay_cli-0.5.3-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 63453c8cfe278fd2ffb69e4c1323ada3dbac6f9d74325e9ee3a8bbd404070857 |
| MD5 | f50743bba8b0ebda2f3836ea38ed3438 |
| BLAKE2b-256 | 1fccfd55bb83834d12afa703230e44bc8eb5f18fe1de5cdb55178278236c8099 |


Provenance

The following attestation bundles were made for agentrelay_cli-0.5.3-py3-none-any.whl:

Publisher: publish.yml on npow/claude-relay

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
