ai-relay
WebSocket relay that bridges AI coding agent CLIs (Claude Code, Codex, Gemini CLI, Snowflake Cortex, and more) to any web interface — stream reasoning, tool calls, and file changes in real time.
Install
pip install ai-relay
Quick start
One-shot mode (local dev)
# Start the relay server (default: ws://0.0.0.0:8765)
ai-relay --port 8765
Server mode (container / daemon)
# Persistent server — each WebSocket connection becomes one independent agent session
ai-relay serve --port 9000
Then connect from OhWise Lab (or any WebSocket client) and send a handshake:
{"tool": "claude", "folder": "/path/to/project", "model": "claude-sonnet-4-6"}
The relay streams structured JSON events over WebSocket and forwards your messages to the selected backend. Claude Code and Gemini use native JSONL process protocols, Codex uses the app-server JSON-RPC protocol, and Snowflake Cortex uses HTTP/SSE. The PTY bridge is retained only for generic/legacy CLI tools.
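As a minimal client-side sketch of the handshake flow described above: `build_handshake` serializes the three handshake fields shown earlier, and `stream_events` sends it as the first WebSocket message, then prints events as they arrive. Both function names are hypothetical and not part of ai-relay. The sketch assumes the third-party `websockets` package is installed, that the relay listens on the URL you pass in, and that a prompt may be sent immediately after the handshake.

```python
import json


def build_handshake(tool: str, folder: str, model: str) -> str:
    """Serialize the handshake object sent as the first WebSocket message."""
    return json.dumps({"tool": tool, "folder": folder, "model": model})


async def stream_events(url: str, handshake: str, prompt: str) -> None:
    """Send the handshake and one prompt, then print events until session_end."""
    # Assumption: the third-party `websockets` package is available
    # (pip install websockets). It is imported lazily so that
    # build_handshake can be used without it.
    import websockets

    async with websockets.connect(url) as ws:
        await ws.send(handshake)
        # Assumption: the relay accepts a text message right after the handshake.
        await ws.send(json.dumps({"text": prompt}))
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"), event)
            if event.get("type") == "session_end":
                break
```

To try it against a relay started with `ai-relay serve --port 9000`, something like `asyncio.run(stream_events("ws://localhost:9000", build_handshake("claude", "/path/to/project", "claude-sonnet-4-6"), "list the files"))` should work under the assumptions above.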
Running in a container
ai-relay serve is designed to run inside a Docker container as a persistent daemon:
FROM python:3.11-slim
RUN pip install ai-relay
# Install your AI CLI here (e.g. npm install -g @anthropic-ai/claude-code)
CMD ["ai-relay", "serve", "--port", "9000"]
Each incoming WebSocket connection spawns an independent agent session. Multiple clients can connect simultaneously.
Event types
| Type | Description |
|---|---|
| `session_start` | Process spawned |
| `session_end` | Process exited (includes `exit_code`) |
| `stdout` / `stderr` | Raw output lines |
| `reasoning` | Agent thinking/planning text |
| `tool_call` | Agent invoking a tool (Read, Edit, Bash…) |
| `tool_result` | Result of a tool call |
| `file_diff` | File created or edited |
| `response` | Final answer text |
| `assistant_message` | Native structured assistant message |
| `user_message` | Native structured user/tool-result message |
| `stream_event` | Native streaming event |
| `status` | Native status/control event |
| `permission_request` | Tool permission prompt from a structured backend |
| `permission_cancelled` | Pending permission prompt was cancelled |
| `control_response` | Native control response acknowledgment |
| `tool_progress` | Native tool progress event |
| `quota_warning` | API quota / rate limit detected |
| `context_warning` | Context window nearing limit (includes `context_pct`) |
| `context_compacted` | Context was compacted |
| `error` | Relay or process error |
| `input_ack` | Relay confirms your message was sent to the process |
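A client typically branches on the `type` field of each event. The sketch below turns one JSON frame into a one-line summary for a log view. `describe_event` is a hypothetical helper, not part of ai-relay. Only `type`, `exit_code`, and `context_pct` come from the table above; the `request_id` field on `permission_request` is an assumption inferred from the permission response format.

```python
import json

# Event types from the table above that a UI would likely surface prominently.
WARNINGS = {"quota_warning", "error"}


def describe_event(raw: str) -> str:
    """Turn one JSON event frame into a one-line summary."""
    event = json.loads(raw)
    kind = event.get("type", "unknown")
    if kind == "session_end":
        return f"session ended (exit_code={event.get('exit_code')})"
    if kind == "context_warning":
        return f"context warning (context_pct={event.get('context_pct')})"
    if kind in WARNINGS:
        return f"warning: {kind}"
    if kind == "permission_request":
        # Assumption: the request carries a request_id that the reply echoes.
        return f"permission needed (request_id={event.get('request_id')})"
    # Everything else is reported by its type name only.
    return kind
```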
Sending commands
Send JSON over WebSocket:
{"text": "refactor the authentication module to use JWT"}
Claude Code also accepts structured web-client messages:
{"type": "user_message", "content": "refactor the authentication module to use JWT"}
Permission responses:
{"type": "permission_response", "request_id": "req", "behavior": "allow", "updatedInput": {"command": "git status"}}
Codex permission responses can also use:
{"type": "permission_response", "request_id": "req", "allow": true}
Interrupt the active structured turn:
{"type": "interrupt"}
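The message shapes above can be wrapped in small builder functions so that client code does not assemble JSON by hand. This is a sketch with hypothetical helper names. The field names are taken from the examples above, except the `"deny"` value for `behavior`, which is an assumption (only `"allow"` appears in this README).

```python
import json
from typing import Any, Dict, Optional


def text_message(text: str) -> str:
    """Plain prompt, accepted by every backend."""
    return json.dumps({"text": text})


def user_message(content: str) -> str:
    """Structured form accepted by Claude Code."""
    return json.dumps({"type": "user_message", "content": content})


def permission_response(
    request_id: str,
    allow: bool,
    updated_input: Optional[Dict[str, Any]] = None,
) -> str:
    """Answer a permission_request using the behavior/updatedInput form."""
    msg: Dict[str, Any] = {
        "type": "permission_response",
        "request_id": request_id,
        # Assumption: "deny" is the counterpart of "allow".
        "behavior": "allow" if allow else "deny",
    }
    if updated_input is not None:
        msg["updatedInput"] = updated_input
    return json.dumps(msg)


def codex_permission_response(request_id: str, allow: bool) -> str:
    """Boolean form that Codex also accepts."""
    return json.dumps(
        {"type": "permission_response", "request_id": request_id, "allow": allow}
    )


def interrupt() -> str:
    """Interrupt the active structured turn."""
    return json.dumps({"type": "interrupt"})
```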
Codex runs through `codex app-server --listen stdio://` and keeps a persistent thread behind the WebSocket session. Example handshake:
{"tool": "codex", "folder": "/path/to/project", "model": "gpt-5.2"}
Gemini CLI uses headless stream-json mode. Each text message starts one Gemini turn:
{"tool": "gemini", "folder": "/path/to/project", "model": "gemini-2.5-flash"}
Snowflake Cortex is reached over HTTP/SSE, so its API settings are passed in the handshake under the `snowflake` key.
Cortex chat mode:
{
  "tool": "cortex",
  "mode": "chat",
  "model": "claude-sonnet-4-5",
  "snowflake": {
    "account_url": "https://<account>.snowflakecomputing.com",
    "token_env": "SNOWFLAKE_PAT"
  }
}
Cortex Analyst mode:
{
  "tool": "cortex",
  "mode": "analyst",
  "snowflake": {
    "account_url": "https://<account>.snowflakecomputing.com",
    "token_env": "SNOWFLAKE_PAT",
    "semantic_view": "DB.SCHEMA.VIEW"
  }
}
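The two Cortex handshakes differ only in `mode` and in which extra field they carry (`model` for chat, `semantic_view` for Analyst). A sketch of building both from Python follows. The function names are hypothetical. `token_is_set` assumes that `token_env` names an environment variable holding the Snowflake token; if the client and relay run in different processes, the check only reflects the client's environment.

```python
import json
import os


def cortex_chat_handshake(
    account_url: str, model: str, token_env: str = "SNOWFLAKE_PAT"
) -> str:
    """Handshake for Cortex chat mode."""
    return json.dumps({
        "tool": "cortex",
        "mode": "chat",
        "model": model,
        "snowflake": {"account_url": account_url, "token_env": token_env},
    })


def cortex_analyst_handshake(
    account_url: str, semantic_view: str, token_env: str = "SNOWFLAKE_PAT"
) -> str:
    """Handshake for Cortex Analyst mode."""
    return json.dumps({
        "tool": "cortex",
        "mode": "analyst",
        "snowflake": {
            "account_url": account_url,
            "token_env": token_env,
            "semantic_view": semantic_view,
        },
    })


def token_is_set(token_env: str = "SNOWFLAKE_PAT") -> bool:
    """Assumption: token_env names an environment variable with the token."""
    return bool(os.environ.get(token_env))
```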
Slash commands for the underlying CLI (e.g. /compact, /clear) are sent as ordinary text messages:
{"text": "/compact"}
Supported tools
| Tool | Adapter | `tool` value |
|---|---|---|
| Claude Code | ClaudeCodeAdapter | `"claude"` / `"claude-code"` |
| OpenAI Codex | CodexAdapter | `"codex"` |
| Gemini CLI | GeminiAdapter | `"gemini"` |
| Snowflake Cortex | CortexAdapter | `"cortex"` |
| Any CLI | GenericAdapter | `"generic"` |
Configuration reference (ohwise-lab-ctrl)
When using ohwise-lab alongside ai-relay, all settings are controlled via environment variables — nothing is hardcoded:
| Variable | Default | Description |
|---|---|---|
| `LAB_MODE` | `single` | single = subprocesses in lab-ctrl; multi = per-user Docker containers |
| `LAB_WORKSPACE_ROOT` | `/var/ohwise-lab-workspaces` | Host path for user workspace volumes |
| `LAB_IMAGE` | `ohwise-lab-ctrl:local` | Docker image for user containers (must have ai-relay + CLIs installed) |
| `LAB_NETWORK` | `lab-network` | Docker network user containers join. For Compose: `<project>_default` |
| `LAB_CONTAINER_PORT` | `9000` | Internal port ai-relay serve listens on inside user containers |
| `LAB_CONTAINER_USER` | `labuser` | OS user inside user containers |
| `LAB_CONTAINER_HOME` | `/home/<LAB_CONTAINER_USER>` | Home dir inside user containers |
| `LAB_CONTAINER_STARTUP_DELAY` | `1.5` | Seconds to wait for ai-relay to be ready after container start |
| `LAB_CONTAINER_WS_TIMEOUT` | `15` | Seconds to wait for WebSocket connection to user container |
| `LAB_IDLE_TIMEOUT_SECS` | `1800` | Seconds of inactivity before a user container is eligible for cleanup |
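To illustrate how these variables and defaults fit together, here is a sketch that reads them into a typed object. `LabConfig` and `load_lab_config` are hypothetical names and this is not how ohwise-lab-ctrl itself loads its settings. The numeric types (int for the port and idle timeout, float for the two waits) are assumptions based on the default values.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LabConfig:
    mode: str
    workspace_root: str
    image: str
    network: str
    container_port: int
    container_user: str
    container_home: str
    startup_delay: float
    ws_timeout: float
    idle_timeout_secs: int


def load_lab_config(env: Mapping[str, str] = os.environ) -> LabConfig:
    """Read the LAB_* variables, falling back to the documented defaults."""
    user = env.get("LAB_CONTAINER_USER", "labuser")
    return LabConfig(
        mode=env.get("LAB_MODE", "single"),
        workspace_root=env.get("LAB_WORKSPACE_ROOT", "/var/ohwise-lab-workspaces"),
        image=env.get("LAB_IMAGE", "ohwise-lab-ctrl:local"),
        network=env.get("LAB_NETWORK", "lab-network"),
        container_port=int(env.get("LAB_CONTAINER_PORT", "9000")),
        container_user=user,
        # The home directory default depends on the container user.
        container_home=env.get("LAB_CONTAINER_HOME", f"/home/{user}"),
        startup_delay=float(env.get("LAB_CONTAINER_STARTUP_DELAY", "1.5")),
        ws_timeout=float(env.get("LAB_CONTAINER_WS_TIMEOUT", "15")),
        idle_timeout_secs=int(env.get("LAB_IDLE_TIMEOUT_SECS", "1800")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check, for example `load_lab_config({"LAB_CONTAINER_USER": "dev"}).container_home` evaluates to `/home/dev`.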
Python API
from ai_relay import RelayServer
server = RelayServer(host="0.0.0.0", port=8765)
server.run()
Changelog
See CHANGELOG.md for release notes.
License
MIT