Local OpenAI-compatible HTTP proxy backed by Codex CLI

codex-api-proxy

Local OpenAI-compatible HTTP proxy backed by local Codex credentials.

This project exposes a minimal /v1/chat/completions API for local automation. By default, requests are executed through codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral, using the local Codex installation and its existing authentication.

Safety

The proxy defaults to 127.0.0.1 and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.

Set CODEX_PROXY_API_KEY to require Authorization: Bearer <key> on API requests.

If you start with --host 0.0.0.0 or another non-loopback bind address without --api-key, codex-api-proxy prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.
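As a minimal sketch of the bearer-token requirement described above, the following builds (but does not send) a chat-completions request with the standard library. The helper name build_chat_request and its parameters are hypothetical and not part of codex-api-proxy; only the endpoint path and the Authorization: Bearer <key> header come from this document.

```python
import json
import urllib.request


def build_chat_request(base_url, messages, api_key=None, model="codex-local"):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper for illustration; not part of codex-api-proxy.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Header required when the proxy is started with an API key.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )
```

The returned request can be passed to urllib.request.urlopen once the proxy is running. Omitting api_key leaves the Authorization header out, which matches a proxy started without a key.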

With the default exec engine, Codex subprocesses are launched with --ignore-user-config and --ignore-rules. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.

Codex subprocesses also use --sandbox read-only and --ephemeral by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.

The experimental app-server engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in messages.

The app-server process uses an isolated CODEX_HOME at ~/.codex-api-proxy/codex-home by default. codex-api-proxy symlinks only the current Codex auth.json into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's config.toml, MCP config, or plugins.

The app-server process is also started with --disable apps, --disable plugins, --disable skill_mcp_dependency_install, and -c mcp_servers={}. To keep skills out of the model-visible prompt, codex-api-proxy generates a skills.config=[{name=...,enabled=false}] override for known system skills and locally discovered skill names. Each request uses an empty dynamicTools list, empty environments, approvalPolicy: never, sandbox: read-only, and ephemeral: true by default.

Install

pip3 install codex-api-proxy

For local development from this checkout:

python3 -m pip install -e '.[dev]'

Make targets are available for local build and release tasks:

make build-tools
make test
make build
make release-check
make publish VERSION=0.1.1

make publish VERSION=... first syncs that version into pyproject.toml and src/codex_api_proxy/__init__.py, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.

Run

Start in the background:

codex-api-proxy start

By default, the service listens on 127.0.0.1:8765. The default Codex working directory is an empty workspace at ~/.codex-api-proxy/workspace.

Bind to all interfaces:

codex-api-proxy start --host 0.0.0.0

Check status:

codex-api-proxy status

Show saved runtime settings:

codex-api-proxy status --verbose

Restart with the last successful start settings:

codex-api-proxy restart

Restart and override one setting:

codex-api-proxy restart --proxy=http://127.0.0.1:8118

Start with faster defaults:

codex-api-proxy start --fast

Start with experimental long-lived app-server workers:

codex-api-proxy start --engine app-server --workers 2

Start with an outbound proxy, faster defaults, and multiple app-server workers:

codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4

Stop:

codex-api-proxy stop

Run in the foreground for debugging:

codex-api-proxy start --foreground

Configuration

CLI options:

  • --host: bind host, default 127.0.0.1
  • --port: bind port, default 8765
  • --api-key: require bearer auth
  • --codex-bin: Codex executable, default codex
  • --proxy: proxy URL passed to Codex as http_proxy and https_proxy
  • --model: model passed to Codex
  • --engine: execution engine, exec or app-server, default exec
  • --workers: number of long-lived app-server workers, default 1
  • --max-queue-size: maximum queued app-server requests before returning 429, default 64
  • --queue-timeout-seconds: maximum time to wait for an app-server worker, default 30
  • --app-server-codex-home: isolated CODEX_HOME used by app-server workers, default ~/.codex-api-proxy/codex-home
  • --codex-config: Codex config override passed as -c key=value, repeatable
  • --ephemeral: run codex exec with --ephemeral, enabled by default
  • --fast: use fast defaults, equivalent to --codex-config model_reasoning_effort="low"
  • --default-cwd: default Codex working directory, default ~/.codex-api-proxy/workspace
  • --allowed-root: allowed cwd root, repeatable, default --default-cwd
  • --timeout-seconds: per-request timeout, default 300
  • --max-concurrency: maximum concurrent Codex executions, default 1
  • --log-level: Uvicorn log level, one of debug, info, warning, or error, default info
  • --pid-file: daemon pid file, default ~/.codex-api-proxy/codex-api-proxy.pid
  • --log-file: daemon log file for start, default ~/.codex-api-proxy/codex-api-proxy.log
  • --state-file: daemon state file, default ~/.codex-api-proxy/codex-api-proxy.state.json
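Because --max-queue-size makes the proxy return 429 when the app-server queue is full, clients may want to back off and retry. The following is a rough sketch of one way to do that. The function call_with_retry, its parameters, and the (status, body) return shape of send are hypothetical illustration choices, not behavior provided by codex-api-proxy.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    send is any zero-argument callable returning (status_code, body).
    Delays grow exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
    The result of the last attempt is returned even if it is still 429.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Queue was full; wait before trying again.
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Passing sleep as a parameter keeps the helper easy to test without real delays.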

start prints the state file path and the effective startup parameters. The state file is written with 0600 permissions and is used by restart to reuse the previous start settings. If --api-key is used, the key is redacted in terminal output but stored in the state file so restart can reuse it.

Environment variables are also supported when running the FastAPI app directly:

  • CODEX_PROXY_HOST: bind host, default 127.0.0.1
  • CODEX_PROXY_PORT: bind port, default 8765
  • CODEX_PROXY_API_KEY: optional bearer token
  • CODEX_PROXY_CODEX_BIN: Codex executable, default codex
  • CODEX_PROXY_PROXY: proxy URL passed to Codex
  • CODEX_PROXY_MODEL: model passed to Codex
  • CODEX_PROXY_ENGINE: execution engine, exec or app-server, default exec
  • CODEX_PROXY_WORKERS: number of long-lived app-server workers, default 1
  • CODEX_PROXY_MAX_QUEUE_SIZE: maximum queued app-server requests, default 64
  • CODEX_PROXY_QUEUE_TIMEOUT_SECONDS: maximum time to wait for an app-server worker, default 30
  • CODEX_PROXY_APP_SERVER_CODEX_HOME: isolated CODEX_HOME used by app-server workers
  • CODEX_PROXY_CODEX_CONFIGS: ;;-separated Codex config overrides passed as repeated -c
  • CODEX_PROXY_EPHEMERAL: set to 1, true, or yes to run codex exec with --ephemeral; defaults to true
  • CODEX_PROXY_DEFAULT_CWD: default Codex working directory, default current directory
  • CODEX_PROXY_ALLOWED_ROOTS: colon-separated allowed cwd roots, default CODEX_PROXY_DEFAULT_CWD
  • CODEX_PROXY_TIMEOUT_SECONDS: per-request timeout, default 300
  • CODEX_PROXY_MAX_CONCURRENCY: maximum concurrent Codex executions, default 1
  • CODEX_PROXY_LOG_LEVEL: Uvicorn log level, default info

API

Health:

curl -sS http://127.0.0.1:8765/health

Models:

curl -sS http://127.0.0.1:8765/v1/models

Readiness:

curl -sS http://127.0.0.1:8765/ready

Local counters:

curl -sS http://127.0.0.1:8765/metrics

Chat completion:

curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'

Streaming chat completion:

curl -N http://127.0.0.1:8765/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'

Streaming responses use OpenAI-compatible SSE events:

  • data: {"object":"chat.completion.chunk",...} for assistant chunks
  • data: [DONE] when the response is complete
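The two event forms above can be consumed with a few lines of client code. This sketch assumes each chunk follows the usual OpenAI chat.completion.chunk layout, with text under choices[].delta.content. The helper name iter_content_deltas is hypothetical.

```python
import json


def iter_content_deltas(lines):
    """Yield assistant text fragments from an iterable of SSE lines.

    Assumes OpenAI-style chunks with text at choices[].delta.content.
    Stops at the data: [DONE] sentinel.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of decoded lines works as input, for example the lines of a streaming HTTP response body.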

With the default exec engine, the proxy streams only at the HTTP layer: the response is framed as SSE, but the content depends on what codex exec --json emits. If Codex emits only the final assistant text for a request, the streamed content chunk arrives after Codex completes.

With --engine app-server, the proxy maps Codex item/agentMessage/delta notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.

Compatibility

codex-api-proxy is OpenAI-compatible for the local chat-completions shape, not a complete OpenAI API implementation.

Supported:

  • GET /v1/models
  • POST /v1/chat/completions
  • model
  • messages
  • stream
  • metadata.cwd for request-scoped working directory selection inside --allowed-root
  • OpenAI-compatible non-streaming response envelope
  • OpenAI-compatible SSE chunk envelope for streaming responses
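The metadata.cwd item above implies a containment check against the allowed roots. The following is only a sketch of how such a check could look, together with a request payload that sets metadata.cwd. Both function names are hypothetical, and the proxy's actual validation logic is not shown in this document.

```python
from pathlib import Path


def is_within_allowed_roots(cwd, allowed_roots):
    """Return True if cwd equals or lies beneath one of allowed_roots.

    Paths are resolved first, so ".." segments cannot escape a root.
    Hypothetical sketch; not the proxy's own check.
    """
    candidate = Path(cwd).expanduser().resolve()
    for root in allowed_roots:
        root_path = Path(root).expanduser().resolve()
        if candidate == root_path or root_path in candidate.parents:
            return True
    return False


def payload_with_cwd(prompt, cwd, model="codex-local"):
    """Build a chat-completions payload that requests a working directory."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "metadata": {"cwd": str(cwd)},
    }
```

Comparing resolved path components rather than string prefixes avoids treating /tmp/ws-other as if it were inside /tmp/ws.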

Accepted but currently ignored:

  • temperature
  • top_p
  • max_tokens
  • presence_penalty
  • frequency_penalty

Not supported:

  • tools and tool_choice
  • response_format
  • n greater than one
  • stop
  • embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
  • accurate token usage; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path

The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in messages; codex-api-proxy does not preserve conversation state between API requests.
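Since the proxy keeps no conversation state, the caller has to accumulate history itself. One possible client-side shape is sketched below. The Conversation class is a hypothetical helper, not something shipped with codex-api-proxy.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Client-side chat history that is resent in full on every request."""

    model: str = "codex-local"
    messages: list = field(default_factory=list)

    def user_payload(self, text):
        """Append a user turn and return the full request payload."""
        self.messages.append({"role": "user", "content": text})
        # Copy the list so later turns do not mutate an already-built payload.
        return {"model": self.model, "messages": list(self.messages)}

    def record_reply(self, text):
        """Store the assistant reply so the next request includes it."""
        self.messages.append({"role": "assistant", "content": text})
```

Each payload returned by user_payload can be posted to /v1/chat/completions, and the reply content then passed to record_reply before the next turn.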

OpenAI Python SDK smoke test:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")

response = client.chat.completions.create(
    model="codex-local",
    messages=[{"role": "user", "content": "Reply with exactly: pong"}],
)
print(response.choices[0].message.content)

When no --api-key is configured, most OpenAI SDKs still require a placeholder api_key; any non-empty value is fine.

Operations

Use /health for a lightweight process check and /ready for a readiness check that includes the selected engine and Codex executable availability. Use /metrics for local JSON counters:

  • requests_total
  • requests_ok
  • requests_error
  • errors_by_status
  • engine
  • uptime_seconds
  • app_server_pool_started
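As a rough illustration of using these counters, the sketch below derives an error rate from an already-decoded /metrics response. It assumes the request counters are numbers and that errors_by_status maps a status code to a count. Those value types and the function name summarize_metrics are assumptions, not documented behavior.

```python
def summarize_metrics(metrics):
    """Derive simple health figures from a decoded /metrics JSON object.

    Assumes numeric counters and errors_by_status as {status: count}.
    """
    total = metrics.get("requests_total", 0)
    errors = metrics.get("requests_error", 0)
    by_status = metrics.get("errors_by_status") or {}
    most_common = None
    if by_status:
        # Status with the highest error count.
        most_common = max(by_status.items(), key=lambda item: item[1])[0]
    return {
        "error_rate": errors / total if total else 0.0,
        "most_common_error_status": most_common,
    }
```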

Daemon logs are written to ~/.codex-api-proxy/codex-api-proxy.log by default. codex-api-proxy does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.

Latency logs:

Each chat completion writes a single-line JSON log with logger codex_api_proxy.latency and event chat_completion_latency. Streaming responses also write chat_completion_first_sse when the first SSE chunk is yielded.

For background daemon runs, inspect:

rg 'codex_api_proxy.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log

Important fields:

  • request_id: correlates latency lines for the same request
  • stream: whether the request used stream: true
  • engine: exec or app-server
  • phases_ms.cwd_resolve: cwd validation time
  • phases_ms.prompt_build: OpenAI messages to Codex prompt conversion time
  • phases_ms.queue_wait: time waiting for local admission before engine execution
  • phases_ms.codex_exec: time spent inside codex exec
  • phases_ms.app_server_exec: time spent inside the app-server worker turn
  • phases_ms.codex_command_build: Codex command construction time
  • phases_ms.codex_process_spawn: local subprocess spawn time
  • phases_ms.codex_stdin_write: prompt write and stdin close time
  • phases_ms.codex_first_stdout_event: elapsed time from Codex IO start until the first non-empty stdout JSONL line
  • phases_ms.codex_first_assistant_event: elapsed time from Codex IO start until the first assistant message event
  • phases_ms.codex_stdout_read: total time spent reading Codex stdout until EOF
  • phases_ms.codex_process_wait: time waiting for the Codex process after stdout EOF
  • phases_ms.codex_communicate: total Codex subprocess IO time
  • phases_ms.codex_output_parse: Codex JSONL final-message parse time
  • phases_ms.response_build: response object/SSE setup time
  • phases_ms.total: total server-side request time before response is ready
  • time_to_first_sse_ms: stream request time until the first SSE chunk is yielded
  • time_to_first_content_sse_ms: app-server stream request time until the first content chunk is yielded
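The fields above can be aggregated with a short script. This sketch assumes each latency line contains a JSON object with an event key and a nested phases_ms object, possibly preceded by a log prefix. The exact line layout is an assumption, and both function names are hypothetical.

```python
import json
from statistics import median


def latency_records(lines, event="chat_completion_latency"):
    """Yield decoded JSON latency records matching the given event name."""
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON object on this line
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # unrelated log line that happens to contain "{"
        if isinstance(record, dict) and record.get("event") == event:
            yield record


def median_phase_ms(lines, phase="total"):
    """Return the median of phases_ms[phase], or None if no record has it."""
    values = [
        record["phases_ms"][phase]
        for record in latency_records(lines)
        if phase in record.get("phases_ms", {})
    ]
    return median(values) if values else None
```

Feeding it the lines of the daemon log file gives a quick view of where time goes, for example median_phase_ms(lines, "queue_wait") versus median_phase_ms(lines, "codex_exec").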

Chat completion with bearer auth, for when an API key is configured:

curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H 'Authorization: Bearer local-secret' \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
