Local OpenAI-compatible HTTP proxy backed by Codex CLI
codex-api-proxy
Local OpenAI-compatible HTTP proxy backed by local Codex credentials.
This project exposes a minimal /v1/chat/completions API for local automation. By default, requests are executed through codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral, using the local Codex installation and its existing authentication.
Safety
The proxy defaults to 127.0.0.1 and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.
Set CODEX_PROXY_API_KEY to require Authorization: Bearer <key> on API requests.
If you start with --host 0.0.0.0 or another non-loopback bind address without --api-key, codex-api-proxy prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.
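As a minimal sketch of the client side of bearer authentication, the following builds a chat-completions request with the standard library and adds the Authorization header only when a key is supplied. The helper name build_chat_request is hypothetical, and local-secret is just a placeholder key.

```python
import json
import urllib.request


def build_chat_request(prompt, api_key=None, base_url="http://127.0.0.1:8765"):
    """Build a POST request for /v1/chat/completions (hypothetical helper)."""
    body = json.dumps(
        {
            "model": "codex-local",
            "messages": [{"role": "user", "content": prompt}],
        }
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Matches the documented "Authorization: Bearer <key>" requirement.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


request = build_chat_request("Reply with exactly: pong", api_key="local-secret")

# To send it against a running proxy:
#     with urllib.request.urlopen(request) as resp:
#         print(json.load(resp)["choices"][0]["message"]["content"])
```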
With the default exec engine, Codex subprocesses are launched with --ignore-user-config and --ignore-rules. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.
Codex subprocesses also use --sandbox read-only and --ephemeral by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.
The experimental app-server engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in messages.
The app-server process uses an isolated CODEX_HOME at ~/.codex-api-proxy/codex-home by default. codex-api-proxy symlinks only the current Codex auth.json into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's config.toml, MCP config, or plugins. The app-server process is also started with --disable apps, --disable plugins, --disable skill_mcp_dependency_install, and -c mcp_servers={}.
To keep skills out of the model-visible prompt, codex-api-proxy generates a skills.config=[{name=...,enabled=false}] override for known system skills and locally discovered skill names. Each request uses an empty dynamicTools list, empty environments, approvalPolicy: never, sandbox: read-only, and ephemeral: true by default.
Install
pip3 install codex-api-proxy
For local development from this checkout:
python3 -m pip install -e '.[dev]'
Make targets are available for local build and release tasks:
make build-tools
make test
make build
make release-check
make publish VERSION=0.1.1
make publish VERSION=... first syncs that version into pyproject.toml and src/codex_api_proxy/__init__.py, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.
Run
Start in the background:
codex-api-proxy start
By default, the service listens on 127.0.0.1:8765.
The default Codex working directory is an empty workspace at ~/.codex-api-proxy/workspace.
Bind to all interfaces:
codex-api-proxy start --host 0.0.0.0
Check status:
codex-api-proxy status
Show saved runtime settings:
codex-api-proxy status --verbose
Restart with the last successful start settings:
codex-api-proxy restart
Restart and override one setting:
codex-api-proxy restart --proxy=http://127.0.0.1:8118
Start with faster defaults:
codex-api-proxy start --fast
Start with experimental long-lived app-server workers:
codex-api-proxy start --engine app-server --workers 2
Start with an outbound proxy, faster defaults, and multiple app-server workers:
codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4
Stop:
codex-api-proxy stop
Run in the foreground for debugging:
codex-api-proxy start --foreground
Configuration
CLI options:
- --host: bind host, default 127.0.0.1
- --port: bind port, default 8765
- --api-key: require bearer auth
- --codex-bin: Codex executable, default codex
- --proxy: proxy URL passed to Codex as http_proxy and https_proxy
- --model: model passed to Codex
- --engine: execution engine, exec or app-server, default exec
- --workers: number of long-lived app-server workers, default 1
- --max-queue-size: maximum queued app-server requests before returning 429, default 64
- --queue-timeout-seconds: maximum time to wait for an app-server worker, default 30
- --app-server-codex-home: isolated CODEX_HOME used by app-server workers, default ~/.codex-api-proxy/codex-home
- --codex-config: Codex config override passed as -c key=value, repeatable
- --ephemeral: run codex exec with --ephemeral, enabled by default
- --fast: use fast defaults: --codex-config model_reasoning_effort="low"
- --default-cwd: default Codex working directory, default ~/.codex-api-proxy/workspace
- --allowed-root: allowed cwd root, repeatable, default --default-cwd
- --timeout-seconds: per-request timeout, default 300
- --max-concurrency: maximum concurrent Codex executions, default 1
- --log-level: Uvicorn log level, one of debug, info, warning, or error, default info
- --pid-file: daemon pid file, default ~/.codex-api-proxy/codex-api-proxy.pid
- --log-file: daemon log file for start, default ~/.codex-api-proxy/codex-api-proxy.log
- --state-file: daemon state file, default ~/.codex-api-proxy/codex-api-proxy.state.json
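Because a full app-server queue is answered with 429, a client may want to back off and retry. The following is a minimal sketch of one way to do that; call_with_retry and its parameters are hypothetical, and send stands for any callable that performs the HTTP request with urllib and raises urllib.error.HTTPError on failure.

```python
import time
import urllib.error


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(); on HTTP 429, wait with exponential backoff and try again."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            # Re-raise anything other than 429, and the final failed attempt.
            if exc.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the backoff schedule can be exercised without actually waiting.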
start prints the state file path and the effective startup parameters. The state file is written with 0600 permissions and is used by restart to reuse the previous start settings. If --api-key is used, the key is redacted in terminal output but stored in the state file so restart can reuse it.
Environment variables are also supported when running the FastAPI app directly:
- CODEX_PROXY_HOST: bind host, default 127.0.0.1
- CODEX_PROXY_PORT: bind port, default 8765
- CODEX_PROXY_API_KEY: optional bearer token
- CODEX_PROXY_CODEX_BIN: Codex executable, default codex
- CODEX_PROXY_PROXY: proxy URL passed to Codex
- CODEX_PROXY_MODEL: model passed to Codex
- CODEX_PROXY_ENGINE: execution engine, exec or app-server, default exec
- CODEX_PROXY_WORKERS: number of long-lived app-server workers, default 1
- CODEX_PROXY_MAX_QUEUE_SIZE: maximum queued app-server requests, default 64
- CODEX_PROXY_QUEUE_TIMEOUT_SECONDS: maximum time to wait for an app-server worker, default 30
- CODEX_PROXY_APP_SERVER_CODEX_HOME: isolated CODEX_HOME used by app-server workers
- CODEX_PROXY_CODEX_CONFIGS: ;;-separated Codex config overrides passed as repeated -c
- CODEX_PROXY_EPHEMERAL: set to 1, true, or yes to run codex exec with --ephemeral; defaults to true
- CODEX_PROXY_DEFAULT_CWD: default Codex working directory, default current directory
- CODEX_PROXY_ALLOWED_ROOTS: colon-separated allowed cwd roots, default CODEX_PROXY_DEFAULT_CWD
- CODEX_PROXY_TIMEOUT_SECONDS: per-request timeout, default 300
- CODEX_PROXY_MAX_CONCURRENCY: maximum concurrent Codex executions, default 1
- CODEX_PROXY_LOG_LEVEL: Uvicorn log level, default info
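To illustrate the separators described above, here is a minimal sketch of how such values could be parsed. It is not the project's actual parsing code: all function names are hypothetical, and treating the boolean case-insensitively is an assumption.

```python
TRUTHY = {"1", "true", "yes"}


def parse_codex_configs(raw):
    """Split a ';;'-separated string into individual key=value overrides."""
    return [item.strip() for item in raw.split(";;") if item.strip()]


def parse_allowed_roots(raw):
    """Split a colon-separated string into a list of root directories."""
    return [path for path in raw.split(":") if path]


def parse_ephemeral(raw):
    """Return True for 1, true, or yes (case-insensitivity is an assumption)."""
    return raw.strip().lower() in TRUTHY


def to_codex_args(configs):
    """Expand overrides into repeated '-c key=value' arguments."""
    args = []
    for config in configs:
        args += ["-c", config]
    return args


configs = parse_codex_configs('model_reasoning_effort="low";;mcp_servers={}')
print(to_codex_args(configs))
```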
API
Health:
curl -sS http://127.0.0.1:8765/health
Models:
curl -sS http://127.0.0.1:8765/v1/models
Readiness:
curl -sS http://127.0.0.1:8765/ready
Local counters:
curl -sS http://127.0.0.1:8765/metrics
Chat completion:
curl -sS http://127.0.0.1:8765/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
Streaming chat completion:
curl -N http://127.0.0.1:8765/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
Streaming responses use OpenAI-compatible SSE events:
- data: {"object":"chat.completion.chunk",...} for assistant chunks
- data: [DONE] when the response is complete
With the default exec engine, the proxy streams at the HTTP protocol layer. The underlying Codex CLI currently provides the assistant answer through codex exec --json; if Codex only emits final assistant text for a request, the streamed content chunk will arrive after Codex completes.
With --engine app-server, the proxy maps Codex item/agentMessage/delta notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.
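A client that does not use an SDK can consume the stream by reading data: lines until [DONE]. The sketch below assumes that the elided chunk fields follow the usual OpenAI chunk shape (choices[].delta.content); the sample lines are illustrative and were not captured from the proxy.

```python
import json


def iter_content_chunks(lines):
    """Yield assistant text from OpenAI-style SSE lines until 'data: [DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed shape: {"choices": [{"delta": {"content": "..."}}]}
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"pong"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content_chunks(sample))
print(text)
```

The same generator can be fed decoded lines from an HTTP response body instead of the sample list.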
Compatibility
codex-api-proxy is OpenAI-compatible for the local chat-completions shape, not a complete OpenAI API implementation.
Supported:
- GET /v1/models
- POST /v1/chat/completions
- model
- messages
- stream
- metadata.cwd for request-scoped working directory selection inside --allowed-root
- OpenAI-compatible non-streaming response envelope
- OpenAI-compatible SSE chunk envelope for streaming responses
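As a rough illustration of metadata.cwd, the sketch below builds a request body with a per-request working directory and runs a client-side pre-check that the directory sits under an allowed root. Reading metadata.cwd as a cwd key nested under metadata is an assumption, both helper names are hypothetical, and the check only approximates whatever validation the proxy performs itself.

```python
from pathlib import Path


def is_under_allowed_root(cwd, allowed_roots):
    """Return True if cwd resolves to an allowed root or a path beneath one."""
    candidate = Path(cwd).expanduser().resolve()
    for root in allowed_roots:
        root_path = Path(root).expanduser().resolve()
        if candidate == root_path or root_path in candidate.parents:
            return True
    return False


def chat_payload(prompt, cwd=None):
    """Build a chat-completions body, optionally selecting a working directory."""
    payload = {
        "model": "codex-local",
        "messages": [{"role": "user", "content": prompt}],
    }
    if cwd is not None:
        # Assumed request shape for the documented metadata.cwd field.
        payload["metadata"] = {"cwd": str(cwd)}
    return payload


print(is_under_allowed_root("/tmp/ws/project", ["/tmp/ws"]))
print(chat_payload("List the files here", cwd="/tmp/ws/project"))
```

Resolving both paths before comparing them keeps ".." segments and sibling directories with a shared name prefix from passing the check.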
Accepted but currently ignored:
- temperature
- top_p
- max_tokens
- presence_penalty
- frequency_penalty
Not supported:
- tools and tool_choice
- response_format
- n greater than one
- stop
- embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
- accurate token usage; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path
The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in messages; codex-api-proxy does not preserve conversation state between API requests.
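Since the proxy keeps no state between requests, the caller has to accumulate the transcript itself. A minimal sketch of that bookkeeping, using a hypothetical Conversation class, might look like this:

```python
class Conversation:
    """Client-side transcript that is resent in full on every request."""

    def __init__(self):
        self.messages = []

    def request_body(self, user_text, model="codex-local"):
        """Append the user turn and return a body containing the whole history."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": model, "messages": list(self.messages)}

    def record_reply(self, assistant_text):
        """Store the assistant turn so the next request includes it."""
        self.messages.append({"role": "assistant", "content": assistant_text})


conversation = Conversation()
first = conversation.request_body("Pick a number between 1 and 10.")
conversation.record_reply("7")
second = conversation.request_body("Double it.")
print([m["role"] for m in second["messages"]])
```

Each body returned by request_body can be posted to /v1/chat/completions; the reply text is then passed to record_reply before the next turn.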
OpenAI Python SDK smoke test:
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")
response = client.chat.completions.create(
    model="codex-local",
    messages=[{"role": "user", "content": "Reply with exactly: pong"}],
)
print(response.choices[0].message.content)
When no --api-key is configured, most OpenAI SDKs still require a placeholder api_key; any non-empty value is fine.
Operations
Use /health for a lightweight process check and /ready for a readiness check that includes the selected engine and Codex executable availability. Use /metrics for local JSON counters:
- requests_total
- requests_ok
- requests_error
- errors_by_status
- engine
- uptime_seconds
- app_server_pool_started
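One way to use these counters is to derive an error rate from them. The sketch below assumes that requests_total and requests_error are plain integers in the /metrics JSON; the helper names and the sample values are illustrative only.

```python
import json
import urllib.request


def fetch_metrics(base_url="http://127.0.0.1:8765"):
    """Fetch the local JSON counters from a running proxy."""
    with urllib.request.urlopen(f"{base_url}/metrics") as resp:
        return json.load(resp)


def error_rate(metrics):
    """Return requests_error / requests_total, or 0.0 before any requests."""
    total = metrics.get("requests_total", 0)
    if not total:
        return 0.0
    return metrics.get("requests_error", 0) / total


# Illustrative values, not real output from the proxy.
sample_metrics = {"requests_total": 8, "requests_ok": 6, "requests_error": 2}
print(error_rate(sample_metrics))
```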
Daemon logs are written to ~/.codex-api-proxy/codex-api-proxy.log by default. codex-api-proxy does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.
Latency logs:
Each chat completion writes a single-line JSON log with logger codex_api_proxy.latency and event chat_completion_latency. Streaming responses also write chat_completion_first_sse when the first SSE chunk is yielded.
For background daemon runs, inspect:
rg 'codex_api_proxy.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log
Important fields:
- request_id: correlates latency lines for the same request
- stream: whether the request used stream: true
- engine: exec or app-server
- phases_ms.cwd_resolve: cwd validation time
- phases_ms.prompt_build: OpenAI messages to Codex prompt conversion time
- phases_ms.queue_wait: time waiting for local admission before engine execution
- phases_ms.codex_exec: time spent inside codex exec
- phases_ms.app_server_exec: time spent inside the app-server worker turn
- phases_ms.codex_command_build: Codex command construction time
- phases_ms.codex_process_spawn: local subprocess spawn time
- phases_ms.codex_stdin_write: prompt write and stdin close time
- phases_ms.codex_first_stdout_event: elapsed time from Codex IO start until the first non-empty stdout JSONL line
- phases_ms.codex_first_assistant_event: elapsed time from Codex IO start until the first assistant message event
- phases_ms.codex_stdout_read: total time spent reading Codex stdout until EOF
- phases_ms.codex_process_wait: time waiting for the Codex process after stdout EOF
- phases_ms.codex_communicate: total Codex subprocess IO time
- phases_ms.codex_output_parse: Codex JSONL final-message parse time
- phases_ms.response_build: response object/SSE setup time
- phases_ms.total: total server-side request time before response is ready
- time_to_first_sse_ms: stream request time until the first SSE chunk is yielded
- time_to_first_content_sse_ms: app-server stream request time until the first content chunk is yielded
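These fields can be pulled out of the log with a few lines of Python. The sketch below makes several assumptions: that each latency line contains one JSON object, possibly after a logger prefix; that the event name is stored under an event key; and that the phases_ms fields are nested in a phases_ms object. The sample line is invented for illustration.

```python
import json


def parse_latency_line(line):
    """Return the JSON record of a chat_completion_latency line, else None."""
    start = line.find("{")
    if start < 0:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if record.get("event") != "chat_completion_latency":
        return None
    return record


def slowest_phase(record):
    """Return (name, milliseconds) of the largest phase, ignoring 'total'."""
    phases = {k: v for k, v in record.get("phases_ms", {}).items() if k != "total"}
    if not phases:
        return None
    name = max(phases, key=phases.get)
    return name, phases[name]


# Invented sample line; the real prefix and field layout may differ.
sample_line = (
    'INFO codex_api_proxy.latency {"event": "chat_completion_latency", '
    '"request_id": "example", "engine": "exec", "phases_ms": '
    '{"queue_wait": 1.2, "codex_exec": 4200.0, "response_build": 0.4, "total": 4203.1}}'
)
record = parse_latency_line(sample_line)
print(slowest_phase(record))
```

Some phases overlap (for example, the subprocess IO phases fall inside codex_exec), so the largest value points to where to look first; it does not give a strict breakdown of the total.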
With auth:
curl -sS http://127.0.0.1:8765/v1/chat/completions \
-H 'Authorization: Bearer local-secret' \
-H 'Content-Type: application/json' \
-d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'