Skip to main content

Local Responses proxy for OpenAI Codex CLI: folds gpt-5.5 518n-2 reasoning truncation (516 degradation) via the official openai_base_url wiring — no provider change, WebSocket-first, no fallback noise.

Project description

codex-516-guard

PyPI Python License: MIT

English · 简体中文

A tiny local Responses proxy for the OpenAI Codex CLI that cures the gpt-5.5 "516" reasoning-truncation degradation — while leaving your model_provider untouched, so session grouping, remote compaction and remote-control keep working.

uv tool install codex-516-guard      # install
codex-516-guard                      # run (127.0.0.1:8787)
# then add one line to ~/.codex/config.toml:  openai_base_url = "http://127.0.0.1:8787/v1"

Credits. The detection-and-continue idea comes from neteroster/CodexCont (MIT) — thank you. This project is an independent, from-scratch implementation that keeps the built-in provider intact; see Differences.


The problem: gpt-5.5 "516" degradation

On the OpenAI Codex CLI, gpt-5.5's reasoning sometimes gets cut short at a very specific token count — reasoning_tokens == 518 * n − 2 (i.e. 516, 1034, 1552, …). When a turn lands on that fingerprint, the model stops thinking early and the answer quality drops sharply. It is an upstream issue with no official fix (openai/codex#30364).

codex-516-guard sits on 127.0.0.1 between Codex and the upstream Responses API. When it sees a turn truncate on the 518n−2 fingerprint, it makes the model keep thinking and folds the extra rounds into a single downstream response, so Codex sees one clean, complete answer.

How it works

The proxy streams every upstream round and runs a small state machine (guard/fold.py):

  1. Detect. At the end of each round it reads usage.output_tokens_details.reasoning_tokens. If it equals 518n − 2 (with 1 ≤ n ≤ 6, and at most 3 continuation rounds), the round was truncated.
  2. Continue. It discards that round's tentative output (the message / tool calls — they were produced on truncated thinking), then replays the round's reasoning items (including encrypted_content) plus a single phase:"commentary" assistant message ("Continue thinking...") as the next round's input. That nudges the model to resume reasoning where it left off.
  3. Fold. Reasoning is streamed live to Codex the whole time; only the clean final round's output is flushed. The terminal event is rebuilt as if the whole thing were one response — input/cached come from round 1 (so it never looks like a blown context window), reasoning is summed, and the true cumulative cost is recorded under metadata.proxy_billed_usage.

Wiring: why the built-in provider stays intact

Codex is pointed at the proxy with one top-level config key, not a new provider:

# ~/.codex/config.toml  (top level, before the first [table])
openai_base_url = "http://127.0.0.1:8787/v1"

openai_base_url overrides the base URL of the built-in openai provider in place. This is the officially supported key (openai/codex#16719; the same-name [model_providers.openai] override is rejected by the maintainers, and the OPENAI_BASE_URL env var was removed). Because the provider id stays openai:

  • your conversation history is not re-bucketed/hidden by provider,
  • remote compaction keeps working (supports_remote_compaction stays true),
  • remote-control is unaffected (it uses the separate chatgpt_base_url).

Differences from CodexCont

The 518n−2 detection + fold-continuation mechanism is CodexCont's idea; the implementation here is new and diverges on a few deliberate points:

codex-516-guard CodexCont
Codex wiring top-level openai_base_url (built-in provider unchanged) a new [model_providers] entry (history hidden per-provider, remote-control unusable, remote compaction lost)
Downstream transport WebSocket-first — full responses_websockets protocol, plus SSE fallback SSE only (Codex tries ws → 405 → ~5 reconnect warnings per session, then falls back)
zstd request bodies (0.142.x built-in provider) decompressed natively, no Codex config change needs [features] enable_request_compression = false
GET /v1/models (model-catalog refresh) passed through (/v1/*) not proxied (silently fails, relies on cache)
Continuation commentary method only commentary + legacy tool-pair + cross-turn repair, more knobs

Install

Requires uv (which manages Python for you) and the Codex CLI (ChatGPT OAuth login; tested on 0.142.x).

uv tool install codex-516-guard          # from PyPI
# or straight from source:
# uv tool install git+https://github.com/dzshzx/codex-516-guard

uv puts the executable in its bin dir (~/.local/bin on Unix/macOS; on Windows run where.exe codex-516-guard; uv tool update-shell adds it to PATH). Then:

codex-516-guard                          # run in foreground (default 127.0.0.1:8787)
codex-516-guard --port 8790 --log-level debug

Wire Codex to it (one line in ~/.codex/config.toml, see above), and you're done. Disable by commenting out the openai_base_url line and stopping the proxy. (If the key stays but the proxy is down, Codex errors on an unreachable upstream.)

Upgrade / uninstall: uv tool upgrade codex-516-guard / uv tool uninstall codex-516-guard.

Ports

The proxy's port must match the port in Codex's openai_base_url. If the default port (8787) is busy, the proxy exits with a clear message rather than drifting — a wired proxy that silently binds another port would just be unreachable. To use a different port, pass --port N and set openai_base_url to the same N.

--auto-port is for interactive one-off runs only: on a conflict it scans for the next free port and prints which openai_base_url to use. Don't use it for a wired service.

Autostart (optional, off by default)

Installing registers no autostart — it's entirely your choice.

codex-516-guard install-service     # register + start (current platform)
codex-516-guard uninstall-service   # remove

install-service picks the per-user, runs-in-your-session mechanism (a system service runs in a session with no user environment and can't reach the uv executable or your proxy settings under your profile):

  • Linux / WSL → a systemd user unit (~/.config/systemd/user/). Run loginctl enable-linger once to start it at boot without logging in. Manual equivalent: see systemd/codex-516-guard.service.example.
  • macOS → a launchd LaunchAgent in ~/Library/LaunchAgents/ (starts at login, in your GUI session). Load with launchctl bootstrap gui/$(id -u) <plist> / launchctl kickstart -k …; remove with launchctl bootout ….
  • Windowsprints manual steps, registers nothing (see below).

Windows autostart is manual — on purpose

A program that writes an autostart entry (Startup VBS / Run key / scheduled task) and launches a hidden process trips behavioral antivirus as trojan-like persistence — Kaspersky's proactive-defense module flags the launching python.exe as PDM:Trojan.Win32.Generic. A user-created Startup shortcut is trusted by the same AV.

So this package ships a windowless launcher, codex-516-guardw (a Windows GUI-subsystem exe — no console window at logon), and install-service just tells you how to point a shortcut at it:

  1. Win+Rshell:startup (opens the Startup folder).
  2. New → Shortcut → target = the path from where.exe codex-516-guardw (append --port N if you use a custom port).

Delete the shortcut to disable it.

Mirrored-networking shortcut (WSL ↔ Windows)

If your WSL2 uses networkingMode=mirrored, Windows and WSL share 127.0.0.1. Then you only need one proxy on either side — run it in WSL (as a systemd service), and on the Windows side just add the openai_base_url line to ~/.codex/config.toml pointing at the same 127.0.0.1:8787. No second proxy or Windows autostart needed (the only cost is that Windows Codex depends on the WSL proxy being up).

Verify

curl -sS http://127.0.0.1:8787/healthz            # {"ok":true,...}
journalctl --user -u codex-516-guard -f | grep -E 'round|done'   # Linux/WSL

A live fold looks like this (two chained 516s beaten, answer correct):

round 1: in=21550 out=664 reason=516 total=22214 | n=1 buffered=['function_call'] -> continue
round 2: in=22078 out=652 reason=516 total=22730 | n=1 buffered=['function_call'] -> continue
round 3: in=22606 out=566 reason=291 total=23172 | n=None buffered=[...] -> clean
done: 3 round(s) | ... | status=completed stop=natural

Develop

git clone https://github.com/dzshzx/codex-516-guard && cd codex-516-guard
uv sync
uv run python test_fold.py        # fold state-machine self-test → ALL PASS
uv run codex-516-guard            # run locally

Releases go out via PyPI Trusted Publishing (.github/workflows/release.yml, OIDC, no stored token): push a v* tag and it builds + publishes automatically.

Layout:

  • guard/fold.py — fingerprint detection + fold state machine (transport-agnostic; covered by test_fold.py).
  • guard/server.py — starlette transport: ws / SSE downstream, SSE upstream, zstd/gzip request decompression, /v1/* passthrough.
  • guard/cli.py — CLI entry (codex-516-guard; loopback only; auth passthrough, stores no credentials).

Security & disclaimer

  • The proxy is auth passthrough only: it forwards Codex's Authorization header and never reads, stores, or logs any credential.
  • It listens on the loopback address only — do not expose it on a non-loopback interface.
  • Unofficial: it depends on upstream behavior that isn't a public contract (the truncation fingerprint, the ws frame format). An OpenAI-side change may break it. Use at your own risk.
  • Continuation spends extra real tokens (see metadata.proxy_billed_usage); the guard bounds this with an n window and a 3-round cap.

Community

Built for and shared with the LINUX DO community, where the gpt-5.5 "516" degradation was diagnosed and discussed. Feedback and issues welcome there and on GitHub Issues.

License

MIT. Fully open source, no closed parts.

Mechanism credit: neteroster/CodexCont (MIT) — this project reuses its 518n−2 detect-and-continue idea with an independent, from-scratch implementation, and keeps the built-in provider intact (see Differences). CodexCont's MIT copyright notice is retained in LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

codex_516_guard-0.2.8.tar.gz (56.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

codex_516_guard-0.2.8-py3-none-any.whl (20.4 kB view details)

Uploaded Python 3

File details

Details for the file codex_516_guard-0.2.8.tar.gz.

File metadata

  • Download URL: codex_516_guard-0.2.8.tar.gz
  • Upload date:
  • Size: 56.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for codex_516_guard-0.2.8.tar.gz
Algorithm Hash digest
SHA256 71e34ffd95e0cf14751794bd8d0f163cdba31a4698f61192316dae3794f3cda9
MD5 1fc204f138f3bb7c29f4889f80d09cb2
BLAKE2b-256 149ab6127adcc785b7725337b6fb60b2e3d6fa1c8e2879bf382e58b2a56a6c1d

See more details on using hashes here.

Provenance

The following attestation bundles were made for codex_516_guard-0.2.8.tar.gz:

Publisher: release.yml on dzshzx/codex-516-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file codex_516_guard-0.2.8-py3-none-any.whl.

File metadata

File hashes

Hashes for codex_516_guard-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 a7e6fd736ea647037ffb7b18124cec7d073708f9986ad0168f699a87c37c8760
MD5 f9896c4813939b04741bfd87759fe63d
BLAKE2b-256 6f8377dad1be58c38fdd295f2512a06dd5cfb849a09881bad1520876ce8fe939

See more details on using hashes here.

Provenance

The following attestation bundles were made for codex_516_guard-0.2.8-py3-none-any.whl:

Publisher: release.yml on dzshzx/codex-516-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page