Local Responses proxy for OpenAI Codex CLI: folds gpt-5.5 518n-2 reasoning truncation (516 degradation) via the official openai_base_url wiring — no provider change, WebSocket-first, no fallback noise.
Project description
codex-516-guard
English · 简体中文
A tiny local Responses proxy for the OpenAI Codex CLI that cures the gpt-5.5
"516" reasoning-truncation degradation — while leaving your model_provider
untouched, so session grouping, remote compaction and remote-control keep working.
uv tool install codex-516-guard # install
codex-516-guard # run (127.0.0.1:8787)
# then add one line to ~/.codex/config.toml: openai_base_url = "http://127.0.0.1:8787/v1"
Credits. The detection-and-continue idea comes from neteroster/CodexCont (MIT) — thank you. This project is an independent, from-scratch implementation that keeps the built-in provider intact; see Differences.
The problem: gpt-5.5 "516" degradation
On the OpenAI Codex CLI, gpt-5.5's reasoning sometimes gets cut short at a very
specific token count — reasoning_tokens == 518 * n − 2 (i.e. 516, 1034, 1552, …).
When a turn lands on that fingerprint, the model stops thinking early and the answer
quality drops sharply. It is an upstream issue with no official fix
(openai/codex#30364).
codex-516-guard sits on 127.0.0.1 between Codex and the upstream Responses API.
When it sees a turn truncate on the 518n−2 fingerprint, it makes the model keep
thinking and folds the extra rounds into a single downstream response, so Codex
sees one clean, complete answer.
How it works
The proxy streams every upstream round and runs a small state machine (guard/fold.py):
- Detect. At the end of each round it reads
usage.output_tokens_details.reasoning_tokens. If it equals518n − 2(with1 ≤ n ≤ 6, and at most 3 continuation rounds), the round was truncated. - Continue. It discards that round's tentative output (the message / tool calls —
they were produced on truncated thinking), then replays the round's reasoning items
(including
encrypted_content) plus a singlephase:"commentary"assistant message ("Continue thinking...") as the next round's input. That nudges the model to resume reasoning where it left off. - Fold. Reasoning is streamed live to Codex the whole time; only the clean final
round's output is flushed. The terminal event is rebuilt as if the whole thing were
one response —
input/cachedcome from round 1 (so it never looks like a blown context window), reasoning is summed, and the true cumulative cost is recorded undermetadata.proxy_billed_usage.
Wiring: why the built-in provider stays intact
Codex is pointed at the proxy with one top-level config key, not a new provider:
# ~/.codex/config.toml (top level, before the first [table])
openai_base_url = "http://127.0.0.1:8787/v1"
openai_base_url overrides the base URL of the built-in openai provider in place.
This is the officially supported key
(openai/codex#16719; the same-name
[model_providers.openai] override is rejected by the maintainers, and the
OPENAI_BASE_URL env var was removed). Because the provider id stays openai:
- your conversation history is not re-bucketed/hidden by provider,
- remote compaction keeps working (
supports_remote_compactionstays true), - remote-control is unaffected (it uses the separate
chatgpt_base_url).
Differences from CodexCont
The 518n−2 detection + fold-continuation mechanism is CodexCont's idea; the implementation here is new and diverges on a few deliberate points:
| codex-516-guard | CodexCont | |
|---|---|---|
| Codex wiring | top-level openai_base_url (built-in provider unchanged) |
a new [model_providers] entry (history hidden per-provider, remote-control unusable, remote compaction lost) |
| Downstream transport | WebSocket-first — full responses_websockets protocol, plus SSE fallback |
SSE only (Codex tries ws → 405 → ~5 reconnect warnings per session, then falls back) |
| zstd request bodies (0.142.x built-in provider) | decompressed natively, no Codex config change | needs [features] enable_request_compression = false |
GET /v1/models (model-catalog refresh) |
passed through (/v1/*) |
not proxied (silently fails, relies on cache) |
| Continuation | commentary method only | commentary + legacy tool-pair + cross-turn repair, more knobs |
Install
Requires uv (which manages Python for you) and the Codex CLI (ChatGPT OAuth login; tested on 0.142.x).
uv tool install codex-516-guard # from PyPI
# or straight from source:
# uv tool install git+https://github.com/dzshzx/codex-516-guard
uv puts the executable in its bin dir (~/.local/bin on Unix/macOS; on Windows run
where.exe codex-516-guard; uv tool update-shell adds it to PATH). Then:
codex-516-guard # run in foreground (default 127.0.0.1:8787)
codex-516-guard --port 8790 --log-level debug
Wire Codex to it (one line in ~/.codex/config.toml, see above), and you're done.
Disable by commenting out the openai_base_url line and stopping the proxy. (If the
key stays but the proxy is down, Codex errors on an unreachable upstream.)
Upgrade / uninstall: uv tool upgrade codex-516-guard / uv tool uninstall codex-516-guard.
Ports
The proxy's port must match the port in Codex's openai_base_url. If the default
port (8787) is busy, the proxy exits with a clear message rather than drifting — a
wired proxy that silently binds another port would just be unreachable. To use a
different port, pass --port N and set openai_base_url to the same N.
--auto-port is for interactive one-off runs only: on a conflict it scans for the next
free port and prints which openai_base_url to use. Don't use it for a wired service.
Autostart (optional, off by default)
Installing registers no autostart — it's entirely your choice.
codex-516-guard install-service # register + start (current platform)
codex-516-guard uninstall-service # remove
install-service picks the per-user, runs-in-your-session mechanism (a system service
runs in a session with no user environment and can't reach the uv executable or your
proxy settings under your profile):
- Linux / WSL → a systemd user unit (
~/.config/systemd/user/). Runloginctl enable-lingeronce to start it at boot without logging in. Manual equivalent: seesystemd/codex-516-guard.service.example. - macOS → a launchd LaunchAgent in
~/Library/LaunchAgents/(starts at login, in your GUI session). Load withlaunchctl bootstrap gui/$(id -u) <plist>/launchctl kickstart -k …; remove withlaunchctl bootout …. - Windows → prints manual steps, registers nothing (see below).
Windows autostart is manual — on purpose
A program that writes an autostart entry (Startup VBS / Run key / scheduled task) and
launches a hidden process trips behavioral antivirus as trojan-like persistence —
Kaspersky's proactive-defense module flags the launching python.exe as
PDM:Trojan.Win32.Generic. A user-created Startup shortcut is trusted by the same AV.
So this package ships a windowless launcher, codex-516-guardw (a Windows GUI-subsystem
exe — no console window at logon), and install-service just tells you how to point a
shortcut at it:
Win+R→shell:startup(opens the Startup folder).- New → Shortcut → target = the path from
where.exe codex-516-guardw(append--port Nif you use a custom port).
Delete the shortcut to disable it.
Mirrored-networking shortcut (WSL ↔ Windows)
If your WSL2 uses networkingMode=mirrored, Windows and WSL share 127.0.0.1. Then
you only need one proxy on either side — run it in WSL (as a systemd service), and on
the Windows side just add the openai_base_url line to ~/.codex/config.toml pointing at
the same 127.0.0.1:8787. No second proxy or Windows autostart needed (the only cost is
that Windows Codex depends on the WSL proxy being up).
Verify
curl -sS http://127.0.0.1:8787/healthz # {"ok":true,...}
journalctl --user -u codex-516-guard -f | grep -E 'round|done' # Linux/WSL
A live fold looks like this (two chained 516s beaten, answer correct):
round 1: in=21550 out=664 reason=516 total=22214 | n=1 buffered=['function_call'] -> continue
round 2: in=22078 out=652 reason=516 total=22730 | n=1 buffered=['function_call'] -> continue
round 3: in=22606 out=566 reason=291 total=23172 | n=None buffered=[...] -> clean
done: 3 round(s) | ... | status=completed stop=natural
Develop
git clone https://github.com/dzshzx/codex-516-guard && cd codex-516-guard
uv sync
uv run python test_fold.py # fold state-machine self-test → ALL PASS
uv run codex-516-guard # run locally
Releases go out via PyPI Trusted Publishing (.github/workflows/release.yml, OIDC, no
stored token): push a v* tag and it builds + publishes automatically.
Layout:
guard/fold.py— fingerprint detection + fold state machine (transport-agnostic; covered bytest_fold.py).guard/server.py— starlette transport: ws / SSE downstream, SSE upstream, zstd/gzip request decompression,/v1/*passthrough.guard/cli.py— CLI entry (codex-516-guard; loopback only; auth passthrough, stores no credentials).
Security & disclaimer
- The proxy is auth passthrough only: it forwards Codex's
Authorizationheader and never reads, stores, or logs any credential. - It listens on the loopback address only — do not expose it on a non-loopback interface.
- Unofficial: it depends on upstream behavior that isn't a public contract (the truncation fingerprint, the ws frame format). An OpenAI-side change may break it. Use at your own risk.
- Continuation spends extra real tokens (see
metadata.proxy_billed_usage); the guard bounds this with annwindow and a 3-round cap.
Community
Built for and shared with the LINUX DO community, where the gpt-5.5 "516" degradation was diagnosed and discussed. Feedback and issues welcome there and on GitHub Issues.
License
MIT. Fully open source, no closed parts.
Mechanism credit: neteroster/CodexCont (MIT) — this project reuses its 518n−2 detect-and-continue idea with an independent, from-scratch implementation, and keeps the built-in provider intact (see Differences). CodexCont's MIT copyright notice is retained in LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file codex_516_guard-0.2.8.tar.gz.
File metadata
- Download URL: codex_516_guard-0.2.8.tar.gz
- Upload date:
- Size: 56.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
71e34ffd95e0cf14751794bd8d0f163cdba31a4698f61192316dae3794f3cda9
|
|
| MD5 |
1fc204f138f3bb7c29f4889f80d09cb2
|
|
| BLAKE2b-256 |
149ab6127adcc785b7725337b6fb60b2e3d6fa1c8e2879bf382e58b2a56a6c1d
|
Provenance
The following attestation bundles were made for codex_516_guard-0.2.8.tar.gz:
Publisher:
release.yml on dzshzx/codex-516-guard
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
codex_516_guard-0.2.8.tar.gz -
Subject digest:
71e34ffd95e0cf14751794bd8d0f163cdba31a4698f61192316dae3794f3cda9 - Sigstore transparency entry: 2063865883
- Sigstore integration time:
-
Permalink:
dzshzx/codex-516-guard@c872b441d08322ab2a97e9322ecea47a0c883dec -
Branch / Tag:
refs/tags/v0.2.8 - Owner: https://github.com/dzshzx
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@c872b441d08322ab2a97e9322ecea47a0c883dec -
Trigger Event:
push
-
Statement type:
File details
Details for the file codex_516_guard-0.2.8-py3-none-any.whl.
File metadata
- Download URL: codex_516_guard-0.2.8-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a7e6fd736ea647037ffb7b18124cec7d073708f9986ad0168f699a87c37c8760
|
|
| MD5 |
f9896c4813939b04741bfd87759fe63d
|
|
| BLAKE2b-256 |
6f8377dad1be58c38fdd295f2512a06dd5cfb849a09881bad1520876ce8fe939
|
Provenance
The following attestation bundles were made for codex_516_guard-0.2.8-py3-none-any.whl:
Publisher:
release.yml on dzshzx/codex-516-guard
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
codex_516_guard-0.2.8-py3-none-any.whl -
Subject digest:
a7e6fd736ea647037ffb7b18124cec7d073708f9986ad0168f699a87c37c8760 - Sigstore transparency entry: 2063865923
- Sigstore integration time:
-
Permalink:
dzshzx/codex-516-guard@c872b441d08322ab2a97e9322ecea47a0c883dec -
Branch / Tag:
refs/tags/v0.2.8 - Owner: https://github.com/dzshzx
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@c872b441d08322ab2a97e9322ecea47a0c883dec -
Trigger Event:
push
-
Statement type: