Skip to main content

Security-first self-hosted privacy masking and routing proxy for LLM traffic.

Project description

stronk-gateway

Security-first self-hosted masking and rehydration proxy for large language model traffic.

stronk-gateway sits between callers and upstream providers. It detects sensitive input, applies per-type policy, forwards only masked payloads, and rehydrates model output before returning it to the caller.

This repo now also includes a separate operator control plane:

  • proxy-api: the caller-facing masking proxy
  • admin-api: sanitized monitoring API
  • admin-ui: modern light/dark monitoring workspace
  • admin-gateway: authenticated reverse-proxy entrypoint for the control plane

Implemented Provider Surfaces

  • POST /v1/chat/completions
  • POST /v1/responses
  • GET /v1/responses websocket upgrade for OpenAI Responses-style realtime clients
  • POST /anthropic/v1/messages
  • GET /health

Supported behavior:

  • non-streaming request masking and response rehydration
  • streaming SSE rehydration for the supported provider families
  • downstream OpenAI Responses websocket compatibility for response.create and response.append
  • real previous_response_id continuations are preserved for upstream /responses requests
  • local generate:false prewarm stays local and uses memory-only turn-state recovery instead of leaking synthetic response IDs upstream
  • request-scoped opaque placeholders
  • config-driven policy per detector type: allow, mask, block, route_local
  • deny-by-default behavior when upstream egress or local routing is unsafe
  • sanitized audit-event persistence for the optional control plane

Websocket Deployment Contract

Supported deployment modes:

  • single-process deployment
  • multi-process deployment with sticky-session affinity for one websocket turn

Supported reconnect semantics:

  • any client harness that can speak OpenAI Responses-style JSON text events can use the websocket bridge; the current x-codex-turn-state header name is compatibility carry-over, but the value itself is treated as a generic bounded opaque turn key
  • x-codex-turn-state is validated locally, kept process-local and memory-only, and never forwarded upstream
  • the last committed previous_response_id is reused only after a terminal event has been delivered locally
  • if a socket disconnects before terminal completion, the next reconnect resumes from the last committed state rather than the aborted in-flight turn
  • same-turn collisions are rejected with deterministic 409 invalid_websocket_turn; they are not serialized
  • session-cap overflow is rejected; existing sessions are not evicted

Explicitly unsupported in this plan:

  • non-sticky multi-worker websocket continuity
  • external shared turn-state storage
  • generic websocket passthrough

Safety Guarantees

  • Raw sensitive values are never forwarded upstream when a detection is masked or blocked.
  • Raw request bodies, response bodies, rehydrated text, headers, provider credentials, and placeholder maps are not persisted by default.
  • Upstream egress is fixed to configured provider base URLs; the caller cannot choose arbitrary upstream targets.
  • Secret classes default to block rather than mask.
  • If policy requires route_local and no local handler exists, the request is rejected instead of falling back upstream.
  • The admin plane is separate from proxy routes and is disabled by default.
  • The admin plane now uses first-party session-cookie auth in stronk-gateway-admin, while raw content traces remain behind the private same-origin bridge path.

V1 Privacy Boundary

In-scope privacy actions on caller-controlled, upstream-visible surfaces:

  • mask instructions
  • mask developer and system message content
  • reject top-level tools when sensitive text appears in descriptions, examples, or schema defaults
  • strip x-codex-turn-metadata
  • do not forward caller-supplied x-codex-turn-state
  • preserve authorization, x-api-key, openai-*, anthropic-*, and x-responsesapi-include-timing-metrics as transport/auth headers rather than privacy-scoped text surfaces
  • preserve payload id and previous_response_id as bounded identifier surfaces
  • preserve caller x-request-id only when it is already a bounded opaque identifier; otherwise replace it with a proxy-issued opaque request id

Still out of scope in v1:

  • neutralizing the public x-codex-turn-state header name itself
  • masking arbitrary provider-defined identifier fields beyond the explicit in-scope surfaces above

Control Plane

The operator surface is intentionally read-only in this phase. Provisioning and materialization happen outside the browser through the active deployment profile (gateway-only or gateway+CLIProxyAPI).

What it shows:

  • request counts, mask/block/rehydration totals, and error counts
  • detector and policy action mix
  • sanitized per-request events with endpoint, model, latency, counts, and touched JSON paths
  • safe config posture and recent control-plane access logs

What it does not show:

  • raw request bodies
  • raw response bodies
  • rehydrated plaintext
  • upstream Authorization or X-API-Key headers
  • placeholder-to-original mappings

Detection Coverage

Deterministic detectors are implemented first and augmented with a local Presidio + spaCy layer by default. The detector interface stays pluggable so richer local NER can still be added later without redesigning the pipeline.

Current detector set:

  • English and Chinese person names
  • Company and organization names
  • English and Chinese addresses
  • Emails
  • US and China phone numbers
  • API keys, including sk-..., sk-proj-..., sk-ant-..., and common provider key prefixes
  • Bearer tokens
  • JWT-like tokens

Architecture

  • src/stronk_gateway/redaction/ - detection, masking, placeholder vault, and structured payload traversal
  • src/stronk_gateway/policy/ - per-detector policy resolution
  • src/stronk_gateway/providers/ - fixed provider endpoint specs
  • src/stronk_gateway/proxy/ - upstream transport, header controls, SSE rehydration, and audit writes
  • src/stronk_gateway/admin/ - first-party session auth, SQLite-backed sanitized event store, and UI lookup helpers
  • src/stronk_gateway/app.py - proxy app factory
  • src/stronk_gateway/admin_app.py - separate admin app factory
  • web/ - React/Vite operator UI with light and dark mode
  • compose/ - local proxy + admin + Caddy stack
  • docs/ - architecture, threat model, and execution plans

Quick Start

Local Python + frontend

python3 -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
python -m spacy download en_core_web_sm
python -m spacy download zh_core_web_sm
make ui-install
make ui-build
make test

Run the proxy:

make run

Run the admin API/UI separately:

STRONK_GATEWAY_AUDIT_STORAGE_ENABLED=true \
STRONK_GATEWAY_AUDIT_DB_PATH=./data/stronk-gateway-audit.sqlite3 \
STRONK_GATEWAY_ENABLE_ADMIN_API=true \
STRONK_GATEWAY_ENABLE_ADMIN_UI=true \
STRONK_GATEWAY_ADMIN_BOOTSTRAP_USER=operator \
STRONK_GATEWAY_ADMIN_BOOTSTRAP_PASSWORD_HASH="$(python - <<'PY'
from stronk_gateway.admin import hash_admin_password
print(hash_admin_password('change-me'))
PY
)" \
STRONK_GATEWAY_ADMIN_UI_DIR=./web/dist \
make admin-run

The admin app now owns first-party web login with a bootstrap username plus scrypt password hash. Running make admin-run is useful for local session/auth/UI work, but content-trace inspection still depends on the same-origin admin gateway path described below because /api/debug/* is bridged to the proxy runtime on a separate private listener.

Local Docker Compose

  1. Choose an operator username:
export STRONK_GATEWAY_ADMIN_USER=operator
  1. Generate an admin scrypt password hash:
export STRONK_GATEWAY_ADMIN_PASSWORD_HASH="$(python - <<'PY'
from stronk_gateway.admin import hash_admin_password
print(hash_admin_password('change-me'))
PY
)"
  1. Start the stack:
docker compose -f compose/docker-compose.yml up --build

Default local endpoints:

  • proxy: http://127.0.0.1:8787
  • admin UI: http://127.0.0.1:8788

Default local behavior:

  • sign in through the admin web UI with STRONK_GATEWAY_ADMIN_USER plus the plaintext password you hashed locally
  • public :8787 stays inference-only
  • raw content traces are visible only after authenticated login through the admin gateway on :8788
  • the trace bridge is internal-only and does not mount raw trace routes on the public proxy listener

Release Publishing

The preferred release path now uses GitHub Actions for both GHCR image publishing and PyPI Trusted Publishing. The recorded release flow, plus the Bitwarden-backed local PyPI fallback, lives in docs/release-publishing.md.

The short version:

cd /Users/eyy/Documents/Work/Dev/repos/stronk-gateway
uv build
BWS_ACCESS_TOKEN="$BWS_STRONK_TERMINAL_ACCESS_TOKEN" \
bws run -- uv publish

Keep GHCR publishing in GitHub Actions; do not add local container-registry tokens to this flow.

Configuration

Core proxy environment variables:

  • STRONK_GATEWAY_OPENAI_UPSTREAM_BASE_URL
  • STRONK_GATEWAY_ANTHROPIC_UPSTREAM_BASE_URL
  • STRONK_GATEWAY_ALLOW_INSECURE_UPSTREAMS=false
  • STRONK_GATEWAY_ALLOW_NONSTANDARD_UPSTREAM_HOSTS=false
  • STRONK_GATEWAY_ENABLE_DEBUG_MASK_ENDPOINT=false
  • STRONK_GATEWAY_PRESIDIO_ENABLED=true
  • STRONK_GATEWAY_PRESIDIO_ENGLISH_MODEL=en_core_web_sm
  • STRONK_GATEWAY_PRESIDIO_CHINESE_MODEL=zh_core_web_sm
  • STRONK_GATEWAY_*_ACTION

Admin plane environment variables:

  • STRONK_GATEWAY_AUDIT_STORAGE_ENABLED=false
  • STRONK_GATEWAY_AUDIT_DB_PATH
  • STRONK_GATEWAY_AUDIT_MAX_EVENTS=2000
  • STRONK_GATEWAY_ENABLE_ADMIN_API=false
  • STRONK_GATEWAY_ENABLE_ADMIN_UI=false
  • STRONK_GATEWAY_ADMIN_AUTH_MODE=session
  • STRONK_GATEWAY_ADMIN_BOOTSTRAP_USER
  • STRONK_GATEWAY_ADMIN_BOOTSTRAP_PASSWORD_HASH
  • STRONK_GATEWAY_ADMIN_BOOTSTRAP_ROLES=admin,operator
  • STRONK_GATEWAY_ADMIN_TRACE_ALLOWED_ROLES=admin,operator
  • STRONK_GATEWAY_ADMIN_SESSION_COOKIE_NAME=stronk_admin_session
  • STRONK_GATEWAY_ADMIN_SESSION_IDLE_TTL_SECONDS=1800
  • STRONK_GATEWAY_ADMIN_SESSION_ABSOLUTE_TTL_SECONDS=43200
  • STRONK_GATEWAY_ADMIN_LOGIN_ATTEMPT_LIMIT=5
  • STRONK_GATEWAY_ADMIN_LOGIN_ATTEMPT_WINDOW_SECONDS=900
  • STRONK_GATEWAY_UNSAFE_DEBUG_GATEWAY_SECRET
  • STRONK_GATEWAY_ADMIN_ALLOWED_ROLES=admin,operator,auditor
  • STRONK_GATEWAY_ADMIN_ACCESS_LOG_MAX_ENTRIES=500
  • STRONK_GATEWAY_ADMIN_UI_DIR=./web/dist

Default policy:

  • email, phone, person_name, organization, address -> mask
  • api_key, bearer_token, jwt -> block

Tests

The suite includes:

  • unit coverage for detector behavior, overlap resolution, placeholder generation, policy parsing, audit summaries, event-store behavior, and SSE rehydration
  • integration coverage for all supported provider endpoints
  • websocket regression coverage for OpenAI Responses response.create, response.append, prewarm, invalid events, and incomplete upstream streams
  • websocket regression coverage for real previous_response_id continuations, fresh-chain resets, reconnect recovery via x-codex-turn-state, and binary-frame rejection
  • regression checks proving raw values do not leak into forwarded upstream payloads
  • admin-plane coverage for 401/403 auth enforcement and sanitized proxy-to-admin event flow
  • streaming tests for OpenAI chat, OpenAI responses, and Anthropic messages
  • bypass and collision canaries including zero-width-key variants and literal placeholder collisions

Benchmark Evidence

The local benchmark harness lives at python3 scripts/bench_proxy.py and writes row artifacts under docs/exec-plans/active/privacy-proxy-hardening-and-scalability-v1/artifacts/benchmarks/.

Measured local rows on 2026-03-28:

  • W1-http-chat: 15.27 requests/s, p95 latency 10649 ms, zero failures, blocked only because no historic pre-change baseline was captured
  • W2-http-responses: 14.33 requests/s, p95 latency 11030 ms, zero failures, blocked only because no historic pre-change baseline was captured
  • W3-sse-stream: 14.47 streams/s, p95 first-byte 4863 ms, zero leak counters, blocked only because no historic pre-change baseline was captured
  • W4-ws-sequential: 80 successful turns at 16 sessions x 5 turns, blocked because the black-box reconnect probe still ended with ConnectionClosedError instead of deterministically proving rollback semantics
  • W5-ws-abuse: all 24/24 scripted abusive requests were rejected and all 4/4 blocked-budget sockets closed, but the row remains blocked because black-box evidence cannot prove internal pre-parse and pre-redaction ordering
  • W6-audit-contention: 256.97 requests/s, p95 latency 1577 ms, zero database is locked failures, blocked on the missing historic baseline and because admin access logging means the read side is not a pure-read workload
  • W7-memory-bound: 9.80 requests/s, 40 truncation hits observed, blocked because the plan never encoded a numeric RSS envelope even though capture limits and truncation markers were exercised

These are local harness measurements, not an SLA. The repo does not claim a p95 regression comparison against pre-change behavior because that historic baseline was not captured before the hardening work.

Stronger Than PasteGuard

This repo now claims stronger behavior only where it is implemented and tested:

  • Explicit OpenAI responses endpoint coverage, not just chat completions.
  • OpenAI responses websocket compatibility on the same public path used by Codex-style clients.
  • Fixed upstream egress targets with request-header allowlists and redirect refusal.
  • Official upstream host pinning is on by default; non-standard compatible hosts require explicit opt-in.
  • Default secret handling is block, not best-effort masking.
  • Raw detection values are not reflected back through the debug path.
  • Streaming rehydration is covered for all three supported provider surfaces.
  • Rehydration is limited to human-readable assistant text paths; tool arguments stay masked by default.
  • Regression tests cover placeholder collisions, bypass attempts, and /responses as a first-class path.
  • The monitoring plane is separate from proxy routes, disabled by default, and stores sanitized telemetry only.

Threat Model Summary

  • Caller credentials for upstream providers are part of the data plane and must not be reused for admin authentication.
  • The admin plane now uses first-party session-cookie auth inside stronk-gateway-admin, backed by a bootstrap username plus scrypt password hash.
  • Raw content traces stay memory-only in the proxy runtime and are reachable only through the loopback/private admin gateway path plus the shared bridge secret.
  • The admin gateway should stay loopback-bound or privately networked by default. If you widen it, add real TLS and network controls first.
  • The SQLite event store persists sanitized events only. It is not a safe place for raw prompts, raw completions, or placeholder vault state.

Known Limitations

  • Name, organization, and address detection is heuristic. It is materially stronger than regex-only email and key detection, but it is not equivalent to a full local NER model.
  • Presidio + spaCy are enabled by default in this repo. Fresh environments must install en_core_web_sm and zh_core_web_sm or explicitly disable Presidio.
  • route_local is a clean policy boundary today, but the local-model execution path is still a scaffold and fails closed.
  • WebSocket support is intentionally scoped to OpenAI Responses-style JSON text events. stronk-gateway is not a generic websocket tunnel and does not currently support binary or audio frames.
  • The websocket bridge keeps the upstream side on HTTP plus Server-Sent Events (SSE). It does not yet proxy upstream websocket transports.
  • The public x-codex-turn-state header name is a compatibility carry-over. Its semantics are harness-neutral, but the header name itself is not yet neutralized.
  • Websocket continuity still depends on single-process deployment or sticky-session affinity for one turn. Non-sticky multi-worker continuity and shared external turn-state remain out of scope here.
  • The admin plane is read-only in this phase. There is no browser-based policy editor or request replay workflow.
  • The admin backend should stay on a private or loopback-bound network. The shared trace-bridge secret is a second trust signal, not a substitute for network boundaries and TLS.
  • The frontend build currently uses npm-managed assets and should be built as part of image creation or CI before enabling the admin UI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stronk_gateway-0.3.14.tar.gz (217.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stronk_gateway-0.3.14-py3-none-any.whl (76.8 kB view details)

Uploaded Python 3

File details

Details for the file stronk_gateway-0.3.14.tar.gz.

File metadata

  • Download URL: stronk_gateway-0.3.14.tar.gz
  • Upload date:
  • Size: 217.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stronk_gateway-0.3.14.tar.gz
Algorithm Hash digest
SHA256 876c5969c3dab29eb574731828c440ec54c6b899d95553ad7a89978015c06d4a
MD5 d9a6391ab847a24649f0975597c512cb
BLAKE2b-256 937a57c09985f63de4572b83c95c1d16b0d0cdfdc2d3f3be4a1cc402ac8ef34c

See more details on using hashes here.

Provenance

The following attestation bundles were made for stronk_gateway-0.3.14.tar.gz:

Publisher: publish-release.yml on EYYCHEEV/stronk-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stronk_gateway-0.3.14-py3-none-any.whl.

File metadata

File hashes

Hashes for stronk_gateway-0.3.14-py3-none-any.whl
Algorithm Hash digest
SHA256 c4f72f2723acb506c09c190a23001f9610ef6a9ef594c634756b59bd7972ff84
MD5 4a6a94cb60352b20b899537067b0341b
BLAKE2b-256 3b3a3998b555ce10f593eeb150ab8b85b3a807f948d0d9ee0063d64a7750e9b0

See more details on using hashes here.

Provenance

The following attestation bundles were made for stronk_gateway-0.3.14-py3-none-any.whl:

Publisher: publish-release.yml on EYYCHEEV/stronk-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page