Negotiated session codebook compression for LLMs — cut 20-60% of tokens losslessly

ContextPack

Negotiated session codebook compression for LLMs — cut tokens, keep answers.

ContextPack is an OpenAI-compatible proxy + library that compresses the context you send to any LLM (OpenAI, Anthropic, or any OpenAI-compatible API). Unlike one-sided compressors that simply throw bytes away, its signature feature is a negotiated session codebook: it negotiates a shared abbreviation dictionary with the model, so compression is lossless — the model confirms each symbol before it's used. Pure Python, no Rust or ML binaries, works out of the box.

Why ContextPack

Negotiated codebook (lossless). ContextPack proposes [CP_1] = <big chunk of context>, the model acknowledges it, and every later turn sends the symbol instead of the chunk. Because the model confirmed the mapping, nothing is lost. Nothing else does this.
Content-aware compression. Separate, format-specific compressors for JSON, code, logs, stacktraces, and query-aware prose — each strips redundancy the way that format allows.
Lazy references. Huge blobs (over a configurable token threshold) are replaced with a reference; the model retrieves the full content on demand instead of re-sending it every turn.
Token budget optimizer + semantic dedup. Fit a conversation into a target budget and drop near-duplicate content automatically.
4 ways to use it: HTTP proxy, Python library, CLI, or MCP server.
Live analytics dashboard. Watch token savings accumulate in real time at /dashboard.
Bring-your-own-key (BYOK). Each request can carry its own upstream key, so every user pays their own bill.
Pure Python. No native binaries, no downloaded ML models, no GPU. pip install -e . and go.
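
The lazy-reference idea from the list above can be sketched in a few lines. This is a minimal illustration, not ContextPack's implementation: the `LazyRefStore` class, the `[REF:...]` tag format, and the whitespace token count are all assumptions made for the example.

```python
import hashlib

REF_THRESHOLD_TOKENS = 100  # mirrors the documented default; usage here is assumed


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real tokenizer would give other numbers."""
    return len(text.split())


class LazyRefStore:
    """Hypothetical store that swaps large blobs for short reference tags."""

    def __init__(self, threshold: int = REF_THRESHOLD_TOKENS) -> None:
        self.threshold = threshold
        self._blobs: dict[str, str] = {}

    def pack(self, text: str) -> str:
        """Return small text unchanged; replace large text with a reference tag."""
        if count_tokens(text) <= self.threshold:
            return text
        ref_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._blobs[ref_id] = text
        return f"[REF:{ref_id}]"

    def fetch(self, ref_id: str) -> str:
        """Resolve a reference back to the full content on demand."""
        return self._blobs[ref_id]


store = LazyRefStore()
small = "short note"
big = " ".join(f"line{i}" for i in range(500))
packed = store.pack(big)          # a short tag instead of 500 tokens
restored = store.fetch(packed[5:-1])
```

Using a content hash as the reference id means the same blob always maps to the same tag, so re-sending it on a later turn costs only the tag.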

Benchmarks

The core claim — compression doesn't change the model's answers — is tested two ways against real datasets (GSM8K, SQuAD v2, TruthfulQA), with deterministic sampling (seed=42) and Wilson/normal confidence intervals. The two runs answer different questions and are reported separately (never blended):

Scale — does it hold across thousands of diverse inputs? (gpt-4o-mini, full datasets)
Strength — does it hold on a stronger model? (gpt-4o, N=100)

Scale — `gpt-4o-mini`, 6,557 cases (full datasets)

Benchmark	N	Baseline	Compressed	Δ	Compression	Tokens saved
Codebook	21	100%	100%	±0.0%	57%	6,852
Workload (code/JSON/log)	26	100%	100%	±0.0%	24%	499
SQuAD v2 (prose)	5,236	46.2%	46.7%	+0.6%	24%	228,976
GSM8K	1,029	79.6%	79.6%	±0.0%	0%¹	0
TruthfulQA	245	48.6%	48.6%	±0.0%	0%¹	0

Strength — `gpt-4o`, N=100

Benchmark	N	Baseline	Compressed	Δ	Compression
Codebook	21	100%	100%	±0.0%	57%
Workload	26	100%	100%	±0.0%	24%
SQuAD v2	100	70.7%	70.5%	-0.2%	20%
GSM8K	100	88.0%	88.0%	±0.0%	0%¹
TruthfulQA	100	56.0%	56.0%	±0.0%	0%¹

Codebook, per scenario (the unique angle — lossless by construction, the model confirms every symbol):

Scenario	Turns	Accuracy	Tokens saved
`auth_service`	6	100%	41–44%
`data_schema`	7	100%	58–59%
`api_spec`	8	100%	60%

Verdict: compression preserves accuracy. Every delta is within ±0.6 percentage points, and the codebook path shows no accuracy loss on either model. On gpt-4o, the GSM8K baseline (88%) is close to published baselines (~87%), which suggests the setup is sound.

¹ GSM8K/TruthfulQA are short prose with nothing to compress, so compression is a deliberate no-op — those rows prove non-interference, not savings.

Honest notes: the scale run hit the account's rate limit at full concurrency, so 1,527 cases exhausted retries. SQuAD landed at 5,236 of 5,928 and TruthfulQA at 245 of 790, which accounts for 1,237 of those cases; the remaining 290 are consistent with GSM8K completing 1,029 of its 1,319 test questions. The completed cases are valid and the SQuAD CI is still tight [46–48%]. The two runs use different N by design (scale vs. strength); they are reported as separate tables with their own confidence intervals and never averaged together.
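
For readers who want to check the intervals themselves, here is a minimal sketch of a Wilson score interval. It assumes the standard formula with z = 1.96 (roughly 95%); the function name `wilson_interval` is hypothetical, and the eval suite's own code may differ.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Example: 21 correct out of 21, as in the Codebook rows above.
low, high = wilson_interval(21, 21)
print(f"21/21 correct -> [{low:.1%}, {high:.1%}]")
```

With 21 of 21 correct, the lower bound comes out near 84.5%, which shows how the interval width depends on N as well as on the observed score.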

Reproduce:

python -m evals.suite --tier 3 --n 200                 # scale-ish, mini, cheap
python -m evals.suite --tier 3 --n 100 --model gpt-4o  # strength

Quick start

git clone https://github.com/surya16122114/contextpack
cd contextpack
pip install -e .
cp .env.example .env        # add your upstream key (or use bring-your-own-key per request)
contextpack serve           # starts the proxy on :8000

Then point the OpenAI SDK at the proxy — no other code changes:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-...",   # your real upstream key; passed through as BYOK
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the attached spec..."}],
    extra_headers={"x-session-id": "my-session"},   # reuse a session to build a codebook
)
print(resp.choices[0].message.content)

Every response includes X-ContextPack-* headers (Original-Tokens, Compressed-Tokens, Savings, Codebook-Size) so you can see exactly what was saved.
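
A small helper can collect those headers into a dictionary. This is a sketch under assumptions: the full header names are inferred from the `X-ContextPack-*` pattern above, and the sample values are hypothetical.

```python
from typing import Mapping

PREFIX = "x-contextpack-"


def contextpack_stats(headers: Mapping[str, str]) -> dict[str, str]:
    """Collect X-ContextPack-* headers into a dict keyed by the lowercased suffix."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in headers.items()
        if name.lower().startswith(PREFIX)  # HTTP header names are case-insensitive
    }


# Hypothetical header values, for illustration only.
sample = {
    "Content-Type": "application/json",
    "X-ContextPack-Original-Tokens": "1200",
    "X-ContextPack-Compressed-Tokens": "700",
    "X-ContextPack-Codebook-Size": "3",
}
stats = contextpack_stats(sample)
print(stats)
```

With the OpenAI Python SDK (v1 and later), raw headers should be reachable through `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, which can be passed to this helper, and `.parse()`, which returns the usual completion.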

The 4 usage modes

1. HTTP Proxy

Drop-in OpenAI-compatible endpoint. Change base_url and you're done — works with Cursor, the OpenAI SDK, LangChain, or any OpenAI client.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-your-upstream-key",   # BYOK: this key is used for the upstream call
)

The Authorization: Bearer <key> header is treated as bring-your-own-key — ContextPack forwards it to the upstream provider instead of using the server's own key. Pass x-session-id to keep building the same codebook across calls.

2. Python library

Use the compression pipeline in-process, no server required:

from contextpack import ContextPackClient

client = ContextPackClient(
    upstream_provider="openai",       # or "anthropic"
    upstream_api_key="sk-...",
    session_id="my-session",          # optional; auto-generated if omitted
)

response = client.chat(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],
)

print("Response:", response.content)
print("Tokens saved:", response.tokens_saved)
print("Codebook size:", response.codebook_size)

3. CLI

contextpack serve                    # start the proxy (--port, --host, --reload)
contextpack stats                    # global compression stats
contextpack stats <session-id>       # per-session stats
contextpack codebook <session-id>    # show the negotiated codebook for a session
contextpack mcp-install              # auto-configure the MCP server in your clients

4. MCP server

Expose ContextPack's compression as tools (compress_text, analyze_tokens, get_stats) to any MCP client.

Let ContextPack configure it for you:

contextpack mcp-install                       # configures Claude Desktop, Cursor, and Claude Code
contextpack mcp-install --client cursor       # just one client
contextpack mcp-install --dry-run             # preview without writing anything

Or add it by hand to your client's MCP config (e.g. Claude Desktop / Cursor):

{
  "mcpServers": {
    "contextpack": {
      "command": "python",
      "args": ["-m", "contextpack.mcp_server"]
    }
  }
}

mcp-install writes this block, with the path of the active Python interpreter as the command, to:

Claude Desktop — ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), %APPDATA%/Claude/claude_desktop_config.json (Windows), ~/.config/Claude/claude_desktop_config.json (Linux)
Cursor — ~/.cursor/mcp.json
Claude Code — via claude mcp add contextpack -- <python> -m contextpack.mcp_server (if the claude CLI is on your PATH)
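
If you prefer to script the manual step, the merge can be sketched as below. This is not the code behind mcp-install; the `add_mcp_server` function is hypothetical, and it assumes the config file, if present, already contains valid JSON.

```python
import json
import sys
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str = "contextpack") -> dict:
    """Merge an MCP server entry into a JSON config, keeping existing entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": sys.executable,  # active interpreter rather than a bare "python"
        "args": ["-m", "contextpack.mcp_server"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file rather than a real client config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "node"}}}))
    merged = add_mcp_server(path)
    on_disk = json.loads(path.read_text())
```

Merging into the existing `mcpServers` object, rather than overwriting the file, keeps any servers you have already configured.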

How it works

┌────────┐      ┌──────────────────────────────────────────────────────────┐      ┌──────────────┐
│        │      │                       ContextPack                         │      │              │
│ Client │─────▶│  ContentRouter ─▶ Compressors ─▶ Codebook Negotiator      │─────▶│ Upstream LLM │
│ (SDK / │      │  (JSON/code/log/  (format-aware)  (negotiates [CP_n] with  │      │ (OpenAI /    │
│  Cursor│      │   stacktrace/                       the model)             │      │  Anthropic)  │
│  /any) │◀─────│   prose)        ─▶ Lazy Refs ─▶ Budget Optimizer          │◀─────│              │
│        │      │                                                           │      │              │
└────────┘      │   ◀── response decompress (symbols → original content)    │      └──────────────┘
                └──────────────────────────────────────────────────────────┘

The negotiated codebook. When a chunk of context recurs (or is large enough to be worth it), ContextPack injects a one-time system message establishing a mapping — [CP_1] = <the full content> — and asks the model to confirm it. Once the model acknowledges, every subsequent turn sends just [CP_1] instead of the full chunk. The mapping lives for the session, so the savings compound the longer the conversation runs. On the way back, any symbols in the response are expanded to their original content before the client sees them. Because the dictionary is agreed with the model, this is lossless — the model knows exactly what each symbol means.
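
The flow described above (propose, wait for acknowledgment, substitute, expand) can be sketched with a toy class. This is an illustration under assumptions, not ContextPack's negotiator: `SessionCodebook` and its methods are hypothetical, and the model's acknowledgment is simulated by calling `confirm` directly.

```python
import re


class SessionCodebook:
    """Toy session codebook mapping [CP_n] symbols to full chunks."""

    def __init__(self) -> None:
        self.symbol_to_chunk: dict[str, str] = {}
        self.confirmed: set[str] = set()

    def propose(self, chunk: str) -> str:
        """Assign a symbol to a chunk (reusing one if the chunk is already known)."""
        for symbol, known in self.symbol_to_chunk.items():
            if known == chunk:
                return symbol
        symbol = f"[CP_{len(self.symbol_to_chunk) + 1}]"
        self.symbol_to_chunk[symbol] = chunk
        return symbol

    def confirm(self, symbol: str) -> None:
        """Record that the model acknowledged the mapping."""
        if symbol not in self.symbol_to_chunk:
            raise KeyError(symbol)
        self.confirmed.add(symbol)

    def compress(self, text: str) -> str:
        """Substitute only symbols that have been confirmed."""
        for symbol in self.confirmed:
            text = text.replace(self.symbol_to_chunk[symbol], symbol)
        return text

    def expand(self, text: str) -> str:
        """Replace known symbols with their original content."""
        return re.sub(
            r"\[CP_\d+\]",
            lambda m: self.symbol_to_chunk.get(m.group(0), m.group(0)),
            text,
        )


book = SessionCodebook()
schema = "users(id INT, email TEXT, created_at TIMESTAMP)"
turn = f"Given {schema}, write a query for new users."
symbol = book.propose(schema)
before_ack = book.compress(turn)   # unchanged: mapping not yet confirmed
book.confirm(symbol)
packed = book.compress(turn)
```

The round trip is exact only if the original text never contains a literal `[CP_n]` string of its own; a real implementation would need to escape or avoid such collisions.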

Configuration

Set these in .env (see .env.example) or as environment variables.

Variable	Default	Description
`UPSTREAM_PROVIDER`	`anthropic`	`anthropic` or `openai`
`UPSTREAM_API_KEY`	`""`	Default upstream key (overridable per-request via BYOK)
`UPSTREAM_BASE_URL`	`https://api.anthropic.com`	Upstream API base URL
`PROXY_PORT`	`8000`	Port the proxy listens on
`PROXY_HOST`	`0.0.0.0`	Host the proxy binds to
`DB_PATH`	`~/.contextpack/contextpack.db`	SQLite store for sessions, codebooks, analytics
`CODEBOOK_MIN_FREQ`	`3`	Times a pattern must recur before it's a codebook candidate
`CODEBOOK_NEGOTIATE_AFTER`	`2`	Recurrences after which negotiation is triggered
`CODEBOOK_MAX_ENTRIES`	`50`	Max codebook entries per session
`CODEBOOK_MIN_TOKEN_SAVINGS`	`10`	Minimum net token savings for an entry to be worth it
`ENABLE_CROSS_SESSION`	`true`	Allow codebook reuse across sessions
`REF_THRESHOLD_TOKENS`	`100`	Token size above which content becomes a lazy reference
`ENABLE_LAZY_REFS`	`true`	Enable lazy reference loading
`ENABLE_SUMMARIZER`	`false`	Auto-summarize long content (off — costs upstream tokens)
`SUMMARIZE_THRESHOLD`	`500`	Token size above which auto-summarize kicks in
`TOKEN_BUDGET`	`8000`	Default token budget for the optimizer
`LOG_LEVEL`	`INFO`	Logging level
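
As an illustration of how these variables and defaults fit together, the sketch below reads a handful of them from an environment mapping. The `Settings` dataclass, `load_settings`, and the set of accepted truthy strings are assumptions for the example; ContextPack's own loader may parse values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag; the accepted truthy spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    upstream_provider: str
    proxy_port: int
    codebook_max_entries: int
    ref_threshold_tokens: int
    enable_lazy_refs: bool
    token_budget: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, falling back to the table defaults."""
    return Settings(
        upstream_provider=env.get("UPSTREAM_PROVIDER", "anthropic"),
        proxy_port=int(env.get("PROXY_PORT", "8000")),
        codebook_max_entries=int(env.get("CODEBOOK_MAX_ENTRIES", "50")),
        ref_threshold_tokens=int(env.get("REF_THRESHOLD_TOKENS", "100")),
        enable_lazy_refs=_env_bool(env, "ENABLE_LAZY_REFS", True),
        token_budget=int(env.get("TOKEN_BUDGET", "8000")),
    )


# Override two values; everything else falls back to the documented default.
settings = load_settings({"PROXY_PORT": "9000", "ENABLE_LAZY_REFS": "false"})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.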

Dashboard

While the proxy is running, open http://localhost:8000/dashboard for a live view of token savings, per-session compression ratios, and active codebooks. Append ?session_id=<id> to focus on a single session.

How it compares

	Doing nothing	Generic compressor	ContextPack
Token savings on repeated context	0%	partial	41–60% (codebook)
Lossless	n/a	no — drops bytes one-sidedly	yes — model confirms the dictionary
Content-aware (JSON/code/logs)	no	sometimes	yes
Lazy references for huge blobs	no	rarely	yes
Drop-in OpenAI-compatible proxy	n/a	varies	yes
Library / CLI / MCP server	n/a	varies	all three
Runtime footprint	none	often Rust/ML deps	pure Python

The honest summary: generic compressors decide unilaterally what to throw away. ContextPack's negotiated codebook is the unique angle — it reaches an explicit agreement with the model about what each symbol means, which is why accuracy stays at 100% on the codebook benchmark while still saving up to 60% of tokens.

Development / running evals

pip install -e .                  # core install
pip install -e ".[evals]"         # optional: pull datasets via HuggingFace instead of canonical URLs

# Tiered benchmark suite (datasets auto-download and cache under ~/.contextpack/eval_cache/)
python -m evals.suite --tier 1            # workload + codebook (fast)
python -m evals.suite --tier 2 --n 50     # + SQuAD context compression
python -m evals.suite --tier 3 --n 100    # full suite (SQuAD + GSM8K + TruthfulQA)

Results are printed as a rich table and written to evals/RESULTS.md.

License

This version

0.1.0

Jun 22, 2026

Download files

Source Distribution

contextpack_ai-0.1.0.tar.gz (77.4 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

contextpack_ai-0.1.0-py3-none-any.whl (58.9 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file contextpack_ai-0.1.0.tar.gz.

File metadata

Download URL: contextpack_ai-0.1.0.tar.gz
Upload date: Jun 22, 2026
Size: 77.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for contextpack_ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4bf466f6e98d18f8d2da22b6e43bd2f36a2f45159929183f94cb64a2d58340f9`
MD5	`9be3a974c23f9e33fdf630cbacf210c4`
BLAKE2b-256	`f2315cff1bb7b44120db2cc3da2a54abf1620cf235262704f3eb33c0ccc354fc`

File details

Details for the file contextpack_ai-0.1.0-py3-none-any.whl.

File metadata

Download URL: contextpack_ai-0.1.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 58.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for contextpack_ai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a97ba3bc0cfd9724081f7f53a638a688e515720f50a9f57ce8445a989a33c566`
MD5	`ece7e47109130a2c69901c41abdaa8ec`
BLAKE2b-256	`fc358ddd0f12dcbf1ad601124a417c6f34571b6b415412f680946ce559f252d2`

contextpack-ai 0.1.0
