The Context Optimization Layer for LLM Applications - Cut costs by 50-90%
Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal — losslessly, locally, and without touching accuracy.
100 logs. One FATAL error buried at position 67. Both runs found it. Baseline 10,144 tokens → Headroom 1,260 tokens — 87% fewer, identical answer.
python examples/needle_in_haystack_test.py
Quick start
Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.
Wrap your coding agent — one command:
pip install "headroom-ai[all]"
headroom wrap claude # Claude Code
headroom wrap codex # Codex
headroom wrap cursor # Cursor
headroom wrap aider # Aider
headroom wrap copilot # GitHub Copilot CLI
Drop it into your own code — Python or TypeScript:
from headroom import compress
result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });
Or run it as a proxy — zero code changes, any language:
headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
Why Headroom
- Accuracy-preserving. GSM8K 0.870 → 0.870 (±0.000). TruthfulQA +0.030. SQuAD v2 and BFCL both 97% accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
- Runs on your machine. No cloud API, no data egress. Compression latency is milliseconds — faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
- Kompress-base on HuggingFace. Our open-source text compressor, fine-tuned on real agentic traces — tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`.
- Cross-agent memory and learning. Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md`/`AGENTS.md`/`GEMINI.md` — reliability compounds over time.
- Reversible (CCR). Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away.
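The reversible idea can be illustrated with a self-contained toy (the real CCR pipeline is more involved; `ReversibleStore` and its methods are hypothetical names, not Headroom's API):

```python
import hashlib

class ReversibleStore:
    """Toy illustration of CCR-style reversible compression: the original
    bytes stay in a local store, keyed by content hash, and only a short
    placeholder goes into the prompt."""

    def __init__(self):
        self._originals: dict[str, str] = {}

    def compress(self, text: str, keep_lines: int = 3) -> str:
        key = hashlib.sha256(text.encode()).hexdigest()[:12]
        self._originals[key] = text  # nothing is thrown away
        head = "\n".join(text.splitlines()[:keep_lines])
        return f"{head}\n[truncated; retrieve with id={key}]"

    def retrieve(self, key: str) -> str:
        # What a headroom_retrieve-style tool call resolves to.
        return self._originals[key]

store = ReversibleStore()
logs = "\n".join(f"INFO worker heartbeat {i}" for i in range(100))
placeholder = store.compress(logs)          # goes into the prompt
key = placeholder.rsplit("id=", 1)[1].rstrip("]")
original = store.retrieve(key)              # model pulls originals on demand
```

The point of the sketch: compression only removes bytes from the *prompt*, never from the store, so a retrieval tool call always recovers the exact original.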
Bundles the RTK binary for shell-output rewriting — full attribution below.
How it fits
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
│ prompts · tool outputs · logs · RAG results · files
▼
┌────────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here) │
│ ─────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST) │
│ └─ Kompress-base (text, HF) │
│ │
│ Cross-agent memory · headroom learn · MCP │
└────────────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
→ Architecture · CCR reversible compression · Kompress-base model card
Proof
Savings on real agent workloads:
| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
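The savings column follows directly from the before/after token counts; a quick sanity check:

```python
def savings_pct(before: int, after: int) -> int:
    """Percentage of tokens removed, rounded to a whole percent."""
    return round(100 * (1 - after / before))

# Token counts from the workload table above.
workloads = {
    "Code search": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}
results = {name: savings_pct(b, a) for name, (b, a) in workloads.items()}
# → {'Code search': 92, 'SRE incident debugging': 92,
#    'GitHub issue triage': 73, 'Codebase exploration': 47}
```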
Accuracy preserved on standard benchmarks:
| Benchmark | Category | N | Baseline | Headroom | Delta | Compression |
|---|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 | — |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 | — |
| SQuAD v2 | QA | 100 | — | 97% | — | 19% |
| BFCL | Tools | 100 | — | 97% | — | 32% |
Reproduce:
python -m headroom.evals suite --tier 1
Community, live:
→ Full benchmarks & methodology
Built for coding agents
| Agent | One-command wrap | Notes |
|---|---|---|
| Claude Code | `headroom wrap claude` | `--memory` for cross-agent memory, `--code-graph` for codebase intel |
| Codex | `headroom wrap codex --memory` | Shares the same memory store as Claude |
| Cursor | `headroom wrap cursor` | Prints Cursor config — paste once, done |
| Aider | `headroom wrap aider` | Starts proxy, launches Aider |
| Copilot CLI | `headroom wrap copilot` | Starts proxy, launches Copilot |
| OpenClaw | `headroom wrap openclaw` | Installs Headroom as ContextEngine plugin |
MCP-native too — `headroom mcp install` exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` to any MCP client.
Integrations
Drop Headroom into any stack
| Your setup | Hook in with |
|---|---|
| Any Python app | compress(messages, model=…) |
| Any TypeScript app | await compress(messages, { model }) |
| Anthropic / OpenAI SDK | withHeadroom(new Anthropic()) · withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | Strands guide |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |
| MCP clients | headroom mcp install |
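Whichever hook you pick, the shape is the same: intercept messages on the way to the provider and swap in the compressed version. A minimal middleware-style sketch — `compress_stub` is a toy stand-in for the real compressor, and `fake_provider` is a placeholder for your actual client call:

```python
from typing import Callable

def compress_stub(messages: list[dict]) -> list[dict]:
    # Toy stand-in for headroom.compress: drop blank lines from each
    # message body. The real compressor is content-aware (JSON, code, text).
    out = []
    for m in messages:
        lines = [ln for ln in m["content"].splitlines() if ln.strip()]
        out.append({**m, "content": "\n".join(lines)})
    return out

def with_compression(
    send: Callable[[list[dict]], str],
) -> Callable[[list[dict]], str]:
    """Wrap any send-to-provider callable so messages are compressed first."""
    def wrapped(messages: list[dict]) -> str:
        return send(compress_stub(messages))
    return wrapped

# Placeholder for a real SDK call (Anthropic, OpenAI, LiteLLM, ...).
def fake_provider(messages: list[dict]) -> str:
    return f"saw {sum(len(m['content']) for m in messages)} chars"

client = with_compression(fake_provider)
reply = client([{"role": "user", "content": "hello\n\n\n\nworld"}])
```

The table rows above are this same pattern at different layers: SDK wrapper, framework callback, ASGI middleware, or a proxy in front of the whole process.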
What's inside
- SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
- CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
- Kompress-base — our HuggingFace model, trained on agentic traces.
- Image compression — 40–90% reduction via trained ML router.
- CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- IntelligentContext — score-based context fitting with learned importance.
- CCR — reversible compression; LLM retrieves originals on demand.
- Cross-agent memory — shared store, agent provenance, auto-dedup.
- SharedContext — compressed context passing across multi-agent workflows.
- `headroom learn` — plugin-based failure mining for Claude, Codex, Gemini.
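CacheAligner's core idea — keep the stable part of the prompt byte-identical across calls so provider prefix caches actually hit — can be sketched in a few lines. This is an illustrative simplification, not CacheAligner's implementation; `build_prompt` is a hypothetical name:

```python
import json

def build_prompt(system: str, tools: list[dict], user_msg: str) -> str:
    """Emit the stable prefix (system + tools) deterministically, then
    append the volatile part. Byte-identical prefixes let a provider's
    KV cache reuse work up to the first differing byte."""
    stable = system + "\n" + json.dumps(
        sorted(tools, key=lambda t: t["name"]),  # fixed tool ordering
        sort_keys=True, separators=(",", ":"),   # fixed serialization
    )
    return stable + "\n---\n" + user_msg

tools = [{"name": "search", "args": ["q"]}, {"name": "read", "args": ["path"]}]
p1 = build_prompt("You are helpful.", tools, "first question")
p2 = build_prompt("You are helpful.", list(reversed(tools)), "second question")

# Different tool ordering and a different question, yet the cacheable
# prefix is byte-for-byte identical.
prefix_ok = p1.split("---")[0] == p2.split("---")[0]
```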
Install
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.
→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.
Documentation
| Start here | Go deeper |
|---|---|
| Quickstart | Architecture |
| Proxy | How compression works |
| MCP tools | CCR — reversible compression |
| Memory | Cache optimization |
| Failure learning | Benchmarks |
| Configuration | Limitations |
Compared to
Headroom runs locally, covers every content type (not just CLI or text), works with every major framework, and is reversible.
| | Scope | Deploy | Local | Reversible |
|---|---|---|---|---|
| Headroom | All context — tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| RTK | CLI command outputs | CLI wrapper | Yes | No |
| Compresr, Token Co. | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |
Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — `git show` → `git show --short`, noisy `ls` → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.
Contributing
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.
Community
- Live leaderboard — 60B+ tokens saved and counting.
- Discord — questions, feedback, war stories.
- Kompress-base on HuggingFace — the model behind our text compression.
License
Apache 2.0 — see LICENSE.