Headroom
The Context Optimization Layer for LLM Applications
Tool outputs are 70-95% redundant boilerplate. Headroom compresses that away, cutting LLM costs by 50-90%.
Quick Start
```bash
pip install "headroom-ai[all]"
```
Simplest: Proxy (zero code changes)
```bash
headroom proxy --port 8787

# Claude Code — just set the base URL
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Cursor, Continue, any OpenAI-compatible tool
OPENAI_BASE_URL=http://localhost:8787/v1 cursor
```
Works with any language, any tool, any framework. One env var. Proxy docs
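Because the proxy speaks the standard OpenAI-compatible wire protocol, any HTTP client can talk to it. A minimal stdlib sketch that only builds such a request against the local proxy (it does not send it; the `/v1/chat/completions` path follows the OpenAI convention):

```python
import json
import urllib.request

# Assumes `headroom proxy --port 8787` would be listening here.
BASE_URL = "http://localhost:8787/v1"

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize the failing tests."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)  # → http://localhost:8787/v1/chat/completions
```

Swapping the base URL is the entire integration; the request body is unchanged from what you would send to the provider directly.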
Python: One function
```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```
Works with any Python LLM client — Anthropic, OpenAI, LiteLLM, httpx, anything.
Already have a proxy or gateway?
You don't need to replace it. Drop Headroom into your existing stack:
| Your setup | Add Headroom | One-liner |
|---|---|---|
| LiteLLM | Callback | `litellm.callbacks = [HeadroomCallback()]` |
| Any Python proxy | ASGI middleware | `app.add_middleware(CompressionMiddleware)` |
| Any Python app | `compress()` | `result = compress(messages, model="gpt-4o")` |
| Agno agents | Wrap model | `HeadroomAgnoModel(your_model)` |
| LangChain | Wrap model | `HeadroomChatModel(your_llm)` (experimental) |
Full Integration Guide — detailed setup for LiteLLM, ASGI middleware, compress(), and every framework.
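The middleware row above is plain ASGI underneath. A toy sketch of the pattern, not Headroom's actual `CompressionMiddleware` (here `bytes.upper` stands in for real compression of the request body):

```python
import asyncio

class ToyCompressionMiddleware:
    """ASGI middleware that rewrites the request body before the app sees it."""

    def __init__(self, app, shrink):
        self.app, self.shrink = app, shrink

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)
        first = await receive()
        if first["type"] == "http.request":
            # Real bodies may stream across several messages; this sketch
            # only rewrites the first chunk.
            first = dict(first, body=self.shrink(first.get("body", b"")))
        delivered = False

        async def patched_receive():
            nonlocal delivered
            if not delivered:
                delivered = True
                return first
            return await receive()

        return await self.app(scope, patched_receive, send)

# Demo with a dummy app that records what it received.
captured = {}

async def app(scope, receive, send):
    captured["body"] = (await receive())["body"]

async def demo():
    mw = ToyCompressionMiddleware(app, shrink=bytes.upper)
    async def receive():
        return {"type": "http.request", "body": b'{"messages": []}', "more_body": False}
    await mw({"type": "http"}, receive, None)

asyncio.run(demo())
print(captured["body"])  # → b'{"MESSAGES": []}'
```

Because the interception happens at the ASGI layer, the wrapped app never knows compression occurred, which is why the pattern composes with any existing Python proxy.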
Does It Actually Work?
100 production log entries. One critical error buried at position 67.
| | Baseline | Headroom |
|---|---|---|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |
Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."
87.6% fewer tokens. Same answer. Run it: `python examples/needle_in_haystack_test.py`
What Headroom kept
From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.
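The selection logic can be pictured with a toy sketch, not Headroom's actual implementation: keep a few boundary items, a few recent items, and anything whose numeric fields are statistical outliers (the `latency_ms` field and z-score threshold here are illustrative assumptions):

```python
import statistics

def toy_smartcrush(entries, head=3, tail=2, z_thresh=3.0):
    """Toy boundary + anomaly + recency selection over a list of dicts."""
    keep = set(range(head)) | set(range(len(entries) - tail, len(entries)))
    values = [e["latency_ms"] for e in entries]
    mean, stdev = statistics.mean(values), statistics.pstdev(values)
    for i, v in enumerate(values):
        if stdev and abs(v - mean) / stdev > z_thresh:
            keep.add(i)  # statistical outlier: preserved regardless of position
    return sorted(keep)

logs = [{"latency_ms": 20 + (i % 5)} for i in range(100)]
logs[67]["latency_ms"] = 5000  # the buried anomaly at position 67
print(toy_smartcrush(logs))  # → [0, 1, 2, 67, 98, 99]
```

No keyword ever matched "FATAL" here; position 67 survives purely because its value deviates far from the field's distribution, which is the property the text above describes.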
Accuracy Benchmarks
| Benchmark | Metric | Result | Compression |
|---|---|---|---|
| Scrapinghub Extraction | Recall | 98.2% | 94.9% |
| Multi-Tool Agent (4 tools) | Accuracy | 100% | 76.3% |
| SmartCrusher (JSON) | Accuracy | 100% | 87.6% |
Full methodology: Benchmarks | Run yourself: `python -m headroom.evals quick`
How It Works
```mermaid
flowchart LR
    App["Your App"] --> H["Headroom"] --> LLM["LLM Provider"]
    LLM --> Resp["Response"]
```
Inside Headroom
```mermaid
flowchart TB
    subgraph Pipeline["Transform Pipeline"]
        CA["1. CacheAligner\nStabilizes prefix for KV cache"]
        CR["2. ContentRouter\nDetects content type, picks compressor"]
        IC["3. IntelligentContext\nScore-based token fitting"]
        QE["4. Query Echo\nRe-injects user question"]
        CA --> CR --> IC --> QE
    end
    subgraph Compressors["ContentRouter dispatches to"]
        SC["SmartCrusher\nJSON arrays"]
        CC["CodeCompressor\nAST-aware code"]
        LL["LLMLingua\nML-based text"]
    end
    subgraph CCR["CCR: Compress-Cache-Retrieve"]
        Store[("Compressed\nStore")]
        Tool["headroom_retrieve"]
        Tool <--> Store
    end
    CR --> Compressors
    SC -. "stores originals +\nsummary of what's omitted" .-> Store
    QE --> LLM["LLM Provider"]
    LLM -. "retrieves when\nit needs more" .-> Tool
```
Headroom never throws data away. It compresses aggressively and retrieves precisely. When it compresses 500 items to 20, it tells the LLM what was omitted ("87 passed, 2 failed, 1 error") so the LLM knows when to ask for more.
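The omission summary is a simple idea to picture. A minimal sketch, assuming the dropped items carry a `status` field (the field name and message shape are illustrative, not Headroom's internal format):

```python
from collections import Counter

def omission_summary(omitted):
    """Describe dropped items so the model knows when to ask for more."""
    counts = Counter(item["status"] for item in omitted)
    return ", ".join(f"{n} {status}" for status, n in counts.most_common())

# Hypothetical test-runner output compressed away by SmartCrusher.
dropped = [{"status": "passed"}] * 87 + [{"status": "failed"}] * 2 + [{"status": "error"}]
print(omission_summary(dropped))  # → 87 passed, 2 failed, 1 error
```

Attaching a summary like this to the compressed block is what lets the LLM decide whether the omitted material matters enough to retrieve.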
Verified on Real Workloads
| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
Overhead: 1-5ms compression latency.
Integrations
| Integration | Status | Docs |
|---|---|---|
| `compress()` — one function | Stable | Integration Guide |
| LiteLLM callback | Stable | Integration Guide |
| ASGI middleware | Stable | Integration Guide |
| Proxy server | Stable | Proxy Docs |
| Agno | Stable | Agno Guide |
| MCP (Claude Code) | Stable | MCP Guide |
| Strands | Stable | Strands Guide |
| LangChain | Experimental | LangChain Guide |
Features
| Feature | What it does |
|---|---|
| Content Router | Auto-detects content type, routes to optimal compressor |
| SmartCrusher | Statistically compresses JSON arrays — preserves errors, anomalies, boundaries |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ |
| LLMLingua-2 | ML-based 20x text compression |
| CCR | Reversible compression — LLM retrieves originals when needed |
| Compression Summaries | Tells the LLM what was omitted ("3 errors, 12 failures") |
| Query Echo | Re-injects user question after compressed data for better attention |
| CacheAligner | Stabilizes prefixes for provider KV cache hits |
| IntelligentContext | Score-based context management with learned importance |
| Image Compression | 40-90% token reduction via trained ML router |
| Memory | Persistent memory across conversations |
| Compression Hooks | Customize compression with pre/post hooks |
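Query Echo in particular is easy to sketch: re-append the latest user question after the compressed payload so it sits near the end of the context, where models attend most reliably. A hypothetical illustration, not the library's actual message format:

```python
def query_echo(messages):
    """Echo the latest user question after compressed tool output."""
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    echo = {"role": "user", "content": f"(Reminder of the question: {last_user['content']})"}
    return messages + [echo]

msgs = [
    {"role": "user", "content": "Which service failed?"},
    {"role": "tool", "content": "<compressed: 6 of 100 log entries>"},
]
echoed = query_echo(msgs)
print(echoed[-1]["content"])  # → (Reminder of the question: Which service failed?)
```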
Cloud Providers
```bash
headroom proxy --backend bedrock --region us-east-1      # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1  # Google Vertex
headroom proxy --backend azure                           # Azure OpenAI
headroom proxy --backend openrouter                      # OpenRouter (400+ models)
```
Installation
```bash
pip install headroom-ai               # Core library
pip install "headroom-ai[all]"        # Everything (recommended)
pip install "headroom-ai[proxy]"      # Proxy server
pip install "headroom-ai[mcp]"        # MCP for Claude Code
pip install "headroom-ai[agno]"       # Agno integration
pip install "headroom-ai[langchain]"  # LangChain (experimental)
pip install "headroom-ai[evals]"      # Evaluation framework
```
Requires Python 3.10+.
Documentation
| Integration Guide | LiteLLM, ASGI, compress(), proxy |
| Proxy Docs | Proxy server configuration |
| Architecture | How the pipeline works |
| CCR Guide | Reversible compression |
| Benchmarks | Accuracy validation |
| Evals Framework | Prove compression preserves accuracy |
| Memory | Persistent memory |
| Agno | Agno agent framework |
| MCP | Claude Code subscriptions |
| Configuration | All options |
Contributing
```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```
License
Apache License 2.0 — see LICENSE.