# Headroom

**The Context Optimization Layer for LLM Applications**

Cut your LLM costs by 50-90% without losing accuracy.
## Why Headroom?

- **Zero code changes** - works as a transparent proxy
- **50-90% cost savings** - verified on real workloads
- **Reversible compression** - the LLM retrieves original data via CCR
- **Content-aware** - code, logs, and JSON are each handled optimally
- **Provider caching** - automatic prefix optimization for cache hits
- **Persistent memory** - remembers across conversations with zero-latency extraction
- **Framework native** - LangChain, MCP, and agents supported
## Headroom vs Alternatives

| Approach | Token Reduction | Accuracy | Reversible | Latency |
|---|---|---|---|---|
| Headroom | 50-90% | No loss | Yes (CCR) | ~1-5ms |
| Truncation | Variable | Data loss | No | ~0ms |
| Summarization | 60-80% | Lossy | No | ~500ms+ |
| No optimization | 0% | Full | N/A | 0ms |
Headroom wins because it intelligently selects relevant content while keeping a retrieval path to the original data.
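The retrieval path can be pictured with a toy sketch (the class and method names below are hypothetical illustrations, not Headroom's actual CCR API): a compressed span is replaced by a short placeholder carrying a key, and the original bytes remain retrievable by that key, which is what makes the compression reversible rather than lossy.

```python
import hashlib

class ToyCCRStore:
    """Illustrative reversible-compression store: the model sees a short
    placeholder, and the original content stays retrievable by key."""

    def __init__(self):
        self._store = {}

    def compress(self, content: str) -> str:
        key = hashlib.sha256(content.encode()).hexdigest()[:12]
        self._store[key] = content
        # The LLM sees this placeholder instead of the full payload.
        return f"[compressed:{key} len={len(content)}]"

    def retrieve(self, placeholder: str) -> str:
        key = placeholder.split(":")[1].split(" ")[0]
        return self._store[key]

store = ToyCCRStore()
original = "x" * 10_000
placeholder = store.compress(original)
assert store.retrieve(placeholder) == original
assert len(placeholder) < 50
```

Truncation and summarization discard the original bytes; a keyed store like this is what keeps the "Reversible: Yes" column honest.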
## 30-Second Quickstart

### Option 1: Proxy (Zero Code Changes)

```bash
pip install "headroom-ai[proxy]"
headroom proxy --port 8787
```

Point your tools at the proxy:

```bash
# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 cursor
```
### Option 2: LangChain Integration

```bash
pip install "headroom-ai[langchain]"
```

```python
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")
```

See the full LangChain Integration Guide for memory, retrievers, agents, and more.
## Framework Integrations

| Framework | Integration | Docs |
|---|---|---|
| LangChain | HeadroomChatModel, memory, retrievers, agents | Guide |
| MCP | Tool output compression for Claude | Guide |
| Any OpenAI Client | Proxy server | Guide |
## Features

| Feature | Description | Docs |
|---|---|---|
| Memory | Persistent memory across conversations (zero-latency inline extraction) | Memory |
| Universal Compression | ML-based content detection + structure-preserving compression | Compression |
| SmartCrusher | Compresses JSON tool outputs statistically | Transforms |
| CacheAligner | Stabilizes prefixes for provider caching | Transforms |
| RollingWindow | Manages context limits without breaking tools | Transforms |
| CCR | Reversible compression with automatic retrieval | CCR Guide |
| LangChain | Memory, retrievers, agents, streaming | LangChain |
| Text Utilities | Opt-in compression for search/logs | Text Compression |
| LLMLingua-2 | ML-based 20x compression (opt-in) | LLMLingua |
| Code-Aware | AST-based code compression (tree-sitter) | Transforms |
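To give a feel for the SmartCrusher idea, here is a toy sketch of statistical compression for a long JSON tool output (the function and field names are invented for illustration; Headroom's actual heuristics differ): keep a few representative items and replace the rest with summary statistics.

```python
import json
import statistics

def crush_results(items: list[dict], keep: int = 3) -> dict:
    """Toy sketch: retain a small sample of a long JSON result list
    and summarize the numeric field statistically."""
    scores = [it["score"] for it in items
              if isinstance(it.get("score"), (int, float))]
    return {
        "total_items": len(items),
        "sample": items[:keep],
        "score_summary": {
            "min": min(scores),
            "max": max(scores),
            "mean": round(statistics.mean(scores), 2),
        },
    }

results = [{"id": i, "score": i / 10} for i in range(1000)]
crushed = crush_results(results)
assert crushed["total_items"] == 1000
# The crushed payload is well under a tenth of the original size.
assert len(json.dumps(crushed)) < len(json.dumps(results)) // 10
```

The LLM still learns how many items there were and their distribution, at a fraction of the token cost.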
## Performance

| Scenario | Before | After | Savings |
|---|---|---|---|
| Search results (1000 items) | 45,000 tokens | 4,500 tokens | 90% |
| Log analysis (500 entries) | 22,000 tokens | 3,300 tokens | 85% |
| Long conversation (50 turns) | 80,000 tokens | 32,000 tokens | 60% |
| Agent with tools (10 calls) | 100,000 tokens | 15,000 tokens | 85% |
Overhead: ~1-5ms per request
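Token savings translate directly into dollars. As a back-of-the-envelope calculation (the $2.50 per million input tokens used here is an illustrative figure; check your provider's current pricing):

```python
PRICE_PER_M_INPUT = 2.50  # illustrative $/1M input tokens, not a quoted price

def monthly_savings(tokens_before: int, tokens_after: int,
                    requests_per_month: int) -> float:
    """Dollar savings from reducing input tokens per request."""
    saved_tokens = (tokens_before - tokens_after) * requests_per_month
    return saved_tokens / 1_000_000 * PRICE_PER_M_INPUT

# Search-results scenario from the table: 45,000 -> 4,500 tokens per request.
savings = monthly_savings(45_000, 4_500, requests_per_month=10_000)
print(f"${savings:,.2f}/month")  # $1,012.50/month
```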
## Providers

| Provider | Token Counting | Cache Optimization |
|---|---|---|
| OpenAI | tiktoken (exact) | Automatic prefix caching |
| Anthropic | Official API | cache_control blocks |
| Google (Gemini) | Official API | Context caching |
| Cohere | Official API | - |
| Mistral | Official tokenizer | - |
New models auto-supported via naming pattern detection.
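Naming-pattern detection can be sketched roughly like this (the patterns and function below are a hypothetical illustration, not Headroom's actual detection rules): model names follow predictable prefixes, so a new model like `gpt-5` or `claude-x` maps to its provider without a code change.

```python
import re

# Hypothetical prefix patterns; real detection rules may be richer.
PROVIDER_PATTERNS = [
    (re.compile(r"^(gpt-|o[0-9])"), "openai"),
    (re.compile(r"^claude-"), "anthropic"),
    (re.compile(r"^gemini-"), "google"),
    (re.compile(r"^command"), "cohere"),
    (re.compile(r"^(mistral-|mixtral-)"), "mistral"),
]

def detect_provider(model: str) -> str:
    for pattern, provider in PROVIDER_PATTERNS:
        if pattern.match(model):
            return provider
    return "unknown"

assert detect_provider("gpt-4o") == "openai"
assert detect_provider("claude-sonnet-4-20250514") == "anthropic"
assert detect_provider("gemini-1.5-pro") == "google"
```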
## Safety Guarantees

- **Never removes human content** - user/assistant messages are preserved
- **Never breaks tool ordering** - tool calls and responses stay paired
- **Parse failures are no-ops** - malformed content passes through unchanged
- **Compression is reversible** - the LLM retrieves original data via CCR
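The tool-ordering guarantee can be stated as an invariant: after any optimization pass, every assistant tool call must still have its matching tool response. A toy checker for that invariant, using the OpenAI chat message shape (the function itself is illustrative, not part of Headroom's API):

```python
def tool_pairs_intact(messages: list[dict]) -> bool:
    """Return True if every tool call id has a matching tool response.
    Messages follow the OpenAI chat format."""
    unanswered = set()
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            unanswered.add(call["id"])
        if msg.get("role") == "tool":
            unanswered.discard(msg.get("tool_call_id"))
    return not unanswered  # empty => every call was answered

messages = [
    {"role": "user", "content": "search docs"},
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "..."},
]
assert tool_pairs_intact(messages)
assert not tool_pairs_intact(messages[:2])  # dropping the response breaks pairing
```

An optimizer that only drops or compresses messages while preserving this invariant cannot leave the provider API with a dangling tool call.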
## Installation

```bash
pip install headroom-ai                  # SDK only
pip install "headroom-ai[proxy]"         # Proxy server
pip install "headroom-ai[langchain]"     # LangChain integration
pip install "headroom-ai[code]"          # AST-based code compression
pip install "headroom-ai[llmlingua]"     # ML-based compression
pip install "headroom-ai[all]"           # Everything
```

Requirements: Python 3.10+
## Documentation

| Guide | Description |
|---|---|
| Memory Guide | Persistent memory for LLMs |
| Compression Guide | Universal compression with ML detection |
| LangChain Integration | Full LangChain support |
| SDK Guide | Fine-grained control |
| Proxy Guide | Production deployment |
| Configuration | All options |
| CCR Guide | Reversible compression |
| Metrics | Monitoring |
| Troubleshooting | Common issues |
## Who's Using Headroom?

Add your project here! Open a PR or start a discussion.
## Contributing

```bash
git clone https://github.com/chopratejas/headroom.git
cd headroom
pip install -e ".[dev]"
pytest
```

See CONTRIBUTING.md for details.
## License

Apache License 2.0 - see LICENSE.

*Built for the AI developer community.*