
Headroom

The Context Optimization Layer for LLM Applications: cut costs by 50-90%.

Tool outputs are 70-95% redundant boilerplate. Headroom compresses that away.



Demo

[Demo animation]


Quick Start

pip install "headroom-ai[all]"

Simplest: Proxy (zero code changes)

headroom proxy --port 8787
# Claude Code — just set the base URL
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Cursor, Continue, any OpenAI-compatible tool
OPENAI_BASE_URL=http://localhost:8787/v1 cursor

Works with any language, any tool, and any framework, with a single environment variable. See the Proxy docs for details.

Python: One function

from headroom import compress

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")

Works with any Python LLM client — Anthropic, OpenAI, LiteLLM, httpx, anything.

Already have a proxy or gateway?

You don't need to replace it. Drop Headroom into your existing stack:

| Your setup | Add Headroom | One-liner |
|---|---|---|
| LiteLLM | Callback | `litellm.callbacks = [HeadroomCallback()]` |
| Any Python proxy | ASGI middleware | `app.add_middleware(CompressionMiddleware)` |
| Any Python app | `compress()` | `result = compress(messages, model="gpt-4o")` |
| Agno agents | Wrap model | `HeadroomAgnoModel(your_model)` |
| LangChain | Wrap model | `HeadroomChatModel(your_llm)` (experimental) |

Full Integration Guide — detailed setup for LiteLLM, ASGI middleware, compress(), and every framework.
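The middleware row in the table can be sketched generically. The toy ASGI middleware below is an illustration of the pattern only, not Headroom's actual `CompressionMiddleware`: it drains the request body, rewrites the `messages` field (here by truncation; a real middleware would compress), and replays the new body to the wrapped app.

```python
import asyncio
import json

class BodyRewriteMiddleware:
    """Toy ASGI middleware: rewrites chat-request bodies in flight."""

    def __init__(self, app, max_chars=50):
        self.app = app
        self.max_chars = max_chars

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)

        # Drain the incoming body (it may arrive in chunks).
        body, more = b"", True
        while more:
            event = await receive()
            body += event.get("body", b"")
            more = event.get("more_body", False)

        # Transform the messages; truncation stands in for compression.
        payload = json.loads(body or b"{}")
        for m in payload.get("messages", []):
            m["content"] = m["content"][: self.max_chars]
        new_body = json.dumps(payload).encode()

        # Replay the rewritten body to the wrapped app.
        async def patched_receive():
            return {"type": "http.request", "body": new_body, "more_body": False}

        await self.app(scope, patched_receive, send)

async def inner_app(scope, receive, send):
    # Records what the downstream app actually sees.
    event = await receive()
    inner_app.seen = json.loads(event["body"])

async def demo():
    req = {"messages": [{"role": "user", "content": "x" * 200}]}

    async def receive():
        return {"type": "http.request", "body": json.dumps(req).encode()}

    async def send(event):
        pass

    await BodyRewriteMiddleware(inner_app)({"type": "http"}, receive, send)
    return len(inner_app.seen["messages"][0]["content"])

print(asyncio.run(demo()))  # → 50
```

The upstream app never notices: it still receives a well-formed chat payload, just a smaller one.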


Does It Actually Work?

100 production log entries. One critical error buried at position 67.

| | Baseline | Headroom |
|---|---|---|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |

Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."

87.6% fewer tokens. Same answer. Run it: python examples/needle_in_haystack_test.py

What Headroom kept

From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.
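The idea can be illustrated with a toy sketch (not SmartCrusher's actual algorithm): keep a fixed head and tail for boundaries, plus any entry whose field value is statistically rare across the whole list.

```python
from collections import Counter

def pick_entries(logs, head=3, tail=2):
    """Toy boundary + anomaly selection over JSON log entries.

    Keeps the first `head` and last `tail` entries, plus any entry whose
    'level' value is rare (under 5% of entries) across the whole list.
    """
    levels = Counter(e["level"] for e in logs)
    n = len(logs)
    keep = set(range(head)) | set(range(n - tail, n))
    for i, e in enumerate(logs):
        # Anomaly: this level almost never occurs, so it carries information.
        if levels[e["level"]] / n < 0.05:
            keep.add(i)
    return [logs[i] for i in sorted(keep)]

# 100 INFO entries with one FATAL buried at position 67.
logs = [{"level": "INFO", "msg": f"ok {i}"} for i in range(100)]
logs[67] = {"level": "FATAL", "msg": "PG-5523 connection pool exhausted"}

kept = pick_entries(logs)
print(len(kept))                                  # → 6
print(any(e["level"] == "FATAL" for e in kept))   # → True
```

No keyword list mentions "FATAL"; the entry survives purely because its field value is an outlier in the distribution.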

Accuracy Benchmarks

| Benchmark | Metric | Result | Compression |
|---|---|---|---|
| Scrapinghub Extraction | Recall | 98.2% | 94.9% |
| Multi-Tool Agent (4 tools) | Accuracy | 100% | 76.3% |
| SmartCrusher (JSON) | Accuracy | 100% | 87.6% |

Full methodology: Benchmarks | Run yourself: python -m headroom.evals quick


How It Works

flowchart LR
  App["Your App"] --> H["Headroom"] --> LLM["LLM Provider"]
  LLM --> Resp["Response"]

Inside Headroom

flowchart TB
  subgraph Pipeline["Transform Pipeline"]
    CA["1. CacheAligner\nStabilizes prefix for KV cache"]
    CR["2. ContentRouter\nDetects content type, picks compressor"]
    IC["3. IntelligentContext\nScore-based token fitting"]
    QE["4. Query Echo\nRe-injects user question"]
    CA --> CR --> IC --> QE
  end

  subgraph Compressors["ContentRouter dispatches to"]
    SC["SmartCrusher\nJSON arrays"]
    CC["CodeCompressor\nAST-aware code"]
    LL["LLMLingua\nML-based text"]
  end

  subgraph CCR["CCR: Compress-Cache-Retrieve"]
    Store[("Compressed\nStore")]
    Tool["headroom_retrieve"]
    Tool <--> Store
  end

  CR --> Compressors
  SC -. "stores originals +\nsummary of what's omitted" .-> Store
  QE --> LLM["LLM Provider"]
  LLM -. "retrieves when\nit needs more" .-> Tool

Headroom never throws data away. It compresses aggressively and retrieves precisely. When it compresses 500 items to 20, it tells the LLM what was omitted ("87 passed, 2 failed, 1 error") so the LLM knows when to ask for more.
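The summary step can be sketched with a hypothetical helper (an illustration, not Headroom's internal API): tally the statuses of the dropped items so the model knows the shape of what it isn't seeing.

```python
from collections import Counter

def omission_summary(dropped):
    """Summarize items elided by compression, e.g. '87 passed, 2 failed, 1 error'."""
    counts = Counter(item["status"] for item in dropped)
    return ", ".join(f"{n} {status}" for status, n in counts.most_common())

dropped = (
    [{"status": "passed"}] * 87
    + [{"status": "failed"}] * 2
    + [{"status": "error"}]
)
print(omission_summary(dropped))  # → 87 passed, 2 failed, 1 error
```

Appended to the compressed payload, a line like this is what prompts the model to call `headroom_retrieve` when the omitted items matter.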

Verified on Real Workloads

| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |

Overhead: 1-5ms compression latency.
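That overhead figure is easy to sanity-check with a generic timing harness. The stand-in transform below is a placeholder; swapping in a real `compress()` call would measure Headroom itself.

```python
import time

def time_transform(fn, payload, runs=100):
    """Median wall-clock latency of a message transform, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[len(samples) // 2]

# Stand-in transform: drop tool messages. Replace with the real compressor.
stub = lambda messages: [m for m in messages if m.get("role") != "tool"]
msgs = [{"role": "tool", "content": "x" * 10_000}] * 50

print(f"median: {time_transform(stub, msgs):.3f} ms")
```

Using the median rather than the mean keeps one-off warmup or GC pauses from skewing the result.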


Integrations

| Integration | Status | Docs |
|---|---|---|
| `compress()` — one function | Stable | Integration Guide |
| LiteLLM callback | Stable | Integration Guide |
| ASGI middleware | Stable | Integration Guide |
| Proxy server | Stable | Proxy Docs |
| Agno | Stable | Agno Guide |
| MCP (Claude Code) | Stable | MCP Guide |
| Strands | Stable | Strands Guide |
| LangChain | Experimental | LangChain Guide |

Features

| Feature | What it does |
|---|---|
| Content Router | Auto-detects content type, routes to optimal compressor |
| SmartCrusher | Statistically compresses JSON arrays — preserves errors, anomalies, boundaries |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ |
| LLMLingua-2 | ML-based 20x text compression |
| CCR | Reversible compression — LLM retrieves originals when needed |
| Compression Summaries | Tells the LLM what was omitted ("3 errors, 12 failures") |
| Query Echo | Re-injects user question after compressed data for better attention |
| CacheAligner | Stabilizes prefixes for provider KV cache hits |
| IntelligentContext | Score-based context management with learned importance |
| Image Compression | 40-90% token reduction via trained ML router |
| Memory | Persistent memory across conversations |
| Compression Hooks | Customize compression with pre/post hooks |
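Query Echo's effect can be sketched in a few lines (an illustration of the technique, not Headroom's implementation): when compressed tool output is the last thing in the context, the original user question is re-appended so it sits closest to the model's attention.

```python
def echo_query(messages):
    """Append the most recent user question after trailing tool output.

    Long compressed payloads push the question far from the end of the
    context; echoing it restores recency. The "(Reminder)" prefix is an
    arbitrary marker chosen for this sketch.
    """
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    if messages[-1]["role"] != "user":
        messages = messages + [
            {"role": "user", "content": f"(Reminder) {last_user['content']}"}
        ]
    return messages

msgs = [
    {"role": "user", "content": "Which service failed?"},
    {"role": "tool", "content": "<compressed: 6 of 100 log entries kept>"},
]
print(echo_query(msgs)[-1]["content"])  # → (Reminder) Which service failed?
```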

Cloud Providers

headroom proxy --backend bedrock --region us-east-1     # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1 # Google Vertex
headroom proxy --backend azure                          # Azure OpenAI
headroom proxy --backend openrouter                     # OpenRouter (400+ models)

Installation

pip install headroom-ai                # Core library
pip install "headroom-ai[all]"         # Everything (recommended)
pip install "headroom-ai[proxy]"       # Proxy server
pip install "headroom-ai[mcp]"         # MCP for Claude Code
pip install "headroom-ai[agno]"        # Agno integration
pip install "headroom-ai[langchain]"   # LangChain (experimental)
pip install "headroom-ai[evals]"       # Evaluation framework

Python 3.10+


Documentation

- Integration Guide: LiteLLM, ASGI, compress(), proxy
- Proxy Docs: proxy server configuration
- Architecture: how the pipeline works
- CCR Guide: reversible compression
- Benchmarks: accuracy validation
- Evals Framework: prove compression preserves accuracy
- Memory: persistent memory
- Agno: Agno agent framework
- MCP: Claude Code subscriptions
- Configuration: all options

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

License

Apache License 2.0 — see LICENSE.
