
The Context Optimization Layer for LLM Applications - Cut costs by 50-90%

Project description

Headroom

Compress everything your AI agent reads. Same answers, fraction of the tokens.

Every tool call, DB query, file read, and RAG retrieval your agent makes is 70-95% boilerplate.
Headroom compresses it away before it hits the model.



Where Headroom Fits

Your Agent / App
      │
      │  tool calls, logs, DB reads, RAG results, file reads, API responses
      ▼
   Headroom  ← transparent proxy, no code changes needed
      │
      ▼
 LLM Provider  (OpenAI, Anthropic, Google, Bedrock, 100+ via LiteLLM)

Headroom sits between your application and the LLM provider. It intercepts requests, compresses the context, and forwards an optimized prompt. Your app doesn't change — just point it at Headroom.

What gets compressed

Headroom optimizes any data your agent injects into a prompt:

  • Tool outputs — shell commands, API calls, search results
  • Database queries — SQL results, key-value lookups
  • RAG retrievals — document chunks, embedding results
  • File reads — code, logs, configs, CSVs
  • API responses — JSON, XML, HTML
  • Conversation history — long agent sessions with repetitive context

Quick Start

pip install "headroom-ai[all]"

Proxy (zero code changes)

headroom proxy --port 8787
# Claude Code — just set the base URL
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Cursor, Continue, any OpenAI-compatible tool
OPENAI_BASE_URL=http://localhost:8787/v1 cursor

Works with any language, any tool, any framework. One env var. Proxy docs

Python: One function

from headroom import compress

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")

Works with any Python LLM client — Anthropic, OpenAI, LiteLLM, httpx, anything.

Already have a proxy or gateway?

You don't need to replace it. Drop Headroom into your existing stack:

Your setup         Add Headroom      One-liner
LiteLLM            Callback          litellm.callbacks = [HeadroomCallback()]
Any Python proxy   ASGI middleware   app.add_middleware(CompressionMiddleware)
Any Python app     compress()        result = compress(messages, model="gpt-4o")
Agno agents        Wrap model        HeadroomAgnoModel(your_model)
LangChain          Wrap model        HeadroomChatModel(your_llm) (experimental)

Full Integration Guide — detailed setup for LiteLLM, ASGI middleware, compress(), and every framework.


Demo

Headroom Demo


Does It Actually Work?

100 production log entries. One critical error buried at position 67.

                  Baseline   Headroom
Input tokens      10,144     1,260
Correct answers   4/4        4/4

Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."

87.6% fewer tokens. Same answer. Run it: python examples/needle_in_haystack_test.py

What Headroom kept

From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.
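
The selection idea can be sketched in a few lines of Python. This toy version (the function name and the 5% rarity threshold are illustrative, not SmartCrusher's actual implementation) keeps boundary entries plus any entry whose field values are statistically rare:

```python
from collections import Counter

def select_entries(entries, head=3, tail=2):
    """Keep boundary entries plus statistical outliers.

    A toy approximation of anomaly-based selection: an entry is kept
    when one of its field values is rare relative to the distribution
    across all entries -- no keyword list involved.
    """
    kept = set(range(head)) | set(range(len(entries) - tail, len(entries)))
    # Count how often each (field, value) pair occurs across all entries.
    freq = Counter((k, v) for e in entries for k, v in e.items())
    for i, e in enumerate(entries):
        # A field value seen in <5% of entries marks the entry as anomalous.
        if any(freq[(k, v)] < 0.05 * len(entries) for k, v in e.items()):
            kept.add(i)
    return [entries[i] for i in sorted(kept)]

logs = [{"level": "INFO", "service": "payment-gateway"} for _ in range(100)]
logs[67] = {"level": "FATAL", "service": "payment-gateway"}
picked = select_entries(logs)
# 6 entries survive: first 3, the FATAL outlier at position 67, last 2.
```

The point is that the FATAL entry is kept because its "level" value is rare, not because "FATAL" is on a keyword list.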

Real Workloads

Scenario                    Before   After    Savings
Code search (100 results)   17,765   1,408    92%
SRE incident debugging      65,694   5,118    92%
Codebase exploration        78,502   41,254   47%
GitHub issue triage         54,174   14,761   73%

Accuracy Benchmarks

Compression preserves accuracy — tested on real OSS benchmarks.

Standard Benchmarks — Baseline (direct to API) vs Headroom (through proxy):

Benchmark    Category   N     Baseline   Headroom   Delta
GSM8K        Math       100   0.870      0.870      0.000
TruthfulQA   Factual    100   0.530      0.560      +0.030

Compression Benchmarks — Accuracy after full compression stack:

Benchmark                 Category        N     Accuracy   Compression   Method
SQuAD v2                  QA              100   97%        19%           Before/After
BFCL                      Tool/Function   100   97%        32%           LLM-as-Judge
Tool Outputs (built-in)   Agent           8     100%       20%           Before/After
CCR Needle Retention      Lossless        50    100%       77%           Exact Match

Run it yourself:

# Quick smoke test (8 cases, ~10s)
python -m headroom.evals quick -n 8 --provider openai --model gpt-4o-mini

# Full Tier 1 suite (~$3, ~15 min)
python -m headroom.evals suite --tier 1 -o eval_results/

# CI mode (exit 1 on regression)
python -m headroom.evals suite --tier 1 --ci

Full methodology: Benchmarks | Evals Framework


Key Capabilities

Lossless Compression

Headroom never throws data away. It compresses aggressively, stores the originals, and gives the LLM a tool to retrieve full details when needed. When it compresses 500 items to 20, it tells the model what was omitted ("87 passed, 2 failed, 1 error") so the model knows when to ask for more.
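
As a rough sketch of the idea (the store layout, ID scheme, and helper names below are invented for illustration, not Headroom's real API):

```python
STORE = {}

def compress_items(items, keep=2):
    """Summarize a list but keep the originals retrievable by ID.

    A toy version of reversible compression: the prompt gets a few
    items plus an omission summary; the full list stays in a local
    store that a retrieval tool can query later.
    """
    ccr_id = f"ccr-{len(STORE)}"
    STORE[ccr_id] = items
    passed = sum(1 for it in items if it["status"] == "passed")
    failed = len(items) - passed
    summary = (f"[{ccr_id}] showing {keep}/{len(items)}; "
               f"omitted: {passed} passed, {failed} failed")
    return items[:keep], summary

def retrieve(ccr_id):
    """What a retrieval tool call would hand back to the model."""
    return STORE[ccr_id]

results = [{"id": i, "status": "passed" if i != 7 else "failed"}
           for i in range(90)]
shown, summary = compress_items(results)
```

Because the summary names what was omitted, the model knows there is a failure worth asking about, and the store guarantees the answer is still available.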

Smart Content Detection

Auto-detects what's in your context — JSON arrays, code, logs, plain text — and routes each to the best compressor. JSON goes to SmartCrusher, code goes through AST-aware compression (Python, JS, Go, Rust, Java, C++), prose goes to LLMLingua-2.

Cache Optimization

Stabilizes message prefixes so your provider's KV cache actually works. Claude offers a 90% read discount on cached prefixes — but almost no framework takes advantage of it. Headroom does.
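
Why prefix stability matters can be shown with a toy cache key. Providers key their KV cache on the exact bytes of the message prefix; the hashing below is illustrative, not how any provider actually implements it:

```python
import hashlib
import json

def prefix_key(messages, stable_count):
    """Hash the stable message prefix -- the unit a KV cache keys on."""
    blob = json.dumps(messages[:stable_count], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

turn1 = [{"role": "system", "content": "You are a helpful agent."},
         {"role": "user", "content": "list files"}]
turn2 = turn1 + [{"role": "assistant", "content": "ls output here"},
                 {"role": "user", "content": "read config.py"}]

# Because the prefix is byte-identical across turns, its key is stable,
# so the provider can serve it from cache (the 90% read discount on Claude).
assert prefix_key(turn1, 2) == prefix_key(turn2, 2)
```

Any edit to an earlier message, however small, changes the hash and forfeits the cached prefix, which is why the aligner avoids rewriting it.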

Failure Learning

headroom learn                   # Analyze past Claude Code sessions, show recommendations
headroom learn --apply           # Write learnings to CLAUDE.md and MEMORY.md
headroom learn --all --apply     # Learn across all your projects

Reads your conversation history, finds every failed tool call, correlates it with what eventually succeeded, and writes specific corrections into your project files. Next session starts smarter. Learn docs


Image Compression

40-90% token reduction via trained ML router. Automatically selects the right resize/quality tradeoff per image.

All features
Feature                 What it does
Content Router          Auto-detects content type, routes to optimal compressor
SmartCrusher            Universal JSON compression — arrays of dicts, strings, numbers, mixed types, nested objects
CodeCompressor          AST-aware compression for Python, JS, Go, Rust, Java, C++
LLMLingua-2             ML-based 20x text compression
CCR                     Reversible compression — LLM retrieves originals when needed
Compression Summaries   Tells the LLM what was omitted ("3 errors, 12 failures")
CacheAligner            Stabilizes prefixes for provider KV cache hits
IntelligentContext      Score-based context management with learned importance
Image Compression       40-90% token reduction via trained ML router
Memory                  Persistent memory across conversations
Compression Hooks       Customize compression with pre/post hooks
Read Lifecycle          Detects stale/superseded Read outputs, replaces with CCR markers
headroom learn          Analyzes past failures, writes project-specific learnings to CLAUDE.md/MEMORY.md

Headroom vs Alternatives

Context compression is a new space. Here's how the approaches differ:

  • Headroom (multi-algorithm compression): covers all context (tool outputs, DB reads, RAG, files, logs, history); deploys as a proxy, Python library, ASGI middleware, or callback; integrates with LangChain, Agno, LiteLLM, Strands, and MCP; data stays local (OSS); reversible (CCR).
  • RTK (CLI command rewriter): covers shell command outputs only; deploys as a CLI wrapper; no framework integrations; data stays local (OSS); not reversible.
  • Compresr and Token Company (cloud compression APIs): cover text sent to their API; deploy as an API call; no framework integrations; data leaves your machine; not reversible.

Use it however you want. Headroom works as a standalone proxy (headroom proxy), a one-function Python library (compress()), ASGI middleware, or a LiteLLM callback. Already using LiteLLM, LangChain, or Agno? Drop Headroom in without replacing anything.

Headroom + RTK work well together. RTK rewrites CLI commands (git show → git show --short), Headroom compresses everything else (JSON arrays, code, logs, RAG results, conversation history). Use both.

Headroom vs cloud APIs. Compresr and Token Company are hosted services — you send your context to their servers, they compress and return it. Headroom runs locally. Your data never leaves your machine. You also get lossless compression (CCR): the LLM can retrieve the full original when it needs more detail.


How It Works Inside

  Your prompt
      │
      ▼
  1. CacheAligner            Stabilize prefix for KV cache
      │
      ▼
  2. ContentRouter           Route each content type:
      │                         → SmartCrusher    (JSON)
      │                         → CodeCompressor  (code)
      │                         → LLMLingua       (text)
      ▼
  3. IntelligentContext      Score-based token fitting
      │
      ▼
  LLM Provider

  Needs full details? LLM calls headroom_retrieve.
  Originals are in the Compressed Store — nothing is thrown away.

Overhead: 15-200ms compression latency (net positive for Sonnet/Opus). Full data: Latency Benchmarks
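
The routing step above can be approximated in a few lines. The sniffing heuristics and compressor names in this sketch are simplified stand-ins, not ContentRouter's actual detection logic:

```python
import json

def route(content: str) -> str:
    """Toy content router: pick a compression path by sniffing the payload."""
    # Structured JSON (arrays or objects) goes to the JSON compressor.
    try:
        parsed = json.loads(content)
        if isinstance(parsed, (list, dict)):
            return "smart_crusher"
    except (ValueError, TypeError):
        pass
    # Crude source-code sniff: common definition keywords.
    if any(tok in content for tok in ("def ", "class ", "function ", "fn ")):
        return "code_compressor"
    # Everything else is treated as prose.
    return "llmlingua"

assert route('[{"a": 1}]') == "smart_crusher"
assert route("def main():\n    pass") == "code_compressor"
assert route("The deploy finished without incident.") == "llmlingua"
```

The real router presumably uses stronger signals than keyword sniffing; the point is only that each payload type gets its own compressor.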


Integrations

Integration                 Status         Docs
compress() — one function   Stable         Integration Guide
LiteLLM callback            Stable         Integration Guide
ASGI middleware             Stable         Integration Guide
Proxy server                Stable         Proxy Docs
Agno                        Stable         Agno Guide
MCP (Claude Code)           Stable         MCP Guide
Strands                     Stable         Strands Guide
LangChain                   Experimental   LangChain Guide

Cloud Providers

headroom proxy --backend bedrock --region us-east-1     # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1 # Google Vertex
headroom proxy --backend azure                          # Azure OpenAI
headroom proxy --backend openrouter                     # OpenRouter (400+ models)

Installation

pip install headroom-ai                # Core library
pip install "headroom-ai[all]"         # Everything including evals (recommended)
pip install "headroom-ai[proxy]"       # Proxy server
pip install "headroom-ai[mcp]"         # MCP for Claude Code
pip install "headroom-ai[agno]"        # Agno integration
pip install "headroom-ai[langchain]"   # LangChain (experimental)
pip install "headroom-ai[evals]"       # Evaluation framework only

Python 3.10+


Documentation

  • Integration Guide: LiteLLM, ASGI, compress(), proxy
  • Proxy Docs: proxy server configuration
  • Architecture: how the pipeline works
  • CCR Guide: reversible compression
  • Benchmarks: accuracy validation
  • Latency Benchmarks: compression overhead and cost-benefit analysis
  • Limitations: when compression helps, when it doesn't
  • Evals Framework: prove compression preserves accuracy
  • Memory: persistent memory
  • Agno: Agno agent framework
  • MCP: Claude Code subscriptions
  • Learn: offline failure learning for coding agents
  • Configuration: all options

Community

Questions, feedback, or just want to follow along? Join us on Discord


Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

License

Apache License 2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

headroom_ai-0.4.2.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

headroom_ai-0.4.2-py3-none-any.whl (914.3 kB)

Uploaded Python 3

File details

Details for the file headroom_ai-0.4.2.tar.gz.

File metadata

  • Download URL: headroom_ai-0.4.2.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for headroom_ai-0.4.2.tar.gz
Algorithm Hash digest
SHA256 6bad76c5c75292b6491729203db72c07728526d4f34800a4002a2a767030765c
MD5 77c31f009e79cf3fff9d5a7590b46216
BLAKE2b-256 5e8347238452395808b00f64ed9678b247799c0d5c72336054bfedfd83bd912c

See more details on using hashes here.

File details

Details for the file headroom_ai-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: headroom_ai-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 914.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for headroom_ai-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1adaa06549f019d3214d3f720530e11ee1339d8c020ed0a2109b97ae2e2b83bf
MD5 bf29ccf90ef5c7f337809c475b518a8d
BLAKE2b-256 864fc6192d1bacf3e38cc5a9ea0725bfe1bf2fc3f54f19c8e3f3ca2f2b2bc182

See more details on using hashes here.
