The Context Optimization Layer for LLM Applications - Cut costs by 50-90%

These details have not been verified by PyPI

Project links

Project description

Headroom

The Context Optimization Layer for LLM Applications

Cut your LLM costs by 50-90% without losing accuracy

Why Headroom?

Zero code changes - works as a transparent proxy
50-90% cost savings - verified on real workloads
Reversible compression - LLM retrieves original data via CCR
Content-aware - code, logs, JSON each handled optimally
Provider caching - automatic prefix optimization for cache hits
Persistent memory - remember across conversations with zero-latency extraction
Framework native - LangChain, Agno, MCP, agents supported

Headroom vs Alternatives

Approach	Token Reduction	Accuracy	Reversible	Latency
Headroom	50-90%	No loss	Yes (CCR)	~1-5ms
Truncation	Variable	Data loss	No	~0ms
Summarization	60-80%	Lossy	No	~500ms+
No optimization	0%	Full	N/A	0ms

Headroom wins because it intelligently selects relevant content while keeping a retrieval path to the original data.

30-Second Quickstart

Option 1: Proxy (Zero Code Changes)

pip install "headroom-ai[proxy]"
headroom proxy --port 8787

Point your tools at the proxy:

# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 cursor

Option 2: LangChain Integration

pip install "headroom-ai[langchain]"

from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")

See the full LangChain Integration Guide for memory, retrievers, agents, and more.

Option 3: Agno Integration

pip install "headroom-ai[agno]"

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

# Wrap your model - that's it!
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
agent = Agent(model=model)

# Use exactly like before
response = agent.run("Hello!")

# Check savings
print(f"Tokens saved: {model.total_tokens_saved}")

See the full Agno Integration Guide for hooks, multi-provider support, and more.

Framework Integrations

Framework	Integration	Docs
LangChain	`HeadroomChatModel`, memory, retrievers, agents	Guide
Agno	`HeadroomAgnoModel`, hooks, multi-provider	Guide
MCP	Tool output compression for Claude	Guide
Any OpenAI Client	Proxy server	Guide

Features

Feature	Description	Docs
Memory	Persistent memory across conversations (zero-latency inline extraction)	Memory
Universal Compression	ML-based content detection + structure-preserving compression	Compression
SmartCrusher	Compresses JSON tool outputs statistically	Transforms
CacheAligner	Stabilizes prefixes for provider caching	Transforms
RollingWindow	Manages context limits without breaking tools	Transforms
CCR	Reversible compression with automatic retrieval	CCR Guide
LangChain	Memory, retrievers, agents, streaming	LangChain
Agno	Agent framework integration with hooks	Agno
Text Utilities	Opt-in compression for search/logs	Text Compression
LLMLingua-2	ML-based 20x compression (opt-in)	LLMLingua
Code-Aware	AST-based code compression (tree-sitter)	Transforms

Performance

Scenario	Before	After	Savings
Search results (1000 items)	45,000 tokens	4,500 tokens	90%
Log analysis (500 entries)	22,000 tokens	3,300 tokens	85%
Long conversation (50 turns)	80,000 tokens	32,000 tokens	60%
Agent with tools (10 calls)	100,000 tokens	15,000 tokens	85%

Overhead: ~1-5ms per request

Providers

Provider	Token Counting	Cache Optimization
OpenAI	tiktoken (exact)	Automatic prefix caching
Anthropic	Official API	cache_control blocks
Google	Official API	Context caching
Cohere	Official API	-
Mistral	Official tokenizer	-

New models auto-supported via naming pattern detection.

Safety Guarantees

Never removes human content - user/assistant messages preserved
Never breaks tool ordering - tool calls and responses stay paired
Parse failures are no-ops - malformed content passes through unchanged
Compression is reversible - LLM retrieves original data via CCR

Installation

pip install headroom-ai              # SDK only
pip install "headroom-ai[proxy]"     # Proxy server
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[agno]"      # Agno agent framework
pip install "headroom-ai[code]"      # AST-based code compression
pip install "headroom-ai[llmlingua]" # ML-based compression
pip install "headroom-ai[all]"       # Everything

Requirements: Python 3.10+

Documentation

Guide	Description
Memory Guide	Persistent memory for LLMs
Compression Guide	Universal compression with ML detection
LangChain Integration	Full LangChain support
Agno Integration	Full Agno agent framework support
SDK Guide	Fine-grained control
Proxy Guide	Production deployment
Configuration	All options
CCR Guide	Reversible compression
Metrics	Monitoring
Troubleshooting	Common issues

Who's Using Headroom?

Add your project here! Open a PR or start a discussion.

Contributing

git clone https://github.com/chopratejas/headroom.git
cd headroom
pip install -e ".[dev]"
pytest

See CONTRIBUTING.md for details.

License

Apache License 2.0 - see LICENSE.

_{Built for the AI developer community}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.8.2

Apr 21, 2026

0.8.1

Apr 21, 2026

0.8.0

Apr 21, 2026

0.7.4

Apr 21, 2026

0.7.3

Apr 21, 2026

0.7.2

Apr 21, 2026

0.7.1

Apr 20, 2026

0.7.0

Apr 20, 2026

0.6.7

Apr 20, 2026

0.6.6

Apr 20, 2026

0.6.5

Apr 19, 2026

0.6.4

Apr 19, 2026

0.6.3

Apr 18, 2026

0.6.2

Apr 18, 2026

0.6.1

Apr 17, 2026

0.5.25

Apr 13, 2026

0.5.24

Apr 12, 2026

0.5.23

Apr 12, 2026

0.5.22

Apr 12, 2026

0.5.21

Apr 8, 2026

0.5.20

Apr 8, 2026

0.5.19

Apr 7, 2026

0.5.18

Apr 3, 2026

0.5.17

Mar 31, 2026

0.5.16

Mar 31, 2026

0.5.15

Mar 31, 2026

0.5.14

Mar 30, 2026

0.5.13

Mar 30, 2026

0.5.12

Mar 30, 2026

0.5.11

Mar 30, 2026

0.5.10

Mar 29, 2026

0.5.9

Mar 28, 2026

0.5.8

Mar 27, 2026

0.5.7

Mar 26, 2026

0.5.6

Mar 25, 2026

0.5.5

Mar 25, 2026

0.5.4

Mar 24, 2026

0.5.3

Mar 24, 2026

0.5.2

Mar 20, 2026

0.5.1

Mar 19, 2026

0.5.0

Mar 19, 2026

0.4.6

Mar 17, 2026

0.4.5

Mar 15, 2026

0.4.4

Mar 14, 2026

0.4.3

Mar 13, 2026

0.4.2

Mar 13, 2026

0.4.1

Mar 13, 2026

0.4.0

Mar 11, 2026

0.3.8

Mar 10, 2026

0.3.7

Feb 19, 2026

0.3.6

Feb 19, 2026

0.3.5

Feb 19, 2026

0.3.4

Feb 16, 2026

0.3.3

Feb 11, 2026

0.3.2

Feb 11, 2026

0.3.1

Feb 2, 2026

0.3.0

Jan 31, 2026

0.2.15

Jan 21, 2026

0.2.14

Jan 20, 2026

0.2.13

Jan 19, 2026

0.2.12

Jan 18, 2026

0.2.10

Jan 17, 2026

This version

0.2.9

Jan 17, 2026

0.2.8

Jan 16, 2026

0.2.7

Jan 16, 2026

0.2.6

Jan 16, 2026

0.2.5

Jan 16, 2026

0.2.4

Jan 15, 2026

0.2.2

Jan 14, 2026

0.2.1

Jan 10, 2026

0.2.0

Jan 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

headroom_ai-0.2.9.tar.gz (498.9 kB view details)

Uploaded Jan 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

headroom_ai-0.2.9-py3-none-any.whl (390.1 kB view details)

Uploaded Jan 17, 2026 Python 3

File details

Details for the file headroom_ai-0.2.9.tar.gz.

File metadata

Download URL: headroom_ai-0.2.9.tar.gz
Upload date: Jan 17, 2026
Size: 498.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for headroom_ai-0.2.9.tar.gz
Algorithm	Hash digest
SHA256	`6c6b8827da5e49ece685e560647aaa228414cdd159305c46e9fedbde4b2b733a`
MD5	`d4d38f213fdc044b86c1003cb979854b`
BLAKE2b-256	`86a811004cd56b8d71468234cb71bb33817aadde8989b144b2b4fd803ccfdfc3`

See more details on using hashes here.

File details

Details for the file headroom_ai-0.2.9-py3-none-any.whl.

File metadata

Download URL: headroom_ai-0.2.9-py3-none-any.whl
Upload date: Jan 17, 2026
Size: 390.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for headroom_ai-0.2.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`81b55db9c2e6b36cf1069bc3da54b6c9c1a08354000fc668bfd029cc7e13cb2a`
MD5	`befdcec8944549b403bb9c213f80aeb7`
BLAKE2b-256	`2fbf87cc9778b769faa5ecf453395d3b1b2574c69d3f562e87db37dc3b8781b0`

See more details on using hashes here.

headroom-ai 0.2.9

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Headroom

Why Headroom?

Headroom vs Alternatives

30-Second Quickstart

Option 1: Proxy (Zero Code Changes)

Option 2: LangChain Integration

Option 3: Agno Integration

Framework Integrations

Features

Performance

Providers

Safety Guarantees

Installation

Documentation

Who's Using Headroom?

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes