Skip to main content

The Context Optimization Layer for LLM Applications - Cut costs by 50-90%

Project description

Headroom

The Context Optimization Layer for LLM Applications

Cut your LLM costs by 50-90% without losing accuracy

CI PyPI Python License


Why Headroom?

  • Zero code changes - works as a transparent proxy
  • 50-90% cost savings - verified on real workloads
  • Reversible compression - LLM retrieves original data via CCR
  • Content-aware - code, logs, JSON each handled optimally
  • Provider caching - automatic prefix optimization for cache hits
  • Persistent memory - remember across conversations with zero-latency extraction
  • Framework native - LangChain, Agno, MCP, agents supported

Headroom vs Alternatives

Approach Token Reduction Accuracy Reversible Latency
Headroom 50-90% No loss Yes (CCR) ~1-5ms
Truncation Variable Data loss No ~0ms
Summarization 60-80% Lossy No ~500ms+
No optimization 0% Full N/A 0ms

Headroom wins because it intelligently selects relevant content while keeping a retrieval path to the original data.


30-Second Quickstart

Option 1: Proxy (Zero Code Changes)

pip install "headroom-ai[proxy]"
headroom proxy --port 8787

Point your tools at the proxy:

# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 cursor

Option 2: LangChain Integration

pip install "headroom-ai[langchain]"
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")

See the full LangChain Integration Guide for memory, retrievers, agents, and more.

Option 3: Agno Integration

pip install "headroom-ai[agno]"
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

# Wrap your model - that's it!
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
agent = Agent(model=model)

# Use exactly like before
response = agent.run("Hello!")

# Check savings
print(f"Tokens saved: {model.total_tokens_saved}")

See the full Agno Integration Guide for hooks, multi-provider support, and more.


Framework Integrations

Framework Integration Docs
LangChain HeadroomChatModel, memory, retrievers, agents Guide
Agno HeadroomAgnoModel, hooks, multi-provider Guide
MCP Tool output compression for Claude Guide
Any OpenAI Client Proxy server Guide

Features

Feature Description Docs
Memory Persistent memory across conversations (zero-latency inline extraction) Memory
Universal Compression ML-based content detection + structure-preserving compression Compression
SmartCrusher Compresses JSON tool outputs statistically Transforms
CacheAligner Stabilizes prefixes for provider caching Transforms
RollingWindow Manages context limits without breaking tools Transforms
CCR Reversible compression with automatic retrieval CCR Guide
LangChain Memory, retrievers, agents, streaming LangChain
Agno Agent framework integration with hooks Agno
Text Utilities Opt-in compression for search/logs Text Compression
LLMLingua-2 ML-based 20x compression (opt-in) LLMLingua
Code-Aware AST-based code compression (tree-sitter) Transforms

Performance

Scenario Before After Savings
Search results (1000 items) 45,000 tokens 4,500 tokens 90%
Log analysis (500 entries) 22,000 tokens 3,300 tokens 85%
Long conversation (50 turns) 80,000 tokens 32,000 tokens 60%
Agent with tools (10 calls) 100,000 tokens 15,000 tokens 85%

Overhead: ~1-5ms per request


Providers

Provider Token Counting Cache Optimization
OpenAI tiktoken (exact) Automatic prefix caching
Anthropic Official API cache_control blocks
Google Official API Context caching
Cohere Official API -
Mistral Official tokenizer -

New models auto-supported via naming pattern detection.


Safety Guarantees

  • Never removes human content - user/assistant messages preserved
  • Never breaks tool ordering - tool calls and responses stay paired
  • Parse failures are no-ops - malformed content passes through unchanged
  • Compression is reversible - LLM retrieves original data via CCR

Installation

pip install headroom-ai              # SDK only
pip install "headroom-ai[proxy]"     # Proxy server
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[agno]"      # Agno agent framework
pip install "headroom-ai[code]"      # AST-based code compression
pip install "headroom-ai[llmlingua]" # ML-based compression
pip install "headroom-ai[all]"       # Everything

Requirements: Python 3.10+


Documentation

Guide Description
Memory Guide Persistent memory for LLMs
Compression Guide Universal compression with ML detection
LangChain Integration Full LangChain support
Agno Integration Full Agno agent framework support
SDK Guide Fine-grained control
Proxy Guide Production deployment
Configuration All options
CCR Guide Reversible compression
Metrics Monitoring
Troubleshooting Common issues

Who's Using Headroom?

Add your project here! Open a PR or start a discussion.


Contributing

git clone https://github.com/chopratejas/headroom.git
cd headroom
pip install -e ".[dev]"
pytest

See CONTRIBUTING.md for details.


License

Apache License 2.0 - see LICENSE.


Built for the AI developer community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

headroom_ai-0.2.9.tar.gz (498.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

headroom_ai-0.2.9-py3-none-any.whl (390.1 kB view details)

Uploaded Python 3

File details

Details for the file headroom_ai-0.2.9.tar.gz.

File metadata

  • Download URL: headroom_ai-0.2.9.tar.gz
  • Upload date:
  • Size: 498.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for headroom_ai-0.2.9.tar.gz
Algorithm Hash digest
SHA256 6c6b8827da5e49ece685e560647aaa228414cdd159305c46e9fedbde4b2b733a
MD5 d4d38f213fdc044b86c1003cb979854b
BLAKE2b-256 86a811004cd56b8d71468234cb71bb33817aadde8989b144b2b4fd803ccfdfc3

See more details on using hashes here.

File details

Details for the file headroom_ai-0.2.9-py3-none-any.whl.

File metadata

  • Download URL: headroom_ai-0.2.9-py3-none-any.whl
  • Upload date:
  • Size: 390.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for headroom_ai-0.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 81b55db9c2e6b36cf1069bc3da54b6c9c1a08354000fc668bfd029cc7e13cb2a
MD5 befdcec8944549b403bb9c213f80aeb7
BLAKE2b-256 2fbf87cc9778b769faa5ecf453395d3b1b2574c69d3f562e87db37dc3b8781b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page