
Headroom

The Context Optimization Layer for LLM Applications

Cut your LLM costs by 50-90% without losing accuracy



What It Does

Headroom is a smart compression proxy for LLM applications:

  • Compresses tool outputs — 1000 search results → 15 items (keeps errors, anomalies, relevant items)
  • Enables provider caching — Stabilizes prefixes so cache hits actually happen
  • Manages context windows — Prevents token limit failures without breaking tool calls
  • Reversible compression — LLM can retrieve original data if needed (CCR architecture)

Zero code changes required — point your existing tools at the proxy.
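To make the first bullet concrete, here is a toy sketch of statistical tool-output compression: always keep errors, keep score outliers, then fill the remaining budget with a head sample. This is an illustration of the idea only, not Headroom's actual SmartCrusher algorithm, and the `status`/`score` field names are hypothetical:

```python
import statistics

def crush(items, keep=15):
    """Toy sketch: keep errors and score outliers first, then fill
    the budget with a head sample. (Not Headroom's real algorithm.)"""
    errors = [i for i in items if i.get("status") == "error"]
    scores = [i["score"] for i in items if "score" in i]
    mean, stdev = statistics.mean(scores), statistics.pstdev(scores)
    outliers = [i for i in items
                if "score" in i and abs(i["score"] - mean) > 2 * stdev]
    kept, seen = [], set()
    for i in errors + outliers + items:
        if id(i) not in seen:
            seen.add(id(i))
            kept.append(i)
        if len(kept) == keep:
            break
    return kept

items = [{"status": "ok", "score": 0.5} for _ in range(997)]
items += [{"status": "error", "score": 0.5},
          {"status": "ok", "score": 9.9},
          {"status": "ok", "score": 0.5}]
out = crush(items)
print(len(out))  # 15: 1000 results squeezed to a budget of 15
```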


30-Second Quickstart

# Install
pip install "headroom-ai[proxy]"

# Start proxy
headroom proxy --port 8787

# Verify
curl http://localhost:8787/health

Use with your tools:

# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Cursor / Continue / any OpenAI client
OPENAI_BASE_URL=http://localhost:8787/v1 cursor

# Python scripts
export OPENAI_BASE_URL=http://localhost:8787/v1
python your_script.py

That's it. You're saving tokens.
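Since the proxy exposes an OpenAI-compatible endpoint under `/v1`, a scripted request looks like any other chat completion call. A stdlib sketch for illustration (the model name, prompt, and API key are placeholders; the commented-out `urlopen` call needs the proxy running):

```python
import json
import urllib.request

# Point the request at the proxy instead of the provider directly.
PROXY = "http://localhost:8787/v1"

payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{PROXY}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_API_KEY"},
)
# urllib.request.urlopen(req)  # requires the proxy to be running
print(req.full_url)  # http://localhost:8787/v1/chat/completions
```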


Verify It's Working

curl http://localhost:8787/stats

# Example response:
{
  "tokens": {"saved": 12500, "savings_percent": 25.0},
  "cost": {"total_savings_usd": 0.04}
}

Installation

pip install "headroom-ai[proxy]"     # Proxy server (recommended)
pip install headroom-ai              # SDK only
pip install "headroom-ai[all]"       # Everything

Requirements: Python 3.10+


Features

| Feature        | Description                                    | Docs             |
|----------------|------------------------------------------------|------------------|
| SmartCrusher   | Compresses JSON tool outputs statistically     | Transforms       |
| CacheAligner   | Stabilizes prefixes for provider caching       | Transforms       |
| RollingWindow  | Manages context limits without breaking tools  | Transforms       |
| CCR            | Reversible compression with automatic retrieval | CCR Guide       |
| Text Utilities | Opt-in compression for search/logs             | Text Compression |
| LLMLingua-2    | ML-based 20x compression (opt-in)              | LLMLingua        |
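The CacheAligner row deserves a word: provider prompt caches only hit when the request prefix is byte-identical across calls, so the stable parts (system prompt, tool schema) must stay unchanged at the front. A toy sketch of that invariant, assuming a hypothetical `prefix_key` helper; this is not Headroom's implementation:

```python
import hashlib
import json

def prefix_key(messages):
    """Hash the stable prefix (here: system messages) that a provider's
    prompt cache would match on. Toy illustration of the idea only."""
    prefix = [m for m in messages if m["role"] == "system"]
    blob = json.dumps(prefix, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

turn1 = [{"role": "system", "content": "You are helpful."},
         {"role": "user", "content": "First question"}]
turn2 = [{"role": "system", "content": "You are helpful."},
         {"role": "user", "content": "Second question"}]

# Identical stable prefix across turns -> the cache can hit.
print(prefix_key(turn1) == prefix_key(turn2))  # True
```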

Providers

| Provider  | Token Counting     | Cache Optimization       |
|-----------|--------------------|--------------------------|
| OpenAI    | tiktoken (exact)   | Automatic prefix caching |
| Anthropic | Official API       | cache_control blocks     |
| Google    | Official API       | Context caching          |
| Cohere    | Official API       | -                        |
| Mistral   | Official tokenizer | -                        |

Performance

| Scenario                     | Before        | After         | Savings |
|------------------------------|---------------|---------------|---------|
| Search results (1000 items)  | 45,000 tokens | 4,500 tokens  | 90%     |
| Log analysis (500 entries)   | 22,000 tokens | 3,300 tokens  | 85%     |
| Long conversation (50 turns) | 80,000 tokens | 32,000 tokens | 60%     |

Overhead: ~1-5ms per request.
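The savings column follows directly from the before/after token counts:

```python
def savings_percent(before, after):
    """Percentage of tokens removed relative to the original count."""
    return round(100 * (before - after) / before, 1)

print(savings_percent(45_000, 4_500))   # 90.0
print(savings_percent(22_000, 3_300))   # 85.0
print(savings_percent(80_000, 32_000))  # 60.0
```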


Safety

  • Never removes human content — User/assistant messages are never compressed
  • Never breaks tool ordering — Tool calls and responses stay paired
  • Parse failures are no-ops — Malformed content passes through unchanged
  • Compression is reversible — LLM can retrieve original data via CCR
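The "parse failures are no-ops" guarantee can be sketched as a simple guard: if the tool output is not valid JSON, return it untouched rather than risk corrupting it. A toy illustration of the rule, not Headroom's code; `shrink` stands in for any compression step:

```python
import json

def compress_or_passthrough(raw, compress):
    """If raw is not valid JSON, pass it through unchanged;
    otherwise compress the parsed structure. (Toy sketch.)"""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw  # malformed content passes through unchanged
    return json.dumps(compress(data))

shrink = lambda d: d[:2] if isinstance(d, list) else d
print(compress_or_passthrough("[1, 2, 3, 4]", shrink))  # [1, 2]
print(compress_or_passthrough("not json {", shrink))    # not json {
```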

Documentation

| Guide           | Description                               |
|-----------------|-------------------------------------------|
| SDK Guide       | Wrap your client for fine-grained control |
| Proxy Guide     | Production deployment                     |
| Configuration   | All configuration options                 |
| CCR Guide       | Reversible compression architecture       |
| Metrics         | Monitoring and observability              |
| Troubleshooting | Common issues                             |
| Architecture    | How it works internally                   |

Examples

See examples/ for runnable code:

  • basic_usage.py — Simple SDK usage
  • proxy_integration.py — Using with different clients
  • ccr_demo.py — CCR architecture demonstration

Contributing

git clone https://github.com/chopratejas/headroom.git
cd headroom
pip install -e ".[dev]"
pytest

See CONTRIBUTING.md for details.


License

Apache License 2.0 — see LICENSE.


Built for the AI developer community
