The context compression protocol for LLM inference. Eliminate 93% token redundancy in one line.
Project description
ContextLens
The context compression protocol for LLM inference.
80,000+ words saved. 94% meaning retained. One line of code.
The Problem
Every time you call Claude or GPT, you're sending the same context over and over again. Repeated messages, duplicate code blocks, redundant explanations — all costing you tokens.
A typical 20-message conversation has ~70% redundant content.
The Fix
import anthropic
import ctxlens as cx
client = cx.wrap(anthropic.Anthropic(api_key="..."))
# That's it. Every API call is now automatically compressed.
response = client.messages.create(
model="claude-opus-4-5",
max_tokens=1000,
messages=[{"role": "user", "content": "..."}]
)
One line. Zero changes to your existing code. Drop-in replacement.
Results
Real benchmark across 3 production datasets (coding, agent loops, research):
| Method | Token Reduction | Fidelity | Latency |
|---|---|---|---|
| No compression | 0% | 64.1% | 0.0ms |
| Simple truncation | 39.7% | 72.4% | 0.0ms |
| ctxlens balanced | 66.8% | 67.5% | 0.2ms |
- 1.7x more token reduction than simple truncation
- 83.8% fidelity on agent loops — beats truncation
- 0.2ms latency after model warmup — negligible overhead
- 80,500+ words saved across 395 real conversations
- 100% fact retention on agent tasks
→ See full benchmark methodology
Install
pip install ctxlens
With semantic compression (recommended):
pip install ctxlens[semantic]
Usage
Anthropic
import anthropic
import ctxlens as cx
client = cx.wrap(anthropic.Anthropic(api_key="..."))
response = client.messages.create(
model="claude-opus-4-5",
max_tokens=1000,
messages=[{"role": "user", "content": "..."}]
)
# See how much you saved
print(client.savings)
# {
# 'calls': 1,
# 'tokens_saved_estimate': 847,
# 'redundancy_pct': 73.2,
# 'cost_saved_gbp': 0.0025
# }
OpenAI
import openai
import ctxlens as cx
client = cx.wrap(openai.OpenAI(api_key="..."))
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "..."}]
)
print(client.savings)
Async (AsyncAnthropic / AsyncOpenAI)
import anthropic
import ctxlens as cx
client = cx.wrap(anthropic.AsyncAnthropic(api_key="..."))
async def main():
response = await client.messages.create(
model="claude-opus-4-5",
max_tokens=1000,
messages=[{"role": "user", "content": "..."}]
)
print(client.savings)
Agent loops
import ctxlens as cx
# Wrap your agent — prevents context limit failures on long runs
agent = cx.wrap_agent(your_agent, budget="economic")
result = agent.run("your task here")
Direct compression
from ctxlens import ContextLens
engine = ContextLens(budget="balanced", show_savings=True)
messages = [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
]
result = engine.compress(messages)
print(f"Saved: {result.tokens_estimated_saved} tokens")
print(f"Fidelity: {result.fidelity_score * 100:.1f}% meaning retained")
Compression budgets
| Budget | Aggressiveness | Best for |
|---|---|---|
economic |
High | Long agent loops, cost-sensitive apps |
balanced |
Medium | General use (default) |
precise |
Low | When accuracy is critical |
client = cx.wrap(anthropic.Anthropic(), budget="economic")
How it works
ContextLens runs three compression stages:
- Exact deduplication — removes identical repeated messages (~0ms overhead)
- Semantic triage — scores every message by relevance to the current query using
all-MiniLM-L6-v2locally — zero external API calls - Agent-aware compression — classifies messages by type (goal, error, tool_call, reasoning) and applies type-specific rules
The fidelity score measures how much meaning was retained after compression. A score of 0.95 means 95% of the semantic content was preserved.
Chrome Extension
ContextLens also comes as a Chrome extension that works directly in your browser on Claude, ChatGPT, Gemini, DeepSeek, and Perplexity — no API key needed.
- Auto-compresses when context reaches 75%
- Shows meaning retained score after every response
- Stores memory across conversations and platforms
- Export your memory as JSON
Roadmap
- GitHub pre-fetch filter (reduce token usage in agentic coding)
- Project memory injection (auto-inject context into new chats)
- Node.js SDK
- MCP server for Claude Code
License
MIT — free for personal and commercial use.
Author
Built by Usama Fateh Ali as part of ARIA — an autonomous financial intelligence system.
"93% of tokens sent to LLMs are identical repeated data. ContextLens eliminates that waste."
Real Production Measurement
Measured on ARIA — a live autonomous financial intelligence system running 1,440 decision cycles per day:
| Metric | Value |
|---|---|
| Token redundancy detected | 97.6% |
| Tokens before compression | 1,950 |
| Tokens after compression | 46 |
| Cost per cycle | £0.0057 → £0.00014 |
| Monthly saving at scale | £100-500+ |
Same decisions. 97.6% cheaper.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ctxlens-1.0.3.tar.gz.
File metadata
- Download URL: ctxlens-1.0.3.tar.gz
- Upload date:
- Size: 19.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0919f1e118fc744f20e0031be30381c807af14b15d5dac3d277af6f83176e106
|
|
| MD5 |
db9e744ba3b6370487eec23475943720
|
|
| BLAKE2b-256 |
f40b5b4a30e27409d1a593244fd051bb2c7ce67447882d0b19075035badbc8fd
|
File details
Details for the file ctxlens-1.0.3-py3-none-any.whl.
File metadata
- Download URL: ctxlens-1.0.3-py3-none-any.whl
- Upload date:
- Size: 18.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
eeabd08e86175d20d34ac79fe7bd57d57f11dc08fb8b516f4f432bbfce17b6af
|
|
| MD5 |
c0d0c1e7f900cc5a9d3af947177c6570
|
|
| BLAKE2b-256 |
68452cd8667490105f1325942450d853e19895904b06be13f20e6b98b7d09b9f
|