
The context compression protocol for LLM inference. Eliminate 93% token redundancy in one line.


ContextLens

In the production system we measured, 93% of the characters sent to the LLM were identical repeated data. ContextLens is built to eliminate that waste.

License: MIT | PyPI version | Protocol: CXP v0.1


The Problem

LLM applications waste tokens. In the system we measured, it was not 10%. Not 20%. 93%.

We measured a real production AI system (a live financial intelligence platform making thousands of decisions per day) and found:

Metric                   Value
Total messages sent      8,584
Unique messages          599
Total characters sent    282,981
Wasted characters        263,234
Redundancy               93.0%

The same reasoning, same context, same instructions — sent hundreds of times. Your LLM reads it fresh every single time. You pay for every single token.
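
A figure like the one in the table can be checked on your own logs. The sketch below is not the benchmark script itself; it assumes that "wasted" means every character of a message that is resent after its first occurrence, and that size is counted in characters. The function name `redundancy_report` and the toy log are illustrative.

```python
from collections import Counter


def redundancy_report(messages):
    """Summarise how much of a message log is repeated verbatim.

    Assumption: a message is "wasted" each time it is sent again after its
    first occurrence, and size is measured in characters.
    """
    counts = Counter(messages)
    total_chars = sum(len(m) for m in messages)
    # Characters that had to be sent at least once (first occurrences only).
    unique_chars = sum(len(m) for m in counts)
    wasted_chars = total_chars - unique_chars
    return {
        "total_messages": len(messages),
        "unique_messages": len(counts),
        "total_chars": total_chars,
        "wasted_chars": wasted_chars,
        "redundancy": wasted_chars / total_chars if total_chars else 0.0,
    }


# Toy log: one status line sent nine times plus one distinct message.
log = ["Regime:CRISIS Score:-45.4 Top:GLD"] * 9 + ["Regime:CALM Score:12.0 Top:SPY"]
report = redundancy_report(log)
print(f"{report['redundancy']:.1%} redundant")

# The table's own numbers are consistent with this definition.
print(f"{263_234 / 282_981:.1%}")
```

Only exact repeats are counted here, so this gives a lower bound on what near-duplicate detection would find.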

This gets worse with agents. A 50-step agent loop can consume 800,000 tokens to complete a task that needs 50,000 tokens of actual information. That is 94% waste.
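
The reason agent loops are so expensive is that each step resends the history of all earlier steps, so the total grows roughly with the square of the step count. The sketch below uses a hypothetical loop that adds 1,000 new tokens per step; it illustrates the growth pattern and does not reproduce the 800,000 figure exactly.

```python
def tokens_sent(steps, tokens_per_step):
    """Total tokens sent when every step resends the full history so far."""
    # Step k sends the k chunks accumulated up to and including itself.
    return sum(k * tokens_per_step for k in range(1, steps + 1))


steps, per_step = 50, 1_000            # hypothetical loop
sent = tokens_sent(steps, per_step)    # grows roughly with steps squared
needed = steps * per_step              # each new token sent exactly once
print(sent, needed, f"{1 - needed / sent:.0%} resent")

# The figures quoted in the text work out the same way.
quoted_waste = 1 - 50_000 / 800_000
print(f"{quoted_waste:.2%}")
```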


The Solution

ContextLens is a context compression protocol that sits between your code and any LLM API. It intercepts every request, eliminates redundancy, and forwards only what the model actually needs.

One line. Zero configuration. Works with your existing code.

# Before
import anthropic
client = anthropic.Anthropic(api_key="...")

# After — one line change
import contextlens as cx
client = cx.wrap(anthropic.Anthropic(api_key="..."))

# Everything else stays identical
# Your costs drop immediately
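
For readers curious how a one-line wrapper can work, here is a minimal sketch of the interception pattern. It is not the ContextLens implementation: `WrappedClient`, `drop_exact_repeats` and the echo client are hypothetical names, the compressor is a toy that assumes string message content, and a real one would also have to respect the provider's rules on message ordering.

```python
class _CompressingMessages:
    """Stands in front of client.messages and rewrites the request first."""

    def __init__(self, inner, compress):
        self._inner = inner
        self._compress = compress

    def create(self, **kwargs):
        # Rewrite only the message list; every other argument passes through.
        kwargs["messages"] = self._compress(kwargs.get("messages", []))
        return self._inner.create(**kwargs)


class WrappedClient:
    """Hypothetical wrapper exposing the same surface as the wrapped client."""

    def __init__(self, client, compress):
        self._client = client
        self.messages = _CompressingMessages(client.messages, compress)

    def __getattr__(self, name):
        # Anything not overridden is delegated to the real client.
        return getattr(self._client, name)


def drop_exact_repeats(messages):
    """Toy compressor: keep the first copy of each (role, content) pair."""
    seen, kept = set(), []
    for m in messages:
        key = (m["role"], m["content"])
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


# Stand-in client that echoes the request, so the sketch runs offline.
class _EchoMessages:
    def create(self, **kwargs):
        return kwargs


class _EchoClient:
    def __init__(self):
        self.messages = _EchoMessages()


client = WrappedClient(_EchoClient(), drop_exact_repeats)
request = client.messages.create(
    model="example-model",
    max_tokens=16,
    messages=[
        {"role": "user", "content": "status?"},
        {"role": "user", "content": "status?"},
        {"role": "user", "content": "and now?"},
    ],
)
print(len(request["messages"]))
```

Because the wrapper delegates everything except `messages.create`, calling code does not need to change.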

What It Does

Layer 1 — Semantic Triage

Scores every message in your conversation history for relevance to the current prompt. Irrelevant history is compressed or archived. The model only sees what matters right now.

Score > 0.8    →  Sent to model (Hot)
Score 0.3-0.8  →  Compressed to summary (Warm)
Score < 0.3    →  Archived locally (Cold)
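
The thresholds above can be sketched as a small classifier. The scoring function here is a stand-in: it uses plain word overlap (Jaccard similarity), whereas the roadmap mentions MiniLM embeddings for the real triage layer. The names `relevance` and `tier` are illustrative, and treating a score of exactly 0.8 or 0.3 as Warm is an assumption.

```python
def relevance(message, prompt):
    """Stand-in score in [0, 1]: word overlap (Jaccard) with the prompt."""
    a, b = set(message.lower().split()), set(prompt.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def tier(score):
    """Map a relevance score to the Hot / Warm / Cold tiers."""
    if score > 0.8:
        return "hot"    # sent to the model unchanged
    if score >= 0.3:
        return "warm"   # compressed to a summary
    return "cold"       # archived locally


prompt = "why did the portfolio score drop today"
history = [
    "why did the portfolio score drop today",
    "the portfolio score drop came from regime",
    "remember to water the plants",
]
for message in history:
    score = relevance(message, prompt)
    print(f"{tier(score):4}  {score:.2f}  {message}")
```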

Layer 2 — Deduplication Engine

Identical or near-identical content is stored once and referenced. If your agent re-reads the same system context 200 times, it is sent once and cached.

"Regime:CRISIS Score:-45.4 Top:GLD" × 202 times
→ "Regime:CRISIS [stable, 202 cycles, Score:-45.4]" × 1 time
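
A minimal sketch of the exact-match half of this idea: hash each message, store it once, and keep a count of repeats. Near-identical matching is not covered, and the `render` format is a simplification of the example above rather than the actual output format. `dedupe` and `render` are hypothetical names.

```python
import hashlib
from collections import OrderedDict


def dedupe(messages):
    """Collapse exact repeats into (content, count) pairs, in first-seen order."""
    store = OrderedDict()  # digest -> [content, count]
    for text in messages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in store:
            store[digest][1] += 1
        else:
            store[digest] = [text, 1]
    return [(text, count) for text, count in store.values()]


def render(text, count):
    """Format one entry; repeated entries carry their cycle count."""
    return text if count == 1 else f"{text} [stable, {count} cycles]"


log = ["Regime:CRISIS Score:-45.4 Top:GLD"] * 202 + ["Regime:CALM Score:12.0 Top:SPY"]
compact = dedupe(log)
for text, count in compact:
    print(render(text, count))
```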

Layer 3 — Agent State Machine

Agent loops are the worst offenders. ContextLens understands agent-specific message types and applies intelligent rules automatically.

GOAL        →  Always kept (never removed)
TOOL_RESULT →  Kept if referenced in last 3 steps, else summarised
TOOL_CALL   →  Only most recent per tool type kept
REASONING   →  Last 5 steps kept, rest archived
ERROR       →  Count + last error only ("Failed 3x: last=X")
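
The five rules above can be sketched as a single pruning pass over the agent history. This is an illustration under stated assumptions, not the ContextLens state machine: the `Msg` type, the `last_ref` field used to decide whether a tool result was "referenced", and the truncating `summarise` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    kind: str                        # GOAL, TOOL_RESULT, TOOL_CALL, REASONING or ERROR
    step: int                        # agent step that produced the message
    text: str
    tool: Optional[str] = None       # tool name for TOOL_CALL / TOOL_RESULT
    last_ref: Optional[int] = None   # hypothetical: latest step that referenced a TOOL_RESULT


def summarise(text, limit=40):
    """Truncation stands in for a real summariser."""
    return text if len(text) <= limit else text[:limit] + "..."


def prune(history, current_step):
    """Apply the five rules; returns (kept, archived)."""
    latest_call = {}
    for m in history:
        if m.kind == "TOOL_CALL":
            latest_call[m.tool] = max(latest_call.get(m.tool, m.step), m.step)
    errors = [m for m in history if m.kind == "ERROR"]

    kept, archived = [], []
    for m in history:
        if m.kind == "TOOL_RESULT":
            recent = m.last_ref is not None and current_step - m.last_ref < 3
            kept.append(m if recent else Msg(m.kind, m.step, summarise(m.text), m.tool, m.last_ref))
        elif m.kind == "TOOL_CALL":
            (kept if m.step == latest_call[m.tool] else archived).append(m)
        elif m.kind == "REASONING":
            (kept if current_step - m.step < 5 else archived).append(m)
        elif m.kind == "ERROR":
            if m is errors[-1]:
                kept.append(Msg("ERROR", m.step, f"Failed {len(errors)}x: last={m.text}"))
            else:
                archived.append(m)
        else:
            kept.append(m)  # GOAL (and anything unrecognised) is never removed
    return kept, archived


history = [
    Msg("GOAL", 0, "Fix failing tests"),
    Msg("TOOL_CALL", 1, "read_file(a.py)", tool="read_file"),
    Msg("TOOL_RESULT", 1, "x" * 100, tool="read_file", last_ref=2),
    Msg("REASONING", 2, "a.py looks fine"),
    Msg("ERROR", 3, "timeout"),
    Msg("TOOL_CALL", 8, "read_file(b.py)", tool="read_file"),
    Msg("TOOL_RESULT", 8, "def f(): ...", tool="read_file", last_ref=9),
    Msg("ERROR", 9, "syntax error"),
    Msg("REASONING", 9, "b.py has the bug"),
]
kept, archived = prune(history, current_step=10)
for m in kept:
    print(m.kind, m.step, m.text)
```

Archived messages are returned rather than discarded, so a caller could restore them if a later step needs them.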

Layer 4 — Prompt Cache Integration

Stable context blocks are automatically flagged for provider-side caching. You pay full price once. Every repeat is up to 90% cheaper, depending on the provider's cache pricing.
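
As an illustration of what "flagging" can look like, the sketch below builds a `system` block list in the shape used by Anthropic's Messages API, where a `cache_control` entry of type `ephemeral` marks the end of a cacheable prefix. The helper name `build_system_blocks` is hypothetical, and how ContextLens decides which blocks are stable is not shown. Discounts, write surcharges and minimum cacheable lengths are set by each provider, so check its documentation.

```python
def build_system_blocks(stable_parts, volatile_parts):
    """Build a `system` list with a cache breakpoint after the stable prefix."""
    blocks = [{"type": "text", "text": t} for t in stable_parts]
    if blocks:
        # Everything up to and including this block forms the cached prefix.
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    # Volatile content goes after the breakpoint so it never invalidates the cache.
    blocks += [{"type": "text", "text": t} for t in volatile_parts]
    return blocks


blocks = build_system_blocks(
    stable_parts=["You are a trading assistant.", "Full instrument glossary ..."],
    volatile_parts=["Current regime: CRISIS"],
)
for block in blocks:
    print("cached" if "cache_control" in block else "plain ", block["text"])
```

The resulting list would be passed as the `system` argument of `client.messages.create(...)`.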


Results

Use Case                 Tokens Before    Tokens After    Reduction
Long conversation        40,000           8,000           80%
50-step agent loop       800,000          120,000         85%
Code assistant session   60,000           9,000           85%
Production AI system*    282,981 chars    19,747 chars    93%

*Measured on real production data


Installation

pip install ctxlens

The distribution is published on PyPI as ctxlens; the import name used throughout is contextlens.

Usage

Basic — Anthropic

import anthropic
import contextlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))

# Identical to normal usage — compression is automatic
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Basic — OpenAI

import openai
import contextlens as cx

client = cx.wrap(openai.OpenAI(api_key="..."))
# Works identically

Agent Loops

import contextlens as cx

# Wrap your existing agent — zero other changes
agent = cx.wrap_agent(your_langchain_agent)

# History is pruned at every step, so the agent is far less likely to hit context limits
# Cost grows much more slowly with task length
result = agent.run("Analyse this 10,000 line codebase and fix all bugs")

Token Budget Dial

# Economic — aggressive compression, maximum savings
client = cx.wrap(client, budget="economic")

# Balanced — smart compression, preserves nuance  
client = cx.wrap(client, budget="balanced")

# Precise — minimal compression, maximum accuracy
client = cx.wrap(client, budget="precise")

See Your Savings

import contextlens as cx

client = cx.wrap(client, show_savings=True)

# After each call:
# ─────────────────────────────────────
# ContextLens | This request
#   Sent:     4,847 tokens (↓ from 12,203)
#   Saved:    7,356 tokens  (~£0.018)
#   Session:  £1.24 saved | 147g CO₂ avoided
# ─────────────────────────────────────
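
The per-request line in that report is simple arithmetic. A minimal sketch, assuming a flat input price: the £2.45 per million tokens used here is a hypothetical value chosen because it reproduces the ~£0.018 shown above, not a published rate. The CO₂ figure is left out because no conversion factor is given.

```python
def savings(tokens_before, tokens_after, price_per_million):
    """Return (tokens saved, money saved) for one request."""
    saved = tokens_before - tokens_after
    return saved, saved * price_per_million / 1_000_000


# price_per_million is an assumed flat input price in GBP.
saved, cost = savings(12_203, 4_847, price_per_million=2.45)
print(f"Saved: {saved:,} tokens (~£{cost:.3f})")
```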

For Infrastructure / Data Centers

ContextLens runs as a drop-in proxy in front of any inference server.

docker run -p 8080:8080 \
  -e UPSTREAM=http://your-vllm-instance:8000 \
  contextlens/proxy:latest

Change one line in your application:

ANTHROPIC_BASE_URL=https://your-proxy:8080

Result: Every GPU on your cluster handles more concurrent users. Same hardware. Less energy. More revenue per chip.

vLLM Plugin (planned for v2.0, see Roadmap)

vllm serve meta-llama/Llama-3-70b --contextlens-plugin enabled

Context is compressed before the KV cache is allocated. The GPU never processes redundant tokens.


The Carbon Impact

Every redundant token burns real energy. At scale:

  • 10 million API calls/day × 60% average compression = 6 million fewer GPU-seconds per day, assuming about one GPU-second per call
  • Equivalent to powering a city block for a year — eliminated entirely
  • ContextLens tracks your CO₂ avoided in real time
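
The first bullet can be worked through explicitly. The sketch below assumes one GPU-second per call and that GPU time scales linearly with the tokens removed; both are simplifications. The per-GPU power draw is a parameter because it varies by hardware, and the 700 W used in the example is a hypothetical value.

```python
calls_per_day = 10_000_000
compression = 0.60
gpu_seconds_per_call = 1.0   # assumption implied by the bullet above

saved_gpu_seconds = calls_per_day * compression * gpu_seconds_per_call
saved_gpu_hours = saved_gpu_seconds / 3600


def energy_kwh(gpu_hours, watts_per_gpu):
    """Convert GPU-hours to kWh for a given average power draw per GPU."""
    return gpu_hours * watts_per_gpu / 1000


print(f"{saved_gpu_seconds:,.0f} GPU-seconds/day")
print(f"{saved_gpu_hours:,.0f} GPU-hours/day")
print(f"{energy_kwh(saved_gpu_hours, watts_per_gpu=700):,.0f} kWh/day at a hypothetical 700 W")
```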

AI inference is one of the fastest-growing slices of global electricity consumption. Context waste is the fastest fix.


The Open Protocol: CXP

ContextLens is built on the Context Exchange Protocol (CXP) — an open specification for how context moves between applications and language models.

The spec is free. Anyone can implement it. Any model provider can support it.

Read the CXP v0.1 Specification


Benchmarks

All benchmarks are reproducible. The methodology is open source.

See benchmark methodology
Run benchmarks on your own data


Roadmap

Version   Feature                                    Status
v0.1      Deduplication engine + basic proxy         🔨 Building
v0.2      Semantic triage (MiniLM embeddings)        Planned
v0.3      Agent state machine                        Planned
v0.4      Prompt cache integration                   Planned
v1.0      Accuracy guarantee + CXP spec final        Planned
v2.0      vLLM hardware plugin                       Planned

Why Free

Context waste is an infrastructure problem that affects every developer, every company, and the planet. A solution locked behind a paywall does not fix the infrastructure.

ContextLens is free for developers. Forever.

Infrastructure deployments (data centers, GPU clouds, enterprise on-premise) are the paid tier. They save millions. They can pay.


Contributing

The protocol is open. Implementations are welcome.

git clone https://github.com/Usama1909/contextlens
cd contextlens
pip install -e ".[dev]"
pytest tests/

Contribution Guide


License

MIT — use it, fork it, build on it.


Built by @Usama1909
Founding benchmark measured on ARIA — a live autonomous financial intelligence system
