The context compression protocol for LLM inference. Eliminate token redundancy (93% in our production benchmark) with a one-line change.

ContextLens

In the production system we measured, 93% of the data sent to the LLM was identical repeated content. ContextLens eliminates that waste.

The Problem

Every LLM application wastes tokens. Not 10%. Not 20%. In the system we measured, 93%.

We measured a real production AI system (a live financial intelligence platform making thousands of decisions per day) and found:

Metric	Value
Total messages sent	8,584
Unique messages	599
Total characters sent	282,981
Wasted characters	263,234
Redundancy	93.0%
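The benchmark methodology is not spelled out here, so the following is only one plausible way to compute these metrics, assuming a message counts as wasted when it is an exact repeat of an earlier one. Under that definition the table implies 282,981 − 263,234 = 19,747 characters of unique content, the same figure reported in the Results table below. `redundancy_report` is a hypothetical helper, not part of ContextLens.

```python
def redundancy_report(messages: list[str]) -> dict[str, float]:
    """Hypothetical helper: redundancy metrics over a message log, exact duplicates only."""
    seen: set[str] = set()
    unique_chars = 0
    for msg in messages:
        if msg not in seen:              # first occurrence: its characters count once
            seen.add(msg)
            unique_chars += len(msg)
    total_chars = sum(len(m) for m in messages)
    wasted = total_chars - unique_chars  # characters spent re-sending copies
    return {
        "total_messages": len(messages),
        "unique_messages": len(seen),
        "total_chars": total_chars,
        "wasted_chars": wasted,
        "redundancy_pct": 100 * wasted / total_chars if total_chars else 0.0,
    }


log = ["Regime:CRISIS Score:-45.4 Top:GLD"] * 9 + ["Regime:NORMAL Score:12.0 Top:SPY"]
print(redundancy_report(log))
```

Near-identical messages (for example, the same text with a changed timestamp) would count as unique here, so this definition gives a lower bound on redundancy.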

The same reasoning, same context, same instructions — sent hundreds of times. Your LLM reads it fresh every single time. You pay for every single token.

This gets worse with agents. A 50-step agent loop can consume 800,000 tokens to complete a task that needs 50,000 tokens of actual information. That is roughly 94% waste on a single complex task.
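Why agent loops balloon can be sketched with a simple model, assuming each step appends a fixed number of new tokens and re-sends the whole history accumulated so far. This is an illustrative assumption, not how the 800,000 figure above was measured; real loops add system prompts and tool outputs of varying size. `agent_loop_tokens` is a hypothetical helper.

```python
def agent_loop_tokens(steps: int, new_tokens_per_step: int) -> tuple[int, int]:
    """Return (unique tokens, tokens actually sent) when every step re-sends all history."""
    unique = steps * new_tokens_per_step
    # Step i sends i * new_tokens_per_step tokens; summing 1..steps gives an arithmetic series.
    sent = new_tokens_per_step * steps * (steps + 1) // 2
    return unique, sent


unique, sent = agent_loop_tokens(steps=50, new_tokens_per_step=1_000)
print(f"unique={unique:,} sent={sent:,} waste={1 - unique / sent:.0%}")
# unique=50,000 sent=1,275,000 waste=96%
```

Under this model the tokens sent grow quadratically with the number of steps while the useful information grows only linearly, which is why long loops are dominated by repeated history.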

The Solution

ContextLens is a context compression protocol that sits between your code and any LLM API. It intercepts every request, eliminates redundancy, and forwards only what the model actually needs.

One line. Zero configuration. Works with your existing code.

# Before
import anthropic
client = anthropic.Anthropic(api_key="...")

# After — one line change
import contextlens as cx
client = cx.wrap(anthropic.Anthropic(api_key="..."))

# Everything else stays identical
# Your costs drop immediately
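ContextLens's actual wrapper is not shown in this README. As a rough illustration of how a one-line `wrap()` can sit between your code and an SDK, here is a minimal delegation sketch: only `messages.create` is intercepted, its `messages=` argument is rewritten, and every other attribute passes through. `MessagesProxy`, `WrappedClient` and `drop_adjacent_repeats` are hypothetical names, and the compressor is a deliberately trivial stand-in.

```python
from types import SimpleNamespace
from typing import Any, Callable

Compressor = Callable[[list[dict]], list[dict]]


class MessagesProxy:
    """Hypothetical: wraps a client's `messages` resource and rewrites `messages=`."""

    def __init__(self, inner: Any, compress: Compressor) -> None:
        self._inner = inner
        self._compress = compress

    def create(self, **kwargs: Any) -> Any:
        if "messages" in kwargs:
            kwargs["messages"] = self._compress(kwargs["messages"])
        return self._inner.create(**kwargs)  # forward to the real SDK method

    def __getattr__(self, name: str) -> Any:
        return getattr(self._inner, name)  # any other method passes through untouched


class WrappedClient:
    """Hypothetical stand-in for a wrap() function: only `messages.create` is intercepted."""

    def __init__(self, client: Any, compress: Compressor) -> None:
        self._client = client
        self.messages = MessagesProxy(client.messages, compress)

    def __getattr__(self, name: str) -> Any:
        return getattr(self._client, name)


def drop_adjacent_repeats(messages: list[dict]) -> list[dict]:
    """Toy compressor: drop a message identical to the one just before it."""
    out: list[dict] = []
    for m in messages:
        if not out or out[-1] != m:
            out.append(m)
    return out


# A fake SDK object whose create() echoes back the messages it received.
fake_sdk = SimpleNamespace(messages=SimpleNamespace(create=lambda **kw: kw["messages"]))
client = WrappedClient(fake_sdk, drop_adjacent_repeats)
print(client.messages.create(model="any", messages=[{"role": "user", "content": "status?"}] * 3))
# [{'role': 'user', 'content': 'status?'}]
```

Delegating through `__getattr__` is what keeps the rest of the calling code unchanged: anything the wrapper does not override behaves exactly like the original client.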

What It Does

Layer 1 — Semantic Triage

Scores every message in your conversation history for relevance to the current prompt. Irrelevant history is compressed or archived. The model only sees what matters right now.

Score > 0.8  →  Sent to model (Hot)
Score 0.3-0.8 →  Compressed to summary (Warm)  
Score < 0.3  →  Archived locally (Cold)
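The tiering rule above can be sketched as follows. The roadmap mentions MiniLM embeddings for relevance scoring; to stay dependency-free, this sketch swaps in plain bag-of-words cosine similarity, so the scores are only a rough stand-in. `cosine` and `triage` are hypothetical names, and the boundaries follow the table (exactly 0.8 and exactly 0.3 both land in Warm).

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for real sentence embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def triage(history: list[str], prompt: str) -> dict[str, list[str]]:
    """Bucket each past message by its relevance to the current prompt."""
    tiers: dict[str, list[str]] = {"hot": [], "warm": [], "cold": []}
    for msg in history:
        score = cosine(msg, prompt)
        if score > 0.8:
            tiers["hot"].append(msg)    # sent to the model as-is
        elif score >= 0.3:
            tiers["warm"].append(msg)   # candidate for summarisation
        else:
            tiers["cold"].append(msg)   # archived locally
    return tiers


tiers = triage(["gold price forecast for today", "gold price trend", "lunch menu"],
               prompt="gold price forecast for today")
print(tiers)
```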

Layer 2 — Deduplication Engine

Identical or near-identical content is stored once and referenced. If your agent re-reads the same system context 200 times, it is sent once and cached.

"Regime:CRISIS Score:-45.4 Top:GLD" × 202 times
→ "Regime:CRISIS [stable, 202 cycles, Score:-45.4]" × 1 time
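A minimal sketch of exact-match deduplication, assuming content is keyed by a hash: the first copy is sent in full with a short tag, a run of consecutive copies collapses into one line with a count, and a later repeat becomes a reference to the tag. The output format only imitates the example above and is hypothetical, as is `dedupe`. Catching near-identical content would need fuzzy matching, which this sketch omits, and an 8-character hash prefix is for readability only; it can collide at scale.

```python
import hashlib
from itertools import groupby


def dedupe(messages: list[str]) -> list[str]:
    """Exact-match deduplication: full text is sent once, repeats become references."""
    sent: set[str] = set()  # short content hashes already sent in full
    out: list[str] = []
    for text, run in groupby(messages):             # groupby collapses consecutive copies
        count = sum(1 for _ in run)
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
        if key in sent:
            body = f"[ref:{key}]"                    # already in context: point back to it
        else:
            sent.add(key)
            body = f"[{key}] {text}"                 # first sighting: send in full, tagged
        out.append(body if count == 1 else f"{body} [stable, {count} cycles]")
    return out


history = ["Regime:CRISIS Score:-45.4 Top:GLD"] * 202 + ["Regime:NORMAL", "Regime:CRISIS Score:-45.4 Top:GLD"]
for line in dedupe(history):
    print(line)
```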

Layer 3 — Agent State Machine

Agent loops are the worst offenders. ContextLens understands agent-specific message types and applies intelligent rules automatically.

GOAL        →  Always kept (never removed)
TOOL_RESULT →  Kept if referenced in last 3 steps, else summarised
TOOL_CALL   →  Only most recent per tool type kept
REASONING   →  Last 5 steps kept, rest archived
ERROR       →  Count + last error only ("Failed 3x: last=X")
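The retention rules above can be sketched as a single pass over a typed step history. This is an illustration, not ContextLens's implementation: the `Step` record, its `refs` field (indices of earlier steps a step cites, used to decide what counts as "referenced") and the 40-character prefix standing in for a real summariser are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str                                     # GOAL, TOOL_CALL, TOOL_RESULT, REASONING, ERROR
    text: str
    tool: str = ""                                # tool name, used by TOOL_CALL steps
    refs: set[int] = field(default_factory=set)   # hypothetical: indices of steps this one cites


def compact(steps: list[Step]) -> list[str]:
    """Apply per-type retention rules to an agent history."""
    cited = set().union(*(s.refs for s in steps[-3:]))          # cited by the last 3 steps
    newest_call = {s.tool: i for i, s in enumerate(steps) if s.kind == "TOOL_CALL"}
    keep_reasoning = [i for i, s in enumerate(steps) if s.kind == "REASONING"][-5:]
    errors = [s.text for s in steps if s.kind == "ERROR"]
    out: list[str] = []
    for i, s in enumerate(steps):
        if s.kind == "GOAL":
            out.append(s.text)                                  # never removed
        elif s.kind == "TOOL_RESULT":
            # Placeholder "summary": a 40-character prefix stands in for a real summariser.
            out.append(s.text if i in cited else f"[summary] {s.text[:40]}")
        elif s.kind == "TOOL_CALL" and newest_call[s.tool] == i:
            out.append(s.text)                                  # most recent call per tool
        elif s.kind == "REASONING" and i in keep_reasoning:
            out.append(s.text)                                  # older reasoning is archived
    if errors:
        out.append(f"Failed {len(errors)}x: last={errors[-1]}")  # errors collapse to one line
    return out


steps = [
    Step("GOAL", "Fix the failing test"),
    Step("TOOL_CALL", "run pytest", tool="shell"),
    Step("TOOL_RESULT", "3 failed, 10 passed ..."),
    Step("ERROR", "timeout"),
    Step("TOOL_CALL", "run pytest -x", tool="shell"),
    Step("TOOL_RESULT", "1 failed"),
    Step("ERROR", "import error"),
    Step("REASONING", "The import path is wrong", refs={5}),
]
print(compact(steps))
```

In this sketch the collapsed error line is appended at the end rather than kept in its original position, which is one of several reasonable ordering choices.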

Layer 4 — Prompt Cache Integration

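Stable context blocks are automatically flagged for provider-side caching. You pay a cache-write price once; every later cache hit is billed at a fraction of the normal input price (on Anthropic's API, 10% of it, i.e. 90% cheaper).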
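On the Anthropic Messages API, a content block can carry `cache_control: {"type": "ephemeral"}`, which marks the end of a cacheable prompt prefix. A minimal sketch of automatic flagging, assuming "stable" simply means "identical to the same position in the previous request"; this detection rule and the `build_system` helper are hypothetical, not ContextLens's logic. Anthropic also enforces a minimum prefix length before anything is cached, which this sketch ignores.

```python
def build_system(current: list[str], previous: list[str]) -> list[dict]:
    """Build Anthropic-style system blocks, flagging the unchanged prefix for caching."""
    blocks: list[dict] = [{"type": "text", "text": t} for t in current]
    stable = 0
    for old, new in zip(previous, current):      # length of the run unchanged since last call
        if old != new:
            break
        stable += 1
    if stable:
        # cache_control marks the end of the cacheable prefix: everything up to and
        # including this block is cached, later (changing) blocks are not.
        blocks[stable - 1]["cache_control"] = {"type": "ephemeral"}
    return blocks


system = build_system(
    current=["You are a trading analyst.", "Tool specs: search, fetch", "Date: Tuesday"],
    previous=["You are a trading analyst.", "Tool specs: search, fetch", "Date: Monday"],
)
# system[1] now carries cache_control; pass it as
# client.messages.create(..., system=system, messages=...)
```

Putting the breakpoint after the longest unchanged prefix matters because caching is prefix-based: a changed block placed before a stable one would invalidate the cache for everything after it.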

Results

Use Case	Before	After	Reduction
Long conversation	40,000 tokens	8,000 tokens	80%
50-step agent loop	800,000 tokens	120,000 tokens	85%
Code assistant session	60,000 tokens	9,000 tokens	85%
Production AI system*	282,981 chars	19,747 chars	93%

*Measured on real production data

Installation

pip install ctxlens

Usage

Basic — Anthropic

import anthropic
import contextlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))

# Identical to normal usage — compression is automatic
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Basic — OpenAI

import openai
import contextlens as cx

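client = cx.wrap(openai.OpenAI(api_key="..."))

# Works identically: use the normal OpenAI client methods
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)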

Agent Loops

import contextlens as cx

# Wrap your existing agent; zero other changes
# (your_langchain_agent stands for your own agent object)
agent = cx.wrap_agent(your_langchain_agent)

# History is compressed before each step, so long runs stay further
# from context limits and cost closer to what short tasks cost
result = agent.run("Analyse this 10,000 line codebase and fix all bugs")

Token Budget Dial

# Economic — aggressive compression, maximum savings
client = cx.wrap(client, budget="economic")

# Balanced — smart compression, preserves nuance  
client = cx.wrap(client, budget="balanced")

# Precise — minimal compression, maximum accuracy
client = cx.wrap(client, budget="precise")

See Your Savings

import contextlens as cx

client = cx.wrap(client, show_savings=True)

# After each call:
# ─────────────────────────────────────
# ContextLens | This request
#   Sent:     4,847 tokens (↓ from 12,203)
#   Saved:    7,356 tokens  (~£0.018)
#   Session:  £1.24 saved | 147g CO₂ avoided
# ─────────────────────────────────────
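The per-request figures are simple arithmetic: tokens saved is the before count minus the after count, and the cost is that difference times the input price. The £2.45 per million input tokens below is an assumed price, chosen only because it reproduces the sample output; it is not a quoted provider rate. `request_savings` is a hypothetical helper.

```python
def request_savings(tokens_before: int, tokens_after: int, price_per_million: float) -> tuple[int, float]:
    """Tokens saved on one request and what they would have cost at the given input price."""
    saved = tokens_before - tokens_after
    return saved, saved * price_per_million / 1_000_000


saved, cost = request_savings(12_203, 4_847, price_per_million=2.45)  # £2.45/M is an assumption
print(f"Saved: {saved:,} tokens (~£{cost:.3f})")
# Saved: 7,356 tokens (~£0.018)
```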

For Infrastructure / Data Centers

ContextLens runs as a drop-in proxy in front of any inference server.

docker run -p 8080:8080 \
  -e UPSTREAM=http://your-vllm-instance:8000 \
  contextlens/proxy:latest

Change one line in your application:

ANTHROPIC_BASE_URL=http://your-proxy:8080

Result: Every GPU on your cluster handles more concurrent users. Same hardware. Less energy. More revenue per chip.

vLLM Plugin (planned)

vllm serve meta-llama/Meta-Llama-3-70B --contextlens-plugin enabled

Context is compressed before the KV cache is allocated. The GPU never processes redundant tokens.

The Carbon Impact

Every redundant token burns real energy. At scale:

10 million API calls/day × 60% average compression = 6 million fewer GPU-seconds per day (assuming about one GPU-second of compute per call)
Accumulated over a year, equivalent to powering a city block for that year, eliminated entirely
ContextLens tracks your CO₂ avoided in real time
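The GPU-seconds figure only works out if each call is assumed to cost about one GPU-second; the sketch below makes that, and the power drawn per GPU, explicit parameters so the estimate can be redone with your own numbers. The 1.0 GPU-second per call and 700 W values are assumptions, as is the premise that compute scales linearly with tokens removed (prefill attention actually grows faster than linearly with context length). `gpu_savings` is a hypothetical helper.

```python
def gpu_savings(calls_per_day: int, compression: float, gpu_seconds_per_call: float,
                gpu_watts: float) -> tuple[float, float]:
    """GPU-seconds and kWh avoided per day, assuming compute scales linearly with tokens."""
    gpu_seconds = calls_per_day * compression * gpu_seconds_per_call
    kwh = gpu_seconds * gpu_watts / 3_600_000   # watt-seconds (joules) -> kilowatt-hours
    return gpu_seconds, kwh


secs, kwh = gpu_savings(10_000_000, 0.6, gpu_seconds_per_call=1.0, gpu_watts=700)
print(f"{secs:,.0f} GPU-seconds/day, about {kwh:,.0f} kWh/day, {kwh * 365 / 1000:,.0f} MWh/year")
```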

AI inference is one of the fastest-growing slices of global electricity consumption. Context waste is the fastest fix.

The Open Protocol: CXP

ContextLens is built on the Context Exchange Protocol (CXP) — an open specification for how context moves between applications and language models.

The spec is free. Anyone can implement it. Any model provider can support it.

→ Read the CXP v0.1 Specification

Benchmarks

All benchmarks are reproducible. The methodology is open source.

→ See benchmark methodology
→ Run benchmarks on your own data

Roadmap

Version	Feature	Status
v0.1	Deduplication engine + basic proxy	🔨 Building
v0.2	Semantic triage (MiniLM embeddings)	Planned
v0.3	Agent state machine	Planned
v0.4	Prompt cache integration	Planned
v1.0	Accuracy guarantee + CXP spec final	Planned
v2.0	vLLM hardware plugin	Planned

Why Free

Context waste is an infrastructure problem that affects every developer, every company, and the planet. A solution locked behind a paywall does not fix the infrastructure.

ContextLens is free for developers. Forever.

Infrastructure deployments (data centers, GPU clouds, enterprise on-premise) are the paid tier. They save millions. They can pay.

Contributing

The protocol is open. Implementations are welcome.

git clone https://github.com/Usama1909/contextlens
cd contextlens
pip install -e ".[dev]"
pytest tests/

→ Contribution Guide

License

MIT — use it, fork it, build on it.

Built by @Usama1909
Founding benchmark measured on ARIA — a live autonomous financial intelligence system

Release history

Version	Release date
1.0.3	May 23, 2026
1.0.2	May 21, 2026
1.0.1	May 21, 2026
1.0.0 (this version)	May 21, 2026

Download files

File	Type	Size	Uploaded
ctxlens-1.0.0.tar.gz	Source distribution	20.7 kB	May 21, 2026
ctxlens-1.0.0-py3-none-any.whl	Built distribution (Python 3)	18.6 kB	May 21, 2026

Both files were uploaded via twine/6.2.0 CPython/3.12.3, without Trusted Publishing.

Hashes for ctxlens-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7b11691fddfede36111c94bc2f0e045db43dc0588b2dd0f9616a4cb189c354b6`
MD5	`32f520d890cf0dd79c598cf883317d04`
BLAKE2b-256	`edb9e7ed4dee079cb0f45b20303e9f7579ebef1c909d784cebf1a82d27066fb0`

Hashes for ctxlens-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c8387582d4f10bbd652f523d4a0fa6afb2878392df9958417d2ddf7945b43383`
MD5	`2d8a33639e067ae48df43915485b99aa`
BLAKE2b-256	`fc43c236795da6e13361d3684745b68592fd7ccbec9b41076d2e0c99d5c0e29a`
