The context compression protocol for LLM inference. Eliminate 93% token redundancy in one line.

ContextLens

The context compression protocol for LLM inference.

80,000+ words saved. 94% meaning retained. One line of code.

The Problem

Every time you call Claude or GPT, you're sending the same context over and over again. Repeated messages, duplicate code blocks, redundant explanations — all costing you tokens.

A typical 20-message conversation has ~70% redundant content.

The Fix

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))

# That's it. Every API call is now automatically compressed.
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

One line. Zero changes to your existing code. Drop-in replacement.
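
A small sketch makes it easier to see how a drop-in wrapper of this kind can work. The code below is not the ctxlens implementation: `WrappedClient`, `_MessagesProxy`, `drop_repeats`, and the `Fake*` classes are hypothetical names used only for illustration, and the fake client stands in for `anthropic.Anthropic` so the example runs without an API key. It shows a proxy that rewrites `messages` before forwarding `messages.create` and passes every other attribute through unchanged.

```python
class _MessagesProxy:
    """Forwards to the real `messages` resource, compressing input first."""

    def __init__(self, inner, compress):
        self._inner = inner
        self._compress = compress

    def create(self, *, messages, **kwargs):
        # Rewrite the message list, then call the real method unchanged.
        return self._inner.create(messages=self._compress(messages), **kwargs)

    def __getattr__(self, name):
        # Anything other than create() goes straight to the real resource.
        return getattr(self._inner, name)


class WrappedClient:
    """Hypothetical drop-in proxy; not the ctxlens implementation."""

    def __init__(self, client, compress):
        self._client = client
        self.messages = _MessagesProxy(client.messages, compress)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so `messages` stays proxied.
        return getattr(self._client, name)


def drop_repeats(messages):
    """Toy compressor: keep the first copy of each identical message."""
    kept = []
    for msg in messages:
        if msg not in kept:
            kept.append(msg)
    return kept


# Stand-ins for a real SDK client so the sketch runs offline.
class FakeMessages:
    def create(self, *, model, max_tokens, messages):
        return {"model": model, "messages_received": len(messages)}


class FakeClient:
    api_version = "fake"

    def __init__(self):
        self.messages = FakeMessages()


client = WrappedClient(FakeClient(), drop_repeats)
history = [
    {"role": "user", "content": "Summarise the log."},
    {"role": "user", "content": "Summarise the log."},
]
response = client.messages.create(model="demo", max_tokens=100, messages=history)
print(response["messages_received"])  # 1
print(client.api_version)  # fake
```

Because the proxy keeps the same method names and keyword arguments as the client it wraps, calling code does not need to change.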

Results

Real benchmark across 3 production datasets (coding, agent loops, research):

Method	Token Reduction	Fidelity	Latency
No compression	0%	64.1%	0.0ms
Simple truncation	39.7%	72.4%	0.0ms
ctxlens balanced	66.8%	67.5%	0.2ms

1.7x more token reduction than simple truncation
83.8% fidelity on agent loops — beats truncation
0.2ms latency after model warmup — negligible overhead
80,500+ words saved across 395 real conversations
100% fact retention on agent tasks

→ See full benchmark methodology

Install

pip install ctxlens

With semantic compression (recommended):

pip install "ctxlens[semantic]"

Usage

Anthropic

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

# See how much you saved
print(client.savings)
# {
#   'calls': 1,
#   'tokens_saved_estimate': 847,
#   'redundancy_pct': 73.2,
#   'cost_saved_gbp': 0.0025
# }

OpenAI

import openai
import ctxlens as cx

client = cx.wrap(openai.OpenAI(api_key="..."))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}]
)

print(client.savings)

Async (AsyncAnthropic / AsyncOpenAI)

import asyncio

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.AsyncAnthropic(api_key="..."))

async def main():
    response = await client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": "..."}]
    )
    print(client.savings)

# Run the coroutine; without this call main() never executes
asyncio.run(main())

Agent loops

import ctxlens as cx

# Wrap your agent — prevents context limit failures on long runs
agent = cx.wrap_agent(your_agent, budget="economic")
result = agent.run("your task here")

Direct compression

from ctxlens import ContextLens

engine = ContextLens(budget="balanced", show_savings=True)

messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]

result = engine.compress(messages)
print(f"Saved: {result.tokens_estimated_saved} tokens")
print(f"Fidelity: {result.fidelity_score * 100:.1f}% meaning retained")

Compression budgets

Budget	Aggressiveness	Best for
`economic`	High	Long agent loops, cost-sensitive apps
`balanced`	Medium	General use (default)
`precise`	Low	When accuracy is critical

client = cx.wrap(anthropic.Anthropic(), budget="economic")

How it works

ContextLens runs three compression stages:

1. Exact deduplication: removes identical repeated messages (~0ms overhead)
2. Semantic triage: scores every message by relevance to the current query using all-MiniLM-L6-v2, run locally with zero external API calls
3. Agent-aware compression: classifies messages by type (goal, error, tool_call, reasoning) and applies type-specific rules
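
The first two stages can be sketched in a few lines. This is an illustration under stated assumptions, not the ContextLens code: `dedupe_exact`, `triage`, `toy_embed`, and the `keep` parameter are hypothetical, and `toy_embed` is a word-count stand-in for the all-MiniLM-L6-v2 embeddings so the example runs without downloading a model. Stage three is left out because the README does not give its type-specific rules.

```python
import math
from collections import Counter


def dedupe_exact(messages):
    """Stage 1 sketch: drop messages identical to an earlier one, keeping order."""
    seen = set()
    kept = []
    for msg in messages:
        key = (msg["role"], msg["content"])
        if key not in seen:
            seen.add(key)
            kept.append(msg)
    return kept


def toy_embed(text):
    """Stand-in embedding: bag-of-words counts (NOT all-MiniLM-L6-v2)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triage(messages, query, keep, embed=toy_embed):
    """Stage 2 sketch: keep the `keep` messages most similar to the query."""
    q = embed(query)
    scored = [(cosine(embed(m["content"]), q), i) for i, m in enumerate(messages)]
    # Highest score first; ties go to the earlier message.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:keep]
    # Restore conversation order for the survivors.
    return [messages[i] for i in sorted(i for _, i in top)]


history = [
    {"role": "user", "content": "fix the login timeout bug"},
    {"role": "assistant", "content": "what is the weather like"},
    {"role": "user", "content": "fix the login timeout bug"},
    {"role": "assistant", "content": "the login timeout is set in config"},
]
unique = dedupe_exact(history)
relevant = triage(unique, query="login timeout", keep=2)
print(len(unique))  # 3
print([m["content"] for m in relevant])
```

A real embedding model would replace both `toy_embed` and `cosine` with dense-vector equivalents; the ranking and order-restoring steps would stay the same.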

The fidelity score measures how much meaning was retained after compression. A score of 0.95 means 95% of the semantic content was preserved.
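
One way to act on that score is to fall back to the uncompressed messages when fidelity drops below a floor. The helper below is a hypothetical sketch: `choose_messages` and `min_fidelity` are not part of ctxlens, and the score is passed in as a plain float (in practice it might come from `result.fidelity_score`, as in the Direct compression example).

```python
def choose_messages(original, compressed, fidelity_score, min_fidelity=0.95):
    """Return the compressed messages only if enough meaning was retained.

    `fidelity_score` is a fraction between 0 and 1, e.g. 0.95 for 95%.
    """
    if not 0.0 <= fidelity_score <= 1.0:
        raise ValueError("fidelity_score must be between 0 and 1")
    return compressed if fidelity_score >= min_fidelity else original


original = [
    {"role": "user", "content": "long question with background"},
    {"role": "assistant", "content": "long answer"},
]
compressed = [{"role": "user", "content": "question"}]

print(len(choose_messages(original, compressed, fidelity_score=0.97)))  # 1
print(len(choose_messages(original, compressed, fidelity_score=0.80)))  # 2
```

The 0.95 default is only an example threshold; a suitable floor depends on how much loss the task can tolerate.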

Chrome Extension

ContextLens also comes as a Chrome extension that works directly in your browser on Claude, ChatGPT, Gemini, DeepSeek, and Perplexity — no API key needed.

Auto-compresses when context reaches 75%
Shows meaning retained score after every response
Stores memory across conversations and platforms
Export your memory as JSON

Roadmap

GitHub pre-fetch filter (reduce token usage in agentic coding)
Project memory injection (auto-inject context into new chats)
Node.js SDK
MCP server for Claude Code

License

MIT — free for personal and commercial use.

Author

Built by Usama Fateh Ali as part of ARIA — an autonomous financial intelligence system.

"93% of tokens sent to LLMs are identical repeated data. ContextLens eliminates that waste."

Real Production Measurement

Measured on ARIA — a live autonomous financial intelligence system running 1,440 decision cycles per day:

Metric	Value
Token redundancy detected	97.6%
Tokens before compression	1,950
Tokens after compression	46
Cost per cycle	£0.0057 → £0.00014
Monthly saving at scale	£100-500+

Same decisions. 97.6% cheaper.
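
The figures in the table can be checked with a few lines of arithmetic. The token counts, per-cycle costs, and cycle rate are taken from this section; the 30-day month is an assumption made here for the monthly estimate.

```python
tokens_before = 1950
tokens_after = 46
cost_before_gbp = 0.0057
cost_after_gbp = 0.00014
cycles_per_day = 1440
days_per_month = 30  # assumption: the README does not define "monthly"

token_reduction_pct = 100 * (tokens_before - tokens_after) / tokens_before
cost_reduction_pct = 100 * (cost_before_gbp - cost_after_gbp) / cost_before_gbp
monthly_saving_gbp = (cost_before_gbp - cost_after_gbp) * cycles_per_day * days_per_month

print(f"Token reduction: {token_reduction_pct:.1f}%")  # 97.6%
print(f"Cost reduction: {cost_reduction_pct:.1f}%")    # 97.5%
print(f"Monthly saving: £{monthly_saving_gbp:.2f}")    # £240.19
```

Under these assumptions a single deployment at ARIA's cycle rate falls inside the quoted £100-500+ range.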

Release history

1.0.3 (this version): May 23, 2026
1.0.2: May 21, 2026
1.0.1: May 21, 2026
1.0.0: May 21, 2026

Download files

Source Distribution

ctxlens-1.0.3.tar.gz (19.2 kB)

Uploaded May 23, 2026 Source

Built Distribution

ctxlens-1.0.3-py3-none-any.whl (18.4 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file ctxlens-1.0.3.tar.gz.

File metadata

Download URL: ctxlens-1.0.3.tar.gz
Upload date: May 23, 2026
Size: 19.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ctxlens-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`0919f1e118fc744f20e0031be30381c807af14b15d5dac3d277af6f83176e106`
MD5	`db9e744ba3b6370487eec23475943720`
BLAKE2b-256	`f40b5b4a30e27409d1a593244fd051bb2c7ce67447882d0b19075035badbc8fd`

File details

Details for the file ctxlens-1.0.3-py3-none-any.whl.

File metadata

Download URL: ctxlens-1.0.3-py3-none-any.whl
Upload date: May 23, 2026
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ctxlens-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eeabd08e86175d20d34ac79fe7bd57d57f11dc08fb8b516f4f432bbfce17b6af`
MD5	`c0d0c1e7f900cc5a9d3af947177c6570`
BLAKE2b-256	`68452cd8667490105f1325942450d853e19895904b06be13f20e6b98b7d09b9f`
