Skip to main content

The context compression protocol for LLM inference. Eliminate 93% token redundancy in one line.

Project description

ContextLens

The context compression protocol for LLM inference.

80,000+ words saved. 94% meaning retained. One line of code.

PyPI version Python 3.9+ License: MIT


The Problem

Every time you call Claude or GPT, you're sending the same context over and over again. Repeated messages, duplicate code blocks, redundant explanations — all costing you tokens.

A typical 20-message conversation has ~70% redundant content.

The Fix

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))

# That's it. Every API call is now automatically compressed.
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

One line. Zero changes to your existing code. Drop-in replacement.


Results

Real benchmark across 3 production datasets (coding, agent loops, research):

Method Token Reduction Fidelity Latency
No compression 0% 64.1% 0.0ms
Simple truncation 39.7% 72.4% 0.0ms
ctxlens balanced 66.8% 67.5% 0.2ms
  • 1.7x more token reduction than simple truncation
  • 83.8% fidelity on agent loops — beats truncation
  • 0.2ms latency after model warmup — negligible overhead
  • 80,500+ words saved across 395 real conversations
  • 100% fact retention on agent tasks

See full benchmark methodology


Install

pip install ctxlens

With semantic compression (recommended):

pip install ctxlens[semantic]

Usage

Anthropic

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

# See how much you saved
print(client.savings)
# {
#   'calls': 1,
#   'tokens_saved_estimate': 847,
#   'redundancy_pct': 73.2,
#   'cost_saved_gbp': 0.0025
# }

OpenAI

import openai
import ctxlens as cx

client = cx.wrap(openai.OpenAI(api_key="..."))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}]
)

print(client.savings)

Async (AsyncAnthropic / AsyncOpenAI)

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.AsyncAnthropic(api_key="..."))

async def main():
    response = await client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": "..."}]
    )
    print(client.savings)

Agent loops

import ctxlens as cx

# Wrap your agent — prevents context limit failures on long runs
agent = cx.wrap_agent(your_agent, budget="economic")
result = agent.run("your task here")

Direct compression

from ctxlens import ContextLens

engine = ContextLens(budget="balanced", show_savings=True)

messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]

result = engine.compress(messages)
print(f"Saved: {result.tokens_estimated_saved} tokens")
print(f"Fidelity: {result.fidelity_score * 100:.1f}% meaning retained")

Compression budgets

Budget Aggressiveness Best for
economic High Long agent loops, cost-sensitive apps
balanced Medium General use (default)
precise Low When accuracy is critical
client = cx.wrap(anthropic.Anthropic(), budget="economic")

How it works

ContextLens runs three compression stages:

  1. Exact deduplication — removes identical repeated messages (~0ms overhead)
  2. Semantic triage — scores every message by relevance to the current query using all-MiniLM-L6-v2 locally — zero external API calls
  3. Agent-aware compression — classifies messages by type (goal, error, tool_call, reasoning) and applies type-specific rules

The fidelity score measures how much meaning was retained after compression. A score of 0.95 means 95% of the semantic content was preserved.


Chrome Extension

ContextLens also comes as a Chrome extension that works directly in your browser on Claude, ChatGPT, Gemini, DeepSeek, and Perplexity — no API key needed.

  • Auto-compresses when context reaches 75%
  • Shows meaning retained score after every response
  • Stores memory across conversations and platforms
  • Export your memory as JSON

Roadmap

  • GitHub pre-fetch filter (reduce token usage in agentic coding)
  • Project memory injection (auto-inject context into new chats)
  • Node.js SDK
  • MCP server for Claude Code

License

MIT — free for personal and commercial use.


Author

Built by Usama Fateh Ali as part of ARIA — an autonomous financial intelligence system.

"93% of tokens sent to LLMs are identical repeated data. ContextLens eliminates that waste."

Real Production Measurement

Measured on ARIA — a live autonomous financial intelligence system running 1,440 decision cycles per day:

Metric Value
Token redundancy detected 97.6%
Tokens before compression 1,950
Tokens after compression 46
Cost per cycle £0.0057 → £0.00014
Monthly saving at scale £100-500+

Same decisions. 97.6% cheaper.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ctxlens-1.0.3.tar.gz (19.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ctxlens-1.0.3-py3-none-any.whl (18.4 kB view details)

Uploaded Python 3

File details

Details for the file ctxlens-1.0.3.tar.gz.

File metadata

  • Download URL: ctxlens-1.0.3.tar.gz
  • Upload date:
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ctxlens-1.0.3.tar.gz
Algorithm Hash digest
SHA256 0919f1e118fc744f20e0031be30381c807af14b15d5dac3d277af6f83176e106
MD5 db9e744ba3b6370487eec23475943720
BLAKE2b-256 f40b5b4a30e27409d1a593244fd051bb2c7ce67447882d0b19075035badbc8fd

See more details on using hashes here.

File details

Details for the file ctxlens-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: ctxlens-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ctxlens-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 eeabd08e86175d20d34ac79fe7bd57d57f11dc08fb8b516f4f432bbfce17b6af
MD5 c0d0c1e7f900cc5a9d3af947177c6570
BLAKE2b-256 68452cd8667490105f1325942450d853e19895904b06be13f20e6b98b7d09b9f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page