The context compression protocol for LLM inference. Eliminate up to 93% of token redundancy in one line.

ContextLens

80,500+ words saved. 94.9% meaning retained. One line of code.

Requires Python 3.9+. License: MIT.


The Problem

Every time you call Claude or GPT, you're sending the same context over and over again. Repeated messages, duplicate code blocks, redundant explanations — all costing you tokens.

A typical 20-message conversation has ~70% redundant content.

The Fix

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))

# That's it. Every API call is now automatically compressed.
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

One added line wraps the client. The rest of your code stays the same: the wrapped client is a drop-in replacement.


Results

Metric | Value
Token redundancy eliminated | up to 93%
Meaning retained (fidelity score) | 94.9%
Words saved (real usage, 297 conversations) | 80,500+
Latency overhead | ~2ms
API compatibility | Anthropic, OpenAI, async support

Install

pip install ctxlens

With semantic compression (recommended):

pip install ctxlens[semantic]

Usage

Anthropic

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.Anthropic(api_key="..."))
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "..."}]
)

# See how much you saved
print(client.savings)
# {
#   'calls': 1,
#   'tokens_saved_estimate': 847,
#   'redundancy_pct': 73.2,
#   'cost_saved_gbp': 0.0025
# }

OpenAI

import openai
import ctxlens as cx

client = cx.wrap(openai.OpenAI(api_key="..."))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}]
)

print(client.savings)

Async (AsyncAnthropic / AsyncOpenAI)

import asyncio

import anthropic
import ctxlens as cx

client = cx.wrap(anthropic.AsyncAnthropic(api_key="..."))

async def main():
    response = await client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": "..."}]
    )
    print(client.savings)

# Run the coroutine; without this call, main() is never executed.
asyncio.run(main())

Agent loops

import ctxlens as cx

# Wrap your agent — prevents context limit failures on long runs
agent = cx.wrap_agent(your_agent, budget="economic")
result = agent.run("your task here")

Direct compression

from ctxlens import ContextLens

engine = ContextLens(budget="balanced", show_savings=True)

messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]

result = engine.compress(messages)
print(f"Saved: {result.tokens_estimated_saved} tokens")
print(f"Fidelity: {result.fidelity_score * 100:.1f}% meaning retained")

Compression budgets

Budget | Aggressiveness | Best for
economic | High | Long agent loops, cost-sensitive apps
balanced | Medium | General use (default)
precise | Low | When accuracy is critical

client = cx.wrap(anthropic.Anthropic(), budget="economic")

How it works

ContextLens runs three compression stages:

  1. Exact deduplication — removes identical repeated messages (~0ms overhead)
  2. Semantic triage — scores every message by relevance to the current query using all-MiniLM-L6-v2 locally — zero external API calls
  3. Agent-aware compression — classifies messages by type (goal, error, tool_call, reasoning) and applies type-specific rules
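
To make stage 1 concrete, here is a minimal sketch of exact deduplication over a message list. It is an illustration only, not the ContextLens implementation: the function name dedupe_exact, the choice to key on the (role, content) pair, and the choice to keep the first occurrence are all assumptions.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def dedupe_exact(messages: List[Message]) -> Tuple[List[Message], int]:
    """Drop messages whose (role, content) pair already appeared earlier.

    Hypothetical helper: keeps the first occurrence, preserves order, and
    returns the kept messages plus the number of messages removed.
    """
    seen = set()
    kept: List[Message] = []
    for msg in messages:
        # Surrounding whitespace is ignored so trivially re-pasted text matches.
        key = (msg["role"], msg["content"].strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append(msg)
    return kept, len(messages) - len(kept)


history = [
    {"role": "user", "content": "Here is my config: debug=true"},
    {"role": "assistant", "content": "Thanks, got it."},
    {"role": "user", "content": "Here is my config: debug=true"},
    {"role": "user", "content": "Why does the build fail?"},
]

kept, removed = dedupe_exact(history)
print(f"kept {len(kept)} messages, removed {removed}")
```

A set lookup per message keeps this pass linear in the number of messages, which fits the near-zero overhead described for this stage. Keeping the last occurrence instead of the first would be an equally reasonable design; which one the library uses is not stated here.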

The fidelity score measures how much meaning was retained after compression. A score of 0.95 means 95% of the semantic content was preserved.
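
How the fidelity score is computed is not specified above. One plausible reading is a similarity between a vector for the original context and a vector for the compressed context. The sketch below uses cosine similarity over plain word counts as a stand-in, so it runs with the standard library alone. The names bag_of_words and fidelity_estimate are hypothetical, and a sentence-embedding model such as all-MiniLM-L6-v2 would replace the word-count vectors in a semantic version.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Stand-in 'embedding': lowercase word counts (an assumption, not the real model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors, in [0, 1]."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fidelity_estimate(original: str, compressed: str) -> float:
    """Hypothetical fidelity proxy: similarity of compressed text to the original."""
    return cosine_similarity(bag_of_words(original), bag_of_words(compressed))


original = (
    "The build fails because the config file is missing. "
    "The config file is missing."
)
compressed = "The build fails because the config file is missing."

score = fidelity_estimate(original, compressed)
print(f"Fidelity: {score * 100:.1f}% meaning retained")
```

Word counts only capture lexical overlap, so this proxy penalizes paraphrases that an embedding model would treat as equivalent. It is meant to show the shape of the calculation, not to reproduce the reported scores.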


Chrome Extension

ContextLens also comes as a Chrome extension that works directly in your browser on Claude, ChatGPT, Gemini, DeepSeek, and Perplexity — no API key needed.

  • Auto-compresses when context reaches 75%
  • Shows meaning retained score after every response
  • Stores memory across conversations and platforms
  • Export your memory as JSON

Roadmap

  • GitHub pre-fetch filter (reduce token usage in agentic coding)
  • Project memory injection (auto-inject context into new chats)
  • Node.js SDK
  • MCP server for Claude Code

License

MIT — free for personal and commercial use.


Author

Built by Usama Fateh Ali as part of ARIA — an autonomous financial intelligence system.

"93% of tokens sent to LLMs are identical repeated data. ContextLens eliminates that waste."
