Smart context compression for LLM agents — preserve critical info, reduce tokens 40-80%

Agent Context Compressor 🗜️

Smart context compression for LLM agents. Preserves critical information (decisions, errors, preferences, code) while reducing token usage by 40-80%.

Problem

LLM context windows are expensive. 90% of agent conversations contain filler — greetings, acknowledgments, repeated tool outputs, verbose explanations. But naive truncation drops critical info like decisions, errors, and user preferences.

Solution

Context Compressor scores every message by importance, then strategically drops low-value content while guaranteeing critical information is preserved.

Before: 603 tokens, 14 messages
After:  410 tokens,  5 messages (32% compressed, 98.7% confidence)
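The 32% figure above is consistent with measuring the fraction of tokens removed. A quick check of that arithmetic, assuming this is the definition in use (the helper name `fraction_saved` is illustrative and not part of the library):

```python
def fraction_saved(tokens_before: int, tokens_after: int) -> float:
    """Fraction of tokens removed by compression (0.0 means nothing removed)."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return (tokens_before - tokens_after) / tokens_before


ratio = fraction_saved(603, 410)
print(f"{603 - 410} tokens saved ({ratio:.0%})")  # 193 tokens saved (32%)
```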

Features

  • 🧠 Smart scoring — classifies messages as decisions, errors, code, preferences, noise
  • 🔒 Critical preservation — decisions, errors, preferences NEVER dropped
  • 🔄 Deduplication — removes near-duplicate messages (Jaccard similarity)
  • 📝 Summarization — long messages compressed instead of dropped
  • 📊 Confidence tracking — know exactly how much info is preserved
  • 🛠️ Multiple interfaces — Python library, CLI, REST API
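To illustrate the deduplication feature listed above, here is a minimal sketch of near-duplicate removal with Jaccard similarity over word sets. The tokenization, the 0.8 threshold, and the function names are assumptions made for this example; the library's actual implementation may differ.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two messages."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    if not set_a and not set_b:
        return 1.0  # two empty messages count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def dedupe(messages: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep the first message of every group of near-duplicates."""
    kept: list[dict] = []
    for msg in messages:
        # Keep the message only if it is not too similar to anything kept so far.
        if all(jaccard(msg["content"], k["content"]) < threshold for k in kept):
            kept.append(msg)
    return kept


msgs = [
    {"role": "tool", "content": "Build finished: 0 errors, 2 warnings"},
    {"role": "tool", "content": "build finished: 0 errors, 2 warnings."},
    {"role": "user", "content": "deploy the app to production"},
]
print(len(dedupe(msgs)))  # 2
```

Comparing word sets ignores word order and repetition, which makes the check cheap and tolerant of small formatting differences in repeated tool output.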

Installation

pip install agent-ctx-compress

With optional dependencies:

# For API server
pip install "agent-ctx-compress[server]"

# For accurate token counting
pip install "agent-ctx-compress[tiktoken]"

# For LLM summarization
pip install "agent-ctx-compress[openai]"

# Everything
pip install "agent-ctx-compress[all]"

Quick Start

from context_compressor import compress

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "yo"},
    {"role": "assistant", "content": "Hey! How can I help?"},
    {"role": "user", "content": "deploy the app to production"},
    {"role": "assistant", "content": "Deployed! Status: ✅ running"},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "👍"},
]

result = compress(messages, target_ratio=0.3)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
print(f"Confidence: {result.confidence:.0%}")
print(result.compressed)

CLI

# Install from PyPI
pip install agent-ctx-compress

# Compress from stdin
cat conversation.json | ctxcompress --ratio 0.3

# Compress with stats
ctxcompress --input chat.json --ratio 0.2 --format full

# Stats only
ctxcompress --input chat.json --stats-only

# JSON output
ctxcompress --input chat.json --json
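The commands above expect a conversation file. As a sketch, assuming the CLI reads a JSON array of the same role/content messages the Python API takes (this is not confirmed by the text above), such a file could be produced like this. The helper name `save_conversation` is hypothetical.

```python
import json
import tempfile
from pathlib import Path

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "deploy the app to production"},
    {"role": "assistant", "content": "Deployed! Status: running"},
]


def save_conversation(msgs: list[dict], path: Path) -> None:
    """Write messages as a UTF-8 JSON array (assumed CLI input format)."""
    path.write_text(json.dumps(msgs, ensure_ascii=False, indent=2), encoding="utf-8")


# A temporary directory keeps the example from touching the working directory.
out_path = Path(tempfile.mkdtemp()) / "conversation.json"
save_conversation(messages, out_path)
print(out_path)
```

The printed path can then be passed to `ctxcompress --input`.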

API Server

# Install with server deps
pip install "agent-ctx-compress[server]"

# Start server
ctxcompress-server
# → http://localhost:8000

# Compress via API
curl -X POST http://localhost:8000/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"},
      {"role": "assistant", "content": "Hi!"},
      {"role": "user", "content": "deploy app"},
      {"role": "assistant", "content": "Deployed successfully"}
    ],
    "target_ratio": 0.3
  }'
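The same request can be sent from Python using only the standard library. This sketch reuses the endpoint and payload from the curl call above; the helper names are hypothetical, and it assumes the server replies with a JSON body, whose exact fields are not documented here.

```python
import json
import urllib.request


def build_compress_request(messages, target_ratio=0.3,
                           base_url="http://localhost:8000"):
    """Build a POST request for the /compress endpoint shown above."""
    payload = json.dumps(
        {"messages": messages, "target_ratio": target_ratio}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/compress",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_compress(messages, target_ratio=0.3, base_url="http://localhost:8000"):
    """Send the request and decode the reply (assumed to be JSON)."""
    req = build_compress_request(messages, target_ratio, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_compress_request(
    [
        {"role": "user", "content": "deploy app"},
        {"role": "assistant", "content": "Deployed successfully"},
    ]
)
print(request.get_method(), request.full_url)
```

With `ctxcompress-server` running locally, `post_compress(messages)` performs the actual call; building the request separately makes it easy to inspect without a server.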

How It Works

Input Messages
     │
     ▼
┌─────────────┐
│   Scorer    │ Classify: decision/error/code/preference/noise
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Deduplicate │ Remove near-duplicate messages
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Merge Tool │ Combine consecutive tool results
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Priority   │ Drop lowest-scored until target ratio
│  Drop       │ Preserve CRITICAL messages always
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Summarize   │ Compress long messages instead of dropping
└──────┬──────┘
       │
       ▼
Compressed Context + Metadata
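The priority-drop stage in the diagram can be sketched as follows. This is a simplified illustration, not the library's code: it assumes messages already carry a numeric importance, counts tokens by splitting on whitespace, and uses a hypothetical `target_saving` parameter meaning "fraction of tokens to remove". The other stages (deduplication, tool merging, summarization) are left out.

```python
CRITICAL = 4  # hypothetical numeric level; messages at this level are never dropped


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(text.split())


def priority_drop(scored: list[tuple[int, dict]], target_saving: float) -> list[dict]:
    """Drop the lowest-importance messages until the token budget is met."""
    total = sum(count_tokens(m["content"]) for _, m in scored)
    budget = total * (1 - target_saving)
    # Droppable indices, lowest importance first; ties go to the oldest message.
    order = sorted(
        (i for i, (imp, _) in enumerate(scored) if imp < CRITICAL),
        key=lambda i: (scored[i][0], i),
    )
    dropped: set[int] = set()
    current = total
    for i in order:
        if current <= budget:
            break
        dropped.add(i)
        current -= count_tokens(scored[i][1]["content"])
    # Survivors keep their original conversation order.
    return [m for i, (_, m) in enumerate(scored) if i not in dropped]


scored = [
    (4, {"role": "system", "content": "You are a helpful assistant."}),
    (0, {"role": "user", "content": "yo"}),
    (1, {"role": "assistant", "content": "Hey! How can I help?"}),
    (3, {"role": "user", "content": "deploy the app to production"}),
    (4, {"role": "assistant", "content": "Error: deploy failed, retrying with --force"}),
    (0, {"role": "user", "content": "thanks"}),
]
kept = priority_drop(scored, target_saving=0.3)
print([m["content"] for m in kept])
```

Because critical messages are excluded from the drop candidates, the target may not be reachable when most of the conversation is critical; the sketch simply stops once nothing droppable remains.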

Scoring Categories

Category             Importance  Droppable  Examples
system               CRITICAL    No         System prompts
decision             CRITICAL    No         "Let's use approach A"
error                CRITICAL    No         Error messages, fixes
preference           CRITICAL    No         "I prefer dark mode"
code                 HIGH        ⚠️         Code blocks, scripts
tool_result          HIGH        ⚠️         API responses, outputs
user_query           HIGH        ⚠️         User questions
structured_response  MEDIUM      Yes        Lists, explanations
explanation          MEDIUM      Yes        Long explanations
brief_response       LOW         Yes        Short replies
noise                NOISE       Yes        "yo", "sip", "👍"
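The table above maps naturally onto a lookup structure. The sketch below is one possible encoding, with the enum, the dictionary, and the helper names all chosen for illustration; the numeric values are an assumption and only their ordering matters.

```python
from enum import IntEnum


class Importance(IntEnum):
    """Importance levels from the table, ordered from least to most important."""
    NOISE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


CATEGORY_IMPORTANCE = {
    "system": Importance.CRITICAL,
    "decision": Importance.CRITICAL,
    "error": Importance.CRITICAL,
    "preference": Importance.CRITICAL,
    "code": Importance.HIGH,
    "tool_result": Importance.HIGH,
    "user_query": Importance.HIGH,
    "structured_response": Importance.MEDIUM,
    "explanation": Importance.MEDIUM,
    "brief_response": Importance.LOW,
    "noise": Importance.NOISE,
}


def is_droppable(category: str) -> bool:
    """CRITICAL categories are never dropped; everything else may be."""
    return CATEGORY_IMPORTANCE[category] < Importance.CRITICAL


def drop_order(categories: list[str]) -> list[str]:
    """Droppable categories sorted so the least important come first."""
    candidates = [c for c in categories if is_droppable(c)]
    return sorted(candidates, key=lambda c: CATEGORY_IMPORTANCE[c])


print(drop_order(["decision", "code", "noise", "brief_response"]))
# ['noise', 'brief_response', 'code']
```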

Use Cases

  1. Agent context management — Keep conversations within token limits
  2. Cost optimization — Reduce API costs by 40-80%
  3. Session handoff — Compress before switching models
  4. Memory systems — Store compressed conversation summaries
  5. Multi-agent — Share compressed context between agents

License

MIT
