Smart context compression for LLM agents — preserve critical info, reduce tokens 40-80%

Agent Context Compressor 🗜️

Smart context compression for LLM agents. Preserves critical information (decisions, errors, preferences, code) while reducing token usage by 40-80%.

Problem

LLM context windows are expensive. 90% of agent conversations contain filler — greetings, acknowledgments, repeated tool outputs, verbose explanations. But naive truncation drops critical info like decisions, errors, and user preferences.

Solution

Context Compressor scores every message by importance, then strategically drops low-value content while guaranteeing critical information is preserved.

Before: 603 tokens, 14 messages
After:  410 tokens,  5 messages (32% compressed, 98.7% confidence)
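The 32% figure can be reproduced from the token counts above. This is a small sketch that assumes the percentage means the fraction of tokens removed; the library's exact definition of `compression_ratio` is not stated here, and `fraction_saved` is a hypothetical helper, not part of the package.

```python
def fraction_saved(tokens_before: int, tokens_after: int) -> float:
    """Fraction of tokens removed by compression (assumed definition)."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return (tokens_before - tokens_after) / tokens_before


saved = fraction_saved(603, 410)
print(f"{603 - 410} tokens saved ({saved:.0%})")  # prints "193 tokens saved (32%)"
```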

Features

  • 🧠 Smart scoring — classifies messages as decisions, errors, code, preferences, noise
  • 🔒 Critical preservation — decisions, errors, preferences NEVER dropped
  • 🔄 Deduplication — removes near-duplicate messages (Jaccard similarity)
  • 📝 Summarization — long messages compressed instead of dropped
  • 📊 Confidence tracking — know exactly how much info is preserved
  • 🛠️ Multiple interfaces — Python library, CLI, REST API
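To make the deduplication feature concrete, here is a minimal sketch of near-duplicate detection with Jaccard similarity over word sets. It is not the package's implementation: the tokenizer, the `deduplicate` helper, and the `0.8` threshold are all assumptions chosen for illustration.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercased word tokens of a message (a deliberately simple tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over the two word sets."""
    set_a, set_b = word_set(a), word_set(b)
    if not set_a and not set_b:
        return 1.0  # two messages with no word tokens count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def deduplicate(messages: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep the first message of every group of near-duplicates.

    `threshold` is a hypothetical cutoff, not a documented default.
    """
    kept: list[dict] = []
    for msg in messages:
        if all(jaccard(msg["content"], k["content"]) < threshold for k in kept):
            kept.append(msg)
    return kept


msgs = [
    {"role": "tool", "content": "Build passed: 42 tests, 0 failures"},
    {"role": "tool", "content": "build passed: 42 tests, 0 failures."},
    {"role": "user", "content": "deploy the app to production"},
]
print(len(deduplicate(msgs)))  # prints 2: the repeated tool output is removed
```

Comparing every message against every kept message is quadratic in the conversation length, which is usually acceptable for a single agent session.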

Quick Start

from context_compressor import compress

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "yo"},
    {"role": "assistant", "content": "Hey! How can I help?"},
    {"role": "user", "content": "deploy the app to production"},
    {"role": "assistant", "content": "Deployed! Status: ✅ running"},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "👍"},
]

result = compress(messages, target_ratio=0.3)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
print(f"Confidence: {result.confidence:.0%}")
print(result.compressed)

CLI

# Install
pip install -e .

# Compress from stdin (piped from a file)
cat conversation.json | ctxcompress --ratio 0.3

# Compress with stats
ctxcompress --input chat.json --ratio 0.2 --format full

# Stats only
ctxcompress --input chat.json --stats-only

# JSON output
ctxcompress --input chat.json --json

API Server

# Install with server deps
pip install -e ".[server]"

# Start server
ctxcompress-server
# → http://localhost:8000

# Compress via API
curl -X POST http://localhost:8000/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"},
      {"role": "assistant", "content": "Hi!"},
      {"role": "user", "content": "deploy app"},
      {"role": "assistant", "content": "Deployed successfully"}
    ],
    "target_ratio": 0.3
  }'
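The same request can be built from Python with only the standard library. The sketch below mirrors the curl call above (same URL, header, and body); `build_compress_request` is a hypothetical helper, and the assumption that the server replies with JSON is not confirmed by this page. The request is constructed but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_compress_request(messages, target_ratio=0.3,
                           base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /compress endpoint."""
    payload = {"messages": messages, "target_ratio": target_ratio}
    return urllib.request.Request(
        f"{base_url}/compress",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_compress_request([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hi!"},
    {"role": "user", "content": "deploy app"},
    {"role": "assistant", "content": "Deployed successfully"},
])

# With ctxcompress-server running, send it and decode the reply
# (assuming a JSON response body):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```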

How It Works

Input Messages
     │
     ▼
┌─────────────┐
│   Scorer    │ Classify: decision/error/code/preference/noise
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Deduplicate │ Remove near-duplicate messages
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Merge Tool │ Combine consecutive tool results
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Priority   │ Drop lowest-scored until target ratio
│  Drop       │ Preserve CRITICAL messages always
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Summarize   │ Compress long messages instead of dropping
└──────┬──────┘
       │
       ▼
Compressed Context + Metadata
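The Priority Drop stage in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `Importance`, `estimate_tokens`, and `priority_drop` are hypothetical names, the token estimate is a crude character count, and `keep_ratio` is taken to mean the share of tokens to keep (how the library interprets `target_ratio` is not specified here).

```python
from enum import IntEnum


class Importance(IntEnum):
    NOISE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def priority_drop(scored, keep_ratio):
    """Drop lowest-importance messages until the kept tokens fit the budget.

    `scored` is a list of (message, Importance) pairs. CRITICAL messages are
    never dropped, so the budget may not be reached.
    """
    total = sum(estimate_tokens(m["content"]) for m, _ in scored)
    budget = total * keep_ratio
    kept = set(range(len(scored)))
    used = total
    # Visit droppable messages from least to most important; ties go oldest first.
    order = sorted(
        (i for i, (_, imp) in enumerate(scored) if imp < Importance.CRITICAL),
        key=lambda i: (scored[i][1], i),
    )
    for i in order:
        if used <= budget:
            break
        kept.discard(i)
        used -= estimate_tokens(scored[i][0]["content"])
    # Preserve the original conversation order in the output.
    return [scored[i][0] for i in sorted(kept)]


scored = [
    ({"role": "system", "content": "You are a helpful assistant."}, Importance.CRITICAL),
    ({"role": "user", "content": "yo"}, Importance.NOISE),
    ({"role": "assistant", "content": "Hey! How can I help?"}, Importance.LOW),
    ({"role": "user", "content": "deploy the app to production"}, Importance.HIGH),
    ({"role": "assistant", "content": "👍"}, Importance.NOISE),
]
result = priority_drop(scored, keep_ratio=0.8)
for message in result:
    print(message["role"], "|", message["content"])
```

Because CRITICAL messages are excluded from the drop candidates, even `keep_ratio=0.0` leaves the system prompt in place.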

Scoring Categories

Category             Importance  Droppable  Examples
system               CRITICAL               System prompts
decision             CRITICAL               "Let's use approach A"
error                CRITICAL               Error messages, fixes
preference           CRITICAL               "I prefer dark mode"
code                 HIGH        ⚠️         Code blocks, scripts
tool_result          HIGH        ⚠️         API responses, outputs
user_query           HIGH        ⚠️         User questions
structured_response  MEDIUM                 Lists, explanations
explanation          MEDIUM                 Long explanations
brief_response       LOW                    Short replies
noise                NOISE                  "yo", "sip", "👍"
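The category table above can be captured as plain data. The sketch below copies the category and importance names from the table; the ordering of levels from NOISE up to CRITICAL and the `drop_order` helper are assumptions for illustration, not the package's API.

```python
# Importance levels, lowest to highest (ordering inferred from the names).
LEVELS = ["NOISE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Category -> importance, as listed in the table.
CATEGORY_IMPORTANCE = {
    "system": "CRITICAL",
    "decision": "CRITICAL",
    "error": "CRITICAL",
    "preference": "CRITICAL",
    "code": "HIGH",
    "tool_result": "HIGH",
    "user_query": "HIGH",
    "structured_response": "MEDIUM",
    "explanation": "MEDIUM",
    "brief_response": "LOW",
    "noise": "NOISE",
}


def drop_order(categories):
    """Sort categories from first-to-drop to last-to-drop, skipping CRITICAL."""
    droppable = [c for c in categories if CATEGORY_IMPORTANCE[c] != "CRITICAL"]
    return sorted(droppable, key=lambda c: LEVELS.index(CATEGORY_IMPORTANCE[c]))


print(drop_order(["decision", "code", "noise", "brief_response"]))
# prints ['noise', 'brief_response', 'code']
```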

Use Cases

  1. Agent context management — Keep conversations within token limits
  2. Cost optimization — Reduce API costs by 40-80%
  3. Session handoff — Compress before switching models
  4. Memory systems — Store compressed conversation summaries
  5. Multi-agent — Share compressed context between agents

License

MIT
