
agent-context-manager

A lightweight Python library for managing LLM context windows in AI agents. Prevents context overflow, reduces token costs, and maintains conversation coherence.

The Problem

AI agents face a critical challenge: context windows fill up fast. When they overflow:

  • Costs explode - every request resends the full history, so token usage keeps growing
  • Performance degrades - LLMs struggle with long contexts ("lost in the middle")
  • Coherence breaks - agents forget important context while keeping noise

Current solutions are either too complex (require LLM calls for summarization) or too naive (just truncate old messages).

The Solution

agent-context-manager provides intelligent context compression without requiring additional LLM calls:

  • Token-aware management - Track usage, warn before overflow
  • Multiple compression strategies - Choose what fits your use case
  • Framework agnostic - Works with any LLM provider
  • Zero LLM dependencies - No API calls needed for compression

Installation

pip install agent-context-manager

Quick Start

from agent_context_manager import ContextManager, SlidingWindowStrategy

# Create a context manager with 8K token limit
manager = ContextManager(
    max_tokens=8000,
    strategy=SlidingWindowStrategy(keep_system=True, keep_recent=10)
)

# Add messages as your agent works
manager.add_message({"role": "system", "content": "You are a helpful assistant."})
manager.add_message({"role": "user", "content": "Hello!"})
manager.add_message({"role": "assistant", "content": "Hi there!"})

# Get compressed context when needed
context = manager.get_context()

# Check token usage
print(f"Tokens used: {manager.token_count}/{manager.max_tokens}")

Compression Strategies

1. Sliding Window (Default)

Keeps the most recent N messages, always preserving system messages.

from agent_context_manager import SlidingWindowStrategy

strategy = SlidingWindowStrategy(
    keep_system=True,      # Always keep system messages
    keep_recent=20,        # Keep last 20 messages
    keep_first_user=True   # Keep the original user request
)

2. Importance Scoring

Scores messages by role and recency, keeping the highest-scoring ones within the token budget.

from agent_context_manager import ImportanceStrategy

strategy = ImportanceStrategy(
    system_weight=1.0,     # System messages always kept
    user_weight=0.8,       # User messages high priority
    assistant_weight=0.6,  # Assistant messages medium priority
    tool_weight=0.4,       # Tool results lower priority
    recency_decay=0.95     # Recent messages score higher
)
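
The exact scoring formula isn't documented here, but one plausible reading of these parameters is "role weight times recency decay raised to the message's age". A hypothetical illustration, not the library's actual code:

def score(role_weight, recency_decay, age):
    # age = number of messages added after this one (0 = newest)
    return role_weight * (recency_decay ** age)

# Under this reading, a user message from three turns ago still outranks
# a tool result from the current turn:
print(score(0.8, 0.95, 3))  # ~0.69
print(score(0.4, 0.95, 0))  # 0.40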

3. Semantic Deduplication

Removes near-duplicate messages to reduce redundancy.

from agent_context_manager import DeduplicationStrategy

strategy = DeduplicationStrategy(
    similarity_threshold=0.85,  # Remove if >85% similar
    keep_latest=True            # Keep the most recent of duplicates
)
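
How similarity is measured isn't stated, but since the library advertises zero LLM dependencies it is presumably a lexical comparison. As a rough illustration of what an 85% threshold means, here is a standard-library check with difflib (an assumption about the mechanism, not the library's internals):

from difflib import SequenceMatcher

a = "Fetching the latest build logs from CI..."
b = "Fetching the latest build logs from CI now..."

# ratio() is roughly 0.95 here, so at similarity_threshold=0.85 one of these
# two messages would be dropped (the older one, with keep_latest=True)
print(SequenceMatcher(None, a, b).ratio())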

4. Hybrid (Recommended for Production)

Combines multiple strategies for best results.

from agent_context_manager import HybridStrategy

strategy = HybridStrategy([
    DeduplicationStrategy(similarity_threshold=0.9),
    ImportanceStrategy(recency_decay=0.95),
    SlidingWindowStrategy(keep_recent=50)
])

Token Counting

Built-in token counting for popular models:

from agent_context_manager import ContextManager

# Auto-detect tokenizer based on model
manager = ContextManager(max_tokens=8000, model="gpt-4")
manager = ContextManager(max_tokens=100000, model="claude-3")

# Or use a custom tokenizer
manager = ContextManager(
    max_tokens=8000,
    tokenizer=my_custom_tokenizer
)
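
The expected tokenizer interface isn't spelled out on this page; assuming it is a callable that maps a string to a token count, a tiktoken-based sketch could look like this:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

def my_custom_tokenizer(text: str) -> int:
    # Count tokens exactly as the OpenAI tokenizer would
    return len(encoding.encode(text))

manager = ContextManager(max_tokens=8000, tokenizer=my_custom_tokenizer)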

Overflow Handling

from agent_context_manager import ContextManager, OverflowPolicy

manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.COMPRESS,  # Auto-compress when near limit
    overflow_threshold=0.9  # Compress at 90% capacity
)

# Or get warnings instead
manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.WARN
)

# Check status
if manager.is_near_overflow():
    print(f"Warning: {manager.usage_percent}% of context used")

Memory Blocks (Structured Context)

Organize context into logical blocks with size limits:

from agent_context_manager import ContextManager, MemoryBlock

manager = ContextManager(max_tokens=8000)

# Define memory blocks
manager.add_block(MemoryBlock(
    name="system",
    max_tokens=500,
    priority=1.0,  # Highest priority, never compressed
    content="You are a helpful coding assistant."
))

manager.add_block(MemoryBlock(
    name="user_profile",
    max_tokens=200,
    priority=0.9,
    content="User prefers Python, uses VS Code."
))

manager.add_block(MemoryBlock(
    name="conversation",
    max_tokens=7000,
    priority=0.5,  # Can be compressed if needed
    strategy=SlidingWindowStrategy(keep_recent=30)
))

# Update blocks as needed
manager.update_block("user_profile", "User prefers Python, uses VS Code, timezone: PST")

Integration Examples

With OpenAI

from openai import OpenAI
from agent_context_manager import ContextManager

client = OpenAI()
manager = ContextManager(max_tokens=8000, model="gpt-4")

manager.add_message({"role": "system", "content": "You are helpful."})

while True:
    user_input = input("You: ")
    manager.add_message({"role": "user", "content": user_input})
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=manager.get_context()  # Auto-compressed if needed
    )
    
    assistant_message = response.choices[0].message.content
    manager.add_message({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")

With Anthropic

from anthropic import Anthropic
from agent_context_manager import ContextManager

client = Anthropic()
manager = ContextManager(max_tokens=100000, model="claude-3")

# Same pattern works with any provider
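
As a hedged sketch of one turn (assuming get_context() returns a plain list of role/content dicts; note that Anthropic's Messages API takes the system prompt via the separate system parameter rather than as a "system" message):

manager.add_message({"role": "user", "content": "Hello!"})

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=manager.get_context()  # auto-compressed if needed
)

reply = response.content[0].text
manager.add_message({"role": "assistant", "content": reply})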

With LangChain

from langchain.memory import ConversationBufferMemory
from agent_context_manager import ContextManager, LangChainAdapter

manager = ContextManager(max_tokens=8000)
memory = LangChainAdapter(manager)  # Drop-in replacement
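
Assuming the adapter exposes the same interface as ConversationBufferMemory, it should slot into the usual memory parameter of a chain. A hypothetical wiring, in which the chain reads and writes history through the adapter so the underlying ContextManager keeps the conversation within budget:

from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

chain = ConversationChain(llm=ChatOpenAI(model="gpt-4"), memory=memory)
chain.predict(input="Hello!")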

API Reference

ContextManager

Method / property    Description
add_message(msg)     Add a message to the context
get_context()        Get the compressed context as a message list
token_count          Current token count (property)
usage_percent        Percentage of the context window used (property)
is_near_overflow()   Check whether the context is approaching the limit
compress()           Manually trigger compression
clear()              Clear all messages

Strategies

Strategy                 Best For
SlidingWindowStrategy    Simple agents, chatbots
ImportanceStrategy       Complex agents with tool use
DeduplicationStrategy    Repetitive workflows
HybridStrategy           Production systems

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

License

MIT License - see LICENSE file.

