Lightweight context window management for AI agents
Project description
agent-context-manager
A lightweight Python library for managing LLM context windows in AI agents. Prevents context overflow, reduces token costs, and maintains conversation coherence.
The Problem
AI agents face a critical challenge: context windows fill up fast. When they overflow:
- Costs explode - every request resends the growing history, so token usage rises with each turn
- Performance degrades - LLMs struggle with long contexts ("lost in the middle")
- Coherence breaks - agents forget important context while keeping noise
Current solutions are either too complex (they require LLM calls for summarization) or too naive (they simply truncate old messages).
The Solution
agent-context-manager provides intelligent context compression without requiring additional LLM calls:
- Token-aware management - Track usage, warn before overflow
- Multiple compression strategies - Choose what fits your use case
- Framework agnostic - Works with any LLM provider
- Zero LLM dependencies - No API calls needed for compression
Installation
pip install agent-context-manager
Quick Start
from agent_context_manager import ContextManager, SlidingWindowStrategy
# Create a context manager with 8K token limit
manager = ContextManager(
    max_tokens=8000,
    strategy=SlidingWindowStrategy(keep_system=True, keep_recent=10)
)
# Add messages as your agent works
manager.add_message({"role": "system", "content": "You are a helpful assistant."})
manager.add_message({"role": "user", "content": "Hello!"})
manager.add_message({"role": "assistant", "content": "Hi there!"})
# Get compressed context when needed
context = manager.get_context()
# Check token usage
print(f"Tokens used: {manager.token_count}/{manager.max_tokens}")
Compression Strategies
1. Sliding Window (Default)
Keeps the most recent N messages, always preserving system messages.
from agent_context_manager import SlidingWindowStrategy
strategy = SlidingWindowStrategy(
    keep_system=True,     # Always keep system messages
    keep_recent=20,       # Keep the last 20 messages
    keep_first_user=True  # Keep the original user request
)
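To make the behavior concrete, here is a minimal sketch of the sliding-window idea. This is illustrative only, not the library's internals; the helper name and the assumption that messages are role/content dicts are ours:

# Hypothetical sketch of the sliding-window idea, not the library's code.
def sliding_window(messages, keep_recent=20, keep_system=True):
    # System messages survive compression unconditionally.
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    rest = [m for m in messages if m["role"] != "system"]
    # Keep only the most recent N non-system messages.
    return system + rest[-keep_recent:]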
2. Importance Scoring
Scores messages by relevance and keeps the most important ones.
from agent_context_manager import ImportanceStrategy
strategy = ImportanceStrategy(
    system_weight=1.0,     # System messages always kept
    user_weight=0.8,       # User messages high priority
    assistant_weight=0.6,  # Assistant messages medium priority
    tool_weight=0.4,       # Tool results lower priority
    recency_decay=0.95     # Recent messages score higher
)
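One plausible reading of these parameters: a message's score is its role weight multiplied by recency_decay raised to the message's age, so older messages fade. The exact formula is an assumption, not documented behavior:

# Assumed scoring formula, for illustration: role weight x decay^age.
def importance_score(role_weight, recency_decay, index, total):
    age = total - 1 - index  # 0 for the newest message
    return role_weight * recency_decay ** age

# With user_weight=0.8 and recency_decay=0.95, the newest user message
# scores 0.8; one nine turns older scores about 0.50.
print(importance_score(0.8, 0.95, index=9, total=10))  # 0.8
print(importance_score(0.8, 0.95, index=0, total=10))  # ~0.504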
3. Semantic Deduplication
Removes near-duplicate messages to reduce redundancy.
from agent_context_manager import DeduplicationStrategy
strategy = DeduplicationStrategy(
    similarity_threshold=0.85,  # Remove if >85% similar
    keep_latest=True            # Keep the most recent of duplicates
)
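The similarity measure itself isn't documented, so treat the threshold as implementation-defined. For intuition, here is a stand-in check using Python's difflib; the real library may well use embeddings rather than character overlap:

from difflib import SequenceMatcher

# Stand-in similarity check, for illustration only.
def is_near_duplicate(a, b, threshold=0.85):
    return SequenceMatcher(None, a, b).ratio() > threshold

print(is_near_duplicate("Run the tests again", "Run the tests again!"))  # True
print(is_near_duplicate("Run the tests", "Deploy the service"))          # False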
4. Hybrid (Recommended for Production)
Combines multiple strategies for best results.
from agent_context_manager import HybridStrategy
strategy = HybridStrategy([
    DeduplicationStrategy(similarity_threshold=0.9),
    ImportanceStrategy(recency_decay=0.95),
    SlidingWindowStrategy(keep_recent=50)
])
Token Counting
Built-in token counting for popular models:
from agent_context_manager import ContextManager
# Auto-detect tokenizer based on model
manager = ContextManager(max_tokens=8000, model="gpt-4")
manager = ContextManager(max_tokens=100000, model="claude-3")
# Or use a custom tokenizer
manager = ContextManager(
    max_tokens=8000,
    tokenizer=my_custom_tokenizer
)
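A custom tokenizer is presumably any callable that maps text to a token count (an assumption based on the signature above). For OpenAI models, tiktoken is a natural fit:

import tiktoken
from agent_context_manager import ContextManager

enc = tiktoken.encoding_for_model("gpt-4")

def my_custom_tokenizer(text):
    # Token count under the same encoding GPT-4 uses.
    return len(enc.encode(text))

manager = ContextManager(max_tokens=8000, tokenizer=my_custom_tokenizer)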
Overflow Handling
from agent_context_manager import ContextManager, OverflowPolicy
manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.COMPRESS,  # Auto-compress when near limit
    overflow_threshold=0.9                    # Compress at 90% capacity
)
# Or get warnings instead
manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.WARN
)
# Check status
if manager.is_near_overflow():
    print(f"Warning: {manager.usage_percent}% of context used")
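With OverflowPolicy.WARN, compression is your call; one simple pattern, using compress() from the API reference below:

if manager.is_near_overflow():
    manager.compress()  # Apply the configured strategy on demand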
Memory Blocks (Structured Context)
Organize context into logical blocks with size limits:
from agent_context_manager import ContextManager, MemoryBlock, SlidingWindowStrategy
manager = ContextManager(max_tokens=8000)
# Define memory blocks
manager.add_block(MemoryBlock(
    name="system",
    max_tokens=500,
    priority=1.0,  # Highest priority, never compressed
    content="You are a helpful coding assistant."
))

manager.add_block(MemoryBlock(
    name="user_profile",
    max_tokens=200,
    priority=0.9,
    content="User prefers Python, uses VS Code."
))

manager.add_block(MemoryBlock(
    name="conversation",
    max_tokens=7000,
    priority=0.5,  # Can be compressed if needed
    strategy=SlidingWindowStrategy(keep_recent=30)
))
# Update blocks as needed
manager.update_block("user_profile", "User prefers Python, uses VS Code, timezone: PST")
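The idea behind block budgets is priority-ordered allocation: hand tokens to the highest-priority blocks first until the window is spent. A hypothetical sketch, not the library's allocator (names and structure are assumptions):

# Hypothetical priority-ordered budget allocation, for illustration.
def allocate(blocks, budget):
    plan = {}
    for block in sorted(blocks, key=lambda b: -b["priority"]):
        grant = min(block["max_tokens"], budget)
        plan[block["name"]] = grant
        budget -= grant
    return plan

blocks = [
    {"name": "system", "max_tokens": 500, "priority": 1.0},
    {"name": "user_profile", "max_tokens": 200, "priority": 0.9},
    {"name": "conversation", "max_tokens": 7000, "priority": 0.5},
]
print(allocate(blocks, 8000))
# {'system': 500, 'user_profile': 200, 'conversation': 7000}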
Integration Examples
With OpenAI
from openai import OpenAI
from agent_context_manager import ContextManager
client = OpenAI()
manager = ContextManager(max_tokens=8000, model="gpt-4")
manager.add_message({"role": "system", "content": "You are helpful."})
while True:
    user_input = input("You: ")
    manager.add_message({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4",
        messages=manager.get_context()  # Auto-compressed if needed
    )

    assistant_message = response.choices[0].message.content
    manager.add_message({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")
With Anthropic
from anthropic import Anthropic
from agent_context_manager import ContextManager
client = Anthropic()
manager = ContextManager(max_tokens=100000, model="claude-3")
# Same pattern works with any provider
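For completeness, a hedged sketch of the call itself. The model name is illustrative, and Anthropic's API takes the system prompt as a separate system parameter, so keep system text out of the message list here:

manager.add_message({"role": "user", "content": "Hello!"})

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # Illustrative model name
    max_tokens=1024,
    messages=manager.get_context()
)
print(response.content[0].text)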
With LangChain
from agent_context_manager import ContextManager, LangChainAdapter

manager = ContextManager(max_tokens=8000)
memory = LangChainAdapter(manager)  # Drop-in replacement for ConversationBufferMemory
API Reference
ContextManager
| Method / Property | Description |
|---|---|
| `add_message(msg)` | Add a message to the context |
| `get_context()` | Get the compressed context as a message list |
| `token_count` | Current token count |
| `usage_percent` | Percentage of the context window used |
| `is_near_overflow()` | Check whether the context is approaching its limit |
| `compress()` | Manually trigger compression |
| `clear()` | Clear all messages |
Strategies
| Strategy | Best For |
|---|---|
| `SlidingWindowStrategy` | Simple agents, chatbots |
| `ImportanceStrategy` | Complex agents with tool use |
| `DeduplicationStrategy` | Repetitive workflows |
| `HybridStrategy` | Production systems |
Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
License
MIT License - see LICENSE file.
Download files
File details
Details for the file agent_context_manager-0.1.0.tar.gz.
File metadata
- Download URL: agent_context_manager-0.1.0.tar.gz
- Upload date:
- Size: 19.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bde8c10b47d97624f12f4358e86dd031cb83131b952a41e83fb8b4fd6dbdb83 |
| MD5 | a12b73f41f5fc4029b212f522d418831 |
| BLAKE2b-256 | 4cae4d5f5bc7b6a74b81685c4e40c3c386ce7fb226fce9eeca57109148317df2 |
File details
Details for the file agent_context_manager-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agent_context_manager-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c294b946365ca13d401ebcd8f1eccd257e3589c6b5f91cebe8f520469d77ad57 |
| MD5 | f5e2274d725d5f8f02d978cc1ff6c30f |
| BLAKE2b-256 | 077384a6609abe0608d5dcfd8510d298909e8a1c9d2e52c9b6f865c5336c8422 |