antaris-context
Context window optimization for AI agents. Zero dependencies.
Context window management and token budget tracking for AI applications. Handles message compression, multiple selection strategies, and budget enforcement with zero external dependencies.
pip install antaris-context
Version: 4.2.0
Dependencies: None (stdlib only)
Python: 3.9+
Core Components
ContextManager
Primary interface for context window management with token budget tracking.
from antaris_context import ContextManager
# Basic usage
manager = ContextManager(total_budget=8000)
manager.add_turn("user", "How do I implement JWT auth?")
manager.add_turn("assistant", "Use the following approach...")
print(f"Token usage: {manager.get_total_used()}/{manager.total_budget}")
print(f"Over budget: {manager.is_over_budget()}")
# Get detailed usage report
report = manager.get_usage_report()
print(f"Utilization: {report['utilization']:.1%}")
ContextWindow
Lower-level context window with turn-based operations.
from antaris_context import ContextWindow
window = ContextWindow(budget=4000)
window.add_turn("user", "Write a Python function")
window.add_turn("assistant", "def process_data():\n return data.upper()")
total_used = window.get_total_used()
over_budget = window.is_over_budget()
usage = window.get_usage_report()
MessageCompressor
Compress messages and tool outputs to fit within token limits.
from antaris_context import MessageCompressor
compressor = MessageCompressor(level='moderate')
# Compress message list
messages = [
    {"role": "user", "content": "Long user message..."},
    {"role": "assistant", "content": "Long response..."}
]
compressed = compressor.compress_message_list(messages, max_content_length=500)
# Compress tool output
long_output = "..." * 10000
compressed_output = compressor.compress_tool_output(
    long_output,
    max_lines=50,
    keep_first=10,
    keep_last=10
)
# Get compression statistics
stats = compressor.get_compression_stats()
print(f"Compression ratio: {stats['compression_ratio']:.2f}")
print(f"Bytes saved: {stats['bytes_saved']}")
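The head/tail pattern behind compress_tool_output can be sketched in plain Python. This is a simplified stand-in, not the library's implementation; the omission marker format is an assumption, while keep_first and keep_last mirror the parameters above.

```python
def truncate_output(text: str, keep_first: int = 10, keep_last: int = 10) -> str:
    """Keep the first and last N lines of a long output, eliding the middle."""
    lines = text.splitlines()
    if len(lines) <= keep_first + keep_last:
        return text  # already small enough, nothing to trim
    omitted = len(lines) - keep_first - keep_last
    return "\n".join(
        lines[:keep_first]
        + [f"... [{omitted} lines omitted] ..."]
        + lines[-keep_last:]
    )

sample = "\n".join(f"line {i}" for i in range(100))
result = truncate_output(sample, keep_first=3, keep_last=2)
```

Keeping both ends matters for tool output: the command invocation and early errors live at the top, the final status at the bottom.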
Selection Strategies
ContextStrategy (Base)
Abstract base for all selection strategies.
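The base class's interface is not documented here, so the sketch below uses a stand-in abstract base to illustrate the subclassing pattern. The method name `score_item` and the item dict shape are assumptions, not the library's actual API.

```python
from abc import ABC, abstractmethod

class ContextStrategy(ABC):
    """Minimal stand-in for the library's abstract base."""
    @abstractmethod
    def score_item(self, item: dict) -> float:
        """Return a selection score; higher-scoring items are kept first."""

class LengthPenaltyStrategy(ContextStrategy):
    """Toy custom strategy: prefer shorter items when trimming."""
    def score_item(self, item: dict) -> float:
        return 1.0 / (1 + len(item.get("content", "")))

strategy = LengthPenaltyStrategy()
items = [{"content": "a much longer message body"}, {"content": "short"}]
ranked = sorted(items, key=strategy.score_item, reverse=True)
```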
RecencyStrategy
Prioritize recent content over older content.
from antaris_context import ContextManager
manager = ContextManager(total_budget=8000)
manager.set_strategy('recency', prefer_high_priority=True)
# Add content with priorities
manager.add_content('conversation', old_messages, priority='normal')
manager.add_content('conversation', recent_messages, priority='important')
# Recent content is kept when budget is exceeded
manager.optimize_context()
RelevanceStrategy
Select content based on semantic relevance to a query.
manager.set_strategy('relevance')
# Add content with relevance query
manager.add_content('memory', memory_items, query="authentication JWT")
# Content relevant to the query is prioritized
result = manager.optimize_context(query="JWT authentication Flask")
HybridStrategy
Combine recency and relevance scoring.
manager.set_strategy('hybrid', recency_weight=0.4, relevance_weight=0.6)
manager.add_content('conversation', messages, query="JWT Flask")
result = manager.optimize_context(
query="JWT authentication",
target_utilization=0.85
)
BudgetStrategy
Allocate content based on section budget limits.
manager.set_strategy('budget', approach='balanced')
# Set section budgets
manager.set_section_budgets({
'system': 1000,
'memory': 2000,
'conversation': 4000,
'tools': 1000
})
manager.optimize_context()
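Per-section caps amount to filling each section greedily until its budget is spent. The sketch below is an assumption about the mechanism, using the 4-characters-per-token approximation described under Token Estimation; the function name and item shapes are hypothetical.

```python
def fit_sections(sections: dict, budgets: dict) -> dict:
    """Keep items per section until that section's token cap is reached."""
    kept = {}
    for name, items in sections.items():
        cap = budgets.get(name, 0)
        used = 0
        kept[name] = []
        for item in items:
            cost = max(1, len(item) // 4)  # rough char-based token estimate
            if used + cost > cap:
                break  # this section's budget is exhausted
            kept[name].append(item)
            used += cost
    return kept

sections = {"system": ["You are helpful."], "tools": ["x" * 8000, "short output"]}
result = fit_sections(sections, {"system": 1000, "tools": 1000})
```

An 8000-character tool dump (~2000 tokens) blows the 1000-token tools cap, so that section drops it while the system prompt fits untouched.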
Advanced Compression
ImportanceWeightedCompressor
Preserve high-importance content during compression.
from antaris_context import ImportanceWeightedCompressor, CompressionResult
compressor = ImportanceWeightedCompressor(
    keep_top_n=5,
    compress_middle=True,
    drop_threshold=0.1
)
# Compress with importance scores
content_items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation", "importance": 0.5},
    {"text": "Debug output", "importance": 0.2}
]
result: CompressionResult = compressor.compress(content_items, target_size=2000)
print(f"Items kept: {result.items_kept}")
print(f"Items compressed: {result.items_compressed}")
print(f"Items dropped: {result.items_dropped}")
print(f"Final size: {result.final_size}")
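The keep/compress/drop decision can be sketched as follows. This is a simplified reading of the parameters above, not the library's code: items below drop_threshold are discarded, the top-N by importance are kept verbatim, and the rest are truncated (the truncate_to length is a hypothetical parameter).

```python
def compress_by_importance(items, keep_top_n=5, drop_threshold=0.1, truncate_to=40):
    ranked = sorted(items, key=lambda it: it["importance"], reverse=True)
    kept, compressed, dropped = [], [], []
    for i, item in enumerate(ranked):
        if item["importance"] < drop_threshold:
            dropped.append(item)          # below threshold: discard outright
        elif i < keep_top_n:
            kept.append(item)             # top-N by importance: keep verbatim
        else:
            compressed.append({**item, "text": item["text"][:truncate_to]})
    return kept, compressed, dropped

items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation " * 10, "importance": 0.5},
    {"text": "Stale debug output", "importance": 0.05},
]
kept, compressed, dropped = compress_by_importance(items, keep_top_n=1)
```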
SemanticChunker
Split text at sentence boundaries with configurable overlap.
from antaris_context import SemanticChunker, SemanticChunk
chunker = SemanticChunker(
    min_chunk_size=100,
    max_chunk_size=500,
    overlap_sentences=2
)
chunks: list[SemanticChunk] = chunker.chunk(long_text)
for chunk in chunks:
    print(f"Chunk {chunk.index}: {len(chunk.text)} chars")
    print(f"Sentences: {chunk.sentence_count}")
    print(f"Overlap: {chunk.overlap_size}")
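Sentence-boundary chunking with overlap can be sketched with a regex split. This is a simplification under stated assumptions: the library may segment sentences differently, and the size accounting here ignores joining whitespace.

```python
import re

def chunk_sentences(text: str, max_chunk_size: int = 500, overlap_sentences: int = 2):
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, size = [], [], 0
    for sentence in sentences:
        if current and size + len(sentence) > max_chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # carry overlap forward
            size = sum(len(s) for s in current)
        current.append(sentence)
        size += len(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "First sentence here. " * 40
chunks = chunk_sentences(text, max_chunk_size=200, overlap_sentences=1)
```

The overlap means each chunk repeats the tail of its predecessor, so retrieval on any chunk keeps a little surrounding context.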
Context Profiling
ContextProfiler
Analyze token usage patterns and identify optimization opportunities.
from antaris_context import ContextProfiler
profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)
# Track usage over multiple operations
profiler.start_session("conversation_flow")
manager.add_turn("user", "Question 1")
manager.add_turn("assistant", "Answer 1")
profiler.record_usage(manager)
manager.add_turn("user", "Question 2")
manager.optimize_context()
profiler.record_usage(manager)
# Get analysis
analysis = profiler.analyze_patterns()
print(f"Average utilization: {analysis['avg_utilization']:.1%}")
print(f"Peak usage: {analysis['peak_usage']} tokens")
for section, stats in analysis['section_stats'].items():
    print(f"{section}: avg={stats['avg_tokens']}, max={stats['max_tokens']}")
# Get recommendations
recommendations = profiler.get_optimization_recommendations()
for rec in recommendations:
    print(f"- {rec['description']} (impact: {rec['impact']})")
Hard Budget Enforcement (v4.2.0)
render_hard_limited(budget_tokens) enforces a strict token ceiling. It trims content to fit within the budget and raises ContextBudgetExceeded if fitting is impossible (e.g., mandatory system content alone exceeds the budget).
from antaris_context import ContextManager, ContextBudgetExceeded
ctx = ContextManager(total_budget=1000)
try:
    messages = ctx.render_hard_limited(budget_tokens=500)
except ContextBudgetExceeded as e:
    print(f"Over budget: {e.used} tokens used, {e.budget} budget")
Unlike optimize_context(), which uses soft limits and may return over-budget results, render_hard_limited() guarantees the returned message list never exceeds budget_tokens. Use it when you must not exceed a model's context limit.
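The hard-limit behavior described above can be sketched in plain Python: drop the oldest optional messages until the estimate fits, and raise when mandatory content alone exceeds the budget. The exception class, message shape, and `mandatory` flag here are stand-ins, not the library's types.

```python
class BudgetExceeded(Exception):
    """Stand-in for ContextBudgetExceeded."""
    def __init__(self, used, budget):
        super().__init__(f"{used} tokens used, {budget} budget")
        self.used, self.budget = used, budget

def render_hard_limited(messages, budget_tokens):
    est = lambda m: max(1, len(m["content"]) // 4)  # rough token estimate
    mandatory = [m for m in messages if m.get("mandatory")]
    optional = [m for m in messages if not m.get("mandatory")]
    base = sum(est(m) for m in mandatory)
    if base > budget_tokens:
        # Even mandatory content alone does not fit: impossible to satisfy.
        raise BudgetExceeded(base, budget_tokens)
    kept = list(optional)
    while kept and base + sum(est(m) for m in kept) > budget_tokens:
        kept.pop(0)  # drop the oldest optional message first
    return mandatory + kept

msgs = [
    {"role": "system", "content": "rules " * 40, "mandatory": True},
    {"role": "user", "content": "old " * 400},
    {"role": "user", "content": "new question"},
]
out = render_hard_limited(msgs, budget_tokens=100)
```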
Budget Enforcement
Budget enforcement uses soft limits: tracks usage and warns when exceeded, but does not hard-truncate content.
manager = ContextManager(total_budget=8000, strict_budget=False)
# Add content that exceeds budget
manager.add_content('conversation', large_message_history)
manager.add_content('tools', debug_output)
# Check status
if manager.is_over_budget():
    usage = manager.get_usage_report()
    print(f"Over budget by {usage['overage']} tokens")
    print(f"Utilization: {usage['utilization']:.1%}")
# Optimize to fit within budget
result = manager.optimize_context(target_utilization=0.85)
if result.success:
    print(f"Optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Could not reach target: {result.final_utilization:.1%}")
Section Organization
Organize content into logical sections with individual budget allocations.
manager = ContextManager(total_budget=8000)
# Set section budgets
manager.set_section_budgets({
    'system': 1200,        # System prompts, rules
    'memory': 1800,        # Long-term memory items
    'conversation': 4000,  # Chat history
    'tools': 1000          # Tool outputs, debug info
})
# Add content to sections
manager.add_content('system', "You are a helpful assistant.", priority='critical')
manager.add_content('memory', recalled_memories, priority='important')
manager.add_content('conversation', chat_history, priority='normal')
manager.add_content('tools', tool_outputs, priority='optional')
# Check section usage
for section, usage in manager.get_section_usage().items():
    budget = manager.section_budgets[section]
    print(f"{section}: {usage}/{budget} tokens")
Crash-Safe Persistence
atomic_write_json
Write JSON files atomically to prevent corruption during crashes.
from antaris_context import atomic_write_json
import json
import time
data = {
    'context_state': manager.export_snapshot(),
    'timestamp': time.time(),
    'version': '4.2.0'
}
# Atomic write (temporary file + rename)
atomic_write_json('context_snapshot.json', data)
# Safe even if process crashes during write
try:
    with open('context_snapshot.json', 'r') as f:
        restored_data = json.load(f)
except json.JSONDecodeError:
    print("File was corrupted, but atomic write prevented partial state")
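The temp-file-plus-rename pattern the library describes can be sketched with the stdlib: write to a temporary file in the same directory, flush and fsync, then os.replace, which is an atomic rename on both POSIX and Windows. A minimal sketch, not the library's implementation:

```python
import json
import os
import tempfile

def atomic_write_json_sketch(path, data):
    # Temp file must be on the same filesystem for the rename to be atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic: readers never see a partial file
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise

# Demo: write and read back from a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "context_snapshot.json")
atomic_write_json_sketch(demo_path, {"version": "4.2.0"})
with open(demo_path) as f:
    restored = json.load(f)
```

A crash mid-write leaves only an orphaned temp file behind; the destination path always holds either the old complete file or the new one.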
Cross-Session Persistence
Save and restore context state between application sessions.
# Save snapshot with importance filtering
snapshot = manager.export_snapshot(include_importance_above=0.3)
# Restore from snapshot
new_manager = ContextManager.from_snapshot(snapshot)
# Named snapshots for specific states
manager.save_snapshot("pre_optimization")
manager.save_snapshot("post_compression")
# Restore named snapshot
manager.restore_snapshot("pre_optimization")
# List available snapshots
snapshots = manager.list_snapshots()
for name in snapshots:
    print(f"Snapshot: {name}")
Configuration
# JSON configuration
config = {
    "total_budget": 8000,
    "compression_level": "moderate",
    "strategy": "hybrid",
    "strategy_params": {
        "recency_weight": 0.4,
        "relevance_weight": 0.6
    },
    "section_budgets": {
        "system": 1000,
        "memory": 2000,
        "conversation": 4000,
        "tools": 1000
    },
    "auto_optimize": True,
    "target_utilization": 0.85
}
manager = ContextManager.from_config(config)
# Save current configuration
current_config = manager.export_config()
atomic_write_json('context_config.json', current_config)
Complete Example
from antaris_context import (
    ContextManager, MessageCompressor, ContextProfiler,
    atomic_write_json
)
# Initialize with profiling
profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)
compressor = MessageCompressor('moderate')
# Set strategy and budgets
manager.set_strategy('hybrid', recency_weight=0.3, relevance_weight=0.7)
manager.set_section_budgets({
    'system': 1200,
    'memory': 1800,
    'conversation': 4000,
    'tools': 1000
})
# Add system prompt (critical priority)
manager.add_content('system',
    "You are a Python coding assistant. Provide working examples.",
    priority='critical'
)
# Add conversation history with compression
chat_history = load_chat_history()
compressed_history = compressor.compress_message_list(
    chat_history,
    max_content_length=300
)
manager.add_content('conversation', compressed_history, priority='normal')
# Add relevant memories
memories = load_memories()
manager.add_content('memory', memories,
    query="Python authentication JWT",
    priority='important'
)
# Process current query
current_query = "How do I add JWT authentication to Flask?"
manager.add_turn("user", current_query)
# Optimize context
profiler.start_session("query_processing")
result = manager.optimize_context(
    query=current_query,
    target_utilization=0.85
)
profiler.record_usage(manager)
if result.success:
    # Render for LLM
    messages = manager.render_messages(format='openai')
    # Process with LLM (not included in this library)
    # response = openai_client.chat.completions.create(...)
    # Add response and save state
    manager.add_turn("assistant", "Use Flask-JWT-Extended...")
    # Save snapshot
    snapshot = manager.export_snapshot()
    atomic_write_json('session_state.json', snapshot)
    print(f"Context optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Optimization incomplete: {result.final_utilization:.1%} utilization")
# Get profiling results
analysis = profiler.analyze_patterns()
print(f"Session efficiency: {analysis['efficiency_score']:.2f}")
Token Estimation
Uses character-based approximation (4 characters per token) for fast budget calculations. For exact counts, plug in your model's tokenizer:
import tiktoken
# Optional: plug in exact tokenizer
enc = tiktoken.encoding_for_model("gpt-4")
manager._estimate_tokens = lambda text: len(enc.encode(text))
# Default approximation is sufficient for budget management
estimated = manager._estimate_tokens("Hello world") # ~3 tokens
actual = len(enc.encode("Hello world")) # 2 tokens (exact)
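The default estimator can be reproduced with the stdlib alone. The exact rounding the library applies is an assumption; rounding up matches the "~3 tokens" figure in the example above.

```python
import math

def estimate_tokens(text: str) -> int:
    # Roughly one token per four characters, rounded up, floor of one.
    return max(1, math.ceil(len(text) / 4))

n = estimate_tokens("Hello world")  # 11 chars -> 3
```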
Performance Characteristics
- Token estimation: ~100,000 characters/second
- Message compression: ~50,000 characters/second
- Strategy selection: ~10,000 messages/second
- Context optimization: ~1,000 content items/second
- Memory usage: Linear with content size
- CPU usage: O(n log n) for relevance ranking, O(n) for other operations
Limitations
- Token estimation is approximate: Use actual tokenizer for exact counts
- No LLM calls: Compression is structural, not semantic (unless using pluggable summarizer)
- Single-threaded: Not designed for concurrent access
- Memory bound: All content held in memory during processing
- No distributed contexts: Manages single context windows only
Testing
git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-context
python -m pytest tests/ -v --cov=antaris_context
150 tests, 95% coverage, zero external dependencies.
Integration with Antaris Suite
# With antaris-memory
from antaris_memory import MemoryClient
memory_client = MemoryClient()
manager.set_memory_client(memory_client)
# With antaris-router
from antaris_router import Router
router = Router()
hints = router.get_routing_hints(query)
manager.set_router_hints(hints)
# With antaris-guard
from antaris_guard import ContentFilter
content_filter = ContentFilter()
safe_content = content_filter.scan(content)
manager.add_content('conversation', safe_content)
License
Apache 2.0 License with explicit patent grant clause.
File details
Details for the file antaris_context-4.9.13.tar.gz.
File metadata
- Download URL: antaris_context-4.9.13.tar.gz
- Upload date:
- Size: 99.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7baeb54434617f9c839a76c84ada97cfd44134f8c249503c7bc652a9298e8ab |
| MD5 | 025f429fa6edb74b65bae439616ffd84 |
| BLAKE2b-256 | fb5aa6feb31ca9cf207a19f8eb3fab8ce4b6abe8e739250224255cb44cc908d0 |
File details
Details for the file antaris_context-4.9.13-py3-none-any.whl.
File metadata
- Download URL: antaris_context-4.9.13-py3-none-any.whl
- Upload date:
- Size: 84.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ecc0d39f1d46a759230112950f572a0af0a68a75dc311af7346299efa6e03453 |
| MD5 | 0d181aaecb26da66d79fe60475e33586 |
| BLAKE2b-256 | 59b51462ae5cff36bd4a0d535b6bc150b647c34aece34da97769e4b9e26ca5e6 |