Context window optimization for AI agents. Zero dependencies.

antaris-context

Context window management and token budget tracking for AI applications. Provides message compression, multiple selection strategies, and budget enforcement, all with zero external dependencies.

pip install antaris-context

Version: 4.2.0
Dependencies: None (stdlib only)
Python: 3.9+

Core Components

ContextManager

Primary interface for context window management with token budget tracking.

from antaris_context import ContextManager

# Basic usage
manager = ContextManager(total_budget=8000)
manager.add_turn("user", "How do I implement JWT auth?")
manager.add_turn("assistant", "Use the following approach...")

print(f"Token usage: {manager.get_total_used()}/{manager.total_budget}")
print(f"Over budget: {manager.is_over_budget()}")

# Get detailed usage report
report = manager.get_usage_report()
print(f"Utilization: {report['utilization']:.1%}")

ContextWindow

Lower-level context window with turn-based operations.

from antaris_context import ContextWindow

window = ContextWindow(budget=4000)
window.add_turn("user", "Write a Python function")
window.add_turn("assistant", "def process_data():\n    return data.upper()")

total_used = window.get_total_used()
over_budget = window.is_over_budget()
usage = window.get_usage_report()

MessageCompressor

Compress messages and tool outputs to fit within token limits.

from antaris_context import MessageCompressor

compressor = MessageCompressor(level='moderate')

# Compress message list
messages = [
    {"role": "user", "content": "Long user message..."},
    {"role": "assistant", "content": "Long response..."}
]
compressed = compressor.compress_message_list(messages, max_content_length=500)

# Compress tool output
long_output = "log line\n" * 10000
compressed_output = compressor.compress_tool_output(
    long_output, 
    max_lines=50, 
    keep_first=10, 
    keep_last=10
)

# Get compression statistics
stats = compressor.get_compression_stats()
print(f"Compression ratio: {stats['compression_ratio']:.2f}")
print(f"Bytes saved: {stats['bytes_saved']}")

Selection Strategies

ContextStrategy (Base)

Abstract base for all selection strategies.
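
Since the base class's interface is not documented above, the following is only a conceptual sketch of what a strategy could look like. The method names (`score`, `select`) and the item shape are assumptions for illustration, not the library's actual API.

```python
from abc import ABC, abstractmethod

# Illustrative sketch only: method names and item shape are assumptions,
# not the documented ContextStrategy interface.
class StrategySketch(ABC):
    @abstractmethod
    def score(self, item: dict) -> float:
        """Higher-scoring items are kept first when trimming to budget."""

    def select(self, items: list, budget: int) -> list:
        # Greedily keep the highest-scoring items that fit the token budget.
        chosen, used = [], 0
        for item in sorted(items, key=self.score, reverse=True):
            if used + item["tokens"] <= budget:
                chosen.append(item)
                used += item["tokens"]
        return chosen

class NewestFirst(StrategySketch):
    def score(self, item: dict) -> float:
        return item["timestamp"]  # recency-style scoring
```

In this shape, each concrete strategy only defines how an item is scored; budget-constrained selection is shared by the base class.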

RecencyStrategy

Prioritize recent content over older content.

from antaris_context import ContextManager

manager = ContextManager(total_budget=8000)
manager.set_strategy('recency', prefer_high_priority=True)

# Add content with priorities
manager.add_content('conversation', old_messages, priority='normal')
manager.add_content('conversation', recent_messages, priority='important')

# Recent content is kept when budget is exceeded
manager.optimize_context()

RelevanceStrategy

Select content based on semantic relevance to a query.

manager.set_strategy('relevance')

# Add content with relevance query
manager.add_content('memory', memory_items, query="authentication JWT")

# Content relevant to the query is prioritized
result = manager.optimize_context(query="JWT authentication Flask")

HybridStrategy

Combine recency and relevance scoring.

manager.set_strategy('hybrid', recency_weight=0.4, relevance_weight=0.6)

manager.add_content('conversation', messages, query="JWT Flask")
result = manager.optimize_context(
    query="JWT authentication", 
    target_utilization=0.85
)
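
The weighted blend can be pictured in plain Python. This is a conceptual sketch of hybrid scoring, not the library's internals; it assumes both inputs are normalized to [0, 1].

```python
# Conceptual sketch: each candidate gets a weighted blend of a recency
# score and a relevance score, and the highest-scoring items survive
# trimming. Inputs are assumed normalized to [0, 1].
def hybrid_score(recency: float, relevance: float,
                 recency_weight: float = 0.4,
                 relevance_weight: float = 0.6) -> float:
    return recency_weight * recency + relevance_weight * relevance

# With relevance weighted higher, an old but highly relevant item
# outranks a new but irrelevant one:
old_relevant = hybrid_score(recency=0.1, relevance=0.9)    # 0.58
new_irrelevant = hybrid_score(recency=0.9, relevance=0.1)  # 0.42
```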

BudgetStrategy

Allocate content based on section budget limits.

manager.set_strategy('budget', approach='balanced')

# Set section budgets
manager.set_section_budgets({
    'system': 1000,
    'memory': 2000,
    'conversation': 4000,
    'tools': 1000
})

manager.optimize_context()

Advanced Compression

ImportanceWeightedCompressor

Preserve high-importance content during compression.

from antaris_context import ImportanceWeightedCompressor, CompressionResult

compressor = ImportanceWeightedCompressor(
    keep_top_n=5,
    compress_middle=True,
    drop_threshold=0.1
)

# Compress with importance scores
content_items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation", "importance": 0.5},
    {"text": "Debug output", "importance": 0.2}
]

result: CompressionResult = compressor.compress(content_items, target_size=2000)
print(f"Items kept: {result.items_kept}")
print(f"Items compressed: {result.items_compressed}")
print(f"Items dropped: {result.items_dropped}")
print(f"Final size: {result.final_size}")

SemanticChunker

Split text at sentence boundaries with configurable overlap.

from antaris_context import SemanticChunker, SemanticChunk

chunker = SemanticChunker(
    min_chunk_size=100,
    max_chunk_size=500,
    overlap_sentences=2
)

chunks: list[SemanticChunk] = chunker.chunk(long_text)
for chunk in chunks:
    print(f"Chunk {chunk.index}: {len(chunk.text)} chars")
    print(f"Sentences: {chunk.sentence_count}")
    print(f"Overlap: {chunk.overlap_size}")
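
The mechanics can be approximated with the standard library. This is a simplified illustration of sentence-boundary chunking with overlap, not SemanticChunker's actual implementation (which presumably handles abbreviations and other edge cases):

```python
import re

# Simplified sketch: split on sentence-ending punctuation, pack sentences
# into chunks up to max_chars, and carry the last `overlap` sentences of
# each chunk into the next one.
def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 2) -> list:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for s in sentences:
        if current and sum(len(x) for x in current) + len(s) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # overlap carried forward
        current.append(s)
    if current:
        chunks.append(" ".join(current))
    return chunks
```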

Context Profiling

ContextProfiler

Analyze token usage patterns and identify optimization opportunities.

from antaris_context import ContextProfiler

profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)

# Track usage over multiple operations
profiler.start_session("conversation_flow")
manager.add_turn("user", "Question 1")
manager.add_turn("assistant", "Answer 1")
profiler.record_usage(manager)

manager.add_turn("user", "Question 2")
manager.optimize_context()
profiler.record_usage(manager)

# Get analysis
analysis = profiler.analyze_patterns()
print(f"Average utilization: {analysis['avg_utilization']:.1%}")
print(f"Peak usage: {analysis['peak_usage']} tokens")

for section, stats in analysis['section_stats'].items():
    print(f"{section}: avg={stats['avg_tokens']}, max={stats['max_tokens']}")

# Get recommendations
recommendations = profiler.get_optimization_recommendations()
for rec in recommendations:
    print(f"- {rec['description']} (impact: {rec['impact']})")

Hard Budget Enforcement (v4.2.0)

render_hard_limited(budget_tokens) enforces a strict token ceiling. It trims content to fit within the budget and raises ContextBudgetExceeded if fitting is impossible (e.g., mandatory system content alone exceeds the budget).

from antaris_context import ContextManager, ContextBudgetExceeded

ctx = ContextManager(total_budget=1000)
try:
    messages = ctx.render_hard_limited(budget_tokens=500)
except ContextBudgetExceeded as e:
    print(f"Over budget: {e.used} tokens used, {e.budget} budget")

Unlike optimize_context(), which uses soft limits and may return over-budget results, render_hard_limited() guarantees the returned message list never exceeds budget_tokens. Use it when you must not exceed a model's context limit.
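
The trimming behavior can be pictured roughly as follows. This is an illustrative sketch, not the library's internals; `BudgetExceeded` and `hard_limit` are hypothetical names standing in for ContextBudgetExceeded and render_hard_limited, and the 4-chars-per-token estimate mirrors the library's default.

```python
class BudgetExceeded(Exception):
    pass

# Sketch of hard-limit trimming: drop the oldest optional messages until
# the estimate fits, and fail loudly if mandatory content alone exceeds
# the ceiling.
def hard_limit(system_msg: str, history: list, budget_tokens: int,
               est=lambda s: len(s) // 4) -> list:
    if est(system_msg) > budget_tokens:
        raise BudgetExceeded("mandatory system content alone exceeds budget")
    kept = list(history)
    while kept and est(system_msg) + sum(est(m) for m in kept) > budget_tokens:
        kept.pop(0)  # drop oldest first
    return [system_msg] + kept
```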

Budget Enforcement

Budget enforcement uses soft limits: the manager tracks usage and warns when the budget is exceeded, but does not hard-truncate content.

manager = ContextManager(total_budget=8000, strict_budget=False)

# Add content that exceeds budget
manager.add_content('conversation', large_message_history)
manager.add_content('tools', debug_output)

# Check status
if manager.is_over_budget():
    usage = manager.get_usage_report()
    print(f"Over budget by {usage['overage']} tokens")
    print(f"Utilization: {usage['utilization']:.1%}")

# Optimize to fit within budget
result = manager.optimize_context(target_utilization=0.85)
if result.success:
    print(f"Optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Could not reach target: {result.final_utilization:.1%}")

Section Organization

Organize content into logical sections with individual budget allocations.

manager = ContextManager(total_budget=8000)

# Set section budgets
manager.set_section_budgets({
    'system': 1200,    # System prompts, rules
    'memory': 1800,    # Long-term memory items
    'conversation': 4000,  # Chat history
    'tools': 1000      # Tool outputs, debug info
})

# Add content to sections
manager.add_content('system', "You are a helpful assistant.", priority='critical')
manager.add_content('memory', recalled_memories, priority='important')
manager.add_content('conversation', chat_history, priority='normal')
manager.add_content('tools', tool_outputs, priority='optional')

# Check section usage
for section, usage in manager.get_section_usage().items():
    budget = manager.section_budgets[section]
    print(f"{section}: {usage}/{budget} tokens")

Crash-Safe Persistence

atomic_write_json

Write JSON files atomically to prevent corruption during crashes.

from antaris_context import atomic_write_json
import json
import time

data = {
    'context_state': manager.export_snapshot(),
    'timestamp': time.time(),
    'version': '4.2.0'
}

# Atomic write (temporary file + rename)
atomic_write_json('context_snapshot.json', data)

# Safe even if process crashes during write
try:
    with open('context_snapshot.json', 'r') as f:
        restored_data = json.load(f)
except json.JSONDecodeError:
    print("File was corrupted, but atomic write prevented partial state")
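
The pattern behind atomic_write_json can be sketched with the standard library alone. This is an illustration of the temporary-file-plus-rename technique, not the package's actual code:

```python
import json
import os
import tempfile

# Write to a temp file in the same directory, fsync, then rename over the
# target. os.replace is atomic on POSIX and Windows, so readers see either
# the old file or the complete new one, never a partial write.
def atomic_write_json_sketch(path: str, data) -> None:
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

The temp file must live on the same filesystem as the target, which is why it is created in the target's directory rather than in the system temp dir.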

Cross-Session Persistence

Save and restore context state between application sessions.

# Save snapshot with importance filtering
snapshot = manager.export_snapshot(include_importance_above=0.3)

# Restore from snapshot
new_manager = ContextManager.from_snapshot(snapshot)

# Named snapshots for specific states
manager.save_snapshot("pre_optimization")
manager.save_snapshot("post_compression")

# Restore named snapshot
manager.restore_snapshot("pre_optimization")

# List available snapshots
snapshots = manager.list_snapshots()
for name in snapshots:
    print(f"Snapshot: {name}")

Configuration

# JSON configuration
config = {
    "total_budget": 8000,
    "compression_level": "moderate",
    "strategy": "hybrid",
    "strategy_params": {
        "recency_weight": 0.4,
        "relevance_weight": 0.6
    },
    "section_budgets": {
        "system": 1000,
        "memory": 2000,
        "conversation": 4000,
        "tools": 1000
    },
    "auto_optimize": True,
    "target_utilization": 0.85
}

manager = ContextManager.from_config(config)

# Save current configuration
current_config = manager.export_config()
atomic_write_json('context_config.json', current_config)

Complete Example

from antaris_context import (
    ContextManager, MessageCompressor, ContextProfiler, 
    atomic_write_json
)

# Initialize with profiling
profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)
compressor = MessageCompressor('moderate')

# Set strategy and budgets
manager.set_strategy('hybrid', recency_weight=0.3, relevance_weight=0.7)
manager.set_section_budgets({
    'system': 1200,
    'memory': 1800,
    'conversation': 4000,
    'tools': 1000
})

# Add system prompt (critical priority)
manager.add_content('system', 
    "You are a Python coding assistant. Provide working examples.",
    priority='critical'
)

# Add conversation history with compression
chat_history = load_chat_history()
compressed_history = compressor.compress_message_list(
    chat_history, 
    max_content_length=300
)
manager.add_content('conversation', compressed_history, priority='normal')

# Add relevant memories
memories = load_memories()
manager.add_content('memory', memories, 
    query="Python authentication JWT", 
    priority='important'
)

# Process current query
current_query = "How do I add JWT authentication to Flask?"
manager.add_turn("user", current_query)

# Optimize context
profiler.start_session("query_processing")
result = manager.optimize_context(
    query=current_query, 
    target_utilization=0.85
)
profiler.record_usage(manager)

if result.success:
    # Render for LLM
    messages = manager.render_messages(format='openai')
    
    # Process with LLM (not included in this library)
    # response = openai_client.chat.completions.create(...)
    
    # Add response and save state
    manager.add_turn("assistant", "Use Flask-JWT-Extended...")
    
    # Save snapshot
    snapshot = manager.export_snapshot()
    atomic_write_json('session_state.json', snapshot)
    
    print(f"Context optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Optimization incomplete: {result.final_utilization:.1%} utilization")

# Get profiling results
analysis = profiler.analyze_patterns()
print(f"Session efficiency: {analysis['efficiency_score']:.2f}")

Token Estimation

Token counts are estimated with a character-based approximation (roughly 4 characters per token) for fast budget calculations. For exact counts, plug in your model's tokenizer:

import tiktoken

# Optional: plug in exact tokenizer
enc = tiktoken.encoding_for_model("gpt-4")
manager._estimate_tokens = lambda text: len(enc.encode(text))

# Default approximation is sufficient for budget management
estimated = manager._estimate_tokens("Hello world")  # ~3 tokens
actual = len(enc.encode("Hello world"))  # 2 tokens (exact)

Performance Characteristics

  • Token estimation: ~100,000 characters/second
  • Message compression: ~50,000 characters/second
  • Strategy selection: ~10,000 messages/second
  • Context optimization: ~1,000 content items/second
  • Memory usage: Linear with content size
  • CPU usage: O(n log n) for relevance ranking, O(n) for other operations

Limitations

  • Token estimation is approximate: Use actual tokenizer for exact counts
  • No LLM calls: Compression is structural, not semantic (unless using pluggable summarizer)
  • Single-threaded: Not designed for concurrent access
  • Memory bound: All content held in memory during processing
  • No distributed contexts: Manages single context windows only
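
To make the first two limitations concrete: structural compression of the kind compress_tool_output performs can be approximated in a few lines. This illustrative sketch keeps the first and last lines and elides the middle, with no understanding of the content:

```python
# Structural (not semantic) truncation: keep the head and tail of a long
# output and replace the middle with an elision marker, mirroring the
# keep_first/keep_last parameters shown earlier.
def truncate_middle(text: str, max_lines: int = 50,
                    keep_first: int = 10, keep_last: int = 10) -> str:
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - keep_first - keep_last
    return "\n".join(
        lines[:keep_first]
        + [f"... ({omitted} lines omitted) ..."]
        + lines[-keep_last:]
    )
```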

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite/antaris-context
python -m pytest tests/ -v --cov=antaris_context

150 tests, 95% coverage, zero external dependencies.

Integration with Antaris Suite

# With antaris-memory
from antaris_memory import MemoryClient
memory_client = MemoryClient()
manager.set_memory_client(memory_client)

# With antaris-router  
from antaris_router import Router
router = Router()
hints = router.get_routing_hints(query)
manager.set_router_hints(hints)

# With antaris-guard
from antaris_guard import ContentFilter
content_filter = ContentFilter()
safe_content = content_filter.scan(content)
manager.add_content('conversation', safe_content)

License

Apache 2.0 License with explicit patent grant clause.
