Context window optimization for AI agents. Zero dependencies.

antaris-context

Context window management and token budget tracking for AI applications. Provides message compression, multiple selection strategies, and budget enforcement, all with zero external dependencies.

pip install antaris-context

Version: 4.2.0
Dependencies: None (stdlib only)
Python: 3.9+

Core Components

ContextManager

Primary interface for context window management with token budget tracking.

from antaris_context import ContextManager

# Basic usage
manager = ContextManager(total_budget=8000)
manager.add_turn("user", "How do I implement JWT auth?")
manager.add_turn("assistant", "Use the following approach...")

print(f"Token usage: {manager.get_total_used()}/{manager.total_budget}")
print(f"Over budget: {manager.is_over_budget()}")

# Get detailed usage report
report = manager.get_usage_report()
print(f"Utilization: {report['utilization']:.1%}")

ContextWindow

Lower-level context window with turn-based operations.

from antaris_context import ContextWindow

window = ContextWindow(budget=4000)
window.add_turn("user", "Write a Python function")
window.add_turn("assistant", "def process_data():\n    return data.upper()")

total_used = window.get_total_used()
over_budget = window.is_over_budget()
usage = window.get_usage_report()

MessageCompressor

Compress messages and tool outputs to fit within token limits.

from antaris_context import MessageCompressor

compressor = MessageCompressor(level='moderate')

# Compress message list
messages = [
    {"role": "user", "content": "Long user message..."},
    {"role": "assistant", "content": "Long response..."}
]
compressed = compressor.compress_message_list(messages, max_content_length=500)

# Compress tool output
long_output = "log line\n" * 10000
compressed_output = compressor.compress_tool_output(
    long_output, 
    max_lines=50, 
    keep_first=10, 
    keep_last=10
)

# Get compression statistics
stats = compressor.get_compression_stats()
print(f"Compression ratio: {stats['compression_ratio']:.2f}")
print(f"Bytes saved: {stats['bytes_saved']}")

Selection Strategies

ContextStrategy (Base)

Abstract base for all selection strategies.
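
Since the base class's interface is not documented above, the following is only a conceptual sketch of what a strategy could look like. The method names (`score`, `select`) and the item shape are assumptions for illustration, not the library's actual API.

```python
from abc import ABC, abstractmethod

# Illustrative sketch only: method names and item shape are assumptions,
# not the documented ContextStrategy interface.
class StrategySketch(ABC):
    @abstractmethod
    def score(self, item: dict) -> float:
        """Higher-scoring items are kept first when trimming to budget."""

    def select(self, items: list, budget: int) -> list:
        # Greedily keep the highest-scoring items that fit the token budget.
        chosen, used = [], 0
        for item in sorted(items, key=self.score, reverse=True):
            if used + item["tokens"] <= budget:
                chosen.append(item)
                used += item["tokens"]
        return chosen

class NewestFirst(StrategySketch):
    def score(self, item: dict) -> float:
        return item["timestamp"]  # recency-style scoring
```

In this shape, each concrete strategy only defines how an item is scored; budget-constrained selection is shared by the base class.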

RecencyStrategy

Prioritize recent content over older content.

from antaris_context import ContextManager

manager = ContextManager(total_budget=8000)
manager.set_strategy('recency', prefer_high_priority=True)

# Add content with priorities
manager.add_content('conversation', old_messages, priority='normal')
manager.add_content('conversation', recent_messages, priority='important')

# Recent content is kept when budget is exceeded
manager.optimize_context()

RelevanceStrategy

Select content based on semantic relevance to a query.

manager.set_strategy('relevance')

# Add content with relevance query
manager.add_content('memory', memory_items, query="authentication JWT")

# Content relevant to the query is prioritized
result = manager.optimize_context(query="JWT authentication Flask")

HybridStrategy

Combine recency and relevance scoring.

manager.set_strategy('hybrid', recency_weight=0.4, relevance_weight=0.6)

manager.add_content('conversation', messages, query="JWT Flask")
result = manager.optimize_context(
    query="JWT authentication", 
    target_utilization=0.85
)
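
The weighted blend can be pictured in plain Python. This is a conceptual sketch of hybrid scoring, not the library's internals; it assumes both inputs are normalized to [0, 1].

```python
# Conceptual sketch: each candidate gets a weighted blend of a recency
# score and a relevance score, and the highest-scoring items survive
# trimming. Inputs are assumed normalized to [0, 1].
def hybrid_score(recency: float, relevance: float,
                 recency_weight: float = 0.4,
                 relevance_weight: float = 0.6) -> float:
    return recency_weight * recency + relevance_weight * relevance

# With relevance weighted higher, an old but highly relevant item
# outranks a new but irrelevant one:
old_relevant = hybrid_score(recency=0.1, relevance=0.9)    # 0.58
new_irrelevant = hybrid_score(recency=0.9, relevance=0.1)  # 0.42
```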

BudgetStrategy

Allocate content based on section budget limits.

manager.set_strategy('budget', approach='balanced')

# Set section budgets
manager.set_section_budgets({
    'system': 1000,
    'memory': 2000,
    'conversation': 4000,
    'tools': 1000
})

manager.optimize_context()

Advanced Compression

ImportanceWeightedCompressor

Preserve high-importance content during compression.

from antaris_context import ImportanceWeightedCompressor, CompressionResult

compressor = ImportanceWeightedCompressor(
    keep_top_n=5,
    compress_middle=True,
    drop_threshold=0.1
)

# Compress with importance scores
content_items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation", "importance": 0.5},
    {"text": "Debug output", "importance": 0.2}
]

result: CompressionResult = compressor.compress(content_items, target_size=2000)
print(f"Items kept: {result.items_kept}")
print(f"Items compressed: {result.items_compressed}")
print(f"Items dropped: {result.items_dropped}")
print(f"Final size: {result.final_size}")

SemanticChunker

Split text at sentence boundaries with configurable overlap.

from antaris_context import SemanticChunker, SemanticChunk

chunker = SemanticChunker(
    min_chunk_size=100,
    max_chunk_size=500,
    overlap_sentences=2
)

chunks: list[SemanticChunk] = chunker.chunk(long_text)
for chunk in chunks:
    print(f"Chunk {chunk.index}: {len(chunk.text)} chars")
    print(f"Sentences: {chunk.sentence_count}")
    print(f"Overlap: {chunk.overlap_size}")
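
The mechanics can be approximated with the standard library. This is a simplified illustration of sentence-boundary chunking with overlap, not SemanticChunker's actual implementation (which presumably handles abbreviations and other edge cases):

```python
import re

# Simplified sketch: split on sentence-ending punctuation, pack sentences
# into chunks up to max_chars, and carry the last `overlap` sentences of
# each chunk into the next one.
def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 2) -> list:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for s in sentences:
        if current and sum(len(x) for x in current) + len(s) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # overlap carried forward
        current.append(s)
    if current:
        chunks.append(" ".join(current))
    return chunks
```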

Context Profiling

ContextProfiler

Analyze token usage patterns and identify optimization opportunities.

from antaris_context import ContextProfiler

profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)

# Track usage over multiple operations
profiler.start_session("conversation_flow")
manager.add_turn("user", "Question 1")
manager.add_turn("assistant", "Answer 1")
profiler.record_usage(manager)

manager.add_turn("user", "Question 2")
manager.optimize_context()
profiler.record_usage(manager)

# Get analysis
analysis = profiler.analyze_patterns()
print(f"Average utilization: {analysis['avg_utilization']:.1%}")
print(f"Peak usage: {analysis['peak_usage']} tokens")

for section, stats in analysis['section_stats'].items():
    print(f"{section}: avg={stats['avg_tokens']}, max={stats['max_tokens']}")

# Get recommendations
recommendations = profiler.get_optimization_recommendations()
for rec in recommendations:
    print(f"- {rec['description']} (impact: {rec['impact']})")

Hard Budget Enforcement (v4.2.0)

render_hard_limited(budget_tokens) enforces a strict token ceiling. It trims content to fit within the budget and raises ContextBudgetExceeded if fitting is impossible (e.g., mandatory system content alone exceeds the budget).

from antaris_context import ContextManager, ContextBudgetExceeded

ctx = ContextManager(total_budget=1000)
try:
    messages = ctx.render_hard_limited(budget_tokens=500)
except ContextBudgetExceeded as e:
    print(f"Over budget: {e.used} tokens used, {e.budget} budget")

Unlike optimize_context(), which uses soft limits and may return over-budget results, render_hard_limited() guarantees the returned message list never exceeds budget_tokens. Use it when you must not exceed a model's context limit.
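
The trimming behavior can be pictured roughly as follows. This is an illustrative sketch, not the library's internals; `BudgetExceeded` and `hard_limit` are hypothetical names standing in for ContextBudgetExceeded and render_hard_limited, and the 4-chars-per-token estimate mirrors the library's default.

```python
class BudgetExceeded(Exception):
    pass

# Sketch of hard-limit trimming: drop the oldest optional messages until
# the estimate fits, and fail loudly if mandatory content alone exceeds
# the ceiling.
def hard_limit(system_msg: str, history: list, budget_tokens: int,
               est=lambda s: len(s) // 4) -> list:
    if est(system_msg) > budget_tokens:
        raise BudgetExceeded("mandatory system content alone exceeds budget")
    kept = list(history)
    while kept and est(system_msg) + sum(est(m) for m in kept) > budget_tokens:
        kept.pop(0)  # drop oldest first
    return [system_msg] + kept
```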

Budget Enforcement

Budget enforcement uses soft limits: the manager tracks usage and warns when the budget is exceeded, but does not hard-truncate content.

manager = ContextManager(total_budget=8000, strict_budget=False)

# Add content that exceeds budget
manager.add_content('conversation', large_message_history)
manager.add_content('tools', debug_output)

# Check status
if manager.is_over_budget():
    usage = manager.get_usage_report()
    print(f"Over budget by {usage['overage']} tokens")
    print(f"Utilization: {usage['utilization']:.1%}")

# Optimize to fit within budget
result = manager.optimize_context(target_utilization=0.85)
if result.success:
    print(f"Optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Could not reach target: {result.final_utilization:.1%}")

Section Organization

Organize content into logical sections with individual budget allocations.

manager = ContextManager(total_budget=8000)

# Set section budgets
manager.set_section_budgets({
    'system': 1200,    # System prompts, rules
    'memory': 1800,    # Long-term memory items
    'conversation': 4000,  # Chat history
    'tools': 1000      # Tool outputs, debug info
})

# Add content to sections
manager.add_content('system', "You are a helpful assistant.", priority='critical')
manager.add_content('memory', recalled_memories, priority='important')
manager.add_content('conversation', chat_history, priority='normal')
manager.add_content('tools', tool_outputs, priority='optional')

# Check section usage
for section, usage in manager.get_section_usage().items():
    budget = manager.section_budgets[section]
    print(f"{section}: {usage}/{budget} tokens")

Crash-Safe Persistence

atomic_write_json

Write JSON files atomically to prevent corruption during crashes.

from antaris_context import atomic_write_json
import json
import time

data = {
    'context_state': manager.export_snapshot(),
    'timestamp': time.time(),
    'version': '4.2.0'
}

# Atomic write (temporary file + rename)
atomic_write_json('context_snapshot.json', data)

# Safe even if process crashes during write
try:
    with open('context_snapshot.json', 'r') as f:
        restored_data = json.load(f)
except json.JSONDecodeError:
    print("File was corrupted, but atomic write prevented partial state")
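
The pattern behind atomic_write_json can be sketched with the standard library alone. This is an illustration of the temporary-file-plus-rename technique, not the package's actual code:

```python
import json
import os
import tempfile

# Write to a temp file in the same directory, fsync, then rename over the
# target. os.replace is atomic on POSIX and Windows, so readers see either
# the old file or the complete new one, never a partial write.
def atomic_write_json_sketch(path: str, data) -> None:
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

The temp file must live on the same filesystem as the target, which is why it is created in the target's directory rather than in the system temp dir.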

Cross-Session Persistence

Save and restore context state between application sessions.

# Save snapshot with importance filtering
snapshot = manager.export_snapshot(include_importance_above=0.3)

# Restore from snapshot
new_manager = ContextManager.from_snapshot(snapshot)

# Named snapshots for specific states
manager.save_snapshot("pre_optimization")
manager.save_snapshot("post_compression")

# Restore named snapshot
manager.restore_snapshot("pre_optimization")

# List available snapshots
snapshots = manager.list_snapshots()
for name in snapshots:
    print(f"Snapshot: {name}")

Configuration

# JSON configuration
config = {
    "total_budget": 8000,
    "compression_level": "moderate",
    "strategy": "hybrid",
    "strategy_params": {
        "recency_weight": 0.4,
        "relevance_weight": 0.6
    },
    "section_budgets": {
        "system": 1000,
        "memory": 2000,
        "conversation": 4000,
        "tools": 1000
    },
    "auto_optimize": True,
    "target_utilization": 0.85
}

manager = ContextManager.from_config(config)

# Save current configuration
current_config = manager.export_config()
atomic_write_json('context_config.json', current_config)

Complete Example

from antaris_context import (
    ContextManager, MessageCompressor, ContextProfiler, 
    atomic_write_json
)

# Initialize with profiling
profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)
compressor = MessageCompressor('moderate')

# Set strategy and budgets
manager.set_strategy('hybrid', recency_weight=0.3, relevance_weight=0.7)
manager.set_section_budgets({
    'system': 1200,
    'memory': 1800,
    'conversation': 4000,
    'tools': 1000
})

# Add system prompt (critical priority)
manager.add_content('system', 
    "You are a Python coding assistant. Provide working examples.",
    priority='critical'
)

# Add conversation history with compression
chat_history = load_chat_history()
compressed_history = compressor.compress_message_list(
    chat_history, 
    max_content_length=300
)
manager.add_content('conversation', compressed_history, priority='normal')

# Add relevant memories
memories = load_memories()
manager.add_content('memory', memories, 
    query="Python authentication JWT", 
    priority='important'
)

# Process current query
current_query = "How do I add JWT authentication to Flask?"
manager.add_turn("user", current_query)

# Optimize context
profiler.start_session("query_processing")
result = manager.optimize_context(
    query=current_query, 
    target_utilization=0.85
)
profiler.record_usage(manager)

if result.success:
    # Render for LLM
    messages = manager.render_messages(format='openai')
    
    # Process with LLM (not included in this library)
    # response = openai_client.chat.completions.create(...)
    
    # Add response and save state
    manager.add_turn("assistant", "Use Flask-JWT-Extended...")
    
    # Save snapshot
    snapshot = manager.export_snapshot()
    atomic_write_json('session_state.json', snapshot)
    
    print(f"Context optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Optimization incomplete: {result.final_utilization:.1%} utilization")

# Get profiling results
analysis = profiler.analyze_patterns()
print(f"Session efficiency: {analysis['efficiency_score']:.2f}")

Token Estimation

Token counts are estimated with a character-based approximation (roughly 4 characters per token) for fast budget calculations. For exact counts, plug in your model's tokenizer:

import tiktoken

# Optional: plug in exact tokenizer
enc = tiktoken.encoding_for_model("gpt-4")
manager._estimate_tokens = lambda text: len(enc.encode(text))

# Default approximation is sufficient for budget management
estimated = manager._estimate_tokens("Hello world")  # ~3 tokens
actual = len(enc.encode("Hello world"))  # 2 tokens (exact)

Performance Characteristics

  • Token estimation: ~100,000 characters/second
  • Message compression: ~50,000 characters/second
  • Strategy selection: ~10,000 messages/second
  • Context optimization: ~1,000 content items/second
  • Memory usage: Linear with content size
  • CPU usage: O(n log n) for relevance ranking, O(n) for other operations

Limitations

  • Token estimation is approximate: Use actual tokenizer for exact counts
  • No LLM calls: Compression is structural, not semantic (unless using pluggable summarizer)
  • Single-threaded: Not designed for concurrent access
  • Memory bound: All content held in memory during processing
  • No distributed contexts: Manages single context windows only
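
To make the first two limitations concrete: structural compression of the kind compress_tool_output performs can be approximated in a few lines. This illustrative sketch keeps the first and last lines and elides the middle, with no understanding of the content:

```python
# Structural (not semantic) truncation: keep the head and tail of a long
# output and replace the middle with an elision marker, mirroring the
# keep_first/keep_last parameters shown earlier.
def truncate_middle(text: str, max_lines: int = 50,
                    keep_first: int = 10, keep_last: int = 10) -> str:
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - keep_first - keep_last
    return "\n".join(
        lines[:keep_first]
        + [f"... ({omitted} lines omitted) ..."]
        + lines[-keep_last:]
    )
```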

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite/antaris-context
python -m pytest tests/ -v --cov=antaris_context

150 tests, 95% coverage, zero external dependencies.

Integration with Antaris Suite

# With antaris-memory
from antaris_memory import MemoryClient
memory_client = MemoryClient()
manager.set_memory_client(memory_client)

# With antaris-router  
from antaris_router import Router
router = Router()
hints = router.get_routing_hints(query)
manager.set_router_hints(hints)

# With antaris-guard
from antaris_guard import ContentFilter
content_filter = ContentFilter()
safe_content = content_filter.scan(content)
manager.add_content('conversation', safe_content)

License

Apache 2.0 License with explicit patent grant clause.
