
Context Window Manager

Production-ready LLM context window optimization and management. Automatically handles token counting, message pruning, and content compression to keep conversations within context limits.

Features

  • Token Counting: Accurate token counting with tiktoken or approximate fallback
  • Automatic Pruning: Multiple strategies (FIFO, relevance, recency-weighted, etc.)
  • Content Compression: Truncation, summarization, bullet point extraction
  • Priority System: Pin important messages, set priorities for retention
  • Model Presets: Built-in configs for GPT-4, GPT-4o, GPT-3.5, Claude
  • Usage Statistics: Track token usage and pruning operations
  • Zero Dependencies Core: Works without tiktoken (with approximation)

Installation

pip install context-window-manager           # Core (approximate counting)
pip install context-window-manager[tiktoken] # Accurate token counting
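The two install variants correspond to two counting modes. A minimal sketch of how such a fallback might work, assuming a ~4 characters-per-token heuristic (the library's actual approximation may differ):

```python
def approx_token_count(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text (assumption)."""
    return max(1, len(text) // 4)

def count_tokens(text: str) -> int:
    """Use tiktoken when installed, otherwise fall back to the approximation."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return approx_token_count(text)

tokens = count_tokens("Hello, how are you?")
```

The approximation overestimates for code and non-English text, so treat it as a budget heuristic rather than an exact count.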

Quick Start

Basic Usage

from context_window_manager import create_manager

# Create manager for GPT-4
manager = create_manager("gpt-4")

# Set system message
manager.set_system_message("You are a helpful assistant.")

# Add messages
manager.add_message("user", "Hello, how are you?")
manager.add_message("assistant", "I'm doing well, thank you!")
manager.add_message("user", "Tell me about Python.")

# Get messages for API call
messages = manager.get_messages()

# Check budget
budget = manager.get_budget()
print(f"Used: {budget.used_tokens}/{budget.total_tokens}")

Automatic Pruning

from context_window_manager import ContextWindowManager, WindowConfig, PruningStrategy

config = WindowConfig(
    max_tokens=16000,
    pruning_strategy=PruningStrategy.RECENCY_WEIGHTED,
    pruning_threshold=0.85,  # Start pruning at 85% utilization
    preserve_recent_turns=2   # Always keep last 2 exchanges
)

manager = ContextWindowManager(config)

# Messages are automatically pruned when threshold is reached
for i in range(100):
    manager.add_message("user", f"Message {i}: " + "x" * 500)
    manager.add_message("assistant", f"Response {i}: " + "y" * 500)

print(f"Messages: {manager.conversation.message_count}")
print(f"Utilization: {manager.conversation.utilization:.1%}")

Manual Pruning

from context_window_manager import PruningStrategy

# Prune to specific token count
result = manager.prune(target_tokens=8000)
print(f"Removed {result.removed_messages} messages")
print(f"Saved {result.tokens_saved} tokens")

# Prune with specific strategy
result = manager.prune(strategy=PruningStrategy.RELEVANCE)

Message Priorities

from context_window_manager import Priority

# Add important message
manager.add_message("user", "Critical instruction", priority=Priority.CRITICAL)

# Pin a message (never pruned)
manager.pin_message(0)

# Set priority after creation
manager.set_priority(1, Priority.HIGH)

Content Compression

from context_window_manager import CompressionMethod

# Compress a specific message
result = manager.compress_message(
    message_index=5,
    target_tokens=100,
    method=CompressionMethod.BULLET_POINTS
)

print(f"Compressed from {result.original_tokens} to {result.compressed_tokens}")

Conversation Buffer

from context_window_manager import ConversationBuffer

# Simple buffer with limits
buffer = ConversationBuffer(
    max_tokens=8000,
    max_messages=50
)

buffer.add("user", "Hello")
buffer.add("assistant", "Hi there!")

messages = buffer.get_messages()
print(f"Tokens: {buffer.token_count}")

Model Configurations

from context_window_manager import ModelConfig, ContextWindowManager, TokenizerType

# Use preset
config = ModelConfig.gpt4o()
manager = ContextWindowManager(model_config=config)

# Or create custom
custom_config = ModelConfig(
    name="custom-model",
    max_context_tokens=32000,
    max_output_tokens=4096,
    tokenizer=TokenizerType.TIKTOKEN_CL100K
)

Pruning Strategies

Strategy          Description
FIFO              Remove oldest messages first
LIFO              Remove newest messages first (except recent turns)
SLIDING_WINDOW    Keep only the most recent N messages
RELEVANCE         Remove messages with the lowest relevance score
IMPORTANCE        Remove messages with the lowest importance score
RECENCY_WEIGHTED  Combine recency and relevance scores
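To make RECENCY_WEIGHTED concrete, here is a hypothetical scoring sketch: blend a message's relevance with an exponential recency decay, then prune the lowest-scoring messages first. The decay rate and blend weight are illustrative assumptions, not the library's actual values.

```python
import math

def recency_weighted_score(relevance: float, age: int,
                           decay: float = 0.1, w_recency: float = 0.5) -> float:
    """Higher score = keep longer; lower score = prune first.

    age is measured in turns since the message was sent (assumption).
    """
    recency = math.exp(-decay * age)
    return w_recency * recency + (1 - w_recency) * relevance

# An old, low-relevance message scores lowest and is pruned first.
scores = {
    "recent_relevant": recency_weighted_score(0.9, age=0),
    "old_relevant": recency_weighted_score(0.9, age=20),
    "old_irrelevant": recency_weighted_score(0.2, age=20),
}
```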

Compression Methods

Method            Description
TRUNCATE          Cut content at a sentence boundary
BULLET_POINTS     Extract key sentences as bullets
EXTRACT_KEY_INFO  Keep the first and last paragraphs
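As an illustration of what TRUNCATE means by "sentence boundary", here is a sketch that cuts text to a token budget and backs up to the last complete sentence. The ~4 chars-per-token conversion is an assumption; the real implementation may differ.

```python
def truncate_at_sentence(text: str, target_tokens: int) -> str:
    """Trim text to roughly target_tokens, preferring a full-sentence cut."""
    budget_chars = target_tokens * 4  # rough chars-per-token assumption
    if len(text) <= budget_chars:
        return text
    cut = text[:budget_chars]
    last_period = cut.rfind(". ")
    if last_period != -1:
        return cut[:last_period + 1]  # keep up to the last full sentence
    return cut.rstrip() + "..."
```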

API Reference

ContextWindowManager

manager = ContextWindowManager(config, model_config)

# Add messages
manager.add_message(role, content, **kwargs)
manager.set_system_message(content)

# Get messages
messages = manager.get_messages()

# Budget management
budget = manager.get_budget()
snapshot = manager.get_snapshot()

# Pruning
result = manager.prune(target_tokens, strategy)

# Compression
result = manager.compress_message(index, target_tokens, method)

# Message management
manager.pin_message(index)
manager.set_priority(index, priority)
manager.clear()

# Utilities
fits = manager.fits(content)
tokens = manager.tokens_for(content)

WindowConfig

config = WindowConfig(
    max_tokens=128000,
    reserved_output_tokens=4096,
    max_history_ratio=0.7,
    pruning_strategy=PruningStrategy.RECENCY_WEIGHTED,
    compression_method=CompressionMethod.NONE,
    tokenizer_type=TokenizerType.TIKTOKEN_CL100K,
    min_messages_to_keep=2,
    always_keep_system=True,
    preserve_recent_turns=2,
    pruning_threshold=0.85,
    on_prune=callback_function  # optional: invoked after each pruning operation
)
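A rough sketch of how the fields above might carve up the context window: reserve output tokens off the top, then cap history at a ratio of what remains. The exact formula is an assumption for illustration.

```python
max_tokens = 128_000
reserved_output_tokens = 4_096
max_history_ratio = 0.7

# Tokens available for input after reserving room for the model's reply.
available = max_tokens - reserved_output_tokens

# Portion of the input budget allowed for past conversation turns;
# the rest is left for the system message and the current turn.
history_budget = int(available * max_history_ratio)
prompt_budget = available - history_budget
```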

License

MIT License - Pranay M
