
Context window management utilities for LLM-based applications

Project description

harnessutils

Python library for managing LLM context windows in long-running conversations. Enables indefinite conversation length while staying within token limits.

Installation

uv add harness-utils

Features

  • Three-tier context management - Truncation, pruning, and LLM-powered summarization
  • Turn processing - Stream event handling with hooks and doom loop detection
  • Message lifecycle hooks - Pre/post hooks on add_message() for guardrails, redaction, audit logging
  • Semantic memory protocol - Plug in your own vector store via SemanticMemoryBackend
  • Workspace management - Stable project UUID under .harness/ for cross-session identity
  • Pluggable storage - Filesystem and in-memory backends
  • Zero dependencies - No external runtime requirements
  • Type-safe - Full Python 3.12+ type hints

Quick Start

from harnessutils import ConversationManager, Message, TextPart, generate_id

manager = ConversationManager()
conv = manager.create_conversation()

# Add message
msg = Message(id=generate_id("msg"), role="user")
msg.add_part(TextPart(text="Help me debug"))
manager.add_message(conv.id, msg)

# Prune old outputs
manager.prune_before_turn(conv.id)

# Get messages for LLM
model_messages = manager.to_model_format(conv.id)

Context Management

Three tiers handle context overflow:

1. Truncation - Limits tool output size (instant, free)

output = manager.truncate_tool_output(large_output, "tool_name")

2. Pruning - Removes old tool outputs (fast, ~50ms)

result = manager.prune_before_turn(conv.id)
# Keeps recent 40K tokens, removes older outputs

3. Summarization - LLM compression when needed (slow, ~3-5s)

if manager.needs_compaction(conv.id, usage):
    manager.compact(conv.id, llm_client, parent_msg_id)
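
The boundaries between tiers can be reasoned about numerically. The helper below is illustrative only (it is not a harnessutils API); it assumes the default budgets shown in the Configuration section (40K-token prune protection, 200K context limit) and an assumed compaction trigger at ~80% of the limit.

```python
# Illustrative tier selector -- NOT part of harnessutils.
# Assumes: 200K context limit, compaction at 80% usage (assumed
# threshold), pruning worthwhile once history exceeds the 40K-token
# protected window.
CONTEXT_LIMIT = 200_000
PRUNE_PROTECT = 40_000
COMPACT_RATIO = 0.8  # assumed trigger point

def choose_tier(history_tokens: int) -> str:
    """Pick the cheapest tier that can bring usage back under budget."""
    if history_tokens >= CONTEXT_LIMIT * COMPACT_RATIO:
        return "summarize"   # slow (~3-5s): LLM-powered compaction
    if history_tokens > PRUNE_PROTECT:
        return "prune"       # fast (~50ms): drop old tool outputs
    return "truncate"        # instant: per-output size caps only
```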

Turn Processing

Process streaming LLM responses with hooks:

from harnessutils import TurnProcessor, TurnHooks

hooks = TurnHooks(
    on_tool_call=execute_tool,
    on_doom_loop=handle_loop,
)

processor = TurnProcessor(message, hooks)
for event in llm_stream:
    processor.process_stream_event(event)

Includes:

  • Tool state machine
  • Doom loop detection (3 identical calls)
  • Snapshot tracking
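
The hook callbacks passed to TurnHooks are plain callables you supply. A minimal sketch of the two used above — note the tool-call payload shape and hook signatures here are assumptions for illustration, not the library's documented types:

```python
import json

# Hypothetical tool registry -- your application supplies the real tools.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def execute_tool(call: dict):
    """Dispatch a tool call shaped like {"name": ..., "arguments": "<json>"}.

    The payload shape is an assumption; adapt to the events your
    stream actually emits.
    """
    args = json.loads(call["arguments"])
    return TOOLS[call["name"]](args)

def handle_loop(call: dict) -> None:
    """Invoked on doom loop detection (3 identical calls); break out
    however suits your application -- here, by raising."""
    raise RuntimeError(f"doom loop detected on tool {call['name']!r}")
```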

Message Hooks

Intercept every add_message() call with pre and post hooks:

from harnessutils import ConversationManager, MessageHooks
from harnessutils.models.message import Message

# Pre-hook: inspect, modify, or raise to reject
def guardrail(conv_id: str, msg: Message) -> Message:
    for part in msg.parts:
        if part.type == "text" and "ignore instructions" in part.text.lower():
            raise ValueError("Blocked: prompt injection attempt")
    return msg

# Post-hook: side effects after successful storage
def audit_log(conv_id: str, msg: Message) -> None:
    print(f"stored {msg.id} in {conv_id}")

manager = ConversationManager(
    message_hooks=MessageHooks(
        on_before_add_message=guardrail,
        on_after_add_message=audit_log,
    )
)

See docs/message-hooks.md for the full guide including PII redaction, semantic memory indexing, Prometheus metrics, and hook execution order.
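
As a taste of the redaction use case, a pre-hook can rewrite text parts before storage. Only the redaction function itself is shown below as plain Python; how you apply it to each TextPart inside an on_before_add_message hook follows the guardrail pattern above. The regex is a deliberately simple sketch, not production-grade PII detection:

```python
import re

# Minimal email redactor for use inside an on_before_add_message hook.
# The pattern is intentionally simple; a real hook would cover more
# PII categories (phone numbers, credit cards, ...).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Replace anything email-shaped with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)
```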

Configuration

from harnessutils import HarnessConfig

config = HarnessConfig()
config.truncation.max_lines = 2000
config.pruning.prune_protect = 40_000  # Keep recent 40K tokens
config.model_limits.default_context_limit = 200_000

Storage

from harnessutils import FilesystemStorage, MemoryStorage

# Filesystem (production)
storage = FilesystemStorage(config.storage)

# In-memory (testing)
storage = MemoryStorage()

# Custom (implement StorageBackend protocol)
# See examples/custom_storage_example.py
storage = YourCustomStorage()

Examples

  • basic_usage.py - Simple conversation
  • ollama_example.py - Ollama integration
  • ollama_with_summarization.py - Full three-tier demo
  • turn_processing_example.py - Stream processing
  • custom_storage_example.py - Custom storage adapter (SQLite)

Development

uv sync                          # Install deps
uv run pytest                    # Run unit tests
uv run mypy src/                 # Type check
uv run python -m evals.runner    # Run evals (quality, budget, performance)

Evals test real-world behavior beyond unit tests:

  • Information preservation after compaction
  • Token budget compliance
  • Performance benchmarks (latency, throughput)

See evals/README.md for details.

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

harness_utils-1.2.2.tar.gz (473.4 kB)

Built Distribution

harness_utils-1.2.2-py3-none-any.whl (70.8 kB)

File details

Details for the file harness_utils-1.2.2.tar.gz.

File metadata

  • Download URL: harness_utils-1.2.2.tar.gz
  • Upload date:
  • Size: 473.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.17

File hashes

Hashes for harness_utils-1.2.2.tar.gz
Algorithm Hash digest
SHA256 fbca1c4f3d652126e505a1edc50bc594fd262914f7dc8edfd561e44633cbfe47
MD5 f71c3c1c198da5aa948c0b044cb1e9bd
BLAKE2b-256 84b6e2dc8b649a1919aea7dd717932f8b37d15268fc4c30071bdf63ccb6727e5

See more details on using hashes here.

File details

Details for the file harness_utils-1.2.2-py3-none-any.whl.

File hashes

Hashes for harness_utils-1.2.2-py3-none-any.whl:

  • SHA256: 14551b8354da98a03f88f14936e83b503b4a23821de3eba4837bf2f680ac3c7e
  • MD5: f213e44ba2733af8567d76c27760ec6a
  • BLAKE2b-256: 33acdb00cb7a9c72b410eb3cf3b5ff4b46beec8ec47dea760255df1cc5275ea7
