
HookedLLM

Async-first, scoped hook system for LLM observability with SOLID/DI architecture

Python 3.10+ · MIT License · Documentation

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

✨ Key Features

  • 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
  • 🔧 SOLID/DI Compliant: Full dependency injection support for testing and customization
  • 📦 Minimal Surface: Single import, simple API: import hookedllm
  • ⚡ Async-First: Built for modern async LLM SDKs
  • 🎨 Type-Safe: Full type hints and IDE autocomplete support
  • 🛡️ Resilient: Hook failures never break your LLM calls
  • 🔀 Conditional Execution: Run hooks only when rules match (model, tags, metadata)
  • ⚙️ Config or Code: Define hooks programmatically or via YAML
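The resilience guarantee above (hook failures never break your LLM calls) comes down to running hooks inside a guard. This is a minimal stdlib sketch of that pattern, not hookedllm's actual internals:

```python
import asyncio

async def run_hooks_safely(hooks, *args):
    """Run each hook in order, collecting failures instead of raising them."""
    failures = []
    for hook in hooks:
        try:
            await hook(*args)
        except Exception as exc:  # a buggy hook must never break the LLM call
            failures.append(exc)
    return failures

async def good_hook(payload):
    payload["seen"] = True

async def bad_hook(payload):
    raise RuntimeError("hook bug")

payload = {}
failures = asyncio.run(run_hooks_safely([good_hook, bad_hook], payload))
```

The call's result (`payload` here) is untouched by the failing hook; the error is available for logging or alerting instead.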

🚀 Quick Start

Installation

# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With all optional dependencies
pip install hookedllm[all]

Basic Usage

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

📚 Core Concepts

Scopes

Scopes isolate hooks to specific parts of your application:

# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!

Hook Types

Four hook types cover the entire call lifecycle:

# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)

Conditional Rules

Execute hooks only when conditions match:

# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)

Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

Multiple Scopes

Clients can use multiple scopes:

hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate

🧪 Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

import hookedllm
from unittest.mock import Mock

def test_hook_execution():
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)
    
    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]
    
    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )
    
    # Test (FakeClient is any stub object exposing the provider's interface)
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")
    
    # Assert
    assert mock_executor.execute_after.called

🏗️ Architecture

HookedLLM follows SOLID principles with full dependency injection:

  • Single Responsibility: Separate storage, execution, and registry
  • Dependency Inversion: Depends on Protocol abstractions
  • Liskov Substitution: Any implementation of protocols works
  • Interface Segregation: Focused, minimal interfaces
  • Open/Closed: Extend via hooks and rules without modifying core

See ARCHITECTURE.md for detailed design documentation.
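The dependency-inversion point can be sketched with `typing.Protocol`: any object with the right shape can be injected in place of a default component. The method signature below is an illustrative assumption, not hookedllm's exact interface:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HookExecutorProtocol(Protocol):
    """Structural interface: anything with this shape can be injected."""
    async def execute_after(self, hooks, call_input, call_output, context): ...

class RecordingExecutor:
    """A test double that records invocations instead of running hooks."""
    def __init__(self):
        self.calls = []
    async def execute_after(self, hooks, call_input, call_output, context):
        self.calls.append((call_input, call_output))

executor = RecordingExecutor()
is_compatible = isinstance(executor, HookExecutorProtocol)
```

`RecordingExecutor` never inherits from anything in the library, yet satisfies the protocol structurally; that is what makes the Liskov-substitution and dependency-inversion bullets above testable in practice.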

📖 Advanced Usage

Custom Error Handling

import logging

logger = logging.getLogger("hookedllm.hooks")

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())

Evaluation Hook Example

async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)

Metrics Collection

metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)

Tags and Metadata

Pass tags and metadata to enable conditional hooks:

response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)
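A wrapper following this convention would strip the `hookedllm_*` keys before forwarding the request to the provider. A minimal sketch of that extraction (the key names are taken from the example above; the helper itself is hypothetical, not a hookedllm API):

```python
def split_hook_fields(extra_body):
    """Separate hookedllm_* keys from fields meant for the LLM provider."""
    body = dict(extra_body or {})
    tags = body.pop("hookedllm_tags", [])
    metadata = body.pop("hookedllm_metadata", {})
    return tags, metadata, body

tags, metadata, forwarded = split_hook_fields({
    "hookedllm_tags": ["production", "critical"],
    "hookedllm_metadata": {"user_id": "abc123"},
    "temperature": 0.2,
})
```

The tags and metadata feed the conditional rules and hook context, while the remaining fields (`temperature` here) are what the provider actually receives.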

🤝 Contributing

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.

📄 License

MIT License - see LICENSE file for details.

🔒 Security

Please see SECURITY.md for security policy and reporting vulnerabilities.

🙏 Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.

Download files

Source Distribution

  • hookedllm-0.1.0.tar.gz (24.6 kB)

Built Distribution

  • hookedllm-0.1.0-py3-none-any.whl (23.7 kB)

File details

Details for the file hookedllm-0.1.0.tar.gz.

File metadata

  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

  • SHA256: 46d3c4cb9ea0f25b38f87e1ed9185258805f625374e5e54d9c58b496f69298c7
  • MD5: 80b5683393046289062cb431bb2f0a34
  • BLAKE2b-256: 32be8939106deae6c2950f8dc6ae647920e52d18a099bf9dbd92c116c4405f3b

File details

Details for the file hookedllm-0.1.0-py3-none-any.whl.

File metadata

  • Size: 23.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

  • SHA256: b063a71ffdb6ec7ad0355a569ee8a983d6611cba9f32b3f685c981376ef78ea0
  • MD5: 3f4b8f5ce52d142d52d21f0a7daf11c9
  • BLAKE2b-256: 18599aa60f92ff733b3f2df64a35ecadd49de4ac0146859b31c7e5963e015858
