
HookedLLM

Async-first, scoped hook system for LLM observability with SOLID/DI architecture

Python 3.10+ | License: MIT | Documentation

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

✨ Key Features

  • 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
  • 🔧 SOLID/DI Compliant: Full dependency injection support for testing and customization
  • 📦 Minimal Surface: Single import, simple API: import hookedllm
  • ⚡ Async-First: Built for modern async LLM SDKs
  • 🎨 Type-Safe: Full type hints and IDE autocomplete support
  • 🛡️ Resilient: Hook failures never break your LLM calls
  • 🔀 Conditional Execution: Run hooks only when rules match (model, tags, metadata)
  • ⚙️ Config or Code: Define hooks programmatically or via YAML

🚀 Quick Start

Installation

# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With Anthropic/Claude support
pip install hookedllm[anthropic]

# With both OpenAI and Anthropic support
pip install hookedllm[openai,anthropic]

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install hookedllm[all]

Basic Usage

With OpenAI:

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

With Anthropic/Claude:

import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
)

📖 Examples

Explore the examples/ directory for complete, runnable demonstrations:

Getting Started

  • simple_demo.py - Your first hookedllm program

    • Complete working example with real LLM calls
    • Automatic metrics tracking with MetricsHook
    • Response evaluation with EvaluationHook
    • Perfect starting point for new users
  • basic_usage.py - Core concepts walkthrough

    • Simple hook registration
    • Scoped vs global hooks
    • Conditional rules with when
    • Multiple scope usage

Advanced Features

  • global_hooks_demo.py - Global hooks in action

    • 5 different LLM calls with global before/after hooks
    • Shows all data provided by the framework
    • Demonstrates hook execution flow
    • Metrics aggregation across calls
  • scopes_demo.py - Scope isolation deep dive

    • Prevents hook interference across contexts
    • Development vs production vs evaluation scopes
    • Multi-scope client usage
    • Real-world use case examples
  • evaluation_and_metrics.py - Built-in helpers

    • Using MetricsHook for automatic tracking
    • Using EvaluationHook for quality scoring
    • Conditional evaluation (only for specific models)
    • Multiple scope combinations

Integrations

  • integrations/langfuse_integration.py - Langfuse observability

    • Automatic trace and generation tracking
    • Token usage and cost monitoring
    • Error tracking with full context
    • Metadata enrichment
  • integrations/opentelemetry_integration.py - OpenTelemetry tracing

    • Distributed tracing for LLM calls
    • Semantic conventions for LLM observability
    • Span creation with attributes and events
    • Integration with existing OTel infrastructure

Running the Examples

# Install with OpenAI support
pip install -e .[openai]

# Or install with Anthropic support
pip install -e .[anthropic]

# Or install with both
pip install -e .[openai,anthropic]

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py

Each example includes:

  • ✅ Complete, runnable code
  • 📝 Detailed inline comments
  • 🚀 Setup instructions
  • 💡 Real-world use cases
  • 🎯 Best practices

📚 Core Concepts

Scopes

Scopes isolate hooks to specific parts of your application:

# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!
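
Conceptually, scope isolation is just a registry keyed by scope name: a wrapped client only ever looks up the hook lists for the scopes it opted into. A minimal, self-contained sketch of the idea (the names here are illustrative, not hookedllm's internals):

```python
from collections import defaultdict

# Hypothetical registry: scope name -> list of registered after-hooks.
_registry: dict[str, list] = defaultdict(list)

def register_after(scope_name: str, hook) -> None:
    """Attach a hook to a named scope."""
    _registry[scope_name].append(hook)

def hooks_for(scope_names: list[str]) -> list:
    """A client wrapped with these scopes sees only these scopes' hooks."""
    return [hook for name in scope_names for hook in _registry[name]]

register_after("evaluation", "evaluate_response")
register_after("production", "production_logger")

print(hooks_for(["evaluation"]))  # only evaluation's hooks, never production's
```

Because lookup happens per wrapped client, registering a hook in one scope can never affect clients wrapped with a different scope.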

Hook Types

Four hook types cover the entire call lifecycle:

# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)
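
Put together, the lifecycle these four hook types cover can be sketched as a plain async wrapper (illustrative only; hookedllm's real executor additionally handles scoping and keeps hook failures from breaking the call):

```python
import asyncio

async def run_with_hooks(call, before=(), after=(), error=(), finally_=()):
    """Illustrative lifecycle: before -> call -> after (or error) -> finally."""
    trace = []
    for hook in before:
        await hook(trace)
    try:
        await call()
    except Exception:
        for hook in error:
            await hook(trace)
    else:
        for hook in after:
            await hook(trace)
    finally:
        for hook in finally_:
            await hook(trace)
    return trace

def mark(label):
    # Test hook factory: records which lifecycle phase ran.
    async def hook(trace):
        trace.append(label)
    return hook

async def succeeds():
    pass

trace = asyncio.run(run_with_hooks(
    succeeds,
    before=[mark("before")],
    after=[mark("after")],
    error=[mark("error")],
    finally_=[mark("finally")],
))
print(trace)  # ['before', 'after', 'finally'] - error hooks skipped on success
```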

Conditional Rules

Execute hooks only when conditions match:

# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)
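
The `&`, `|`, and `~` composition above is ordinary Python operator overloading over predicate objects. A hypothetical stand-alone sketch of how such composable rules can be built (not hookedllm's actual `when` implementation):

```python
class Rule:
    """Wraps a predicate over a call description; composes with &, |, ~."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, call) -> bool:
        return self.fn(call)
    def __and__(self, other):
        return Rule(lambda call: self(call) and other(call))
    def __or__(self, other):
        return Rule(lambda call: self(call) or other(call))
    def __invert__(self):
        return Rule(lambda call: not self(call))

def model(name):
    return Rule(lambda call: call.get("model") == name)

def tag(name):
    return Rule(lambda call: name in call.get("tags", []))

rule = model("gpt-4") & tag("production") & ~tag("test")
print(rule({"model": "gpt-4", "tags": ["production"]}))          # True
print(rule({"model": "gpt-4", "tags": ["production", "test"]}))  # False
```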

Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

Multiple Scopes

Clients can use multiple scopes:

hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate

🧪 Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

import hookedllm
from unittest.mock import Mock

def test_hook_execution():
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)
    
    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]
    
    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )
    
    # Test
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")
    # (invoke the wrapped client here to trigger hook execution)
    
    # Assert
    assert mock_executor.execute_after.called

🏗️ Architecture

HookedLLM follows SOLID principles with full dependency injection:

  • Single Responsibility: Separate storage, execution, and registry
  • Dependency Inversion: Depends on Protocol abstractions
  • Liskov Substitution: Any implementation of protocols works
  • Interface Segregation: Focused, minimal interfaces
  • Open/Closed: Extend via hooks and rules without modifying core

See ARCHITECTURE.md for detailed design documentation.
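
"Depends on Protocol abstractions" refers to `typing.Protocol`: components are typed against structural interfaces, so any object with the right methods can be injected without subclassing anything. A hypothetical illustration of the pattern (not hookedllm's actual protocol definitions):

```python
from typing import Protocol

class Logger(Protocol):
    """Structural interface: anything with a matching log() satisfies it."""
    def log(self, message: str) -> None: ...

class ListLogger:
    # Satisfies Logger structurally - no inheritance from Logger needed.
    def __init__(self) -> None:
        self.messages: list[str] = []
    def log(self, message: str) -> None:
        self.messages.append(message)

class Executor:
    # Depends on the abstraction, so tests can inject any Logger-shaped fake.
    def __init__(self, logger: Logger) -> None:
        self.logger = logger
    def run(self, hook_name: str) -> None:
        self.logger.log(f"ran {hook_name}")

fake = ListLogger()
Executor(fake).run("after_hook")
print(fake.messages)  # ['ran after_hook']
```

This is why any implementation of the protocols works (Liskov substitution): the executor never names a concrete logger class.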

📖 Advanced Usage

Custom Error Handling

import logging

logger = logging.getLogger("my_app")

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())

Evaluation Hook Example

from openai import AsyncOpenAI

async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)

Metrics Collection

metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)

Tags and Metadata

Pass tags and metadata to enable conditional hooks:

OpenAI (uses extra_body):

response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

Anthropic (uses metadata):

response = await client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

🤝 Contributing

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.

📄 License

MIT License - see LICENSE file for details.

🔒 Security

Please see SECURITY.md for security policy and reporting vulnerabilities.

🙏 Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.

Download files

Download the file for your platform.

Source Distribution

hookedllm-0.2.1.tar.gz (33.3 kB)

Built Distribution


hookedllm-0.2.1-py3-none-any.whl (30.7 kB)

File details

Details for the file hookedllm-0.2.1.tar.gz.

File metadata

  • Download URL: hookedllm-0.2.1.tar.gz
  • Upload date:
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for hookedllm-0.2.1.tar.gz:

  • SHA256: 04fbdd3e0e50d0b880adce63a429d39caf8a26cbf0e9d1cd123443ae145dc625
  • MD5: 7315c7d5e62053a97ac1547a8a9ac9d7
  • BLAKE2b-256: 52c7932e6ca5e43bd037f2e7fe1db736b436ba55fa93287a8d4edf2c7815dcaa


File details

Details for the file hookedllm-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: hookedllm-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for hookedllm-0.2.1-py3-none-any.whl:

  • SHA256: e1f14af7c085a9914b3bed7def5c2db7579c147fa10de772dec6428d528bd1ff
  • MD5: 585a7692e70a26e2e7b51762a283caca
  • BLAKE2b-256: c9a056006b53f4b861296fb8dac280de21bb4f459436d88686960a1813d19517

