
HookedLLM

Async-first, scoped hook system for LLM observability with SOLID/DI architecture

Python 3.10+ | License: MIT | Documentation

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

✨ Key Features

  • 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
  • 🔧 SOLID/DI Compliant: Full dependency injection support for testing and customization
  • 📦 Minimal Surface: Single import, simple API: import hookedllm
  • ⚡ Async-First: Built for modern async LLM SDKs
  • 🎨 Type-Safe: Full type hints and IDE autocomplete support
  • 🛡️ Resilient: Hook failures never break your LLM calls
  • 🔀 Conditional Execution: Run hooks only when rules match (model, tags, metadata)
  • ⚙️ Config or Code: Define hooks programmatically or via YAML

🚀 Quick Start

Installation

# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With Anthropic/Claude support
pip install hookedllm[anthropic]

# With both OpenAI and Anthropic support
pip install hookedllm[openai,anthropic]

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install hookedllm[all]

Basic Usage

With OpenAI:

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

With Anthropic/Claude:

import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
)

📖 Examples

Explore the examples/ directory for complete, runnable demonstrations:

Getting Started

  • simple_demo.py - Your first hookedllm program

    • Complete working example with real LLM calls
    • Automatic metrics tracking with MetricsHook
    • Response evaluation with EvaluationHook
    • Perfect starting point for new users
  • basic_usage.py - Core concepts walkthrough

    • Simple hook registration
    • Scoped vs global hooks
    • Conditional rules with when
    • Multiple scope usage

Advanced Features

  • global_hooks_demo.py - Global hooks in action

    • 5 different LLM calls with global before/after hooks
    • Shows all data provided by the framework
    • Demonstrates hook execution flow
    • Metrics aggregation across calls
  • scopes_demo.py - Scope isolation deep dive

    • Prevents hook interference across contexts
    • Development vs production vs evaluation scopes
    • Multi-scope client usage
    • Real-world use case examples
  • evaluation_and_metrics.py - Built-in helpers

    • Using MetricsHook for automatic tracking
    • Using EvaluationHook for quality scoring
    • Conditional evaluation (only for specific models)
    • Multiple scope combinations

Integrations

  • integrations/langfuse_integration.py - Langfuse observability

    • Automatic trace and generation tracking
    • Token usage and cost monitoring
    • Error tracking with full context
    • Metadata enrichment
  • integrations/opentelemetry_integration.py - OpenTelemetry tracing

    • Distributed tracing for LLM calls
    • Semantic conventions for LLM observability
    • Span creation with attributes and events
    • Integration with existing OTel infrastructure

Running the Examples

# Install with OpenAI support
pip install -e .[openai]

# Or install with Anthropic support
pip install -e .[anthropic]

# Or install with both
pip install -e .[openai,anthropic]

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py

Each example includes:

  • ✅ Complete, runnable code
  • 📝 Detailed inline comments
  • 🚀 Setup instructions
  • 💡 Real-world use cases
  • 🎯 Best practices

📚 Core Concepts

Scopes

Scopes isolate hooks to specific parts of your application:

# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!
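
Conceptually, scope isolation is just a registry keyed by scope name: a wrapped client only ever looks up the hook lists for the scopes it opted into. A minimal, self-contained sketch of the idea (the names here are illustrative, not hookedllm's internals):

```python
from collections import defaultdict

# Hypothetical registry: scope name -> list of registered after-hooks.
_registry: dict[str, list] = defaultdict(list)

def register_after(scope_name: str, hook) -> None:
    """Attach a hook to a named scope."""
    _registry[scope_name].append(hook)

def hooks_for(scope_names: list[str]) -> list:
    """A client wrapped with these scopes sees only these scopes' hooks."""
    return [hook for name in scope_names for hook in _registry[name]]

register_after("evaluation", "evaluate_response")
register_after("production", "production_logger")

print(hooks_for(["evaluation"]))  # only evaluation's hooks, never production's
```

Because lookup happens per wrapped client, registering a hook in one scope can never affect clients wrapped with a different scope.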

Hook Types

Four hook types cover the entire call lifecycle:

# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)
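
Put together, the lifecycle these four hook types cover can be sketched as a plain async wrapper (illustrative only; hookedllm's real executor additionally handles scoping and keeps hook failures from breaking the call):

```python
import asyncio

async def run_with_hooks(call, before=(), after=(), error=(), finally_=()):
    """Illustrative lifecycle: before -> call -> after (or error) -> finally."""
    trace = []
    for hook in before:
        await hook(trace)
    try:
        await call()
    except Exception:
        for hook in error:
            await hook(trace)
    else:
        for hook in after:
            await hook(trace)
    finally:
        for hook in finally_:
            await hook(trace)
    return trace

def mark(label):
    # Test hook factory: records which lifecycle phase ran.
    async def hook(trace):
        trace.append(label)
    return hook

async def succeeds():
    pass

trace = asyncio.run(run_with_hooks(
    succeeds,
    before=[mark("before")],
    after=[mark("after")],
    error=[mark("error")],
    finally_=[mark("finally")],
))
print(trace)  # ['before', 'after', 'finally'] - error hooks skipped on success
```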

Conditional Rules

Execute hooks only when conditions match:

# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)
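
The `&`, `|`, and `~` composition above is ordinary Python operator overloading over predicate objects. A hypothetical stand-alone sketch of how such composable rules can be built (not hookedllm's actual `when` implementation):

```python
class Rule:
    """Wraps a predicate over a call description; composes with &, |, ~."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, call) -> bool:
        return self.fn(call)
    def __and__(self, other):
        return Rule(lambda call: self(call) and other(call))
    def __or__(self, other):
        return Rule(lambda call: self(call) or other(call))
    def __invert__(self):
        return Rule(lambda call: not self(call))

def model(name):
    return Rule(lambda call: call.get("model") == name)

def tag(name):
    return Rule(lambda call: name in call.get("tags", []))

rule = model("gpt-4") & tag("production") & ~tag("test")
print(rule({"model": "gpt-4", "tags": ["production"]}))          # True
print(rule({"model": "gpt-4", "tags": ["production", "test"]}))  # False
```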

Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

Multiple Scopes

Clients can use multiple scopes:

hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate

🧪 Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

import hookedllm
from unittest.mock import Mock

def test_hook_execution():
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)
    
    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]
    
    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )
    
    # Test
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")
    # (invoke the wrapped client here to trigger hook execution)
    
    # Assert
    assert mock_executor.execute_after.called

🏗️ Architecture

HookedLLM follows SOLID principles with full dependency injection:

  • Single Responsibility: Separate storage, execution, and registry
  • Dependency Inversion: Depends on Protocol abstractions
  • Liskov Substitution: Any implementation of protocols works
  • Interface Segregation: Focused, minimal interfaces
  • Open/Closed: Extend via hooks and rules without modifying core

See ARCHITECTURE.md for detailed design documentation.
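
"Depends on Protocol abstractions" refers to `typing.Protocol`: components are typed against structural interfaces, so any object with the right methods can be injected without subclassing anything. A hypothetical illustration of the pattern (not hookedllm's actual protocol definitions):

```python
from typing import Protocol

class Logger(Protocol):
    """Structural interface: anything with a matching log() satisfies it."""
    def log(self, message: str) -> None: ...

class ListLogger:
    # Satisfies Logger structurally - no inheritance from Logger needed.
    def __init__(self) -> None:
        self.messages: list[str] = []
    def log(self, message: str) -> None:
        self.messages.append(message)

class Executor:
    # Depends on the abstraction, so tests can inject any Logger-shaped fake.
    def __init__(self, logger: Logger) -> None:
        self.logger = logger
    def run(self, hook_name: str) -> None:
        self.logger.log(f"ran {hook_name}")

fake = ListLogger()
Executor(fake).run("after_hook")
print(fake.messages)  # ['ran after_hook']
```

This is why any implementation of the protocols works (Liskov substitution): the executor never names a concrete logger class.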

📖 Advanced Usage

Custom Error Handling

import logging

logger = logging.getLogger("my_app")

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())

Evaluation Hook Example

from openai import AsyncOpenAI

async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)

Metrics Collection

metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)

Tags and Metadata

Pass tags and metadata to enable conditional hooks:

OpenAI (uses extra_body):

response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

Anthropic (uses metadata):

response = await client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

🤝 Contributing

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.

📄 License

MIT License - see LICENSE file for details.

🔒 Security

Please see SECURITY.md for security policy and reporting vulnerabilities.

🙏 Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.

Download files

Download the file for your platform.

Source Distribution

hookedllm-0.2.1.tar.gz (33.3 kB)

Built Distribution


hookedllm-0.2.1-py3-none-any.whl (30.7 kB)

File details

Details for the file hookedllm-0.2.1.tar.gz.

File metadata

  • Download URL: hookedllm-0.2.1.tar.gz
  • Upload date:
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for hookedllm-0.2.1.tar.gz:

  • SHA256: 04fbdd3e0e50d0b880adce63a429d39caf8a26cbf0e9d1cd123443ae145dc625
  • MD5: 7315c7d5e62053a97ac1547a8a9ac9d7
  • BLAKE2b-256: 52c7932e6ca5e43bd037f2e7fe1db736b436ba55fa93287a8d4edf2c7815dcaa


File details

Details for the file hookedllm-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: hookedllm-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for hookedllm-0.2.1-py3-none-any.whl:

  • SHA256: e1f14af7c085a9914b3bed7def5c2db7579c147fa10de772dec6428d528bd1ff
  • MD5: 585a7692e70a26e2e7b51762a283caca
  • BLAKE2b-256: c9a056006b53f4b861296fb8dac280de21bb4f459436d88686960a1813d19517

