
Production-ready LLM client with audit trail, deterministic caching, and provenance tracking


LLM Ledger

A production-ready Python SDK that wraps LiteLLM and adds deterministic caching, provenance tracking, and a full audit trail.

Features

  • Deterministic Caching - SHA256-based content hashing for reproducible results
  • Provenance Tracking - Full metadata and source tracking
  • Audit Trail - Complete ledger of all LLM interactions
  • Token Accounting - Automatic token usage and cost estimation
  • Multi-Provider - Supports all LiteLLM providers (Anthropic, OpenAI, Azure, etc.)
  • Request Persistence - SQLite/PostgreSQL storage with full audit trail
  • Retry Logic - Automatic retry with exponential backoff via LiteLLM
  • Async Support - Full async/await support for high throughput
  • Type Safe - Complete type hints with Pydantic models

Installation

pip install llm-ledger

Or install from source:

git clone <repo>
cd llm-ledger
pip install -e .

Quick Start

Basic Usage

from llm_ledger import LedgerClient

client = LedgerClient()
response = client.quick_complete("Explain quantum computing in simple terms")
print(response)

Fluent Builder Pattern

response = (
    client.completion()
    .model("claude-sonnet-4")
    .system("You are a helpful assistant")
    .user("Summarize the key points from this document...")
    .temperature(0.0)
    .max_tokens(4000)
    .with_metadata(
        workflow_id="document_processing",
        chunk_id="section_3"
    )
    .execute()
)

print(f"Response: {response.content}")
print(f"Tokens: {response.usage.total_tokens}")
print(f"Cost: ${response.cost_estimate:.4f}")
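The cost figure above comes from token counts multiplied by per-token prices. As a rough illustration of that arithmetic (not the SDK's actual pricing logic), the sketch below uses a hypothetical model name and hypothetical prices per million tokens; `estimate_cost` and `PRICES` are names made up for this example.

```python
# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICES = {"example-model": {"input": 3.00, "output": 15.00}}


def estimate_cost(model, prompt_tokens, completion_tokens, prices=PRICES):
    """Return an estimated cost in USD for one request."""
    price = prices[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


cost = estimate_cost("example-model", prompt_tokens=1200, completion_tokens=400)
print(f"Cost: ${cost:.4f}")  # Cost: $0.0096
```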

With Full Provenance

from llm_ledger import LLMRequest, ProvenanceMetadata

metadata = ProvenanceMetadata(
    workflow_id="data_processing",
    chunk_id="chunk_12",
    section_id="section_3.2",
    source_id="document_v2.json",
    character_range=(1500, 2300),
    tags={"stage": "extraction"}
)

request = LLMRequest(
    model="claude-sonnet-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this text..."}
    ],
    temperature=0.0,
    metadata=metadata
)

response = client.complete(request)

Async Usage

import asyncio

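async def process_chunks():
    # chunks and create_request() are your own: an iterable of inputs and a
    # helper that builds an LLMRequest for each one
    requests = [create_request(chunk) for chunk in chunks]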
    responses = await asyncio.gather(*[
        client.complete_async(req) for req in requests
    ])
    return responses

responses = asyncio.run(process_chunks())
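Launching every request at once can run into provider rate limits. One common way to cap concurrency is an asyncio.Semaphore. The sketch below is independent of the SDK: `gather_bounded` is a hypothetical helper, and `fake_complete` stands in for a real call such as `client.complete_async(req)`.

```python
import asyncio


async def gather_bounded(coro_fns, limit=5):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(fn):
        async with semaphore:
            return await fn()

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run(fn) for fn in coro_fns))


async def fake_complete(text):
    # Stand-in for an LLM call; sleeps briefly and returns a transformed string
    await asyncio.sleep(0.01)
    return text.upper()


chunks = ["alpha", "beta", "gamma"]
results = asyncio.run(
    gather_bounded([lambda c=c: fake_complete(c) for c in chunks], limit=2)
)
print(results)  # ['ALPHA', 'BETA', 'GAMMA']
```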

Configuration

Environment Variables

Create a .env file:

# API Keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Database
LLM_DATABASE_URL=postgresql://user:pass@localhost/llm_gateway

# Cache
LLM_CACHE_BACKEND=redis  # or "memory"
REDIS_URL=redis://localhost:6379/0
LLM_CACHE_TTL=3600  # seconds, 0 = no expiration

# Features
LLM_ENABLE_CACHE=true
LLM_ENABLE_PERSISTENCE=true

# Defaults
LLM_DEFAULT_MODEL=claude-sonnet-4
LLM_DEFAULT_TEMPERATURE=0.0
LLM_DEFAULT_MAX_TOKENS=4000
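For reference, variables like these are typically parsed into typed settings at startup. The sketch below shows one way to do that with the standard library only. `LedgerSettings`, `load_settings`, and the fallback defaults are assumptions made for this example, not the SDK's own loader or defaults.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LedgerSettings:
    database_url: str
    cache_backend: str
    cache_ttl: int
    enable_cache: bool
    enable_persistence: bool
    default_model: str
    default_temperature: float
    default_max_tokens: int


def load_settings(env=os.environ):
    """Build settings from a mapping of environment variables.

    The fallback values below are illustrative assumptions.
    """
    return LedgerSettings(
        database_url=env.get("LLM_DATABASE_URL", "sqlite:///llm_ledger.db"),
        cache_backend=env.get("LLM_CACHE_BACKEND", "memory"),
        cache_ttl=int(env.get("LLM_CACHE_TTL", "3600")),
        enable_cache=_as_bool(env.get("LLM_ENABLE_CACHE", "true")),
        enable_persistence=_as_bool(env.get("LLM_ENABLE_PERSISTENCE", "true")),
        default_model=env.get("LLM_DEFAULT_MODEL", "claude-sonnet-4"),
        default_temperature=float(env.get("LLM_DEFAULT_TEMPERATURE", "0.0")),
        default_max_tokens=int(env.get("LLM_DEFAULT_MAX_TOKENS", "4000")),
    )


settings = load_settings({"LLM_CACHE_BACKEND": "redis", "LLM_CACHE_TTL": "0"})
print(settings.cache_backend, settings.cache_ttl)  # redis 0
```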

Programmatic Configuration

from llm_ledger import LedgerClient
from llm_ledger.cache import RedisCache

client = LedgerClient(
    cache_backend=RedisCache("redis://localhost:6379/0"),
    database_url="postgresql://localhost/llm_gateway",
    enable_cache=True,
    enable_persistence=True,
    default_model="claude-sonnet-4"
)

Caching

The SDK uses deterministic SHA256-based caching:

# First call - executes against LLM
response1 = client.quick_complete("What is machine learning?", temperature=0.0)
print(response1.from_cache)  # False
print(response1.latency_ms)  # e.g., 1200ms

# Second call - returns from cache
response2 = client.quick_complete("What is machine learning?", temperature=0.0)
print(response2.from_cache)  # True
print(response2.cache_key)   # SHA256 hash

# Different parameters = cache miss
response3 = client.quick_complete("What is machine learning?", temperature=0.5)
print(response3.from_cache)  # False
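To make the behavior above concrete, here is a minimal sketch of how a deterministic SHA256 cache key can be derived from a request. It assumes the key covers the model, messages, and sampling parameters. `make_cache_key` is a hypothetical helper, and the SDK's actual key derivation may differ.

```python
import hashlib
import json


def make_cache_key(model, messages, **params):
    """Hash a canonical JSON encoding of everything that affects the output."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys and fixed separators yield identical bytes for identical requests
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


msgs = [{"role": "user", "content": "What is machine learning?"}]
key_a = make_cache_key("claude-sonnet-4", msgs, temperature=0.0)
key_b = make_cache_key("claude-sonnet-4", msgs, temperature=0.0)
key_c = make_cache_key("claude-sonnet-4", msgs, temperature=0.5)
print(key_a == key_b)  # True: same request, same key
print(key_a == key_c)  # False: a different temperature changes the key
```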

Cache statistics:

stats = client.get_stats()
print(stats["cache_hit_rate"])  # e.g., 0.67
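The hit rate is simply hits divided by total lookups. The sketch below shows a small in-memory cache that tracks it and honors a TTL where 0 means no expiration, mirroring the LLM_CACHE_TTL setting. `MemoryCache` here is an illustrative class written for this example, not the SDK's implementation.

```python
import time


class MemoryCache:
    """In-memory cache with a TTL and hit/miss counters (ttl 0 = never expire)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, stored_at = entry
            if self.ttl == 0 or self.clock() - stored_at < self.ttl:
                self.hits += 1
                return value
            del self._store[key]  # entry expired
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = MemoryCache()
cache.get("k")           # miss
cache.set("k", "answer")
cache.get("k")           # hit
cache.get("k")           # hit
print(round(cache.hit_rate, 2))  # 0.67
```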

Persistence & Querying

All requests and responses are persisted automatically and can be queried afterwards:

# Query by provenance metadata
requests = client.query_requests(
    workflow_id="data_processing",
    chunk_id="chunk_12"
)

# Retrieve a specific request/response (request_id and response_id are placeholders for IDs you already have)
request = client.get_request(request_id)
response = client.get_response(response_id)

# Get token usage summary
from datetime import datetime
usage = client.persistence.get_token_usage_summary(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 12, 31)
)
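To illustrate what a token usage summary over a date range involves, the sketch below runs an equivalent aggregate query against an in-memory SQLite database. The `responses` table, its columns, and `token_usage_summary` are assumptions made for this example; the SDK's real schema is not documented here.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# Hypothetical ledger table; timestamps are stored as ISO 8601 strings
conn.execute(
    """CREATE TABLE responses (
        response_id TEXT PRIMARY KEY,
        workflow_id TEXT,
        created_at TEXT,
        prompt_tokens INTEGER,
        completion_tokens INTEGER
    )"""
)
rows = [
    ("r1", "data_processing", datetime(2024, 3, 1, 12, 0).isoformat(), 120, 80),
    ("r2", "data_processing", datetime(2024, 6, 15, 9, 30).isoformat(), 200, 150),
    ("r3", "other", datetime(2025, 1, 2).isoformat(), 50, 50),
]
conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?, ?)", rows)


def token_usage_summary(conn, start_date, end_date):
    """Aggregate token counts for responses created within [start_date, end_date]."""
    count, prompt, completion = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(prompt_tokens), 0), "
        "COALESCE(SUM(completion_tokens), 0) "
        "FROM responses WHERE created_at >= ? AND created_at <= ?",
        (start_date.isoformat(), end_date.isoformat()),
    ).fetchone()
    return {
        "requests": count,
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


usage = token_usage_summary(conn, datetime(2024, 1, 1), datetime(2024, 12, 31))
print(usage["requests"], usage["total_tokens"])  # 2 550
```

ISO 8601 strings in a consistent format sort in chronological order, which is why plain string comparison works for the date filter.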

Advanced Features

Custom Retry Logic

LiteLLM handles retries automatically. Configure the retry count and timeout per call:

response = client.complete(
    request,
    num_retries=5,
    timeout=600  # 10 minutes
)
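For readers unfamiliar with exponential backoff, the sketch below shows the general idea: the wait doubles after each failed attempt, up to a cap, with a little random jitter. It is a generic illustration with made-up names (`retry_with_backoff`, `flaky_call`), not LiteLLM's internal retry code.

```python
import random
import time


def retry_with_backoff(fn, num_retries=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == num_retries:
                raise  # retries exhausted, surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


attempts = {"count": 0}


def flaky_call():
    # Fails twice, then succeeds; stands in for a transient API error
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# sleep is stubbed out here so the example finishes instantly
result = retry_with_backoff(flaky_call, sleep=lambda seconds: None)
print(result, attempts["count"])  # ok 3
```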

Batch Processing

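import asyncio

async def run_batch(texts):
    # create_request() is your own helper that builds an LLMRequest per text
    requests = [create_request(text) for text in texts]
    return await asyncio.gather(*[client.complete_async(req) for req in requests])

responses = asyncio.run(run_batch(texts))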

Cache Control

# Bypass cache
response = client.complete(request, use_cache=False)

# Clear entire cache
client.clear_cache()

# Invalidate specific request
client.cache.invalidate(request)

Production Deployment

With PostgreSQL and Redis

from llm_ledger import LedgerClient
from llm_ledger.cache import RedisCache

client = LedgerClient(
    cache_backend=RedisCache("redis://prod-redis:6379/0"),
    database_url="postgresql://user:pass@prod-db/llm_gateway",
    enable_cache=True,
    enable_persistence=True,
    cache_ttl=86400  # 24 hours
)

With Docker Compose

services:
  app:
    environment:
      - LLM_DATABASE_URL=postgresql://postgres:password@db/llm_gateway
      - REDIS_URL=redis://redis:6379/0
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
  
  db:
    image: postgres:15
    
  redis:
    image: redis:7

Testing

from llm_ledger.testing import MockLLMClient

# For unit tests
mock_client = MockLLMClient()
mock_client.add_response("test prompt", "test response")

response = mock_client.quick_complete("test prompt")
assert response == "test response"

License

Apache License 2.0

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
