Production-ready LLM client with audit trail, deterministic caching, and provenance tracking

LLM Ledger

A production-ready Python SDK wrapping LiteLLM with deterministic caching, provenance tracking, and full audit trail.

Features

  • Deterministic Caching - SHA256-based content hashing for reproducible results
  • Provenance Tracking - Full metadata and source tracking
  • Audit Trail - Complete ledger of all LLM interactions
  • Token Accounting - Automatic token usage and cost estimation
  • Multi-Provider - Supports all LiteLLM providers (Anthropic, OpenAI, Azure, etc.)
  • Request Persistence - SQLite or PostgreSQL storage for every request and response
  • Retry Logic - Automatic retry with exponential backoff via LiteLLM
  • Async Support - Full async/await support for high throughput
  • Type Safe - Complete type hints with Pydantic models

Installation

pip install llm-ledger

Or install from source:

git clone <repo>
cd llm-ledger
pip install -e .

Quick Start

Basic Usage

from llm_ledger import LedgerClient

client = LedgerClient()
response = client.quick_complete("Explain quantum computing in simple terms")
print(response)

Fluent Builder Pattern

response = (
    client.completion()
    .model("claude-sonnet-4")
    .system("You are a helpful assistant")
    .user("Summarize the key points from this document...")
    .temperature(0.0)
    .max_tokens(4000)
    .with_metadata(
        workflow_id="document_processing",
        chunk_id="section_3"
    )
    .execute()
)

print(f"Response: {response.content}")
print(f"Tokens: {response.usage.total_tokens}")
print(f"Cost: ${response.cost_estimate:.4f}")
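The cost figure is derived from token counts. As a rough illustration of how such an estimate can be computed, the sketch below multiplies prompt and completion tokens by per-million-token rates. The `Usage` class, the `estimate_cost` helper, and the rates are hypothetical placeholders, not the SDK's implementation or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Stand-in for a token usage record (hypothetical, not the SDK's class)."""
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def estimate_cost(usage: Usage, input_rate: float, output_rate: float) -> float:
    """Estimate cost in dollars from rates given in dollars per million tokens."""
    return (usage.prompt_tokens * input_rate
            + usage.completion_tokens * output_rate) / 1_000_000


# Placeholder rates, chosen only to make the arithmetic easy to follow.
usage = Usage(prompt_tokens=1_000, completion_tokens=500)
cost = estimate_cost(usage, input_rate=2.0, output_rate=10.0)
print(f"Tokens: {usage.total_tokens}")
print(f"Cost: ${cost:.4f}")
```

Input and output tokens are usually priced differently, which is why the sketch keeps the two counts separate instead of multiplying the total by one rate.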

With Full Provenance

from llm_ledger import LLMRequest, ProvenanceMetadata

metadata = ProvenanceMetadata(
    workflow_id="data_processing",
    chunk_id="chunk_12",
    section_id="section_3.2",
    source_id="document_v2.json",
    character_range=(1500, 2300),
    tags={"stage": "extraction"}
)

request = LLMRequest(
    model="claude-sonnet-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this text..."}
    ],
    temperature=0.0,
    metadata=metadata
)

response = client.complete(request)
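To show what fields like `chunk_id` and `character_range` are for, here is a minimal sketch that splits a document into chunks and records where each chunk came from. It uses a stdlib dataclass named `Provenance` as a stand-in for the SDK's model, and it assumes `character_range` is a half-open `(start, end)` pair, which the SDK may or may not follow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Stdlib stand-in for a provenance record (not the SDK's Pydantic model)."""
    source_id: str
    chunk_id: str
    character_range: tuple[int, int]  # assumed half-open: [start, end)
    tags: dict[str, str] = field(default_factory=dict)


def chunk_with_provenance(text: str, source_id: str, size: int) -> list[tuple[str, Provenance]]:
    """Split text into fixed-size chunks, recording where each one came from."""
    chunks = []
    for index, start in enumerate(range(0, len(text), size)):
        end = min(start + size, len(text))
        meta = Provenance(source_id, f"chunk_{index}", (start, end))
        chunks.append((text[start:end], meta))
    return chunks


document = "abcdefghij" * 3  # 30 characters of sample text
pieces = chunk_with_provenance(document, "document_v2.json", size=12)

# Each chunk can be traced back to its exact span in the source.
for chunk_text, meta in pieces:
    start, end = meta.character_range
    assert document[start:end] == chunk_text
```

Recording the range alongside the chunk means a persisted response can later be matched to the exact span of source text that produced it.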

Async Usage

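import asyncio

async def process_chunks(chunks):
    # create_request is your own helper that builds an LLMRequest from a chunk
    requests = [create_request(chunk) for chunk in chunks]
    # gather runs the calls concurrently and returns responses in input order
    return await asyncio.gather(*(client.complete_async(req) for req in requests))

responses = asyncio.run(process_chunks(chunks))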
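Launching every request at once can run into provider rate limits. One common way to cap concurrency is an `asyncio.Semaphore`. The sketch below uses a hypothetical `fake_complete` coroutine in place of a real client call so that it runs without network access.

```python
import asyncio


async def fake_complete(prompt: str) -> str:
    """Stand-in for an async LLM call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def gather_bounded(prompts: list[str], limit: int) -> list[str]:
    """Run fake_complete over prompts with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        async with semaphore:
            return await fake_complete(prompt)

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(run_one(p) for p in prompts))


results = asyncio.run(gather_bounded(["a", "b", "c", "d"], limit=2))
print(results)
```

The same wrapper pattern should apply to any coroutine, so swapping `fake_complete` for a real async call is the only change needed.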

Configuration

Environment Variables

Create a .env file:

# API Keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Database
LLM_DATABASE_URL=postgresql://user:pass@localhost/llm_gateway

# Cache ("redis" or "memory")
LLM_CACHE_BACKEND=redis
REDIS_URL=redis://localhost:6379/0
# TTL in seconds; 0 = no expiration
LLM_CACHE_TTL=3600

# Features
LLM_ENABLE_CACHE=true
LLM_ENABLE_PERSISTENCE=true

# Defaults
LLM_DEFAULT_MODEL=claude-sonnet-4
LLM_DEFAULT_TEMPERATURE=0.0
LLM_DEFAULT_MAX_TOKENS=4000
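For readers wiring these variables into their own code, the following sketch shows one way to parse them into typed settings. The `Settings` class, `env_bool`, and `load_settings` are hypothetical names, and the fallback values simply mirror the sample file; they are not the SDK's documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Treat common truthy spellings as True and anything else as False."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    cache_backend: str
    cache_ttl: int
    enable_cache: bool
    enable_persistence: bool
    default_model: str
    default_temperature: float
    default_max_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style variables.

    The fallback values mirror the sample .env file and are assumptions,
    not the SDK's documented defaults.
    """
    return Settings(
        cache_backend=env.get("LLM_CACHE_BACKEND", "memory"),
        cache_ttl=int(env.get("LLM_CACHE_TTL", "3600")),
        enable_cache=env_bool(env, "LLM_ENABLE_CACHE", True),
        enable_persistence=env_bool(env, "LLM_ENABLE_PERSISTENCE", True),
        default_model=env.get("LLM_DEFAULT_MODEL", "claude-sonnet-4"),
        default_temperature=float(env.get("LLM_DEFAULT_TEMPERATURE", "0.0")),
        default_max_tokens=int(env.get("LLM_DEFAULT_MAX_TOKENS", "4000")),
    )


# Pass a plain dict to try the parser without touching the real environment.
settings = load_settings({
    "LLM_CACHE_BACKEND": "redis",
    "LLM_ENABLE_CACHE": "false",
    "LLM_CACHE_TTL": "0",
})
print(settings)
```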

Programmatic Configuration

from llm_ledger import LedgerClient
from llm_ledger.cache import RedisCache

client = LedgerClient(
    cache_backend=RedisCache("redis://localhost:6379/0"),
    database_url="postgresql://localhost/llm_gateway",
    enable_cache=True,
    enable_persistence=True,
    default_model="claude-sonnet-4"
)

Caching

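The SDK caches each response under a SHA256 hash of the request content, so an identical request returns the stored response and any change in parameters produces a cache miss: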

# First call - executes against LLM
response1 = client.quick_complete("What is machine learning?", temperature=0.0)
print(response1.from_cache)  # False
print(response1.latency_ms)  # e.g., 1200ms

# Second call - returns from cache
response2 = client.quick_complete("What is machine learning?", temperature=0.0)
print(response2.from_cache)  # True
print(response2.cache_key)   # SHA256 hash

# Different parameters = cache miss
response3 = client.quick_complete("What is machine learning?", temperature=0.5)
print(response3.from_cache)  # False
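A deterministic key of this kind can be sketched by hashing a canonical JSON encoding of the request. This is an illustration of the general technique, not the SDK's actual key derivation; the `cache_key` function and the choice of fields to hash are assumptions.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], **params) -> str:
    """Derive a SHA256 key from a canonical JSON encoding of the request."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys and fixed separators make the encoding independent of
    # dict insertion order and whitespace
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "What is machine learning?"}]

# Same request, parameters passed in a different order: same key.
key_a = cache_key("claude-sonnet-4", messages, temperature=0.0, max_tokens=4000)
key_b = cache_key("claude-sonnet-4", messages, max_tokens=4000, temperature=0.0)

# A different temperature changes the payload, so the key changes too.
key_c = cache_key("claude-sonnet-4", messages, temperature=0.5, max_tokens=4000)

print(key_a == key_b, key_a == key_c)
```

Canonical encoding matters here: without sorted keys, two logically identical requests could serialize differently and miss the cache.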

Cache statistics:

stats = client.get_stats()
print(stats["cache_hit_rate"])  # e.g., 0.67

Persistence & Querying

All requests and responses are automatically persisted:

# Query by provenance metadata
requests = client.query_requests(
    workflow_id="data_processing",
    chunk_id="chunk_12"
)

# Retrieve specific request/response
request = client.get_request(request_id)
response = client.get_response(response_id)

# Get token usage summary
from datetime import datetime
usage = client.persistence.get_token_usage_summary(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 12, 31)
)
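To make the idea of a queryable ledger concrete, the sketch below builds a tiny request table in an in-memory SQLite database and runs the two kinds of query shown here: filtering by provenance fields and summing tokens over a date range. The schema, the sample rows, and the `query_requests` and `token_usage` functions are all invented for illustration and do not reflect the SDK's real storage layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE requests (
        request_id   TEXT PRIMARY KEY,
        workflow_id  TEXT,
        chunk_id     TEXT,
        model        TEXT,
        total_tokens INTEGER,
        created_at   TEXT  -- ISO 8601, so string order matches time order
    )
    """
)
# Made-up sample rows.
rows = [
    ("req-1", "data_processing", "chunk_12", "claude-sonnet-4", 820, "2024-03-01T10:00:00"),
    ("req-2", "data_processing", "chunk_13", "claude-sonnet-4", 640, "2024-03-01T10:01:00"),
    ("req-3", "summaries", "chunk_12", "claude-sonnet-4", 300, "2025-01-05T09:00:00"),
]
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", rows)


def query_requests(conn, **filters):
    """Return request ids whose columns match every given filter."""
    allowed = {"workflow_id", "chunk_id", "model"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Column names come from the allow-list above; values are bound parameters.
    clause = " AND ".join(f"{name} = ?" for name in filters) or "1 = 1"
    sql = f"SELECT request_id FROM requests WHERE {clause} ORDER BY request_id"
    return [row[0] for row in conn.execute(sql, tuple(filters.values()))]


def token_usage(conn, start, end):
    """Sum tokens for requests created in [start, end), given ISO 8601 strings."""
    sql = ("SELECT COALESCE(SUM(total_tokens), 0) FROM requests "
           "WHERE created_at >= ? AND created_at < ?")
    return conn.execute(sql, (start, end)).fetchone()[0]


matches = query_requests(conn, workflow_id="data_processing", chunk_id="chunk_12")
tokens_2024 = token_usage(conn, "2024-01-01", "2025-01-01")
print(matches, tokens_2024)
```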

Advanced Features

Custom Retry Logic

LiteLLM handles retries automatically. Set the retry count and timeout (in seconds) per call:

response = client.complete(
    request,
    num_retries=5,
    timeout=600  # 10 minutes
)
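For readers unfamiliar with exponential backoff, the following generic sketch shows the idea: each failed attempt waits up to twice as long as the previous one before retrying. It is not LiteLLM's internal retry logic; `retry_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(func, num_retries=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func, retrying on exception with exponentially growing, jittered delays."""
    for attempt in range(num_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == num_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # full jitter spreads retries out so clients do not retry in lockstep
            sleep(random.uniform(0, delay))


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# A no-op sleep keeps the example instant; drop the argument to really wait.
result = retry_with_backoff(flaky, num_retries=5, sleep=lambda seconds: None)
print(result, calls["count"])
```

In real code it is usually better to catch only errors known to be transient, such as timeouts and rate-limit responses, rather than every exception.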

Batch Processing

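import asyncio

async def run_batch(texts):
    # await is only valid inside a coroutine, so the batch lives in one
    requests = [create_request(text) for text in texts]
    return await asyncio.gather(*(client.complete_async(req) for req in requests))

responses = asyncio.run(run_batch(texts))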

Cache Control

# Bypass cache
response = client.complete(request, use_cache=False)

# Clear entire cache
client.clear_cache()

# Invalidate specific request
client.cache.invalidate(request)
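The operations above (lookup, invalidate, clear) together with a TTL can be pictured with a small in-memory cache. This is a sketch of the concept under the assumption that a TTL of 0 means entries never expire, as the `LLM_CACHE_TTL` comment suggests. The `MemoryCache` class here is hypothetical and is not the SDK's cache backend.

```python
import time


class MemoryCache:
    """Minimal in-memory cache with optional expiry (illustrative only)."""

    def __init__(self, ttl: float = 0, clock=time.monotonic):
        self.ttl = ttl          # seconds; 0 means entries never expire
        self._clock = clock
        self._entries = {}      # key -> (stored_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self.ttl and self._clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return default
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)

    def clear(self):
        self._entries.clear()


now = [0.0]  # fake clock so the example does not need to sleep
cache = MemoryCache(ttl=3600, clock=lambda: now[0])
cache.set("key-1", "cached answer")
fresh = cache.get("key-1")      # within the TTL: a hit
now[0] = 3600.0
expired = cache.get("key-1")    # TTL elapsed: a miss
print(fresh, expired)
```

Injecting the clock keeps the expiry logic testable without real waiting.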

Production Deployment

With PostgreSQL and Redis

from llm_ledger import LedgerClient
from llm_ledger.cache import RedisCache

client = LedgerClient(
    cache_backend=RedisCache("redis://prod-redis:6379/0"),
    database_url="postgresql://user:pass@prod-db/llm_gateway",
    enable_cache=True,
    enable_persistence=True,
    cache_ttl=86400  # 24 hours
)

With Docker Compose

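services:
  app:
    build: .  # assumption: your application's Dockerfile is in this directory
    environment:
      - LLM_DATABASE_URL=postgresql://postgres:password@db/llm_gateway
      - REDIS_URL=redis://redis:6379/0
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    depends_on:
      - db
      - redis

  db:
    image: postgres:15
    environment:
      # The postgres image will not start without a password;
      # these values match LLM_DATABASE_URL above
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=llm_gateway

  redis:
    image: redis:7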

Testing

from llm_ledger.testing import MockLLMClient

# For unit tests
mock_client = MockLLMClient()
mock_client.add_response("test prompt", "test response")

response = mock_client.quick_complete("test prompt")
assert response == "test response"

License

Apache 2.0 License

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
