majordomo-llm


A unified Python interface for multiple LLM providers with automatic cost tracking, retry logic, and structured output support.

Features

  • Unified API - Same interface for OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, and Cohere
  • Streaming - Real-time token-by-token output via get_response_stream() with async iteration
  • Cost Tracking - Automatic calculation of input/output token costs per request
  • Structured Outputs - Native support for Pydantic models as response schemas
  • Automatic Retries - Built-in exponential backoff retry logic using tenacity
  • Automatic Fallback - Cascade across providers with LLMCascade for resilience
  • Request Logging - Optional async logging to PostgreSQL/MySQL/SQLite with S3 or local file storage for request/response bodies
  • API Key Tracking - Log hashed API keys and optional aliases for usage attribution
  • Async First - Fully async/await compatible for high-performance applications
  • Type Safe - Complete type annotations and py.typed marker for IDE support

Installation

pip install majordomo-llm

Or with uv:

uv add majordomo-llm

Optional: Request Logging

To enable request logging to PostgreSQL or MySQL, with optional S3 storage for request/response bodies:

pip install "majordomo-llm[logging]"

Quick Start

Basic Text Response

import asyncio
from majordomo_llm import get_llm_instance

async def main():
    # Create an LLM instance
    llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")

    # Get a response
    response = await llm.get_response(
        user_prompt="What is the capital of France?",
        system_prompt="You are a helpful geography assistant.",
    )

    print(response.content)
    print(f"Tokens: {response.input_tokens} in, {response.output_tokens} out")
    print(f"Cost: ${response.total_cost:.6f}")

asyncio.run(main())

JSON Response

response = await llm.get_json_response(
    user_prompt="List the top 3 largest countries by area as JSON",
    system_prompt="Respond with valid JSON only.",
)

# response.content is a parsed Python dict (the exact keys depend on the model's output)
for country in response.content["countries"]:
    print(country["name"])

Streaming

stream = await llm.get_response_stream(
    user_prompt="Explain quantum computing",
    system_prompt="Be concise.",
)

async for chunk in stream:
    print(chunk, end="", flush=True)

print(f"\nCost: ${stream.usage.total_cost:.6f}")

# Or collect the full response:
stream = await llm.get_response_stream("Summarize this document...")
response = await stream.collect()  # Returns an LLMResponse
print(response.content)

Structured Output with Pydantic

from pydantic import BaseModel

class CountryInfo(BaseModel):
    name: str
    capital: str
    population: int
    area_km2: float

response = await llm.get_structured_json_response(
    response_model=CountryInfo,
    user_prompt="Give me information about Japan",
)

# response.content is a validated CountryInfo instance
country = response.content
print(f"{country.name}: {country.capital}, pop. {country.population:,}")

Configuration

Environment Variables

Set API keys for the providers you want to use:

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini
export GEMINI_API_KEY="..."

# DeepSeek
export DEEPSEEK_API_KEY="sk-..."

# Cohere
export CO_API_KEY="..."

For local development, copy .env.example to .env and fill in your keys. Never commit .env.
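
majordomo-llm picks up keys from the process environment, so any loader works. One option for local runs is python-dotenv (a separate install, not a dependency of this package):

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ

from majordomo_llm import get_llm_instance

llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")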

Available Models

OpenAI

  • gpt-5, gpt-5-mini, gpt-5-nano
  • gpt-4o, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano

Anthropic

  • claude-sonnet-4-5-20250929, claude-opus-4-1-20250805
  • claude-opus-4-20250514, claude-sonnet-4-20250514
  • claude-3-7-sonnet-latest, claude-3-5-haiku-latest

Gemini

  • gemini-2.5-flash, gemini-2.5-flash-lite
  • gemini-2.0-flash, gemini-2.0-flash-lite

DeepSeek

  • deepseek-chat, deepseek-reasoner

Cohere

  • command-a-03-2025, command-r-plus-08-2024
  • command-r-08-2024, command-r7b-12-2024

API Reference

Factory Functions

get_llm_instance(provider: str, model: str) -> LLM

Create an LLM instance for the specified provider and model.

from majordomo_llm import get_llm_instance

llm = get_llm_instance("openai", "gpt-4o")

LLM Methods

All LLM instances support these async methods:

get_response(user_prompt, system_prompt=None, temperature=0.3, top_p=1.0) -> LLMResponse

Get a plain text response.

get_json_response(user_prompt, system_prompt=None, temperature=0.3, top_p=1.0) -> LLMJSONResponse

Get a JSON response (automatically parsed).

get_response_stream(user_prompt, system_prompt=None, temperature=0.3, top_p=1.0) -> LLMStreamResponse

Get a streaming text response. Yields chunks via async iteration; usage metrics are available after the stream completes.

get_structured_json_response(response_model, user_prompt, system_prompt=None, temperature=0.3, top_p=1.0) -> LLMStructuredResponse

Get a response validated against a Pydantic model.
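
All four methods accept the same sampling parameters, so overriding temperature or top_p is uniform across them. For example, pinning the output toward determinism:

# Lower temperature for near-deterministic output
response = await llm.get_response(
    user_prompt="Summarize the French Revolution in one sentence.",
    system_prompt="Be precise.",
    temperature=0.0,
)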

Response Objects

All response objects include usage metrics:

  • content (str, dict, or BaseModel) - The response content
  • input_tokens (int) - Number of input tokens
  • output_tokens (int) - Number of output tokens
  • cached_tokens (int) - Number of cached tokens (if applicable)
  • input_cost (float) - Cost for input tokens (USD)
  • output_cost (float) - Cost for output tokens (USD)
  • total_cost (float) - Total cost (USD)
  • response_time (float) - Response time in seconds
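
Because every call returns the same usage fields, tracking spend across requests takes one line per call. A minimal sketch, assuming an llm instance and an async context as in the Quick Start:

total = 0.0
for prompt in ["Define entropy.", "Define enthalpy."]:
    response = await llm.get_response(user_prompt=prompt)
    total += response.total_cost
print(f"Total spend: ${total:.6f}")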

Advanced Usage

Automatic Fallback with LLMCascade

Use LLMCascade for automatic failover between providers:

from majordomo_llm import LLMCascade

# Providers are tried in order - first is primary, rest are fallbacks
cascade = LLMCascade([
    ("anthropic", "claude-sonnet-4-20250514"),  # Primary
    ("openai", "gpt-4o"),                        # First fallback
    ("gemini", "gemini-2.5-flash"),              # Last resort
])

# If Anthropic fails, automatically tries OpenAI, then Gemini
response = await cascade.get_response("Hello!")

All response methods (get_response, get_json_response, get_structured_json_response, get_response_stream) support automatic fallback.
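
For example, structured output falls back across providers exactly like plain text (CountryInfo is the Pydantic model from the Quick Start):

response = await cascade.get_structured_json_response(
    response_model=CountryInfo,
    user_prompt="Give me information about Japan",
)
print(response.content.capital)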

Direct Provider Access

You can also instantiate providers directly for more control:

from majordomo_llm import Anthropic

llm = Anthropic(
    model="claude-sonnet-4-20250514",
    input_cost=3.0,    # per million tokens
    output_cost=15.0,  # per million tokens
)

Web Search (Anthropic)

Enable web search for supported Claude models:

from majordomo_llm.providers.anthropic import Anthropic

llm = Anthropic(
    model="claude-sonnet-4-5-20250929",
    input_cost=3.0,
    output_cost=15.0,
    use_web_search=True,
)
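
With web search enabled, the instance is used through the same methods as any other; for example, a prompt that benefits from current information:

response = await llm.get_response(
    user_prompt="What is the latest stable Python release?",
)
print(response.content)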

Request Logging

Log all LLM requests asynchronously to a database with optional storage for request/response bodies. Logging is fire-and-forget and does not block your main request flow.

import asyncio

from majordomo_llm import get_llm_instance
from majordomo_llm.logging import LoggingLLM, PostgresAdapter, S3Adapter

async def main():
    # Create your LLM instance
    llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")

    # Set up database adapter (PostgreSQL, MySQL, or SQLite)
    db = await PostgresAdapter.create(
        host="localhost",
        port=5432,
        database="llm_logs",
        user="postgres",
        password="password",
    )

    # Optional: Set up S3 for storing request/response bodies
    storage = await S3Adapter.create(
        bucket="my-llm-logs",
        prefix="requests",  # optional, defaults to "llm-logs"
    )

    # Wrap your LLM with logging
    logged_llm = LoggingLLM(llm, db, storage)

    # Use as normal - all requests are logged automatically
    response = await logged_llm.get_response("Hello!")

    # Don't forget to close connections when done
    await logged_llm.close()

asyncio.run(main())

Local Development Setup

For local development and testing, use SQLite and local file storage:

import asyncio

from majordomo_llm import get_llm_instance
from majordomo_llm.logging import LoggingLLM, SqliteAdapter, FileStorageAdapter

async def main():
    llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")

    # SQLite for metrics (auto-creates database and table)
    db = await SqliteAdapter.create("llm_logs.db")

    # Local file storage for request/response bodies
    storage = await FileStorageAdapter.create("./request_logs")

    logged_llm = LoggingLLM(llm, db, storage)
    response = await logged_llm.get_response("Hello!")

    await logged_llm.close()

asyncio.run(main())

API Key Tracking

Track which API key was used for each request with optional human-readable aliases:

from majordomo_llm.providers.anthropic import Anthropic

# Create LLM with API key alias for attribution
llm = Anthropic(
    model="claude-sonnet-4-20250514",
    input_cost=3.0,
    output_cost=15.0,
    api_key_alias="production-team-1",  # Optional human-readable name
)

# The LoggingLLM wrapper automatically logs:
# - api_key_hash: First 16 chars of SHA256 hash (safe for logging)
# - api_key_alias: Your custom name (e.g., "production-team-1")

This is useful for:

  • Tracking costs per team or application
  • Debugging which key was used for specific requests
  • Auditing API key usage patterns

Database Schema

Create the logging table using the included schema:

CREATE TABLE IF NOT EXISTS llm_requests (
    request_id VARCHAR(36) PRIMARY KEY,
    provider VARCHAR(50) NOT NULL,
    model VARCHAR(100) NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    response_time FLOAT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cached_tokens INTEGER,
    input_cost DECIMAL(10, 8),
    output_cost DECIMAL(10, 8),
    total_cost DECIMAL(10, 8),
    s3_request_key VARCHAR(255),
    s3_response_key VARCHAR(255),
    status VARCHAR(20) NOT NULL,
    error_message TEXT,
    api_key_hash VARCHAR(16),
    api_key_alias VARCHAR(100)
);
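
Once requests are flowing, per-key cost attribution is a plain aggregation over this table. A minimal sketch using the standard-library sqlite3 module, assuming the llm_logs.db file from the local development example above:

import sqlite3

# Sum logged spend per API key alias (columns from the schema above)
conn = sqlite3.connect("llm_logs.db")
rows = conn.execute(
    """
    SELECT api_key_alias, COUNT(*) AS requests, SUM(total_cost) AS spend
    FROM llm_requests
    GROUP BY api_key_alias
    ORDER BY spend DESC
    """
).fetchall()
for alias, requests, spend in rows:
    print(f"{alias or '(no alias)'}: {requests} requests, ${spend or 0:.6f}")
conn.close()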

Available Adapters

Database Adapters:

  • PostgresAdapter - PostgreSQL via asyncpg
  • MySQLAdapter - MySQL via aiomysql
  • SqliteAdapter - SQLite via aiosqlite (great for local development)

Storage Adapters:

  • S3Adapter - AWS S3 via aioboto3
  • FileStorageAdapter - Local filesystem (great for local development)
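
Database and storage adapters plug into LoggingLLM independently, so they can be mixed. For example, SQLite metrics with S3 body storage, reusing the create signatures from the examples above:

db = await SqliteAdapter.create("llm_logs.db")
storage = await S3Adapter.create(bucket="my-llm-logs")
logged_llm = LoggingLLM(llm, db, storage)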

Development

Setup

git clone https://github.com/superset-studio/majordomo-llm.git
cd majordomo-llm
uv sync --all-extras

Running Tests

uv run pytest

Type Checking

uv run mypy src/majordomo_llm

Linting

uv run ruff check src/majordomo_llm

Documentation

Build and preview the docs locally:

uv add --dev mkdocs mkdocs-material "mkdocstrings[python]" pymdown-extensions
uv run mkdocs serve

Pre-commit Hooks & Checks

Enable local checks (using uvx):

uvx pre-commit install
uvx pre-commit run --all-files

Hooks include private-key detection and basic hygiene checks. See .pre-commit-config.yaml.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.
