Shared LLM calling and event-sourced storage primitive for benchmark systems.

dr-llm

dr-llm is a shared primitive for:

  • provider-agnostic LLM calls (API and headless)
  • canonical PostgreSQL recording/query storage
  • event-sourced multistep sessions with tool-calling
  • worker-safe parallel tool execution with queue claiming
  • generic typed sample pools for benchmark storage
  • isolated per-project databases with backup/restore

It is intentionally domain-neutral so repos like nl_latents and unitbench can reuse it.

Core Capabilities

  • Unified call interface:
    • LlmClient.query(LlmRequest) -> LlmResponse
  • Canonical storage (PostgreSQL):
    • runs, calls, request/response payloads, artifacts
  • Session runtime:
    • start, step, resume, cancel
    • native tool strategy (if provider supports tools) + brokered fallback
  • Tool queue + workers:
    • idempotent tool call enqueue
    • concurrent worker claims via FOR UPDATE SKIP LOCKED
  • Replay:
    • reconstruct message history from session_events
  • Sample pools:
    • schema-driven typed key dimensions with auto-generated DDL
    • no-replacement acquisition via claims table
    • pending sample lifecycle (claim/promote/fail with FOR UPDATE SKIP LOCKED)
    • top-up orchestration: acquire, wait for pending, generate, re-acquire
  • Project management:
    • isolated per-project Postgres containers via Docker
    • backup/restore with atomic swap
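
The worker-claim pattern behind the tool queue can be illustrated with a generic sketch. The table and column names below (`tool_calls`, `status`, `claimed_by`) are assumptions for illustration, not dr-llm's actual schema:

```python
# Generic sketch of concurrent queue claiming with FOR UPDATE SKIP LOCKED.
# Table and column names are hypothetical; dr-llm's real schema may differ.
CLAIM_SQL = """
UPDATE tool_calls
SET status = 'claimed', claimed_by = %(worker_id)s
WHERE id IN (
    SELECT id FROM tool_calls
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT %(batch)s
    FOR UPDATE SKIP LOCKED  -- rows locked by another worker are skipped, not waited on
)
RETURNING id;
"""

def claim_batch(cursor, worker_id: str, batch: int = 10) -> list:
    """Claim up to `batch` queued tool calls; safe to run from many workers at once."""
    cursor.execute(CLAIM_SQL, {"worker_id": worker_id, "batch": batch})
    return [row[0] for row in cursor.fetchall()]
```

Because `SKIP LOCKED` makes each worker pass over rows already locked by another transaction, no two workers can claim the same tool call and no worker blocks on another's claim.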

Install

uv add dr-llm

Quick verification:

uv run python -c "import dr_llm"

For maintainers, see the release runbook: docs/releasing.md.

Quick Start

1. Query a provider (no database required)

uv run dr-llm query \
  --provider openai \
  --model gpt-4.1 \
  --message "Hello, what's 2+2?" \
  --no-record

The --no-record flag skips database recording, so you can test providers without Postgres.

2. Start Postgres (for catalog and recording)

source ./scripts/start-test-postgres.sh

This starts a local Postgres container, applies schema migrations, and exports DR_LLM_DATABASE_URL and DR_LLM_TEST_DATABASE_URL into your shell. Use source (not ./) so the env vars persist.

3. Sync and list models

uv run dr-llm models sync --provider openai
uv run dr-llm models list --provider openai

Use --json on models list for full metadata output.

Available Providers

| Provider | Aliases | Type | API Key Env |
|---|---|---|---|
| openai | | OpenAI API | OPENAI_API_KEY |
| openrouter | | OpenRouter API | OPENROUTER_API_KEY |
| minimax | | MiniMax OpenAI-compat API | MINIMAX_API_KEY |
| anthropic | | Anthropic API | ANTHROPIC_API_KEY |
| google | | Google Gemini API | GOOGLE_API_KEY |
| glm | | GLM (ZAI) API | ZAI_API_KEY |
| codex | codex-cli | Codex CLI (headless) | OPENAI_API_KEY |
| claude-code | claude | Claude Code CLI (headless) | ANTHROPIC_API_KEY |
| claude-code-minimax | claude-minimax | Claude Code via MiniMax | MINIMAX_API_KEY |
| claude-code-kimi | claude-kimi | Claude Code via Kimi | KIMI_API_KEY |

Headless providers shell out to CLI tools (codex, claude) and redirect API traffic via environment variables. The MiniMax and Kimi variants point Claude Code at third-party Anthropic-compatible endpoints.
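
The redirection approach can be sketched as building a child-process environment before shelling out. `ANTHROPIC_BASE_URL` is an assumption about the variable Claude Code honors (check the Claude Code docs for the authoritative name); the endpoint URL is the one listed under Configuration:

```python
import os

def claude_code_env(base_url: str, api_key: str) -> dict:
    """Build a child-process environment that redirects the CLI's API traffic.

    ANTHROPIC_BASE_URL is assumed to be the variable Claude Code reads for its
    endpoint; verify against the Claude Code documentation.
    """
    env = dict(os.environ)  # inherit the parent environment
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_API_KEY"] = api_key
    return env

# A caller would then pass this to subprocess, e.g.:
#   subprocess.run(["claude", "-p", "hello"],
#                  env=claude_code_env("https://api.minimax.io/anthropic", key))
```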

Some providers (MiniMax, Codex, Claude Code, Kimi) use static model lists for models sync since they don't expose a /models endpoint. The CLI will note when a list may be out of date and link to the provider's docs.

Configuration

  • Required for DB-backed workflows: DR_LLM_DATABASE_URL
  • Provider API keys: see the table above
  • GLM provider defaults to: https://api.z.ai/api/coding/paas/v4
  • MiniMax API provider defaults to: https://api.minimax.io/v1
  • Claude headless coding-plan presets:
    • claude-code-minimax: routes via https://api.minimax.io/anthropic
    • claude-code-kimi: routes via https://api.kimi.com/coding/
  • Model catalog overrides: config/model_overrides.json (YAML also supported)
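
A consumer can fail fast when the required DB configuration is missing before attempting any DB-backed workflow. A minimal preflight sketch (the helper name is hypothetical, not part of dr-llm's API):

```python
import os

def require_database_url() -> str:
    """Fail fast if DR_LLM_DATABASE_URL is unset, with a hint on how to set it."""
    dsn = os.environ.get("DR_LLM_DATABASE_URL")
    if not dsn:
        raise RuntimeError(
            "DR_LLM_DATABASE_URL is required for DB-backed workflows; "
            "run `source ./scripts/start-test-postgres.sh` or set it manually."
        )
    return dsn
```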

CLI Reference

dr-llm providers

dr-llm models sync
dr-llm models list --provider openai
dr-llm models list --supports-reasoning --json
dr-llm models show --provider openrouter --model openai/o3-mini

dr-llm query \
  --provider openai \
  --model gpt-4.1 \
  --message "hello" \
  --no-record

dr-llm query \
  --provider openai \
  --model gpt-4.1 \
  --reasoning-json '{"effort":"high"}' \
  --message "hello"

dr-llm run start --run-type benchmark
dr-llm run finish --run-id <run_id> --status success
dr-llm run benchmark \
  --workers 128 \
  --total-operations 200000 \
  --warmup-operations 10000 \
  --max-in-flight 128 \
  --operation-mix-json '{"record_call":2,"session_roundtrip":1,"read_calls":1}' \
  --artifact-path .dr_llm/benchmarks/release-baseline.json

dr-llm session start \
  --provider openai \
  --model gpt-4.1 \
  --message "You are helpful" \
  --message "Solve this task"

dr-llm session step --session-id <session_id> --message "next"
dr-llm session resume --session-id <session_id>
dr-llm session cancel --session-id <session_id> --reason "stopped"

# brokered tool calls are queued by default; use workers
dr-llm tool worker run --tool-loader mypkg.tools:register_tools
# optional synchronous override for a single step:
dr-llm session step --session-id <session_id> --inline-tool-execution

dr-llm replay session --session-id <session_id>

dr-llm project create my-experiment
dr-llm project list
dr-llm project use my-experiment    # prints export DR_LLM_DATABASE_URL=...
dr-llm project start my-experiment
dr-llm project stop my-experiment
dr-llm project backup my-experiment
dr-llm project restore my-experiment backups/my-experiment-20260325.sql.gz
dr-llm project destroy my-experiment --yes-really-delete-everything

Benchmark command output:

{
  "artifact_path": ".dr_llm/benchmarks/release-baseline.json",
  "failed_operations": 0,
  "operations_per_second": 4231.8,
  "p50_latency_ms": 20.0,
  "p95_latency_ms": 200.0,
  "run_id": "run_abc123",
  "status": "success"
}
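
Downstream automation can consume this JSON directly. For example, a hypothetical release gate over the artifact (the thresholds below are illustrative, not dr-llm defaults):

```python
import json

# The sample benchmark output shown above, as a consumer would read it
# from the artifact file.
report = json.loads("""
{
  "artifact_path": ".dr_llm/benchmarks/release-baseline.json",
  "failed_operations": 0,
  "operations_per_second": 4231.8,
  "p50_latency_ms": 20.0,
  "p95_latency_ms": 200.0,
  "run_id": "run_abc123",
  "status": "success"
}
""")

def passes_gate(report: dict, min_ops: float = 1000.0, max_p95_ms: float = 500.0) -> bool:
    """Hypothetical release gate; tune the thresholds to your own baseline."""
    return (
        report["status"] == "success"
        and report["failed_operations"] == 0
        and report["operations_per_second"] >= min_ops
        and report["p95_latency_ms"] <= max_p95_ms
    )
```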

Reasoning + cost notes:

  • OpenAI-compatible adapters now accept LlmRequest.reasoning / --reasoning-json.
  • Reasoning text/details and reasoning token counts are normalized on LlmResponse.
  • Provider-returned cost fields (e.g. OpenRouter usage.cost variants) are normalized into LlmResponse.cost.
  • These are persisted in llm_call_responses alongside standard token usage.

Generation transcript logging (default on):

  • DR_LLM_GENERATION_LOG_ENABLED=true
  • DR_LLM_GENERATION_LOG_DIR=.dr_llm/generation_logs
  • DR_LLM_GENERATION_LOG_ROTATE_BYTES=104857600 (100 MiB)
  • DR_LLM_GENERATION_LOG_BACKUPS=10
  • DR_LLM_GENERATION_LOG_REDACT_SECRETS=true
  • DR_LLM_GENERATION_LOG_MAX_EVENT_BYTES=10485760 (10 MiB)

Python Example

from dr_llm import LlmClient, LlmRequest, Message, PostgresRepository, ToolRegistry

repo = PostgresRepository()
client = LlmClient(repository=repo)

response = client.query(
    LlmRequest(
        provider="openai",
        model="gpt-4.1",
        messages=[Message(role="user", content="hello")],
    )
)
print(response.text)

Adapter lifecycle note:

  • If you instantiate provider adapters directly, call adapter.close() when done (or use context manager form with ... as adapter:) to release underlying HTTP connections.
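
Python's standard `contextlib.closing` provides the context-manager form for any object that exposes `close()`. A sketch with a stand-in adapter class (not dr-llm's real adapter API):

```python
from contextlib import closing

class StubAdapter:
    """Stand-in for a provider adapter that owns HTTP connections."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

with closing(StubAdapter()) as adapter:
    assert not adapter.closed  # the adapter is usable inside the block
# close() is called on exit, even if the block raises
```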

Pool Example

Pools provide schema-driven sample storage with no-replacement acquisition for benchmarks.

from dr_llm import (
    ColumnType, KeyColumn, PoolSchema, PoolStore, PoolService,
    PoolAcquireQuery, PoolAcquireResult,
)
from dr_llm.pool.models import PoolSample
from dr_llm.storage._runtime import StorageConfig, StorageRuntime

# 1. Declare a pool schema with typed key dimensions
schema = PoolSchema(
    name="my_benchmark",
    key_columns=[
        KeyColumn(name="provider"),
        KeyColumn(name="difficulty", type=ColumnType.integer),
    ],
)

# 2. Connect and create tables
runtime = StorageRuntime(StorageConfig(dsn="postgresql://..."))
store = PoolStore(schema, runtime)
store.init_schema()  # idempotent CREATE TABLE IF NOT EXISTS

# 3. Insert samples
store.insert_samples([
    PoolSample(
        key_values={"provider": "openai", "difficulty": 1},
        sample_idx=0,
        payload={"prompt": "What is 2+2?", "expected": "4"},
    ),
    PoolSample(
        key_values={"provider": "openai", "difficulty": 1},
        sample_idx=1,
        payload={"prompt": "What is 3+3?", "expected": "6"},
    ),
])

# 4. Acquire samples (no-replacement within a run)
result = store.acquire(PoolAcquireQuery(
    run_id="run_001",
    key_values={"provider": "openai", "difficulty": 1},
    n=2,
))
for sample in result.samples:
    print(sample.payload)

# 5. Or use PoolService for automatic top-up generation
service = PoolService(store)
result = service.acquire_or_generate(
    PoolAcquireQuery(
        run_id="run_002",
        key_values={"provider": "openai", "difficulty": 2},
        n=5,
    ),
    generator_fn=lambda key_values, deficit: [
        PoolSample(key_values=key_values, payload={"generated": True})
        for _ in range(deficit)
    ],
)

Testing

uv run ruff format
uv run ruff check --fix src/ tests/ scripts/
uv run ty check src
uv run pytest tests/ -v

Integration tests

Integration tests require a running Postgres instance. If the test container (dr-llm-pg-test on port 5433) is already running, tests work automatically — conftest.py sets the default DR_LLM_TEST_DATABASE_URL.

To start the test container from scratch:

source ./scripts/start-test-postgres.sh

Then run integration tests:

uv run pytest tests/ -v -m integration

If integration tests are skipped unexpectedly, include skip reasons:

uv run pytest tests/ -v -m integration -rs

Demo Scripts

End-to-end query flow

Creates a project, records queries, verifies backup/restore:

./scripts/demo-query-flow.sh

Requires Docker and at least one of OPENAI_API_KEY or ANTHROPIC_API_KEY.

Pool provider demo

Queries all available LLM providers (API and headless) and stores results in a typed pool:

uv run python scripts/demo-pool-providers.py

Auto-detects available providers by checking API key env vars and CLI tool availability (claude, codex). For each provider: syncs the model catalog, selects a model, sends a query, and inserts the result into a pool. Prints a summary table at the end.

Options:

uv run python scripts/demo-pool-providers.py --project-name my-demo --prompt "Explain gravity in one sentence."

Requires Docker. Works with any combination of providers — set API keys and/or install CLI tools for the providers you want to test.

Milestone Closeout Artifacts

  • Milestone status: docs/milestones.md
  • Consumer rollout checklist: docs/consumer-rollout-checklist.md
  • M2b operations checklist: docs/ops/m2b-hardening-checklist.md
  • Compatibility contract: docs/compatibility-contract.md
  • Migration guide: docs/migration-guide.md
  • Integration notes:
    • docs/integrations/nl_latents.md
    • docs/integrations/unitbench.md
  • Example gateways:
    • examples/nl_latents_gateway.py
    • examples/unitbench_gateway.py
