Shared LLM calling and event-sourced storage primitive for benchmark systems.
Project description
dr-llm
dr-llm is a shared primitive for:
- provider-agnostic LLM calls (API and headless)
- canonical PostgreSQL recording/query storage
- event-sourced multistep sessions with tool-calling
- worker-safe parallel tool execution with queue claiming
- generic typed sample pools for benchmark storage
- isolated per-project databases with backup/restore
It is intentionally domain-neutral so repos like nl_latents and unitbench can reuse it.
Core Capabilities
- Unified call interface: LlmClient.query(LlmRequest) -> LlmResponse
- Canonical storage (PostgreSQL):
- runs, calls, request/response payloads, artifacts
- Session runtime:
  - start, step, resume, cancel
  - native tool strategy (if the provider supports tools) + brokered fallback
- Tool queue + workers:
- idempotent tool call enqueue
- concurrent worker claims via FOR UPDATE SKIP LOCKED
- Replay:
- reconstruct message history from session_events
- Sample pools:
- schema-driven typed key dimensions with auto-generated DDL
- no-replacement acquisition via claims table
- pending sample lifecycle (claim/promote/fail with FOR UPDATE SKIP LOCKED)
- top-up orchestration: acquire, wait for pending, generate, re-acquire
- Project management:
- isolated per-project Postgres containers via Docker
- backup/restore with atomic swap
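The tool-queue and pool claiming described above rest on PostgreSQL's row-locking semantics. A minimal sketch of a SKIP LOCKED claim follows; the table and column names are illustrative, not dr-llm's actual schema:

```sql
-- Claim one pending tool call. Concurrent workers skip rows
-- already locked by another transaction instead of blocking,
-- so each queued call is executed by exactly one worker.
BEGIN;
SELECT id, arguments
FROM tool_call_queue
WHERE status = 'pending'
ORDER BY enqueued_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ... execute the tool, then mark the claimed row done ...
UPDATE tool_call_queue SET status = 'done' WHERE id = $1;
COMMIT;
```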
Install
uv add dr-llm
Quick verification:
uv run python -c "import dr_llm"
For maintainers, see the release runbook: docs/releasing.md.
Quick Start
1. Query a provider (no database required)
uv run dr-llm query \
--provider openai \
--model gpt-4.1 \
--message "Hello, what's 2+2?" \
--no-record
The --no-record flag skips database recording, so you can test providers without Postgres.
2. Start Postgres (for catalog and recording)
source ./scripts/start-test-postgres.sh
This starts a local Postgres container, applies schema migrations, and exports DR_LLM_DATABASE_URL and DR_LLM_TEST_DATABASE_URL into your shell. Use source (not ./) so the env vars persist.
3. Sync and list models
uv run dr-llm models sync --provider openai
uv run dr-llm models list --provider openai
Use --json on models list for full metadata output.
Available Providers
| Provider | Aliases | Type | API Key Env |
|---|---|---|---|
| openai | — | OpenAI API | OPENAI_API_KEY |
| openrouter | — | OpenRouter API | OPENROUTER_API_KEY |
| minimax | — | MiniMax OpenAI-compat API | MINIMAX_API_KEY |
| anthropic | — | Anthropic API | ANTHROPIC_API_KEY |
| google | — | Google Gemini API | GOOGLE_API_KEY |
| glm | — | GLM (ZAI) API | ZAI_API_KEY |
| codex | codex-cli | Codex CLI (headless) | OPENAI_API_KEY |
| claude-code | claude | Claude Code CLI (headless) | ANTHROPIC_API_KEY |
| claude-code-minimax | claude-minimax | Claude Code via MiniMax | MINIMAX_API_KEY |
| claude-code-kimi | claude-kimi | Claude Code via Kimi | KIMI_API_KEY |
Headless providers shell out to CLI tools (codex, claude) and redirect API traffic via environment variables. The MiniMax and Kimi variants point Claude Code at third-party Anthropic-compatible endpoints.
Some providers (MiniMax, Codex, Claude Code, Kimi) use static model lists for models sync since they don't expose a /models endpoint. The CLI will note when a list may be out of date and link to the provider's docs.
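As an illustration of the redirection mechanism, the sketch below uses Claude Code's standard ANTHROPIC_* environment variables with the MiniMax endpoint named above; treat the exact variable choices as an assumption about how the presets are wired, not dr-llm's verbatim implementation:

```shell
# Route a headless Claude Code session through an Anthropic-compatible endpoint.
export ANTHROPIC_BASE_URL="https://api.minimax.io/anthropic"
export ANTHROPIC_AUTH_TOKEN="${MINIMAX_API_KEY}"
# claude -p "hello"   # the CLI would now send its API traffic to the endpoint above
```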
Configuration
- Required for DB-backed workflows: DR_LLM_DATABASE_URL
- Provider API keys: see the table above
- GLM provider defaults to: https://api.z.ai/api/coding/paas/v4
- MiniMax API provider defaults to: https://api.minimax.io/v1
- Claude headless coding-plan presets:
  - claude-code-minimax: routes via https://api.minimax.io/anthropic
  - claude-code-kimi: routes via https://api.kimi.com/coding/
- Model catalog overrides: config/model_overrides.json (YAML also supported)
CLI Reference
dr-llm providers
dr-llm models sync
dr-llm models list --provider openai
dr-llm models list --supports-reasoning --json
dr-llm models show --provider openrouter --model openai/o3-mini
dr-llm query \
--provider openai \
--model gpt-4.1 \
--message "hello" \
--no-record
dr-llm query \
--provider openai \
--model gpt-4.1 \
--reasoning-json '{"effort":"high"}' \
--message "hello"
dr-llm run start --run-type benchmark
dr-llm run finish --run-id <run_id> --status success
dr-llm run benchmark \
--workers 128 \
--total-operations 200000 \
--warmup-operations 10000 \
--max-in-flight 128 \
--operation-mix-json '{"record_call":2,"session_roundtrip":1,"read_calls":1}' \
--artifact-path .dr_llm/benchmarks/release-baseline.json
dr-llm session start \
--provider openai \
--model gpt-4.1 \
--message "You are helpful" \
--message "Solve this task"
dr-llm session step --session-id <session_id> --message "next"
dr-llm session resume --session-id <session_id>
dr-llm session cancel --session-id <session_id> --reason "stopped"
# brokered tool calls are queued by default; use workers
dr-llm tool worker run --tool-loader mypkg.tools:register_tools
# optional synchronous override for a single step:
dr-llm session step --session-id <session_id> --inline-tool-execution
dr-llm replay session --session-id <session_id>
dr-llm project create my-experiment
dr-llm project list
dr-llm project use my-experiment # prints export DR_LLM_DATABASE_URL=...
dr-llm project start my-experiment
dr-llm project stop my-experiment
dr-llm project backup my-experiment
dr-llm project restore my-experiment backups/my-experiment-20260325.sql.gz
dr-llm project destroy my-experiment --yes-really-delete-everything
Benchmark command output:
{
"artifact_path": ".dr_llm/benchmarks/release-baseline.json",
"failed_operations": 0,
"operations_per_second": 4231.8,
"p50_latency_ms": 20.0,
"p95_latency_ms": 200.0,
"run_id": "run_abc123",
"status": "success"
}
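The artifact written to --artifact-path mirrors this JSON, so a release script can gate on it. A minimal sketch, parsing the example output above inline (in practice you would json.load the artifact file; the thresholds are illustrative, not official):

```python
import json

# Parse a benchmark report shaped like the output above and apply
# simple release gates. Threshold values here are illustrative.
report = json.loads("""{
  "artifact_path": ".dr_llm/benchmarks/release-baseline.json",
  "failed_operations": 0,
  "operations_per_second": 4231.8,
  "p50_latency_ms": 20.0,
  "p95_latency_ms": 200.0,
  "run_id": "run_abc123",
  "status": "success"
}""")

ok = (
    report["status"] == "success"
    and report["failed_operations"] == 0
    and report["p95_latency_ms"] <= 500.0
)
print("gate passed" if ok else "gate failed")
```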
Reasoning + cost notes:
- OpenAI-compatible adapters now accept LlmRequest.reasoning / --reasoning-json.
- Reasoning text/details and reasoning token counts are normalized on LlmResponse.
- Provider-returned cost fields (e.g. OpenRouter usage.cost variants) are normalized into LlmResponse.cost.
- These are persisted in llm_call_responses alongside standard token usage.
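To illustrate the kind of normalization involved, here is a hypothetical helper (not dr-llm's internal code): providers report cost under different shapes, such as a flat usage.cost float or a nested cost object.

```python
from typing import Any, Optional

def normalize_cost(usage: dict[str, Any]) -> Optional[float]:
    """Hypothetical sketch: coerce provider usage-payload variants
    into a single float cost, or None when no cost is reported."""
    cost = usage.get("cost")
    if isinstance(cost, (int, float)):
        return float(cost)
    if isinstance(cost, dict):  # e.g. {"total": 0.0123, "upstream": ...}
        total = cost.get("total")
        if isinstance(total, (int, float)):
            return float(total)
    return None

print(normalize_cost({"cost": 0.0123}))        # 0.0123
print(normalize_cost({"prompt_tokens": 10}))   # None
```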
Generation transcript logging (default on):
- DR_LLM_GENERATION_LOG_ENABLED=true
- DR_LLM_GENERATION_LOG_DIR=.dr_llm/generation_logs
- DR_LLM_GENERATION_LOG_ROTATE_BYTES=104857600
- DR_LLM_GENERATION_LOG_BACKUPS=10
- DR_LLM_GENERATION_LOG_REDACT_SECRETS=true
- DR_LLM_GENERATION_LOG_MAX_EVENT_BYTES=10485760
Python Example
from dr_llm import LlmClient, LlmRequest, Message, PostgresRepository, ToolRegistry
repo = PostgresRepository()
client = LlmClient(repository=repo)
response = client.query(
LlmRequest(
provider="openai",
model="gpt-4.1",
messages=[Message(role="user", content="hello")],
)
)
print(response.text)
Adapter lifecycle note:
- If you instantiate provider adapters directly, call adapter.close() when done (or use the context-manager form, with ... as adapter:) to release underlying HTTP connections.
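The lifecycle pattern looks like the following sketch, using a stand-in adapter class (dr-llm's real adapter constructors and signatures differ):

```python
import contextlib

class FakeAdapter:
    """Stand-in for a provider adapter that owns an HTTP connection."""
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True  # a real adapter would release its HTTP pool here

# Option 1: explicit close in a try/finally.
adapter = FakeAdapter()
try:
    pass  # ... use the adapter ...
finally:
    adapter.close()

# Option 2: context-manager form. contextlib.closing works for any
# object exposing close(), even without __enter__/__exit__.
with contextlib.closing(FakeAdapter()) as adapter2:
    pass  # ... use adapter2 ...

assert adapter.closed and adapter2.closed
```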
Pool Example
Pools provide schema-driven sample storage with no-replacement acquisition for benchmarks.
from dr_llm import (
ColumnType, KeyColumn, PoolSchema, PoolStore, PoolService,
PoolAcquireQuery, PoolAcquireResult,
)
from dr_llm.pool.models import PoolSample
from dr_llm.storage._runtime import StorageConfig, StorageRuntime
# 1. Declare a pool schema with typed key dimensions
schema = PoolSchema(
name="my_benchmark",
key_columns=[
KeyColumn(name="provider"),
KeyColumn(name="difficulty", type=ColumnType.integer),
],
)
# 2. Connect and create tables
runtime = StorageRuntime(StorageConfig(dsn="postgresql://..."))
store = PoolStore(schema, runtime)
store.init_schema() # idempotent CREATE TABLE IF NOT EXISTS
# 3. Insert samples
store.insert_samples([
PoolSample(
key_values={"provider": "openai", "difficulty": 1},
sample_idx=0,
payload={"prompt": "What is 2+2?", "expected": "4"},
),
PoolSample(
key_values={"provider": "openai", "difficulty": 1},
sample_idx=1,
payload={"prompt": "What is 3+3?", "expected": "6"},
),
])
# 4. Acquire samples (no-replacement within a run)
result = store.acquire(PoolAcquireQuery(
run_id="run_001",
key_values={"provider": "openai", "difficulty": 1},
n=2,
))
for sample in result.samples:
print(sample.payload)
# 5. Or use PoolService for automatic top-up generation
service = PoolService(store)
result = service.acquire_or_generate(
PoolAcquireQuery(
run_id="run_002",
key_values={"provider": "openai", "difficulty": 2},
n=5,
),
generator_fn=lambda key_values, deficit: [
PoolSample(key_values=key_values, payload={"generated": True})
for _ in range(deficit)
],
)
Testing
uv run ruff format
uv run ruff check --fix src/ tests/ scripts/
uv run ty check src
uv run pytest tests/ -v
Integration tests
Integration tests require a running Postgres instance. If the test container (dr-llm-pg-test on port 5433) is already running, tests work automatically — conftest.py sets the default DR_LLM_TEST_DATABASE_URL.
To start the test container from scratch:
source ./scripts/start-test-postgres.sh
Then run integration tests:
uv run pytest tests/ -v -m integration
If integration tests are skipped unexpectedly, include skip reasons:
uv run pytest tests/ -v -m integration -rs
Demo Scripts
End-to-end query flow
Creates a project, records queries, verifies backup/restore:
./scripts/demo-query-flow.sh
Requires Docker and at least one of OPENAI_API_KEY or ANTHROPIC_API_KEY.
Pool provider demo
Queries all available LLM providers (API and headless) and stores results in a typed pool:
uv run python scripts/demo-pool-providers.py
Auto-detects available providers by checking API key env vars and CLI tool availability (claude, codex). For each provider: syncs the model catalog, selects a model, sends a query, and inserts the result into a pool. Prints a summary table at the end.
Options:
uv run python scripts/demo-pool-providers.py --project-name my-demo --prompt "Explain gravity in one sentence."
Requires Docker. Works with any combination of providers — set API keys and/or install CLI tools for the providers you want to test.
Milestone Closeout Artifacts
- Milestone status: docs/milestones.md
- Consumer rollout checklist: docs/consumer-rollout-checklist.md
- M2b operations checklist: docs/ops/m2b-hardening-checklist.md
- Compatibility contract: docs/compatibility-contract.md
- Migration guide: docs/migration-guide.md
- Integration notes: docs/integrations/nl_latents.md, docs/integrations/unitbench.md
- Example gateways: examples/nl_latents_gateway.py, examples/unitbench_gateway.py
Download files
File details
Details for the file dr_llm-0.4.0.tar.gz.
File metadata
- Download URL: dr_llm-0.4.0.tar.gz
- Upload date:
- Size: 125.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 027dfb7d285d5caaedb0d85b2ef20a6b6c21835ae5f0f5fefd5833aea5b451a9 |
| MD5 | 554e08b88f5573316e9370b71d37d859 |
| BLAKE2b-256 | abc5e4d6bb29d672cedaef10045981af59e3c82963116c3cbe7ff3f7200ddbe5 |
File details
Details for the file dr_llm-0.4.0-py3-none-any.whl.
File metadata
- Download URL: dr_llm-0.4.0-py3-none-any.whl
- Upload date:
- Size: 104.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4134c0281dccb6e92293ae579b535cef22877aee3860238f8e9f304ff52b7a2f |
| MD5 | ecbbcd96bd2b14ff1c9adeb15ba99278 |
| BLAKE2b-256 | 91299d8e804d073d58a65d7ed1ebd118f9707cf327c0ddada816c046751324c7 |