dr-llm

Shared LLM calling and event-sourced storage primitive for benchmark systems.
dr-llm is a shared primitive for:
- provider-agnostic LLM calls (API and headless)
- canonical PostgreSQL recording/query storage
- event-sourced multistep sessions with tool-calling
- worker-safe parallel tool execution with queue claiming
It is intentionally domain-neutral so repos like nl_latents and unitbench can reuse it.
Core Capabilities
- Unified call interface:
LlmClient.query(LlmRequest) -> LlmResponse
- Canonical storage (PostgreSQL):
- runs, calls, request/response payloads, artifacts
- Session runtime:
  - start, step, resume, cancel
  - native tool strategy (if provider supports tools) + brokered fallback
- Tool queue + workers:
- idempotent tool call enqueue
- concurrent worker claims via FOR UPDATE SKIP LOCKED
- Replay:
  - reconstruct message history from session_events
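The worker-claim step above can be sketched as a single atomic UPDATE. The table and column names below (tool_calls, status, claimed_by, enqueued_at) are illustrative assumptions, not dr-llm's actual schema:

```python
# Illustrative FOR UPDATE SKIP LOCKED claim query. Concurrent workers
# running this statement each claim a different queued row: locked rows
# are skipped instead of blocking, so claims never serialize on one row.
CLAIM_SQL = """
UPDATE tool_calls
SET status = 'running', claimed_by = %(worker_id)s
WHERE id = (
    SELECT id FROM tool_calls
    WHERE status = 'queued'
    ORDER BY enqueued_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;
"""

def claim_one(cursor, worker_id: str):
    """Atomically claim one queued tool call; returns its id, or None
    when the queue is empty."""
    cursor.execute(CLAIM_SQL, {"worker_id": worker_id})
    row = cursor.fetchone()
    return row[0] if row else None
```

Because the lock, the status transition, and the claim attribution happen in one statement, a crashed worker never leaves a half-claimed row.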
Install
uv add dr-llm
Quick verification:
uv run python -c "import dr_llm"
For maintainers, see the release runbook: docs/releasing.md.
Configuration
- Required for DB-backed workflows: DR_LLM_DATABASE_URL
- Optional provider keys: OPENAI_API_KEY, OPENROUTER_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, ZAI_API_KEY, MINIMAX_API_KEY, KIMI_API_KEY
- GLM provider defaults to the international Coding Plan endpoint:
https://api.z.ai/api/coding/paas/v4
- MiniMax API provider defaults to:
https://api.minimax.io/v1
- Claude headless coding-plan presets:
  - claude-code-minimax: routes Claude Code via https://api.minimax.io/anthropic and maps MINIMAX_API_KEY to Anthropic auth envs
  - claude-code-kimi: routes Claude Code via https://api.kimi.com/coding/ and maps KIMI_API_KEY to Anthropic auth envs
- Model catalog overrides default file: config/model_overrides.json
- YAML override parsing is supported and requires PyYAML (included as a core dependency).
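Endpoint resolution can be sketched as env-override-then-default. Only the default URLs come from this README; the DR_LLM_<PROVIDER>_BASE_URL override name is an assumption for illustration, not a documented dr-llm variable:

```python
import os

# Defaults documented above; keys are illustrative provider slugs.
DEFAULT_BASE_URLS = {
    "glm": "https://api.z.ai/api/coding/paas/v4",
    "minimax": "https://api.minimax.io/v1",
}

def resolve_base_url(provider: str) -> str:
    """Prefer a hypothetical per-provider override env var, else fall
    back to the documented default endpoint."""
    override = os.environ.get(f"DR_LLM_{provider.upper()}_BASE_URL")
    return override or DEFAULT_BASE_URLS[provider]
```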
CLI
dr-llm providers
dr-llm models sync
dr-llm models list --supports-reasoning --json
dr-llm models show --provider openrouter --model openai/o3-mini
dr-llm query \
--provider openai \
--model gpt-4.1 \
--reasoning-json '{"effort":"high"}' \
--message "hello"
dr-llm run start --run-type benchmark
dr-llm run finish --run-id <run_id> --status success
dr-llm run benchmark \
--workers 128 \
--total-operations 200000 \
--warmup-operations 10000 \
--max-in-flight 128 \
--operation-mix-json '{"record_call":2,"session_roundtrip":1,"read_calls":1}' \
--artifact-path .dr_llm/benchmarks/release-baseline.json
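The --operation-mix-json values read as relative weights. How the benchmark harness samples them internally is not documented here; a minimal interpretation of the ratios is:

```python
def mix_to_probabilities(mix: dict[str, int]) -> dict[str, float]:
    """Convert relative operation weights into sampling probabilities."""
    total = sum(mix.values())
    return {op: weight / total for op, weight in mix.items()}

# The mix from the command above: record_call has weight 2 of 4 total.
probs = mix_to_probabilities(
    {"record_call": 2, "session_roundtrip": 1, "read_calls": 1}
)
# probs["record_call"] -> 0.5
```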
dr-llm session start \
--provider openai \
--model gpt-4.1 \
--message "You are helpful" \
--message "Solve this task"
dr-llm session step --session-id <session_id> --message "next"
dr-llm session resume --session-id <session_id>
dr-llm session cancel --session-id <session_id> --reason "stopped"
# brokered tool calls are queued by default; use workers
dr-llm tool worker run --tool-loader mypkg.tools:register_tools
# optional synchronous override for a single step:
dr-llm session step --session-id <session_id> --inline-tool-execution
dr-llm replay session --session-id <session_id>
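Replay reconstructs history by folding an ordered event stream back into messages. The sketch below uses illustrative event shapes, not dr-llm's actual session_events rows:

```python
def replay_messages(events: list[dict]) -> list[dict]:
    """Fold ordered session events into a message history.

    Only message events contribute to the transcript in this sketch;
    other event types (tool enqueues, worker claims, ...) are ignored,
    and nothing after a cancellation is replayed.
    """
    messages = []
    for event in events:
        if event["type"] == "message_added":
            messages.append(
                {"role": event["role"], "content": event["content"]}
            )
        elif event["type"] == "session_cancelled":
            break
    return messages
```

The appeal of event sourcing here is that the transcript is derived state: any point-in-time history can be rebuilt by replaying a prefix of the stream.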
Benchmark command output:
{
"artifact_path": ".dr_llm/benchmarks/release-baseline.json",
"failed_operations": 0,
"operations_per_second": 4231.8,
"p50_latency_ms": 20.0,
"p95_latency_ms": 200.0,
"run_id": "run_abc123",
"status": "success"
}
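A release gate can consume this artifact directly. The 250 ms p95 threshold below is an example value for illustration, not a project default:

```python
import json

def check_benchmark(report: str, max_p95_ms: float = 250.0) -> bool:
    """Parse a benchmark report (the JSON shape shown above) and gate on
    zero failed operations plus a tail-latency budget."""
    data = json.loads(report)
    return (
        data["failed_operations"] == 0
        and data["p95_latency_ms"] <= max_p95_ms
    )
```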
Reasoning + cost notes:
- OpenAI-compatible adapters now accept LlmRequest.reasoning / --reasoning-json.
- Reasoning text/details and reasoning token counts are normalized on LlmResponse.
- Provider-returned cost fields (e.g. OpenRouter usage.cost variants) are normalized into LlmResponse.cost.
- These are persisted in llm_call_responses alongside standard token usage.
Generation transcript logging (default on):
DR_LLM_GENERATION_LOG_ENABLED=true
DR_LLM_GENERATION_LOG_DIR=.dr_llm/generation_logs
DR_LLM_GENERATION_LOG_ROTATE_BYTES=104857600
DR_LLM_GENERATION_LOG_BACKUPS=10
DR_LLM_GENERATION_LOG_REDACT_SECRETS=true
DR_LLM_GENERATION_LOG_MAX_EVENT_BYTES=10485760
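With size-based rotation, the worst-case disk footprint per log is the active file plus its retained backups. A small helper using the defaults listed above (the env parsing here is a sketch, not dr-llm's own config loader):

```python
def generation_log_max_bytes(env: dict[str, str]) -> int:
    """Worst-case on-disk footprint of one generation log: the active
    file plus all rotated backups. Defaults mirror the values above."""
    rotate = int(env.get("DR_LLM_GENERATION_LOG_ROTATE_BYTES", "104857600"))
    backups = int(env.get("DR_LLM_GENERATION_LOG_BACKUPS", "10"))
    return rotate * (backups + 1)

# With the defaults (100 MiB x 11 files), budget roughly 1.1 GB per log.
```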
Python Example
from dr_llm import LlmClient, LlmRequest, Message, PostgresRepository, ToolRegistry
repo = PostgresRepository()
client = LlmClient(repository=repo)
response = client.query(
LlmRequest(
provider="openai",
model="gpt-4.1",
messages=[Message(role="user", content="hello")],
)
)
print(response.text)
Adapter lifecycle note:
- If you instantiate provider adapters directly, call adapter.close() when done (or use the context manager form with ... as adapter:) to release underlying HTTP connections.
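If an adapter only exposes close() without supporting the with-statement itself, contextlib.closing gives the same guarantee. The adapter class below is a hypothetical stand-in to show the pattern, not a dr-llm class:

```python
from contextlib import closing

class DummyAdapter:
    """Hypothetical adapter: any object with a close() method works."""
    def __init__(self):
        self.closed = False

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def close(self):
        self.closed = True

# closing() calls adapter.close() on exit, even if the body raises,
# so the underlying HTTP connections are always released.
with closing(DummyAdapter()) as adapter:
    adapter.complete("hello")
# adapter.closed is now True
```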
Testing
uv run ruff format
uv run ruff check --fix src/ tests/ scripts/
uv run ty check src
uv run pytest tests/ -v
Postgres integration tests are env-gated:
- set DR_LLM_TEST_DATABASE_URL (or DR_LLM_DATABASE_URL)
- run uv run pytest tests/ -v -m integration
Local integration recommendation (test-only DSN):
- Start a dedicated Postgres test container on port 5433:
docker run -d \
--name dr-llm-pg-test \
-e POSTGRES_DB=dr_llm_test \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-p 5433:5432 \
postgres:16
- Set a test-only URL (avoid using your app/runtime DB URL):
export DR_LLM_TEST_DATABASE_URL='postgresql://postgres:postgres@localhost:5433/dr_llm_test'
- Run the helper:
./scripts/run-integration-local.sh
Preflight check (recommended before running integration tests):
psql "$DR_LLM_TEST_DATABASE_URL" -c "select current_user, current_database();"
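The same preflight idea in Python: a guard that refuses DSNs not matching the recommended test convention (port 5433, database name ending in _test). That convention is this README's local recommendation, not something dr-llm enforces:

```python
from urllib.parse import urlsplit

def assert_test_dsn(dsn: str) -> None:
    """Refuse to run integration tests against anything that does not
    look like the dedicated test database."""
    parts = urlsplit(dsn)
    if parts.port != 5433 or not parts.path.endswith("_test"):
        raise ValueError(f"refusing non-test DSN: {dsn}")
```

A guard like this in a conftest keeps a copy-pasted runtime DR_LLM_DATABASE_URL from being wiped by the test suite.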
If integration tests are skipped unexpectedly, include skip reasons:
uv run pytest tests/ -v -m integration -rs
CI
GitHub Actions workflows:
- ci: runs on PRs to main and pushes to main
  - quality-unit job: format check, lint, type-check, non-integration tests
  - security job: uv lock --check and uvx pip-audit
- integration: runs on pushes to main, manual dispatch, and PRs to main only when the run-integration label is present
  - starts a postgres:16 service and runs pytest -m integration
Branch protection recommendation:
- require ci / quality-unit
- require ci / security
- keep integration / postgres-integration non-required for all PRs (opt-in via label, always on main)
Milestone Closeout Artifacts
- Milestone status: docs/milestones.md
- Consumer rollout checklist: docs/consumer-rollout-checklist.md
- M2b operations checklist: docs/ops/m2b-hardening-checklist.md
- Compatibility contract: docs/compatibility-contract.md
- Migration guide: docs/migration-guide.md
- Integration notes: docs/integrations/nl_latents.md, docs/integrations/unitbench.md
- Example gateways: examples/nl_latents_gateway.py, examples/unitbench_gateway.py