Provider-agnostic single-query LLM client with PostgreSQL recording and catalog storage.

dr-llm

Provider-agnostic LLM primitives: call any model, browse catalogs, run batch experiments with typed sample pools.

Domain-neutral by design — shared across repos like nl_latents and unitbench.

Two Flows

Flow 1 — Standalone (no database): Call providers, sync model catalogs, browse available models. File-based catalog cache, zero infrastructure.

Flow 2 — Pool (Postgres-backed): Schema-driven sample pools with a unified two-table design (pool_<name>_samples + pool_<name>_leases), no-replacement acquisition, and per-project isolated databases via Docker.

Install

uv add dr-llm

For the optional marimo pool-inspection notebooks in nbs/inspect/, install the notebook extra:

uv add "dr-llm[notebooks]"

Quick Start

1. Query a provider

uv run dr-llm query \
  --provider openai \
  --model gpt-4.1 \
  --message "Hello, what's 2+2?"

No database needed.

2. List providers

uv run dr-llm providers         # human-readable table
uv run dr-llm providers --json  # machine-readable

3. Sync and browse model catalogs

uv run dr-llm models sync --provider openai
uv run dr-llm models list --provider openai
uv run dr-llm models show --provider openai --model gpt-4.1

Catalog data is cached locally at ~/.dr_llm/catalog_cache/. No database required. Human-readable and JSON model listings also include the repo's curated blacklist, and OpenRouter listings are filtered through the local reasoning-policy allowlist.

Available Providers

Provider	Type	Requirements
`openai`	OpenAI API	`OPENAI_API_KEY`
`openrouter`	OpenRouter API	`OPENROUTER_API_KEY`
`minimax`	MiniMax Anthropic-compatible API	`MINIMAX_API_KEY`
`anthropic`	Anthropic API	`ANTHROPIC_API_KEY`
`google`	Google Gemini API	`GOOGLE_API_KEY`
`glm`	GLM (ZAI) API	`ZAI_API_KEY`
`codex`	Codex CLI (headless)	`codex` executable
`claude-code`	Claude Code CLI (headless)	`claude` executable
`kimi-code`	Kimi Code API (Anthropic-compatible)	`KIMI_API_KEY`

Headless providers (codex, claude-code) shell out to CLI tools, and their input shapes do not expose temperature, top_p, or max_tokens. minimax and kimi-code are direct Anthropic-compatible /messages API providers; kimi-code rejects temperature and top_p but still requires max_tokens.

Some providers use static model lists for models sync (no /models endpoint). The CLI notes when a list may be out of date and links to docs.

Python API

Each provider family has its own concrete request/config shapes:

OpenAILlmRequest / OpenAILlmConfig for provider="openai"
ApiLlmRequest / ApiLlmConfig for the remaining sampling-capable API providers
KimiCodeLlmRequest / KimiCodeLlmConfig for kimi-code
HeadlessLlmRequest / HeadlessLlmConfig for CLI-backed providers

LlmRequest and LlmConfig remain available as unions, and parse_llm_request(...) / parse_llm_config(...) validate raw payloads into the correct concrete model by provider.

For generic sampling-capable API providers, omitted sampling controls default to temperature=1.0 and top_p=0.95. OpenAI omits those fields unless you set them explicitly. kimi-code and headless providers reject those fields entirely.
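The sampling rules above can be summarized in a small standalone sketch. This is not dr-llm code: resolve_sampling and the provider groupings are hypothetical names introduced for illustration, and the groupings are an assumption read off the provider table (codex and claude-code as headless, every other provider except openai and kimi-code as generic).

```python
from typing import Optional

# Assumed groupings for illustration only; not constants exported by dr-llm.
HEADLESS = {"codex", "claude-code"}
REJECTS_SAMPLING = HEADLESS | {"kimi-code"}
OMITS_UNLESS_SET = {"openai"}


def resolve_sampling(
    provider: str,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
) -> dict[str, float]:
    """Return the sampling fields that would accompany a request (hypothetical helper)."""
    explicit = {
        name: value
        for name, value in (("temperature", temperature), ("top_p", top_p))
        if value is not None
    }
    if provider in REJECTS_SAMPLING:
        # kimi-code and headless providers reject these fields entirely.
        if explicit:
            raise ValueError(f"{provider} does not accept {sorted(explicit)}")
        return {}
    if provider in OMITS_UNLESS_SET:
        # OpenAI sends only what was set explicitly.
        return explicit
    # Generic sampling-capable API providers fill in the documented defaults.
    return {"temperature": 1.0, "top_p": 0.95, **explicit}


print(resolve_sampling("google"))
print(resolve_sampling("openai"))
```

Under these assumptions a google request with nothing set carries both defaults, an openai request carries no sampling fields, and passing temperature to kimi-code raises.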

Calling a provider

from dr_llm.llm import OpenAILlmRequest, build_default_registry
from dr_llm.llm.messages import Message

registry = build_default_registry()
adapter = registry.get("openai")

response = adapter.generate(
    OpenAILlmRequest(
        provider="openai",
        model="gpt-4.1",
        messages=[Message(role="user", content="hello")],
    )
)
print(response.text)

Filling a pool with LLM calls (requires Docker)

The recommended way to populate a pool: declare each variant axis (LLM configs, prompts, datasets, …), pass them to seed_llm_grid, and let parallel workers make the actual provider calls. seed_llm_grid walks the cross product, builds per-cell payloads in the shape make_llm_process_fn consumes, and bulk-inserts the unfilled sample rows in one round-trip. Docker is used to auto-manage a Postgres project.
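The cross-product expansion described above can be sketched in isolation, without a database. This is a simplified stand-in, not seed_llm_grid itself: expand_grid is a hypothetical helper, axis members are reduced to their ids, and the row layout (key columns plus sample_idx) is an assumption for illustration.

```python
from itertools import product


def expand_grid(axes: dict[str, list[str]], n: int) -> list[dict[str, object]]:
    """Build one row per (axis combination, sample_idx) pair (hypothetical helper)."""
    names = list(axes)
    rows: list[dict[str, object]] = []
    for combo in product(*(axes[name] for name in names)):
        cell = dict(zip(names, combo))
        # Each grid cell is repeated n times, distinguished by sample_idx.
        for sample_idx in range(n):
            rows.append({**cell, "sample_idx": sample_idx})
    return rows


rows = expand_grid(
    {
        "llm_config": ["gpt-4.1-mini", "gemini-flash"],
        "prompt": ["haiku", "math"],
    },
    n=2,
)
print(len(rows))  # 2 configs x 2 prompts x n=2 = 8
```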

import time

from dr_llm import DbConfig, PoolSchema, PoolStore
from dr_llm.llm import build_default_registry
from dr_llm.llm.config import ApiLlmConfig, LlmConfig, OpenAILlmConfig
from dr_llm.llm.messages import Message
from dr_llm.llm.providers.reasoning import GoogleReasoning, ThinkingLevel
from dr_llm.pool.backend import LlmPoolBackend, LlmPoolBackendConfig, make_llm_process_fn
from dr_llm.pool.db.runtime import DbRuntime
from dr_llm.pool.seed_grid import Axis, AxisMember, GridCell, seed_llm_grid
from dr_llm.project.models import CreateProjectRequest
from dr_llm.project.project_service import create_project
from dr_llm.workers import WorkerConfig, start_workers

# 1. Create a Docker-managed Postgres project
project = create_project(CreateProjectRequest(project_name="my_eval"))

# 2. Build a schema whose key columns match the axis names
schema = PoolSchema.from_axis_names("my_eval", ["llm_config", "prompt"])
runtime = DbRuntime(DbConfig(dsn=project.dsn))
store = PoolStore(schema, runtime)
store.ensure_schema()

# 3. Declare each axis as a list of AxisMembers
llm_config_axis = Axis[LlmConfig](
    name="llm_config",
    members=[
        AxisMember(
            id="gpt-4.1-mini",
            value=OpenAILlmConfig(
                provider="openai",
                model="gpt-4.1-mini",
                max_tokens=64,
            ),
        ),
        AxisMember(
            id="gemini-flash",
            value=ApiLlmConfig(
                provider="google",
                model="gemini-2.5-flash",
                max_tokens=64,
                reasoning=GoogleReasoning(
                    thinking_level=ThinkingLevel.BUDGET,
                    budget_tokens=512,
                ),
            ),
        ),
    ],
)
prompt_axis = Axis[list[Message]](
    name="prompt",
    members=[
        AxisMember(
            id="haiku",
            value=[Message(role="user", content="Write a haiku about programming.")],
        ),
        AxisMember(
            id="math",
            value=[Message(role="user", content="What is 17 * 23?")],
        ),
    ],
)

# 4. Seed the cross product. seed_llm_grid handles payload shaping,
#    sample_idx expansion, and bulk insert.
def build_request(cell: GridCell) -> tuple[list[Message], LlmConfig]:
    return cell.values["prompt"], cell.values["llm_config"]

seed_result = seed_llm_grid(
    store,
    axes=[llm_config_axis, prompt_axis],
    build_request=build_request,
    n=2,  # 2 configs × 2 prompts × n=2 = 8 sample rows
)
print(f"Seeded {seed_result.inserted} sample rows")

# 5. Start workers — they call the real providers
registry = build_default_registry()
controller = start_workers(
    LlmPoolBackend(store, config=LlmPoolBackendConfig(max_retries=1)),
    process_fn=make_llm_process_fn(registry),
    config=WorkerConfig(num_workers=4, thread_name_prefix="pool-fill"),
)

# 6. Drain to idle
try:
    while True:
        snapshot = controller.snapshot()
        state = snapshot.backend_state
        if state is not None:
            print(f"incomplete={state.incomplete} complete={state.complete}")
            if state.incomplete == 0:
                break
        time.sleep(0.5)
finally:
    controller.stop()
    controller.join()

# 7. Acquire completed samples (no-replacement, per-consumer)
#    Sample acquisition lives in dr_llm.sampling. Each consumer gets its
#    own claims table; setup_consumer/teardown_consumer manages it.
from dr_llm.sampling.acquisition import AcquireQuery
from dr_llm.sampling.sampling_store import SamplingStore

sampling = SamplingStore.from_pool_store(store)
sampling.setup_consumer("eval_consumer_1")
result = sampling.acquire(
    AcquireQuery(
        run_id="eval_run_1",
        key_values={"llm_config": "gpt-4.1-mini", "prompt": "math"},
        n=2,
    ),
    "eval_consumer_1",
)

# 8. Clean up when done
sampling.teardown_consumer("eval_consumer_1")
registry.close()
runtime.close()

See scripts/demo-pool-fill.py for a complete runnable example.
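Step 7 relies on no-replacement, per-consumer acquisition. The following in-memory sketch illustrates that idea only; it is not how SamplingStore is implemented. InMemoryPool and its methods are hypothetical, a Python set stands in for each consumer's claims table, and returning fewer than n samples when the pool runs short is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryPool:
    """Toy stand-in for a filled pool; each sample has a sample_id plus key columns."""

    samples: list[dict]
    claims: dict[str, set[int]] = field(default_factory=dict)

    def setup_consumer(self, consumer_id: str) -> None:
        # One claims set per consumer, mirroring one claims table per consumer.
        self.claims.setdefault(consumer_id, set())

    def acquire(self, consumer_id: str, key_values: dict[str, str], n: int) -> list[dict]:
        claimed = self.claims[consumer_id]
        picked: list[dict] = []
        for sample in self.samples:
            if len(picked) == n:
                break
            if sample["sample_id"] in claimed:
                continue  # already handed to this consumer: no replacement
            if all(sample.get(k) == v for k, v in key_values.items()):
                picked.append(sample)
        claimed.update(s["sample_id"] for s in picked)
        return picked


pool = InMemoryPool(
    samples=[{"sample_id": i, "llm_config": "gpt-4.1-mini", "prompt": "math"} for i in range(3)]
)
pool.setup_consumer("eval_consumer_1")
pool.setup_consumer("eval_consumer_2")
key = {"llm_config": "gpt-4.1-mini", "prompt": "math"}
first = pool.acquire("eval_consumer_1", key, n=2)
second = pool.acquire("eval_consumer_1", key, n=2)
other = pool.acquire("eval_consumer_2", key, n=2)
```

In this sketch the same consumer never sees a sample twice, while a second consumer starts from the full pool because its claims are tracked separately.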

Fair worker claiming

LlmPoolBackend claims incomplete samples in creation order by default. When a worker pool should interleave work across a key dimension, use RoundRobinClaimer from dr_llm.pool.claim_strategy to cycle through explicit key values while still relying on PoolStore.claim_lease(...) for lease safety.

from dr_llm.pool.claim_strategy import ClaimOrder, RoundRobinClaimer

claimer = RoundRobinClaimer(
    store,
    round_robin_key="llm_config",
    round_robin_values=["gpt-4.1-mini", "gemini-flash"],
    order=ClaimOrder(kind="random", seed=7),
)

sample = claimer.claim(worker_id="worker-1", lease_seconds=60)
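To make the interleaving concrete, here is a minimal in-memory sketch of round-robin ordering across a key dimension. It is not RoundRobinClaimer: round_robin_order is a hypothetical function, there are no leases, and samples within each key value are taken in creation order rather than the seeded random order shown above.

```python
from collections import deque


def round_robin_order(pending: list[dict], key: str, values: list[str]) -> list[dict]:
    """Interleave pending samples across the given key values (hypothetical helper)."""
    queues = {v: deque(s for s in pending if s[key] == v) for v in values}
    ordered: list[dict] = []
    # Keep cycling through the key values until every queue is drained.
    while any(queues.values()):
        for v in values:
            if queues[v]:
                ordered.append(queues[v].popleft())
    return ordered


pending = [
    {"sample_id": 0, "llm_config": "gpt-4.1-mini"},
    {"sample_id": 1, "llm_config": "gpt-4.1-mini"},
    {"sample_id": 2, "llm_config": "gpt-4.1-mini"},
    {"sample_id": 3, "llm_config": "gemini-flash"},
]
order = round_robin_order(pending, "llm_config", ["gpt-4.1-mini", "gemini-flash"])
print([s["sample_id"] for s in order])  # [0, 3, 1, 2]
```

Plain creation-order claiming would process samples 0, 1, 2 before reaching the single gemini-flash sample; the round-robin order reaches it second.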

Reading an existing pool

Once a pool has been seeded and filled, PoolReader gives consumers a typed read-only handle for inspection without re-wiring DbRuntime / PoolSchema / PoolStore by hand. The reader composes a private PoolStore and exposes only its read-side methods.

from dr_llm import PoolReader
from dr_llm.pool.db.runtime import DbConfig, DbRuntime
from dr_llm.project.project_service import maybe_get_project

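project = maybe_get_project("my_eval")
if project is None:  # assumption: maybe_get_project returns None for an unknown project
    raise RuntimeError("project 'my_eval' not found; create it first")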
runtime = DbRuntime(DbConfig(dsn=project.dsn))
try:
    with PoolReader.open("provider_queries", runtime=runtime) as reader:
        progress = reader.progress()
        print(
            f"total={progress.total} "
            f"complete={progress.complete} "
            f"incomplete={progress.incomplete} "
            f"leased={progress.leased} "
            f"error={progress.error}"
        )

        # Typed PoolSample iterator/list with optional key + completion filters
        for sample in reader.samples_list(
            key_filter={"llm_config": "gpt-4.1-mini"},
        ):
            print(sample.sample_id, sample.request, sample.response)
finally:
    runtime.close()

PoolReader.open(pool, runtime=runtime) reads the pool's PoolSchema from the project-global pool_catalog table, where PoolStore.ensure_schema() persists it. Use PoolReader.from_runtime(runtime, schema=...) when you already have the schema in hand or want to control schema construction in tests.

CLI Reference

# Providers
dr-llm providers [--json]

# Model catalog (file-based, no DB needed)
dr-llm models sync [--provider NAME] [--verbose]
dr-llm models list [--provider NAME] [--supports-reasoning] [--model-contains TEXT] [--json]
dr-llm models sync-list [--provider NAME] [--supports-reasoning] [--model-contains TEXT] [--json]
dr-llm models show --provider NAME --model NAME

# Query
dr-llm query --provider NAME --model NAME --message TEXT
dr-llm query --provider openai --model gpt-5-mini --reasoning-json '{"kind":"openai","thinking_level":"high"}' --message TEXT
dr-llm query --provider codex --model gpt-5.1-codex-mini --reasoning-json '{"kind":"codex","thinking_level":"xhigh"}' --message TEXT
dr-llm query --provider google --model gemini-2.5-flash --reasoning-json '{"kind":"google","thinking_level":"budget","budget_tokens":512}' --message TEXT
dr-llm query --provider openrouter --model openai/gpt-oss-20b --reasoning-json '{"kind":"openrouter","effort":"high"}' --message TEXT

# Sampling / token controls
# Generic sampling API providers default omitted sampling controls to temperature=1.0 and top_p=0.95.
# OpenAI omits temperature/top_p unless you set them explicitly.
# OpenAI GPT-5 custom temperature/top_p controls are only supported on gpt-5.2/gpt-5.4 with reasoning off.
# --temperature, --top-p, and --max-tokens are rejected for headless providers (codex, claude-code)
# --temperature and --top-p are also rejected for kimi-code; --max-tokens is required there

# Projects (Docker-managed Postgres)
dr-llm project create NAME
dr-llm project list
dr-llm project use NAME
dr-llm project start|stop NAME
dr-llm pool destroy PROJECT_NAME POOL_NAME --yes-really-delete-everything
dr-llm pool destroy-testish PROJECT_NAME --yes-really-delete-everything
dr-llm pool destroy-testish PROJECT_NAME --dry-run
dr-llm project backup NAME
dr-llm project restore NAME BACKUP_PATH  # BACKUP_PATH must be .sql.gz
dr-llm project destroy NAME --yes-really-delete-everything

Deleting pools and projects

Deletion now uses one standard primitive: pool deletion.

dr-llm pool destroy PROJECT_NAME POOL_NAME --yes-really-delete-everything deletes the fixed pool table set for that pool name (pool_<name>_samples and pool_<name>_leases), any consumer claim tables (pool_<name>_claims_<consumer_id>), and the pool's row from pool_catalog.
dr-llm pool destroy-testish PROJECT_NAME --yes-really-delete-everything discovers pools in that project and deletes only those whose underscore-delimited lowercase name tokens include test, tst, smoke, or demo.
dr-llm pool destroy-testish PROJECT_NAME --dry-run previews the matched pools and returns the same structured result shape without deleting anything.
Direct pool deletion requires the project to be running, but leased rows do not block deletion.
Legacy pools without persisted pool_catalog metadata can still be deleted, because deletion targets the derived table names directly rather than loading PoolSchema from pool_catalog.
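The naming rules in the list above can be sketched as two small functions. Both are hypothetical helpers, not part of the dr-llm API. The table-name patterns come from the text; lowercasing the whole name before splitting on underscores is an assumed reading of "underscore-delimited lowercase name tokens".

```python
from collections.abc import Iterable

TESTISH_TOKENS = frozenset({"test", "tst", "smoke", "demo"})


def derived_table_names(pool_name: str, consumer_ids: Iterable[str] = ()) -> list[str]:
    """Tables removed by a pool deletion, derived from the pool name alone."""
    tables = [f"pool_{pool_name}_samples", f"pool_{pool_name}_leases"]
    tables += [f"pool_{pool_name}_claims_{cid}" for cid in consumer_ids]
    return tables


def is_testish(pool_name: str) -> bool:
    """True when any underscore-delimited token is exactly a testish token."""
    return any(token in TESTISH_TOKENS for token in pool_name.lower().split("_"))


print(derived_table_names("my_eval", ["eval_consumer_1"]))
print(is_testish("smoke_run"), is_testish("latest_eval"))  # True False
```

Matching on whole tokens rather than substrings is what keeps a name like latest_eval out of the testish set even though it contains the letters "test".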

dr-llm project destroy is now an orchestrator over pool deletion rather than a blind Docker destroy.

If the project is stopped, it is started temporarily for pool discovery and deletion.
Discovered pools are deleted with bounded parallelism, but result ordering is deterministic and follows pool discovery order rather than completion order.
If any pool deletion fails, project container and volume deletion are skipped.
If the project had to be started temporarily and deletion fails, it is stopped again to restore the original state.

Both destroy commands now emit structured JSON results. For project deletion, the payload includes discovered_pool_names, ordered pool_results, temporarily_started, and destroyed_project_resources.
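One way to get bounded parallelism with discovery-order results is Executor.map from the standard library, which yields results in input order regardless of completion order. The sketch below assumes that approach; it is not the dr-llm implementation. destroy_project and the per-pool fields (pool_name, ok, error) are hypothetical, while the four top-level keys are the ones named above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def destroy_project(
    pool_names: list[str],
    delete_pool: Callable[[str], None],
    temporarily_started: bool = False,
    max_workers: int = 4,
) -> dict:
    """Delete pools in parallel; skip project resources if any deletion fails."""

    def attempt(name: str) -> dict:
        try:
            delete_pool(name)
            return {"pool_name": name, "ok": True, "error": None}
        except Exception as exc:  # record the failure instead of aborting siblings
            return {"pool_name": name, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() preserves input order, so results follow discovery order.
        pool_results = list(executor.map(attempt, pool_names))

    return {
        "discovered_pool_names": list(pool_names),
        "pool_results": pool_results,
        "temporarily_started": temporarily_started,
        # Container and volume removal would only happen when every pool succeeded.
        "destroyed_project_resources": all(r["ok"] for r in pool_results),
    }


def fake_delete(name: str) -> None:
    if name == "b":
        raise RuntimeError("boom")


payload = destroy_project(["a", "b", "c"], fake_delete)
print([r["pool_name"] for r in payload["pool_results"]])  # ['a', 'b', 'c']
```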

Configuration

Generation transcript logging (default on, used for LLM call debugging):

Variable	Default
`DR_LLM_GENERATION_LOG_ENABLED`	`true`
`DR_LLM_GENERATION_LOG_DIR`	`.dr_llm/generation_logs`
`DR_LLM_GENERATION_LOG_ROTATE_BYTES`	`104857600` (100MB)
`DR_LLM_GENERATION_LOG_BACKUPS`	`10`
`DR_LLM_GENERATION_LOG_REDACT_ENABLED`	`true`
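For scripts that want to mirror these settings, the variables and defaults in the table can be read as in the sketch below. GenerationLogSettings and load_generation_log_settings are hypothetical names, and the set of strings accepted as true is an assumption; dr-llm's own parsing may differ.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

_TRUE_STRINGS = {"1", "true", "yes", "on"}  # assumed truthy spellings


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE_STRINGS


@dataclass(frozen=True)
class GenerationLogSettings:
    enabled: bool
    log_dir: Path
    rotate_bytes: int
    backups: int
    redact_enabled: bool


def load_generation_log_settings(
    env: Optional[Mapping[str, str]] = None,
) -> GenerationLogSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return GenerationLogSettings(
        enabled=_env_bool(env, "DR_LLM_GENERATION_LOG_ENABLED", True),
        log_dir=Path(env.get("DR_LLM_GENERATION_LOG_DIR", ".dr_llm/generation_logs")),
        rotate_bytes=int(env.get("DR_LLM_GENERATION_LOG_ROTATE_BYTES", "104857600")),
        backups=int(env.get("DR_LLM_GENERATION_LOG_BACKUPS", "10")),
        redact_enabled=_env_bool(env, "DR_LLM_GENERATION_LOG_REDACT_ENABLED", True),
    )


defaults = load_generation_log_settings({})
print(defaults.rotate_bytes == 100 * 1024 * 1024)  # True
```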

Provider endpoint defaults:

GLM: https://api.z.ai/api/coding/paas/v4
MiniMax API: https://api.minimax.io/anthropic/v1/messages
Kimi Code API: https://api.kimi.com/coding/v1/messages

Testing

uv run ruff format && uv run ruff check --fix .
uv run ty check
uv run pytest tests/ -v -m "not integration"

Integration tests (requires Docker)

./scripts/run-tests-local.sh

pytest now defaults to pytest-xdist, so uv run pytest tests/ -v -m "not integration" runs the safe non-integration suite in parallel. run-tests-local.sh forces -n 0, auto-creates a temporary Docker Postgres project, runs pytest -m integration, and destroys it on exit. Pass extra pytest args for targeted runs: ./scripts/run-tests-local.sh -k test_pool_store.

Demo Scripts

Provider discovery (no DB needed)

uv run python scripts/demo-providers.py

Lists all supported providers, then syncs and displays the model catalog for each available one.

Pool provider demo (requires Docker)

uv run python scripts/demo-pool-providers.py

Creates a project, queries every available provider, stores results in a typed pool, prints a summary table. Run with --help for options.

Pool fill worker demo (requires Docker + API keys)

uv run python scripts/demo-pool-fill.py

Auto-creates a Docker Postgres project, seeds an (llm_config, prompt) pool via seed_llm_grid from declared Axis instances, starts workers that make real LLM calls via make_llm_process_fn, drains the queue to idle with drain, shows response snippets, and destroys the project on exit. Pass --dsn to use an existing database instead. Run with --help for options.

Reasoning and effort demo (live API / CLI checks)

uv run python scripts/demo_thinking_and_effort.py

Exercises provider-specific reasoning and effort validation against curated model sets for OpenAI, OpenRouter, Google, Codex, Claude Code, MiniMax, and Kimi Code. Use --provider to limit the run to one provider.

Reasoning configs are validated before dispatch. For example:

OpenAI GPT-5 family models use configs like {"kind":"openai","thinking_level":"high"}.
Codex reasoning-capable models also accept {"kind":"codex","thinking_level":"xhigh"}.
Google 2.5 models accept budget configs like {"kind":"google","thinking_level":"budget","budget_tokens":512}.
minimax requires {"kind":"anthropic","thinking_level":"na"} together with an explicit --effort.
kimi-code uses Anthropic-compatible reasoning like {"kind":"anthropic","thinking_level":"adaptive"} together with an explicit --effort and --max-tokens.
OpenRouter reasoning-capable models use {"kind":"openrouter", ...} with either enabled or effort, depending on the repo's curated model policy.

The OpenRouter policy layer is based on direct API observations from the provider endpoint.
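When driving the CLI from a script, the reasoning config is easiest to build as a dict and serialize, which avoids hand-quoting JSON in a shell. The sketch below uses only flags shown in the CLI reference (--provider, --model, --reasoning-json, --message); build_query_argv is a hypothetical helper, and the command is assembled but not executed here.

```python
import json
import shlex
from typing import Optional


def build_query_argv(
    provider: str,
    model: str,
    message: str,
    reasoning: Optional[dict] = None,
) -> list[str]:
    """Assemble a dr-llm query command line (hypothetical helper)."""
    argv = ["dr-llm", "query", "--provider", provider, "--model", model]
    if reasoning is not None:
        # Compact separators reproduce the spacing used in the examples above.
        argv += ["--reasoning-json", json.dumps(reasoning, separators=(",", ":"))]
    argv += ["--message", message]
    return argv


argv = build_query_argv(
    "google",
    "gemini-2.5-flash",
    "What is 17 * 23?",
    reasoning={"kind": "google", "thinking_level": "budget", "budget_tokens": 512},
)
print(shlex.join(argv))  # shell-safe rendering; pass argv to subprocess.run to execute
```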

Release history

Version	Released
4.3.0	Jun 9, 2026
4.2.0	May 11, 2026
4.1.0	May 11, 2026
4.0.1	May 10, 2026
4.0.0 (this version)	May 10, 2026
2.3.1	Apr 24, 2026
2.2.0	Apr 15, 2026
2.1.0	Apr 13, 2026
2.0.0	Apr 13, 2026
1.4.0	Apr 13, 2026
1.3.0	Apr 13, 2026
1.2.1	Apr 8, 2026
1.2.0	Apr 8, 2026
1.1.0	Apr 8, 2026
1.0.2	Mar 30, 2026
1.0.1	Mar 30, 2026
1.0.0	Mar 29, 2026
0.4.0	Mar 25, 2026
0.3.0	Mar 25, 2026
0.2.0	Mar 25, 2026
0.1.0	Feb 25, 2026

Download files

Source Distribution

dr_llm-4.0.0.tar.gz (367.1 kB), uploaded May 10, 2026

Built Distribution

dr_llm-4.0.0-py3-none-any.whl (172.2 kB), uploaded May 10, 2026, Python 3

File details

Details for the file dr_llm-4.0.0.tar.gz.

File metadata

Download URL: dr_llm-4.0.0.tar.gz
Upload date: May 10, 2026
Size: 367.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.0

File hashes

Hashes for dr_llm-4.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2fd9ccf8e2e812a994396c3fadc8dabd76197b6eec37748d72f17a905c990024`
MD5	`e2537cf053253bb2b57a8d0ccd7ced83`
BLAKE2b-256	`abf46e07d92aac44d8102c6f3fb59433453a3bfaa6c8deed8a0b3ba4ded72a10`

File details

Details for the file dr_llm-4.0.0-py3-none-any.whl.

File metadata

Download URL: dr_llm-4.0.0-py3-none-any.whl
Upload date: May 10, 2026
Size: 172.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.0

File hashes

Hashes for dr_llm-4.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4d37c0a9e9bc60aaaa1920a94c30127e68aa015a27bb787829aa2aaa0ccfc102`
MD5	`d32db987eca402c5a9def48e47e0c624`
BLAKE2b-256	`3ef7189489db47781756e5fd3c698af75759a7229b34c3baf21d631c857d1ec7`
