
A thin, opinionated, local-first structured-output + logging layer over LiteLLM

Project description

llmkit

A thin, opinionated, local-first layer over LiteLLM (with instructor for structured output). It gives an application one provider-agnostic call surface across OpenRouter, Google, Anthropic, and local Ollama, with validated structured output, a global async rate limiter, transient-error retries, and agent-readable per-call logging out of the box.

LiteLLM implements the HTTP provider integrations; llmkit owns the ergonomic call surface, the structured-output mode pinning, the rate-limit policy, and the logging convention. It is not a gateway, and it deliberately does not reimplement transport: LiteLLM already solves that.

Why llmkit

  • Structured output that actually validates. Each provider is pinned to its native JSON-schema mode (never instructor's auto-Mode.TOOLS, which silently regresses Gemini to empty shapes), and instructor's in-call validation-retry repairs truncated JSON. You pass a Pydantic model; you get a validated instance back.
  • Provider switching is config, not code. OpenRouter / Google / Anthropic / Ollama behind one Provider enum and one LLMClientConfig. Call sites never change when you switch.
  • Logging tuned for coding agents. Every call is logged verdict-first (see below) — the design assumption is that the reader is usually an LLM coding agent debugging a run, not a dashboard.
  • Local-first, zero infra. The default sink writes plain files to a directory. No collector, no account, no network. A pluggable LogSink lets you ship records anywhere later without touching call sites.

Install

uv add omg-llmkit          # or: pip install omg-llmkit

The distribution is published as omg-llmkit (the bare llmkit name was already taken on PyPI), but the import name is just llmkit:

import llmkit

Requires Python ≥ 3.13.

Quick start

import asyncio

from pydantic import BaseModel
from llmkit import (
    LLMClientConfig,
    Provider,
    configure_llm_client,
    structured_llm_call,
)

# Point the library at a provider once, at startup.
configure_llm_client(lambda: LLMClientConfig(
    provider=Provider.OPENROUTER,
    model="google/gemini-2.5-flash",
    api_key="sk-or-...",
))

class Summary(BaseModel):
    title: str
    bullets: list[str]

# structured_llm_call is async, so it must be awaited inside a coroutine.
# In synchronous code, use structured_llm_call_sync(...) instead.
async def main() -> None:
    result: Summary = await structured_llm_call(
        prompt="Summarize the attached report.",
        schema=Summary,
        feature="reports",      # groups calls in the logs
        label="exec_summary",   # names this specific call in the logs
    )
    print(result.title)

asyncio.run(main())

The public call surface:

| Function | Use |
|---|---|
| structured_llm_call(prompt, schema, feature, label, ...) | Async; returns a validated Pydantic instance |
| structured_llm_call_sync(...) | Synchronous wrapper around structured_llm_call |
| text_llm_call(prompt, feature, label, ...) | Async; returns plain text (coerces provider list-content blocks) |
| stream_text_with_log(prompt, feature, label, ...) | Async generator yielding text chunks; logged on completion |

configure_rate_limit(...) sets the process-global concurrency cap; configure_llm_logging(sink) swaps the log sink (below).

Logging: agent-readable by default

LocalYamlLogSink (the default) writes two things to data/llm-logs/:

  1. One YAML file per call, laid out verdict-first. The file opens with a one-line # header — ok/ERROR, feature/label, resolved model, schema, duration, approximate cost — so head -1 *.yaml triages a whole run. Small metadata is next; the large response and prompt blobs are last, so the head of the file is the whole story for most reads.
  2. A compact append-only index.jsonl: one JSON line per call (file, timestamp, feature, label, model, provider, schema, duration, cost, error). Cross-call questions such as "which calls errored / were slowest / most expensive / the last call for feature X" become a single small scan instead of globbing and parsing every YAML file.

A per-call file looks like this:
# ok | reports/exec_summary | google/gemini-2.5-flash | Summary | 1840ms | $0.0007
# 2026-06-05T14:22:31.004512

timestamp: '2026-06-05T14:22:31.004512'
feature: reports
label: exec_summary
model: google/gemini-2.5-flash
provider: openrouter
schema: Summary
temperature: 0.0
duration_ms: 1840.2
approximate_cost: 0.0007
error: null
response: ...
prompt: ...

approximate_cost is LiteLLM's per-response estimate for budget visibility — not a billing figure (and None when the provider does not report it, e.g. streamed calls).

Write your own LogSink

LogSink is a one-method Protocol. Records (LLMCallRecord, a frozen dataclass) are handed to your sink for every call; failures are swallowed so logging can never break a call. To send records somewhere other than local YAML — a database, an HTTP collector, structured stdout — implement write and register it:

import json
import logging
import sys
from pathlib import Path
from llmkit import LLMCallRecord, configure_llm_logging

logger = logging.getLogger("llm-calls")

class StructuredStdoutSink:
    def write(self, record: LLMCallRecord) -> Path | None:
        # Emit one JSON object per call; default=str covers non-JSON-native values.
        logger.info(json.dumps({
            "feature": record.feature,
            "label": record.label,
            "model": record.model,
            "provider": record.provider,
            "schema": record.schema,
            "duration_ms": record.duration_ms,
            "approximate_cost": record.approximate_cost,
            "error": record.error,
        }, default=str))
        return None  # nothing persisted to a path

# Without a handler at INFO level, logger.info() output is silently dropped.
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

configure_llm_logging(StructuredStdoutSink())   # pass None to disable logging entirely

An OpenTelemetry exporter (e.g. to Langfuse or Phoenix) would be a natural future extra, installed as omg-llmkit[otel] since extras belong to the distribution name; the pluggable sink seam makes it a non-breaking addition.

Configuration

LLMClientConfig is flat and carries only what a call needs:

from dataclasses import dataclass

from llmkit import Provider

@dataclass(frozen=True)
class LLMClientConfig:
    provider: Provider          # OPENROUTER | OLLAMA | GOOGLE | ANTHROPIC
    model: str                  # the provider's default model
    api_key: str | None = None
    base_url: str | None = None

Per-call model= overrides the default, so "strong/small/current" model roles are the host's concern — resolve them to a model string and pass it at the call site. The library has no opinion about roles.

Register the config with configure_llm_client(source), where source is a zero-arg callable returning an LLMClientConfig (re-read on each provider construction, so it tracks live settings changes).

Retries

Two retry layers, kept deliberately separate:

  • with_retries() (retry.py) handles transient provider errors (429 / 503 / 5xx; the recoverable set is LLM_RECOVERABLE_ERRORS).
  • instructor's own low max_retries handles schema-validation repair (re-ask the model to fix malformed JSON).
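The signature of with_retries() is not documented here, so the following sketch shows only the general technique for the transient layer: capped exponential backoff with full jitter, retrying a named set of exception types and letting everything else (including validation failures) propagate immediately. retry_transient and TransientError are hypothetical; if LLM_RECOVERABLE_ERRORS is a tuple of exception classes it could be passed as recoverable, but that is an assumption.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for provider errors such as HTTP 429 or 503."""


async def retry_transient(
    call: Callable[[], Awaitable[T]],
    recoverable: tuple[type[BaseException], ...] = (TransientError,),
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Retry `call` on recoverable errors with capped, fully jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await call()
        except recoverable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            # Full jitter: sleep a random time in [0, min(max_delay, base * 2**attempt)].
            await asyncio.sleep(random.uniform(0, min(max_delay, base_delay * 2**attempt)))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The argument is a zero-argument factory rather than a coroutine because a coroutine object can only be awaited once; each attempt needs a fresh one.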

Development

uv sync --extra dev
uv run ruff check . && uv run ruff format --check .
uv run basedpyright          # 0 errors, 0 warnings — no baseline
uv run pytest

Status & support

llmkit is a small, opinionated, best-effort project, extracted from a real application and maintained in the open. It is used in production by its author but carries no support SLA. Bug reports and focused pull requests are welcome — see CONTRIBUTING.md. For security issues, see SECURITY.md.

License

MIT — see LICENSE.



Download files


Source Distribution

omg_llmkit-0.1.1.tar.gz (26.6 kB)

Uploaded Source

Built Distribution


omg_llmkit-0.1.1-py3-none-any.whl (25.0 kB)

Uploaded Python 3

File details

Details for the file omg_llmkit-0.1.1.tar.gz.

File metadata

  • Download URL: omg_llmkit-0.1.1.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omg_llmkit-0.1.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 58e9d1c7270267b506c21acfc8bcac0d54aa97c663c4d0ea53a057f977b41a78 |
| MD5 | eb3222b11fdf7ebd922467f9976b71bc |
| BLAKE2b-256 | 3add2527fff232514e21c1d333449a939384002b6d59a7f3a6afc0b2950192ea |


Provenance

The following attestation bundles were made for omg_llmkit-0.1.1.tar.gz:

Publisher: publish.yml on OMGBrews/llmkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file omg_llmkit-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: omg_llmkit-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omg_llmkit-0.1.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 560d2ff792ad4509b648b5e359d5d10012d14ce77af131500f4cd58c88a28475 |
| MD5 | 1fd5fe83654657d4e1727172f7fd3a83 |
| BLAKE2b-256 | f3cd945d00f05e9e3bc0a29ee81f5aaf575eb7d4fdb2fe68c0867c31b9079195 |


Provenance

The following attestation bundles were made for omg_llmkit-0.1.1-py3-none-any.whl:

Publisher: publish.yml on OMGBrews/llmkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
