A thin, opinionated, local-first structured-output + logging layer over LiteLLM


llmkit

A thin, opinionated, local-first layer over LiteLLM (with instructor for structured output). It gives an application one provider-agnostic call surface across OpenRouter, Google, Anthropic, and local Ollama, with validated structured output, a global async rate limiter, transient-error retries, and agent-readable per-call logging out of the box.

LiteLLM implements the HTTP provider integrations; llmkit owns the ergonomic call surface, the structured-output mode pinning, the rate-limit policy, and the logging convention. It is not a gateway and does not reimplement transport: that problem is already solved, and staying out of it is a deliberate design choice.

Why llmkit

Structured output that actually validates. Each provider is pinned to its native JSON-schema mode (never instructor's auto-Mode.TOOLS, which silently regresses Gemini to empty shapes), and instructor's in-call validation-retry repairs truncated JSON. You pass a Pydantic model; you get a validated instance back.
Provider switching is config, not code. OpenRouter / Google / Anthropic / Ollama behind one Provider enum and one LLMClientConfig. Call sites never change when you switch.
Logging tuned for coding agents. Every call is logged verdict-first (see below) — the design assumption is that the reader is usually an LLM coding agent debugging a run, not a dashboard.
Local-first, zero infra. The default sink writes plain files to a directory. No collector, no account, no network. A pluggable LogSink lets you ship records anywhere later without touching call sites.

Install

uv add omg-llmkit          # or: pip install omg-llmkit

The distribution is published as omg-llmkit (the bare llmkit name was already taken on PyPI), but the import name is just llmkit:

import llmkit

Requires Python ≥ 3.13.

Quick start

import asyncio

from pydantic import BaseModel
from llmkit import (
    LLMClientConfig,
    Provider,
    configure_llm_client,
    structured_llm_call,
)

# Point the library at a provider once, at startup.
configure_llm_client(lambda: LLMClientConfig(
    provider=Provider.OPENROUTER,
    model="google/gemini-2.5-flash",
    api_key="sk-or-...",
))

class Summary(BaseModel):
    title: str
    bullets: list[str]

# `await` needs a running event loop, so drive the async call from asyncio.run()
# (or use structured_llm_call_sync from synchronous code).
async def main() -> None:
    result: Summary = await structured_llm_call(
        prompt="Summarize the attached report.",
        schema=Summary,
        feature="reports",      # groups calls in the logs
        label="exec_summary",   # names this specific call in the logs
    )
    print(result.title)

asyncio.run(main())

The public call surface:

| Function | Use |
| --- | --- |
| `structured_llm_call(prompt, schema, feature, label, ...)` | Async; returns a validated Pydantic instance |
| `structured_llm_call_sync(...)` | Synchronous wrapper around `structured_llm_call` |
| `text_llm_call(prompt, feature, label, ...)` | Async; returns plain text (coerces provider list-content blocks to text) |
| `stream_text_with_log(prompt, feature, label, ...)` | Async generator yielding text chunks; the call is logged on completion |

configure_rate_limit(...) sets the process-global concurrency cap; configure_llm_logging(sink) swaps the log sink (below).

Logging: agent-readable by default

LocalYamlLogSink (the default) writes two things to data/llm-logs/:

One YAML file per call, laid out verdict-first. The file opens with a one-line # header — ok/ERROR, feature/label, resolved model, schema, duration, approximate cost — so head -1 *.yaml triages a whole run. Small metadata is next; the large response and prompt blobs are last, so the head of the file is the whole story for most reads.
A compact append-only index.jsonl — one JSON line per call (file, timestamp, feature, label, model, provider, schema, duration, cost, error). Cross-call questions — "which calls errored / were slowest / most expensive / the last call for feature X" — are a single small scan instead of globbing and parsing every YAML.

# ok | reports/exec_summary | google/gemini-2.5-flash | Summary | 1840ms | $0.0007
# 2026-06-05T14:22:31.004512

timestamp: '2026-06-05T14:22:31.004512'
feature: reports
label: exec_summary
model: google/gemini-2.5-flash
provider: openrouter
schema: Summary
temperature: 0.0
duration_ms: 1840.2
approximate_cost: 0.0007
error: null
response: ...
prompt: ...

approximate_cost is LiteLLM's per-response estimate for budget visibility — not a billing figure (and None when the provider does not report it, e.g. streamed calls).

Write your own `LogSink`

LogSink is a one-method Protocol. Records (LLMCallRecord, a frozen dataclass) are handed to your sink for every call; failures are swallowed so logging can never break a call. To send records somewhere other than local YAML — a database, an HTTP collector, structured stdout — implement write and register it:

import logging
from pathlib import Path
from llmkit import LLMCallRecord, configure_llm_logging

logger = logging.getLogger("llm-calls")

class StructuredStdoutSink:
    def write(self, record: LLMCallRecord) -> Path | None:
        logger.info(
            "llm_call",
            extra={
                "feature": record.feature,
                "label": record.label,
                "model": record.model,
                "provider": record.provider,
                "schema": record.schema,
                "duration_ms": record.duration_ms,
                "approximate_cost": record.approximate_cost,
                "error": record.error,
            },
        )
        return None  # nothing persisted to a path

configure_llm_logging(StructuredStdoutSink())   # pass None to disable logging entirely

An OpenTelemetry exporter (e.g. to Langfuse/Phoenix) is a natural future llmkit[otel] extra; the pluggable seam makes it a non-breaking addition.

Configuration

LLMClientConfig is flat and carries only what a call needs:

@dataclass(frozen=True)
class LLMClientConfig:
    provider: Provider          # OPENROUTER | OLLAMA | GOOGLE | ANTHROPIC
    model: str                  # the provider's default model
    api_key: str | None = None
    base_url: str | None = None

Per-call model= overrides the default, so "strong/small/current" model roles are the host's concern — resolve them to a model string and pass it at the call site. The library has no opinion about roles.

Register the config with configure_llm_client(source), where source is a zero-arg callable returning an LLMClientConfig (re-read on each provider construction, so it tracks live settings changes).

Retries

Two retry layers, kept deliberately separate:

with_retries() (retry.py) handles transient provider errors (429 / 503 / 5xx; the recoverable set is LLM_RECOVERABLE_ERRORS).
instructor's own low max_retries handles schema-validation repair (re-ask the model to fix malformed JSON).
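The transient layer can be pictured as a backoff loop that retries only recoverable failures and lets everything else, including the schema-validation problems that instructor's `max_retries` repairs, propagate immediately. This is a minimal standard-library sketch, not the code in `retry.py`: `ProviderError`, its `status_code` attribute, the retryable-code rule, and the backoff parameters are assumptions standing in for `LLM_RECOVERABLE_ERRORS`.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class ProviderError(Exception):
    """Hypothetical provider error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient(exc: BaseException) -> bool:
    # 429 (rate limited) and any 5xx are treated as worth retrying.
    return isinstance(exc, ProviderError) and (exc.status_code == 429 or 500 <= exc.status_code < 600)


async def retry_transient(
    call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.5,
) -> T:
    """Await call(), retrying transient failures with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts - 1:
                raise  # non-transient, or out of attempts
            delay = base_delay * 2**attempt
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


async def demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ProviderError(503)
        return "ok"

    result = await retry_transient(flaky, base_delay=0.001)
    return result, calls


print(asyncio.run(demo()))  # ('ok', 3)
```

Keeping this loop ignorant of validation errors is what keeps the two layers separate: a malformed-JSON failure is never re-sent as if it were a 503.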

Development

uv sync --extra dev
uv run ruff check . && uv run ruff format --check .
uv run basedpyright          # 0 errors, 0 warnings — no baseline
uv run pytest

Status & support

llmkit is a small, opinionated, best-effort project, extracted from a real application and maintained in the open. It is used in production by its author but carries no support SLA. Bug reports and focused pull requests are welcome — see CONTRIBUTING.md. For security issues, see SECURITY.md.

License

MIT — see LICENSE.

Project details


Maintainers

csanford


Release history

0.2.0

Jun 9, 2026


0.1.2

Jun 5, 2026

0.1.1

Jun 5, 2026

0.1.0

Jun 5, 2026

Download files


Source Distribution

omg_llmkit-0.1.2.tar.gz (28.0 kB)

Uploaded Jun 5, 2026 Source

Built Distribution


omg_llmkit-0.1.2-py3-none-any.whl (25.3 kB)

Uploaded Jun 5, 2026 Python 3

File details

Details for the file omg_llmkit-0.1.2.tar.gz.

File metadata

Download URL: omg_llmkit-0.1.2.tar.gz
Upload date: Jun 5, 2026
Size: 28.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omg_llmkit-0.1.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `dd426ceae451f10fdfa1292ba8ee03631b043a8acdf4f76890ebdf6be79f5126` |
| MD5 | `8219ebb68e1879688d13ae52e053b257` |
| BLAKE2b-256 | `058a8c53250b9c4939e7da320e38c66a399c89c86be51bb3910bb8d3b18170bc` |


Provenance

The following attestation bundles were made for omg_llmkit-0.1.2.tar.gz:

Publisher: publish.yml on OMGBrews/llmkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omg_llmkit-0.1.2.tar.gz
- Subject digest: dd426ceae451f10fdfa1292ba8ee03631b043a8acdf4f76890ebdf6be79f5126
- Sigstore transparency entry: 1736934850
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: OMGBrews/llmkit@94f3688db406a6ce250986eea23aeadad1a2a16c
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/OMGBrews
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@94f3688db406a6ce250986eea23aeadad1a2a16c
- Trigger Event: release

File details

Details for the file omg_llmkit-0.1.2-py3-none-any.whl.

File metadata

Download URL: omg_llmkit-0.1.2-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 25.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omg_llmkit-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `4eae617f9d6d5ae58db6fea2f4e2e48097f70cc6c955eab4e459810132bfbd68` |
| MD5 | `a3f00298814ccb86b7a34ef630a52d5e` |
| BLAKE2b-256 | `097ea52c3d6a6eca8a57f82df86abc7095d640fc92f4fa2d37b909ca3c6a804c` |


Provenance

The following attestation bundles were made for omg_llmkit-0.1.2-py3-none-any.whl:

Publisher: publish.yml on OMGBrews/llmkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omg_llmkit-0.1.2-py3-none-any.whl
- Subject digest: 4eae617f9d6d5ae58db6fea2f4e2e48097f70cc6c955eab4e459810132bfbd68
- Sigstore transparency entry: 1736934864
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: OMGBrews/llmkit@94f3688db406a6ce250986eea23aeadad1a2a16c
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/OMGBrews
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@94f3688db406a6ce250986eea23aeadad1a2a16c
- Trigger Event: release

omg-llmkit 0.1.2
