Skip to main content

Provider-neutral, low-level foundation for LLM APIs: one canonical representation, exact serde, adapters for OpenAI/Anthropic/Gemini and every Chat Completions-compatible server — stdlib-only.

Project description

lm15

lm15 is a small, typed, provider-neutral interface for foundation-model requests, responses, streams, tools, media parts, endpoint APIs, errors, and canonical JSON serialization. This repository is its Python reference implementation.

What lm15 is — and deliberately is not. lm15 is a low-level foundation library: one canonical representation, exact serde for it, and adapters that translate it to and from each provider's wire format — stdlib-only, with its own HTTP transport (websockets is the single optional extra, for live sessions). It is NOT an opinionated user-facing API: no magic call(), no automatic tool loops, no DSL. lm15 is meant to be the dependency for libraries that want to build their own take on the right way to talk to AI systems in Python — you bring the opinions, lm15 brings every provider.

The public API is the top-level package: from lm15 import AnthropicLM, Request, Message, ... (see lm15/__init__.py for the full curated surface). Transport plumbing stays under lm15.transports, live sessions under lm15.live, and the conformance shim under lm15.vet.

The code blocks below are documentation that runs: every output block is the real, captured output of the example above it.

Install

The package name is lm15. It is not on PyPI yet — publishing 1.0 there is the plan. Until then, install from source:

git clone https://github.com/MaximeRivest/lm15-python2 && cd lm15-python2
python3 -m pip install -e .
# Optional extra for websocket live sessions:
python3 -m pip install -e '.[live]'

lm15 has zero required dependencies — it is stdlib-only, including its HTTP transports.

Quickstart

import os

from lm15 import Config, Message, OpenAILM, Request

lm = OpenAILM(api_key=os.environ["OPENAI_API_KEY"])

response = lm.complete(
    Request(
        model="gpt-4.1-mini",
        system="You are terse.",
        messages=(Message.user("Say hello in three words."),),
        config=Config(max_tokens=50, temperature=0.2),
    )
)

print(response.text)
print(response.finish_reason)
print(response.usage.total_tokens)
Hello there, friend.
stop
27

The mental model is one straight line:

Message parts → Message → Request → ProviderLM → Response
                              │
                              └── stream() → StreamEvent → materialized Response

One Request, every provider

The exact same Request shape drives the three first-party adapters:

import os

from lm15 import AnthropicLM, GeminiLM, Message, Request

providers = [
    AnthropicLM(api_key=os.environ["ANTHROPIC_API_KEY"]),
    GeminiLM(api_key=os.environ["GEMINI_API_KEY"]),
]

for lm in providers:
    response = lm.complete(
        Request(
            model={
                "anthropic": "claude-sonnet-4-5",
                "gemini": "gemini-3-flash-preview",
            }[lm.provider],
            messages=(Message.user("Say hello."),),
        )
    )
    print(lm.provider, response.text)
anthropic Hello! How can I help you today?
gemini Hello! How can I help you today?

And the same shape reaches every OpenAI-compatible server through OpenAIChatLM, the Chat Completions dialect adapter. A compat preset name — "ollama", "groq", "openrouter", "vllm", "sglang", ... — bundles that server's wire-format quirks and its default base_url, so a local Ollama is one constructor argument away:

from lm15 import Config, Message, OpenAIChatLM, Request

lm = OpenAIChatLM(api_key="ollama", compat="ollama")  # base_url -> http://localhost:11434/v1

response = lm.complete(
    Request(
        model="qwen3.5:0.8b",
        messages=(Message.user("Say hello in five words or fewer."),),
        config=Config(max_tokens=80, extensions={"reasoning_effort": "none"}),
    )
)

print(response.text)
Hello there! I'm ready to help. What would you like me to discuss?

Swap compat="groq" (plus your Groq key) or compat="openrouter" and the same request hits those servers; pass an explicit base_url to point a preset anywhere. Server-specific knobs ride in Config.extensions and pass through verbatim.

Streaming

stream() yields typed StreamEvent objects. Text arrives as StreamDeltaEvent(delta=TextDelta(...)), and the stream is normalized across providers: exactly one StreamEndEvent ends the stream, carrying finish_reason and usage (mapping rule MAP-3).

import os

from lm15 import Message, OpenAILM, Request, StreamDeltaEvent, TextDelta

lm = OpenAILM(api_key=os.environ["OPENAI_API_KEY"])
request = Request(
    model="gpt-4.1-mini",
    messages=(Message.user("Write one short sentence about Montreal."),),
)

for event in lm.stream(request):
    if isinstance(event, StreamDeltaEvent) and isinstance(event.delta, TextDelta):
        print(event.delta.text, end="", flush=True)
Montreal is a vibrant, multicultural city in Canada known for its rich history and festivals.

To consume a stream into a full Response:

from lm15 import materialize_response

response = materialize_response(lm.stream(request), request)
print(response.text)
Montreal is a vibrant, multicultural city in Canada known for its rich history and cuisine.

The materialized Response is identical in shape to one from complete() — same message, finish_reason, usage, and provider_data.

Tools: the full round-trip

lm15 distinguishes function tools that your application executes from provider-native built-in tools like web search. Here is the complete function-tool round-trip — model asks, you run your function, you answer back:

import os

from lm15 import FunctionTool, Message, OpenAILM, Request

lm = OpenAILM(api_key=os.environ["OPENAI_API_KEY"])

def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}."

weather_tool = FunctionTool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

messages = (Message.user("What is the weather in Montreal?"),)
request = Request(model="gpt-4.1-mini", messages=messages, tools=(weather_tool,))

response = lm.complete(request)
for call in response.tool_calls:
    print(call.name, call.input)
get_weather {'city': 'Montreal'}

Now run your function and hand the result back. The model's tool-call turn is response.message; your answer is Message.tool({call_id: result}):

call = response.tool_calls[0]
result = get_weather(**call.input)

messages = (*messages, response.message, Message.tool({call.id: result}))
final = lm.complete(Request(model="gpt-4.1-mini", messages=messages, tools=(weather_tool,)))
print(final.text)
The weather in Montreal is sunny with a temperature of 22°C. Would you like to know the forecast for the coming days or any other information?

lm15 will never run the loop for you — that's your layer. This is the whole loop.

Built-in tools are provider-executed; you just declare them and read the results (citations come back as typed parts):

from lm15 import BuiltinTool, Message, Request

response = lm.complete(
    Request(
        model="gpt-4.1-mini",
        messages=(Message.user("Where will the 2028 Summer Olympics be held? One sentence, cite a source."),),
        tools=(BuiltinTool("web_search"),),
    )
)

print(response.text)
for citation in response.citations:
    print(citation.title, citation.url)
The 2028 Summer Olympics are scheduled to be held in Los Angeles, California, United States, from July 14 to 30, 2028. ([britannica.com](https://www.britannica.com/event/Los-Angeles-2028-Summer-Olympic-Games?utm_source=openai))
Los Angeles 2028 Summer Olympic Games | Bidding, Host, Venues, Planning, Sports, Marketing, & Facts | Britannica https://www.britannica.com/event/Los-Angeles-2028-Summer-Olympic-Games?utm_source=openai

Async

Every adapter has an async mirror — AsyncOpenAILM, AsyncAnthropicLM, AsyncGeminiLM, AsyncOpenAIChatLM, AsyncClaudeCodeLM, AsyncOpenAICodexLM — with the same constructor fields, the same canonical Request in, and the same Response/stream events out. await is the only difference: complete() is async def, and stream() is an async for-able iterator of the same events.

import asyncio

from lm15 import (
    AsyncOpenAIChatLM,
    Config,
    Message,
    Request,
    StreamDeltaEvent,
    TextDelta,
)

async def main() -> None:
    lm = AsyncOpenAIChatLM(api_key="ollama", compat="ollama")
    request = Request(
        model="qwen3.5:0.8b",
        messages=(Message.user("Name two colors."),),
        config=Config(max_tokens=80, extensions={"reasoning_effort": "none"}),
    )

    response = await lm.complete(request)
    print(response.text)

    async for event in lm.stream(request):
        if isinstance(event, StreamDeltaEvent) and isinstance(event.delta, TextDelta):
            print(event.delta.text, end="", flush=True)
    print()

asyncio.run(main())
Two examples of natural and artificial colors are **red** and **blue**.
Two common names for a color are **red** (or crimson) and **blue** (often called indigo, cobalt, or azure). Other examples include green, yellow, purple, and brown.

The non-chat endpoints (embeddings, files, batch, image, audio, live) are sync-only for now; the async classes raise UnsupportedFeatureError for them rather than pretending. Async endpoint mirrors are planned.

Local subscription adapters

The ordinary provider adapters use API keys that callers pass explicitly: OpenAILM(api_key=...), AnthropicLM(api_key=...), and GeminiLM(api_key=...).

lm15 also has explicit local-developer subscription adapters for users who are already signed in to provider CLIs. These adapters do not read API-key environment variables. They read local OAuth credentials created by the CLI and send provider-specific OAuth headers.

Claude Code subscription auth

Use ClaudeCodeLM.from_claude_code() when Claude Code is installed and logged in as the same OS user:

from lm15 import ClaudeCodeLM, Config, Message, Request

lm = ClaudeCodeLM.from_claude_code()

response = lm.complete(
    Request(
        model="claude-fable-5",
        messages=(Message.user("Say hello briefly."),),
        config=Config(max_tokens=128),
    )
)

print(response.text)

The default credential path is ~/.claude/.credentials.json. If the credential is missing or expired, run Claude Code and log in again (claude, then /login if prompted).

ClaudeCodeLM always prepends the Claude Code system prompt required by this OAuth route:

You are Claude Code, Anthropic's official CLI for Claude.

If Request.system is also provided, lm15 keeps both: the required Claude Code prompt comes first, then the caller's system instruction.

Fable 5 note: Fable may spend part of max_tokens on hidden thinking, so a too-small budget can return no visible text with finish_reason="length". Use Config(max_tokens=128) or higher for non-trivial prompts.

OpenAI Codex / ChatGPT subscription auth

Use OpenAICodexLM.from_codex_cli() when Codex CLI is installed and signed in with ChatGPT:

from lm15 import Message, OpenAICodexLM, Request

lm = OpenAICodexLM.from_codex_cli()

response = lm.complete(
    Request(
        model="gpt-5.5",
        messages=(Message.user("Say hello briefly."),),
    )
)

print(response.text)

The default credential path is ~/.codex/auth.json. OpenAICodexLM reads the local ChatGPT OAuth access token and account id from that file, then calls the Codex subscription endpoint. The Codex subscription backend is streaming-first, so complete() internally streams and materializes a normal Response.

Current Codex route note: lm15 intentionally omits max-token fields here because the verified local Codex route accepts the request shape without them; set output limits in your application layer if you need a hard cap.

These subscription adapters are intended for local interactive development, not server or CI deployments. Treat the credential files as secrets; do not print or log their bearer tokens.

Media and non-chat endpoints

Multimodal input uses typed media parts (ImagePart, AudioPart, DocumentPart, ...):

import os

from lm15 import ImagePart, Message, OpenAILM, Request, TextPart

lm = OpenAILM(api_key=os.environ["OPENAI_API_KEY"])

request = Request(
    model="gpt-4.1-mini",
    messages=(
        Message.user([
            TextPart("Describe this image in a few words."),
            ImagePart(
                url="https://raw.githubusercontent.com/github/explore/main/topics/react/react.png",
                media_type="image/png",
                detail="low",
            ),
        ]),
    ),
)

print(lm.complete(request).text)
This image shows a blue atomic symbol, often used to represent an atom or atomic energy.

Non-chat endpoints have separate request/response types — EmbeddingRequest, ImageGenerationRequest, AudioGenerationRequest, FileUploadRequest, BatchRequest, LiveConfig:

from lm15 import EmbeddingRequest

embeddings = lm.embeddings(
    EmbeddingRequest(
        model="text-embedding-3-small",
        inputs=("hello", "world"),
    )
)
print(len(embeddings.vectors), len(embeddings.vectors[0]))
2 1536

Canonical JSON serialization

The serde functions convert every public lm15 type to canonical JSON-compatible dicts and back, exactly — this is the wire format the conformance corpus pins:

from lm15 import Message, Request, request_from_dict, request_to_dict

request = Request(model="gpt-4.1-mini", messages=(Message.user("Hi"),))
wire = request_to_dict(request)
round_tripped = request_from_dict(wire)
round_tripped == request
True

Error normalization

Provider-specific HTTP/API errors are normalized into one lm15 error hierarchy, so callers handle AuthError, RateLimitError, ContextLengthError, ... identically across providers:

import os

from lm15 import AuthError, Message, OpenAILM, ProviderError, RateLimitError, Request

lm = OpenAILM(api_key="not a key")

try:
    lm.complete(Request(model="gpt-4.1-mini", messages=(Message.user("Hi"),)))
except AuthError as exc:
    print("Check API key:", exc.env_keys)
except RateLimitError as exc:
    print("Retry later:", exc.retry_after)
except ProviderError as exc:
    print(exc.provider, exc.provider_code, exc.status, exc.request_id)
Check API key: ('OPENAI_API_KEY',)

Model metadata

ModelRegistry.discover() hydrates optional, advisory model metadata (pricing, context windows, capability hints) from installed catalog packages via the lm15.model_catalogs entry-point group — the aimo catalog is one such package. Hydrated metadata never changes what an adapter sends: requests are byte-identical with or without it. See docs/model-hydration.md for the contract.

Design notes

  • docs/design-rationale.md — why config=Config(...) instead of kwargs, why there is no automatic tool loop, why request extensions and response provider_data are different names on purpose.
  • docs/serde-rules.md — the canonical JSON omission and round-trip rules.
  • docs/mapping-rules.md — the provider mapping invariants (MAP-1, MAP-2, MAP-3, ...).
  • Behavior is pinned by a cross-language conformance corpus: the sibling lm15-contract repository is the spec; this package is the reference implementation, not the authority.

Contributing

Fixture and conformance workflows, the doc-drift checker, the provider adapter development guide, and the useful-commands cheat sheet live in CONTRIBUTING.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lm15-0.3.0.tar.gz (163.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lm15-0.3.0-py3-none-any.whl (138.2 kB view details)

Uploaded Python 3

File details

Details for the file lm15-0.3.0.tar.gz.

File metadata

  • Download URL: lm15-0.3.0.tar.gz
  • Upload date:
  • Size: 163.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for lm15-0.3.0.tar.gz
Algorithm Hash digest
SHA256 3046268ba5224ff0b6cef5b07195a421750f06d04f5a7e7c7a99e1a1ba711040
MD5 ba5f93446e961aa6a623fd5626a33d70
BLAKE2b-256 587d9c0c09c825784afb9aaa8d79fab8f039e3469630b800fd171c5d98085478

See more details on using hashes here.

File details

Details for the file lm15-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: lm15-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 138.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for lm15-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a048c8e9d7ff48762c90f8cf7837ec351e829705063b1d90e1b22704e80378fb
MD5 c419c86370b28c32182e2ac8b6e73b80
BLAKE2b-256 f9cd6ca2193435935f13dead7cffd62cf7d102670cac90d27993973f85b083fd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page