slimx

A slim, intuitive, lightweight Python library for calling LLMs (high-level + low-level) with multi-provider support.

These details have not been verified by PyPI

Project description

SlimX — the LLM runtime you can actually read

A tiny, inspectable, vendor-neutral Python library for calling LLMs — one API across OpenAI, Anthropic, Gemini, Ollama, and any OpenAI-compatible server.

from slimx import llm

m = llm("ollama:llama3.2")
print(m("Hello, world").text)

Change the provider by changing the string — the rest of your code stays the same.

Why SlimX

One API, every model — OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible servers (vLLM, llama.cpp, LM Studio, …). No lock-in.
See exactly what's sent — dry-run the precise request before it leaves, hook every call, and save reproducible call records. Glass box, not black box.
Tiny & readable — ~3,000 lines of code, one dependency (httpx), fully typed. Read the whole thing in an afternoon.
Call many models at once — parallel(...) to compare answers, race for the fastest, or let a judge model pick the best.
Multimodal — attach images, documents, and audio with image() / document() / audio(); SlimX serializes each into the provider's native shape and elides base64 from dry-runs and records. See docs/concepts/multimodal.md.
Explicit, with batteries — tools, streaming, structured output with auto-repair, a two-layer high/low API, conformance-tested providers, and a slimx CLI.

from slimx import llm, image

m = llm("anthropic:claude-sonnet-4-6")
print(m("What's in this picture?", images=[image("diagram.png")]).text)

[Screenshot: slimx doctor diagnosing providers and listing models in a terminal]

# See what SlimX would send — exact URL, headers (secrets redacted), body — no network call:
print(llm("openai:gpt-4.1-nano").inspect("Hello").pretty())

# Ask several models and let one judge the best answer:
from slimx import parallel
best = parallel(
    ["openai:gpt-4.1-mini", "google:gemini-3.5-flash"],
    mode="judge", judge="anthropic:claude-haiku-4-5",
)
print(best("Explain SlimX in one line.").text)

Going deeper: ARCHITECTURE.md is a diagram-driven tour of the runtime; DEVELOPMENT.md is the engineering charter and Provider Contract.

Install

For users

Create a new project and install SlimX:

uv init my-project
cd my-project
uv add slimx

Run Python through uv so it uses the project virtual environment:

uv run python

Or install with pip:

pip install slimx

For contributors

git clone https://github.com/slimx-ai/slimx.git
cd slimx
uv sync --all-extras
uv run pytest -q

uv sync reads pyproject.toml and uv.lock when present.

uv.lock is committed to help contributors reproduce the development environment.

Supported providers

Provider	Prefix	Environment variable	Notes
OpenAI	`openai:`	`OPENAI_API_KEY`	Default provider when no prefix is given
OpenAI-compatible	`oai:`	`SLIMX_OAI_BASE_URL`, `SLIMX_OAI_API_KEY`	Any server exposing an OpenAI-compatible `/v1/chat/completions` API: vLLM, LM Studio, llama.cpp server, LocalAI, Ollama `/v1`, internal gateways
Google Gemini	`google:`	`GOOGLE_API_KEY` or `GEMINI_API_KEY`	Supports chat, streaming, JSON output, and tools
Anthropic	`anthropic:`	`ANTHROPIC_API_KEY`	Claude Messages API; chat, tools, and native streaming
Ollama	`ollama:`	optional `OLLAMA_BASE_URL`	Local models; chat, streaming, tools, JSON (model-dependent)
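
The prefix convention in the table can be illustrated with a small stand-alone sketch. This is not SlimX's actual parser: split_model_string, KNOWN_PREFIXES, and DEFAULT_PROVIDER are hypothetical names, and the only behavior taken from this README is that the text before the first colon selects the provider and that OpenAI is used when no prefix is given.

```python
# Hypothetical helper: SlimX's real parsing logic may differ.
KNOWN_PREFIXES = {"openai", "oai", "google", "anthropic", "ollama"}
DEFAULT_PROVIDER = "openai"  # per the table: used when no prefix is given


def split_model_string(spec: str) -> tuple[str, str]:
    """Split 'provider:model' into (provider, model)."""
    # Split on the first colon only, so model tags like 'llama3.2:3b' survive.
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return DEFAULT_PROVIDER, spec


print(split_model_string("ollama:llama3.2:3b"))
print(split_model_string("gpt-4.1-nano"))
```

Splitting on the first colon only is what lets Ollama-style names such as llama3.2:3b keep their tag.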

Inspect provider capabilities

Check what a provider supports before runtime — no API key or running server required:

from slimx.providers import describe_provider

describe_provider("google")
# {'name': 'google', 'native': True, 'tools': True, 'structured_output': True,
#  'streaming': True, 'async_chat': False, 'async_streaming': False}

from slimx import llm
llm("openai:gpt-4.1-nano").capabilities.tools  # True

Every provider is checked against a shared conformance suite (tests/conformance/), so each declared capability is backed by a test rather than taken on trust. See docs: Provider Capabilities and docs: OpenAI-compatible servers.
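
One way to use a capability dictionary is to check it up front and fail early. The sketch below works on a plain dict copied from the describe_provider("google") output shown above, so it runs without SlimX installed; missing_features is a hypothetical helper, not part of the library.

```python
# Copied from the describe_provider("google") output shown above.
caps = {
    "name": "google", "native": True, "tools": True, "structured_output": True,
    "streaming": True, "async_chat": False, "async_streaming": False,
}


def missing_features(caps: dict, required: list[str]) -> list[str]:
    """Return the required capability names the provider does not declare."""
    return [name for name in required if not caps.get(name, False)]


print(missing_features(caps, ["tools", "streaming"]))   # nothing missing
print(missing_features(caps, ["tools", "async_chat"]))  # async_chat is False
```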

Configure providers

OpenAI

export OPENAI_API_KEY="..."
# optional:
export OPENAI_BASE_URL="https://api.openai.com/v1"

OpenAI-compatible servers

Use oai: for local or self-hosted servers that expose an OpenAI-compatible /v1/chat/completions API.

export SLIMX_OAI_BASE_URL="http://localhost:8000/v1"
export SLIMX_OAI_API_KEY="EMPTY"

SLIMX_OAI_API_KEY can be a real key for authenticated gateways, or EMPTY for local servers that ignore authentication.
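
As a rough illustration of how these two variables might combine with explicit arguments, here is a stand-alone sketch. The function resolve_oai_settings and its precedence (explicit argument first, then environment, then the EMPTY placeholder) are assumptions for illustration, not documented SlimX behavior.

```python
import os


def resolve_oai_settings(base_url=None, api_key=None, env=None):
    """Hypothetical resolution order: explicit argument, then environment."""
    env = os.environ if env is None else env
    url = base_url or env.get("SLIMX_OAI_BASE_URL")
    if not url:
        raise ValueError("no base URL: pass base_url or set SLIMX_OAI_BASE_URL")
    # Assumption: fall back to the placeholder key that local servers ignore.
    key = api_key or env.get("SLIMX_OAI_API_KEY", "EMPTY")
    return {"base_url": url.rstrip("/"), "api_key": key}


# A plain dict stands in for os.environ so the sketch is reproducible.
fake_env = {"SLIMX_OAI_BASE_URL": "http://localhost:8000/v1/"}
print(resolve_oai_settings(env=fake_env))
```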

Google Gemini

export GOOGLE_API_KEY="..."
# or:
export GEMINI_API_KEY="..."

# optional:
export GOOGLE_BASE_URL="https://generativelanguage.googleapis.com/v1beta"

Anthropic

export ANTHROPIC_API_KEY="..."
# optional:
export ANTHROPIC_BASE_URL="https://api.anthropic.com"
export ANTHROPIC_VERSION="2023-06-01"

Ollama local models

export OLLAMA_BASE_URL="http://localhost:11434"

For Ollama, make sure the server is running and the model is available:

ollama serve

In another terminal:

ollama pull llama3.2:3b
ollama list

Quickstart

OpenAI

from slimx import llm

m = llm("openai:gpt-4.1-nano", temperature=0.2)
res = m("Write a haiku about fog and streetlights.")

print(res.text)

OpenAI-compatible local/self-hosted server

from slimx import llm

m = llm(
    "oai:Qwen/Qwen2.5-7B-Instruct",
    provider_kwargs={
        "base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY",
    },
    timeout=120,
)

res = m("Explain why compatibility APIs are useful for local model serving.")

print(res.text)

Google Gemini

from slimx import llm

m = llm("google:gemini-3.5-flash", temperature=0.2)
res = m("Write a haiku about small, inspectable AI software.")

print(res.text)

Ollama local model

from slimx import llm

m = llm("ollama:llama3.2:3b", temperature=0.2, timeout=120)
res = m("Explain why small libraries are easier to inspect.")

print(res.text)

Response structure

Calling a SlimX model returns a Result object.

from slimx import llm

m = llm("ollama:llama3.2:3b", timeout=120)
res = m("Explain why small libraries are easier to inspect.")

print(res.text)

A Result contains:

Result(
    text="...",          # Normalized assistant text
    raw={...},           # Raw provider response
    usage=Usage(...),    # Token usage when available
    tool_calls=[],       # Tool/function calls requested by the model
    data=None,           # Parsed structured output, used by .json(...)
    trace={...},         # Runtime metadata: provider, model, latency, retries, tools
)

Most applications should use:

print(res.text)

Use res.raw when you need provider-specific details, and res.trace when you want runtime diagnostics such as provider name, model name, elapsed time, retries, and tool-call count.

Streaming

from slimx import llm

m = llm("google:gemini-3.5-flash", temperature=0.2)

for ev in m.stream("Tell a short story in 5 lines."):
    if ev.type == "text_delta":
        print(ev.text, end="", flush=True)

print()
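
If you want the full text as well as the live output, the deltas can be accumulated. The sketch below uses stand-in event objects so it runs offline; only the type == "text_delta" check and the text attribute come from the example above, and the "done" event type is a hypothetical placeholder for any non-text event.

```python
from types import SimpleNamespace


def collect_text(events) -> str:
    """Join the text of every text_delta event, ignoring other event types."""
    return "".join(ev.text for ev in events if ev.type == "text_delta")


# Stand-in events; a real stream would come from m.stream(...).
fake_stream = [
    SimpleNamespace(type="text_delta", text="Once "),
    SimpleNamespace(type="text_delta", text="upon a time."),
    SimpleNamespace(type="done", text=None),  # hypothetical non-text event
]
print(collect_text(fake_stream))
```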

Tools

SlimX tools are provider-neutral. The same @tool interface can be used across providers that support tool/function calling.

from slimx import llm, tool


@tool
def add(a: int, b: int) -> int:
    "Add two integers."
    return a + b


m = llm("google:gemini-3.5-flash", tools=[add], tool_runtime="auto")
res = m("What is 12 + 30?")

print(res.text)
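
To make the idea of a provider-neutral tool concrete, the sketch below derives a JSON-Schema-style description from a function's signature and docstring using only the standard library. It is an illustration of the general technique, not SlimX's @tool implementation; describe_tool and the JSON_TYPES mapping are hypothetical.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Build a JSON-Schema-style description from a function signature."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters belong in the schema
    required = [
        name for name, p in inspect.signature(fn).parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": {n: {"type": JSON_TYPES.get(t, "string")} for n, t in hints.items()},
            "required": required,
        },
    }


def add(a: int, b: int) -> int:
    "Add two integers."
    return a + b


print(describe_tool(add))
```

A description of this shape is what most tool-calling APIs expect in some form, which is why one decorated function can be offered to several providers.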

Parallel execution

Fan one prompt out to several models at once with parallel(...). Use mode="all" to compare every answer, mode="race" to take the first successful response, or mode="judge" to have a judge model pick the best answer.

from slimx import parallel

ensemble = parallel(["google:gemini-3.5-flash", "openai:gpt-4.1-nano"])
res = ensemble("Explain SlimX in one paragraph.")

for item in res.results:
    print(item.model, item.result.text if item.ok else item.error)

Failures are surfaced in res.errors (never swallowed) and each result keeps its raw provider response. See docs: Parallel execution.
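
The fan-out pattern with failures kept as data can be sketched with the standard library alone. This is not how parallel(...) is implemented; fan_out and Item are hypothetical names, and plain functions stand in for models so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Item:
    model: str
    result: Any = None
    error: Optional[Exception] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def fan_out(calls: dict[str, Callable[[str], Any]], prompt: str) -> list[Item]:
    """Run every callable on the same prompt in threads; keep failures as data."""
    def run(name: str) -> Item:
        try:
            return Item(name, result=calls[name](prompt))
        except Exception as exc:  # record the failure instead of swallowing it
            return Item(name, error=exc)

    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run, calls))  # results keep the input order


def broken(prompt: str) -> str:
    raise RuntimeError("server unreachable")


# Stand-in "models": plain functions, so the sketch needs no network.
items = fan_out({"echo": lambda p: p.upper(), "broken": broken}, "hi")
for item in items:
    print(item.model, item.result if item.ok else item.error)
```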

Structured output

SlimX can parse structured JSON output into a dataclass.

from dataclasses import dataclass

from slimx import llm


@dataclass
class City:
    name: str
    country: str


m = llm("google:gemini-3.5-flash")
res = m.json("Paris is in France.", schema=City)

print(res.data)
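
The parsing step can be approximated offline as follows. This is a sketch under stated assumptions, not SlimX's .json(...) or its auto-repair logic: parse_into is a hypothetical helper, and its "repair" is deliberately naive (it keeps only the outermost {...} span of the reply).

```python
import json
from dataclasses import dataclass, fields


@dataclass
class City:
    name: str
    country: str


def parse_into(cls, raw: str):
    """Parse model output into a dataclass, tolerating text around the JSON."""
    # Naive repair step (an assumption): keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])
    wanted = {f.name for f in fields(cls)}
    missing = wanted - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Drop unexpected keys so the constructor does not raise TypeError.
    return cls(**{k: v for k, v in data.items() if k in wanted})


reply = 'Here you go: {"name": "Paris", "country": "France"} Hope that helps.'
print(parse_into(City, reply))
```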

Inspectability

See exactly what SlimX does — dry-run a request, observe calls with hooks, and save reproducible call records. No hosted platform, no extra dependency.

from slimx import llm, CallRecord

m = llm("openai:gpt-4.1-nano")

# 1) Dry-run: the exact request, secrets redacted, without sending it
print(m.inspect("Hello").pretty())

# 2) Hooks: observe every call (log it, push metrics, anything)
traced = llm("openai:gpt-4.1-nano", hooks={"after_call": print})

# 3) Reproducible records: save the whole call to JSON and reload it
res = m("Capital of France?")
res.to_record().save("run.json")
record = CallRecord.load("run.json")  # reload the saved call for later inspection

See docs: Inspectability.
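
The redaction idea behind a dry-run can be shown with a few lines of plain Python. This is only an illustration: redact_headers and the SENSITIVE header list are assumptions, and the output format of SlimX's own inspect(...) is not reproduced here.

```python
# Assumed set of header names that carry credentials.
SENSITIVE = {"authorization", "x-api-key", "x-goog-api-key"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy that is safe to print: secret values are masked."""
    return {
        name: "***REDACTED***" if name.lower() in SENSITIVE else value
        for name, value in headers.items()
    }


request = {
    "url": "https://api.openai.com/v1/chat/completions",
    "headers": {"Authorization": "Bearer sk-not-a-real-key", "Content-Type": "application/json"},
}
print(redact_headers(request["headers"]))
```

Returning a copy rather than mutating the original keeps the real headers intact for the request that is eventually sent.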

CLI & model discovery

Installing SlimX adds a slimx command (no extra dependencies):

slimx doctor              # which keys/servers are configured and reachable
slimx models ollama       # list models a provider exposes (no guessing model strings)
slimx providers           # registered providers + capabilities

slimx doctor is the fastest way to answer "why isn't my model working?" — usually a missing key or wrong base URL. The same discovery is available in code via list_models(...). See docs: CLI & discovery.

Low-level API

Use the low-level API when you want explicit control over messages, requests, clients, and providers.

from slimx import Message
from slimx.low import ChatRequest, Client
from slimx.providers import get_provider

provider = get_provider("google")
client = Client(provider, timeout=30, retries=2)

req = ChatRequest(
    model="gemini-3.5-flash",
    messages=[Message.user("Explain provider-neutral LLM clients in one paragraph.")],
    temperature=0.2,
)

res = client.chat(req)

print(res.text)
print(res.trace)

Provider plugins

SlimX supports third-party provider plugins through the slimx.providers entry point group.

Built-in providers are registered lazily, so importing slimx does not load provider modules or require API keys.
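
Entry point groups are a standard packaging mechanism, so the plugins installed in an environment can be listed with importlib.metadata. The sketch below only reads metadata and imports nothing from the plugins; discover_provider_plugins is a hypothetical helper, and on an environment with no plugins installed it returns an empty dict.

```python
from importlib.metadata import entry_points


def discover_provider_plugins(group: str = "slimx.providers") -> dict[str, str]:
    """Map each installed plugin name to its 'module:attr' target without importing it."""
    eps = entry_points()
    # Newer Pythons expose .select(); Python 3.9 returns a plain dict of groups.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(discover_provider_plugins())
```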

Stability

As of 1.0, SlimX commits to semantic versioning. The public API is stable:

the top-level surface (llm, allm, Model, AsyncModel, tool, Message, Result, StreamEvent, ToolCall, Usage, InspectedRequest, CallRecord, parallel, list_models, describe_provider, and slimx.low's Client / ChatRequest),
the Provider Contract that every provider implements (see DEVELOPMENT.md), which is enforced by the conformance suite in tests/conformance/.

Breaking changes to these will only land in a new major version. The package ships type information (PEP 561), so type checkers see SlimX's types out of the box.

Troubleshooting

`ModuleNotFoundError: No module named 'slimx'`

If you installed with uv add slimx, run Python through uv:

uv run python

Or activate the virtual environment first:

source .venv/bin/activate
python
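
If the error persists, it usually helps to confirm which interpreter is running and whether it can see the package. The following diagnostic uses only the standard library; where_is is a hypothetical helper name.

```python
import importlib.util
import sys


def where_is(package: str) -> dict:
    """Report the running interpreter and whether it can see the package."""
    spec = importlib.util.find_spec(package)
    return {
        "python": sys.executable,
        "found": spec is not None,
        "location": spec.origin if spec else None,
    }


print(where_is("slimx"))
```

If "python" does not point inside the project's .venv directory, the script is running outside the environment where slimx was installed.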

Ollama model not found

Check which models are installed:

ollama list

Pull a model before using it:

ollama pull llama3.2:3b

Then use the exact model name:

m = llm("ollama:llama3.2:3b", timeout=120)

Ollama server not running

Start Ollama:

ollama serve

Then retry your SlimX script.

Development

Run the full validation suite before opening a pull request or tagging a release:

uv sync --all-extras
uv run ruff check .
uv run pyright
uv run pytest -q
uv run python -m build

Repo automation

This repository includes GitHub Actions for:

CI (.github/workflows/ci.yml)
Docs deployment to GitHub Pages (.github/workflows/docs.yml)

See docs/ for more detailed documentation.

Release history

Version	Released
1.6.0 (this version)	Jul 4, 2026
1.5.0	Jun 24, 2026
1.4.0	Jun 18, 2026
1.3.0	Jun 5, 2026
1.0.0	Jun 4, 2026
0.10.0	Jun 4, 2026
0.8.0	Jun 3, 2026
0.7.2	Jun 3, 2026
0.7.0	Jun 3, 2026
0.6.0	Jun 3, 2026
0.5.1	Jun 3, 2026
0.5.0	Jun 3, 2026

Download files

File	Type	Size	Uploaded	Tags
slimx-1.6.0.tar.gz	Source distribution	2.0 MB	Jul 4, 2026	Source
slimx-1.6.0-py3-none-any.whl	Built distribution	88.0 kB	Jul 4, 2026	Python 3

Both files were uploaded via uv/0.9.18, without Trusted Publishing.

File hashes

Hashes for slimx-1.6.0.tar.gz
Algorithm	Hash digest
SHA256	`9e8e29b7e4a31d45e22ecebd10138d36022437099ddbbc743c23ce7d2dd280c9`
MD5	`4bf251e605043d0415d17a5c6497fafb`
BLAKE2b-256	`dffae78950d1048613bf689a21a344bca5b4a0dbee8bd05f3682c22e41858d59`

Hashes for slimx-1.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`671df5579ded074210182132703e242ad20a1b8da7207c064a522b3d21e3d636`
MD5	`5551e3f9402246b715b9d6c30848b877`
BLAKE2b-256	`8a9a66adab8fd4ded1a35ee1a01a45d48aa943dc66a914e7782dac271c4c2627`
