A lightweight Python SDK for using local and OpenAI-compatible LLMs.

llmbridge

llmbridge is a lightweight Python SDK and CLI for using local and OpenAI-compatible LLMs. It connects to a runtime you already have running, such as Ollama, LM Studio, vLLM, llama.cpp server, LocalAI, or another OpenAI-compatible API.

llmbridge does not ship model files. You install and run the model runtime yourself, then use llmbridge as a small developer-friendly bridge.

Features

  • Local Ollama provider
  • Generic OpenAI-compatible provider
  • CLI commands for setup checks, model listing, chat, ask, pull, and config
  • Streaming responses
  • Local config at ~/.llmbridge/config.toml
  • Prompt templates
  • Structured JSON output with Pydantic validation and retry
  • Typed response models

Installation

pip install llmbridge-sdk

The PyPI distribution is named llmbridge-sdk. The Python import and CLI command remain llmbridge.

Requirements

  • Python 3.10+
  • Ollama for the Ollama provider, or an already-running OpenAI-compatible server
  • No bundled LLM model files

Ollama Quickstart

Install Ollama from https://ollama.com, start it locally, then pull a model:

ollama pull llama3.1:latest

Check your setup:

llmbridge doctor
llmbridge serve-check
llmbridge models

Ask a question:

llmbridge ask "Explain FastAPI in simple words"

Set your default model:

llmbridge config set model llama3.1:latest

OpenAI-Compatible Quickstart

Use an OpenAI-compatible server such as LM Studio, vLLM, llama.cpp server, or LocalAI. The base_url should point to the API root, usually ending in /v1.

llmbridge ask "Explain FastAPI" \
  --provider openai_compatible \
  --model local-model \
  --base-url http://localhost:1234/v1

List models:

llmbridge models \
  --provider openai_compatible \
  --base-url http://localhost:1234/v1

The OpenAI-compatible provider does not download or manage models. Start your server with the model you want before calling llmbridge.
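
To make the base_url convention concrete, here is a minimal sketch of how an API root maps to the model-listing endpoint that OpenAI-compatible servers expose at /v1/models. The helper name models_endpoint is hypothetical and is not part of llmbridge; it only illustrates why the base URL should end in /v1.

```python
def models_endpoint(base_url: str) -> str:
    """Return the model-listing URL for an OpenAI-compatible API root."""
    root = base_url.rstrip("/")
    if not root.endswith("/v1"):
        # Most servers expose the API under /v1; warn rather than guess.
        print(f"warning: {base_url!r} does not end in /v1")
    return f"{root}/models"


print(models_endpoint("http://localhost:1234/v1/"))
```

If the printed URL does not return a model list from your server, the base URL is probably pointing at the wrong path.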

CLI Usage

Use the configured default model:

llmbridge ask "Explain FastAPI"

Override the model:

llmbridge ask "Explain FastAPI" --model gemma4:e4b

Adjust temperature:

llmbridge ask "Explain FastAPI" --temperature 0.2

Run chat with an explicit model:

llmbridge chat llama3.1:latest "Explain PostgreSQL in simple words"

Run chat against an OpenAI-compatible server:

llmbridge chat local-model "Hello" \
  --provider openai_compatible \
  --base-url http://localhost:1234/v1

Pull an Ollama model:

llmbridge pull llama3.1:latest

Streaming Usage

llmbridge chat llama3.1:latest "Explain Docker" --stream
llmbridge ask "Explain Docker" --stream

Streaming chunks are printed as they arrive. Non-streaming CLI output is trimmed before printing.

Python streaming:

from llmbridge import LLM

llm = LLM(model="llama3.1:latest")

for chunk in llm.stream("Explain Docker Compose"):
    print(chunk.text, end="")
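
If you also need the complete reply after streaming, you can accumulate the chunks while printing them. The sketch below assumes only what the example above shows, namely that each chunk has a text attribute. The collect_stream helper and the stand-in chunks are illustrative and not part of llmbridge.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Print each chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks so the sketch runs without a model server.
fake_chunks = [SimpleNamespace(text=t) for t in ("Docker ", "Compose ", "runs services.")]
full_text = collect_stream(fake_chunks)
```

With a running model, you would pass llm.stream("Explain Docker Compose") in place of fake_chunks.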

Config Usage

llmbridge stores local CLI defaults in:

~/.llmbridge/config.toml

Supported config keys:

  • provider
  • model
  • base_url
  • api_key
  • temperature
  • timeout

Commands:

llmbridge config show
llmbridge config set provider ollama
llmbridge config set model llama3.1:latest
llmbridge config set base_url http://localhost:11434
llmbridge config set api_key local-secret
llmbridge config set temperature 0.2
llmbridge config set timeout 120
llmbridge config reset

For OpenAI-compatible servers:

llmbridge config set provider openai_compatible
llmbridge config set base_url http://localhost:1234/v1
llmbridge config set model local-model
llmbridge config set api_key local-secret

llmbridge config show masks stored API keys.

For llmbridge ask, model resolution order is:

  1. --model
  2. model in ~/.llmbridge/config.toml
  3. LLMBRIDGE_DEFAULT_MODEL
  4. llama3.1:latest
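
The resolution order above can be sketched as a small function. This is an illustration of the precedence only, not llmbridge's actual implementation; resolve_model and its parameters are hypothetical names.

```python
import os

FALLBACK_MODEL = "llama3.1:latest"


def resolve_model(cli_model=None, config=None, env=None):
    """Pick a model: CLI flag, then config file, then environment, then fallback."""
    config = config or {}
    env = os.environ if env is None else env
    candidates = (
        cli_model,                           # 1. --model
        config.get("model"),                 # 2. model in config.toml
        env.get("LLMBRIDGE_DEFAULT_MODEL"),  # 3. environment variable
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return FALLBACK_MODEL                    # 4. built-in default


print(resolve_model(None, {"model": "local-model"}, {}))
```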

Python Usage

Ollama:

from llmbridge import LLM

llm = LLM(
    provider="ollama",
    model="llama3.1:latest",
)

response = llm.chat("Explain FastAPI in simple words")
print(response.text)

OpenAI-compatible:

from llmbridge import LLM

llm = LLM(
    provider="openai_compatible",
    model="local-model",
    base_url="http://localhost:1234/v1",
)

response = llm.chat("Explain FastAPI in simple words")
print(response.text)

Message format:

response = llm.chat(
    [
        {"role": "system", "content": "You are a helpful backend architect."},
        {"role": "user", "content": "Explain PostgreSQL indexes."},
    ]
)

PromptTemplate Usage

Use PromptTemplate for small reusable prompts with named variables:

from llmbridge import LLM, PromptTemplate

template = PromptTemplate("Explain {topic} for a {audience}.")
prompt = template.format(topic="FastAPI", audience="backend developer")

llm = LLM(model="llama3.1:latest")
response = llm.chat(prompt)
print(response.text)

If a required variable is missing, llmbridge raises PromptTemplateError.
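
To see which variables a template needs before formatting it, the standard library can list the placeholders. This is a dependency-free sketch of the idea, not how llmbridge detects missing variables; missing_variables is a hypothetical helper.

```python
from string import Formatter


def missing_variables(template: str, **values) -> list[str]:
    """Return placeholder names in `template` that `values` does not supply."""
    names = [field for _, field, _, _ in Formatter().parse(template) if field]
    return [name for name in names if name not in values]


template = "Explain {topic} for a {audience}."
print(missing_variables(template, topic="FastAPI"))
```

An empty list means every placeholder has a value.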

Structured Output Usage

LLM.structured() asks the model for JSON, validates it with a Pydantic schema, and returns a typed object:

from pydantic import BaseModel

from llmbridge import LLM


class TaskResult(BaseModel):
    title: str
    priority: str


llm = LLM(model="llama3.1:latest")
result = llm.structured(
    "Create a task for fixing a login bug",
    schema=TaskResult,
)

print(result.title)
print(result.priority)

Structured output depends on the model following instructions. llmbridge asks for JSON matching your schema, extracts JSON from the response, validates it with Pydantic, and retries when the output is invalid. If the final response still cannot be parsed or validated, llmbridge raises StructuredOutputError.
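
The extract, validate, and retry loop described above can be sketched with the standard library alone. To keep the example dependency-free, Pydantic validation is replaced by a plain required-keys check, and the model is a fake callable. The function names and the RuntimeError are illustrative assumptions, not llmbridge's internals.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a model reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    return json.loads(text[start : end + 1])


def structured(ask, prompt: str, required: set[str], retries: int = 2) -> dict:
    """Call `ask` until the reply holds a JSON object with the required keys."""
    last_error = None
    for _ in range(retries + 1):
        try:
            data = extract_json(ask(prompt))
            missing = required - set(data)
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
    raise RuntimeError(f"structured output failed: {last_error}")


# Fake model: the first reply is invalid, the second wraps JSON in prose.
replies = iter(["Sure thing!", 'Here you go: {"title": "Fix login bug", "priority": "high"}'])
result = structured(lambda prompt: next(replies), "Create a task", {"title", "priority"})
print(result["title"])
```

The retry budget matters because smaller local models often wrap JSON in extra prose or omit fields on the first attempt.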

SQL plan example:

from pydantic import BaseModel

from llmbridge import LLM


class SQLPlan(BaseModel):
    sql: str
    explanation: str
    tables_used: list[str]


llm = LLM(model="llama3.1:latest")
plan = llm.structured(
    "Create a SQL plan to list the latest 10 paid invoices. Do not execute SQL.",
    schema=SQLPlan,
)

print(plan.sql)

This returns a structured SQL plan only. llmbridge does not execute SQL.

Examples

Runnable examples live in the examples/ folder:

python examples/basic_chat.py
python examples/streaming_chat.py
python examples/list_models.py
python examples/custom_options.py
python examples/ask_style_usage.py
python examples/prompt_template.py
python examples/structured_output.py
python examples/structured_sql_plan.py

Troubleshooting

If Ollama is not running, you may see:

Ollama is not running at http://localhost:11434. Start Ollama and run: ollama pull llama3.1

Start Ollama and pull the selected model:

ollama pull llama3.1:latest

If the CLI says a model is missing:

Model 'llama3.1:latest' is not installed.
Run:
  llmbridge pull llama3.1:latest

Pull it:

llmbridge pull llama3.1:latest

If your Ollama server uses a different URL, set it in the config. The command below shows the default address; substitute your server's URL:

llmbridge config set base_url http://localhost:11434

Or pass it for one command:

llmbridge ask "Explain FastAPI" --base-url http://localhost:11434
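
When it is unclear whether the server is up at all, a quick reachability check from Python can help. The sketch below queries Ollama's /api/tags endpoint, which lists locally installed models, using only the standard library. The is_reachable helper is hypothetical and independent of llmbridge; the URL assumes the default Ollama address.

```python
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` gets a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections land here.
        return False


print(is_reachable("http://localhost:11434/api/tags"))
```

If this prints False, start Ollama or correct the base URL before retrying the CLI.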

Roadmap

  • More provider integrations
  • Better structured-output controls
  • Tool calling
  • Embeddings and RAG support
  • Higher-level application workflows

Local Development

git clone https://github.com/iwasbugged/llmbridge.git
cd llmbridge
python3 -m pip install -e ".[dev]"

Run tests:

python3 -m pytest

Run linting:

python3 -m ruff check .
python3 -m ruff format --check .

Author

Rahul Kumar <iamrahul.rk4@gmail.com>

License

MIT License. See LICENSE.


