Local browser-backed LLM and multimodal inference bridge

xlocllm

xlocllm is a Python SDK for local browser-backed AI inference. It exposes an OpenAI-compatible loopback API from Python while running model weights in a paired browser window through WebGPU/WebNN with MLC WebLLM and Transformers.js.

The goal is simple:

pip install xlocllm

Then:

import xlocllm

llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b")
runtime = xlocllm.runtime([llm])
runtime.run()

print(runtime.url)  # http://127.0.0.1:1146/v1
print(runtime.chat("Say hello", temperature=0))

What It Does

  • Starts a local FastAPI bridge on 127.0.0.1.
  • Opens a paired browser app window.
  • Runs models inside the browser runtime.
  • Provides OpenAI-compatible /v1 endpoints for local clients.
  • Supports LLMs, embeddings, rerankers, translation, TTS, vision, ASR, and more through a shared catalog.
  • Keeps Python-side objects for models, units, runtimes, and bridges.
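
Because the /v1 endpoints are OpenAI-compatible, any HTTP client can talk to the bridge. The following is a minimal sketch using only the standard library; it builds a chat completion request without sending it. build_chat_request is a hypothetical helper, not part of xlocllm, and the base URL assumes the default port shown in this README. Whether the bridge inspects an Authorization header is not stated here, so none is set.

```python
import json
import urllib.request

# Default base URL shown in this README; the port may differ on your setup.
BASE_URL = "http://127.0.0.1:1146/v1"


def build_chat_request(
    prompt: str, model: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Say hello", "Qwen-3.5-0.8b")
print(request.full_url)
```

With a runtime running, passing the request to urllib.request.urlopen should return the JSON completion, assuming the bridge follows the OpenAI response shape.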

Install

pip install xlocllm

Optional OpenAI client helper:

pip install "xlocllm[openai]"

Development install from this repository:

python -m pip install -e "python/xlocllm[dev,openai]"

Quick Start

import xlocllm

runtime = xlocllm.runtime(
    [
        xlocllm.unit("LLM", "Qwen-3.5-0.8b"),
        xlocllm.unit("embedding", "multilingual-e5-small"),
    ]
)

runtime.install()
runtime.run()

print(runtime.status())

OpenAI-Compatible Usage

import xlocllm
from openai import OpenAI

llm = xlocllm.unit(type="LLM", model="Qwen-3.5-0.8b-fp32")
client = OpenAI(base_url="http://127.0.0.1:1146/v1", api_key="xlocllm")

with xlocllm.runtime([llm]) as runtime:
    runtime.run()
    response = client.chat.completions.create(
        model="Qwen-3.5-0.8b-fp32",
        messages=[{"role": "user", "content": "What is lidar?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)

With the optional helper installed (the xlocllm[openai] extra), the runtime can build the client itself:

client = runtime.client()
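
When the client runs in a different process from the runtime, it can help to confirm that the bridge is accepting connections before sending the first request. The sketch below uses only the standard library; wait_for_port is a hypothetical helper, not part of xlocllm, and port 1146 in the usage comment is the default shown in this README.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.2
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage (assumes the default port from this README):
#     if wait_for_port("127.0.0.1", 1146):
#         ...create the OpenAI client and send requests...
```

This only checks that the port is open; it does not confirm that a model has finished loading in the browser.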

Core API

model = xlocllm.model("Qwen-3.5-0.8b", unit="LLM")
models = xlocllm.models(unit="LLM", max_vram_mb=1500)
cpu_models = xlocllm.models(webgpu=False)

unit = xlocllm.unit("LLM", "Qwen-3.5-0.8b", reasoning=None)
runtime = xlocllm.runtime([unit], port=1146)
bridge = xlocllm.Bridge(port=1146)

print(runtime.url)
print(bridge.url)
print(xlocllm.bridges())
print(xlocllm.runtimes())
print(xlocllm.status())
print(xlocllm.benchmark())
print(xlocllm.benchmark("LLM"))

By default, benchmark() detects real WebGPU/WebNN/NPU support by temporarily opening a paired mini browser, which it closes once detection finishes. When called with a unit type, as in benchmark("LLM"), it returns model recommendations in two categories: fast and quality.

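For reasoning-capable LLMs, reasoning can be configured when the unit is created and changed later without recreating the unit:

llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b-fp32", reasoning=False)
runtime = xlocllm.runtime([llm])
runtime.set_reasoning(llm.id, True)  # switch reasoning on after creation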

CLI:

xlocllm status
xlocllm benchmark
xlocllm benchmark LLM
xlocllm models --unit LLM --no-webgpu
xlocllm run --unit LLM --model "Qwen-3.5-0.8b"

Documentation

Model Lookup

Look up a model by its exact modelId, its label, or one of its aliases:

xlocllm.unit("LLM", "Qwen-3.5-0.8b")
xlocllm.unit("LLM", "Qwen3.5-0.8B-q4f16_1-MLC")
xlocllm.unit("embedding", "multilingual-e5-small")

Browse the complete catalog in models.md.
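
The lookup rule above can be pictured as a three-way match against each catalog entry. This is a sketch under stated assumptions, not the library's implementation: CatalogEntry and find_model are hypothetical names, the entry values are placeholders rather than real catalog data, and matching is assumed to be exact and case-sensitive for all three fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record with the three lookup keys named above."""

    model_id: str
    label: str
    aliases: Tuple[str, ...] = ()


def find_model(catalog: Iterable[CatalogEntry], name: str) -> Optional[CatalogEntry]:
    """Return the first entry whose modelId, label, or alias equals name."""
    for entry in catalog:
        if name == entry.model_id or name == entry.label or name in entry.aliases:
            return entry
    return None


# Placeholder entries for illustration only; not taken from models.md.
catalog = [
    CatalogEntry("example-model-q4-MLC", "Example-Model", ("example",)),
    CatalogEntry("other-model-fp32", "Other-Model"),
]

print(find_model(catalog, "example"))
```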

Local State

By default, xlocllm stores bridge metadata and browser profiles under:

  • Windows: %LOCALAPPDATA%\xlocllm
  • Linux/macOS: $XDG_STATE_HOME/xlocllm or ~/.local/state/xlocllm

Environment variables:

  • XLOCLLM_HOME - overrides the local state directory.
  • XLOCLLM_WEB_URL - sets a custom web runtime URL.
  • XLOCLLM_LOG_LEVEL - sets the uvicorn log level.
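
The directory rules above can be sketched as a small resolver. This is an illustration only: resolve_state_dir is a hypothetical helper, not part of xlocllm, and it assumes that XLOCLLM_HOME replaces the whole state directory rather than just its parent. Pure paths are used so the function behaves the same on any platform.

```python
import os
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Mapping


def resolve_state_dir(env: Mapping[str, str], platform: str, home: str) -> PurePath:
    """Resolve the state directory following the defaults listed above."""
    windows = platform == "win32"
    override = env.get("XLOCLLM_HOME")
    if override:
        # Assumption: the override is used as the state directory itself.
        return PureWindowsPath(override) if windows else PurePosixPath(override)
    if windows:
        # Raises KeyError if LOCALAPPDATA is unset; kept simple for the sketch.
        return PureWindowsPath(env["LOCALAPPDATA"]) / "xlocllm"
    xdg = env.get("XDG_STATE_HOME")
    base = PurePosixPath(xdg) if xdg else PurePosixPath(home) / ".local" / "state"
    return base / "xlocllm"


# Resolve for the current machine.
print(resolve_state_dir(os.environ, sys.platform, os.path.expanduser("~")))
```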

Development Checks

python -m pytest python/xlocllm/tests
python -m ruff check python/xlocllm/src python/xlocllm/tests
python -m mypy python/xlocllm/src

Build the Python package:

cd python/xlocllm
python -m build

Notes

The bridge binds to loopback only. The browser window must remain open while browser-backed models are running. Without WebGPU, xlocllm exposes only the CPU/WASM-compatible Transformers.js subset and rejects heavier models before loading.
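
Since the bridge binds to loopback only, a client configured with a non-loopback address will not reach it. A minimal sketch of a sanity check on a base URL, using only the standard library; is_loopback_url is a hypothetical helper, not part of xlocllm, and it does not resolve DNS names other than the literal "localhost".

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Some other DNS name; this sketch does not resolve it.
        return False


print(is_loopback_url("http://127.0.0.1:1146/v1"))
```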

License

MIT
