Local browser-backed LLM and multimodal inference bridge
xlocllm
xlocllm is a Python SDK for local browser-backed AI inference. It exposes an
OpenAI-compatible loopback API from Python while running model weights in a
paired browser window through WebGPU/WebNN with MLC WebLLM and Transformers.js.
The goal is simple:
pip install xlocllm
Then:
import xlocllm
llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b")
runtime = xlocllm.runtime([llm])
runtime.run()
print(runtime.url) # http://127.0.0.1:1146/v1
print(runtime.chat("Say hello", temperature=0))
What It Does
- Starts a local FastAPI bridge on 127.0.0.1.
- Opens a paired browser app window.
- Runs models inside the browser runtime.
- Provides OpenAI-compatible /v1 endpoints for local clients.
- Supports LLMs, embeddings, rerankers, translation, TTS, vision, ASR, and more through a shared catalog.
- Keeps Python-side objects for models, units, runtimes, and bridges.
Install
pip install xlocllm
Optional OpenAI client helper:
pip install "xlocllm[openai]"
Development install from this repository:
python -m pip install -e .\python\xlocllm[dev,openai]
Quick Start
import xlocllm
runtime = xlocllm.runtime(
    [
        xlocllm.unit("LLM", "Qwen-3.5-0.8b"),
        xlocllm.unit("embedding", "multilingual-e5-small"),
    ]
)
runtime.install()
runtime.run()
print(runtime.status())
OpenAI-Compatible Usage
import xlocllm
from openai import OpenAI
llm = xlocllm.unit(type="LLM", model="Qwen-3.5-0.8b-fp32")
client = OpenAI(base_url="http://127.0.0.1:1146/v1", api_key="xlocllm")
with xlocllm.runtime([llm]) as runtime:
    runtime.run()
    response = client.chat.completions.create(
        model="Qwen-3.5-0.8b-fp32",
        messages=[{"role": "user", "content": "What is lidar?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)
With the optional helper:
client = runtime.client()
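Because the endpoints are OpenAI-compatible, a client without the openai package can talk to the bridge over plain HTTP. The sketch below uses only the standard library and assumes the bridge follows the usual OpenAI path layout (chat completions at /chat/completions under the base URL) and response shape. The helper names build_chat_request and send_chat_request are hypothetical and not part of xlocllm, and the bearer token simply mirrors the placeholder api_key used above; whether the bridge checks it is not stated here.

```python
import json
import urllib.request

# Default bridge URL as printed by runtime.url in the examples above.
BASE_URL = "http://127.0.0.1:1146/v1"


def build_chat_request(prompt, model, base_url=BASE_URL, max_tokens=64):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        # Assumption: standard OpenAI path layout under the /v1 base URL.
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key copied from the OpenAI client example.
            "Authorization": "Bearer xlocllm",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's message text.

    Assumes the response follows the OpenAI chat completion schema.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a runtime already running, send_chat_request(build_chat_request("What is lidar?", "Qwen-3.5-0.8b-fp32")) would be the rough equivalent of the client call above.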
Core API
model = xlocllm.model("Qwen-3.5-0.8b", unit="LLM")
models = xlocllm.models(unit="LLM", max_vram_mb=1500)
cpu_models = xlocllm.models(webgpu=False)
unit = xlocllm.unit("LLM", "Qwen-3.5-0.8b", reasoning=None)
runtime = xlocllm.runtime([unit], port=1146)
bridge = xlocllm.Bridge(port=1146)
print(runtime.url)
print(bridge.url)
print(xlocllm.bridges())
print(xlocllm.runtimes())
print(xlocllm.status())
print(xlocllm.benchmark())
print(xlocllm.benchmark("LLM"))
By default, benchmark() temporarily opens a paired mini browser to detect real
WebGPU/WebNN/NPU support, then closes it. When called with a unit type, such as
benchmark("LLM"), it returns fast and quality recommendations for that type.
Reasoning-capable LLMs can be configured at creation and hot-updated afterwards:
llm = xlocllm.unit("LLM", "Qwen-3.5-0.8b-fp32", reasoning=False)
runtime.set_reasoning(llm.id, True)
CLI:
xlocllm status
xlocllm benchmark
xlocllm benchmark LLM
xlocllm models --unit LLM --no-webgpu
xlocllm run --unit LLM --model "Qwen-3.5-0.8b"
Documentation
- Full English SDK docs: docs.md
- Full Russian SDK docs: docs_ru.md
- English model catalog: models.md
- Russian model catalog: models_ru.md
Model Lookup
Refer to a model by its exact modelId, its label, or one of its aliases:
xlocllm.unit("LLM", "Qwen-3.5-0.8b")
xlocllm.unit("LLM", "Qwen3.5-0.8B-q4f16_1-MLC")
xlocllm.unit("embedding", "multilingual-e5-small")
Browse the complete catalog in models.md.
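To illustrate the lookup rule, here is a minimal sketch of matching a name against modelId, label, and aliases. It is not the SDK's implementation: the CatalogEntry shape, the find_model helper, and the example entries are all hypothetical, and it assumes that "exact" means a case-sensitive string match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical record shape; the fields mirror the terms used above.
    model_id: str
    label: str
    aliases: tuple = ()


def find_model(catalog, name):
    """Return the first entry whose modelId, label, or alias equals name."""
    for entry in catalog:
        if name == entry.model_id or name == entry.label or name in entry.aliases:
            return entry
    raise KeyError(f"no catalog entry matches {name!r}")


# Made-up entries for illustration only; see models.md for the real catalog.
catalog = [
    CatalogEntry("example-llm-q4-MLC", "example-llm", ("example",)),
    CatalogEntry("example-embedder-onnx", "example-embedder"),
]
```

Under these assumptions, find_model(catalog, "example") and find_model(catalog, "example-llm-q4-MLC") resolve to the same entry, while a name with different casing raises KeyError.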
Local State
By default, xlocllm stores bridge metadata and browser profiles under:
- Windows: %LOCALAPPDATA%\xlocllm
- Linux/macOS: $XDG_STATE_HOME/xlocllm or ~/.local/state/xlocllm
Environment variables:
- XLOCLLM_HOME: override the local state directory.
- XLOCLLM_WEB_URL: use a custom web runtime URL.
- XLOCLLM_LOG_LEVEL: uvicorn log level.
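The directory rules above can be sketched as a small resolver, which is handy for scripts that need to find or clean up the state directory. This is an illustration, not the SDK's own code: resolve_state_dir is a hypothetical helper, and it assumes XLOCLLM_HOME takes precedence over the platform defaults. Raising an error when LOCALAPPDATA is missing on Windows is this sketch's choice.

```python
import os
import sys
from pathlib import Path, PurePosixPath, PureWindowsPath


def resolve_state_dir(env=None, platform=None):
    """Return the state directory implied by the documented rules."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    is_windows = platform.startswith("win")
    path_type = PureWindowsPath if is_windows else PurePosixPath

    # Assumption: an explicit override wins on every platform.
    override = env.get("XLOCLLM_HOME")
    if override:
        return path_type(override)

    if is_windows:
        local_app_data = env.get("LOCALAPPDATA")
        if not local_app_data:
            raise RuntimeError("LOCALAPPDATA is not set")
        return path_type(local_app_data) / "xlocllm"

    # Linux/macOS: prefer $XDG_STATE_HOME, else fall back to ~/.local/state.
    xdg_state_home = env.get("XDG_STATE_HOME")
    if xdg_state_home:
        return path_type(xdg_state_home) / "xlocllm"
    home = env.get("HOME") or str(Path.home())
    return path_type(home) / ".local" / "state" / "xlocllm"
```

Passing env and platform explicitly keeps the function easy to test; calling it with no arguments uses the current process environment.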
Development Checks
python -m pytest python/xlocllm/tests
python -m ruff check python/xlocllm/src python/xlocllm/tests
python -m mypy python/xlocllm/src
Build the Python package:
cd python\xlocllm
python -m build
Notes
The bridge binds to loopback only. The browser window must remain open while browser-backed models are running. Without WebGPU, xlocllm exposes only the CPU/WASM-compatible Transformers.js subset and rejects heavier models before loading.
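Since the bridge listens on loopback only, a client script that accepts a configurable base URL may want to confirm the URL points at the local machine before sending requests. A minimal sketch using only the standard library follows; is_loopback_url is a hypothetical helper and not part of xlocllm.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback address ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-loopback without DNS lookup.
        return False
```

The check is deliberately conservative: hostnames other than localhost are rejected rather than resolved.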
License
MIT