MatrixLLM: OpenAI-compatible multi-provider LLM router (OpenRouter-style) with optional relay nodes
Project description
MatrixLLM
OpenAI-compatible multi-provider LLM router with optional relay nodes.
Quick Start | How It Works | Multi-Provider Routing | OllaBridge Compatible
What is MatrixLLM?
MatrixLLM turns your computer into a private OpenAI-compatible API server.
Instead of sending your data to OpenAI, you can:
- Run AI models locally on your own computer
- Connect to multiple providers (OpenAI, Anthropic, Google, IBM) through one API
- Use the same code you'd use with OpenAI
Your App (uses OpenAI SDK)
            |
            v
  +------------------+
  |    MatrixLLM     |  <-- Runs on localhost:11435
  +------------------+
     /      |      \
    v       v       v
 Ollama   OpenAI   Anthropic (etc.)
 (local)  (cloud)  (cloud)
Quick Start for Beginners
What You Need
- Python 3.10 or newer (Download Python)
- 5 minutes of your time
Step 1: Install MatrixLLM
Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux) and run:
pip install matrixllm
Step 2: Start the Server
matrixllm start
That's it! You'll see something like this:
╭─────────────────── Gateway Ready ───────────────────╮
│ │
│ ✅ MatrixLLM is Online │
│ │
│ Model: deepseek-r1 │
│ Local API: http://localhost:11435/v1 │
│ Health: http://localhost:11435/health │
│ Key: sk-matrixllm-xY9kL2mN8pQ4rT6v │
│ │
│ Ollabridge compatible: │
│ OLLAS_BASE_URL=http://localhost:11435/v1 │
│ OLLAS_API_KEY=sk-matrixllm-xY9kL2mN8pQ4rT6v │
│ OLLAS_MODEL=deepseek-r1 │
│ │
╰──────────────────────────────────────────────────────╯
Important: Copy the API key shown (starts with sk-matrixllm-). You'll need it!
Step 3: Use It in Your Code
from openai import OpenAI

# Connect to your local MatrixLLM server
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-matrixllm-YOUR-KEY-HERE"  # Use the key from Step 2
)

# Send a message to the AI
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello! What can you do?"}]
)

# Print the AI's response
print(response.choices[0].message.content)
Step 4: Test with curl (Optional)
You can also test from the command line:
curl http://localhost:11435/v1/chat/completions \
  -H "Authorization: Bearer sk-matrixllm-YOUR-KEY-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
How It Works
API Keys Explained
When you run matrixllm start, it automatically generates a secure API key:
sk-matrixllm-xY9kL2mN8pQ4rT6vW1zA...
What does this mean?
- sk- = "secret key" (standard prefix)
- matrixllm- = identifies this as a MatrixLLM key
- xY9kL2mN... = random secure characters
You can set your own key by creating a .env file:
API_KEYS=my-custom-api-key
Or use multiple keys (comma-separated):
API_KEYS=key-for-app-1,key-for-app-2,key-for-testing
Authentication Methods
MatrixLLM accepts API keys in two ways:
# Method 1: Authorization header (recommended)
headers = {"Authorization": "Bearer sk-matrixllm-xxx"}
# Method 2: X-API-Key header
headers = {"X-API-Key": "sk-matrixllm-xxx"}
Both work identically. The OpenAI SDK uses Method 1 automatically.
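If you are not using the OpenAI SDK, either header can be set directly on a raw HTTP request. A minimal sketch using the requests library (assumes the server from Step 2 is running locally and deepseek-r1 is available):
import requests

# Chat completion request authenticated with the X-API-Key header (Method 2)
resp = requests.post(
    "http://localhost:11435/v1/chat/completions",
    headers={"X-API-Key": "sk-matrixllm-YOUR-KEY-HERE"},
    json={
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])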
OllaBridge Compatibility
MatrixLLM is fully compatible with OllaBridge. Both projects share the same API interface, making it easy to switch between them or migrate your applications.
When to Use Each
| Feature | OllaBridge | MatrixLLM |
|---|---|---|
| Use Case | Simple local-only proxy | Multi-provider enterprise router |
| Ollama Support | Local only | Local + distributed nodes |
| Cloud Providers | No | OpenAI, Anthropic, Google, IBM |
| Distributed Compute | No | Yes (relay nodes) |
| Complexity | Minimal | Full-featured |
Choose OllaBridge when:
- You only need local Ollama models
- You want a lightweight, simple setup
- You don't need cloud provider integration
Choose MatrixLLM when:
- You need multi-provider routing (OpenAI, Anthropic, Google, IBM)
- You want distributed compute across multiple machines
- You need enterprise features like load balancing and failover
Shared Configuration
Both projects use the same:
- Port: 11435
- API structure: /v1/chat/completions, /v1/embeddings, /v1/models
- Environment variables: API_KEYS, OLLAMA_BASE_URL, DEFAULT_MODEL
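For example, a minimal .env that both projects understand (values are illustrative; see the Configuration Reference below for the full list):
# Minimal shared configuration (example values)
API_KEYS=my-secure-key
OLLAMA_BASE_URL=http://localhost:11434
DEFAULT_MODEL=deepseek-r1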
Using OLLAS_* Environment Variables
MatrixLLM supports OllaBridge-style environment variables for seamless migration:
| Variable | Description | Example |
|---|---|---|
| OLLAS_API_KEY | API key (alias for API_KEYS) | sk-matrixllm-xxx |
| OLLAS_BASE_URL | Server URL | http://localhost:11435/v1 |
| OLLAS_MODEL | Default model (alias for DEFAULT_MODEL) | deepseek-r1 |
Example .env file:
# OllaBridge-compatible configuration
OLLAS_API_KEY=sk-matrixllm-your-key-here
OLLAS_BASE_URL=http://localhost:11435/v1
OLLAS_MODEL=deepseek-r1
Example Python code:
import os
from openai import OpenAI

# Works with both MatrixLLM and OllaBridge
client = OpenAI(
    base_url=os.getenv("OLLAS_BASE_URL", "http://localhost:11435/v1"),
    api_key=os.getenv("OLLAS_API_KEY", "your-key-here"),
)

response = client.chat.completions.create(
    model=os.getenv("OLLAS_MODEL", "deepseek-r1"),
    messages=[{"role": "user", "content": "Hello!"}]
)
Migration Path
Switch between OllaBridge and MatrixLLM without changing your application code:
# Start with OllaBridge (simple local setup)
pip install ollabridge
ollabridge start
# Upgrade to MatrixLLM (when you need more features)
pip install matrixllm
matrixllm start
Your application code stays exactly the same!
Multi-Provider Routing
MatrixLLM can route requests to different AI providers based on a prefix in the model name (a short sketch of how prefix routing works follows the table below).
Supported Providers
| Provider | Model Prefix | Example |
|---|---|---|
| Local Ollama | (no prefix) | deepseek-r1, llama3 |
| OpenAI | openai/ | openai/gpt-4o-mini |
| Anthropic | anthropic/ | anthropic/claude-3-5-sonnet-latest |
| Google Gemini | google/ | google/gemini-1.5-pro |
| IBM watsonx | ibm/ | ibm/granite-3-8b-instruct |
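Conceptually, prefix routing splits the requested model name on the first slash and dispatches to the matching provider; names without a known prefix go to local Ollama. A minimal illustrative sketch of the idea (not MatrixLLM's actual implementation):
# Illustrative sketch of prefix-based routing (not the actual MatrixLLM code)
PROVIDERS = {"openai", "anthropic", "google", "ibm"}

def route(model: str) -> tuple[str, str]:
    """Return (provider, upstream_model_name) for a requested model string."""
    prefix, sep, rest = model.partition("/")
    if sep and prefix in PROVIDERS:
        return prefix, rest        # "openai/gpt-4o-mini" -> ("openai", "gpt-4o-mini")
    return "ollama", model         # no known prefix -> local Ollama

print(route("anthropic/claude-3-5-sonnet-latest"))  # ('anthropic', 'claude-3-5-sonnet-latest')
print(route("deepseek-r1"))                          # ('ollama', 'deepseek-r1')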
Set Up Multi-Provider Routing
Create a .env file with your API keys:
# Your MatrixLLM API key
API_KEYS=my-secure-key
# OpenAI (optional)
OPENAI_COMPAT_BASE_URL=https://api.openai.com/v1
OPENAI_COMPAT_API_KEY=sk-...
# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...
# Google Gemini (optional)
GEMINI_API_KEY=AIza...
# IBM watsonx (optional)
WATSONX_BASE_URL=https://us-south.ml.cloud.ibm.com
WATSONX_API_KEY=...
WATSONX_PROJECT_ID=...
Use Different Providers
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="my-secure-key"
)

# Use local Ollama (default)
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use OpenAI
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use Anthropic Claude
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use Google Gemini
response = client.chat.completions.create(
    model="google/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
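The same pattern applies to IBM watsonx, assuming the WATSONX_* variables from the setup step are configured:
# Use IBM watsonx
response = client.chat.completions.create(
    model="ibm/granite-3-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)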
Distributed Compute (Relay Nodes)
Add GPUs from anywhere without port forwarding. Nodes dial out to your gateway.
On Your Gateway (Control Plane)
matrixllm start
# Note the enrollment token shown
On Remote GPU/Machine (Node)
pip install matrixllm
matrixllm-node join \
--control http://YOUR_GATEWAY_IP:11435 \
--token YOUR_ENROLLMENT_TOKEN
Use Cases
- Gaming PC at home: Join your gateway from anywhere
- Free Colab/Kaggle GPUs: No port forwarding needed
- Cloud instances: Auto load balancing across nodes
API Reference
Endpoints
| Endpoint | Method | Auth Required | Description |
|---|---|---|---|
| /health | GET | No | Check if server is running |
| /v1/models | GET | Yes | List available models |
| /v1/chat/completions | POST | Yes | Generate chat responses |
| /v1/embeddings | POST | Yes | Generate text embeddings |
Quick Examples
# Check health (no auth needed)
curl http://localhost:11435/health
# List models
curl -H "Authorization: Bearer YOUR-KEY" \
http://localhost:11435/v1/models
# Chat completion
curl -X POST http://localhost:11435/v1/chat/completions \
-H "Authorization: Bearer YOUR-KEY" \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-r1", "messages": [{"role": "user", "content": "Hi!"}]}'
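Embeddings use the same OpenAI-compatible request shape. A sketch assuming an embedding model such as nomic-embed-text (the configured default) is available in Ollama:
# Embeddings
curl -X POST http://localhost:11435/v1/embeddings \
  -H "Authorization: Bearer YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "Hello!"}'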
CLI Commands
# Start the server
matrixllm start
# Start with options
matrixllm start --port 8080 --model llama3
# Show LAN URLs (for other devices on your network)
matrixllm start --lan
# Create public URL (via ngrok)
matrixllm start --share
# Check system health
matrixllm doctor
# List available models
matrixllm models --api-key YOUR-KEY
# Test chat
matrixllm test-chat "Hello!" --api-key YOUR-KEY
Configuration Reference
All Environment Variables
# === Server ===
PORT=11435 # Server port
HOST=0.0.0.0 # Bind address
CORS_ORIGINS=http://localhost:5173,http://localhost:3000
# === Authentication ===
API_KEYS=dev-key-change-me # Comma-separated API keys
# === Rate Limiting ===
RATE_LIMIT=60/minute # Requests per minute
# === Local Ollama ===
OLLAMA_BASE_URL=http://localhost:11434
DEFAULT_MODEL=deepseek-r1
DEFAULT_EMBED_MODEL=nomic-embed-text
# === Routing ===
ROUTING_MODE=prefix # prefix | fallback
# === Multi-Provider (Optional) ===
OPENAI_COMPAT_BASE_URL=https://api.openai.com/v1
OPENAI_COMPAT_API_KEY=
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
WATSONX_BASE_URL=https://us-south.ml.cloud.ibm.com
WATSONX_API_KEY=
WATSONX_PROJECT_ID=
# === Relay Fabric ===
RELAY_ENABLED=true
ENROLLMENT_SECRET=dev-enroll-change-me
LOCAL_RUNTIME_ENABLED=true
# === OllaBridge Compatibility ===
# OLLAS_API_KEY= # Alias for API_KEYS
# OLLAS_BASE_URL= # Client base URL
# OLLAS_MODEL= # Alias for DEFAULT_MODEL
Troubleshooting
"Connection refused" error
Make sure the server is running:
matrixllm start
"Invalid API key" error
Check that you're using the correct key:
# The key is shown when you start the server
matrixllm start
# Look for: Key: sk-matrixllm-xxxxx
"Model not found" error
- Check available models:
curl -H "Authorization: Bearer YOUR-KEY" http://localhost:11435/v1/models
- For local models, make sure Ollama is running:
ollama list
Server won't start
Another service may already be using port 11435. Free the port or start MatrixLLM on a different one:
# Use a different port
matrixllm start --port 8080
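To see which process is holding the port, you can use standard operating-system tools (these are not MatrixLLM commands):
# macOS / Linux
lsof -i :11435
# Windows (Command Prompt)
netstat -ano | findstr 11435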
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
make test
# Format code
make format
# Type check
make typecheck
License
Apache License 2.0 - see LICENSE
Built With
- FastAPI - Async web framework
- httpx - Async HTTP client
- Ollama - Local LLM runtime
- Pydantic - Data validation
MatrixLLM - Your unified gateway to all LLM providers
Project details
Release history
Download files
Source Distribution
Built Distribution
File details
Details for the file matrixllm-0.1.0.tar.gz.
File metadata
- Download URL: matrixllm-0.1.0.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6cfebad43ef25629e1aa8527758a696766bc10ba2cb6ca76890b5840dd48b24 |
| MD5 | bd43454573e6aad97f150a8c6f4fec1c |
| BLAKE2b-256 | 15321d4bf5a710c55ceec9cb709e3252041d80335b64049fc23a5dad036a51c1 |
Provenance
The following attestation bundles were made for matrixllm-0.1.0.tar.gz:
Publisher: publish.yml on agent-matrix/matrix-llm

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: matrixllm-0.1.0.tar.gz
- Subject digest: b6cfebad43ef25629e1aa8527758a696766bc10ba2cb6ca76890b5840dd48b24
- Sigstore transparency entry: 844833401
- Sigstore integration time:
- Permalink: agent-matrix/matrix-llm@5daa4debd45b3522f5f4c514cfb3270c58cb849a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/agent-matrix
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5daa4debd45b3522f5f4c514cfb3270c58cb849a
- Trigger Event: release
File details
Details for the file matrixllm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: matrixllm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 56.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d86386056ca53785885e6508db5d3861ebc6f1a3f62ae2871031ecbeaf6cedbc |
| MD5 | 7f53169f8fa3d6551e160ca332687863 |
| BLAKE2b-256 | 8c11d94b2cc1dbd4b2465d33a21cc7c2b72132481af8c8329d813a511bd67f66 |
Provenance
The following attestation bundles were made for matrixllm-0.1.0-py3-none-any.whl:
Publisher: publish.yml on agent-matrix/matrix-llm

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: matrixllm-0.1.0-py3-none-any.whl
- Subject digest: d86386056ca53785885e6508db5d3861ebc6f1a3f62ae2871031ecbeaf6cedbc
- Sigstore transparency entry: 844833403
- Sigstore integration time:
- Permalink: agent-matrix/matrix-llm@5daa4debd45b3522f5f4c514cfb3270c58cb849a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/agent-matrix
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5daa4debd45b3522f5f4c514cfb3270c58cb849a
- Trigger Event: release