
MatrixLLM: OpenAI-compatible multi-provider LLM router (OpenRouter-style) with optional relay nodes

Project description

MatrixLLM Logo

MatrixLLM

OpenAI-compatible multi-provider LLM router with optional relay nodes.

PyPI version Python 3.10+ License: Apache 2.0 Code style: black

Quick Start | How It Works | Multi-Provider Routing | OllaBridge Compatible


What is MatrixLLM?

MatrixLLM turns your computer into a private OpenAI-compatible API server.

Instead of sending your data to OpenAI, you can:

  • Run AI models locally on your own computer
  • Connect to multiple providers (OpenAI, Anthropic, Google, IBM) through one API
  • Use the same code you'd use with OpenAI
Your App (uses OpenAI SDK)
         |
         v
+------------------+
|    MatrixLLM     |  <-- Runs on localhost:11435
+------------------+
    /     |     \
   v      v      v
Ollama  OpenAI  Anthropic  (etc.)
(local) (cloud) (cloud)

Quick Start for Beginners

What You Need

  • Python 3.10 or newer, with pip
  • Optional: Ollama installed and running if you want to serve local models

Step 1: Install MatrixLLM

Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux) and run:

pip install matrixllm

Step 2: Start the Server

matrixllm start

That's it! You'll see something like this:

╭─────────────────── Gateway Ready ───────────────────╮
│                                                      │
│ ✅ MatrixLLM is Online                              │
│                                                      │
│ Model:        deepseek-r1                           │
│ Local API:    http://localhost:11435/v1             │
│ Health:       http://localhost:11435/health         │
│ Key:          sk-matrixllm-xY9kL2mN8pQ4rT6v        │
│                                                      │
│ Ollabridge compatible:                              │
│   OLLAS_BASE_URL=http://localhost:11435/v1          │
│   OLLAS_API_KEY=sk-matrixllm-xY9kL2mN8pQ4rT6v      │
│   OLLAS_MODEL=deepseek-r1                           │
│                                                      │
╰──────────────────────────────────────────────────────╯

Important: Copy the API key shown (starts with sk-matrixllm-). You'll need it!

Step 3: Use It in Your Code

from openai import OpenAI

# Connect to your local MatrixLLM server
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-matrixllm-YOUR-KEY-HERE"  # Use the key from Step 2
)

# Send a message to the AI
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello! What can you do?"}]
)

# Print the AI's response
print(response.choices[0].message.content)

Step 4: Test with curl (Optional)

You can also test from the command line:

curl http://localhost:11435/v1/chat/completions \
  -H "Authorization: Bearer sk-matrixllm-YOUR-KEY-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

How It Works

API Keys Explained

When you run matrixllm start, it automatically generates a secure API key:

sk-matrixllm-xY9kL2mN8pQ4rT6vW1zA...

What does this mean?

  • sk- = "secret key" (standard prefix)
  • matrixllm- = identifies this as a MatrixLLM key
  • xY9kL2mN... = random secure characters

You can set your own key by creating a .env file:

API_KEYS=my-custom-api-key

Or use multiple keys (comma-separated):

API_KEYS=key-for-app-1,key-for-app-2,key-for-testing
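
Each key in the list is accepted independently, so separate apps can authenticate against the same server with their own key. A minimal sketch using the placeholder keys above:

from openai import OpenAI

BASE_URL = "http://localhost:11435/v1"

# Two applications, each using its own entry from the comma-separated API_KEYS list.
app1_client = OpenAI(base_url=BASE_URL, api_key="key-for-app-1")
app2_client = OpenAI(base_url=BASE_URL, api_key="key-for-app-2")

# Both clients talk to the same MatrixLLM server on port 11435.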

Authentication Methods

MatrixLLM accepts API keys in two ways:

# Method 1: Authorization header (recommended)
headers = {"Authorization": "Bearer sk-matrixllm-xxx"}

# Method 2: X-API-Key header
headers = {"X-API-Key": "sk-matrixllm-xxx"}

Both work identically. The OpenAI SDK uses Method 1 automatically.
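
For example, you can verify both methods against the /v1/models endpoint with the requests library (assumes the server from Step 2 is running; replace the key with your own):

import requests

BASE = "http://localhost:11435/v1"
KEY = "sk-matrixllm-YOUR-KEY-HERE"

# Method 1: OpenAI-style Authorization header
r1 = requests.get(f"{BASE}/models", headers={"Authorization": f"Bearer {KEY}"})

# Method 2: X-API-Key header
r2 = requests.get(f"{BASE}/models", headers={"X-API-Key": KEY})

print(r1.status_code, r2.status_code)  # both should be 200 with a valid key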


OllaBridge Compatibility

MatrixLLM is fully compatible with OllaBridge. Both projects share the same API interface, making it easy to switch between them or migrate your applications.

When to Use Each

Feature               OllaBridge                MatrixLLM
Use Case              Simple local-only proxy   Multi-provider enterprise router
Ollama Support        Local only                Local + distributed nodes
Cloud Providers       No                        OpenAI, Anthropic, Google, IBM
Distributed Compute   No                        Yes (relay nodes)
Complexity            Minimal                   Full-featured

Choose OllaBridge when:

  • You only need local Ollama models
  • You want a lightweight, simple setup
  • You don't need cloud provider integration

Choose MatrixLLM when:

  • You need multi-provider routing (OpenAI, Anthropic, Google, IBM)
  • You want distributed compute across multiple machines
  • You need enterprise features like load balancing and failover

Shared Configuration

Both projects use the same:

  • Port: 11435
  • API structure: /v1/chat/completions, /v1/embeddings, /v1/models
  • Environment variables: API_KEYS, OLLAMA_BASE_URL, DEFAULT_MODEL

Using OLLAS_* Environment Variables

MatrixLLM supports OllaBridge-style environment variables for seamless migration:

Variable         Description                                Example
OLLAS_API_KEY    API key (alias for API_KEYS)               sk-matrixllm-xxx
OLLAS_BASE_URL   Server URL                                 http://localhost:11435/v1
OLLAS_MODEL      Default model (alias for DEFAULT_MODEL)    deepseek-r1

Example .env file:

# OllaBridge-compatible configuration
OLLAS_API_KEY=sk-matrixllm-your-key-here
OLLAS_BASE_URL=http://localhost:11435/v1
OLLAS_MODEL=deepseek-r1

Example Python code:

import os
from openai import OpenAI

# Works with both MatrixLLM and OllaBridge
client = OpenAI(
    base_url=os.getenv("OLLAS_BASE_URL", "http://localhost:11435/v1"),
    api_key=os.getenv("OLLAS_API_KEY", "your-key-here"),
)

response = client.chat.completions.create(
    model=os.getenv("OLLAS_MODEL", "deepseek-r1"),
    messages=[{"role": "user", "content": "Hello!"}]
)

Migration Path

Switch between OllaBridge and MatrixLLM without changing your application code:

# Start with OllaBridge (simple local setup)
pip install ollabridge
ollabridge start

# Upgrade to MatrixLLM (when you need more features)
pip install matrixllm
matrixllm start

Your application code stays exactly the same!


Multi-Provider Routing

MatrixLLM can route requests to different AI providers based on the model name.

Supported Providers

Provider        Model Prefix   Example
Local Ollama    (no prefix)    deepseek-r1, llama3
OpenAI          openai/        openai/gpt-4o-mini
Anthropic       anthropic/     anthropic/claude-3-5-sonnet-latest
Google Gemini   google/        google/gemini-1.5-pro
IBM watsonx     ibm/           ibm/granite-3-8b-instruct
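
Conceptually, the router inspects the model name's prefix to choose a provider. A rough sketch of that rule (illustrative only; the exact prefix handling and stripping are assumptions, not MatrixLLM's actual code):

def route(model: str) -> tuple[str, str]:
    """Map an incoming model name to (provider, upstream_model_name)."""
    prefixes = {
        "openai/": "openai",
        "anthropic/": "anthropic",
        "google/": "google",
        "ibm/": "watsonx",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider, model[len(prefix):]
    return "ollama", model  # no prefix -> local Ollama

print(route("openai/gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
print(route("deepseek-r1"))         # ('ollama', 'deepseek-r1')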

Multi-Provider Setup

Create a .env file with your API keys:

# Your MatrixLLM API key
API_KEYS=my-secure-key

# OpenAI (optional)
OPENAI_COMPAT_BASE_URL=https://api.openai.com/v1
OPENAI_COMPAT_API_KEY=sk-...

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini (optional)
GEMINI_API_KEY=AIza...

# IBM watsonx (optional)
WATSONX_BASE_URL=https://us-south.ml.cloud.ibm.com
WATSONX_API_KEY=...
WATSONX_PROJECT_ID=...

Use Different Providers

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="my-secure-key"
)

# Use local Ollama (default)
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use OpenAI
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use Anthropic Claude
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Use Google Gemini
response = client.chat.completions.create(
    model="google/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)

Distributed Compute (Relay Nodes)

Add GPUs from anywhere without port forwarding. Nodes dial out to your gateway.

On Your Gateway (Control Plane)

matrixllm start
# Note the enrollment token shown

On Remote GPU/Machine (Node)

pip install matrixllm

matrixllm-node join \
  --control http://YOUR_GATEWAY_IP:11435 \
  --token YOUR_ENROLLMENT_TOKEN

Use Cases

  • Gaming PC at home: Join your gateway from anywhere
  • Free Colab/Kaggle GPUs: No port forwarding needed
  • Cloud instances: Auto load balancing across nodes
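
Once nodes have joined, your client code does not change: applications keep calling the gateway's OpenAI-compatible endpoint, and the gateway spreads requests across enrolled nodes. A minimal sketch (gateway address and key are placeholders):

from openai import OpenAI

# Point the client at the gateway, not at an individual node.
client = OpenAI(
    base_url="http://YOUR_GATEWAY_IP:11435/v1",
    api_key="sk-matrixllm-YOUR-KEY-HERE",
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello from a relayed request!"}],
)
print(response.choices[0].message.content)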

API Reference

Endpoints

Endpoint               Method   Auth Required   Description
/health                GET      No              Check if server is running
/v1/models             GET      Yes             List available models
/v1/chat/completions   POST     Yes             Generate chat responses
/v1/embeddings         POST     Yes             Generate text embeddings

Quick Examples

# Check health (no auth needed)
curl http://localhost:11435/health

# List models
curl -H "Authorization: Bearer YOUR-KEY" \
  http://localhost:11435/v1/models

# Chat completion
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Authorization: Bearer YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1", "messages": [{"role": "user", "content": "Hi!"}]}'

CLI Commands

# Start the server
matrixllm start

# Start with options
matrixllm start --port 8080 --model llama3

# Show LAN URLs (for other devices on your network)
matrixllm start --lan

# Create public URL (via ngrok)
matrixllm start --share

# Check system health
matrixllm doctor

# List available models
matrixllm models --api-key YOUR-KEY

# Test chat
matrixllm test-chat "Hello!" --api-key YOUR-KEY

Configuration Reference

All Environment Variables

# === Server ===
PORT=11435                    # Server port
HOST=0.0.0.0                  # Bind address
CORS_ORIGINS=http://localhost:5173,http://localhost:3000

# === Authentication ===
API_KEYS=dev-key-change-me    # Comma-separated API keys

# === Rate Limiting ===
RATE_LIMIT=60/minute          # Requests per minute

# === Local Ollama ===
OLLAMA_BASE_URL=http://localhost:11434
DEFAULT_MODEL=deepseek-r1
DEFAULT_EMBED_MODEL=nomic-embed-text

# === Routing ===
ROUTING_MODE=prefix           # prefix | fallback

# === Multi-Provider (Optional) ===
OPENAI_COMPAT_BASE_URL=https://api.openai.com/v1
OPENAI_COMPAT_API_KEY=

ANTHROPIC_API_KEY=

GEMINI_API_KEY=

WATSONX_BASE_URL=https://us-south.ml.cloud.ibm.com
WATSONX_API_KEY=
WATSONX_PROJECT_ID=

# === Relay Fabric ===
RELAY_ENABLED=true
ENROLLMENT_SECRET=dev-enroll-change-me
LOCAL_RUNTIME_ENABLED=true

# === OllaBridge Compatibility ===
# OLLAS_API_KEY=              # Alias for API_KEYS
# OLLAS_BASE_URL=             # Client base URL
# OLLAS_MODEL=                # Alias for DEFAULT_MODEL

Troubleshooting

"Connection refused" error

Make sure the server is running:

matrixllm start
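
You can also confirm the gateway is reachable from Python; the /health endpoint needs no API key:

import requests

try:
    r = requests.get("http://localhost:11435/health", timeout=5)
    print("Server reachable:", r.status_code)
except requests.ConnectionError:
    print("Connection refused - start the server with `matrixllm start`")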

"Invalid API key" error

Check that you're using the correct key:

# The key is shown when you start the server
matrixllm start
# Look for: Key: sk-matrixllm-xxxxx

"Model not found" error

  1. Check available models:

    curl -H "Authorization: Bearer YOUR-KEY" http://localhost:11435/v1/models
    
  2. For local models, make sure Ollama is running:

    ollama list
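
The model check from step 1 also works from Python with the OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-matrixllm-YOUR-KEY-HERE",
)

# Print every model the gateway can currently serve.
for model in client.models.list():
    print(model.id)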
    

Server won't start

Check if another service is using port 11435:

# Use a different port
matrixllm start --port 8080

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
make test

# Format code
make format

# Type check
make typecheck

License

Apache License 2.0 - see LICENSE



MatrixLLM - Your unified gateway to all LLM providers

Report Bug | Request Feature
