Turn your PC into a private, OpenAI-compatible LLM provider in ~60 seconds
OllaBridge ⚡️
Your single gateway to ALL your LLMs — local, remote, anywhere.
🎯 What is OllaBridge?
One gateway. All your LLMs. Everywhere.
OllaBridge is your single, OpenAI-compatible API for every LLM you run — on your laptop, workstation, free GPU servers, cloud instances, anywhere.
The Problem: You have models running everywhere (laptop, cloud GPU, friend's gaming PC), and every app needs different configs.
OllaBridge Solution: Apps connect to ONE place. OllaBridge routes to the right compute automatically.
graph TB
A[Your Apps] -->|OpenAI SDK| B[OllaBridge<br/>Control Plane]
B -->|Auto Routes| C[Local Laptop<br/>llama3.1]
B -->|Auto Routes| D[Free GPU Cloud<br/>deepseek-r1]
B -->|Auto Routes| E[Remote Workstation<br/>mixtral]
C -.->|Dials Out| B
D -.->|Dials Out| B
E -.->|Dials Out| B
style B fill:#6366f1,stroke:#4f46e5,stroke-width:3px,color:#fff
style A fill:#10b981,stroke:#059669,stroke-width:2px,color:#fff
style C fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
style D fill:#ec4899,stroke:#db2777,stroke-width:2px,color:#fff
style E fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
Key Innovation: Compute nodes dial out to your gateway. No port forwarding, no VPN, no config hell.
🚀 Why OllaBridge?
🎯 Single Source of Truth
- ✅ One URL for everything — Your apps never change code
- ✅ Zero config — Add new GPUs without touching your app
- ✅ Smart routing — OllaBridge picks the best node automatically
- ✅ OpenAI compatible — Works with any SDK, framework, or tool
🛡️ Enterprise-Grade Security
- ✅ API key authentication — Protect your LLMs
- ✅ Rate limiting — Control usage per key
- ✅ Request logging — Full audit trail
- ✅ Encrypted connections — TLS for remote nodes
🌍 Works Everywhere
- ✅ Free GPU clouds — Colab, Kaggle, Lightning AI (no port forwarding needed!)
- ✅ Ephemeral instances — Nodes dial out, IPs don't matter
- ✅ Behind firewalls — Your laptop can join from coffee shop WiFi
- ✅ Mixed environments — Combine local + cloud seamlessly
🤖 AI Agent Ready
- ✅ MCP server — Agents can control your infrastructure
- ✅ Tool exposure — Manage nodes, routes, health via tools
- ✅ Self-healing — Auto-install, auto-configure, auto-recover
⚡ 60-Second Start
Step 1: Install
pip install ollabridge
Step 2: Start Your Gateway
ollabridge start
That's it! You'll see:
✅ Ollama installed (if needed)
✅ Model downloaded (if needed)
✅ Gateway online at http://localhost:11435
╭─────────────────── 🚀 Gateway Ready ────────────────────╮
│ │
│ ✅ OllaBridge is Online │
│ │
│ Model: deepseek-r1 │
│ Local API: http://localhost:11435/v1 │
│ Key: sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA │
│ │
│ Node join token: eyJ0eXAi... │
│ Example node command: │
│ ollabridge-node join --control http://localhost:11435 │
│ --token eyJ0eXAi... │
│ │
╰──────────────────────────────────────────────────────────╯
Step 3: Use It!
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11435/v1",
api_key="sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA"
)
response = client.chat.completions.create(
model="deepseek-r1",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Done! You're running private LLMs with the OpenAI API.
🌍 Add Any GPU in 60 Seconds
Have a free GPU on Colab? A remote workstation? Add it instantly:
On Your Remote GPU/Machine:
# Install
pip install ollabridge
# Join your gateway (copy the command from gateway startup)
ollabridge-node join \
--control http://YOUR_GATEWAY_IP:11435 \
--token eyJ0eXAi...
That's it! The remote GPU:
- ✅ Auto-installs Ollama if needed
- ✅ Auto-downloads models if needed
- ✅ Dials out to your gateway (no port forwarding!)
- ✅ Shows up as available compute
Your Apps See It Automatically
# Same code, now uses both local + remote GPU!
client = OpenAI(base_url="http://localhost:11435/v1", ...)
response = client.chat.completions.create(...) # Auto-routed
OllaBridge routes requests across all your nodes automatically.
🎯 Real-World Scenarios
Scenario 1: "I have a gaming PC at home"
# On your gaming PC:
ollabridge-node join --control https://your-gateway.com --token ...
# Now your laptop can use your gaming PC's GPU
# Even if you're at a coffee shop!
Scenario 2: "I want to use free Colab GPUs"
# In Colab notebook:
!pip install ollabridge
!ollabridge-node join --control https://your-gateway.com --token ...
# Now your production app can use free Colab compute
# Colab session ends? Start a new one. Zero config changes.
Scenario 3: "I have multiple cloud GPUs"
# Each GPU instance:
ollabridge-node join --control https://gateway.company.com --token ...
# Your team shares one API URL
# OllaBridge load-balances across all GPUs
💻 Use It Anywhere
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11435/v1",
api_key="your-key-here"
)
# Chat
response = client.chat.completions.create(
model="deepseek-r1",
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# Embeddings
embeddings = client.embeddings.create(
model="nomic-embed-text",
input="Hello, world!"
)
Node.js / TypeScript
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:11435/v1",
apiKey: process.env.OLLABRIDGE_KEY
});
const completion = await client.chat.completions.create({
model: "deepseek-r1",
messages: [{ role: "user", content: "Hello!" }]
});
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="http://localhost:11435/v1",
api_key="your-key-here",
model="deepseek-r1"
)
response = llm.invoke("What is the meaning of life?")
cURL
curl -X POST http://localhost:11435/v1/chat/completions \
-H "Authorization: Bearer your-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Works with ANY OpenAI-compatible tool or library.
🤖 AI Agents Love OllaBridge
OllaBridge has a Model Context Protocol (MCP) server built-in.
Agents can:
- ✅ Create enrollment tokens
- ✅ List connected compute nodes
- ✅ Check gateway health
- ✅ Manage your LLM infrastructure via tools
Start MCP Server
ollabridge-mcp
Example: Agent Workflow
# Agent can call these tools:
await session.call_tool("ollabridge.enroll.create", {})
# → Returns enrollment token
await session.call_tool("ollabridge.runtimes.list", {})
# → Shows all connected nodes
await session.call_tool("ollabridge.gateway.health", {})
# → Checks gateway status
Use Case: "Hey Claude, add my workstation's GPU to our LLM gateway"
→ Agent creates token, gives you the command, you run it. Done.
🔐 Security & Configuration
Authentication
OllaBridge auto-generates a secure API key on first run (saved in .env):
API_KEYS=sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA
Use it in your apps:
# Option 1: Bearer token
headers = {"Authorization": "Bearer sk-ollabridge-..."}
# Option 2: Custom header
headers = {"X-API-Key": "sk-ollabridge-..."}
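The two header styles above can be wrapped in a small helper. This is a minimal sketch using only the standard library; the names `auth_headers` and `build_request` are hypothetical and not part of OllaBridge. It only constructs the request and does not send it.

```python
import urllib.request


def auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return one of the two header styles described above."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def build_request(base_url: str, path: str, api_key: str,
                  use_bearer: bool = True) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for the gateway."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers=auth_headers(api_key, use_bearer))


if __name__ == "__main__":
    req = build_request("http://localhost:11435", "/admin/runtimes", "your-key-here")
    print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running gateway.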
Configuration (.env)
# API Keys (comma-separated for multiple)
API_KEYS=sk-ollabridge-abc123,sk-ollabridge-def456
# Server
HOST=0.0.0.0
PORT=11435
# Default models
DEFAULT_MODEL=deepseek-r1
DEFAULT_EMBED_MODEL=nomic-embed-text
# Rate limiting
RATE_LIMIT=60/minute
# Security
ENROLLMENT_SECRET=your-secret-here
ENROLLMENT_TTL_SECONDS=3600
# Database (optional)
DATABASE_URL=postgresql://user:pass@localhost/ollabridge
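To make the value formats above concrete, here is a rough sketch of how such a file could be read: comma-separated `API_KEYS` and a `count/period` rate limit. The function names are hypothetical, OllaBridge's own loader may behave differently, and the set of accepted period names is an assumption (only `minute` appears in this document).

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def parse_api_keys(value: str) -> list:
    """Split a comma-separated API_KEYS value into individual keys."""
    return [key.strip() for key in value.split(",") if key.strip()]


# Assumed period names; only "minute" is confirmed by the example above.
_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(value: str) -> tuple:
    """Turn '60/minute' into (60, 60): max requests and window in seconds."""
    count, _, period = value.partition("/")
    return int(count), _PERIOD_SECONDS[period.strip().lower()]
```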
Enrollment Tokens
Create short-lived tokens for nodes to join:
ollabridge enroll-create --ttl 3600
Tokens expire automatically for security.
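The idea of a short-lived, secret-signed token can be illustrated with the standard library alone. The sketch below is not OllaBridge's actual token format; it assumes a simple HMAC-SHA256 signature over an expiry timestamp, and `create_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _sign(secret: str, body: str) -> str:
    return hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()


def create_token(secret: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return '<base64 payload>.<hex signature>' that expires after ttl_seconds."""
    issued = time.time() if now is None else now
    payload = json.dumps({"exp": issued + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    return f"{body}.{_sign(secret, body)}"


def verify_token(secret: str, token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    current = time.time() if now is None else now
    body, sep, signature = token.rpartition(".")
    if not sep or not hmac.compare_digest(signature, _sign(secret, body)):
        return False
    try:
        payload = json.loads(base64.urlsafe_b64decode(body.encode()))
    except ValueError:
        return False
    return current < payload["exp"]
```

With a scheme like this, a leaked token stops working once the TTL passes, and changing the secret invalidates every outstanding token.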
📡 API Reference
Core Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Gateway health + node count |
| /v1/chat/completions | POST | OpenAI-compatible chat |
| /v1/embeddings | POST | Generate embeddings |
| /v1/models | GET | List available models (aggregated from nodes) |
Admin Endpoints (require API key)
| Endpoint | Method | Description |
|---|---|---|
| /admin/recent | GET | Recent request logs |
| /admin/runtimes | GET | List connected nodes |
| /admin/enroll | POST | Create enrollment token |
Example: Check Connected Nodes
curl -H "X-API-Key: your-key" http://localhost:11435/admin/runtimes
Response:
{
"runtimes": [
{
"node_id": "local",
"connector": "local_ollama",
"healthy": true,
"tags": ["local"],
"models": ["deepseek-r1", "llama3.1"]
},
{
"node_id": "colab-gpu-1",
"connector": "relay_link",
"healthy": true,
"tags": ["gpu", "free"],
"models": ["mixtral", "codellama"]
}
]
}
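A response shaped like the one above is easy to turn into a "which healthy node serves which model" index. This is a small sketch; `models_by_node` is a hypothetical helper, and it assumes only the fields shown in the sample response.

```python
def models_by_node(payload: dict) -> dict:
    """Map each model name to the IDs of the healthy nodes that serve it."""
    index = {}
    for runtime in payload.get("runtimes", []):
        if not runtime.get("healthy"):
            continue  # skip nodes the gateway reports as unhealthy
        for model in runtime.get("models", []):
            index.setdefault(model, []).append(runtime["node_id"])
    return index


if __name__ == "__main__":
    sample = {
        "runtimes": [
            {"node_id": "local", "healthy": True, "models": ["deepseek-r1", "llama3.1"]},
            {"node_id": "colab-gpu-1", "healthy": True, "models": ["mixtral", "codellama"]},
        ]
    }
    print(models_by_node(sample))
```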
🏗️ Architecture Deep Dive
How It Works
- Control Plane (Gateway): Your apps connect here
- Nodes: Any machine with GPUs/CPUs running models
- Relay Link: Nodes dial OUT to gateway (WebSocket)
- Router: Picks the best node for each request
Why "Dial Out" Matters
Traditional (broken):
App → Gateway → Try to reach GPU
❌ Blocked by firewall
❌ NAT issues
❌ No public IP
OllaBridge (works everywhere):
App → Gateway ← GPU node dials out to the gateway
✅ Works from anywhere
✅ No port forwarding
✅ Ephemeral IPs OK
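The dial-out pattern can be demonstrated on a single machine. The sketch below uses plain TCP with asyncio and JSON lines rather than WebSockets, and it is not OllaBridge's relay protocol; the message fields (`node_id`, `job`, `text`, `output`) are invented for illustration. The point it shows: the gateway only listens, the node opens the connection, and the job travels back over that node-initiated connection.

```python
import asyncio
import json


async def run_demo() -> dict:
    """Gateway listens, node dials out, and a job flows back over that link."""
    result = asyncio.get_running_loop().create_future()

    async def gateway_handle_node(reader, writer):
        # The node opened this connection, so the gateway never needs
        # a route to the node (no public IP or port forwarding).
        hello = json.loads(await reader.readline())
        job = {"job": "echo", "text": "Hello!"}
        writer.write((json.dumps(job) + "\n").encode())
        await writer.drain()
        reply = json.loads(await reader.readline())
        result.set_result({"node_id": hello["node_id"], "reply": reply["output"]})
        writer.close()
        await writer.wait_closed()

    # Port 0 lets the OS pick a free port for the demo gateway.
    server = await asyncio.start_server(gateway_handle_node, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def node():
        # Outbound connection from the node to the gateway.
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write((json.dumps({"node_id": "demo-node"}) + "\n").encode())
        await writer.drain()
        job = json.loads(await reader.readline())
        # Stand-in for running a model: upper-case the text.
        writer.write((json.dumps({"output": job["text"].upper()}) + "\n").encode())
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    try:
        await asyncio.gather(node(), asyncio.wait_for(result, timeout=5))
    finally:
        server.close()
        await server.wait_closed()
    return result.result()


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```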
Connector Types
- RelayLink: Node dials out via WebSocket (default, works everywhere)
- DirectEndpoint: HTTP to stable node (best performance)
- LocalOllama: Built-in local runtime (zero config)
OllaBridge picks the right one automatically.
📈 Scaling
Add More Workers
ollabridge start --workers 4
Use PostgreSQL
pip install psycopg2-binary
export DATABASE_URL=postgresql://user:pass@localhost/ollabridge
ollabridge start --workers 8
Add More Nodes
# Just keep adding nodes!
ollabridge-node join --control ... --token ...
OllaBridge automatically load-balances across all healthy nodes.
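This document does not specify how the router chooses a node, so the following is only one plausible strategy: round-robin over the healthy nodes that serve the requested model. `Node` and `RoundRobinRouter` are hypothetical names, not OllaBridge classes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    models: list
    healthy: bool = True


class RoundRobinRouter:
    """Rotate requests across healthy nodes that serve the requested model."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._counters = {}  # per-model request counter

    def pick(self, model: str) -> Node:
        candidates = [n for n in self.nodes if n.healthy and model in n.models]
        if not candidates:
            raise LookupError(f"no healthy node serves {model!r}")
        i = self._counters.get(model, 0)
        self._counters[model] = i + 1
        return candidates[i % len(candidates)]


if __name__ == "__main__":
    router = RoundRobinRouter([
        Node("local", ["deepseek-r1", "llama3.1"]),
        Node("colab-gpu-1", ["deepseek-r1", "mixtral"]),
    ])
    print([router.pick("deepseek-r1").node_id for _ in range(3)])
```

Under a scheme like this, adding a node only extends the candidate list, which is why client code would not need to change.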
🌍 Public Access (Optional)
Quick Demo (Ngrok)
ollabridge start --share
Production (Cloudflare Tunnel)
# Terminal 1: Start gateway
ollabridge start
# Terminal 2: Expose it
cloudflared tunnel --url http://localhost:11435
Now your gateway has a public https:// URL!
Security: Always use API keys for public gateways.
🎓 Beginner's Guide
"I've never used LLMs before"
- Install: pip install ollabridge
- Start: ollabridge start
- Copy the API key from the output
- Use this code:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11435/v1",
api_key="PASTE_KEY_HERE"
)
response = client.chat.completions.create(
model="deepseek-r1",
messages=[{"role": "user", "content": "Explain Python in simple terms"}]
)
print(response.choices[0].message.content)
That's it! You're running AI models on your computer.
"I want to add my gaming PC's GPU"
- On your main computer (gateway):
ollabridge start # Copy the "Node join token" and gateway URL
- On your gaming PC:
pip install ollabridge
ollabridge-node join --control http://GATEWAY_IP:11435 --token TOKEN_HERE
- Done! Your apps can now use your gaming PC's power.
"I want to use free Colab GPUs"
- Start your gateway at home:
ollabridge start --share # Note the public URL (https://xxx.ngrok.io)
- In Colab notebook:
!pip install ollabridge
!ollabridge-node join --control https://xxx.ngrok.io --token YOUR_TOKEN
- Now your apps use FREE Colab GPUs!
Pro tip: When Colab disconnects, just restart and run step 2 again. Zero config changes needed.
🛠️ Common Tasks
List Available Models
curl http://localhost:11435/v1/models
Check Gateway Health
curl http://localhost:11435/health
See Connected Nodes
curl -H "X-API-Key: your-key" http://localhost:11435/admin/runtimes
Create New Enrollment Token
ollabridge enroll-create
View Recent Requests
curl -H "X-API-Key: your-key" http://localhost:11435/admin/recent
🗺️ Roadmap
- ✅ Control Plane + Node architecture
- ✅ Outbound-only node enrollment (no port forwarding)
- ✅ MCP server for AI agent control
- ✅ Multi-node load balancing
- 🚧 Tag-based routing (send "coding" requests to GPU nodes)
- 🚧 Model-specific routing rules
- 🚧 Streaming support for chat completions
- 🚧 Web UI for node management
- 🚧 Prometheus metrics
- 🚧 Support for more runtimes (vLLM, llama.cpp, LM Studio)
🤝 Contributing
We welcome contributions! Areas we'd love help:
- 🔌 More runtime adapters (vLLM, llama.cpp, etc.)
- 🎨 Web UI for management
- 📊 Better monitoring/metrics
- 🔒 Security enhancements
- 📖 Documentation improvements
How to contribute:
- Fork the repo
- Create a branch (git checkout -b feature/amazing)
- Make your changes
- Add tests
- Submit a PR
📄 License
MIT License - see LICENSE
🙏 Built With
- FastAPI — Modern async web framework
- Ollama — Run LLMs locally
- WebSockets — Real-time node connections
- SQLModel — Database with Python types
🌟 Star History
If OllaBridge helped you, give it a star! ⭐
Made with ❤️ for the local-first AI community
Stop paying cloud tokens. Use your own compute.