
Turn your PC into a private, OpenAI-compatible LLM provider in ~60 seconds

OllaBridge ⚡️

Your single gateway to ALL your LLMs — local, remote, anywhere.

Python 3.10+ | License: MIT | Code style: black


🎯 What is OllaBridge?

One gateway. All your LLMs. Everywhere.

OllaBridge is your single, OpenAI-compatible API for every LLM you run — on your laptop, workstation, free GPU servers, cloud instances, anywhere.

The Problem: You have models running everywhere (laptop, cloud GPU, friend's gaming PC), and every app needs different configs.

OllaBridge Solution: Apps connect to ONE place. OllaBridge routes to the right compute automatically.

graph TB
    A[Your Apps] -->|OpenAI SDK| B[OllaBridge<br/>Control Plane]

    B -->|Auto Routes| C[Local Laptop<br/>llama3.1]
    B -->|Auto Routes| D[Free GPU Cloud<br/>deepseek-r1]
    B -->|Auto Routes| E[Remote Workstation<br/>mixtral]

    C -.->|Dials Out| B
    D -.->|Dials Out| B
    E -.->|Dials Out| B

    style B fill:#6366f1,stroke:#4f46e5,stroke-width:3px,color:#fff
    style A fill:#10b981,stroke:#059669,stroke-width:2px,color:#fff
    style C fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
    style D fill:#ec4899,stroke:#db2777,stroke-width:2px,color:#fff
    style E fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff

Key Innovation: Compute nodes dial out to your gateway. No port forwarding, no VPN, no config hell.


🚀 Why OllaBridge?

🎯 Single Source of Truth

  • One URL for everything — Your apps never change code
  • Zero config — Add new GPUs without touching your app
  • Smart routing — OllaBridge picks the best node automatically
  • OpenAI compatible — Works with any SDK, framework, or tool

🛡️ Enterprise-Grade Security

  • API key authentication — Protect your LLMs
  • Rate limiting — Control usage per key
  • Request logging — Full audit trail
  • Encrypted connections — TLS for remote nodes

🌍 Works Everywhere

  • Free GPU clouds — Colab, Kaggle, Lightning AI (no port forwarding needed!)
  • Ephemeral instances — Nodes dial out, IPs don't matter
  • Behind firewalls — Your laptop can join from coffee shop WiFi
  • Mixed environments — Combine local + cloud seamlessly

🤖 AI Agent Ready

  • MCP server — Agents can control your infrastructure
  • Tool exposure — Manage nodes, routes, health via tools
  • Self-healing — Auto-install, auto-configure, auto-recover

⚡ 60-Second Start

Step 1: Install

pip install ollabridge

Step 2: Start Your Gateway

ollabridge start

That's it! You'll see:

✅ Ollama installed (if needed)
✅ Model downloaded (if needed)
✅ Gateway online at http://localhost:11435

╭─────────────────── 🚀 Gateway Ready ────────────────────╮
│                                                          │
│ ✅ OllaBridge is Online                                  │
│                                                          │
│ Model:        deepseek-r1                                │
│ Local API:    http://localhost:11435/v1                 │
│ Key:          sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA        │
│                                                          │
│ Node join token:  eyJ0eXAi...                           │
│ Example node command:                                    │
│   ollabridge-node join --control http://localhost:11435 │
│                        --token eyJ0eXAi...              │
│                                                          │
╰──────────────────────────────────────────────────────────╯

Step 3: Use It!

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA"
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Done! You're running private LLMs behind an OpenAI-compatible API.


🌍 Add Any GPU in 60 Seconds

Have a free GPU on Colab? A remote workstation? Add it instantly:

On Your Remote GPU/Machine:

# Install
pip install ollabridge

# Join your gateway (copy the command from gateway startup)
ollabridge-node join \
  --control http://YOUR_GATEWAY_IP:11435 \
  --token eyJ0eXAi...

That's it! The remote GPU:

  • ✅ Auto-installs Ollama if needed
  • ✅ Auto-downloads models if needed
  • ✅ Dials out to your gateway (no port forwarding!)
  • ✅ Shows up as available compute

Your Apps See It Automatically

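# Same code as before; the gateway can now route to local or remote nodes.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11435/v1", api_key="your-key-here")
response = client.chat.completions.create(  # auto-routed by the gateway
    model="deepseek-r1", messages=[{"role": "user", "content": "Hello!"}])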

OllaBridge routes requests across all your nodes automatically.


🎯 Real-World Scenarios

Scenario 1: "I have a gaming PC at home"

# On your gaming PC:
ollabridge-node join --control https://your-gateway.com --token ...

# Now your laptop can use your gaming PC's GPU
# Even if you're at a coffee shop!

Scenario 2: "I want to use free Colab GPUs"

# In Colab notebook:
!pip install ollabridge
!ollabridge-node join --control https://your-gateway.com --token ...

# Now your production app can use free Colab compute
# Colab session ends? Start a new one. Zero config changes.

Scenario 3: "I have multiple cloud GPUs"

# Each GPU instance:
ollabridge-node join --control https://gateway.company.com --token ...

# Your team shares one API URL
# OllaBridge load-balances across all GPUs

💻 Use It Anywhere

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="your-key-here"
)

# Chat
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Embeddings
embeddings = client.embeddings.create(
    model="nomic-embed-text",
    input="Hello, world!"
)

Node.js / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11435/v1",
  apiKey: process.env.OLLABRIDGE_KEY
});

const completion = await client.chat.completions.create({
  model: "deepseek-r1",
  messages: [{ role: "user", content: "Hello!" }]
});

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11435/v1",
    api_key="your-key-here",
    model="deepseek-r1"
)

response = llm.invoke("What is the meaning of life?")

cURL

curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Works with any OpenAI-compatible tool or library that lets you set a custom base URL. Note that streaming chat completions are still on the roadmap.


🤖 AI Agents Love OllaBridge

OllaBridge has a Model Context Protocol (MCP) server built-in.

Agents can:

  • ✅ Create enrollment tokens
  • ✅ List connected compute nodes
  • ✅ Check gateway health
  • ✅ Manage your LLM infrastructure via tools

Start MCP Server

ollabridge-mcp

Example: Agent Workflow

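# Agent-side sketch using the MCP Python SDK (pip install mcp).
# Assumes ollabridge-mcp speaks MCP over stdio; adjust if it uses another transport.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(command="ollabridge-mcp")
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Agent can call these tools:
            await session.call_tool("ollabridge.enroll.create", {})  # → Returns enrollment token
            await session.call_tool("ollabridge.runtimes.list", {})  # → Shows all connected nodes
            await session.call_tool("ollabridge.gateway.health", {})  # → Checks gateway status

asyncio.run(main())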

Use Case: "Hey Claude, add my workstation's GPU to our LLM gateway"

→ Agent creates token, gives you the command, you run it. Done.


🔐 Security & Configuration

Authentication

OllaBridge auto-generates a secure API key on first run (saved in .env):

API_KEYS=sk-ollabridge-xY9kL2mN8pQ4rT6vW1zA

Use it in your apps:

# Option 1: Bearer token
headers = {"Authorization": "Bearer sk-ollabridge-..."}

# Option 2: Custom header
headers = {"X-API-Key": "sk-ollabridge-..."}
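If you are not using an SDK, the two header styles above can be attached to a plain HTTP request. The following is a minimal standard-library sketch; `build_request` and its `style` parameter are hypothetical names introduced for illustration, not part of OllaBridge.

```python
import urllib.request


def build_request(url: str, api_key: str, style: str = "bearer") -> urllib.request.Request:
    """Build a GET request carrying the API key in one of the two header styles."""
    if style == "bearer":
        # Option 1: standard Authorization header
        headers = {"Authorization": f"Bearer {api_key}"}
    elif style == "x-api-key":
        # Option 2: custom X-API-Key header
        headers = {"X-API-Key": api_key}
    else:
        raise ValueError(f"unknown header style: {style!r}")
    return urllib.request.Request(url, headers=headers)
```

Pass the returned object to `urllib.request.urlopen` to send it, for example against `http://localhost:11435/admin/runtimes`.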

Configuration (.env)

# API Keys (comma-separated for multiple)
API_KEYS=sk-ollabridge-abc123,sk-ollabridge-def456

# Server
HOST=0.0.0.0
PORT=11435

# Default models
DEFAULT_MODEL=deepseek-r1
DEFAULT_EMBED_MODEL=nomic-embed-text

# Rate limiting
RATE_LIMIT=60/minute

# Security
ENROLLMENT_SECRET=your-secret-here
ENROLLMENT_TTL_SECONDS=3600

# Database (optional)
DATABASE_URL=postgresql://user:pass@localhost/ollabridge
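To make the value formats above concrete, here is a small sketch that reads a `.env`-style text, splits the comma-separated `API_KEYS`, and breaks `RATE_LIMIT` into a count and a period. It only illustrates the formats shown in this README; OllaBridge's own settings loader may behave differently, and the function names are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def parse_api_keys(value: str) -> list[str]:
    """Split a comma-separated API_KEYS value into individual keys."""
    return [key.strip() for key in value.split(",") if key.strip()]


def parse_rate_limit(value: str) -> tuple[int, str]:
    """Turn a value like '60/minute' into (60, 'minute')."""
    count, _, period = value.partition("/")
    return int(count), period.strip()
```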

Enrollment Tokens

Create short-lived tokens for nodes to join:

ollabridge enroll-create --ttl 3600

Tokens expire automatically once their TTL elapses (3600 seconds in the example above), so a leaked token is only useful for a short time.


📡 API Reference

Core Endpoints

Endpoint               Method   Description
/health                GET      Gateway health + node count
/v1/chat/completions   POST     OpenAI-compatible chat
/v1/embeddings         POST     Generate embeddings
/v1/models             GET      List available models (aggregated from nodes)

Admin Endpoints (require API key)

Endpoint          Method   Description
/admin/recent     GET      Recent request logs
/admin/runtimes   GET      List connected nodes
/admin/enroll     POST     Create enrollment token

Example: Check Connected Nodes

curl -H "X-API-Key: your-key" http://localhost:11435/admin/runtimes

Response:

{
  "runtimes": [
    {
      "node_id": "local",
      "connector": "local_ollama",
      "healthy": true,
      "tags": ["local"],
      "models": ["deepseek-r1", "llama3.1"]
    },
    {
      "node_id": "colab-gpu-1",
      "connector": "relay_link",
      "healthy": true,
      "tags": ["gpu", "free"],
      "models": ["mixtral", "codellama"]
    }
  ]
}
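A response in this shape is easy to query from a script. The sketch below, using only the fields shown in the example response, lists the healthy nodes that serve a given model; `healthy_nodes_for` is a hypothetical helper, not an OllaBridge function.

```python
import json

# The example response from above, embedded so the sketch is self-contained.
SAMPLE_RESPONSE = """
{
  "runtimes": [
    {"node_id": "local", "connector": "local_ollama", "healthy": true,
     "tags": ["local"], "models": ["deepseek-r1", "llama3.1"]},
    {"node_id": "colab-gpu-1", "connector": "relay_link", "healthy": true,
     "tags": ["gpu", "free"], "models": ["mixtral", "codellama"]}
  ]
}
"""


def healthy_nodes_for(payload: dict, model: str) -> list[str]:
    """Return the IDs of healthy nodes whose model list contains `model`."""
    return [
        runtime["node_id"]
        for runtime in payload.get("runtimes", [])
        if runtime.get("healthy") and model in runtime.get("models", [])
    ]


payload = json.loads(SAMPLE_RESPONSE)
print(healthy_nodes_for(payload, "mixtral"))
```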

🏗️ Architecture Deep Dive

How It Works

  1. Control Plane (Gateway): Your apps connect here
  2. Nodes: Any machine with GPUs/CPUs running models
  3. Relay Link: Nodes dial OUT to gateway (WebSocket)
  4. Router: Picks the best node for each request

Why "Dial Out" Matters

Traditional (broken):

App → Gateway → Try to reach GPU
                ❌ Blocked by firewall
                ❌ NAT issues
                ❌ No public IP

OllaBridge (works everywhere):

App → Gateway ← GPU dials in
               ✅ Works from anywhere
               ✅ No port forwarding
               ✅ Ephemeral IPs OK
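The dial-out pattern itself can be sketched in a few lines. OllaBridge's relay uses WebSocket; the sketch below swaps in plain TCP streams from the Python standard library purely to show the idea: the node opens the connection, and the gateway sends work back over that same connection. All class and function names here are hypothetical, and the "model" is a stand-in that upper-cases the prompt.

```python
import asyncio


class Gateway:
    """Accepts inbound node connections and sends jobs back over them."""

    def __init__(self) -> None:
        self.nodes: dict[str, tuple[asyncio.StreamReader, asyncio.StreamWriter]] = {}
        self.joined = asyncio.Event()

    async def on_node_connect(self, reader, writer) -> None:
        # The first line from the node is its ID; keep the socket for later jobs.
        node_id = (await reader.readline()).decode().strip()
        self.nodes[node_id] = (reader, writer)
        self.joined.set()

    async def dispatch(self, node_id: str, prompt: str) -> str:
        # Reuse the connection the node opened; the gateway never dials the node.
        reader, writer = self.nodes[node_id]
        writer.write(prompt.encode() + b"\n")
        await writer.drain()
        return (await reader.readline()).decode().strip()


async def run_node(host: str, port: int, node_id: str) -> None:
    # Outbound connection only, so it works from behind NAT or a firewall.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(node_id.encode() + b"\n")
    await writer.drain()
    while line := await reader.readline():
        # Stand-in for running a model: reply with the prompt in upper case.
        writer.write(line.decode().strip().upper().encode() + b"\n")
        await writer.drain()
    writer.close()


async def demo() -> str:
    gateway = Gateway()
    server = await asyncio.start_server(gateway.on_node_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    node = asyncio.create_task(run_node("127.0.0.1", port, "gaming-pc"))
    await gateway.joined.wait()
    reply = await gateway.dispatch("gaming-pc", "hello")
    # Closing the gateway side ends the node loop (readline returns b"").
    gateway_writer = gateway.nodes["gaming-pc"][1]
    gateway_writer.close()
    await gateway_writer.wait_closed()
    await node
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Only the gateway needs a reachable address; the node never listens on a port, which is why ephemeral IPs and NAT are not a problem.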

Connector Types

  • RelayLink: Node dials out via WebSocket (default, works everywhere)
  • DirectEndpoint: HTTP to stable node (best performance)
  • LocalOllama: Built-in local runtime (zero config)

OllaBridge picks the right one automatically.


📈 Scaling

Add More Workers

ollabridge start --workers 4

Use PostgreSQL

pip install psycopg2-binary
export DATABASE_URL=postgresql://user:pass@localhost/ollabridge
ollabridge start --workers 8

Add More Nodes

# Just keep adding nodes!
ollabridge-node join --control ... --token ...

OllaBridge automatically load-balances across all healthy nodes.
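This README does not say which balancing strategy OllaBridge uses. As one plausible illustration, the sketch below rotates round-robin over the healthy nodes that serve the requested model; `Node` and `RoundRobinRouter` are hypothetical names, and round-robin is an assumption rather than the documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    models: set[str]
    healthy: bool = True


@dataclass
class RoundRobinRouter:
    nodes: list[Node]
    # Per-model position in the rotation.
    _cursor: dict[str, int] = field(default_factory=dict)

    def pick(self, model: str) -> Node:
        """Return the next healthy node that serves `model`."""
        candidates = [n for n in self.nodes if n.healthy and model in n.models]
        if not candidates:
            raise LookupError(f"no healthy node serves {model!r}")
        index = self._cursor.get(model, 0) % len(candidates)
        self._cursor[model] = index + 1
        return candidates[index]


router = RoundRobinRouter([
    Node("local", {"deepseek-r1", "llama3.1"}),
    Node("colab-gpu-1", {"deepseek-r1", "mixtral"}),
])
for _ in range(3):
    print(router.pick("deepseek-r1").node_id)
```

Unhealthy nodes are filtered out before each pick, so a node that drops off simply stops receiving requests until it is marked healthy again.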


🌍 Public Access (Optional)

Quick Demo (Ngrok)

ollabridge start --share

Production (Cloudflare Tunnel)

# Terminal 1: Start gateway
ollabridge start

# Terminal 2: Expose it
cloudflared tunnel --url http://localhost:11435

Now your gateway has a public https:// URL!

Security: Always use API keys for public gateways.


🎓 Beginner's Guide

"I've never used LLMs before"

  1. Install: pip install ollabridge
  2. Start: ollabridge start
  3. Copy the API key from the output
  4. Use this code:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="PASTE_KEY_HERE"
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Python in simple terms"}]
)

print(response.choices[0].message.content)

That's it! You're running AI models on your computer.

"I want to add my gaming PC's GPU"

  1. On your main computer (gateway):

    ollabridge start
    # Copy the "Node join token" and gateway URL
    
  2. On your gaming PC:

    pip install ollabridge
    ollabridge-node join --control http://GATEWAY_IP:11435 --token TOKEN_HERE
    
  3. Done! Your apps can now use your gaming PC's power.

"I want to use free Colab GPUs"

  1. Start your gateway at home:

    ollabridge start --share
    # Note the public URL (https://xxx.ngrok.io)
    
  2. In Colab notebook:

    !pip install ollabridge
    !ollabridge-node join --control https://xxx.ngrok.io --token YOUR_TOKEN
    
  3. Now your apps use FREE Colab GPUs!

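Pro tip: When Colab disconnects, just restart and run step 2 again. If the enrollment token has expired in the meantime, create a new one with ollabridge enroll-create. Your apps need no config changes.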


🛠️ Common Tasks

List Available Models

curl http://localhost:11435/v1/models
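To use the model list from a script, you can extract the model IDs from the JSON body. The sketch below assumes the response follows the OpenAI list format (an object with a `data` array of entries that each have an `id`); this README does not show the actual response, so treat the shape and the sample body as assumptions. `model_ids` is a hypothetical helper.

```python
import json

# Assumed response body in the OpenAI "list models" shape.
SAMPLE_BODY = (
    '{"object": "list", "data": ['
    '{"id": "llama3.1", "object": "model"}, '
    '{"id": "deepseek-r1", "object": "model"}]}'
)


def model_ids(body: str) -> list[str]:
    """Return the sorted model IDs found in a /v1/models response body."""
    payload = json.loads(body)
    return sorted(item["id"] for item in payload.get("data", []))


print(model_ids(SAMPLE_BODY))
```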

Check Gateway Health

curl http://localhost:11435/health

See Connected Nodes

curl -H "X-API-Key: your-key" http://localhost:11435/admin/runtimes

Create New Enrollment Token

ollabridge enroll-create

View Recent Requests

curl -H "X-API-Key: your-key" http://localhost:11435/admin/recent

🗺️ Roadmap

  • ✅ Control Plane + Node architecture
  • ✅ Outbound-only node enrollment (no port forwarding)
  • ✅ MCP server for AI agent control
  • ✅ Multi-node load balancing
  • 🚧 Tag-based routing (send "coding" requests to GPU nodes)
  • 🚧 Model-specific routing rules
  • 🚧 Streaming support for chat completions
  • 🚧 Web UI for node management
  • 🚧 Prometheus metrics
  • 🚧 Support for more runtimes (vLLM, llama.cpp, LM Studio)

🤝 Contributing

We welcome contributions! Areas we'd love help:

  • 🔌 More runtime adapters (vLLM, llama.cpp, etc.)
  • 🎨 Web UI for management
  • 📊 Better monitoring/metrics
  • 🔒 Security enhancements
  • 📖 Documentation improvements

How to contribute:

  1. Fork the repo
  2. Create a branch (git checkout -b feature/amazing)
  3. Make your changes
  4. Add tests
  5. Submit a PR

📄 License

MIT License - see LICENSE


🌟 Star History

If OllaBridge helped you, give it a star! ⭐


Made with ❤️ for the local-first AI community

Stop paying cloud tokens. Use your own compute.
