# MoMa Hub

Distributed AI inference on consumer GPUs — Mixture of Models on Ollama

```bash
pip install momahub
```
## What is MoMa Hub?
MoMa Hub is the infrastructure layer for the MoMa (Mixture of Models on Ollama) vision: a federated network where anyone with a gaming GPU can contribute inference capacity to the global AI commons — and route tasks to the right model on the right GPU automatically.
| Inspiration | What they shared | MoMa Hub equivalent |
|---|---|---|
| SETI@home | Idle CPU cycles | Idle GPU cycles |
| Airbnb | Spare bedrooms | Spare VRAM |
| Docker Hub | Container images | Ollama runtime configs |
| GitHub | Source code | SPL prompt scripts |
A GTX 1080 Ti (11 GB VRAM, ~$150 used) runs quantized 7B models at interactive speeds. Millions of them sit idle in gaming PCs. MoMa Hub organises them.
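A back-of-envelope check of why an 11 GB card is enough (this sketch assumes 4-bit quantization and ignores KV-cache and runtime overhead, which add a few more GB):

```python
# Rough weight-memory estimate for a 7B-parameter model at 4-bit quantization.
params = 7e9                 # 7B parameters
bytes_per_param = 0.5        # 4 bits = 0.5 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)            # 3.5 GB of weights, well under 11 GB of VRAM
```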
## Quick Start

```bash
# 1. Install
pip install momahub

# 2. Start the hub server
momahub serve --port 8765

# 3. Register your Ollama node (in another terminal)
momahub register --node-id home-gpu-0 \
    --url http://localhost:11434 \
    --gpu "GTX 1080 Ti" \
    --vram 11 \
    --models qwen2.5:7b --models mistral:7b

# 4. Check nodes
momahub nodes

# 5. Run inference through the hub
momahub infer --model qwen2.5:7b --prompt "Explain attention mechanisms"
```
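The repeated `--models` flag in step 3 suggests an append-style option: each occurrence adds one model to the node's capability list. A minimal `argparse` sketch of that pattern (the parser here is illustrative, not MoMa Hub's actual CLI code):

```python
import argparse

# Each --models flag appends to a list instead of overwriting the value.
parser = argparse.ArgumentParser(prog="momahub-register-demo")
parser.add_argument("--node-id", required=True)
parser.add_argument("--models", action="append", default=[])

args = parser.parse_args(
    ["--node-id", "home-gpu-0", "--models", "qwen2.5:7b", "--models", "mistral:7b"]
)
print(args.models)  # ['qwen2.5:7b', 'mistral:7b']
```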
## Python SDK

```python
import asyncio

from momahub import MoMaHub, NodeInfo, InferenceRequest

hub = MoMaHub()

# Register a local Ollama node with the hub
hub.register(NodeInfo(
    node_id="home-gpu-0",
    url="http://localhost:11434",
    gpu_model="GTX 1080 Ti",
    models=["qwen2.5:7b"],
))

# Route an inference request through the hub
resp = asyncio.run(hub.infer(InferenceRequest(
    model="qwen2.5:7b",
    prompt="Hello from MoMa Hub!",
)))
print(resp.content)
```
## Architecture

```
Consumer GPUs (GTX 1080 Ti × N)
  │  Ollama serve (one per GPU)
  │
  ▼
MoMa Hub ── FastAPI registry + round-robin router
  │
  ▼
Client CLI / Python SDK / SPL scripts
```
See docs/DESIGN.md for the full roadmap and hardware reference.
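The round-robin router in the diagram can be sketched in a few lines: filter registered nodes to those serving the requested model, then hand out candidates in rotating order. Class and method names here are illustrative, not MoMa Hub's actual internals:

```python
# Minimal sketch of round-robin routing over registered nodes.
class RoundRobinRouter:
    def __init__(self):
        self.nodes = []   # list of (node_id, set_of_models)
        self._next = 0

    def register(self, node_id, models):
        self.nodes.append((node_id, set(models)))

    def route(self, model):
        # Keep only nodes that actually serve the requested model,
        # then rotate through the candidates on each call.
        candidates = [nid for nid, served in self.nodes if model in served]
        if not candidates:
            raise LookupError(f"no registered node serves {model!r}")
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice

router = RoundRobinRouter()
router.register("home-gpu-0", ["qwen2.5:7b"])
router.register("home-gpu-1", ["qwen2.5:7b", "mistral:7b"])
print(router.route("qwen2.5:7b"))  # home-gpu-0
print(router.route("qwen2.5:7b"))  # home-gpu-1
```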
## Roadmap
| Version | Milestone |
|---|---|
| v0.1 (now) | Local MVP: register nodes, round-robin routing, CLI + SDK |
| v0.2 | Persistent registry, capability-aware routing, heartbeat daemon |
| v0.3 | LAN mDNS discovery, SPL integration (USING HUB momahub://...) |
| v0.4 | Internet federation, auth, public hub registry |
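The v0.2 milestone mentions capability-aware routing. One plausible shape for it: only consider nodes whose VRAM fits the model, then take the tightest fit so large cards stay free for large models. Everything below (field names, the VRAM figures) is a hypothetical sketch, not the planned implementation:

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    vram_gb: float
    models: tuple

# Rough 4-bit weight footprints (illustrative figures)
MODEL_VRAM_GB = {"qwen2.5:7b": 5.5, "mistral:7b": 5.0}

def pick_node(nodes, model):
    """Return the node with the least VRAM that still fits the model."""
    need = MODEL_VRAM_GB.get(model, float("inf"))
    fits = [n for n in nodes if model in n.models and n.vram_gb >= need]
    return min(fits, key=lambda n: n.vram_gb) if fits else None

nodes = [
    Node("home-gpu-0", 11.0, ("qwen2.5:7b", "mistral:7b")),
    Node("big-rig-0", 24.0, ("qwen2.5:7b",)),
]
print(pick_node(nodes, "mistral:7b").node_id)  # home-gpu-0
```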
## Contributing

MoMa Hub is planned as open source (Apache 2.0). Once the MVP is proven on 4× GTX 1080 Ti at home, the repo will go public. Watch the GitHub repository for updates:
https://github.com/digital-duck/momahub
## License
Apache 2.0
## File details

Details for the file `momahub-0.1.0.tar.gz`.

- Download URL: momahub-0.1.0.tar.gz
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ad8b6e3a44fa7ef82e9a7e219ad92b6f821d779fd7a29e7abc8edfeffad6a6b1` |
| MD5 | `ce0f93124f0b6149b1bd41c72b9c7086` |
| BLAKE2b-256 | `ab568815a1c4d68656231982ecc544bafca3a9bb6f6a9fa4e09ca1c43dbc4f47` |
Details for the file `momahub-0.1.0-py3-none-any.whl`.

- Download URL: momahub-0.1.0-py3-none-any.whl
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | `cc396487f56cd789ceeb747836181bcf7c4c7d7bd38dadc11d7504766348317b` |
| MD5 | `733e3543ddcae183a696f215eaeb90ac` |
| BLAKE2b-256 | `237632d7cee7772e0dbd20ccced4de9206dbbb3de40da4044b93c91156d71096` |