Your Personal AI Cloud — intelligent proxy, router, and cache for LLMs
LLMHosts.com
Your hardware. Real AI infrastructure. From anywhere.
LLMHosts turns your local GPU into production AI infrastructure with intelligent routing, verified caching, and global access. The CLI proxy makes your Ollama/vLLM OpenAI-compatible. The SaaS platform at llmhosts.com provides cost tracking, plan management, and team features.
Two ways to use it:
- Self-hosted CLI — `pip install llmhosts` and run on your own hardware (free, open source)
- SaaS Platform — Sign up at llmhosts.com for cloud cost tracking, API key management, and team features
Licensing
LLMHosts uses an open-core model under the Functional Source License 1.1 (FSL-1.1-Apache-2.0). All components are FSL-licensed; the table below shows the open-core split: components free for personal and non-competing use versus those that compete with our hosted service.
| Component | Intent | Converts to Apache 2.0 |
|---|---|---|
| Local inference proxy & router | Open-core (non-competing use free) | 2028-02-24 |
| CLI tool (`llmhosts`) | Open-core (non-competing use free) | 2028-02-24 |
| Auto-discovery | Open-core (non-competing use free) | 2028-02-24 |
| Cloud tunnel management | Proprietary (competing use restricted) | 2028-02-24 |
| SaaS platform & billing | Proprietary (competing use restricted) | 2028-02-24 |
| Fleet orchestration (Token) | Proprietary (competing use restricted) | 2028-02-24 |
After 2028-02-24, all components convert to Apache 2.0 with no restrictions.
SaaS Platform (llmhosts.com)
Track your AI spending, manage API keys, and get real-time savings projections.
Live at: https://llmhosts.com
Features:
- 📊 Cost tracking across OpenAI, Anthropic, Google AI, AWS Bedrock, Azure
- 🔑 API key management with plan-based limits
- 📈 12-month spending projections with confidence scoring
- 💰 Real-time savings estimates (35% with intelligent caching + routing)
- 🎯 Gamified achievements for cost milestones
- 💳 Stripe-powered billing (Pro $29/mo, Team $99/mo, Enterprise $299/mo)
- 👥 Team management (coming soon)
Quick Start:
- Sign up at llmhosts.com
- Add your first cost entry
- Generate an API key for the CLI proxy
- Connect your self-hosted LLMHost proxy to track usage
Self-Hosted CLI
Run the intelligent proxy on your own hardware.
pip install llmhosts
llmhosts serve
Point any OpenAI-compatible tool at http://localhost:4000/v1. Your tools now use your local GPU. Cost: $0.
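For example, with the official OpenAI Python SDK (the model name below is an assumption; use whatever model your Ollama/vLLM backend actually serves):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LLMHosts proxy.
# Any API key string works in local mode; "llama3.1" is an assumption,
# substitute whichever model your local backend serves.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="anything")

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Hello from my own GPU!"}],
)
print(response.choices[0].message.content)
```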
Why LLMHosts?
- Cloud bills add up — Route Cursor, Claude Code, and Aider to your local GPU instead. Same tools, zero API spend.
- Your hardware, your control — All inference runs on your machine. No data leaves your network unless you choose.
- Works anywhere — `llmhosts tunnel` uses the built-in LLMHosts Relay (zero config) or falls back to Tailscale/Cloudflare if installed. Your home GPU becomes your portable AI.
Competitive Landscape
We are not entering an existing market — we are creating one. The market: personal and small-team AI infrastructure.
| Player | What They Do | Why They Lose |
|---|---|---|
| OpenAI / Anthropic | Cloud API | 100x more expensive for same hardware quality |
| Ollama | Local model runner | No remote access, no routing, no SaaS |
| LM Studio | Local GUI | Nowhere near production-ready |
| LocalAI | Self-hosted API | Technical, no UX, no moat |
| Replicate / Together | Hosted inference | Still cloud cost, no local hardware |
| LLMHosts | Infrastructure layer | The only production-grade local AI platform |
Competitive Moat
Five compounding advantages that deepen with every user:
| Layer | Name | What It Is |
|---|---|---|
| 1 | First-Mover Position | Building this market category before competition arrives |
| 2 | Data Flywheel | Routing telemetry trains better models → better product → more users |
| 3 | Simplicity Moat | Works for gamers, researchers, founders — not just DevOps engineers |
| 4 | Self-Healing Infrastructure | Fixes itself while you sleep. Zero tinkering required. |
| 5 | Ecosystem Lock-In | Token AI, Hardware Atlas, savings history = high switching cost |
Features
| Area | Description |
|---|---|
| Proxy | OpenAI + Anthropic compatible API on port 4000. Drop-in for any client. |
| Router | Three-tier: rules first, then kNN similarity, then ModernBERT classifier. Routes each request to the right model. |
| Cache | Three-tier vCache: exact hash, entity namespace, verified semantic. Cuts repeat calls to zero (sketch below). |
| Tunnel | llmhosts tunnel — built-in LLMHosts Relay (zero config), falls back to Tailscale or Cloudflare. Your GPU on your laptop, anywhere. |
| Dashboard | TUI (terminal) + web UI at /dashboard. Live request flow, cache stats, model health. |
| BYOK | Bring your own cloud keys. Fallback to OpenAI/Anthropic when local models can't handle a request. |
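To make the Cache row above concrete, here is a simplified sketch of the three-tier lookup order. It is purely illustrative: the normalization, key scheme, similarity threshold, and `semantic_index` helper are assumptions, not vCache internals.

```python
import hashlib

def cache_lookup(prompt: str, namespace: str, store: dict, semantic_index):
    """Illustrative three-tier lookup; not the actual vCache internals."""
    # Tier 1: exact hash. Byte-identical prompts hit immediately.
    exact_key = hashlib.sha256(prompt.encode()).hexdigest()
    if exact_key in store:
        return store[exact_key]

    # Tier 2: entity namespace. The prompt is normalized and scoped to a
    # logical entity, so trivial rewordings of the same request still hit.
    normalized = " ".join(prompt.lower().split())
    ns_key = f"{namespace}:{hashlib.sha256(normalized.encode()).hexdigest()}"
    if ns_key in store:
        return store[ns_key]

    # Tier 3: verified semantic. Nearest neighbor is served only above a
    # confidence threshold (0.92 here is an assumption, not vCache's value).
    answer, score = semantic_index.nearest(prompt)  # hypothetical helper
    if answer is not None and score >= 0.92:
        return answer

    return None  # miss: forward the request to the router and a real backend
```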
Quick Start
Install
Three tiers, pick what you need:
pip install llmhosts # Core (~50MB) — proxy, router, dashboard
pip install "llmhosts[smart]" # Smart (~150MB) — + ML router, semantic cache
pip install "llmhosts[full]" # Full (~2GB) — + PyTorch, full intelligence
Docker:
docker run -p 4000:4000 llmhosts/llmhosts
# GPU: docker run --gpus all -p 4000:4000 llmhosts/llmhosts
Start the Proxy
llmhosts serve
Starts the proxy on http://localhost:4000, auto-discovers Ollama, loads BYOK keys, and launches the TUI dashboard. Web dashboard at http://localhost:4000/dashboard.
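To verify the proxy is up, you can list the models it exposes. This assumes LLMHosts serves the standard OpenAI `GET /v1/models` endpoint, a reasonable expectation for an OpenAI-compatible proxy but still an assumption:

```python
import json
import urllib.request

# List models through the proxy (standard OpenAI-compatible endpoint).
req = urllib.request.Request(
    "http://localhost:4000/v1/models",
    headers={"Authorization": "Bearer anything"},  # any key in local mode
)
with urllib.request.urlopen(req) as resp:
    models = json.load(resp)

for m in models.get("data", []):
    print(m["id"])
```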
Access from Anywhere
The differentiator: make your home GPU reachable from your laptop, phone, or office.
llmhosts tunnel
Uses the built-in LLMHosts Relay by default (zero config, Rust binary included in the pip wheel). Falls back to Tailscale or Cloudflare if installed. Prints a URL — use it from any device. No VPN config, no port forwarding.
llmhosts tunnel # Auto: relay first, then Tailscale/Cloudflare
llmhosts tunnel --provider tailscale --funnel # Force Tailscale Funnel
llmhosts tunnel status # Check tunnel status
llmhosts tunnel stop # Stop active tunnel
Works With Everything
Every tool that speaks OpenAI format works. Just set the base URL:
export OPENAI_API_BASE=http://localhost:4000/v1
# Some tools use: export OPENAI_BASE_URL=http://localhost:4000/v1
export OPENAI_API_KEY=anything # LLMHosts accepts any key for local mode
| Tool | How |
|---|---|
| Cursor | Settings > Models > Custom endpoint: http://localhost:4000/v1 |
| Claude Code | Set OPENAI_API_BASE or configure base URL in settings |
| Aider | aider --api-base http://localhost:4000/v1 |
| Continue.dev | Add OpenAI-compatible provider, base URL: http://localhost:4000/v1 |
| Open WebUI | Set OpenAI API URL to http://localhost:4000/v1 |
| Any OpenAI client | base_url="http://localhost:4000/v1" in client config |
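The Features table above also lists Anthropic API compatibility, so Anthropic-style clients should be able to point at the proxy as well. A sketch with the official `anthropic` Python SDK; the base URL handling and model name are assumptions about this proxy, not documented behavior:

```python
import anthropic

# Point the Anthropic SDK at the proxy's Anthropic-compatible API.
# Base URL and model name are assumptions; adjust to your setup.
client = anthropic.Anthropic(
    base_url="http://localhost:4000",
    api_key="anything",  # any key works in local mode
)

message = client.messages.create(
    model="llama3.1",  # whichever model your local backend serves
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello via Anthropic format"}],
)
print(message.content[0].text)
```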
Architecture
```
Request → Proxy (4000) → Router → vCache → Backend
             │              │
             │              ├─ Tier 1: Rules
             │              ├─ Tier 2: kNN (FAISS + all-MiniLM)
             │              └─ Tier 3: ModernBERT → Qwen-0.5B
             │
             ├─ Cache: exact hash → namespace → semantic (vCache)
             │
             └─ Backend: Ollama | Cloud API (BYOK)
```
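Read as control flow, each routing tier is a fall-through that escalates only when the cheaper tier is not confident. A minimal sketch of that cascade, in which the helper functions and the confidence threshold are illustrative assumptions rather than the shipped router:

```python
def route(request_text: str) -> str:
    """Illustrative three-tier routing cascade; not the shipped router."""
    # Tier 1: hand-written rules. Near-zero cost, handles obvious cases.
    model = match_rules(request_text)  # hypothetical helper
    if model is not None:
        return model

    # Tier 2: kNN over embeddings (the README names FAISS + all-MiniLM),
    # reusing routing decisions from similar past requests.
    model, confidence = knn_route(request_text)  # hypothetical helper
    if confidence >= 0.8:  # threshold is an assumption
        return model

    # Tier 3: ModernBERT classifier. The most expensive, most accurate tier.
    return classifier_route(request_text)  # hypothetical helper
```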
Commands
| Command | Description |
|---|---|
| `llmhosts serve` | Start proxy + dashboard |
| `llmhosts tunnel` | Start secure tunnel (built-in relay, Tailscale/Cloudflare fallback) |
| `llmhosts tunnel status` | Show tunnel status |
| `llmhosts tunnel stop` | Stop active tunnel |
| `llmhosts doctor` | Verify setup and dependencies |
| `llmhosts setup` | Interactive first-run wizard |
| `llmhosts keys add <provider> <key>` | Add BYOK API key |
| `llmhosts keys list` | List configured providers |
| `llmhosts keys validate` | Validate stored keys |
| `llmhosts cache stats` | Cache hit rates and size |
| `llmhosts cache clear` | Clear cache |
| `llmhosts suggest-models` | Recommend models for your hardware |
Dashboard
- TUI — Built-in terminal UI when you run `llmhosts serve`. Live request flow, backends, cache activity.
- Web — Browser dashboard at `http://localhost:4000/dashboard`. Request history, cache stats, model health.
Configuration
- TOML — `~/.config/llmhosts/config.toml` or `--config path/to/config.toml`
- Env — `LLMHOSTS_*`-prefixed variables
- CLI — `--host`, `--port`, `--no-tui`, `--log-level`
Development
docker compose run --rm dev
pip install -e ".[dev]"
llmhosts --version
pytest tests/ -v
Contributing
PRs welcome. Open an issue first for large changes. Run `pytest tests/` and `ruff check .` before submitting.
License
FSL-1.1-Apache-2.0 — see Licensing section above for the open-core breakdown.