Your Personal AI Cloud -- intelligent proxy, router, and cache for LLMs
LLMHosts.com
Your hardware. Real AI infrastructure. From anywhere.
LLMHosts turns your local GPU into production AI infrastructure with intelligent routing, verified caching, and global access. One command (llmhosts up) auto-detects your hardware, loads models, and exposes an OpenAI-compatible API. The SaaS platform at llmhosts.com provides cost tracking, plan management, and team features.
Two ways to use it:
- Self-hosted CLI — one command to install and run on your own hardware (FSL open-core; Rust inference crates are Apache-2.0 — see LICENSE-APACHE)
- SaaS Platform — sign up at llmhosts.com for cloud cost tracking, API key management, and team features
Licensing
LLMHosts uses an open-core model under the Functional Source License 1.1 (FSL-1.1-Apache-2.0). All components are FSL-licensed; the table below reflects open-core intent: which components are free for personal and non-competing use, and which compete with our hosted service and are therefore restricted.
| Component | Intent | Converts to Apache 2.0 |
|---|---|---|
| Local inference proxy & router | Open-core (non-competing use free) | 2028-02-24 |
| CLI tool (llmhosts) | Open-core (non-competing use free) | 2028-02-24 |
| Auto-discovery | Open-core (non-competing use free) | 2028-02-24 |
| Cloud tunnel management | Proprietary (competing use restricted) | 2028-02-24 |
| SaaS platform & billing | Proprietary (competing use restricted) | 2028-02-24 |
| Fleet orchestration (Token) | Proprietary (competing use restricted) | 2028-02-24 |
After 2028-02-24, all components convert to Apache 2.0 with no restrictions.
SaaS Platform (llmhosts.com)
Track your AI spending, manage API keys, and get real-time savings projections.
Live at: https://llmhosts.com
Features:
- 📊 Cost tracking across OpenAI, Anthropic, Google AI, AWS Bedrock, Azure
- 🔑 API key management with plan-based limits
- 📈 12-month spending projections with confidence scoring
- 💰 Real-time savings estimates (illustrative; actual savings depend on workload and routing)
- 🎯 Gamified achievements for cost milestones
- 💳 Stripe-powered billing (Pro $29/mo, Team $99/mo, Enterprise $299/mo)
- 👥 Team management (coming soon)
Quick Start:
- Sign up at llmhosts.com
- Add your first cost entry
- Generate an API key for the CLI proxy
- Connect your self-hosted LLMHost proxy to track usage
Self-Hosted CLI
Run the intelligent proxy on your own hardware.
# Windows
irm https://llmhosts.com/install.ps1 | iex
# Linux / macOS
curl -LsSf https://llmhosts.com/install.sh | sh
llmhosts up
Point any OpenAI-compatible tool at http://localhost:4000/v1. Your tools now use your local GPU. Cost: $0.
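For example, a raw request against the proxy is a standard OpenAI-style chat completion (the model name below is a placeholder; use whatever model llmhosts has loaded):
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer anything" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello from my own GPU"}]}'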
Why LLMHosts?
- Cloud bills add up — Route Cursor, Claude Code, and Aider to your local GPU instead. Same tools, zero API spend.
- Your hardware, your control — All inference runs on your machine. No data leaves your network unless you choose.
- Works anywhere — llmhosts tunnel uses the built-in LLMHosts Relay (WSS + yamux + Noise NK encryption). Tailscale/Cloudflare are not part of the product path (removed per ADR-009).
Competitive Landscape
We are not entering an existing market — we are creating one. The market: personal and small-team AI infrastructure.
| Player | What They Do | Why They Lose |
|---|---|---|
| OpenAI / Anthropic | Cloud API | Up to 100x more expensive than running comparable models on hardware you already own |
| Ollama | Local model runner | No remote access, no routing, no SaaS, no batching |
| LM Studio | Local GUI | Nowhere near production-ready |
| LocalAI | Self-hosted API | Technical, no UX, no moat |
| Replicate / Together | Hosted inference | Still cloud cost, no local hardware |
| LLMHosts | Infrastructure layer | Proxy + Core engine + relay + SaaS — see repo for shipped vs roadmap |
Competitive Moat
Five compounding advantages that deepen with every user:
| Layer | Name | What It Is |
|---|---|---|
| 1 | First-Mover Position | Building this market category before competition arrives |
| 2 | Data Flywheel | Routing telemetry trains better models → better product → more users |
| 3 | Simplicity Moat | Works for gamers, researchers, founders — not just DevOps engineers |
| 4 | Self-Healing Infrastructure | CI and tooling aim for self-healing; some flows still need operator attention (see issues). |
| 5 | Ecosystem Lock-In | Token AI, Hardware Atlas, savings history = high switching cost |
Features
| Area | Description |
|---|---|
| Proxy | OpenAI + Anthropic compatible API on port 4000. Drop-in for any client. |
| Router | Three-tier design (rules → kNN → classifier). Shipped: rules + wiring; partial / in progress: FAISS/ONNX distribution and full ML tiers (see AGENTS.md honest completion). |
| Cache | Tiered cache design (exact → namespace → semantic). Shipped: exact hash path; in progress: full semantic/vCache tiers per roadmap. |
| Tunnel | llmhosts tunnel — self-hosted LLMHosts Relay only (Noise NK). No Tailscale/Cloudflare in the supported product path (ADR-009). |
| Dashboard | TUI (terminal) + web UI at /dashboard. Live request flow, cache stats, model health. |
| BYOK | Bring your own cloud keys. Fallback to OpenAI/Anthropic when local models can't handle a request. |
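A typical BYOK setup looks like the following (the provider name is illustrative; see the Commands table for the full syntax):
# Store a cloud key for fallback (provider name is an example)
llmhosts keys add openai sk-...
# Confirm it is stored and usable
llmhosts keys list
llmhosts keys validate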
Quick Start
Install
# Windows (PowerShell)
irm https://llmhosts.com/install.ps1 | iex
# Linux / macOS
curl -LsSf https://llmhosts.com/install.sh | sh
The installer auto-detects your GPU and installs the right tier. Three tiers available:
| Tier | Extras | Size | Includes |
|---|---|---|---|
| Core | — | ~50MB | Proxy, router, dashboard |
| Smart | [smart] | ~150MB | + ML router, semantic cache |
| Full | [full] | ~2GB | + PyTorch, full intelligence |
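If you prefer pip or uv over the install script, the extras should map onto the tiers above (a sketch, assuming you install the published llmhosts package directly):
pip install llmhosts              # Core
pip install "llmhosts[smart]"     # + ML router, semantic cache
pip install "llmhosts[full]"      # + PyTorch, full intelligence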
Docker:
docker run -p 4000:4000 llmhosts/llmhosts
# GPU: docker run --gpus all -p 4000:4000 llmhosts/llmhosts
Start the Proxy
llmhosts serve
Starts the proxy on http://localhost:4000, auto-detects your GPU, loads models via the built-in Core engine, loads BYOK keys, and launches the TUI dashboard. Web dashboard at http://localhost:4000/dashboard.
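The documented flags (see Configuration) adjust this; for example, a headless run bound to all interfaces (the values here are illustrative):
llmhosts serve --host 0.0.0.0 --port 4000 --no-tui --log-level info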
Access from Anywhere
The differentiator: make your home GPU reachable from your laptop, phone, or office.
llmhosts tunnel
Uses the LLMHosts Relay (Rust, included in the wheel): WSS + multiplexing + Noise NK end-to-end encryption. You run or connect to a relay endpoint you control — the relay cannot read payload traffic. Tailscale / Cloudflare Tunnel are not supported fallbacks in current product docs (ADR-009).
llmhosts tunnel # Start / manage relay-based remote access
llmhosts tunnel status # Check tunnel status
llmhosts tunnel stop # Stop active tunnel
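From a remote machine the tunnel behaves like the local proxy; you only swap the base URL (the hostname below is a placeholder for your relay endpoint):
export OPENAI_API_BASE=https://your-relay.example.com/v1
export OPENAI_API_KEY=anything   # assumes the same key handling as local mode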
Works With Everything
Every tool that speaks OpenAI format works. Just set the base URL:
export OPENAI_API_BASE=http://localhost:4000/v1
# Some tools use: export OPENAI_BASE_URL=http://localhost:4000/v1
export OPENAI_API_KEY=anything # LLMHosts accepts any key for local mode
| Tool | How |
|---|---|
| Cursor | Settings > Models > Custom endpoint: http://localhost:4000/v1 |
| Claude Code | Set OPENAI_API_BASE or configure base URL in settings |
| Aider | aider --api-base http://localhost:4000/v1 |
| Continue.dev | Add OpenAI-compatible provider, base URL: http://localhost:4000/v1 |
| Open WebUI | Set OpenAI API URL to http://localhost:4000/v1 |
| Any OpenAI client | base_url="http://localhost:4000/v1" in client config |
Architecture
Request → Proxy (4000) → Router → vCache → Backend
│ │ │
│ ├─ Tier 1: Rules
│ ├─ Tier 2: kNN (FAISS + embeddings — partial ship)
│ └─ Tier 3: ModernBERT → Qwen-0.5B (tiers vary by install)
│
├─ Cache: exact hash (shipped) → namespace / semantic (roadmap)
│
└─ Backend: Core Engine | Cloud API (BYOK)
Commands
| Command | Description |
|---|---|
| llmhosts serve | Start proxy + dashboard |
| llmhosts tunnel | Start secure tunnel (LLMHosts Relay + Noise NK; ADR-009) |
| llmhosts tunnel status | Show tunnel status |
| llmhosts tunnel stop | Stop active tunnel |
| llmhosts doctor | Verify setup and dependencies |
| llmhosts setup | Interactive first-run wizard |
| llmhosts keys add <provider> <key> | Add BYOK API key |
| llmhosts keys list | List configured providers |
| llmhosts keys validate | Validate stored keys |
| llmhosts cache stats | Cache hit rates and size |
| llmhosts cache clear | Clear cache |
| llmhosts suggest-models | Recommend models for your hardware |
Dashboard
- TUI — Built-in terminal UI when you run llmhosts serve. Live request flow, backends, cache activity.
- Web — Browser dashboard at http://localhost:4000/dashboard. Request history, cache stats, model health.
Configuration
- TOML — ~/.config/llmhosts/config.toml or --config path/to/config.toml
- Env — LLMHOSTS_*-prefixed variables
- CLI — --host, --port, --no-tui, --log-level
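The exact TOML schema and variable names aren't documented here; a sketch of the env-var route, assuming names follow the LLMHOSTS_* prefix and mirror the CLI flags:
export LLMHOSTS_HOST=0.0.0.0   # hypothetical name, derived from --host
export LLMHOSTS_PORT=4000      # hypothetical name, derived from --port
llmhosts serve --no-tui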
Development
# Run the dev service container (removed on exit)
docker compose run --rm dev
# Editable install with dev extras
uv pip install -e ".[dev]"
# Sanity-check the install
llmhosts --version
# Run the test suite
pytest tests/ -v
Contributing
PRs welcome. Open an issue first for large changes. Run pytest tests/ and ruff check . before submitting.
License
- Distribution / Python package: FSL-1.1-Apache-2.0 — see Licensing for open-core intent.
- Open-source inference crates (llmhosts_core, relay_core, router_core): Apache-2.0 (SPDX headers in source).
File details
Details for the file llmhosts-0.14.5-cp312-abi3-win_amd64.whl.
File metadata
- Download URL: llmhosts-0.14.5-cp312-abi3-win_amd64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.12+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00d5d10705fa84f432b193ee20f742729d3f28b13240dbbeb2900d54b41378b5 |
| MD5 | f3fff8dcb8e2282a6a4f6b7bd4d56895 |
| BLAKE2b-256 | 82040a799254d719f3e5795cd707337a7698613e3f7d755c8c0c0179777706d0 |
File details
Details for the file llmhosts-0.14.5-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: llmhosts-0.14.5-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 5.7 MB
- Tags: CPython 3.12+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62b976aa4a0d1d7ea0e58172c6b61ae7f06f21bfba42907af93070a6e0575e09 |
| MD5 | bb34a992027d59235b644c4bb400e01d |
| BLAKE2b-256 | 803399391a909bf9685864aee484f6ba503e3ff6a7c1667a9f5419dd75587b8e |
File details
Details for the file llmhosts-0.14.5-cp312-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: llmhosts-0.14.5-cp312-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 5.1 MB
- Tags: CPython 3.12+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9dac47e89f0f411d14a63be26f3cf7889e32d39fa130357d602675d795f6cfe0 |
| MD5 | c584d323fe63c3f3f7e8cac86f17f48d |
| BLAKE2b-256 | cc31ef8685a29a022785f170b37bc0fe496dae98fc6b97d4dcc8eaf117242ae3 |