Skip to main content

Your Personal AI Cloud -- intelligent proxy, router, and cache for LLMs

Project description

LLMHosts.com

PyPI version Python 3.12+ License: FSL-1.1-Apache-2.0 CI Tests PRs Welcome

Your hardware. Real AI infrastructure. From anywhere.

LLMHosts turns your local GPU into production AI infrastructure with intelligent routing, verified caching, and global access. One command (llmhosts up) auto-detects your hardware, loads models, and exposes an OpenAI-compatible API. The SaaS platform at llmhosts.com provides cost tracking, plan management, and team features.

Two ways to use it:

  • Self-hosted CLI — one command to install and run on your own hardware (FSL open-core; Rust inference crates are Apache-2.0 — see LICENSE-APACHE)
  • SaaS Platform — Sign up at llmhosts.com for cloud cost tracking, API key management, and team features

Licensing

LLMHosts uses an open-core model under the Functional Source License 1.1 (FSL-1.1-Apache-2.0). All components are FSL-licensed; the table below reflects open-core intent — components free for personal and non-competing use versus those that compete with our hosted service.

Component Intent Converts to Apache 2.0
Local inference proxy & router Open-core (non-competing use free) 2028-02-24
CLI tool (llmhosts) Open-core (non-competing use free) 2028-02-24
Auto-discovery Open-core (non-competing use free) 2028-02-24
Cloud tunnel management Proprietary (competing use restricted) 2028-02-24
SaaS platform & billing Proprietary (competing use restricted) 2028-02-24
Fleet orchestration (Token) Proprietary (competing use restricted) 2028-02-24

After 2028-02-24, all components convert to Apache 2.0 with no restrictions.


SaaS Platform (llmhosts.com)

Track your AI spending, manage API keys, and get real-time savings projections.

Live at: https://llmhosts.com

Features:

  • 📊 Cost tracking across OpenAI, Anthropic, Google AI, AWS Bedrock, Azure
  • 🔑 API key management with plan-based limits
  • 📈 12-month spending projections with confidence scoring
  • 💰 Real-time savings estimates (illustrative; actual savings depend on workload and routing)
  • 🎯 Gamified achievements for cost milestones
  • 💳 Stripe-powered billing (Pro $29/mo, Team $99/mo, Enterprise $299/mo)
  • 👥 Team management (coming soon)

Quick Start:

  1. Sign up at llmhosts.com
  2. Add your first cost entry
  3. Generate an API key for the CLI proxy
  4. Connect your self-hosted LLMHost proxy to track usage

Self-Hosted CLI

Run the intelligent proxy on your own hardware.

# Windows
irm https://llmhosts.com/install.ps1 | iex
# Linux / macOS
curl -LsSf https://llmhosts.com/install.sh | sh
llmhosts up

Point any OpenAI-compatible tool at http://localhost:4000/v1. Your tools now use your local GPU. Cost: $0.


Why LLMHosts?

  • Cloud bills add up — Route Cursor, Claude Code, and Aider to your local GPU instead. Same tools, zero API spend.
  • Your hardware, your control — All inference runs on your machine. No data leaves your network unless you choose.
  • Works anywherellmhosts tunnel uses the built-in LLMHosts Relay (WSS + yamux + Noise NK encryption). Tailscale/Cloudflare are not part of the product path (removed per ADR-009).

Competitive Landscape

We are not entering an existing market — we are creating one. The market: personal and small-team AI infrastructure.

Player What They Do Why They Lose
OpenAI / Anthropic Cloud API 100x more expensive for same hardware quality
Ollama Local model runner No remote access, no routing, no SaaS, no batching
LM Studio Local GUI Nowhere near production-ready
LocalAI Self-hosted API Technical, no UX, no moat
Replicate / Together Hosted inference Still cloud cost, no local hardware
LLMHosts Infrastructure layer Proxy + Core engine + relay + SaaS — see repo for shipped vs roadmap

Competitive Moat

Five compounding advantages that deepen with every user:

Layer Name What It Is
1 First-Mover Position Building this market category before competition arrives
2 Data Flywheel Routing telemetry trains better models → better product → more users
3 Simplicity Moat Works for gamers, researchers, founders — not just DevOps engineers
4 Self-Healing Infrastructure CI and tooling aim for self-healing; some flows still need operator attention (see issues).
5 Ecosystem Lock-In Token AI, Hardware Atlas, savings history = high switching cost

Features

Area Description
Proxy OpenAI + Anthropic compatible API on port 4000. Drop-in for any client.
Router Three-tier design (rules → kNN → classifier). Shipped: rules + wiring; partial / in progress: FAISS/ONNX distribution and full ML tiers (see AGENTS.md honest completion).
Cache Tiered cache design (exact → namespace → semantic). Shipped: exact hash path; in progress: full semantic/vCache tiers per roadmap.
Tunnel llmhosts tunnelself-hosted LLMHosts Relay only (Noise NK). No Tailscale/Cloudflare in the supported product path (ADR-009).
Dashboard TUI (terminal) + web UI at /dashboard. Live request flow, cache stats, model health.
BYOK Bring your own cloud keys. Fallback to OpenAI/Anthropic when local models can't handle a request.

Quick Start

Install

# Windows (PowerShell)
irm https://llmhosts.com/install.ps1 | iex
# Linux / macOS
curl -LsSf https://llmhosts.com/install.sh | sh

The installer auto-detects your GPU and installs the right tier. Three tiers available:

Tier Extras Size Includes
Core ~50MB Proxy, router, dashboard
Smart [smart] ~150MB + ML router, semantic cache
Full [full] ~2GB + PyTorch, full intelligence

Docker:

docker run -p 4000:4000 llmhosts/llmhosts
# GPU: docker run --gpus all -p 4000:4000 llmhosts/llmhosts

Start the Proxy

llmhosts serve

Starts the proxy on http://localhost:4000, auto-detects your GPU, loads models via the built-in Core engine, loads BYOK keys, and launches the TUI dashboard. Web dashboard at http://localhost:4000/dashboard.

Access from Anywhere

The differentiator: make your home GPU reachable from your laptop, phone, or office.

llmhosts tunnel

Uses the LLMHosts Relay (Rust, included in the wheel): WSS + multiplexing + Noise NK end-to-end encryption. You run or connect to a relay endpoint you control — the relay cannot read payload traffic. Tailscale / Cloudflare Tunnel are not supported fallbacks in current product docs (ADR-009).

llmhosts tunnel           # Start / manage relay-based remote access
llmhosts tunnel status    # Check tunnel status
llmhosts tunnel stop      # Stop active tunnel

Works With Everything

Every tool that speaks OpenAI format works. Just set the base URL:

export OPENAI_API_BASE=http://localhost:4000/v1
# Some tools use: export OPENAI_BASE_URL=http://localhost:4000/v1
export OPENAI_API_KEY=anything   # LLMHosts accepts any key for local mode
Tool How
Cursor Settings > Models > Custom endpoint: http://localhost:4000/v1
Claude Code Set OPENAI_API_BASE or configure base URL in settings
Aider aider --api-base http://localhost:4000/v1
Continue.dev Add OpenAI-compatible provider, base URL: http://localhost:4000/v1
Open WebUI Set OpenAI API URL to http://localhost:4000/v1
Any OpenAI client base_url="http://localhost:4000/v1" in client config

Architecture

Request  →  Proxy (4000)  →  Router  →  vCache  →  Backend
                │              │          │
                │              ├─ Tier 1: Rules
                │              ├─ Tier 2: kNN (FAISS + embeddings — partial ship)
                │              └─ Tier 3: ModernBERT → Qwen-0.5B (tiers vary by install)
                │
                ├─ Cache: exact hash (shipped) → namespace / semantic (roadmap)
                │
                └─ Backend: Core Engine | Cloud API (BYOK)

Commands

Command Description
llmhosts serve Start proxy + dashboard
llmhosts tunnel Start secure tunnel (LLMHosts Relay + Noise NK; ADR-009)
llmhosts tunnel status Show tunnel status
llmhosts tunnel stop Stop active tunnel
llmhosts doctor Verify setup and dependencies
llmhosts setup Interactive first-run wizard
llmhosts keys add <provider> <key> Add BYOK API key
llmhosts keys list List configured providers
llmhosts keys validate Validate stored keys
llmhosts cache stats Cache hit rates and size
llmhosts cache clear Clear cache
llmhosts suggest-models Recommend models for your hardware

Dashboard

  • TUI — Built-in terminal UI when you run llmhosts serve. Live request flow, backends, cache activity.
  • Web — Browser dashboard at http://localhost:4000/dashboard. Request history, cache stats, model health.

Configuration

  • TOML~/.config/llmhosts/config.toml or --config path/to/config.toml
  • EnvLLMHOSTS_* prefixed variables
  • CLI--host, --port, --no-tui, --log-level

Development

docker compose run --rm dev
uv pip install -e ".[dev]"
llmhosts --version
pytest tests/ -v

Contributing

PRs welcome. Open an issue first for large changes. Run pytest tests/ and ruff check . before submitting.


License

  • Distribution / Python package: FSL-1.1-Apache-2.0 — see Licensing for open-core intent.
  • Open-source inference crates (llmhosts_core, relay_core, router_core): Apache-2.0 (SPDX headers in source).

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

llmhosts-0.13.7-cp312-abi3-win_amd64.whl (7.2 MB view details)

Uploaded CPython 3.12+Windows x86-64

llmhosts-0.13.7-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB view details)

Uploaded CPython 3.12+manylinux: glibc 2.17+ x86-64

llmhosts-0.13.7-cp312-abi3-macosx_11_0_arm64.whl (5.0 MB view details)

Uploaded CPython 3.12+macOS 11.0+ ARM64

File details

Details for the file llmhosts-0.13.7-cp312-abi3-win_amd64.whl.

File metadata

  • Download URL: llmhosts-0.13.7-cp312-abi3-win_amd64.whl
  • Upload date:
  • Size: 7.2 MB
  • Tags: CPython 3.12+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmhosts-0.13.7-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 17ace6bf49bdb788b84f11a9c664e6f1bfda23e2213f1bfb041929784d95ddd7
MD5 c4a1ea41b1de8a302a5bfbd58873d217
BLAKE2b-256 652eb25f2c9a444879afb173febcf54d683a0c6446845c0ab155f1e4aa0d0e3f

See more details on using hashes here.

File details

Details for the file llmhosts-0.13.7-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for llmhosts-0.13.7-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 29e3d14832c5a4eec5ffed04c6e8e321441e595156e734ca76d3b4b2ea318255
MD5 8876870915c11e828b619759ed10695c
BLAKE2b-256 d95cb0464556e9ed46df66528d1e03eb4cad79ac4d275dcc87552cb4c9d952df

See more details on using hashes here.

File details

Details for the file llmhosts-0.13.7-cp312-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for llmhosts-0.13.7-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 627a3513280008e57b3710bb3d4c52d3cff531b8876b2dcd9a49f76db707bd46
MD5 1918c7da226cf265686ff9f5e4958337
BLAKE2b-256 4d4000f6b134d6fc082fbf962d1c2d9dbcdbab2dc0cdcb500d8609e8577c2693

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page