
Your Personal AI Cloud — intelligent proxy, router, and cache for LLMs


LLMHosts.com

PyPI version Python 3.10+ License: FSL-1.1-Apache-2.0 CI Tests PRs Welcome

Your hardware. Real AI infrastructure. From anywhere.

LLMHosts turns your local GPU into production AI infrastructure with intelligent routing, verified caching, and global access. The CLI proxy makes your Ollama/vLLM OpenAI-compatible. The SaaS platform at llmhosts.com provides cost tracking, plan management, and team features.

Two ways to use it:

  • Self-hosted CLI — pip install llmhosts and run on your own hardware (free, open source)
  • SaaS Platform — Sign up at llmhosts.com for cloud cost tracking, API key management, and team features

Licensing

LLMHosts uses an open-core model under the Functional Source License 1.1 (FSL-1.1-Apache-2.0). All components are FSL-licensed; the table below reflects open-core intent — components free for personal and non-competing use versus those that compete with our hosted service.

| Component | Intent | Converts to Apache 2.0 |
|---|---|---|
| Local inference proxy & router | Open-core (non-competing use free) | 2028-02-24 |
| CLI tool (llmhosts) | Open-core (non-competing use free) | 2028-02-24 |
| Auto-discovery | Open-core (non-competing use free) | 2028-02-24 |
| Cloud tunnel management | Proprietary (competing use restricted) | 2028-02-24 |
| SaaS platform & billing | Proprietary (competing use restricted) | 2028-02-24 |
| Fleet orchestration (Token) | Proprietary (competing use restricted) | 2028-02-24 |

After 2028-02-24, all components convert to Apache 2.0 with no restrictions.


SaaS Platform (llmhosts.com)

Track your AI spending, manage API keys, and get real-time savings projections.

Live at: https://llmhosts.com

Features:

  • 📊 Cost tracking across OpenAI, Anthropic, Google AI, AWS Bedrock, Azure
  • 🔑 API key management with plan-based limits
  • 📈 12-month spending projections with confidence scoring
  • 💰 Real-time savings estimates (projected ~35% reduction with intelligent caching + routing)
  • 🎯 Gamified achievements for cost milestones
  • 💳 Stripe-powered billing (Pro $29/mo, Team $99/mo, Enterprise $299/mo)
  • 👥 Team management (coming soon)

Quick Start:

  1. Sign up at llmhosts.com
  2. Add your first cost entry
  3. Generate an API key for the CLI proxy
  4. Connect your self-hosted LLMHosts proxy to track usage

Self-Hosted CLI

Run the intelligent proxy on your own hardware.

pip install llmhosts
llmhosts serve

Point any OpenAI-compatible tool at http://localhost:4000/v1. Your tools now use your local GPU. Cost: $0.


Why LLMHosts?

  • Cloud bills add up — Route Cursor, Claude Code, and Aider to your local GPU instead. Same tools, zero API spend.
  • Your hardware, your control — All inference runs on your machine. No data leaves your network unless you choose.
  • Works anywhere — llmhosts tunnel uses the built-in LLMHosts Relay (zero config) or falls back to Tailscale/Cloudflare if installed. Your home GPU becomes your portable AI.

Competitive Landscape

We are not entering an existing market — we are creating one. The market: personal and small-team AI infrastructure.

| Player | What They Do | Why They Lose |
|---|---|---|
| OpenAI / Anthropic | Cloud API | 100x more expensive for the same hardware quality |
| Ollama | Local model runner | No remote access, no routing, no SaaS |
| LM Studio | Local GUI | Nowhere near production-ready |
| LocalAI | Self-hosted API | Technical, no UX, no moat |
| Replicate / Together | Hosted inference | Still cloud cost, no local hardware |
| LLMHosts | Infrastructure layer | The only production-grade local AI platform |

Competitive Moat

Five compounding advantages that deepen with every user:

| Layer | Name | What It Is |
|---|---|---|
| 1 | First-Mover Position | Building this market category before competition arrives |
| 2 | Data Flywheel | Routing telemetry trains better models → better product → more users |
| 3 | Simplicity Moat | Works for gamers, researchers, founders — not just DevOps engineers |
| 4 | Self-Healing Infrastructure | Fixes itself while you sleep. Zero tinkering required. |
| 5 | Ecosystem Lock-In | Token AI, Hardware Atlas, savings history = high switching cost |

Features

| Area | Description |
|---|---|
| Proxy | OpenAI + Anthropic compatible API on port 4000. Drop-in for any client. |
| Router | Three-tier: rules first, then kNN similarity, then ModernBERT classifier. Routes each request to the right model. |
| Cache | Three-tier vCache: exact hash, entity namespace, verified semantic. Cuts repeat calls to zero. |
| Tunnel | llmhosts tunnel — built-in LLMHosts Relay (zero config), falls back to Tailscale or Cloudflare. Your GPU on your laptop, anywhere. |
| Dashboard | TUI (terminal) + web UI at /dashboard. Live request flow, cache stats, model health. |
| BYOK | Bring your own cloud keys. Falls back to OpenAI/Anthropic when local models can't handle a request. |
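As a rough illustration of the rules-then-kNN-then-classifier routing order described above (a hypothetical sketch, not LLMHosts' actual implementation — the model names, substring rules, and similarity threshold are made up):

```python
def route(prompt, rules, knn=None, classify=None):
    """Tiered routing: explicit rules, then nearest-neighbour lookup,
    then a learned classifier as the final fallback."""
    # Tier 1: cheap, deterministic rules (substring match in this sketch).
    for pattern, model in rules:
        if pattern in prompt.lower():
            return model
    # Tier 2: similarity search over previously routed prompts.
    if knn is not None:
        hit = knn(prompt)              # -> (model, similarity) or None
        if hit is not None and hit[1] >= 0.8:   # only trust close neighbours
            return hit[0]
    # Tier 3: classifier fallback (ModernBERT plays this role in the real router).
    return classify(prompt) if classify else "default-model"

rules = [("sql", "codellama:13b"), ("translate", "qwen2.5:7b")]
print(route("Write a SQL query for monthly revenue", rules))  # codellama:13b
print(route("Summarise this paragraph", rules))               # default-model
```

The ordering matters: each tier is strictly cheaper than the next, so most requests never touch the classifier.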

Quick Start

Install

Three tiers, pick what you need:

pip install llmhosts                    # Core (~50MB) — proxy, router, dashboard
pip install "llmhosts[smart]"          # Smart (~150MB) — + ML router, semantic cache
pip install "llmhosts[full]"           # Full (~2GB) — + PyTorch, full intelligence

Docker:

docker run -p 4000:4000 llmhosts/llmhosts
# GPU: docker run --gpus all -p 4000:4000 llmhosts/llmhosts

Start the Proxy

llmhosts serve

Starts the proxy on http://localhost:4000, auto-discovers Ollama, loads BYOK keys, and launches the TUI dashboard. Web dashboard at http://localhost:4000/dashboard.

Access from Anywhere

The differentiator: make your home GPU reachable from your laptop, phone, or office.

llmhosts tunnel

Uses the built-in LLMHosts Relay by default (zero config, Rust binary included in the pip wheel). Falls back to Tailscale or Cloudflare if installed. Prints a URL — use it from any device. No VPN config, no port forwarding.

llmhosts tunnel                                  # Auto: relay first, then Tailscale/Cloudflare
llmhosts tunnel --provider tailscale --funnel    # Force Tailscale Funnel
llmhosts tunnel status                           # Check tunnel status
llmhosts tunnel stop                             # Stop active tunnel

Works With Everything

Every tool that speaks OpenAI format works. Just set the base URL:

export OPENAI_API_BASE=http://localhost:4000/v1
# Some tools use: export OPENAI_BASE_URL=http://localhost:4000/v1
export OPENAI_API_KEY=anything   # LLMHosts accepts any key for local mode
| Tool | How |
|---|---|
| Cursor | Settings > Models > Custom endpoint: http://localhost:4000/v1 |
| Claude Code | Set OPENAI_API_BASE or configure the base URL in settings |
| Aider | aider --api-base http://localhost:4000/v1 |
| Continue.dev | Add an OpenAI-compatible provider with base URL http://localhost:4000/v1 |
| Open WebUI | Set the OpenAI API URL to http://localhost:4000/v1 |
| Any OpenAI client | base_url="http://localhost:4000/v1" in the client config |
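Under the hood, every tool in the table sends the same OpenAI-style chat-completions request. A minimal stdlib sketch of that request (the model name is a placeholder; the payload shape follows the public OpenAI API, which the proxy advertises as compatible):

```python
import json
from urllib import request

def build_chat_request(base_url, model, prompt, api_key="anything"):
    """Assemble a standard OpenAI chat-completions request."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]})
    return url, headers, body

def chat(base_url, model, prompt):
    """Send the request and return the assistant's reply text."""
    url, headers, body = build_chat_request(base_url, model, prompt)
    req = request.Request(url, data=body.encode(), headers=headers)
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

url, _, _ = build_chat_request("http://localhost:4000/v1", "llama3.1:8b", "Hello!")
print(url)  # http://localhost:4000/v1/chat/completions

# Against a running proxy:
# print(chat("http://localhost:4000/v1", "llama3.1:8b", "Hello!"))
```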

Architecture

Request  →  Proxy (4000)  →  Router  →  vCache  →  Backend
                │              │          │
                │              ├─ Tier 1: Rules
                │              ├─ Tier 2: kNN (FAISS + all-MiniLM)
                │              └─ Tier 3: ModernBERT → Qwen-0.5B
                │
                ├─ Cache: exact hash → namespace → semantic (vCache)
                │
                └─ Backend: Ollama | Cloud API (BYOK)
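The cache column of the diagram can be sketched in miniature. This is a toy illustration of the exact-hash → namespace → semantic lookup order, with the semantic tier stubbed as a similarity callback; it is not the real vCache code:

```python
import hashlib

class TieredCache:
    """Toy three-tier lookup: exact hash, then an entity-scoped
    namespace, then a (stubbed) semantic match."""
    def __init__(self, semantic=None):
        self.exact = {}           # sha256(prompt) -> answer
        self.scoped = {}          # (namespace, sha256(prompt)) -> answer
        self.semantic = semantic  # callable(prompt) -> answer or None

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode()).hexdigest()

    def put(self, prompt, answer, namespace=None):
        if namespace:
            self.scoped[(namespace, self._key(prompt))] = answer
        else:
            self.exact[self._key(prompt)] = answer

    def get(self, prompt, namespace=None):
        k = self._key(prompt)
        if k in self.exact:                              # Tier 1: exact hash
            return self.exact[k]
        if namespace and (namespace, k) in self.scoped:  # Tier 2: namespace
            return self.scoped[(namespace, k)]
        # Tier 3: verified semantic lookup (delegated in this sketch).
        return self.semantic(prompt) if self.semantic else None

cache = TieredCache()
cache.put("capital of France?", "Paris")
print(cache.get("capital of France?"))  # Paris
print(cache.get("capital of Spain?"))   # None
```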

Commands

| Command | Description |
|---|---|
| llmhosts serve | Start proxy + dashboard |
| llmhosts tunnel | Start a secure tunnel (built-in relay, Tailscale/Cloudflare fallback) |
| llmhosts tunnel status | Show tunnel status |
| llmhosts tunnel stop | Stop the active tunnel |
| llmhosts doctor | Verify setup and dependencies |
| llmhosts setup | Interactive first-run wizard |
| llmhosts keys add <provider> <key> | Add a BYOK API key |
| llmhosts keys list | List configured providers |
| llmhosts keys validate | Validate stored keys |
| llmhosts cache stats | Cache hit rates and size |
| llmhosts cache clear | Clear the cache |
| llmhosts suggest-models | Recommend models for your hardware |

Dashboard

  • TUI — Built-in terminal UI when you run llmhosts serve. Live request flow, backends, cache activity.
  • Web — Browser dashboard at http://localhost:4000/dashboard. Request history, cache stats, model health.

Configuration

  • TOML — ~/.config/llmhosts/config.toml or --config path/to/config.toml
  • Env — LLMHOSTS_*-prefixed variables
  • CLI — --host, --port, --no-tui, --log-level
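A config file might look like the following. The key names here are illustrative assumptions mirroring the documented CLI flags, not a published schema — run llmhosts setup or llmhosts doctor to see the real options:

```toml
# ~/.config/llmhosts/config.toml — illustrative only; key names assumed
# from the CLI flags (--host, --port, --log-level), not a documented schema.
host = "0.0.0.0"
port = 4000
log_level = "info"
```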

Development

docker compose run --rm dev
pip install -e ".[dev]"
llmhosts --version
pytest tests/ -v

Contributing

PRs welcome. Open an issue first for large changes. Run pytest tests/ and ruff check . before submitting.


License

FSL-1.1-Apache-2.0 — see Licensing section above for the open-core breakdown.

