
Lightweight local LLM proxy with multiple provider management, automatic failover, privacy protection, and latency-based selection.

Project description

VibeLLM

English | 中文

Lightweight local LLM proxy with multiple provider management, privacy protection, and automatic failover for personal use.

Features

  • ✅ Lightweight: Only ~8MB install size (vs 100MB+ for litellm-proxy)
  • ✅ Privacy protection: Automatically detects PII (personally identifiable information); requests containing simple PII are routed to a local LLM, while complex PII is anonymized before being sent to a remote LLM and restored in the response
  • ✅ Dual endpoints: Provides both OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) localhost endpoints
  • ✅ Multiple provider management: Add/remove/enable/disable providers with CLI
  • ✅ Local LLM support: Native support for Ollama, llama.cpp, and any OpenAI-compatible local server
  • ✅ Automatic failover: When you hit a rate limit, the proxy automatically retries with the next provider
  • ✅ Latency benchmarking: Test which provider is fastest and auto-select
  • ✅ Format translation: A client configured for OpenAI can call Anthropic/Gemini, and vice versa
  • ✅ Claude Code skill integration: Claude can manage providers for you

Supported Providers

Incoming \ Target   OpenAI         Anthropic      Gemini         Local (OpenAI-compatible)
OpenAI              ✅ Direct      ✅ Translate   ✅ Translate   ✅ Direct
Anthropic           ✅ Translate   ✅ Direct      ✅ Translate   ✅ Translate
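
For example, a client that speaks the Anthropic format can be pointed at the proxy's /v1/messages endpoint while an OpenAI-style provider actually serves the request. A minimal sketch, assuming the anthropic Python SDK; the model name is only a placeholder, and how the proxy maps it onto the target provider's default_model is an assumption here:

from anthropic import Anthropic

# Point the Anthropic client at the local proxy instead of api.anthropic.com.
# The proxy translates the request for whichever provider actually handles it.
client = Anthropic(base_url="http://localhost:8080", api_key="not-needed")

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello through the proxy!"}],
)
print(message.content[0].text)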

Installation

Install from PyPI (recommended)

pip install vibellm

Install from source

git clone https://github.com/easyhealth/VibeLLM.git
cd VibeLLM
pip install -e .

Quick Start

  1. Add your first provider:
vibellm add \
  --name openai \
  --base-url https://api.openai.com/v1 \
  --api-key sk-xxx \
  --default-model gpt-4o
  2. Start the server:
vibellm start --port 8080
  3. Configure your client to use:
  • OpenAI endpoint: http://localhost:8080/v1/chat/completions
  • Anthropic endpoint: http://localhost:8080/v1/messages
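
For example, a minimal sketch of step 3 using the official openai Python SDK (the api_key value is a placeholder assumption, since the proxy holds the real provider keys; adjust if your setup enforces a key):

from openai import OpenAI

# Point the standard OpenAI client at the local proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from VibeLLM!"}],
)
print(response.choices[0].message.content)

Any client or tool that lets you override the OpenAI or Anthropic base URL can be pointed at these endpoints in the same way.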

CLI Commands

Command                        Description
vibellm start                  Start the proxy server
vibellm add                    Add a new provider
vibellm remove                 Remove a provider
vibellm list                   List all providers
vibellm enable <name>          Enable a provider
vibellm disable <name>         Disable a provider
vibellm default <name>         Set the default provider
vibellm test <name>            Test connectivity to a provider
vibellm benchmark              Test latency for all providers
vibellm benchmark --auto-set   Test latency and set the fastest provider as default
vibellm status                 Show server status

Benchmarking

Find your fastest provider:

vibellm benchmark --auto-set

This will:

  1. Test all enabled providers with a simple request
  2. Measure latency
  3. Automatically set the fastest as the default

Claude Code Skill Installation

To use VibeLLM as a Claude Code skill, symlink the skill file into your Claude Code skills directory:

ln -s D:/VibeLLM/vibellm-skills/llm_proxy.py ~/.config/claude-code/skills/

Then Claude can respond to natural language commands like:

  • "list providers"
  • "switch default to anthropic"
  • "I'm rate limited, find the fastest provider"
  • "benchmark and set fastest as default"
  • "test my openai provider"

Configuration

Configuration is stored at ~/.config/vibellm/config.yaml:

default_provider: openai-main
providers:
  - name: openai-main
    base_url: https://api.openai.com/v1
    api_key: sk-xxx
    default_model: gpt-4o
    enabled: true
    priority: 1  # lower = higher priority for failover
    last_latency_ms: null

Selecting Specific Provider

You can select a specific provider per request using the X-LLM-Provider header:

POST /v1/chat/completions
X-LLM-Provider: anthropic

This will bypass the default and use the explicitly requested provider.
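
The same override from Python, sketched with the openai SDK's extra_headers argument (which attaches additional HTTP headers to a single request); the header value should match the name you registered with vibellm add:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Route this particular request to the "anthropic" provider, overriding the default.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Route me via Anthropic, please."}],
    extra_headers={"X-LLM-Provider": "anthropic"},
)
print(response.choices[0].message.content)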

Dependencies

  • Python 3.10+
  • fastapi
  • uvicorn
  • httpx
  • click
  • pydantic
  • pydantic-settings
  • pyyaml
  • tabulate

Eight packages in total, all of them small.

Why this vs litellm-proxy?

litellm-proxy is great for production with many features, but it's heavy and pulls in dozens of dependencies. This project is:

  • For personal use on your local machine
  • Much lighter weight (only 8 core dependencies vs 50+ for litellm)
  • Simpler: just local config file, no database
  • Focused on the specific use case: multiple API keys/providers with failover and latency selection

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vibellm-0.1.1.tar.gz (23.9 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vibellm-0.1.1-py3-none-any.whl (26.0 kB)


File details

Details for the file vibellm-0.1.1.tar.gz.

File metadata

  • Download URL: vibellm-0.1.1.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.4

File hashes

Hashes for vibellm-0.1.1.tar.gz

Algorithm     Hash digest
SHA256        538163ad64cc4f09d86e9899d51914859b0ba6e91f4fadcfef6ca1428a99e6e7
MD5           512c9f82704c11690989a2165e940770
BLAKE2b-256   32e36f1204d8b3461b8b1086e7c717c1236bbc7bc88ca8b0dd0bc19df3c403a5


File details

Details for the file vibellm-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vibellm-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.4

File hashes

Hashes for vibellm-0.1.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        3b9bddedc36fd4ec8c9c749d22df2a5bb3f142c921727477e5471d1be528e931
MD5           c8a093499ea0333ba1cb3c5dfb414b34
BLAKE2b-256   17be2405be07aff7a3a99fe882785c7efbb0af23fe0d48f4d62379cc76b0d0c7

