# VibeLLM
Lightweight local LLM proxy with multiple provider management, privacy protection, and automatic failover for personal use.
## Features
- ✅ Lightweight: Only ~8MB install size (vs 100MB+ for litellm-proxy)
- ✅ Privacy Protection: Automatically detects PII (personally identifiable information); requests with simple PII are routed to a local LLM, while complex PII is anonymized before being sent to a remote LLM and restored in the response (see the sketch after this list)
- ✅ Dual endpoints: Provides both OpenAI-compatible (`/v1/chat/completions`) and Anthropic-compatible (`/v1/messages`) localhost endpoints
- ✅ Multiple provider management: Add/remove/enable/disable providers with the CLI
- ✅ Local LLM support: Native support for Ollama, llama.cpp, and any OpenAI-compatible local server
- ✅ Automatic failover: When you hit a rate limit, the proxy automatically tries the next provider
- ✅ Latency benchmarking: Test which provider is fastest and auto-select it
- ✅ Format translation: A client configured for OpenAI can call Anthropic/Gemini, and vice versa
- ✅ Claude Code skill integration: Claude can manage providers for you
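The anonymize-and-restore flow behind the privacy feature can be pictured with a short sketch. This is illustrative only and not VibeLLM's actual implementation; the `anonymize`/`restore` helpers and the email-only pattern are assumptions for the example:

```python
import re

# Hypothetical sketch: mask emails before a remote call, restore them after.
# VibeLLM's real PII detection covers more than this single pattern.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a placeholder token and remember the mapping."""
    mapping: dict[str, str] = {}

    def repl(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(repl, text), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, mapping = anonymize("Contact alice@example.com about the invoice.")
print(masked)                    # Contact <PII_0> about the invoice.
print(restore(masked, mapping))  # Contact alice@example.com about the invoice.
```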
## Supported Providers

| Incoming format \ Target provider | OpenAI | Anthropic | Gemini | Local (OpenAI-compatible) |
|---|---|---|---|---|
| OpenAI | ✅ Direct | ✅ Translate | ✅ Translate | ✅ Direct |
| Anthropic | ✅ Translate | ✅ Direct | ✅ Translate | ✅ Translate |
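For example, a client that speaks the Anthropic format can be pointed at the proxy and still reach an OpenAI target. Here is a rough sketch using the `anthropic` SDK; the model name and API key are placeholders, and which target actually answers depends on your provider config:

```python
from anthropic import Anthropic

# Point the Anthropic SDK at the local proxy instead of api.anthropic.com.
# The proxy translates /v1/messages traffic to the active target's format.
client = Anthropic(base_url="http://localhost:8080", api_key="unused-by-proxy")

reply = client.messages.create(
    model="gpt-4o",  # placeholder: the proxy maps this to its configured provider
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello"}],
)
print(reply.content[0].text)
```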
## Installation

### Install from PyPI (recommended)

```bash
pip install vibellm
```

### Install from source

```bash
git clone https://github.com/easyhealth/VibeLLM.git
cd VibeLLM
pip install -e .
```
## Quick Start

1. Add your first provider:

   ```bash
   llm-proxy add \
     --name openai \
     --base-url https://api.openai.com/v1 \
     --api-key sk-xxx \
     --default-model gpt-4o
   ```

2. Start the server:

   ```bash
   llm-proxy start --port 8080
   ```

3. Configure your client to use one of the endpoints:

   - OpenAI endpoint: `http://localhost:8080/v1/chat/completions`
   - Anthropic endpoint: `http://localhost:8080/v1/messages`
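For instance, with the official `openai` Python SDK, the only change is the base URL; the model and key below are placeholders resolved by your provider config:

```python
from openai import OpenAI

# The proxy exposes an OpenAI-compatible API on localhost, so the stock SDK
# works unchanged; only base_url needs to point at the proxy.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

resp = client.chat.completions.create(
    model="gpt-4o",  # or whatever default_model your provider defines
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(resp.choices[0].message.content)
```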
## CLI Commands

| Command | Description |
|---|---|
| `llm-proxy start` | Start the proxy server |
| `llm-proxy add` | Add a new provider |
| `llm-proxy remove` | Remove a provider |
| `llm-proxy list` | List all providers |
| `llm-proxy enable <name>` | Enable a provider |
| `llm-proxy disable <name>` | Disable a provider |
| `llm-proxy default <name>` | Set the default provider |
| `llm-proxy test <name>` | Test connectivity to a provider |
| `llm-proxy benchmark` | Test latency for all providers |
| `llm-proxy benchmark --auto-set` | Test latency and set the fastest provider as default |
| `llm-proxy status` | Show server status |
## Benchmarking

Find your fastest provider:

```bash
llm-proxy benchmark --auto-set
```

This will:

- Test all enabled providers with a simple request
- Measure latency
- Automatically set the fastest provider as the default
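The selection idea itself is straightforward; below is a minimal sketch of latency-based selection (not VibeLLM's actual code; the provider URLs are assumptions for the example):

```python
import time

import httpx

# Illustrative only: time one lightweight request per provider, pick the fastest.
providers = {
    "openai-main": "https://api.openai.com/v1/models",
    "local-ollama": "http://localhost:11434/v1/models",
}

def measure_ms(url: str) -> float:
    """Return round-trip time in milliseconds, or infinity if unreachable."""
    start = time.perf_counter()
    try:
        httpx.get(url, timeout=5.0)
    except httpx.HTTPError:
        return float("inf")  # unreachable providers sort last
    return (time.perf_counter() - start) * 1000

latencies = {name: measure_ms(url) for name, url in providers.items()}
fastest = min(latencies, key=latencies.get)
print(f"fastest provider: {fastest} ({latencies[fastest]:.0f} ms)")
```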
## Claude Code Skill Installation

To use VibeLLM as a Claude Code skill, link the skill file into your Claude Code skills directory:

```bash
ln -s D:/VibeLLM/vibellm-skills/llm_proxy.py ~/.config/claude-code/skills/
```
Then Claude can respond to natural language commands like:
- "list providers"
- "switch default to anthropic"
- "I'm rate limited, find the fastest provider"
- "benchmark and set fastest as default"
- "test my openai provider"
## Configuration

Configuration is stored at `~/.config/llm-proxy/config.yaml`:

```yaml
default_provider: openai-main
providers:
  - name: openai-main
    base_url: https://api.openai.com/v1
    api_key: sk-xxx
    default_model: gpt-4o
    enabled: true
    priority: 1  # lower = higher priority for failover
    last_latency_ms: null
```
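Since the file is plain YAML and `pyyaml` is already a dependency, it is easy to inspect programmatically; here is a small read-only example (the path matches the default above):

```python
from pathlib import Path

import yaml

# Load the proxy's config; the schema mirrors the example above.
config_path = Path.home() / ".config" / "llm-proxy" / "config.yaml"
config = yaml.safe_load(config_path.read_text())

for provider in config["providers"]:
    state = "enabled" if provider.get("enabled", True) else "disabled"
    print(f"{provider['name']}: {provider['base_url']} ({state})")
```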
## Selecting a Specific Provider

You can select a specific provider per request using the `X-LLM-Provider` header:

```http
POST /v1/chat/completions
X-LLM-Provider: anthropic
```

This bypasses the default and uses the explicitly requested provider.
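For example, with `httpx` (already a dependency) the header rides along on a normal chat request; the model name is a placeholder:

```python
import httpx

# Override the configured default and route this one request to "anthropic".
resp = httpx.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"X-LLM-Provider": "anthropic"},
    json={
        "model": "claude-sonnet-4",  # placeholder: use a model your provider serves
        "messages": [{"role": "user", "content": "Hi"}],
    },
    timeout=60.0,
)
print(resp.json())
```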
## Dependencies
- Python 3.10+
- fastapi
- uvicorn
- httpx
- click
- pydantic
- pydantic-settings
- pyyaml
- tabulate
Eight packages in total, all lightweight.
## Why this vs litellm-proxy?

litellm-proxy is great for production and rich in features, but it is heavy and pulls in dozens of dependencies. This project is:

- Built for personal use on your local machine
- Much lighter weight (8 core dependencies vs 50+ for litellm)
- Simpler: a single local config file, no database
- Focused on one use case: multiple API keys/providers with failover and latency-based selection