Universal LLM interfaces for multi-provider chat and utilities
Project description
vv-llm
Universal LLM interface layer for Python. One API, 16 backends, sync & async.
pip install vv-llm
Supported Backends
OpenAI | Anthropic | DeepSeek | Gemini | Qwen | Groq | Mistral | Moonshot | MiniMax | Yi | ZhiPuAI | Baichuan | StepFun | xAI | Ernie | Local
Also supports Azure OpenAI, Vertex AI, and AWS Bedrock deployments.
Quick Start
Configure
from vv_llm.settings import settings

settings.load({
    "VERSION": "2",
    "endpoints": [
        {
            "id": "openai-default",
            "api_base": "https://api.openai.com/v1",
            "api_key": "sk-...",
        }
    ],
    "backends": {
        "openai": {
            "models": {
                "gpt-4o": {
                    "id": "gpt-4o",
                    "endpoints": ["openai-default"],
                }
            }
        }
    },
})
Sync
from vv_llm.chat_clients import create_chat_client, BackendType
client = create_chat_client(BackendType.OpenAI, model="gpt-4o")
resp = client.create_completion([
    {"role": "user", "content": "Explain RAG in one sentence"}
])
print(resp.content)
Streaming
for chunk in client.create_stream([
    {"role": "user", "content": "Write a haiku"}
]):
    if chunk.content:
        print(chunk.content, end="")
Async
import asyncio
from vv_llm.chat_clients import create_async_chat_client, BackendType

async def main():
    client = create_async_chat_client(BackendType.OpenAI, model="gpt-4o")
    resp = await client.create_completion([
        {"role": "user", "content": "hello"}
    ])
    print(resp.content)

asyncio.run(main())
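Since create_completion on the async client is a coroutine, independent requests can be fanned out with plain asyncio.gather. A minimal sketch, assuming a single client instance can serve concurrent calls (the prompts below are illustrative):

import asyncio
from vv_llm.chat_clients import create_async_chat_client, BackendType

async def ask(client, prompt: str) -> str:
    resp = await client.create_completion([{"role": "user", "content": prompt}])
    return resp.content

async def main():
    client = create_async_chat_client(BackendType.OpenAI, model="gpt-4o")
    # Dispatch three independent completions concurrently and await them together.
    answers = await asyncio.gather(
        ask(client, "Define embeddings in one sentence"),
        ask(client, "Define reranking in one sentence"),
        ask(client, "Define RAG in one sentence"),
    )
    for answer in answers:
        print(answer)

asyncio.run(main())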
Embedding & Rerank
from vv_llm.settings import settings

settings.load({
    "VERSION": "2",
    "endpoints": [
        {
            "id": "siliconflow",
            "api_base": "https://api.siliconflow.cn/v1",
            "api_key": "sk-...",
        }
    ],
    "backends": {},
    "embedding_backends": {
        "siliconflow": {
            "models": {
                "BAAI/bge-large-zh-v1.5": {
                    "id": "BAAI/bge-large-zh-v1.5",
                    "endpoints": ["siliconflow"],
                    "protocol": "openai_embeddings",
                }
            }
        }
    },
    "rerank_backends": {
        "siliconflow": {
            "models": {
                "BAAI/bge-reranker-v2-m3": {
                    "id": "BAAI/bge-reranker-v2-m3",
                    "endpoints": ["siliconflow"],
                    "protocol": "custom_json_http",
                    "request_mapping": {
                        "method": "POST",
                        "path": "/rerank",
                        "body_template": {
                            "model": "${model_id}",
                            "query": "${query}",
                            "documents": "${documents}",
                        },
                    },
                    "response_mapping": {
                        "results_path": "$.results[*]",
                        "field_map": {
                            "index": "$.index",
                            "relevance_score": "$.relevance_score",
                        },
                    },
                }
            }
        }
    },
})
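The custom_json_http protocol is driven entirely by these two mappings: ${...} placeholders in body_template are filled from the call arguments, while results_path and field_map are JSONPath expressions applied to the provider's JSON response. As a rough illustration only (the substitution machinery itself is internal to vv-llm; these literals just show the before/after shapes):

# What the request body for rerank(query="Apple", documents=[...]) renders to
# once ${model_id}, ${query}, and ${documents} are substituted:
request_body = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "Apple",
    "documents": ["apple", "banana", "fruit", "vegetable"],
}

# A typical provider response: "$.results[*]" selects each entry, and field_map
# copies $.index and $.relevance_score into the normalized rerank result.
provider_response = {
    "results": [
        {"index": 0, "relevance_score": 0.93},
        {"index": 2, "relevance_score": 0.71},
    ]
}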
from vv_llm.embedding_clients import create_embedding_client
from vv_llm.rerank_clients import create_rerank_client
embedding_client = create_embedding_client("siliconflow", model="BAAI/bge-large-zh-v1.5")
embedding_resp = embedding_client.create_embeddings(input="hello world")
print(len(embedding_resp.data[0].embedding))
rerank_client = create_rerank_client("siliconflow", model="BAAI/bge-reranker-v2-m3")
rerank_resp = rerank_client.rerank(
    query="Apple",
    documents=["apple", "banana", "fruit", "vegetable"],
)
print(rerank_resp.results[0].index, rerank_resp.results[0].relevance_score)
import asyncio
from vv_llm.embedding_clients import create_async_embedding_client
from vv_llm.rerank_clients import create_async_rerank_client

async def main():
    embedding_client = create_async_embedding_client("siliconflow", model="BAAI/bge-large-zh-v1.5")
    rerank_client = create_async_rerank_client("siliconflow", model="BAAI/bge-reranker-v2-m3")
    emb = await embedding_client.create_embeddings(input=["a", "b"])
    rr = await rerank_client.rerank(query="Apple", documents=["apple", "banana"])
    print(len(emb.data), len(rr.results))

asyncio.run(main())
Features
- Unified interface — same create_completion/create_stream API across all providers
- Embedding & rerank — unified sync/async retrieval clients with normalized outputs
- Type-safe factory — create_chat_client(BackendType.X) returns the correct client type
- Multi-endpoint — configure multiple endpoints per backend with random selection and failover (see the config sketch after this list)
- Tool calling — normalized tool/function calling across providers
- Multimodal — text + image inputs where supported
- Thinking/reasoning — access chain-of-thought from Claude, DeepSeek Reasoner, etc.
- Token counting — per-model tokenizers (tiktoken, deepseek-tokenizer, qwen-tokenizer)
- Rate limiting — RPM/TPM controls with memory, Redis, or DiskCache backends
- Context length control — automatic message truncation to fit model limits
- Prompt caching — Anthropic prompt caching support
- Retry with backoff — configurable retry logic for transient failures
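A sketch of the multi-endpoint setup referenced above. The VERSION-2 schema from Quick Start already lets a model list several endpoint ids; the second endpoint id and base URL here are illustrative, not part of the documented config:

from vv_llm.settings import settings

settings.load({
    "VERSION": "2",
    "endpoints": [
        {"id": "openai-primary", "api_base": "https://api.openai.com/v1", "api_key": "sk-..."},
        # Illustrative backup endpoint, e.g. a compatible proxy deployment:
        {"id": "openai-backup", "api_base": "https://llm-proxy.example.com/v1", "api_key": "sk-..."},
    ],
    "backends": {
        "openai": {
            "models": {
                # Both endpoints serve gpt-4o; per the feature list, vv-llm selects
                # one at random and fails over to the other on errors.
                "gpt-4o": {"id": "gpt-4o", "endpoints": ["openai-primary", "openai-backup"]},
            }
        }
    },
})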
Utilities
from vv_llm.chat_clients import format_messages, get_token_counts, get_message_token_counts
| Function | Description |
|---|---|
| format_messages | Normalize multimodal/tool messages across formats |
| get_token_counts | Count tokens for a text string |
| get_message_token_counts | Count tokens for a message list |
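A minimal usage sketch. The exact signatures are an assumption here; a model argument for selecting the per-model tokenizer seems likely given the tokenizers listed under Features:

from vv_llm.chat_clients import get_token_counts, get_message_token_counts

messages = [{"role": "user", "content": "Explain RAG in one sentence"}]

# Assumed signatures: the text or message list, plus a model name that picks
# the matching tokenizer (tiktoken for gpt-4o, etc.).
print(get_token_counts("hello world", model="gpt-4o"))
print(get_message_token_counts(messages, model="gpt-4o"))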
Optional Dependencies
pip install 'vv-llm[redis]' # Redis rate limiting
pip install 'vv-llm[diskcache]' # DiskCache rate limiting
pip install 'vv-llm[server]' # FastAPI token server
pip install 'vv-llm[vertex]' # Google Vertex AI
pip install 'vv-llm[bedrock]' # AWS Bedrock
Project Structure
src/vv_llm/
    chat_clients/        # Per-backend clients + factory
    embedding_clients/   # Embedding clients + factory
    rerank_clients/      # Rerank clients + factory
    retrieval_clients/   # Shared retrieval client internals
    settings/            # Configuration management
    types/               # Type definitions & enums
    utilities/           # Rate limiting, retry, media processing, token counting
    server/              # Optional token counting server
tests/unit/              # Unit tests
tests/live/              # Live integration tests (requires real API keys)
Development
pdm install -d # Install dev dependencies
pdm run lint # Ruff linter
pdm run format-check # Ruff format check
pdm run type-check # Ty type checker
pdm run test # Unit tests
pdm run test-live # Live tests (needs real endpoints)
License
MIT
Project details
Download files
Source Distribution: vv_llm-0.3.95.tar.gz
Built Distribution: vv_llm-0.3.95-py3-none-any.whl
File details
Details for the file vv_llm-0.3.95.tar.gz.
File metadata
- Download URL: vv_llm-0.3.95.tar.gz
- Upload date:
- Size: 72.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | effb19607c428ed16b0415436d213f8d595b71bded19154f484ed3c8e52de045 |
| MD5 | 0ab4d5b48bda039ba09e75940755fb94 |
| BLAKE2b-256 | a4dd5366da30a4e5429edc9fb9d73f4543384853adb483f2e111322b88947cee |
Provenance
The following attestation bundles were made for vv_llm-0.3.95.tar.gz:
Publisher: release.yml on AndersonBY/vv-llm
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vv_llm-0.3.95.tar.gz
- Subject digest: effb19607c428ed16b0415436d213f8d595b71bded19154f484ed3c8e52de045
- Sigstore transparency entry: 1446218763
- Sigstore integration time:
- Permalink: AndersonBY/vv-llm@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Branch / Tag: refs/tags/v0.3.95
- Owner: https://github.com/AndersonBY
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Trigger Event: push
File details
Details for the file vv_llm-0.3.95-py3-none-any.whl.
File metadata
- Download URL: vv_llm-0.3.95-py3-none-any.whl
- Upload date:
- Size: 88.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28c56fd769ed7291eeedc8fb38581c5c9199d9ecc43442fb6eb813f3849a55ef |
| MD5 | e826838c53fa9e6c54e5b0ef2fd4fa50 |
| BLAKE2b-256 | 67c49ffa97b6c3e1a1ba344749323498a84b60649fc02157eee33e0ace9879e8 |
Provenance
The following attestation bundles were made for vv_llm-0.3.95-py3-none-any.whl:
Publisher: release.yml on AndersonBY/vv-llm
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vv_llm-0.3.95-py3-none-any.whl
- Subject digest: 28c56fd769ed7291eeedc8fb38581c5c9199d9ecc43442fb6eb813f3849a55ef
- Sigstore transparency entry: 1446218852
- Sigstore integration time:
- Permalink: AndersonBY/vv-llm@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Branch / Tag: refs/tags/v0.3.95
- Owner: https://github.com/AndersonBY
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Trigger Event: push