Universal LLM interfaces for multi-provider chat and utilities

vv-llm

Universal LLM interface layer for Python. One API, 16 backends, sync & async.

pip install vv-llm

Supported Backends

Also supports Azure OpenAI, Vertex AI, and AWS Bedrock deployments.

Quick Start

Configure

from vv_llm.settings import settings

settings.load({
    "VERSION": "2",
    "endpoints": [
        {
            "id": "openai-default",
            "api_base": "https://api.openai.com/v1",
            "api_key": "sk-...",
        }
    ],
    "backends": {
        "openai": {
            "models": {
                "gpt-4o": {
                    "id": "gpt-4o",
                    "endpoints": ["openai-default"],
                }
            }
        }
    }
})
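Hard-coding an API key as in the example above is fine for a quick test, but keys usually come from the environment. The following is a minimal sketch that builds the same dictionary shape shown above. The environment variable name OPENAI_API_KEY and the build_settings helper are illustrative assumptions, not part of vv-llm.

```python
import os


def build_settings(api_key: str, model_id: str = "gpt-4o") -> dict:
    """Return a settings dict in the same shape as the Configure example.

    build_settings is a hypothetical helper, not a vv-llm function.
    """
    return {
        "VERSION": "2",
        "endpoints": [
            {
                "id": "openai-default",
                "api_base": "https://api.openai.com/v1",
                "api_key": api_key,
            }
        ],
        "backends": {
            "openai": {
                "models": {
                    model_id: {
                        "id": model_id,
                        "endpoints": ["openai-default"],
                    }
                }
            }
        },
    }


# OPENAI_API_KEY is an assumed variable name; an empty string is used if unset.
config = build_settings(os.environ.get("OPENAI_API_KEY", ""))
```

The resulting dictionary can then be passed to settings.load(config) in place of the literal shown in the Configure example.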

Sync

from vv_llm.chat_clients import create_chat_client, BackendType

client = create_chat_client(BackendType.OpenAI, model="gpt-4o")
resp = client.create_completion([
    {"role": "user", "content": "Explain RAG in one sentence"}
])
print(resp.content)

Streaming

for chunk in client.create_stream([
    {"role": "user", "content": "Write a haiku"}
]):
    if chunk.content:
        print(chunk.content, end="")
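To keep the full reply as well as print it incrementally, the chunks can be collected while iterating. The sketch below uses a stand-in generator (fake_stream, hypothetical) in place of client.create_stream so that it runs without network access. It assumes only what the example above shows: each chunk exposes a content attribute that may be empty.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def fake_stream() -> Iterator[SimpleNamespace]:
    """Stand-in for client.create_stream(...); yields chunk-like objects."""
    for piece in ["An old ", None, "silent ", "pond"]:
        yield SimpleNamespace(content=piece)


def collect_stream(chunks: Iterable) -> str:
    """Print each non-empty chunk as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        if chunk.content:  # skip chunks that carry no text
            print(chunk.content, end="", flush=True)
            parts.append(chunk.content)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real client, collect_stream(client.create_stream(messages)) would follow the same pattern.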

Async

import asyncio
from vv_llm.chat_clients import create_async_chat_client, BackendType

async def main():
    client = create_async_chat_client(BackendType.OpenAI, model="gpt-4o")
    resp = await client.create_completion([
        {"role": "user", "content": "hello"}
    ])
    print(resp.content)

asyncio.run(main())
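One reason to use the async client is to issue several requests concurrently. The sketch below shows the pattern with asyncio.gather. StubClient is a hypothetical stand-in that mimics the awaitable create_completion call and the content attribute from the example above, so the snippet runs offline; it is not part of vv-llm.

```python
import asyncio
from types import SimpleNamespace


class StubClient:
    """Hypothetical stand-in for an async chat client (not part of vv-llm)."""

    async def create_completion(self, messages):
        await asyncio.sleep(0.01)  # simulate network latency
        return SimpleNamespace(content=f"echo: {messages[-1]['content']}")


async def ask_all(client, prompts):
    """Send one request per prompt concurrently; results keep prompt order."""
    tasks = [
        client.create_completion([{"role": "user", "content": p}])
        for p in prompts
    ]
    responses = await asyncio.gather(*tasks)
    return [r.content for r in responses]


answers = asyncio.run(ask_all(StubClient(), ["hello", "world"]))
print(answers)
```

Replacing StubClient() with a client returned by create_async_chat_client should leave ask_all unchanged, provided the real client behaves as in the Async example.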

Embedding & Rerank

from vv_llm.settings import settings

settings.load({
    "VERSION": "2",
    "endpoints": [
        {
            "id": "siliconflow",
            "api_base": "https://api.siliconflow.cn/v1",
            "api_key": "sk-...",
        }
    ],
    "backends": {},
    "embedding_backends": {
        "siliconflow": {
            "models": {
                "BAAI/bge-large-zh-v1.5": {
                    "id": "BAAI/bge-large-zh-v1.5",
                    "endpoints": ["siliconflow"],
                    "protocol": "openai_embeddings",
                }
            }
        }
    },
    "rerank_backends": {
        "siliconflow": {
            "models": {
                "BAAI/bge-reranker-v2-m3": {
                    "id": "BAAI/bge-reranker-v2-m3",
                    "endpoints": ["siliconflow"],
                    "protocol": "custom_json_http",
                    "request_mapping": {
                        "method": "POST",
                        "path": "/rerank",
                        "body_template": {
                            "model": "${model_id}",
                            "query": "${query}",
                            "documents": "${documents}",
                        },
                    },
                    "response_mapping": {
                        "results_path": "$.results[*]",
                        "field_map": {
                            "index": "$.index",
                            "relevance_score": "$.relevance_score",
                        },
                    },
                }
            }
        }
    },
})

# Synchronous usage (run after settings.load above)
from vv_llm.embedding_clients import create_embedding_client
from vv_llm.rerank_clients import create_rerank_client

embedding_client = create_embedding_client("siliconflow", model="BAAI/bge-large-zh-v1.5")
embedding_resp = embedding_client.create_embeddings(input="hello world")
print(len(embedding_resp.data[0].embedding))

rerank_client = create_rerank_client("siliconflow", model="BAAI/bge-reranker-v2-m3")
rerank_resp = rerank_client.rerank(
    query="Apple",
    documents=["apple", "banana", "fruit", "vegetable"],
)
print(rerank_resp.results[0].index, rerank_resp.results[0].relevance_score)

# Asynchronous usage with the same configuration
import asyncio
from vv_llm.embedding_clients import create_async_embedding_client
from vv_llm.rerank_clients import create_async_rerank_client

async def main():
    embedding_client = create_async_embedding_client("siliconflow", model="BAAI/bge-large-zh-v1.5")
    rerank_client = create_async_rerank_client("siliconflow", model="BAAI/bge-reranker-v2-m3")

    emb = await embedding_client.create_embeddings(input=["a", "b"])
    rr = await rerank_client.rerank(query="Apple", documents=["apple", "banana"])
    print(len(emb.data), len(rr.results))

asyncio.run(main())
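The custom_json_http entry in the rerank configuration above describes a request as a template with ${...} placeholders and a response as a set of JSONPath-style selectors. How vv-llm evaluates these internally is not shown in this document. The sketch below only illustrates the idea with two hypothetical helpers, render_body and map_results, and handles just the simple path forms used in the example ($.results[*] and $.field).

```python
import re

# Matches a value that consists of exactly one "${name}" placeholder.
PLACEHOLDER = re.compile(r"^\$\{(\w+)\}$")


def render_body(template: dict, values: dict) -> dict:
    """Replace values that are exactly "${name}" with values[name]."""
    body = {}
    for key, value in template.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        body[key] = values[match.group(1)] if match else value
    return body


def map_results(response: dict, results_path: str, field_map: dict) -> list:
    """Apply a "$.<key>[*]" results path and "$.<field>" selectors."""
    list_key = results_path.removeprefix("$.").removesuffix("[*]")
    return [
        {name: item[path.removeprefix("$.")] for name, path in field_map.items()}
        for item in response[list_key]
    ]


body = render_body(
    {"model": "${model_id}", "query": "${query}", "documents": "${documents}"},
    {"model_id": "BAAI/bge-reranker-v2-m3", "query": "Apple", "documents": ["apple", "banana"]},
)

# A made-up response in the shape the response_mapping expects.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.9},
        {"index": 1, "relevance_score": 0.1},
    ]
}
results = map_results(
    sample_response,
    "$.results[*]",
    {"index": "$.index", "relevance_score": "$.relevance_score"},
)
```

The helpers require Python 3.9 or later because they use str.removeprefix and str.removesuffix.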

Features

Unified interface — same create_completion / create_stream API across all providers
Embedding & rerank — unified sync/async retrieval clients with normalized outputs
Type-safe factory — create_chat_client(BackendType.X) returns the correct client type
Multi-endpoint — configure multiple endpoints per backend with random selection and failover
Tool calling — normalized tool/function calling across providers
Multimodal — text + image inputs where supported
Thinking/reasoning — access chain-of-thought from Claude, DeepSeek Reasoner, etc.
Token counting — per-model tokenizers (tiktoken, deepseek-tokenizer, qwen-tokenizer)
Rate limiting — RPM/TPM controls with memory, Redis, or DiskCache backends
Context length control — automatic message truncation to fit model limits
Prompt caching — Anthropic prompt caching support
Retry with backoff — configurable retry logic for transient failures
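Two of the features above, multi-endpoint failover and retry with backoff, are easier to picture with a small example. The following is a generic sketch of the technique, not vv-llm's implementation: call_with_failover, the endpoint names, and the delay parameters are all hypothetical.

```python
import random
import time


def call_with_failover(endpoints, send, max_rounds=3, base_delay=0.5, sleep=time.sleep):
    """Try endpoints in random order; back off exponentially between rounds.

    send(endpoint) performs the request and raises ConnectionError on failure.
    """
    last_error = None
    for attempt in range(max_rounds):
        # Random order spreads load; later entries act as failover targets.
        for endpoint in random.sample(endpoints, k=len(endpoints)):
            try:
                return send(endpoint)
            except ConnectionError as exc:  # treated as a transient failure
                last_error = exc
        if attempt < max_rounds - 1:
            # Delay doubles each round: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all endpoints failed") from last_error


def flaky_send(endpoint):
    """Demo transport: only the endpoint named 'backup' succeeds."""
    if endpoint != "backup":
        raise ConnectionError(f"{endpoint} unavailable")
    return f"ok from {endpoint}"


result = call_with_failover(["primary", "backup"], flaky_send)
print(result)
```

The sleep parameter is injectable so that the backoff schedule can be tested without waiting.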

Utilities

from vv_llm.chat_clients import format_messages, get_token_counts, get_message_token_counts

Function	Description
`format_messages`	Normalize multimodal/tool messages across formats
`get_token_counts`	Count tokens for a text string
`get_message_token_counts`	Count tokens for a message list

Optional Dependencies

pip install 'vv-llm[redis]'      # Redis rate limiting
pip install 'vv-llm[diskcache]'  # DiskCache rate limiting
pip install 'vv-llm[server]'     # FastAPI token server
pip install 'vv-llm[vertex]'     # Google Vertex AI
pip install 'vv-llm[bedrock]'    # AWS Bedrock
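Before selecting a rate-limiting backend it can be useful to check whether the corresponding extra is installed. The following is a minimal sketch using only the standard library. It assumes the redis and diskcache extras provide the importable modules redis and diskcache (the usual import names of those packages); available_extras is a hypothetical helper, not part of vv-llm.

```python
from importlib.util import find_spec


def available_extras(modules=("redis", "diskcache")) -> dict:
    """Map each module name to True if it can be imported in this environment."""
    return {name: find_spec(name) is not None for name in modules}


print(available_extras())
```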

Project Structure

src/vv_llm/
  chat_clients/       # Per-backend clients + factory
  embedding_clients/  # Embedding clients + factory
  rerank_clients/     # Rerank clients + factory
  retrieval_clients/  # Shared retrieval client internals
  settings/           # Configuration management
  types/              # Type definitions & enums
  utilities/          # Rate limiting, retry, media processing, token counting
  server/             # Optional token counting server

tests/unit/        # Unit tests
tests/live/        # Live integration tests (requires real API keys)

Development

pdm install -d          # Install dev dependencies
pdm run lint            # Ruff linter
pdm run format-check    # Ruff format check
pdm run type-check      # Ty type checker
pdm run test            # Unit tests
pdm run test-live       # Live tests (needs real endpoints)

License

MIT

Release history

Version	Date
0.3.95 (this version)	May 6, 2026
0.3.94	Apr 24, 2026
0.3.93	Apr 21, 2026
0.3.92	Apr 18, 2026
0.3.91	Apr 16, 2026
0.3.90	Apr 8, 2026
0.3.89	Apr 8, 2026
0.3.88	Apr 5, 2026
0.3.87	Apr 2, 2026
0.3.86	Mar 20, 2026
0.3.85	Mar 18, 2026
0.3.84	Mar 18, 2026
0.3.83	Mar 17, 2026
0.3.82	Mar 14, 2026
0.3.81	Mar 14, 2026
0.3.80	Mar 12, 2026
0.3.79	Mar 11, 2026
0.3.78	Mar 11, 2026
0.3.77	Mar 10, 2026
0.3.76	Mar 3, 2026
0.3.75	Feb 23, 2026
0.3.74	Feb 22, 2026
0.3.73	Feb 18, 2026

Download files

Source Distribution

vv_llm-0.3.95.tar.gz (72.9 kB)

Uploaded May 6, 2026 Source

Built Distribution

vv_llm-0.3.95-py3-none-any.whl (88.0 kB)

Uploaded May 6, 2026 Python 3

File details

Details for the file vv_llm-0.3.95.tar.gz.

File metadata

Download URL: vv_llm-0.3.95.tar.gz
Upload date: May 6, 2026
Size: 72.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vv_llm-0.3.95.tar.gz
Algorithm	Hash digest
SHA256	`effb19607c428ed16b0415436d213f8d595b71bded19154f484ed3c8e52de045`
MD5	`0ab4d5b48bda039ba09e75940755fb94`
BLAKE2b-256	`a4dd5366da30a4e5429edc9fb9d73f4543384853adb483f2e111322b88947cee`

Provenance

The following attestation bundles were made for vv_llm-0.3.95.tar.gz:

Publisher: release.yml on AndersonBY/vv-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vv_llm-0.3.95.tar.gz
- Subject digest: effb19607c428ed16b0415436d213f8d595b71bded19154f484ed3c8e52de045
- Sigstore transparency entry: 1446218763
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: AndersonBY/vv-llm@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Branch / Tag: refs/tags/v0.3.95
- Owner: https://github.com/AndersonBY
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Trigger Event: push

File details

Details for the file vv_llm-0.3.95-py3-none-any.whl.

File metadata

Download URL: vv_llm-0.3.95-py3-none-any.whl
Upload date: May 6, 2026
Size: 88.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vv_llm-0.3.95-py3-none-any.whl
Algorithm	Hash digest
SHA256	`28c56fd769ed7291eeedc8fb38581c5c9199d9ecc43442fb6eb813f3849a55ef`
MD5	`e826838c53fa9e6c54e5b0ef2fd4fa50`
BLAKE2b-256	`67c49ffa97b6c3e1a1ba344749323498a84b60649fc02157eee33e0ace9879e8`

Provenance

The following attestation bundles were made for vv_llm-0.3.95-py3-none-any.whl:

Publisher: release.yml on AndersonBY/vv-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vv_llm-0.3.95-py3-none-any.whl
- Subject digest: 28c56fd769ed7291eeedc8fb38581c5c9199d9ecc43442fb6eb813f3849a55ef
- Sigstore transparency entry: 1446218852
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: AndersonBY/vv-llm@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Branch / Tag: refs/tags/v0.3.95
- Owner: https://github.com/AndersonBY
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c1c43e3c974b85084eefe4b8ae29af440b557dd
- Trigger Event: push
