Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency

edgequake-litellm

edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm

Install

pip install edgequake-litellm

Supported wheel targets:

| Platform | Architectures |
| --- | --- |
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, arm64 |
| Windows | x86_64 |

The package uses abi3-py39, so one wheel per platform covers Python 3.9+.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])
print(len(result[0]))
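
Once you have vectors such as `result.data[0].embedding`, a common next step is to compare them. The following is a minimal pure-Python sketch of cosine similarity; `cosine_similarity` is a hypothetical helper written for this illustration, not part of edgequake-litellm, and the two short vectors are stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would come from result.data[i].embedding.
first = [1.0, 0.0, 1.0]
second = [1.0, 1.0, 0.0]
print(round(cosine_similarity(first, second), 3))
```

For large batches a NumPy implementation would be faster; the loop version is shown only because it has no dependencies.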

Model Discovery

The discovery module lists models, filters them by capability, and searches them by exact or fuzzy name. It is backed by the Rust discovery engine:

import edgequake_litellm as litellm

# List providers (unified catalog — includes cohere, nvidia, etc.)
print(litellm.list_providers())

# Filter by capabilities (live discovery)
models = litellm.discovery.find_models(
    requires_vision=True,
    min_context_length=100_000,
    max_output_tokens=32_768,
)

# Offline capability search (no API keys)
static = litellm.discovery.find_static_models(requires_thinking=True)

# Search by name / fuzzy with input & output length bounds
hits = litellm.discovery.search_static_models_by_name(
    "claude sonnet",
    fuzzy=True,
    min_context_length=200_000,
    min_output_tokens=16_384,
)
for hit in hits:
    print(f"{hit.model.provider}/{hit.model.id} score={hit.score:.2f} ({hit.match_kind})")

# Exact lookup by ID or display name
model = litellm.discovery.lookup_model_by_name("openai", "GPT-4.1")

See docs/discovery.md for the full Rust + Python API reference.

Provider Routing

Pass provider/model as the model argument:

| Provider | Example |
| --- | --- |
| OpenAI | openai/gpt-4o-mini |
| Azure OpenAI | azure/my-gpt4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Gemini | gemini/gemini-2.5-flash |
| Vertex AI | vertexai/gemini-2.5-flash |
| xAI | xai/grok-4 |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| NVIDIA NIM | nvidia/meta/llama-3.1-8b-instruct |
| Mistral | mistral/mistral-large-latest |
| AWS Bedrock | bedrock/amazon.nova-lite-v1:0 |
| HuggingFace | huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct |
| OpenAI Compatible | openai-compatible/deepseek-chat |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| VSCode Copilot | vscode-copilot/auto |
| Mock | mock/test-model |

Embedding-only backend:

| Provider | Example |
| --- | --- |
| Jina | jina/jina-embeddings-v3 |
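
Several of the model strings above contain more than one slash (for example the OpenRouter and NVIDIA NIM entries), so the provider prefix is presumably everything before the first slash. The sketch below illustrates that reading; `split_model_string` is a hypothetical helper for this example and is not the package's own routing code (the package exposes `detect_provider()`, whose signature is not shown here).

```python
from typing import Tuple


def split_model_string(model: str) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


for example in (
    "openai/gpt-4o-mini",
    "openrouter/meta-llama/llama-3.1-70b-instruct",
    "bedrock/amazon.nova-lite-v1:0",
):
    # The model part keeps any further slashes or colons untouched.
    print(split_model_string(example))
```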

Supported Features

| Provider | Chat | Stream | Tools | Embeddings | Notes |
| --- | --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes | Yes | Yes | includes max_completion_tokens handling |
| Azure OpenAI | Yes | Yes | Yes | Yes | deployment-based routing |
| Anthropic | Yes | Yes | Yes | No | Claude extended thinking surfaced in response metadata |
| Gemini | Yes | Yes | Yes | Yes | Google AI Studio |
| Vertex AI | Yes | Yes | Yes | Yes | GCP auth / ADC |
| xAI | Yes | Yes | Yes | No | Grok |
| OpenRouter | Yes | Yes | Yes | No | gateway models |
| NVIDIA NIM | Yes | Yes | Yes | No | OpenAI-compatible hosted NIM |
| Mistral | Yes | Yes | Yes | Yes | native embeddings |
| AWS Bedrock | Yes | Yes | Yes | Yes | backed by the Rust Bedrock feature |
| HuggingFace | Yes | Yes | Limited | No | Inference API |
| OpenAI Compatible | Yes | Yes | Yes | Yes | Groq, Together, DeepSeek, custom gateways |
| Ollama | Yes | Yes | Yes | Yes | local runtime |
| LM Studio | Yes | Yes | Yes | Yes | local OpenAI-compatible server |
| VSCode Copilot | Yes | Yes | Yes | Yes | direct auth by default, proxy optional |
| Jina | No | No | No | Yes | embeddings only |
| Mock | Yes | No | Yes | Yes | unit tests / local development |

Environment Setup

| Provider | Required environment |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY |
| Vertex AI | GOOGLE_CLOUD_PROJECT and ADC or GOOGLE_ACCESS_TOKEN |
| xAI | XAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Mistral | MISTRAL_API_KEY |
| AWS Bedrock | standard AWS credential chain plus AWS_REGION |
| HuggingFace | HF_TOKEN or HUGGINGFACE_TOKEN |
| OpenAI Compatible | OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY |
| Ollama | optional OLLAMA_HOST |
| LM Studio | optional LMSTUDIO_HOST |
| VSCode Copilot | optional VSCODE_COPILOT_PROXY_URL; otherwise reuse the official VS Code Copilot auth cache |
| Jina | JINA_API_KEY |
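
A missing variable is easier to diagnose before the first request than after it. The following is a small preflight sketch that reports which required variables are unset for a provider. `REQUIRED_ENV` and `missing_env` are hypothetical names introduced here, the mapping is transcribed by hand from the table above, and it only covers providers whose requirement is a plain variable name (Vertex AI and AWS Bedrock rely on credential chains that a simple lookup cannot check).

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Each provider maps to a list of groups; one variable per group must be set.
# Hand-transcribed from the README table, not read from the package.
REQUIRED_ENV: Dict[str, List[Tuple[str, ...]]] = {
    "openai": [("OPENAI_API_KEY",)],
    "azure": [
        ("AZURE_OPENAI_ENDPOINT",),
        ("AZURE_OPENAI_API_KEY",),
        ("AZURE_OPENAI_DEPLOYMENT_NAME",),
    ],
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "gemini": [("GEMINI_API_KEY", "GOOGLE_API_KEY")],
    "xai": [("XAI_API_KEY",)],
    "openrouter": [("OPENROUTER_API_KEY",)],
    "nvidia": [("NVIDIA_API_KEY",)],
    "mistral": [("MISTRAL_API_KEY",)],
    "huggingface": [("HF_TOKEN", "HUGGINGFACE_TOKEN")],
    "openai-compatible": [("OPENAI_COMPATIBLE_BASE_URL",)],
    "jina": [("JINA_API_KEY",)],
}


def missing_env(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a description of each unsatisfied requirement for a provider."""
    env = os.environ if env is None else env
    missing: List[str] = []
    for group in REQUIRED_ENV.get(provider, []):
        # A group is satisfied when any one of its alternatives is non-empty.
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


# Passing an explicit mapping keeps the example independent of the real shell.
print(missing_env("gemini", {}))
print(missing_env("gemini", {"GOOGLE_API_KEY": "placeholder"}))
```

Providers not present in the mapping (for example `ollama`, whose variables are optional) return an empty list.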

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

  • LITELLM_EDGE_PROVIDER
  • LITELLM_EDGE_MODEL
  • LITELLM_EDGE_TIMEOUT
  • LITELLM_EDGE_MAX_RETRIES
  • LITELLM_EDGE_VERBOSE
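
The value formats for these variables, and how they interact with `set_default_provider()` and `set_default_model()`, are not spelled out here. A safe thing to do without guessing is to log which of them are set at startup. The sketch below does only that; `edge_env_overrides` is a hypothetical helper, not a package function.

```python
import os
from typing import Dict, Mapping, Optional

# The five names listed above, copied verbatim.
EDGE_ENV_NAMES = (
    "LITELLM_EDGE_PROVIDER",
    "LITELLM_EDGE_MODEL",
    "LITELLM_EDGE_TIMEOUT",
    "LITELLM_EDGE_MAX_RETRIES",
    "LITELLM_EDGE_VERBOSE",
)


def edge_env_overrides(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the LITELLM_EDGE_* variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: env[name] for name in EDGE_ENV_NAMES if env.get(name)}


# Example with an explicit mapping instead of the real environment.
sample = {"LITELLM_EDGE_PROVIDER": "anthropic", "PATH": "/usr/bin"}
print(edge_env_overrides(sample))
```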

LiteLLM Compatibility

Implemented:

  • completion()
  • acompletion()
  • embedding()
  • aembedding()
  • stream=True on acompletion()
  • stream() async generator
  • response.choices[0].message.content
  • response.to_dict()
  • AuthenticationError, RateLimitError, NotFoundError, Timeout
  • list_providers()
  • detect_provider()
  • discovery.discover_all() / adiscover_all()
  • discovery.find_models() / afind_models()
  • discovery.find_static_models()
  • discovery.search_models() / search_static_models_by_name() / asearch_models()
  • discovery.lookup_model_by_name()
  • discovery.get_model_info("provider/model")
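
Because the exception names match LiteLLM's, existing error handling should carry over. As one illustration, the sketch below wraps a call in a retry loop with exponential backoff. `call_with_retry` is a hypothetical helper, and `FakeRateLimitError` is a local stand-in so the example runs without the package; in real code you would pass `litellm.RateLimitError` instead. The package may already retry internally (see `LITELLM_EDGE_MAX_RETRIES`), so treat this as an optional outer layer.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a listed exception, wait and retry with a doubled delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error unchanged.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


class FakeRateLimitError(Exception):
    """Stand-in for RateLimitError so the sketch runs without the package."""


calls = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("slow down")
    return "ok"


# The sleep function is injected so the demo does not actually wait.
print(call_with_retry(flaky, (FakeRateLimitError,), sleep=lambda _: None))
```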

Behavior notes:

  • synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
  • unsupported or extra keyword arguments are dropped for LiteLLM parity
  • per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
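
Since synchronous streaming is not supported, code that runs outside an event loop can still consume a stream by driving it with `asyncio.run`. The sketch below shows one way to gather the chunks into a single string. `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in that mimics the `chunk.choices[0].delta.content` shape used in the Quick Start so the example runs without network access.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, List


async def collect_stream(stream: AsyncIterator[Any]) -> str:
    """Concatenate the text deltas of an async chunk stream."""
    parts: List[str] = []
    async for chunk in stream:
        # A delta may carry no text (content is None), so fall back to "".
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Any]:
    """Stand-in for the object returned by acompletion(..., stream=True)."""
    for text in ("Hello", ", ", None, "world"):
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


print(asyncio.run(collect_stream(fake_stream())))
```

With the real package, the stream would come from `await litellm.acompletion(..., stream=True)` inside a coroutine, which is then passed to `collect_stream`.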

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...

import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors[0]))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=8" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

Python release tags are separate from the Rust crate's tags:

  • Rust crate: vX.Y.Z
  • Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

  1. bump edgequake-litellm/Cargo.toml
  2. bump edgequake-litellm/pyproject.toml
  3. update CHANGELOG.md
  4. push the release-prep commit
  5. wait for python-ci.yml to go green
  6. push py-vX.Y.Z

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.
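
Steps 1, 2, and 6 are easy to get out of sync. The sketch below is one possible pre-tag check, assuming both files declare the same version in a top-level `version = "..."` line (an assumption; the actual files may differ, for example if the Python version is derived dynamically). `first_version` and `release_tag` are hypothetical helpers, and the TOML snippets are placeholders rather than the real file contents.

```python
import re
from typing import Optional

# Matches a line that starts with: version = "X.Y.Z"
_VERSION_LINE = re.compile(r'^\s*version\s*=\s*"([^"]+)"', re.MULTILINE)


def first_version(toml_text: str) -> Optional[str]:
    """Return the first version string found at the start of a line."""
    match = _VERSION_LINE.search(toml_text)
    return match.group(1) if match else None


def release_tag(cargo_toml: str, pyproject_toml: str) -> str:
    """Return the py-vX.Y.Z tag if both manifests agree on the version."""
    cargo = first_version(cargo_toml)
    py = first_version(pyproject_toml)
    if cargo is None or py is None:
        raise ValueError("could not find a version line in one of the files")
    if cargo != py:
        raise ValueError(f"version mismatch: Cargo.toml={cargo} pyproject.toml={py}")
    return f"py-v{py}"


# Placeholder manifests; in practice, read the two files from disk.
cargo_text = '[package]\nname = "example"\nversion = "1.2.3"\n'
pyproject_text = '[project]\nname = "example"\nversion = "1.2.3"\n'
print(release_tag(cargo_text, pyproject_text))
```

A regular expression is used instead of a TOML parser only to keep the sketch dependency-free on older Python versions; `tomllib` would be more robust on Python 3.11+.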

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.
