
Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency


edgequake-litellm


edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm
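
For a gradual rollout, the swap can be made conditional. The following is a minimal sketch, not something the package prescribes: it uses `importlib.util.find_spec` to see which module is installed and imports the first one found. The helper name `load_backend` and the preference order are assumptions made for illustration.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional, Sequence


def load_backend(
    candidates: Sequence[str] = ("edgequake_litellm", "litellm"),
) -> Optional[ModuleType]:
    """Import and return the first installed candidate module, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


if __name__ == "__main__":
    litellm = load_backend()
    print("backend:", getattr(litellm, "__name__", "none installed"))
```

Because both modules are expected to share the same call shape, the rest of the application can keep using the `litellm` name regardless of which backend was loaded.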

Install

pip install edgequake-litellm

Supported wheel targets:

| Platform | Architectures |
| --- | --- |
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, arm64 |
| Windows | x86_64 |

The package uses abi3-py39, so one wheel per platform covers Python 3.9+.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
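
Streaming callers often need the complete reply as well as the incremental output. The sketch below shows one way to accumulate the deltas; it runs offline against a stand-in async generator whose chunks mimic the `chunk.choices[0].delta.content` shape used above. `fake_chunk`, `fake_stream`, and `collect_text` are hypothetical names, not part of the package.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator, List, Optional


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build a stand-in chunk shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def fake_stream() -> AsyncIterator[SimpleNamespace]:
    """Hypothetical stream; the final chunk carries no text."""
    for text in ["Ownership ", "frees memory ", "at scope exit.", None]:
        yield fake_chunk(text)


async def collect_text(stream: AsyncIterator[SimpleNamespace]) -> str:
    """Join every delta into the full reply, treating None as empty."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream()))
print(full_text)
```

In real use, the object returned by `await litellm.acompletion(..., stream=True)` would take the place of `fake_stream()`.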

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])
print(len(result[0]))
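
A common next step with embeddings is comparing two vectors. The sketch below computes cosine similarity with the standard library only; the input vectors are made-up values rather than real model output, and `cosine_similarity` is a hypothetical helper, not a function the package provides.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative vectors; real embeddings have hundreds of dimensions.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 1.0, 0.0]
print(round(cosine_similarity(v1, v2), 3))
```

With the result from the example above, the inputs would be `result.data[0].embedding` and `result.data[1].embedding`.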

Model Discovery

Programmatic model listing, capability filtering, and name/fuzzy search — backed by the Rust discovery engine:

import edgequake_litellm as litellm

# List providers (unified catalog — includes cohere, nvidia, etc.)
print(litellm.list_providers())

# Filter by capabilities (live discovery)
models = litellm.discovery.find_models(
    requires_vision=True,
    min_context_length=100_000,
    max_output_tokens=32_768,
)

# Offline capability search (no API keys)
static = litellm.discovery.find_static_models(requires_thinking=True)

# Search by name / fuzzy with input & output length bounds
hits = litellm.discovery.search_static_models_by_name(
    "claude sonnet",
    fuzzy=True,
    min_context_length=200_000,
    min_output_tokens=16_384,
)
for hit in hits:
    print(f"{hit.model.provider}/{hit.model.id} score={hit.score:.2f} ({hit.match_kind})")

# Exact lookup by ID or display name
model = litellm.discovery.lookup_model_by_name("openai", "GPT-4.1")

See docs/discovery.md for the full Rust + Python API reference.
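
To make the idea of a scored fuzzy search with a context-length bound concrete, here is a small offline sketch. It is not the Rust discovery engine's algorithm: it swaps in `difflib.SequenceMatcher` from the standard library, and the catalog entries, context lengths, threshold, and the `fuzzy_search` name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class ModelEntry:
    provider: str
    id: str
    context_length: int


# Hypothetical catalog; the context lengths are illustrative only.
CATALOG = [
    ModelEntry("anthropic", "claude-3-5-sonnet-20241022", 200_000),
    ModelEntry("openai", "gpt-4o-mini", 128_000),
    ModelEntry("gemini", "gemini-2.5-flash", 1_000_000),
]


def fuzzy_search(
    query: str, min_context_length: int = 0, threshold: float = 0.4
) -> List[Tuple[ModelEntry, float]]:
    """Return (entry, score) pairs above the threshold, best match first."""
    needle = query.lower().replace(" ", "-")
    hits = []
    for entry in CATALOG:
        # Apply the hard capability bound before scoring.
        if entry.context_length < min_context_length:
            continue
        score = SequenceMatcher(None, needle, entry.id.lower()).ratio()
        if score >= threshold:
            hits.append((entry, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


for entry, score in fuzzy_search("claude sonnet", min_context_length=100_000):
    print(f"{entry.provider}/{entry.id} score={score:.2f}")
```

Filtering before scoring keeps the similarity ranking from surfacing models that cannot satisfy the stated bounds.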

Provider Routing

Pass provider/model as the model argument:

| Provider | Example |
| --- | --- |
| OpenAI | openai/gpt-4o-mini |
| Azure OpenAI | azure/my-gpt4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Gemini | gemini/gemini-2.5-flash |
| Vertex AI | vertexai/gemini-2.5-flash |
| xAI | xai/grok-4 |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| NVIDIA NIM | nvidia/meta/llama-3.1-8b-instruct |
| Mistral | mistral/mistral-large-latest |
| AWS Bedrock | bedrock/amazon.nova-lite-v1:0 |
| HuggingFace | huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct |
| OpenAI Compatible | openai-compatible/deepseek-chat |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| VSCode Copilot | vscode-copilot/auto |
| Mock | mock/test-model |

Embedding-only backend:

| Provider | Example |
| --- | --- |
| Jina | jina/jina-embeddings-v3 |
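
Several of the model names above contain slashes of their own (for example the OpenRouter and NVIDIA NIM entries), so only the first slash separates the provider from the model. The sketch below illustrates that naming convention; it is not the package's own parser, and `split_model` is a hypothetical helper. It also ignores any module-level default provider.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


print(split_model("openai/gpt-4o-mini"))
print(split_model("openrouter/meta-llama/llama-3.1-70b-instruct"))
```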

Supported Features

| Provider | Chat | Stream | Tools | Embeddings | Notes |
| --- | --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes | Yes | Yes | includes max_completion_tokens handling |
| Azure OpenAI | Yes | Yes | Yes | Yes | deployment-based routing |
| Anthropic | Yes | Yes | Yes | No | Claude extended thinking surfaced in response metadata |
| Gemini | Yes | Yes | Yes | Yes | Google AI Studio |
| Vertex AI | Yes | Yes | Yes | Yes | GCP auth / ADC |
| xAI | Yes | Yes | Yes | No | Grok |
| OpenRouter | Yes | Yes | Yes | No | gateway models |
| NVIDIA NIM | Yes | Yes | Yes | No | OpenAI-compatible hosted NIM |
| Mistral | Yes | Yes | Yes | Yes | native embeddings |
| AWS Bedrock | Yes | Yes | Yes | Yes | backed by the Rust Bedrock feature |
| HuggingFace | Yes | Yes | Limited | No | Inference API |
| OpenAI Compatible | Yes | Yes | Yes | Yes | Groq, Together, DeepSeek, custom gateways |
| Ollama | Yes | Yes | Yes | Yes | local runtime |
| LM Studio | Yes | Yes | Yes | Yes | local OpenAI-compatible server |
| VSCode Copilot | Yes | Yes | Yes | Yes | direct auth by default, proxy optional |
| Jina | No | No | No | Yes | embeddings only |
| Mock | Yes | No | Yes | Yes | unit tests / local development |

Application Attribution

Propagate caller identity to upstream providers (for example the OpenAI-Project header for OpenAI, the referer/title headers for OpenRouter, and X-Client-Request-Id for Ollama) and to OTEL spans.

import edgequake_litellm as eq
from edgequake_litellm import ApplicationContext, get_provider_attribution

# Per-call kwargs
eq.completion(
    "openrouter/anthropic/claude-3.5-sonnet",
    [{"role": "user", "content": "hi"}],
    application_id="my-backend",
    application_name="My Service",
    application_url="https://app.example.com",
    request_id="req-123",
)

# Reusable context
ctx = ApplicationContext(application_id="my-backend", request_id="req-456")
eq.completion("mock/test-model", [{"role": "user", "content": "hi"}], application_context=ctx)

# Catalog: full | passthrough | observability_only | none
assert get_provider_attribution("openai") == "full"
assert get_provider_attribution("ollama") == "passthrough"

To build a context from an incoming web request, use ApplicationContext.from_headers(request.headers).

Defaults from env: EDGEQUAKE_APP_ID, EDGEQUAKE_APP_NAME, EDGEQUAKE_APP_URL, EDGEQUAKE_TENANT_ID.
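
When attribution is passed as per-call kwargs, each request usually needs its own `request_id`. The sketch below is one hedged way to build those kwargs: it reuses an upstream ID when one is available and otherwise generates a fresh one. The `attribution_kwargs` helper and the `req-` prefix are assumptions for illustration; only the `application_id` and `request_id` keyword names come from the example above.

```python
import uuid
from typing import Dict, Optional


def attribution_kwargs(
    application_id: str, request_id: Optional[str] = None
) -> Dict[str, str]:
    """Build per-call attribution kwargs, generating a request ID if needed."""
    return {
        "application_id": application_id,
        # Keep an upstream ID when given so traces stay correlated.
        "request_id": request_id or f"req-{uuid.uuid4().hex}",
    }


kwargs = attribution_kwargs("my-backend")
print(sorted(kwargs))
# Intended use (not executed here):
#   eq.completion(model, messages, **attribution_kwargs("my-backend"))
```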

See the migration guide and the observability documentation.

Environment Setup

| Provider | Required environment |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY |
| Vertex AI | GOOGLE_CLOUD_PROJECT and ADC or GOOGLE_ACCESS_TOKEN |
| xAI | XAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Mistral | MISTRAL_API_KEY |
| AWS Bedrock | standard AWS credential chain plus AWS_REGION |
| HuggingFace | HF_TOKEN or HUGGINGFACE_TOKEN |
| OpenAI Compatible | OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY |
| Ollama | optional OLLAMA_HOST |
| LM Studio | optional LMSTUDIO_HOST |
| VSCode Copilot | optional VSCODE_COPILOT_PROXY_URL; otherwise reuse the official VS Code Copilot auth cache |
| Jina | JINA_API_KEY |
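
A startup check can catch a missing variable before the first request fails. The following is a minimal sketch covering a subset of the providers above; the `REQUIRED_ENV` mapping and the `missing_env` helper are hypothetical, and the package may perform its own validation.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Each provider maps to groups of variables; one variable per group must be set.
# Only a subset of the providers is shown.
REQUIRED_ENV: Dict[str, List[Tuple[str, ...]]] = {
    "openai": [("OPENAI_API_KEY",)],
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "gemini": [("GEMINI_API_KEY", "GOOGLE_API_KEY")],
    "azure": [
        ("AZURE_OPENAI_ENDPOINT",),
        ("AZURE_OPENAI_API_KEY",),
        ("AZURE_OPENAI_DEPLOYMENT_NAME",),
    ],
    "huggingface": [("HF_TOKEN", "HUGGINGFACE_TOKEN")],
    "jina": [("JINA_API_KEY",)],
}


def missing_env(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a description of each unsatisfied variable group."""
    missing = []
    for group in REQUIRED_ENV.get(provider, []):
        # A group is satisfied when any of its variables is non-empty.
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


print(missing_env("gemini", env={}))
print(missing_env("gemini", env={"GOOGLE_API_KEY": "placeholder"}))
```

Passing `env` explicitly keeps the check easy to unit test; omitting it reads the real process environment.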

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

  • LITELLM_EDGE_PROVIDER
  • LITELLM_EDGE_MODEL
  • LITELLM_EDGE_TIMEOUT
  • LITELLM_EDGE_MAX_RETRIES
  • LITELLM_EDGE_VERBOSE

LiteLLM Compatibility

Implemented:

  • completion()
  • acompletion()
  • embedding()
  • aembedding()
  • stream=True on acompletion()
  • stream() async generator
  • response.choices[0].message.content
  • response.to_dict()
  • AuthenticationError, RateLimitError, NotFoundError, Timeout
  • list_providers()
  • detect_provider()
  • discovery.discover_all() / adiscover_all()
  • discovery.find_models() / afind_models()
  • discovery.find_static_models()
  • discovery.search_models() / search_static_models_by_name() / asearch_models()
  • discovery.lookup_model_by_name()
  • discovery.get_model_info("provider/model")
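
The exception classes listed above make it possible to apply a caller-side retry policy. The sketch below retries on rate limiting with exponential backoff and jitter. It defines a stand-in `RateLimitError` so it runs offline; in real code the class would come from the package, and its exact import path is an assumption. `with_retries` and `flaky_call` are hypothetical names. Since the package also reads LITELLM_EDGE_MAX_RETRIES, built-in retries may already cover simple cases.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the package's RateLimitError so the sketch runs offline."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # Delay doubles per attempt; jitter spreads out concurrent clients.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("loop always returns or raises")


attempts = {"count": 0}


def flaky_call() -> str:
    """Fail twice with a rate limit, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


# A no-op sleep keeps the demonstration instant.
result = with_retries(flaky_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

Only rate limiting is retried here; an authentication failure would not succeed on a second attempt, so it is left to propagate.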

Behavior notes:

  • synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
  • unsupported or extra keyword arguments are dropped for LiteLLM parity
  • per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
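
Because synchronous streaming is not supported, code that cannot be made async needs a bridge. One possible approach, sketched here under stated assumptions, runs the async stream on a background thread and hands items to the caller through a queue. `iter_sync` and `fake_stream` are hypothetical; in real use the factory would be an async generator that wraps `await litellm.acompletion(..., stream=True)` and yields text from each chunk.

```python
import asyncio
import queue
import threading
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def iter_sync(make_stream: Callable[[], AsyncIterator[str]]) -> Iterator[str]:
    """Yield items from an async stream to synchronous code as they arrive."""
    items: queue.Queue = queue.Queue()

    async def pump() -> None:
        try:
            async for item in make_stream():
                items.put(item)
        except Exception as exc:
            # Forward failures so the consumer can re-raise them.
            items.put(exc)
        finally:
            items.put(_DONE)

    # asyncio.run creates a fresh event loop inside the worker thread.
    worker = threading.Thread(target=lambda: asyncio.run(pump()), daemon=True)
    worker.start()
    while True:
        item = items.get()
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item
        yield item
    worker.join()


async def fake_stream() -> AsyncIterator[str]:
    """Stand-in for a real streamed completion."""
    for piece in ["sync ", "callers ", "can iterate"]:
        await asyncio.sleep(0)
        yield piece


print("".join(iter_sync(fake_stream)))
```

The sketch does not cancel the background stream if the consumer stops iterating early; a production version would need to handle that case.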

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...

import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors[0]))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=9.0.3" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

The Python package and the Rust crate use separate release tags:

  • Rust crate: vX.Y.Z
  • Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

  1. bump edgequake-litellm/Cargo.toml
  2. bump edgequake-litellm/pyproject.toml
  3. update CHANGELOG.md
  4. push the release-prep commit
  5. wait for python-ci.yml to go green
  6. push py-vX.Y.Z

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.
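
Since the two tag formats trigger different release lines, it can help to validate a tag before pushing it. The sketch below classifies a tag by the formats described above. It assumes X.Y.Z means three dot-separated numeric components, and `classify_tag` is a hypothetical helper rather than part of the project's tooling.

```python
import re
from typing import Optional, Tuple

_VERSION = r"\d+\.\d+\.\d+"  # assumption: plain X.Y.Z with numeric parts
_PATTERNS = (
    ("python", re.compile(rf"py-v({_VERSION})")),
    ("rust", re.compile(rf"v({_VERSION})")),
)


def classify_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Return (release_line, version) for a recognised tag, else None."""
    for release_line, pattern in _PATTERNS:
        # fullmatch rejects prefixes, suffixes, and trailing newlines.
        match = pattern.fullmatch(tag)
        if match:
            return release_line, match.group(1)
    return None


print(classify_tag("py-v0.10.0"))
print(classify_tag("v0.10.0"))
print(classify_tag("0.10.0"))
```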

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.
