Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency


Project description

edgequake-litellm

edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm
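During a migration you may want code that prefers the native package when it is installed and falls back to upstream LiteLLM otherwise. This is a minimal sketch that assumes both packages are importable under the top-level module names `edgequake_litellm` and `litellm`. `load_llm_backend` is a hypothetical helper and is not part of either package.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional, Sequence


def load_llm_backend(
    candidates: Sequence[str] = ("edgequake_litellm", "litellm"),
) -> Optional[ModuleType]:
    """Import and return the first installed module from `candidates`, or None."""
    for name in candidates:
        # find_spec returns None for a top-level module that is not installed
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


# Usage: litellm = load_llm_backend()  (None means neither package is installed)
```

Because the call shape is shared, the rest of the code can stay unchanged whichever module is returned.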

Install

pip install edgequake-litellm

Supported wheel targets:

Platform	Architectures
Linux (glibc)	`x86_64`, `aarch64`
Linux (musl)	`x86_64`, `aarch64`
macOS	`x86_64`, `arm64`
Windows	`x86_64`

Wheels are built against the CPython stable ABI (abi3, minimum Python 3.9), so a single wheel per platform and architecture covers CPython 3.9 and later.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
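If you need the full text in addition to live output, you can accumulate the deltas while printing them. This is a minimal sketch that assumes chunks expose `choices[0].delta.content`, as in the example above. `fake_stream` is a hypothetical stand-in built from `SimpleNamespace` objects, so the snippet runs without network access or API keys.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, List


async def collect_stream(stream: AsyncIterator[Any], echo: bool = True) -> str:
    """Consume a streaming response and return the concatenated delta content."""
    parts: List[str] = []
    async for chunk in stream:
        # Some chunks (often the last one) may carry no content, i.e. None
        piece = chunk.choices[0].delta.content or ""
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    if echo:
        print()
    return "".join(parts)


async def fake_stream(pieces: List[Any]) -> AsyncIterator[Any]:
    """Hypothetical stand-in for `await litellm.acompletion(..., stream=True)`."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


text = asyncio.run(collect_stream(fake_stream(["Rust ", "owns ", "memory.", None])))
```

With the real package, you would pass the object returned by `await litellm.acompletion(..., stream=True)` to `collect_stream` inside your own coroutine.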

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])
print(len(result[0]))
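Embedding vectors are usually compared with cosine similarity, for example to rank documents against a query. This is a minimal pure-Python sketch. In practice you would pass in the vectors returned by `embedding()` (such as `result.data[0].embedding`); here small literal lists stand in for them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
doc_vec = [1.0, 1.0]
print(round(cosine_similarity(query_vec, doc_vec), 4))
```

For large batches, a vectorized library such as NumPy is faster. The arithmetic stays the same.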

Model Discovery

Programmatic model listing, capability filtering, and name or fuzzy search, backed by the Rust discovery engine:

import edgequake_litellm as litellm

# List providers (unified catalog, including cohere, nvidia, etc.)
print(litellm.list_providers())

# Filter by capabilities (live discovery)
models = litellm.discovery.find_models(
    requires_vision=True,
    min_context_length=100_000,
    max_output_tokens=32_768,
)

# Offline capability search (no API keys)
static = litellm.discovery.find_static_models(requires_thinking=True)

# Search by name (optionally fuzzy), bounded by context length and output tokens
hits = litellm.discovery.search_static_models_by_name(
    "claude sonnet",
    fuzzy=True,
    min_context_length=200_000,
    min_output_tokens=16_384,
)
for hit in hits:
    print(f"{hit.model.provider}/{hit.model.id} score={hit.score:.2f} ({hit.match_kind})")

# Exact lookup by ID or display name
model = litellm.discovery.lookup_model_by_name("openai", "GPT-4.1")

See docs/discovery.md for the full Rust + Python API reference.
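The sketch below makes the filter semantics concrete. It applies the same kind of constraints to a small local list of model records and ranks name matches with `difflib.SequenceMatcher`. This is an illustration under assumptions: the `ModelRecord` fields, the threshold, and the use of `difflib` are all hypothetical, and they are not how the Rust discovery engine scores fuzzy matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical, simplified stand-in for a discovered model entry."""

    provider: str
    id: str
    context_length: int
    max_output_tokens: int
    vision: bool = False


def filter_models(
    models: Iterable[ModelRecord],
    requires_vision: bool = False,
    min_context_length: int = 0,
    min_output_tokens: int = 0,
) -> List[ModelRecord]:
    """Keep models that satisfy every capability constraint."""
    return [
        m
        for m in models
        if (m.vision or not requires_vision)
        and m.context_length >= min_context_length
        and m.max_output_tokens >= min_output_tokens
    ]


def fuzzy_search(
    models: Iterable[ModelRecord], query: str, threshold: float = 0.5
) -> List[Tuple[float, ModelRecord]]:
    """Score model IDs against the query and return matches best-first."""
    q = query.lower()
    scored = []
    for m in models:
        score = SequenceMatcher(None, q, m.id.lower()).ratio()
        if score >= threshold:
            scored.append((score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Illustrative records only; the numbers are not real model limits.
catalog = [
    ModelRecord("demo", "sonnet-demo", 200_000, 64_000, vision=True),
    ModelRecord("demo", "mini-demo", 128_000, 16_384, vision=True),
    ModelRecord("demo", "local-demo", 8_192, 4_096),
]
```

A misspelled query such as `fuzzy_search(catalog, "sonet")` still ranks `sonnet-demo` first. This is the kind of tolerance `fuzzy=True` suggests.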

Provider Routing

Pass provider/model as the model argument:

Provider	Example
OpenAI	`openai/gpt-4o-mini`
Azure OpenAI	`azure/my-gpt4o-deployment`
Anthropic	`anthropic/claude-3-5-sonnet-20241022`
Gemini	`gemini/gemini-2.5-flash`
Vertex AI	`vertexai/gemini-2.5-flash`
xAI	`xai/grok-4`
OpenRouter	`openrouter/meta-llama/llama-3.1-70b-instruct`
NVIDIA NIM	`nvidia/meta/llama-3.1-8b-instruct`
Mistral	`mistral/mistral-large-latest`
AWS Bedrock	`bedrock/amazon.nova-lite-v1:0`
HuggingFace	`huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct`
OpenAI Compatible	`openai-compatible/deepseek-chat`
Ollama	`ollama/llama3.2`
LM Studio	`lmstudio/local-model`
VSCode Copilot	`vscode-copilot/auto`
Mock	`mock/test-model`
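The examples in the table suggest that only the first path segment names the provider. The remainder, which may itself contain `/`, `.` or `:`, is passed through as the model ID. This is a minimal sketch of that rule. It assumes that a bare model name falls back to a default provider, such as one configured with `set_default_provider()`. `split_model` is hypothetical and is not the package's actual parser.

```python
from typing import Optional, Tuple


def split_model(model: str, default_provider: Optional[str] = None) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only."""
    provider, sep, model_id = model.partition("/")
    if not sep:
        # No provider prefix: fall back to the configured default, if any
        if default_provider is None:
            raise ValueError(f"no provider prefix in {model!r} and no default provider")
        return default_provider, model
    if not provider or not model_id:
        raise ValueError(f"malformed model string: {model!r}")
    return provider, model_id


print(split_model("openrouter/meta-llama/llama-3.1-70b-instruct"))
```

Splitting only once keeps gateway-style IDs such as `meta-llama/llama-3.1-70b-instruct` intact.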

Embedding-only backend:

Provider	Example
Jina	`jina/jina-embeddings-v3`

Supported Features

Provider	Chat	Stream	Tools	Embeddings	Notes
OpenAI	Yes	Yes	Yes	Yes	includes `max_completion_tokens` handling
Azure OpenAI	Yes	Yes	Yes	Yes	deployment-based routing
Anthropic	Yes	Yes	Yes	No	Claude extended thinking surfaced in response metadata
Gemini	Yes	Yes	Yes	Yes	Google AI Studio
Vertex AI	Yes	Yes	Yes	Yes	GCP auth / ADC
xAI	Yes	Yes	Yes	No	Grok
OpenRouter	Yes	Yes	Yes	No	gateway models
NVIDIA NIM	Yes	Yes	Yes	No	OpenAI-compatible hosted NIM
Mistral	Yes	Yes	Yes	Yes	native embeddings
AWS Bedrock	Yes	Yes	Yes	Yes	backed by the Rust Bedrock feature
HuggingFace	Yes	Yes	Limited	No	Inference API
OpenAI Compatible	Yes	Yes	Yes	Yes	Groq, Together, DeepSeek, custom gateways
Ollama	Yes	Yes	Yes	Yes	local runtime
LM Studio	Yes	Yes	Yes	Yes	local OpenAI-compatible server
VSCode Copilot	Yes	Yes	Yes	Yes	direct auth by default, proxy optional
Jina	No	No	No	Yes	embeddings only
Mock	Yes	No	Yes	Yes	unit tests / local development

Application Attribution

Propagate caller identity to upstream providers (for example OpenAI's `OpenAI-Project` header, OpenRouter's referer and title headers, and Ollama's `X-Client-Request-Id`) and to OpenTelemetry (OTEL) spans.

import edgequake_litellm as eq
from edgequake_litellm import ApplicationContext, get_provider_attribution

# Per-call kwargs
eq.completion(
    "openrouter/anthropic/claude-3.5-sonnet",
    [{"role": "user", "content": "hi"}],
    application_id="my-backend",
    application_name="My Service",
    application_url="https://app.example.com",
    request_id="req-123",
)

# Reusable context
ctx = ApplicationContext(application_id="my-backend", request_id="req-456")
eq.completion("mock/test-model", [{"role": "user", "content": "hi"}], application_context=ctx)

# Catalog: full | passthrough | observability_only | none
assert get_provider_attribution("openai") == "full"
assert get_provider_attribution("ollama") == "passthrough"

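To build a context from an incoming web request, pass the framework's request headers: ApplicationContext.from_headers(request.headers).

Defaults are read from the environment variables EDGEQUAKE_APP_ID, EDGEQUAKE_APP_NAME, EDGEQUAKE_APP_URL, and EDGEQUAKE_TENANT_ID.

See the migration guide and the observability documentation for details.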
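You can think of the environment defaults as a merge in which explicit per-call values win over the `EDGEQUAKE_*` variables. This is a minimal sketch that assumes that precedence, which this page does not state. It maps each variable to a hypothetical key name. `resolve_app_attribution` is not part of the package.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from context fields to the documented environment variables
ENV_DEFAULTS = {
    "application_id": "EDGEQUAKE_APP_ID",
    "application_name": "EDGEQUAKE_APP_NAME",
    "application_url": "EDGEQUAKE_APP_URL",
    "tenant_id": "EDGEQUAKE_TENANT_ID",
}


def resolve_app_attribution(
    environ: Mapping[str, str] = os.environ, **overrides: Optional[str]
) -> Dict[str, str]:
    """Merge environment defaults with explicit values (assumed: explicit wins)."""
    resolved = {
        field: environ[var] for field, var in ENV_DEFAULTS.items() if environ.get(var)
    }
    # Skip overrides set to None so they do not erase environment defaults
    resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved


env = {"EDGEQUAKE_APP_ID": "env-app", "EDGEQUAKE_APP_NAME": "Env Service"}
print(resolve_app_attribution(env, application_id="my-backend", request_id="req-123"))
```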

Environment Setup

Provider	Required environment
OpenAI	`OPENAI_API_KEY`
Azure OpenAI	`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT_NAME`
Anthropic	`ANTHROPIC_API_KEY`
Gemini	`GEMINI_API_KEY` or `GOOGLE_API_KEY`
Vertex AI	`GOOGLE_CLOUD_PROJECT` and ADC or `GOOGLE_ACCESS_TOKEN`
xAI	`XAI_API_KEY`
OpenRouter	`OPENROUTER_API_KEY`
NVIDIA NIM	`NVIDIA_API_KEY`
Mistral	`MISTRAL_API_KEY`
AWS Bedrock	standard AWS credential chain plus `AWS_REGION`
HuggingFace	`HF_TOKEN` or `HUGGINGFACE_TOKEN`
OpenAI Compatible	`OPENAI_COMPATIBLE_BASE_URL`, optional `OPENAI_COMPATIBLE_API_KEY`
Ollama	optional `OLLAMA_HOST`
LM Studio	optional `LMSTUDIO_HOST`
VSCode Copilot	optional `VSCODE_COPILOT_PROXY_URL`; otherwise reuse the official VS Code Copilot auth cache
Jina	`JINA_API_KEY`
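A fail-fast preflight check before the first call turns a missing key into a clear startup error rather than an `AuthenticationError` at request time. This is a minimal sketch covering a subset of the table above. `REQUIRED_ENV` and `missing_credentials` are hypothetical. Each inner tuple lists alternatives, and any one of them suffices. Providers that rely on ADC, the AWS credential chain, or local runtimes are left out, because those credentials are not always visible as environment variables.

```python
import os
from typing import Dict, List, Mapping, Tuple

# provider prefix -> groups of variables; each group needs at least one set
REQUIRED_ENV: Dict[str, List[Tuple[str, ...]]] = {
    "openai": [("OPENAI_API_KEY",)],
    "azure": [
        ("AZURE_OPENAI_ENDPOINT",),
        ("AZURE_OPENAI_API_KEY",),
        ("AZURE_OPENAI_DEPLOYMENT_NAME",),
    ],
    "anthropic": [("ANTHROPIC_API_KEY",)],
    "gemini": [("GEMINI_API_KEY", "GOOGLE_API_KEY")],
    "xai": [("XAI_API_KEY",)],
    "openrouter": [("OPENROUTER_API_KEY",)],
    "nvidia": [("NVIDIA_API_KEY",)],
    "mistral": [("MISTRAL_API_KEY",)],
    "huggingface": [("HF_TOKEN", "HUGGINGFACE_TOKEN")],
    "openai-compatible": [("OPENAI_COMPATIBLE_BASE_URL",)],
    "jina": [("JINA_API_KEY",)],
}


def missing_credentials(provider: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Describe each unmet requirement; an empty list means the check passed.

    Providers not listed above are treated as having no required variables.
    """
    missing = []
    for group in REQUIRED_ENV.get(provider, []):
        if not any(environ.get(var) for var in group):
            missing.append(" or ".join(group))
    return missing


print(missing_credentials("gemini", {"GOOGLE_API_KEY": "example"}))
```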

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

LITELLM_EDGE_PROVIDER
LITELLM_EDGE_MODEL
LITELLM_EDGE_TIMEOUT
LITELLM_EDGE_MAX_RETRIES
LITELLM_EDGE_VERBOSE
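These variables arrive as strings, so whatever reads them must convert and validate them. This is a minimal sketch of one way to do that. It assumes seconds for `LITELLM_EDGE_TIMEOUT`, an integer retry count, and common truthy spellings for `LITELLM_EDGE_VERBOSE`. Those units and spellings are assumptions, not the package's documented parsing rules, and `EdgeSettings` is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


@dataclass
class EdgeSettings:
    provider: Optional[str] = None
    model: Optional[str] = None
    timeout_s: Optional[float] = None
    max_retries: Optional[int] = None
    verbose: bool = False


def settings_from_env(environ: Mapping[str, str] = os.environ) -> EdgeSettings:
    """Read LITELLM_EDGE_* variables into typed settings; unset values stay None."""

    def get(name: str) -> Optional[str]:
        value = environ.get(name, "").strip()
        return value or None

    timeout = get("LITELLM_EDGE_TIMEOUT")
    retries = get("LITELLM_EDGE_MAX_RETRIES")
    verbose = get("LITELLM_EDGE_VERBOSE")
    return EdgeSettings(
        provider=get("LITELLM_EDGE_PROVIDER"),
        model=get("LITELLM_EDGE_MODEL"),
        # float()/int() raise ValueError on malformed input, surfacing typos early
        timeout_s=float(timeout) if timeout is not None else None,
        max_retries=int(retries) if retries is not None else None,
        verbose=verbose is not None and verbose.lower() in _TRUTHY,
    )
```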

LiteLLM Compatibility

Implemented:

completion()
acompletion()
embedding()
aembedding()
stream=True on acompletion()
stream() async generator
response.choices[0].message.content
response.to_dict()
AuthenticationError, RateLimitError, NotFoundError, Timeout
list_providers()
detect_provider()
discovery.discover_all() / adiscover_all()
discovery.find_models() / afind_models()
discovery.find_static_models()
discovery.search_models() / search_static_models_by_name() / asearch_models()
discovery.lookup_model_by_name()
discovery.get_model_info("provider/model")

Behavior notes:

synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
unsupported or extra keyword arguments are dropped for LiteLLM parity
per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
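Because unknown keyword arguments are dropped, a misspelled option fails silently rather than raising an error. This is a minimal sketch of the filtering idea using `inspect.signature` against a hypothetical `fake_completion` function. It illustrates the behavior described above and is not the package's implementation. It also returns the dropped names so that a caller can log them.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    func: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition kwargs into those `func` accepts and those it would not."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    kept = {k: v for k, v in kwargs.items() if k in accepted}
    dropped = {k: v for k, v in kwargs.items() if k not in accepted}
    return kept, dropped


def fake_completion(model: str, messages: list, *, max_tokens: int = 256, temperature: float = 1.0) -> None:
    """Hypothetical target with a fixed set of supported options."""


kept, dropped = split_supported_kwargs(
    fake_completion, {"max_tokens": 64, "max_token": 64, "top_k": 5}
)
print(sorted(dropped))
```

Logging `dropped` at debug level keeps LiteLLM-style leniency while still making typos such as `max_token` visible.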

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...

import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors[0]))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=9.0.3" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

The Python package is tagged separately from the Rust crate:

Rust crate: vX.Y.Z
Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

bump edgequake-litellm/Cargo.toml
bump edgequake-litellm/pyproject.toml
update CHANGELOG.md
push the release-prep commit
wait for python-ci.yml to go green
push py-vX.Y.Z

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.

Project details


Release history

Version	Release date
0.10.0 (this version)	Jul 5, 2026
0.9.0	Jul 5, 2026
0.6.12	Apr 25, 2026
0.6.7	Apr 20, 2026
0.4.0	Apr 4, 2026
0.3.0	Apr 4, 2026
0.2.0	Mar 1, 2026
0.1.4	Feb 23, 2026
0.1.3	Feb 22, 2026
0.1.2	Feb 21, 2026
0.1.1	Feb 20, 2026
0.1.0	Feb 20, 2026

Download files

Download the file for your platform.

Source Distribution

edgequake_litellm-0.10.0.tar.gz (1.2 MB view details)

Uploaded Jul 5, 2026 Source

Built Distributions


edgequake_litellm-0.10.0-cp39-abi3-win_amd64.whl (7.4 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+Windows x86-64

edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_x86_64.whl (9.5 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+musllinux: musl 1.2+ x86-64

edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_aarch64.whl (9.3 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+musllinux: musl 1.2+ ARM64

edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.2 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+manylinux: glibc 2.17+ x86-64

edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (9.1 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+manylinux: glibc 2.17+ ARM64

edgequake_litellm-0.10.0-cp39-abi3-macosx_11_0_arm64.whl (8.3 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+macOS 11.0+ ARM64

edgequake_litellm-0.10.0-cp39-abi3-macosx_10_12_x86_64.whl (8.6 MB view details)

Uploaded Jul 5, 2026 CPython 3.9+macOS 10.12+ x86-64
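Each wheel name follows the `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` convention. A platform field may hold several dot-separated tags, as in the `manylinux_2_17_x86_64.manylinux2014_x86_64` wheels above. This is a minimal sketch that splits a name into its tags and checks the `cp39-abi3` promise against a given interpreter. `parse_wheel_filename` and `abi3_compatible` are hypothetical helpers; `packaging.utils.parse_wheel_filename` from the `packaging` library is a more complete alternative.

```python
import sys
from typing import Dict, Tuple


def parse_wheel_filename(filename: str) -> Dict[str, object]:
    """Split a wheel filename into name, version, build and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python.split("."),
        "abi": abi.split("."),
        "platforms": platform.split("."),
    }


def abi3_compatible(
    python_tag: str,
    implementation: str = sys.implementation.name,
    version: Tuple[int, int] = sys.version_info[:2],
) -> bool:
    """A cp3X-abi3 wheel loads on CPython 3.X and every later 3.x release."""
    if implementation != "cpython" or not python_tag.startswith("cp3"):
        return False
    minimum_minor = int(python_tag[3:])
    return version[0] == 3 and version[1] >= minimum_minor
```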

File details

Details for the file edgequake_litellm-0.10.0.tar.gz.

File metadata

Download URL: edgequake_litellm-0.10.0.tar.gz
Upload date: Jul 5, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`0b410bfc2757126b0c049754aeafb5a5f3e9af592e297d6e0a60a25a126e7680`
MD5	`7c51750eb1345269d8edf2383c4c3177`
BLAKE2b-256	`6eb89ce374d64bb877af7242e29d7f0e1b3f0fcd54d02ad303515b5ddaf23995`
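To check a downloaded artifact against the SHA256 digests listed here, hash the file in chunks and compare the result. This is a minimal sketch using `hashlib`. When installing, pip's hash-checking mode (`--require-hashes`, with `package==version --hash=sha256:...` lines in a requirements file) performs the same check automatically.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage:
# verify_sha256(Path("edgequake_litellm-0.10.0.tar.gz"),
#               "0b410bfc2757126b0c049754aeafb5a5f3e9af592e297d6e0a60a25a126e7680")
```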

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-win_amd64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-win_amd64.whl
Upload date: Jul 5, 2026
Size: 7.4 MB
Tags: CPython 3.9+, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`9e79dadbec4fee123f6dfa6574354c2de7238b98baa9367afa5aea28542ce1f1`
MD5	`a7d25e3255907ec00b196886f352d452`
BLAKE2b-256	`c94ad57d9b40c2174f08afb233999a264e023860eaa1b5334ac54d5175b4e581`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_x86_64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_x86_64.whl
Upload date: Jul 5, 2026
Size: 9.5 MB
Tags: CPython 3.9+, musllinux: musl 1.2+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_x86_64.whl
Algorithm	Hash digest
SHA256	`26da46bf36e92b6287e28bff5737ba6bae0b2160d94eee565e74c4290728b3fa`
MD5	`ecf728ec8cf480a96c66dd9b019f8b2b`
BLAKE2b-256	`150ad9eb509f866de6f5e1f9aa84f6b41166119d9da9b9eb428f62c9cd5f1c3b`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_aarch64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_aarch64.whl
Upload date: Jul 5, 2026
Size: 9.3 MB
Tags: CPython 3.9+, musllinux: musl 1.2+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-musllinux_1_2_aarch64.whl
Algorithm	Hash digest
SHA256	`3d411d579b774c3d53b3bc297d355aa6e17075226e6f4361f2344d103e373d6d`
MD5	`0ce8e2f5982ea7260b776b4573a3982a`
BLAKE2b-256	`f75f57c83eb89a962ce7c1a711a4dd5cd8e33a673e313a014c28bca07387cfca`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Jul 5, 2026
Size: 9.2 MB
Tags: CPython 3.9+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`c805bf4b2c5e294e98bc9ac1a88ab4ff8ecac44ed8af46dbdbb078920ca9fd09`
MD5	`8ad62831a0b80da6a6b386d7bfab0e08`
BLAKE2b-256	`c1892502347ed1fa6c67e2b065e9f342d8875aeb37933affbcba9b14cfc19ec5`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Jul 5, 2026
Size: 9.1 MB
Tags: CPython 3.9+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`9de5c2a0031c88d0073dda5d17aeac60aab8fadb0f711901b0f1668626898209`
MD5	`65e3ee74e90ddcf45294a2efe08d1a15`
BLAKE2b-256	`cd0a4d83e9ffb322308c17d082d907e5008e78a2066e96de996c1ce7733c2a7d`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-macosx_11_0_arm64.whl
Upload date: Jul 5, 2026
Size: 8.3 MB
Tags: CPython 3.9+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`aff9c66d821ccf84028b4b34572fafba8f6578740d20ea16f18b5b20fd957ac6`
MD5	`a7a06b459941b950adf713325ea9055b`
BLAKE2b-256	`ed6c0413d1f1dbb2b1239e149e243b85b21daec7947d50eef7e9bbdc24e64ab6`

File details

Details for the file edgequake_litellm-0.10.0-cp39-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: edgequake_litellm-0.10.0-cp39-abi3-macosx_10_12_x86_64.whl
Upload date: Jul 5, 2026
Size: 8.6 MB
Tags: CPython 3.9+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.10.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`f30722c8ba35060030c599892714298ef7844fb9dab318d9c4bf5fd160018515`
MD5	`2b85b841355f54e107a250741aa54ef5`
BLAKE2b-256	`2574e70b6439eef3eac1a335611b6c91c0ff3b0664c5f4564be5f433794d4a21`
