Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency
edgequake-litellm
edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.
```python
# Before
import litellm

# After
import edgequake_litellm as litellm
```
Install
```bash
pip install edgequake-litellm
```
Supported wheel targets:
| Platform | Architectures |
|---|---|
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, arm64 |
| Windows | x86_64 |
The package uses abi3-py39, so one wheel per platform covers Python 3.9+.
Scope note: this package covers the LiteLLM-compatible chat and embedding API
surface. The Rust crate also ships image-generation providers, but those APIs
are not exposed through edgequake-litellm yet.
Quick Start
```python
import asyncio

import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
```
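Tool calling keeps the OpenAI/LiteLLM shape. A minimal sketch, assuming the standard `tools` list is passed through unchanged and the response mirrors LiteLLM's `tool_calls` field (the `get_weather` schema below is purely illustrative):

```python
import edgequake_litellm as litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = litellm.completion(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```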
Embeddings:
```python
import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)
print(result.data[0].embedding[:3])
print(len(result.data[0].embedding))
```
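`aembedding()` is the async counterpart; a minimal sketch, assuming it mirrors the `embedding()` call shape:

```python
import asyncio

import edgequake_litellm as litellm

async def main() -> None:
    result = await litellm.aembedding(
        "openai/text-embedding-3-small",
        ["hello world"],
    )
    print(len(result.data[0].embedding))

asyncio.run(main())
```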
Provider Routing
Pass provider/model as the model argument:
| Provider | Example |
|---|---|
| OpenAI | openai/gpt-4o-mini |
| Azure OpenAI | azure/my-gpt4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Gemini | gemini/gemini-2.5-flash |
| Vertex AI | vertexai/gemini-2.5-flash |
| xAI | xai/grok-4 |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| NVIDIA NIM | nvidia/meta/llama-3.1-8b-instruct |
| Mistral | mistral/mistral-large-latest |
| AWS Bedrock | bedrock/amazon.nova-lite-v1:0 |
| HuggingFace | huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct |
| OpenAI Compatible | openai-compatible/deepseek-chat |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| VSCode Copilot | vscode-copilot/auto |
| Mock | mock/test-model |
Embedding-only backend:
| Provider | Example |
|---|---|
| Jina | jina/jina-embeddings-v3 |
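Because the provider prefix is the only routing switch, moving a call between backends is a one-line change. A short sketch, assuming credentials for both providers are configured:

```python
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "One sentence on zero-copy parsing."}]

# Same call shape, different backend per iteration.
for model in ("openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"):
    resp = litellm.completion(model, messages)
    print(model, "->", resp.choices[0].message.content)
```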
Supported Features
| Provider | Chat | Stream | Tools | Embeddings | Notes |
|---|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes | includes max_completion_tokens handling |
| Azure OpenAI | Yes | Yes | Yes | Yes | deployment-based routing |
| Anthropic | Yes | Yes | Yes | No | Claude extended thinking surfaced in response metadata |
| Gemini | Yes | Yes | Yes | Yes | Google AI Studio |
| Vertex AI | Yes | Yes | Yes | Yes | GCP auth / ADC |
| xAI | Yes | Yes | Yes | No | Grok |
| OpenRouter | Yes | Yes | Yes | No | gateway models |
| NVIDIA NIM | Yes | Yes | Yes | No | OpenAI-compatible hosted NIM |
| Mistral | Yes | Yes | Yes | Yes | native embeddings |
| AWS Bedrock | Yes | Yes | Yes | Yes | backed by the Rust Bedrock feature |
| HuggingFace | Yes | Yes | Limited | No | Inference API |
| OpenAI Compatible | Yes | Yes | Yes | Yes | Groq, Together, DeepSeek, custom gateways |
| Ollama | Yes | Yes | Yes | Yes | local runtime |
| LM Studio | Yes | Yes | Yes | Yes | local OpenAI-compatible server |
| VSCode Copilot | Yes | Yes | Yes | Yes | direct auth by default, proxy optional |
| Jina | No | No | No | Yes | embeddings only |
| Mock | Yes | No | Yes | Yes | unit tests / local development |
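The mock provider needs no credentials, which makes it useful for unit tests and CI. A minimal sketch (the assertion is illustrative; the mock's exact canned output is not specified here):

```python
import edgequake_litellm as litellm

def test_completion_shape() -> None:
    # mock/ requires no API key, so this runs offline.
    resp = litellm.completion(
        "mock/test-model",
        [{"role": "user", "content": "ping"}],
    )
    assert resp.choices[0].message.content is not None
```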
Environment Setup
| Provider | Required environment |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY |
| Vertex AI | GOOGLE_CLOUD_PROJECT and ADC or GOOGLE_ACCESS_TOKEN |
| xAI | XAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Mistral | MISTRAL_API_KEY |
| AWS Bedrock | standard AWS credential chain plus AWS_REGION |
| HuggingFace | HF_TOKEN or HUGGINGFACE_TOKEN |
| OpenAI Compatible | OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY |
| Ollama | optional OLLAMA_HOST |
| LM Studio | optional LMSTUDIO_HOST |
| VSCode Copilot | optional VSCODE_COPILOT_PROXY_URL; otherwise the official VS Code Copilot auth cache is reused |
| Jina | JINA_API_KEY |
Module defaults:
```python
import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")
```
Environment defaults:
- `LITELLM_EDGE_PROVIDER`
- `LITELLM_EDGE_MODEL`
- `LITELLM_EDGE_TIMEOUT`
- `LITELLM_EDGE_MAX_RETRIES`
- `LITELLM_EDGE_VERBOSE`
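These can be set in the shell or from Python before the first call; a minimal sketch, assuming the variables are read when a request is dispatched (the values below are examples, not package defaults):

```python
import os

# Hypothetical defaults for a local session; set before the first request.
os.environ["LITELLM_EDGE_PROVIDER"] = "openai"
os.environ["LITELLM_EDGE_MODEL"] = "gpt-4o-mini"
os.environ["LITELLM_EDGE_TIMEOUT"] = "30"
```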
LiteLLM Compatibility
Implemented:
- `completion()` / `acompletion()`
- `embedding()` / `aembedding()`
- `stream=True` on `acompletion()`; `stream()` async generator
- `response.choices[0].message.content`
- `response.to_dict()`
- `AuthenticationError`, `RateLimitError`, `NotFoundError`, `Timeout`
- module globals `set_verbose` and `drop_params`
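The `stream()` helper is the standalone async-generator form. A minimal sketch, assuming it accepts the same `(model, messages)` arguments as `acompletion()` (the signature is not spelled out above, so treat it as an assumption):

```python
import asyncio

import edgequake_litellm as litellm

async def main() -> None:
    # Assumption: stream() yields the same delta chunks as
    # acompletion(..., stream=True).
    async for chunk in litellm.stream(
        "openai/gpt-4o-mini",
        [{"role": "user", "content": "Count to three."}],
    ):
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
```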
Behavior notes:
- synchronous streaming is intentionally not supported; use `acompletion(..., stream=True)` or `stream()`
- unsupported or extra keyword arguments are dropped for LiteLLM parity
- per-call `api_key`, `api_base`, and `timeout` parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
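The mapped exception types can be caught from the module namespace; a minimal sketch, assuming they are raised by `completion()` as in LiteLLM:

```python
import edgequake_litellm as litellm

try:
    resp = litellm.completion(
        "openai/gpt-4o-mini",
        [{"role": "user", "content": "hello"}],
    )
except litellm.AuthenticationError:
    print("check OPENAI_API_KEY")
except litellm.RateLimitError:
    print("rate limited; back off and retry")
except litellm.Timeout:
    print("request timed out")
```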
Provider Examples
OpenAI-compatible custom gateway:
```bash
export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...
```
```python
import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)
```
Vertex AI:
```bash
export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login
```

```python
import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)
print(resp.content)
```
Jina embeddings:
```python
import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors.data[0].embedding))
```
Development
```bash
git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate
pip install "maturin>=1.7" "pytest>=8" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports
```
Release
Release tags are separate from the Rust crate:
- Rust crate: `vX.Y.Z`
- Python package: `py-vX.Y.Z`
Publish flow for edgequake-litellm:
- bump `edgequake-litellm/Cargo.toml`
- bump `edgequake-litellm/pyproject.toml`
- update `CHANGELOG.md`
- push the release-prep commit
- wait for `python-ci.yml` to go green
- push the `py-vX.Y.Z` tag
`python-publish.yml` builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.
Changelog
See CHANGELOG.md for the current release line and published history.
License
Apache-2.0. See ../LICENSE-APACHE.
Download files
File details
Details for the file edgequake_litellm-0.6.12.tar.gz.
File metadata
- Download URL: edgequake_litellm-0.6.12.tar.gz
- Upload date:
- Size: 962.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a5d8513b9bbe41bad057c53946606b0f14e97368b0c3973ccfe018ca58874d3 |
| MD5 | 70095606be56d619e8871a6b759fc1b7 |
| BLAKE2b-256 | 8b640a67030ec6791d00ca80830d10fe05cbae80ba904292469773c290734bb4 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 6.8 MB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a60f1b5cac67ecef3b054750b511eb9ed896cfeafcffc07bb1c6d6270c90638 |
| MD5 | 19f8a4a2c7bd87f87e8fb72f49397d4f |
| BLAKE2b-256 | cb7b8a8f3542f76e0a2924b3f41fd1cee6070701dc7e4f62be7bc68c5e854743 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 8.9 MB
- Tags: CPython 3.9+, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 928e1dd7c682fa1dcb93b927863544ee48a2678272c68e38961adaaa1cf9ed2a |
| MD5 | 0700cfd460da7ecc6f8be5b2fd422b03 |
| BLAKE2b-256 | 6a168728a7c4f18b6c85aa498ead46bdc6f1c768f9d84096a750905942841e17 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_aarch64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_aarch64.whl
- Upload date:
- Size: 8.7 MB
- Tags: CPython 3.9+, musllinux: musl 1.2+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 026b83ff423a956466536c186dad51f336db513c90a3cdb9d1b3e68b7b88b473 |
| MD5 | 8f34f4f38beb8fc8e738a6d0aa8988ab |
| BLAKE2b-256 | 76a08f5dea3c2cade91cc160cf33c5ba5d0e1e48c853f937f153d4b0dbefc871 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 8.6 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff53e2de5f132dd0b85f8ab94d9d145aea31a9f4dce389e10cca5cc7634607e6 |
| MD5 | 35c1558dbd023cab1f295514b8dda342 |
| BLAKE2b-256 | aae031b76a65c8012562241398433e9394835000a798b126e5712eb245262dfc |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 8.5 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a6298a4f87e2d81b92cf7817c0f95609a928e71896509e707dd811d2917d10e |
| MD5 | 34766d482787c01db9fcdc856898cd68 |
| BLAKE2b-256 | ef51ebca6ac379f56d4f3cb60d2ee0e75b6c1f7562a06607aa4cb8e2a25eaad0 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 7.8 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 846265461a09119da8f13b28fc22b9c45a99178acedc3cb6b43a12b1784547ac |
| MD5 | 2dd4197ec5f7aee0d6bd3e25fa8031ba |
| BLAKE2b-256 | 7516868498b1fb50ca3b3f276f2af739af5a20b67f7de740f6e0a1385628b5e9 |
File details
Details for the file edgequake_litellm-0.6.12-cp39-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: edgequake_litellm-0.6.12-cp39-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 8.0 MB
- Tags: CPython 3.9+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39eace680d97fb44d9edf3709dcb10453e22b4a79a5c1d04ad230e9a82cda233 |
| MD5 | 5e7769bee4ad720f4f3f1157ad8f262b |
| BLAKE2b-256 | 6d27e54fe474f7cab8f43d71eca6278180844c16dd12ef5ea7b39f5e0d6f514b |