edgequake-litellm
Drop-in LiteLLM replacement backed by Rust — same API, lower overhead.
edgequake-litellm wraps the edgequake-llm Rust core via PyO3, providing a high-performance drop-in for LiteLLM. Swap the import — the rest of your code stays unchanged.
```python
# Before
import litellm

# After — same API, Rust-backed
import edgequake_litellm as litellm
```
Features
- LiteLLM-compatible API — `completion()`, `acompletion()`, `stream()`, `embedding()`; same call signatures, same response shape (`resp.choices[0].message.content`).
- Multi-provider routing — OpenAI, Anthropic, Gemini, Mistral, OpenRouter, xAI, Ollama, LM Studio, HuggingFace, and more, via `provider/model` strings.
- Async-native — built on Tokio; sync and async Python both supported.
- Single wheel per platform — uses PyO3's `abi3-py39` stable ABI; one `.whl` covers Python 3.9–3.13+.
- Zero Python runtime dependencies — the Rust extension is self-contained.
- Full type annotations — ships with `py.typed` and `.pyi` stubs.
Installation
```bash
pip install edgequake-litellm
```
Quick Start
```python
import edgequake_litellm as litellm  # drop-in import alias

# ── Synchronous chat ────────────────────────────────────────────────────────
resp = litellm.completion(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "Hello, world!"}],
)

# litellm-compatible access
print(resp.choices[0].message.content)
# convenience shortcut
print(resp.content)

# ── Asynchronous chat ───────────────────────────────────────────────────────
import asyncio

async def main():
    resp = await litellm.acompletion(
        "anthropic/claude-3-5-haiku-20241022",
        [{"role": "user", "content": "Tell me a joke."}],
        max_tokens=128,
        temperature=0.8,
    )
    print(resp.choices[0].message.content)

asyncio.run(main())

# ── Streaming (async generator) ─────────────────────────────────────────────
async def stream_example():
    messages = [{"role": "user", "content": "Count to five."}]
    async for chunk in litellm.acompletion("openai/gpt-4o", messages, stream=True):
        print(chunk.choices[0].delta.content or "", end="", flush=True)

# ── Embeddings ──────────────────────────────────────────────────────────────
result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["Hello world", "Rust is fast"],
)

# litellm-compatible access
print(result.data[0].embedding[:3])
# legacy list access still works
print(len(result), len(result[0]))  # 2 1536
```
Provider Routing
Pass provider/model as the first argument — the prefix selects the provider:
| Provider | Example model string |
|---|---|
| OpenAI | openai/gpt-4o |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Google Gemini | gemini/gemini-2.0-flash |
| Mistral | mistral/mistral-large-latest |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| xAI | xai/grok-3-beta |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| HuggingFace | huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1 |
| Mock (tests) | mock/any-name |
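The prefix-splitting rule can be sketched in plain Python — this is a hypothetical helper illustrating the convention above, not the library's internal routing code. Note that only the first `/` separates provider from model, so OpenRouter-style paths with further slashes stay intact:

```python
# Hypothetical sketch of provider-prefix routing. Only the first "/" splits
# provider from model; anything after it (including more slashes, as in
# OpenRouter model paths) remains part of the model name.
def split_model_string(model: str, default_provider: str = "openai"):
    if "/" in model:
        provider, _, name = model.partition("/")
        return provider, name
    return default_provider, model

print(split_model_string("openrouter/meta-llama/llama-3.1-70b-instruct"))
# -> ('openrouter', 'meta-llama/llama-3.1-70b-instruct')
```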
API Reference
completion(model, messages, **kwargs) → ModelResponseCompat
Synchronous chat completion. Blocks but releases the GIL during Rust I/O so other Python threads keep running.
```python
resp = litellm.completion(
    "openai/gpt-4o",
    messages,
    max_tokens=256,
    temperature=0.7,
    system="You are a helpful assistant.",
    max_completion_tokens=256,  # alias for max_tokens
    seed=42,
    response_format={"type": "json_object"},  # "text" or "json_object"
)

# All of these access the same content:
resp.choices[0].message.content           # litellm path
resp.content                              # shortcut
resp["choices"][0]["message"]["content"]  # dict-style

resp.usage.total_tokens
resp.model
resp.response_ms  # latency in milliseconds
resp.to_dict()    # plain dict
```
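Because `completion()` releases the GIL during Rust I/O, plain Python threads can fan out requests concurrently. The pattern looks like this — a hedged sketch with a stand-in worker; in real code the worker body would call `litellm.completion(model, messages)`:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in worker. With the real library, each thread's completion() call
# releases the GIL while Rust performs the network I/O, so the threads
# genuinely overlap instead of serializing.
def fetch(prompt: str) -> str:
    # placeholder for: litellm.completion(model, [{"role": "user", "content": prompt}]).content
    return f"response to {prompt!r}"

prompts = ["alpha", "beta", "gamma"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, prompts))
print(results)
```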
acompletion(model, messages, stream=False, **kwargs)
Async chat completion. Returns ModelResponseCompat or (if stream=True) AsyncGenerator[StreamChunkCompat, None].
```python
# Non-streaming
resp = await litellm.acompletion("openai/gpt-4o", messages)

# Streaming — the call returns an async generator directly, so no await
async for chunk in litellm.acompletion("openai/gpt-4o", messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")
```
stream(model, messages, **kwargs) → AsyncGenerator[StreamChunk, None]
Low-level streaming. Raw StreamChunk objects:
```python
async for chunk in litellm.stream("openai/gpt-4o", messages):
    if chunk.content:
        print(chunk.content, end="")
    elif chunk.is_finished:
        print(f"\n[stop: {chunk.finish_reason}]")
```
embedding(model, input, **kwargs) → EmbeddingResponseCompat
Synchronous embeddings. Returns an EmbeddingResponseCompat that supports both litellm-style and legacy list-style access:
```python
result = litellm.embedding("openai/text-embedding-3-small", ["foo", "bar"])

# litellm path
result.data[0].embedding

# backwards-compatible list access
for vec in result:  # iterates List[float]
    print(len(vec))

result[0]    # List[float]
len(result)  # number of vectors
```
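A common next step with embeddings is comparing vectors. This is a generic cosine-similarity helper, not part of the library — with a real `EmbeddingResponseCompat` you would pass `result[0]` and `result[1]` (each a `List[float]`):

```python
import math

# Cosine similarity between two equal-length vectors:
# dot(a, b) / (|a| * |b|), in [-1, 1] for real-valued embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real embeddings: cosine(result[0], result[1])
print(cosine([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```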
aembedding(model, input, **kwargs) → EmbeddingResponseCompat
Async embeddings — same return type as embedding().
stream_chunk_builder(chunks, messages=None) → ModelResponseCompat
Reconstruct a full ModelResponseCompat from a collected list of streaming chunks:
```python
from edgequake_litellm import stream_chunk_builder

chunks = []
async for chunk in litellm.stream("openai/gpt-4o", messages):
    chunks.append(chunk)

full = stream_chunk_builder(chunks, messages=messages)
print(full.content)
```
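Conceptually, a chunk builder concatenates the per-chunk deltas into one final message. The sketch below illustrates that idea only — it is not the library's implementation, which also reassembles usage, finish reason, and the rest of the response:

```python
# Conceptual core of chunk reassembly: join per-chunk text deltas in order,
# treating missing deltas (None) as empty strings.
def build_message(deltas):
    return "".join(d or "" for d in deltas)

print(build_message(["Hel", "lo", None, "!"]))  # -> Hello!
```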
Configuration
Module-level globals mirror litellm:
```python
import edgequake_litellm as litellm

litellm.set_verbose = True  # enable debug logging
litellm.drop_params = True  # drop unknown params (always True)

# Set default provider / model
litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

# Now the provider prefix can be omitted:
resp = litellm.completion("claude-3-5-haiku-20241022", messages)
```
Exception Hierarchy
Exceptions mirror LiteLLM for painless migration:
```python
import time

import edgequake_litellm as litellm

try:
    resp = litellm.completion("openai/gpt-4o", messages)
except litellm.AuthenticationError as e:
    print(f"Check your API key: {e}")
except litellm.RateLimitError:
    time.sleep(5)  # back off, then retry
except litellm.ContextWindowExceededError:
    pass  # trim messages and retry
except litellm.NotFoundError:  # alias for ModelNotFoundError
    pass
except litellm.APIConnectionError:
    pass
```
All exceptions (AuthenticationError, RateLimitError, ContextWindowExceededError, ModelNotFoundError, Timeout, APIConnectionError, APIError) are also available from edgequake_litellm.exceptions.
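A typical use of this hierarchy is retrying on `RateLimitError` with exponential backoff. The sketch below stubs the exception class so it runs standalone; in real code you would catch `litellm.RateLimitError` and call `litellm.completion` inside the wrapper:

```python
import time

# Stub so the sketch is self-contained; use litellm.RateLimitError in real code.
class RateLimitError(Exception):
    pass

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if i == attempts - 1:
                raise  # out of attempts — propagate
            time.sleep(base_delay * 2 ** i)

# Demo: fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_retries(flaky))  # -> ok
```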
Environment Variables
Provider credentials follow the standard naming convention:
| Provider | Environment variable |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY |
| Mistral | MISTRAL_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| xAI | XAI_API_KEY |
| HuggingFace | HF_TOKEN |
| Ollama | OLLAMA_HOST (default: http://localhost:11434) |
| LM Studio | LMSTUDIO_HOST (default: http://localhost:1234) |
Defaults can also be set via LITELLM_EDGE_PROVIDER / LITELLM_EDGE_MODEL.
Development
Prerequisites
- Rust ≥ 1.83 (`rustup toolchain install stable`)
- Python ≥ 3.9
- maturin (`pip install maturin`)
Build from source
```bash
git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install maturin pytest pytest-asyncio ruff mypy

# Build & install in dev mode (incremental Rust + Python)
maturin develop --release

# Run unit tests (mock provider — no API keys needed)
pytest tests/ -k "not e2e" -v
```
Running E2E tests
```bash
export OPENAI_API_KEY=sk-...
pytest tests/test_e2e_openai.py -v
```
Publishing
```bash
# Bump version in pyproject.toml AND Cargo.toml (must match), then:
git tag py-v0.2.0
git push --tags
# GitHub Actions builds and publishes to PyPI automatically.
```
License
Apache-2.0 — see LICENSE-APACHE.