
Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency


edgequake-litellm


edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm

Install

pip install edgequake-litellm

Supported wheel targets:

| Platform | Architectures |
| --- | --- |
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, arm64 |
| Windows | x86_64 |

The package uses abi3-py39, so a single wheel per platform and architecture covers Python 3.9 and later.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])
print(len(result[0]))

Provider Routing

Pass provider/model as the model argument:

| Provider | Example |
| --- | --- |
| OpenAI | openai/gpt-4o-mini |
| Azure OpenAI | azure/my-gpt4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Gemini | gemini/gemini-2.5-flash |
| Vertex AI | vertexai/gemini-2.5-flash |
| xAI | xai/grok-4 |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| Mistral | mistral/mistral-large-latest |
| AWS Bedrock | bedrock/amazon.nova-lite-v1:0 |
| HuggingFace | huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct |
| OpenAI Compatible | openai-compatible/deepseek-chat |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| VSCode Copilot | vscode-copilot/gpt-4o-mini |
| Mock | mock/test-model |
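The routing convention can be illustrated with plain string handling: the text before the first slash selects the provider, and everything after it is the model id (which may itself contain slashes, as in the OpenRouter row). This is a sketch of the convention, not the package's internal parser:

```python
def split_route(model: str) -> tuple[str, str]:
    """Split a 'provider/model' string at the first slash."""
    provider, _, model_id = model.partition("/")
    return provider, model_id

print(split_route("openai/gpt-4o-mini"))
# → ('openai', 'gpt-4o-mini')
print(split_route("openrouter/meta-llama/llama-3.1-70b-instruct"))
# → ('openrouter', 'meta-llama/llama-3.1-70b-instruct')
```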

Embedding-only backend:

| Provider | Example |
| --- | --- |
| Jina | jina/jina-embeddings-v3 |

Supported Features

| Provider | Chat | Stream | Tools | Embeddings | Notes |
| --- | --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes | Yes | Yes | includes max_completion_tokens handling |
| Azure OpenAI | Yes | Yes | Yes | Yes | deployment-based routing |
| Anthropic | Yes | Yes | Yes | No | Claude extended thinking surfaced in response metadata |
| Gemini | Yes | Yes | Yes | Yes | Google AI Studio |
| Vertex AI | Yes | Yes | Yes | Yes | GCP auth / ADC |
| xAI | Yes | Yes | Yes | No | Grok |
| OpenRouter | Yes | Yes | Yes | No | gateway models |
| Mistral | Yes | Yes | Yes | Yes | native embeddings |
| AWS Bedrock | Yes | Yes | Yes | Yes | backed by the Rust Bedrock feature |
| HuggingFace | Yes | Yes | Limited | No | Inference API |
| OpenAI Compatible | Yes | Yes | Yes | Yes | Groq, Together, DeepSeek, custom gateways |
| Ollama | Yes | Yes | Yes | Yes | local runtime |
| LM Studio | Yes | Yes | Yes | Yes | local OpenAI-compatible server |
| VSCode Copilot | Yes | Yes | Yes | Yes | requires proxy server |
| Jina | No | No | No | Yes | embeddings only |
| Mock | Yes | No | Yes | Yes | unit tests / local development |
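Most chat providers above accept tools. A hedged sketch of tool calling follows: the payload uses the standard OpenAI function-calling schema that LiteLLM accepts (assumed here to pass through unchanged), get_weather is a hypothetical tool, and the request itself is wrapped in a function since it needs the package installed; the mock provider keeps it off the network.

```python
# Standard OpenAI-format tool definition; "get_weather" is hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

def request_with_tools():
    # Requires edgequake-litellm; the mock provider listed above
    # exercises the tools path without network access or API keys.
    import edgequake_litellm as litellm

    resp = litellm.completion(
        "mock/test-model",
        [{"role": "user", "content": "Weather in Paris?"}],
        tools=tools,
    )
    return resp.choices[0].message
```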

Environment Setup

| Provider | Required environment |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY |
| Vertex AI | GOOGLE_CLOUD_PROJECT and ADC or GOOGLE_ACCESS_TOKEN |
| xAI | XAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Mistral | MISTRAL_API_KEY |
| AWS Bedrock | standard AWS credential chain plus AWS_REGION |
| HuggingFace | HF_TOKEN or HUGGINGFACE_TOKEN |
| OpenAI Compatible | OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY |
| Ollama | optional OLLAMA_HOST |
| LM Studio | optional LMSTUDIO_HOST |
| VSCode Copilot | optional VSCODE_COPILOT_PROXY_URL |
| Jina | JINA_API_KEY |

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

  • LITELLM_EDGE_PROVIDER
  • LITELLM_EDGE_MODEL
  • LITELLM_EDGE_TIMEOUT
  • LITELLM_EDGE_MAX_RETRIES
  • LITELLM_EDGE_VERBOSE
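For example, a shell profile could pin the same defaults as the set_default_* calls above. The values here are illustrative, and the timeout is assumed to be in seconds:

```shell
# Illustrative defaults; adjust provider and model to your setup.
export LITELLM_EDGE_PROVIDER=anthropic
export LITELLM_EDGE_MODEL=claude-3-5-haiku-20241022
export LITELLM_EDGE_TIMEOUT=30      # assumed seconds
export LITELLM_EDGE_MAX_RETRIES=2
export LITELLM_EDGE_VERBOSE=1
```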

LiteLLM Compatibility

Implemented:

  • completion()
  • acompletion()
  • embedding()
  • aembedding()
  • stream=True on acompletion()
  • stream() async generator
  • response.choices[0].message.content
  • response.to_dict()
  • AuthenticationError, RateLimitError, NotFoundError, Timeout
  • module globals set_verbose and drop_params

Behavior notes:

  • synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
  • unsupported or extra keyword arguments are dropped for LiteLLM parity
  • per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
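The kwarg-dropping behavior can be sketched as a simple filter. filter_kwargs is a hypothetical helper for illustration, not part of the package's API:

```python
def filter_kwargs(supported: set[str], **kwargs) -> dict:
    # Keep only the keyword arguments the active provider understands;
    # everything else is silently dropped, mirroring the
    # drop-for-parity behavior described above.
    return {k: v for k, v in kwargs.items() if k in supported}

call_args = filter_kwargs(
    {"max_tokens", "temperature"},
    max_tokens=128,
    temperature=0.2,
    some_unsupported_flag=True,  # dropped, not an error
)
print(call_args)
# → {'max_tokens': 128, 'temperature': 0.2}
```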

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...

import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors[0]))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=8" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

Release tags are separate from the Rust crate:

  • Rust crate: vX.Y.Z
  • Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

  1. bump edgequake-litellm/Cargo.toml
  2. bump edgequake-litellm/pyproject.toml
  3. update CHANGELOG.md
  4. push the release-prep commit
  5. wait for python-ci.yml to go green
  6. tag py-vX.Y.Z and push the tag

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.



Download files

Download the file for your platform.

Source Distribution

  • edgequake_litellm-0.4.0.tar.gz (893.8 kB, source)

Built Distributions

  • edgequake_litellm-0.4.0-cp39-abi3-win_amd64.whl (7.4 MB, CPython 3.9+, Windows x86-64)
  • edgequake_litellm-0.4.0-cp39-abi3-musllinux_1_2_x86_64.whl (9.7 MB, CPython 3.9+, musl 1.2+ x86-64)
  • edgequake_litellm-0.4.0-cp39-abi3-musllinux_1_2_aarch64.whl (9.5 MB, CPython 3.9+, musl 1.2+ ARM64)
  • edgequake_litellm-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.3 MB, CPython 3.9+, glibc 2.17+ x86-64)
  • edgequake_litellm-0.4.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (9.3 MB, CPython 3.9+, glibc 2.17+ ARM64)
  • edgequake_litellm-0.4.0-cp39-abi3-macosx_11_0_arm64.whl (8.5 MB, CPython 3.9+, macOS 11.0+ ARM64)
  • edgequake_litellm-0.4.0-cp39-abi3-macosx_10_12_x86_64.whl (8.7 MB, CPython 3.9+, macOS 10.12+ x86-64)

File details

Details for the file edgequake_litellm-0.4.0.tar.gz.

File metadata

  • Download URL: edgequake_litellm-0.4.0.tar.gz
  • Size: 893.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9eb34dfd56794095dbfb7162f78c689d8dbc83858e92a4bb4b9e6e920d143704 |
| MD5 | 4d9d41e5f6c7628a6d350e1820ce6345 |
| BLAKE2b-256 | e40510d33c12903baae63725315807a7499b221c888c715fd1751b9a8f32c0c3 |

SHA256 digests for the built distributions:

| File | SHA256 |
| --- | --- |
| edgequake_litellm-0.4.0-cp39-abi3-win_amd64.whl | 825e9f5526032aeb212a79fc20fca572b8f2259c443c74a3c6ecdffbb42bc23d |
| edgequake_litellm-0.4.0-cp39-abi3-musllinux_1_2_x86_64.whl | fd7dc6341eb5497da63e70ca00d9ca28032e8f48d37899747b8b99e0cab1cd9e |
| edgequake_litellm-0.4.0-cp39-abi3-musllinux_1_2_aarch64.whl | bfca9073a5f2cb635cbf706bf696ca6be859af9c0f1f153a5a28d374d06cfbc4 |
| edgequake_litellm-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 0d16ce7a1df7ee2614c2a972277870c7252da06b0c4cdef27b3ca7e9536a171c |
| edgequake_litellm-0.4.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 9d5309aa05a1f6acad35ed6ef457043e58076a553cd3175f3ae96a92d977e2a1 |
| edgequake_litellm-0.4.0-cp39-abi3-macosx_11_0_arm64.whl | aa3cc4bb4f120c5f67ac8d6db722ce8705901fe8b04db53dff786cb0e5f68d37 |
| edgequake_litellm-0.4.0-cp39-abi3-macosx_10_12_x86_64.whl | 7bd37546501ca283585162a6b41cf634226ae30940a09dd552408f3e72f3e375 |
