Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency

Project description

edgequake-litellm

edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm

Install

pip install edgequake-litellm

Supported wheel targets:

| Platform | Architectures |
| --- | --- |
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |
| macOS | x86_64, arm64 |
| Windows | x86_64 |

The package uses abi3-py39, so one wheel per platform covers Python 3.9+.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
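Tool calling keeps the same LiteLLM call shape. A minimal sketch, assuming the OpenAI-style tools schema and LiteLLM's tool_calls response attributes; the weather tool, dispatcher, and demo() helper below are illustrative, not package APIs:

```python
import json

# Tool schema in the OpenAI/LiteLLM format. The function name and
# parameters here are illustrative.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup

def dispatch(name: str, arguments: str) -> str:
    """Route a tool call returned by the model to a local function."""
    args = json.loads(arguments)
    if name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {name}")

def demo() -> None:
    # Requires OPENAI_API_KEY; call demo() yourself to run it.
    import edgequake_litellm as litellm

    resp = litellm.completion(
        "openai/gpt-4o-mini",
        [{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[WEATHER_TOOL],
    )
    for call in resp.choices[0].message.tool_calls or []:
        print(dispatch(call.function.name, call.function.arguments))
```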

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])
print(len(result.data[0].embedding))
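The returned vectors are plain Python lists of floats, so downstream similarity math needs no extra dependencies. A small sketch with hardcoded stand-in vectors in place of real result.data[i].embedding values:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Stand-ins for two embedding vectors returned by the API.
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.1, 0.4]
print(round(cosine_similarity(query_vec, doc_vec), 4))
```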

Provider Routing

Pass provider/model as the model argument:

| Provider | Example |
| --- | --- |
| OpenAI | openai/gpt-4o-mini |
| Azure OpenAI | azure/my-gpt4o-deployment |
| Anthropic | anthropic/claude-3-5-sonnet-20241022 |
| Gemini | gemini/gemini-2.5-flash |
| Vertex AI | vertexai/gemini-2.5-flash |
| xAI | xai/grok-4 |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b-instruct |
| NVIDIA NIM | nvidia/meta/llama-3.1-8b-instruct |
| Mistral | mistral/mistral-large-latest |
| AWS Bedrock | bedrock/amazon.nova-lite-v1:0 |
| HuggingFace | huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct |
| OpenAI Compatible | openai-compatible/deepseek-chat |
| Ollama | ollama/llama3.2 |
| LM Studio | lmstudio/local-model |
| VSCode Copilot | vscode-copilot/auto |
| Mock | mock/test-model |

Embedding-only backend:

| Provider | Example |
| --- | --- |
| Jina | jina/jina-embeddings-v3 |
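The prefix is split on the first slash only, which is what lets multi-slash model names (OpenRouter, NVIDIA NIM) pass through intact. A sketch of the assumed parsing rule, not the package's actual internals:

```python
def split_model(model: str) -> tuple:
    """Split 'provider/model' on the first slash only, so model names
    that themselves contain slashes stay intact."""
    provider, _, name = model.partition("/")
    return provider, name

print(split_model("openrouter/meta-llama/llama-3.1-70b-instruct"))
```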

Supported Features

| Provider | Chat | Stream | Tools | Embeddings | Notes |
| --- | --- | --- | --- | --- | --- |
| OpenAI | Yes | Yes | Yes | Yes | includes max_completion_tokens handling |
| Azure OpenAI | Yes | Yes | Yes | Yes | deployment-based routing |
| Anthropic | Yes | Yes | Yes | No | Claude extended thinking surfaced in response metadata |
| Gemini | Yes | Yes | Yes | Yes | Google AI Studio |
| Vertex AI | Yes | Yes | Yes | Yes | GCP auth / ADC |
| xAI | Yes | Yes | Yes | No | Grok |
| OpenRouter | Yes | Yes | Yes | No | gateway models |
| NVIDIA NIM | Yes | Yes | Yes | No | OpenAI-compatible hosted NIM |
| Mistral | Yes | Yes | Yes | Yes | native embeddings |
| AWS Bedrock | Yes | Yes | Yes | Yes | backed by the Rust Bedrock feature |
| HuggingFace | Yes | Yes | Limited | No | Inference API |
| OpenAI Compatible | Yes | Yes | Yes | Yes | Groq, Together, DeepSeek, custom gateways |
| Ollama | Yes | Yes | Yes | Yes | local runtime |
| LM Studio | Yes | Yes | Yes | Yes | local OpenAI-compatible server |
| VSCode Copilot | Yes | Yes | Yes | Yes | direct auth by default, proxy optional |
| Jina | No | No | No | Yes | embeddings only |
| Mock | Yes | No | Yes | Yes | unit tests / local development |

Environment Setup

| Provider | Required environment |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME |
| Anthropic | ANTHROPIC_API_KEY |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY |
| Vertex AI | GOOGLE_CLOUD_PROJECT and ADC, or GOOGLE_ACCESS_TOKEN |
| xAI | XAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Mistral | MISTRAL_API_KEY |
| AWS Bedrock | standard AWS credential chain plus AWS_REGION |
| HuggingFace | HF_TOKEN or HUGGINGFACE_TOKEN |
| OpenAI Compatible | OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY |
| Ollama | optional OLLAMA_HOST |
| LM Studio | optional LMSTUDIO_HOST |
| VSCode Copilot | optional VSCODE_COPILOT_PROXY_URL; otherwise reuses the official VS Code Copilot auth cache |
| Jina | JINA_API_KEY |
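A small preflight helper (illustrative, not part of the package) can fail fast when a provider's variables are missing:

```python
import os
from collections.abc import Mapping
from typing import Optional

# Illustrative subset of the table above; extend as needed.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
}

def missing_env(provider: str, env: Optional[Mapping] = None) -> list:
    """Return the required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [v for v in REQUIRED_ENV.get(provider, []) if not source.get(v)]

print(missing_env("openai", env={}))  # ['OPENAI_API_KEY']
```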

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

  • LITELLM_EDGE_PROVIDER
  • LITELLM_EDGE_MODEL
  • LITELLM_EDGE_TIMEOUT
  • LITELLM_EDGE_MAX_RETRIES
  • LITELLM_EDGE_VERBOSE
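A minimal sketch of setting these defaults from Python (all values illustrative). Whether the core reads them at import time or per call is not specified above, so setting them before the first call is the safe pattern:

```python
import os

# Illustrative defaults; set them before the first completion call.
os.environ.setdefault("LITELLM_EDGE_PROVIDER", "openai")
os.environ.setdefault("LITELLM_EDGE_MODEL", "gpt-4o-mini")
os.environ.setdefault("LITELLM_EDGE_TIMEOUT", "30")  # assumed unit: seconds
os.environ.setdefault("LITELLM_EDGE_MAX_RETRIES", "2")
os.environ.setdefault("LITELLM_EDGE_VERBOSE", "0")
```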

LiteLLM Compatibility

Implemented:

  • completion()
  • acompletion()
  • embedding()
  • aembedding()
  • stream=True on acompletion()
  • stream() async generator
  • response.choices[0].message.content
  • response.to_dict()
  • AuthenticationError, RateLimitError, NotFoundError, Timeout
  • module globals set_verbose and drop_params
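Because the exception classes mirror LiteLLM's, existing error-handling code carries over. A sketch; the retry classifier is illustrative, and the helper is only defined here, not called, since it needs real credentials:

```python
# Illustrative retry classifier, not part of the package.
RETRYABLE = {"RateLimitError", "Timeout"}

def should_retry(exc: Exception) -> bool:
    """Transient errors are worth retrying; auth errors are not."""
    return type(exc).__name__ in RETRYABLE

def guarded_completion(model: str, prompt: str):
    # Requires provider credentials; call it yourself to run it.
    import edgequake_litellm as litellm

    try:
        resp = litellm.completion(model, [{"role": "user", "content": prompt}])
        return resp.choices[0].message.content
    except litellm.AuthenticationError:
        print("credentials missing or invalid")
    except (litellm.RateLimitError, litellm.Timeout) as exc:
        print("transient failure, retryable:", should_retry(exc))
    return None
```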

Behavior notes:

  • synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
  • unsupported or extra keyword arguments are dropped for LiteLLM parity
  • per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
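The stream() async generator is the alternative to acompletion(..., stream=True). Its signature is assumed here to mirror acompletion(), with the same chunk shape as the Quick Start; the chunk-joining helper is illustrative:

```python
def join_chunks(parts: list) -> str:
    """Concatenate streamed delta fragments, skipping None/empty chunks."""
    return "".join(p for p in parts if p)

async def collect(model: str, prompt: str) -> str:
    # Requires provider credentials; run with asyncio.run(collect(...)).
    import edgequake_litellm as litellm

    parts = []
    async for chunk in litellm.stream(model, [{"role": "user", "content": prompt}]):
        parts.append(chunk.choices[0].delta.content)
    return join_chunks(parts)
```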

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...
import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors.data[0].embedding))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=8" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

Release tags are separate from the Rust crate:

  • Rust crate: vX.Y.Z
  • Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

  1. bump edgequake-litellm/Cargo.toml
  2. bump edgequake-litellm/pyproject.toml
  3. update CHANGELOG.md
  4. push the release-prep commit
  5. wait for python-ci.yml to go green
  6. tag and push py-vX.Y.Z

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.

Download files

Source Distribution

  • edgequake_litellm-0.6.12.tar.gz (962.8 kB): Source

Built Distributions

  • edgequake_litellm-0.6.12-cp39-abi3-win_amd64.whl (6.8 MB): CPython 3.9+, Windows x86-64
  • edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_x86_64.whl (8.9 MB): CPython 3.9+, musl 1.2+, x86-64
  • edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_aarch64.whl (8.7 MB): CPython 3.9+, musl 1.2+, ARM64
  • edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.6 MB): CPython 3.9+, glibc 2.17+, x86-64
  • edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (8.5 MB): CPython 3.9+, glibc 2.17+, ARM64
  • edgequake_litellm-0.6.12-cp39-abi3-macosx_11_0_arm64.whl (7.8 MB): CPython 3.9+, macOS 11.0+, ARM64
  • edgequake_litellm-0.6.12-cp39-abi3-macosx_10_12_x86_64.whl (8.0 MB): CPython 3.9+, macOS 10.12+, x86-64

File details

Details for the file edgequake_litellm-0.6.12.tar.gz.

File metadata

  • Download URL: edgequake_litellm-0.6.12.tar.gz
  • Upload date:
  • Size: 962.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgequake_litellm-0.6.12.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8a5d8513b9bbe41bad057c53946606b0f14e97368b0c3973ccfe018ca58874d3 |
| MD5 | 70095606be56d619e8871a6b759fc1b7 |
| BLAKE2b-256 | 8b640a67030ec6791d00ca80830d10fe05cbae80ba904292469773c290734bb4 |

See more details on using hashes here.

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-win_amd64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-win_amd64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6a60f1b5cac67ecef3b054750b511eb9ed896cfeafcffc07bb1c6d6270c90638 |
| MD5 | 19f8a4a2c7bd87f87e8fb72f49397d4f |
| BLAKE2b-256 | cb7b8a8f3542f76e0a2924b3f41fd1cee6070701dc7e4f62be7bc68c5e854743 |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_x86_64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 928e1dd7c682fa1dcb93b927863544ee48a2678272c68e38961adaaa1cf9ed2a |
| MD5 | 0700cfd460da7ecc6f8be5b2fd422b03 |
| BLAKE2b-256 | 6a168728a7c4f18b6c85aa498ead46bdc6f1c768f9d84096a750905942841e17 |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-musllinux_1_2_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 026b83ff423a956466536c186dad51f336db513c90a3cdb9d1b3e68b7b88b473 |
| MD5 | 8f34f4f38beb8fc8e738a6d0aa8988ab |
| BLAKE2b-256 | 76a08f5dea3c2cade91cc160cf33c5ba5d0e1e48c853f937f153d4b0dbefc871 |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ff53e2de5f132dd0b85f8ab94d9d145aea31a9f4dce389e10cca5cc7634607e6 |
| MD5 | 35c1558dbd023cab1f295514b8dda342 |
| BLAKE2b-256 | aae031b76a65c8012562241398433e9394835000a798b126e5712eb245262dfc |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3a6298a4f87e2d81b92cf7817c0f95609a928e71896509e707dd811d2917d10e |
| MD5 | 34766d482787c01db9fcdc856898cd68 |
| BLAKE2b-256 | ef51ebca6ac379f56d4f3cb60d2ee0e75b6c1f7562a06607aa4cb8e2a25eaad0 |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-macosx_11_0_arm64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 846265461a09119da8f13b28fc22b9c45a99178acedc3cb6b43a12b1784547ac |
| MD5 | 2dd4197ec5f7aee0d6bd3e25fa8031ba |
| BLAKE2b-256 | 7516868498b1fb50ca3b3f276f2af739af5a20b67f7de740f6e0a1385628b5e9 |

File details

Details for the file edgequake_litellm-0.6.12-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.6.12-cp39-abi3-macosx_10_12_x86_64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 39eace680d97fb44d9edf3709dcb10453e22b4a79a5c1d04ad230e9a82cda233 |
| MD5 | 5e7769bee4ad720f4f3f1157ad8f262b |
| BLAKE2b-256 | 6d27e54fe474f7cab8f43d71eca6278180844c16dd12ef5ea7b39f5e0d6f514b |
