
Drop-in LiteLLM replacement backed by Rust — same API, 10× lower latency


edgequake-litellm


edgequake-litellm is a LiteLLM-compatible Python package backed by the Rust edgequake-llm core. The intent is simple: keep the LiteLLM call shape, replace the Python network path with a native implementation, and preserve operational features such as streaming, tool calling, embeddings, and provider routing.

# Before
import litellm

# After
import edgequake_litellm as litellm

Install

pip install edgequake-litellm

Supported wheel targets:

Platform Architectures
Linux (glibc) x86_64, aarch64
Linux (musl) x86_64, aarch64
macOS x86_64, arm64
Windows x86_64

The package uses abi3-py39, so one wheel per platform covers Python 3.9+.

Scope note: this package covers the LiteLLM-compatible chat and embedding API surface. The Rust crate also ships image-generation providers, but those APIs are not exposed through edgequake-litellm yet.

Quick Start

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Explain Rust ownership in one sentence."}]

# Sync
resp = litellm.completion("openai/gpt-4o-mini", messages, max_tokens=128)
print(resp.choices[0].message.content)

# Async
async def main() -> None:
    resp = await litellm.acompletion("anthropic/claude-3-5-haiku-20241022", messages)
    print(resp.content)

    stream = await litellm.acompletion("openai/gpt-4o-mini", messages, stream=True)
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Embeddings:

import edgequake_litellm as litellm

result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["hello world", "rust is fast"],
)

print(result.data[0].embedding[:3])  # first three dimensions of the first vector
print(len(result[0]))                # vector dimensionality via sequence-style indexing

Provider Routing

Pass provider/model as the model argument:

Provider Example
OpenAI openai/gpt-4o-mini
Azure OpenAI azure/my-gpt4o-deployment
Anthropic anthropic/claude-3-5-sonnet-20241022
Gemini gemini/gemini-2.5-flash
Vertex AI vertexai/gemini-2.5-flash
xAI xai/grok-4
OpenRouter openrouter/meta-llama/llama-3.1-70b-instruct
Mistral mistral/mistral-large-latest
AWS Bedrock bedrock/amazon.nova-lite-v1:0
HuggingFace huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct
OpenAI Compatible openai-compatible/deepseek-chat
Ollama ollama/llama3.2
LM Studio lmstudio/local-model
VSCode Copilot vscode-copilot/auto
Mock mock/test-model

Embedding-only backend:

Provider Example
Jina jina/jina-embeddings-v3
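
The routing prefix is the only thing that changes between backends. As a minimal sketch (assuming OPENAI_API_KEY, ANTHROPIC_API_KEY, and JINA_API_KEY are already exported), the same messages can be sent to two chat providers and the embedding-only backend with identical call shapes:

import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Name one benefit of zero-copy parsing."}]

# Same call shape; only the provider/model prefix changes.
for model in ("openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"):
    resp = litellm.completion(model, messages, max_tokens=64)
    print(model, "->", resp.choices[0].message.content)

# Embedding-only backends use the same routing convention.
vectors = litellm.embedding("jina/jina-embeddings-v3", ["zero-copy parsing"])
print(len(vectors[0]))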

Supported Features

Provider Chat Stream Tools Embeddings Notes
OpenAI Yes Yes Yes Yes includes max_completion_tokens handling
Azure OpenAI Yes Yes Yes Yes deployment-based routing
Anthropic Yes Yes Yes No Claude extended thinking surfaced in response metadata
Gemini Yes Yes Yes Yes Google AI Studio
Vertex AI Yes Yes Yes Yes GCP auth / ADC
xAI Yes Yes Yes No Grok
OpenRouter Yes Yes Yes No gateway models
Mistral Yes Yes Yes Yes native embeddings
AWS Bedrock Yes Yes Yes Yes backed by the Rust Bedrock feature
HuggingFace Yes Yes Limited No Inference API
OpenAI Compatible Yes Yes Yes Yes Groq, Together, DeepSeek, custom gateways
Ollama Yes Yes Yes Yes local runtime
LM Studio Yes Yes Yes Yes local OpenAI-compatible server
VSCode Copilot Yes Yes Yes Yes direct auth by default, proxy optional
Jina No No No Yes embeddings only
Mock Yes No Yes Yes unit tests / local development
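
Tool calling keeps the LiteLLM/OpenAI call shape. The sketch below assumes the standard tools keyword is passed through unchanged and that tool calls appear on the assistant message in the usual LiteLLM layout; treat the access path as illustrative, and note that get_weather is a hypothetical tool.

import edgequake_litellm as litellm

# OpenAI-style function tool definition (get_weather is a hypothetical tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = litellm.completion(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)

# Assumed LiteLLM-style layout: requested tool calls hang off the assistant message.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)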

Environment Setup

Provider Required environment
OpenAI OPENAI_API_KEY
Azure OpenAI AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME
Anthropic ANTHROPIC_API_KEY
Gemini GEMINI_API_KEY or GOOGLE_API_KEY
Vertex AI GOOGLE_CLOUD_PROJECT, plus application default credentials (ADC) or GOOGLE_ACCESS_TOKEN
xAI XAI_API_KEY
OpenRouter OPENROUTER_API_KEY
Mistral MISTRAL_API_KEY
AWS Bedrock standard AWS credential chain plus AWS_REGION
HuggingFace HF_TOKEN or HUGGINGFACE_TOKEN
OpenAI Compatible OPENAI_COMPATIBLE_BASE_URL, optional OPENAI_COMPATIBLE_API_KEY
Ollama optional OLLAMA_HOST
LM Studio optional LMSTUDIO_HOST
VSCode Copilot optional VSCODE_COPILOT_PROXY_URL; otherwise the official VS Code Copilot auth cache is reused
Jina JINA_API_KEY

Module defaults:

import edgequake_litellm as litellm

litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

Environment defaults:

  • LITELLM_EDGE_PROVIDER
  • LITELLM_EDGE_MODEL
  • LITELLM_EDGE_TIMEOUT
  • LITELLM_EDGE_MAX_RETRIES
  • LITELLM_EDGE_VERBOSE
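
The same environment defaults are sketched below in Python; the values are placeholders, the timeout unit is assumed to be seconds, and the fallback behavior for a prefix-free model string is an assumption rather than documented behavior:

import os

# Illustrative defaults; set these before the first edgequake_litellm call.
os.environ["LITELLM_EDGE_PROVIDER"] = "openai"
os.environ["LITELLM_EDGE_MODEL"] = "gpt-4o-mini"
os.environ["LITELLM_EDGE_TIMEOUT"] = "30"        # assumed to be seconds
os.environ["LITELLM_EDGE_MAX_RETRIES"] = "2"
os.environ["LITELLM_EDGE_VERBOSE"] = "1"

import edgequake_litellm as litellm

# Assumption: a model string without a provider prefix falls back to the defaults above.
resp = litellm.completion("gpt-4o-mini", [{"role": "user", "content": "ping"}])
print(resp.choices[0].message.content)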

LiteLLM Compatibility

Implemented:

  • completion()
  • acompletion()
  • embedding()
  • aembedding()
  • stream=True on acompletion()
  • stream() async generator
  • response.choices[0].message.content
  • response.to_dict()
  • AuthenticationError, RateLimitError, NotFoundError, Timeout
  • module globals set_verbose and drop_params
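
A short sketch of the response helpers; to_dict() is listed above, while the exact stream() signature is not, so it is assumed here to mirror acompletion(model, messages):

import asyncio
import edgequake_litellm as litellm

messages = [{"role": "user", "content": "One-word greeting."}]

# to_dict() gives a plain-dict view of the response, useful for logging.
resp = litellm.completion("openai/gpt-4o-mini", messages)
print(resp.to_dict())

async def demo_stream() -> None:
    # Assumed call shape: stream(model, messages) yields LiteLLM-style chunks.
    async for chunk in litellm.stream("openai/gpt-4o-mini", messages):
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(demo_stream())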

Behavior notes:

  • synchronous streaming is intentionally not supported; use acompletion(..., stream=True) or stream()
  • unsupported or extra keyword arguments are dropped for LiteLLM parity
  • per-call api_key, api_base, and timeout parameters are accepted at the Python layer but not yet wired into the Rust core for every provider
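
Putting the exception classes and the per-call timeout together, a minimal handling sketch (assuming the exception types are exposed as module attributes, as listed above):

import edgequake_litellm as litellm

messages = [{"role": "user", "content": "Return the word ok."}]

try:
    # timeout is accepted at the Python layer; it may not reach every provider yet.
    resp = litellm.completion("openai/gpt-4o-mini", messages, timeout=10)
    print(resp.choices[0].message.content)
except litellm.AuthenticationError:
    print("check OPENAI_API_KEY")
except litellm.RateLimitError:
    print("rate limited; back off and retry")
except litellm.NotFoundError:
    print("unknown model or deployment")
except litellm.Timeout:
    print("request exceeded the 10 s timeout")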

Provider Examples

OpenAI-compatible custom gateway:

export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_API_KEY=...

import edgequake_litellm as litellm

resp = litellm.completion(
    "openai-compatible/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Write a one-line changelog summary."}],
)
print(resp.content)

Vertex AI:

export GOOGLE_CLOUD_PROJECT=my-project
gcloud auth application-default login

import edgequake_litellm as litellm

resp = litellm.completion(
    "vertexai/gemini-2.5-flash",
    [{"role": "user", "content": "Summarise this design review."}],
)

Jina embeddings:

import edgequake_litellm as litellm

vectors = litellm.embedding(
    "jina/jina-embeddings-v3",
    ["retrieval query", "retrieval document"],
)
print(len(vectors[0]))

Development

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

python -m venv .venv
source .venv/bin/activate

pip install "maturin>=1.7" "pytest>=8" "pytest-asyncio>=0.24" "ruff>=0.3" "mypy>=1.8"
pip install . -v

pytest -q -k "not e2e"
ruff check python/
mypy python/edgequake_litellm --ignore-missing-imports

Release

Release tags are separate from the Rust crate:

  • Rust crate: vX.Y.Z
  • Python package: py-vX.Y.Z

Publish flow for edgequake-litellm:

  1. bump edgequake-litellm/Cargo.toml
  2. bump edgequake-litellm/pyproject.toml
  3. update CHANGELOG.md
  4. push the release-prep commit
  5. wait for python-ci.yml to go green
  6. push the py-vX.Y.Z tag

python-publish.yml builds the sdist and wheels, smoke-tests the native wheels, publishes to PyPI, and can attach built artifacts to the GitHub Release.

Changelog

See CHANGELOG.md for the current release line and published history.

License

Apache-2.0. See ../LICENSE-APACHE.

Download files

Download the file for your platform.

Source Distribution

File Size Target
edgequake_litellm-0.6.7.tar.gz 911.1 kB Source

Built Distributions

File Size Target
edgequake_litellm-0.6.7-cp39-abi3-win_amd64.whl 7.5 MB CPython 3.9+, Windows x86-64
edgequake_litellm-0.6.7-cp39-abi3-musllinux_1_2_x86_64.whl 9.7 MB CPython 3.9+, musllinux (musl 1.2+), x86-64
edgequake_litellm-0.6.7-cp39-abi3-musllinux_1_2_aarch64.whl 9.6 MB CPython 3.9+, musllinux (musl 1.2+), ARM64
edgequake_litellm-0.6.7-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 9.4 MB CPython 3.9+, manylinux (glibc 2.17+), x86-64
edgequake_litellm-0.6.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 9.4 MB CPython 3.9+, manylinux (glibc 2.17+), ARM64
edgequake_litellm-0.6.7-cp39-abi3-macosx_11_0_arm64.whl 8.6 MB CPython 3.9+, macOS 11.0+, ARM64
edgequake_litellm-0.6.7-cp39-abi3-macosx_10_12_x86_64.whl 8.8 MB CPython 3.9+, macOS 10.12+, x86-64

File hashes

The source distribution was uploaded via Trusted Publishing using twine/6.1.0 on CPython 3.13.12.

edgequake_litellm-0.6.7.tar.gz
SHA256 cb037cd33b8ce6b6f277a0aec415581720a8564b5ea1f08cbf3f0ce908fe16c8
MD5 8b8814c7efbdd95f6e2745b3a29af363
BLAKE2b-256 07368d65bb8f74473f97a540995698962080c744a067f23fe2fa925cabe0d3e9

edgequake_litellm-0.6.7-cp39-abi3-win_amd64.whl
SHA256 4ecb598e8880d95dd44dd4958d09738a1628d8e455576b46a0d91fe8a16e318a
MD5 2eaa930eec3d0682da7ec26d381e99ce
BLAKE2b-256 a36babd7ad181e299364d33abb1cc38a65f70f8312e612aae9e01a254cc1cbf5

edgequake_litellm-0.6.7-cp39-abi3-musllinux_1_2_x86_64.whl
SHA256 c4b78f18668321bf6d9d9ed940670936c4dff2ce3024897b1d2d32ef4a22bd35
MD5 5289b8e2fb85f6c1383a279885bb7cef
BLAKE2b-256 e8b6be07debea1e5f34061c6daa71d164a975fe7288249559c9ad4b7d9955afa

edgequake_litellm-0.6.7-cp39-abi3-musllinux_1_2_aarch64.whl
SHA256 0ba5bb4c5f134698cf974f80a7abdeba0743688ffe8b9b16b2d2dae70c1a2df3
MD5 9661cabd9f7a563b2c0344b6497b5fb8
BLAKE2b-256 c97e407dc5c300b991bf44ca3dd91551fd5be27f6e42a86832a6f0d46d8b830e

edgequake_litellm-0.6.7-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256 1a5d269a6c2ebcfed9a9e73826e90fc53e16a2b75e0afab4c67f6d4943806df9
MD5 ef3d5a337cbb85af10307a5f70cffea5
BLAKE2b-256 23de4dce7cee99d2daf32da50cbf5bd499606b5f9faf0446180fb9ae0b4cf96a

edgequake_litellm-0.6.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
SHA256 7063dddb0631101b27de66ed3d71ea12622e1631aba58290ea41c977cf7a7739
MD5 86f4d9e20a77569ba44e9c5f2842d2bc
BLAKE2b-256 6a7cecd5c83094dc43c0f6a2be99841f5c4cd8033ac3b40f9eb5f9688c40265e

edgequake_litellm-0.6.7-cp39-abi3-macosx_11_0_arm64.whl
SHA256 5c1c4ef4e43d82a411eb0b62a5ff357103fbc865033af3b43a52bc89644e67bc
MD5 6a1335513f309e4610e4517ebacd7c17
BLAKE2b-256 13c5519fdcd4384fb310d31acb3127c664189bcd507d07127451d5590c1e759e

edgequake_litellm-0.6.7-cp39-abi3-macosx_10_12_x86_64.whl
SHA256 cfbe6512191cbf64e3d94b3f8f70eeec2ee2ba8a2a1a4c0e6685c842bdc0688c
MD5 14c11bec42c10ebe0ec4f6b307f4cf3c
BLAKE2b-256 ae5a2a780a2bc23222ade8a5ca52cd844622cc5a5cfa6dc8e69ec824a75bb1f5
