
edgequake-litellm

Drop-in LiteLLM replacement backed by Rust — same API, lower overhead.


edgequake-litellm wraps the edgequake-llm Rust core via PyO3, providing a high-performance drop-in replacement for LiteLLM. Swap the import — the rest of your code stays unchanged.

# Before
import litellm

# After — same API, Rust-backed
import edgequake_litellm as litellm

Features

  • LiteLLM-compatible API — completion(), acompletion(), stream(), embedding(); same call signatures, same response shape (resp.choices[0].message.content).
  • Multi-provider routing — OpenAI, Anthropic, Gemini, Mistral, OpenRouter, xAI, Ollama, LM Studio, HuggingFace, and more, via provider/model strings.
  • Async-native — built on Tokio; sync and async Python both supported.
  • Single wheel per platform — uses PyO3's abi3-py39 stable ABI; one .whl covers Python 3.9 and newer.
  • Zero Python runtime dependencies — the Rust extension is self-contained.
  • Full type annotations — ships with py.typed and .pyi stubs.

Installation

pip install edgequake-litellm

Quick Start

import edgequake_litellm as litellm   # drop-in import alias

# ── Synchronous chat ────────────────────────────────────────────────────────
resp = litellm.completion(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "Hello, world!"}],
)
# litellm-compatible access
print(resp.choices[0].message.content)
# convenience shortcut
print(resp.content)

# ── Asynchronous chat ───────────────────────────────────────────────────────
import asyncio

async def main():
    resp = await litellm.acompletion(
        "anthropic/claude-3-5-haiku-20241022",
        [{"role": "user", "content": "Tell me a joke."}],
        max_tokens=128,
        temperature=0.8,
    )
    print(resp.choices[0].message.content)

asyncio.run(main())

# ── Streaming (async generator) ─────────────────────────────────────────────
async def stream_example():
    messages = [{"role": "user", "content": "Count to five."}]
    async for chunk in await litellm.acompletion("openai/gpt-4o", messages, stream=True):
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(stream_example())

# ── Embeddings ──────────────────────────────────────────────────────────────
result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["Hello world", "Rust is fast"],
)
# litellm-compatible access
print(result.data[0].embedding[:3])
# legacy list access still works
print(len(result), len(result[0]))  # 2 1536

Provider Routing

Pass provider/model as the first argument — the prefix selects the provider:

Provider          Example model string
OpenAI            openai/gpt-4o
Anthropic         anthropic/claude-3-5-sonnet-20241022
Google Gemini     gemini/gemini-2.0-flash
Mistral           mistral/mistral-large-latest
OpenRouter        openrouter/meta-llama/llama-3.1-70b-instruct
xAI               xai/grok-3-beta
Ollama            ollama/llama3.2
LM Studio         lmstudio/local-model
HuggingFace       huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
Mock (tests)      mock/any-name
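
Because only the prefix changes, swapping providers means editing one string. A minimal sketch using model strings from the table above (prompt is illustrative):

import edgequake_litellm as litellm

messages = [{"role": "user", "content": "One-line summary of Rust's ownership model?"}]

# The call shape is identical across providers; only the model string changes.
for model in ("openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022", "ollama/llama3.2"):
    resp = litellm.completion(model, messages)
    print(f"{model}: {resp.content}")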

API Reference

completion(model, messages, **kwargs) → ModelResponseCompat

Synchronous chat completion. Blocks but releases the GIL during Rust I/O so other Python threads keep running.

resp = litellm.completion(
    "openai/gpt-4o",
    messages,
    max_tokens=256,             # max_completion_tokens is accepted as an alias
    temperature=0.7,
    system="You are a helpful assistant.",
    seed=42,
    response_format={"type": "json_object"},  # "text" or "json_object"
)

# All of these access the same content:
resp.choices[0].message.content   # litellm path
resp.content                       # shortcut
resp["choices"][0]["message"]["content"]  # dict-style

resp.usage.total_tokens
resp.model
resp.response_ms   # latency in milliseconds
resp.to_dict()     # plain dict
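
Because the GIL is released during the Rust call, blocking completions can overlap across ordinary Python threads. A minimal sketch using the mock provider (no API keys needed; prompts are illustrative):

from concurrent.futures import ThreadPoolExecutor

import edgequake_litellm as litellm

def ask(prompt):
    resp = litellm.completion("mock/any-name", [{"role": "user", "content": prompt}])
    return resp.content

# completion() blocks the calling thread but releases the GIL during Rust I/O,
# so these four calls run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, ["one", "two", "three", "four"]):
        print(answer)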

acompletion(model, messages, stream=False, **kwargs)

Async chat completion. Returns ModelResponseCompat or (if stream=True) AsyncGenerator[StreamChunkCompat, None].

# Non-streaming
resp = await litellm.acompletion("openai/gpt-4o", messages)

# Streaming
async for chunk in await litellm.acompletion("openai/gpt-4o", messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")
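
Since acompletion() returns an awaitable, concurrent fan-out composes with standard asyncio primitives. A sketch (prompts are illustrative):

import asyncio

import edgequake_litellm as litellm

async def fan_out(prompts):
    tasks = [
        litellm.acompletion("openai/gpt-4o-mini", [{"role": "user", "content": p}])
        for p in prompts
    ]
    responses = await asyncio.gather(*tasks)  # requests run concurrently
    return [r.content for r in responses]

print(asyncio.run(fan_out(["What is PyO3?", "What is Tokio?"])))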

stream(model, messages, **kwargs) → AsyncGenerator[StreamChunk, None]

Low-level streaming. Raw StreamChunk objects:

async for chunk in litellm.stream("openai/gpt-4o", messages):
    if chunk.content:
        print(chunk.content, end="")
    elif chunk.is_finished:
        print(f"\n[stop: {chunk.finish_reason}]")

embedding(model, input, **kwargs) → EmbeddingResponseCompat

Synchronous embeddings. Returns an EmbeddingResponseCompat that supports both litellm-style and legacy list-style access:

result = litellm.embedding("openai/text-embedding-3-small", ["foo", "bar"])

# litellm path
result.data[0].embedding

# backwards-compatible list access
for vec in result:          # each vec is a List[float]
    print(len(vec))
result[0]                   # List[float]
len(result)                 # number of vectors
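
Since each vector is a plain List[float], downstream math needs no extra dependencies. A small sketch computing cosine similarity via the list-style access above:

import math

import edgequake_litellm as litellm

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

result = litellm.embedding("openai/text-embedding-3-small", ["foo", "bar"])
print(cosine(result[0], result[1]))  # result[i] is a List[float]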

aembedding(model, input, **kwargs) → EmbeddingResponseCompat

Async embeddings — same return type as embedding().
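
A minimal async usage sketch, mirroring the synchronous example above:

import asyncio

import edgequake_litellm as litellm

async def embed():
    result = await litellm.aembedding(
        "openai/text-embedding-3-small",
        ["async", "embeddings"],
    )
    print(len(result), len(result[0]))  # vector count, dimensions

asyncio.run(embed())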

stream_chunk_builder(chunks, messages=None) → ModelResponseCompat

Reconstruct a full ModelResponseCompat from a collected list of streaming chunks:

from edgequake_litellm import stream_chunk_builder

chunks = []
async for chunk in litellm.stream("openai/gpt-4o", messages):
    chunks.append(chunk)

full = stream_chunk_builder(chunks, messages=messages)
print(full.content)

Configuration

Module-level globals mirror litellm:

import edgequake_litellm as litellm

litellm.set_verbose = True      # enable debug logging
litellm.drop_params = True      # drop unknown params (always enabled in this wrapper)

# Set default provider / model
litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

# Now the provider prefix can be omitted:
resp = litellm.completion("claude-3-5-haiku-20241022", messages)

Exception Hierarchy

Exceptions mirror LiteLLM for painless migration:

import time

import edgequake_litellm as litellm

try:
    resp = litellm.completion("openai/gpt-4o", messages)
except litellm.AuthenticationError as e:
    print(f"Check your API key: {e}")
except litellm.RateLimitError:
    time.sleep(5)
except litellm.ContextWindowExceededError:
    # trim messages and retry
    pass
except litellm.NotFoundError:      # alias for ModelNotFoundError
    pass
except litellm.APIConnectionError:
    pass

All exceptions (AuthenticationError, RateLimitError, ContextWindowExceededError, ModelNotFoundError, Timeout, APIConnectionError, APIError) are also available from edgequake_litellm.exceptions.
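
The hierarchy also makes a simple retry wrapper straightforward. A hypothetical helper sketch (completion_with_retry is not part of the package):

import time

import edgequake_litellm as litellm

def completion_with_retry(model, messages, attempts=3, backoff=2.0):
    for attempt in range(attempts):
        try:
            return litellm.completion(model, messages)
        except (litellm.RateLimitError, litellm.APIConnectionError):
            # Transient failures: back off and retry.
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))
        # AuthenticationError, ContextWindowExceededError, etc. propagate immediately.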

Environment Variables

Provider credentials follow the standard naming convention:

Provider       Environment variable
OpenAI         OPENAI_API_KEY
Anthropic      ANTHROPIC_API_KEY
Gemini         GEMINI_API_KEY
Mistral        MISTRAL_API_KEY
OpenRouter     OPENROUTER_API_KEY
xAI            XAI_API_KEY
HuggingFace    HF_TOKEN
Ollama         OLLAMA_HOST      (default: http://localhost:11434)
LM Studio      LMSTUDIO_HOST    (default: http://localhost:1234)

Defaults can also be set via LITELLM_EDGE_PROVIDER / LITELLM_EDGE_MODEL.
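
Since credentials come from the process environment, they can also be set from Python before the first call (assuming the wrapper reads them lazily at call time). A sketch with placeholder values:

import os

os.environ["OPENAI_API_KEY"] = "sk-..."           # placeholder; use your real key
os.environ["LITELLM_EDGE_PROVIDER"] = "openai"    # optional default provider
os.environ["LITELLM_EDGE_MODEL"] = "gpt-4o-mini"  # optional default model

import edgequake_litellm as litellm
resp = litellm.completion("openai/gpt-4o-mini", [{"role": "user", "content": "Hi"}])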

Development

Prerequisites

  • Rust ≥ 1.83 (rustup toolchain install stable)
  • Python ≥ 3.9
  • pip install maturin

Build from source

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate

pip install maturin pytest pytest-asyncio ruff mypy

# Build & install in dev mode (incremental Rust + Python)
maturin develop --release

# Run unit tests (mock provider — no API keys needed)
pytest tests/ -k "not e2e" -v

Running E2E tests

export OPENAI_API_KEY=sk-...
pytest tests/test_e2e_openai.py -v

Publishing

# Bump version in pyproject.toml AND Cargo.toml (must match), then:
git tag py-v0.2.0
git push --tags
# GitHub Actions builds and publishes to PyPI automatically.

License

Apache-2.0 — see LICENSE-APACHE.
