Project description

edgequake-litellm

Drop-in LiteLLM replacement backed by Rust — same API, lower overhead.


edgequake-litellm wraps the edgequake-llm Rust core via PyO3, providing a high-performance drop-in for LiteLLM. Swap the import — the rest of your code stays unchanged.

# Before
import litellm

# After — same API, Rust-backed
import edgequake_litellm as litellm

Features

  • LiteLLM-compatible API — completion(), acompletion(), stream(), embedding(); same call signatures, same response shape (resp.choices[0].message.content).
  • Multi-provider routing — OpenAI, Anthropic, Gemini, Mistral, OpenRouter, xAI, Azure, AWS Bedrock, Ollama, LM Studio, HuggingFace, and more, via provider/model strings.
  • AWS Bedrock — native support for 12+ model families via the Converse API, including Amazon Nova, Anthropic Claude, Meta Llama, Mistral, and native embedding with Titan / Cohere.
  • Async-native — built on Tokio; sync and async Python both supported.
  • Single wheel per platform — uses PyO3's abi3-py39 stable ABI; one .whl covers Python 3.9–3.13+.
  • Zero Python runtime dependencies — the Rust extension is self-contained.
  • Full type annotations — ships with py.typed and .pyi stubs.
  • max_completion_tokens support — works for all OpenAI model families, including o1, o3-mini, o4-mini, gpt-4.1, and gpt-4.1-nano, which require this field.
  • Cache hit tokens — resp.cache_hit_tokens exposes OpenAI prompt cache hits and Anthropic cache reads.
  • Reasoning tokens — resp.thinking_tokens surfaces o-series reasoning and Claude extended thinking token counts.

What's New in 0.2.0

  • AWS Bedrock provider — bedrock/<model-id> routing for 12+ model families via the Converse API: Amazon Nova, Anthropic Claude, Meta Llama, Mistral, Google Gemma, NVIDIA Nemotron, Qwen, MiniMax, DeepSeek, Z.AI, OpenAI OSS, Cohere, Writer (see the sketch below this list).
  • Bedrock native embedding — Amazon Titan Embed Text v2/v1 and Cohere Embed v3/v4.
  • Inference profile auto-resolution — bare model IDs automatically resolve to cross-region inference profile IDs.
  • Backed by edgequake-llm v0.3.0.
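
A minimal sketch of the Bedrock routing above, assuming AWS credentials are already available to the default credential chain (this README does not list Bedrock-specific environment variables) and using the standard Bedrock identifier for Titan Text Embeddings v2, which is not spelled out here:

import edgequake_litellm as litellm

# Chat via the Converse API; a bare model ID is resolved to a
# cross-region inference profile automatically.
resp = litellm.completion(
    "bedrock/amazon.nova-lite-v1:0",
    [{"role": "user", "content": "One sentence about Rust."}],
    max_tokens=128,
)
print(resp.content)

# Native embedding with a Titan model.
result = litellm.embedding(
    "bedrock/amazon.titan-embed-text-v2:0",
    ["hello from Bedrock"],
)
print(len(result[0]))  # embedding dimensionality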

See CHANGELOG.md for the full history.

Installation

pip install edgequake-litellm

Quick Start

import edgequake_litellm as litellm   # drop-in import alias

# ── Synchronous chat ────────────────────────────────────────────────────────
resp = litellm.completion(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "Hello, world!"}],
)
# litellm-compatible access
print(resp.choices[0].message.content)
# convenience shortcut
print(resp.content)

# ── Asynchronous chat ───────────────────────────────────────────────────────
import asyncio

async def main():
    resp = await litellm.acompletion(
        "anthropic/claude-3-5-haiku-20241022",
        [{"role": "user", "content": "Tell me a joke."}],
        max_tokens=128,
        temperature=0.8,
    )
    print(resp.choices[0].message.content)

asyncio.run(main())

# ── Streaming (async generator) ─────────────────────────────────────────────
async def stream_example():
    messages = [{"role": "user", "content": "Count to five."}]
    async for chunk in litellm.acompletion("openai/gpt-4o", messages, stream=True):
        print(chunk.choices[0].delta.content or "", end="", flush=True)

# ── Embeddings ──────────────────────────────────────────────────────────────
result = litellm.embedding(
    "openai/text-embedding-3-small",
    ["Hello world", "Rust is fast"],
)
# litellm-compatible access
print(result.data[0].embedding[:3])
# legacy list access still works
print(len(result), len(result[0]))  # 2 1536

Provider Routing

Pass provider/model as the first argument — the prefix selects the provider:

Provider Example model string
OpenAI openai/gpt-4o
Anthropic anthropic/claude-3-5-sonnet-20241022
Google Gemini gemini/gemini-2.0-flash
Mistral mistral/mistral-large-latest
OpenRouter openrouter/meta-llama/llama-3.1-70b-instruct
xAI xai/grok-3-beta
Azure OpenAI azure/gpt-4o
AWS Bedrock bedrock/amazon.nova-lite-v1:0
Ollama ollama/llama3.2
LM Studio lmstudio/local-model
HuggingFace huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
Mock (tests) mock/any-name
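
For a quick end-to-end check of the routing, the mock provider needs no credentials; the reply text is whatever the mock returns, so the printed output here is illustrative:

import edgequake_litellm as litellm

# Same call shape as any real provider; "mock/" requires no API key,
# which is how the unit tests run without credentials.
resp = litellm.completion(
    "mock/any-name",
    [{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)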

API Reference

completion(model, messages, **kwargs) → ModelResponseCompat

Synchronous chat completion. Blocks but releases the GIL during Rust I/O so other Python threads keep running.

resp = litellm.completion(
    "openai/gpt-4o",
    messages,
    max_tokens=256,
    temperature=0.7,
    system="You are a helpful assistant.",
    max_completion_tokens=256,  # alias for max_tokens; required for o1/o3/gpt-4.1 models
    seed=42,
    response_format={"type": "json_object"},  # "text" or "json_object"
)

# All of these access the same content:
resp.choices[0].message.content   # litellm path
resp.content                       # shortcut
resp["choices"][0]["message"]["content"]  # dict-style

resp.usage.total_tokens
resp.model
resp.response_ms                  # latency in milliseconds
resp.to_dict()                    # plain dict

# New in 0.1.1 — cache and reasoning token metadata
resp.cache_hit_tokens             # int | None — tokens served from provider cache
resp.thinking_tokens              # int | None — reasoning tokens (o-series, Claude)
resp.thinking_content             # str | None — visible thinking text (Claude)

# The same data via usage object:
resp.usage.cache_read_input_tokens  # same as resp.cache_hit_tokens
resp.usage.reasoning_tokens         # same as resp.thinking_tokens

acompletion(model, messages, stream=False, **kwargs)

Async chat completion. Returns ModelResponseCompat or (if stream=True) AsyncGenerator[StreamChunkCompat, None].

# Non-streaming
resp = await litellm.acompletion("openai/gpt-4o", messages)

# Streaming
async for chunk in await litellm.acompletion("openai/gpt-4o", messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")

stream(model, messages, **kwargs) → AsyncGenerator[StreamChunk, None]

Low-level streaming. Raw StreamChunk objects:

async for chunk in litellm.stream("openai/gpt-4o", messages):
    if chunk.content:
        print(chunk.content, end="")
    elif chunk.is_finished:
        print(f"\n[stop: {chunk.finish_reason}]")

embedding(model, input, **kwargs) → EmbeddingResponseCompat

Synchronous embeddings. Returns an EmbeddingResponseCompat that supports both litellm-style and legacy list-style access:

result = litellm.embedding("openai/text-embedding-3-small", ["foo", "bar"])

# litellm path
result.data[0].embedding

# backwards-compatible list access
for vec in result:          # iterates List[float]
    print(len(vec))
result[0]                   # List[float]
len(result)                 # number of vectors

aembedding(model, input, **kwargs) → EmbeddingResponseCompat

Async embeddings — same return type as embedding().
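
A minimal async sketch mirroring the synchronous embedding example from the Quick Start:

import asyncio
import edgequake_litellm as litellm

async def main():
    result = await litellm.aembedding(
        "openai/text-embedding-3-small",
        ["Hello world", "Rust is fast"],
    )
    # Same access patterns as embedding(): litellm-style or list-style.
    print(result.data[0].embedding[:3])
    print(len(result), len(result[0]))

asyncio.run(main())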

stream_chunk_builder(chunks, messages=None) → ModelResponseCompat

Reconstruct a full ModelResponseCompat from a collected list of streaming chunks:

from edgequake_litellm import stream_chunk_builder

chunks = []
async for chunk in litellm.stream("openai/gpt-4o", messages):
    chunks.append(chunk)

full = stream_chunk_builder(chunks, messages=messages)
print(full.content)

Configuration

Module-level globals mirror litellm:

import edgequake_litellm as litellm

litellm.set_verbose = True      # enable debug logging
litellm.drop_params = True      # drop unknown params (always True)

# Set default provider / model
litellm.set_default_provider("anthropic")
litellm.set_default_model("claude-3-5-haiku-20241022")

# Now the provider prefix can be omitted:
resp = litellm.completion("claude-3-5-haiku-20241022", messages)

Exception Hierarchy

Exceptions mirror LiteLLM for painless migration:

import time

import edgequake_litellm as litellm

try:
    resp = litellm.completion("openai/gpt-4o", messages)
except litellm.AuthenticationError as e:
    print(f"Check your API key: {e}")
except litellm.RateLimitError:
    time.sleep(5)
except litellm.ContextWindowExceededError:
    # trim messages and retry
    pass
except litellm.NotFoundError:      # alias for ModelNotFoundError
    pass
except litellm.APIConnectionError:
    pass

All exceptions (AuthenticationError, RateLimitError, ContextWindowExceededError, ModelNotFoundError, Timeout, APIConnectionError, APIError) are also available from edgequake_litellm.exceptions.
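
As one possible pattern on top of these exceptions, a small hypothetical complete_with_retry() helper; the 3-attempt limit and fixed 5-second back-off are illustrative choices, not library behaviour:

import time

import edgequake_litellm as litellm
from edgequake_litellm.exceptions import RateLimitError

def complete_with_retry(model, messages, attempts=3):
    # Retry only on rate limits; any other error propagates unchanged.
    for attempt in range(attempts):
        try:
            return litellm.completion(model, messages)
        except RateLimitError:
            if attempt == attempts - 1:
                raise
            time.sleep(5)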

Environment Variables

Provider credentials follow the standard naming convention:

Provider Environment variable
OpenAI OPENAI_API_KEY
Anthropic ANTHROPIC_API_KEY
Gemini GEMINI_API_KEY
Mistral MISTRAL_API_KEY
OpenRouter OPENROUTER_API_KEY
xAI XAI_API_KEY
HuggingFace HF_TOKEN
Ollama OLLAMA_HOST (default: http://localhost:11434)
LM Studio LMSTUDIO_HOST (default: http://localhost:1234)

Defaults can also be set via LITELLM_EDGE_PROVIDER / LITELLM_EDGE_MODEL.
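
Credentials and defaults can also be set from Python before the first call; a sketch assuming the variables are read when the request is made and that the env-based defaults behave like set_default_provider() / set_default_model():

import os
import edgequake_litellm as litellm

os.environ["OPENAI_API_KEY"] = "sk-..."             # provider credential
os.environ["LITELLM_EDGE_PROVIDER"] = "openai"      # default provider
os.environ["LITELLM_EDGE_MODEL"] = "gpt-4o-mini"    # default model

# With defaults set, the provider prefix can be omitted.
resp = litellm.completion("gpt-4o-mini", [{"role": "user", "content": "ping"}])
print(resp.content)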

Development

Prerequisites

  • Rust ≥ 1.83 (rustup toolchain install stable)
  • Python ≥ 3.9
  • pip install maturin

Build from source

git clone https://github.com/raphaelmansuy/edgequake-llm.git
cd edgequake-llm/edgequake-litellm

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate

pip install maturin pytest pytest-asyncio ruff mypy

# Build & install in dev mode (incremental Rust + Python)
maturin develop --release

# Run unit tests (mock provider — no API keys needed)
pytest tests/ -k "not e2e" -v

Running E2E tests

export OPENAI_API_KEY=sk-...
pytest tests/test_e2e_openai.py -v

Publishing

# Bump version in pyproject.toml AND Cargo.toml (must match), then:
git tag py-v0.2.0
git push --tags
# GitHub Actions builds and publishes to PyPI automatically.

License

Apache-2.0 — see LICENSE-APACHE.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edgequake_litellm-0.3.0.tar.gz (813.1 kB)

Uploaded: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

edgequake_litellm-0.3.0-cp39-abi3-win_amd64.whl (7.3 MB)

Uploaded: CPython 3.9+, Windows x86-64

edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_x86_64.whl (9.7 MB)

Uploaded: CPython 3.9+, musllinux (musl 1.2+), x86-64

edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_aarch64.whl (9.4 MB)

Uploaded: CPython 3.9+, musllinux (musl 1.2+), ARM64

edgequake_litellm-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.3 MB)

Uploaded: CPython 3.9+, manylinux (glibc 2.17+), x86-64

edgequake_litellm-0.3.0-cp39-abi3-macosx_11_0_arm64.whl (8.4 MB)

Uploaded: CPython 3.9+, macOS 11.0+, ARM64

edgequake_litellm-0.3.0-cp39-abi3-macosx_10_12_x86_64.whl (8.7 MB)

Uploaded: CPython 3.9+, macOS 10.12+, x86-64

File details

Details for the file edgequake_litellm-0.3.0.tar.gz.

File metadata

  • Download URL: edgequake_litellm-0.3.0.tar.gz
  • Upload date:
  • Size: 813.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for edgequake_litellm-0.3.0.tar.gz
Algorithm Hash digest
SHA256 b1e875daab2de99d7c0dd7e7f2984b45e7291c109762bac71f65a3ce8292c634
MD5 1f6c39289c5448f8e4a2d90329797416
BLAKE2b-256 8573f8f3e3c2ec38009ca6953c958c88e1de79027e340aa64b4c9a473c8d42f0


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 dbca6ba1e0a38e152689cb5d6fed0acc410391c09316c4067ffac0b001940c4b
MD5 ca78e9487369fd9fc094a2e939cb13f3
BLAKE2b-256 ad6e364ccb93f0afc6618d2b7f2b478dfdbbea776f665bf43071bfe5bb33c783


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 cea44b06772233bf21de8210c88fba356fb5507db790261a5fdf9006fc960912
MD5 48102d15d71cd089c995877e59ef80cb
BLAKE2b-256 12242d46ca1e0c1f9d7d726c663897575121108fe06224ae640db55883e03f6a


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 186eca29b663b7238de212ebe2f55ba0f5c48558f340b73d4112b80fab1e6370
MD5 1146181e6c769296fac9748802c99c58
BLAKE2b-256 5594c56043a39a796601c44013b1cbd9a043ccd1a0fc6dc1f43efe79a9717a3e


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f23c3e3b6b037fbf2c7d208278cb0432da4a76741c55379549ae217221b99dd6
MD5 33b1b5fd20254ae51fef6fe86c042f6d
BLAKE2b-256 899a686d9422231785158bba732e4d7967373866d6266feb16daacfa033afdbe


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a5ca64c15e74b8e645277aa0fae1dc2a02fe65e296023fb2c96e37c20933f056
MD5 bf677a5e7cdce4cb6e5637ca62c13a7c
BLAKE2b-256 caec7220bb4674c7b026548059f38743bb39a383291adf28274719a4c288fa1d


File details

Details for the file edgequake_litellm-0.3.0-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for edgequake_litellm-0.3.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 e0c482759cdd155b5dd6f471efe73b73091430fe1b5ce874e84d7f71b16fa130
MD5 a731763701c3185bcf3dccc62c6ff3ed
BLAKE2b-256 2fd859d263dcc4c2dd65581f01a82bd25cf549c21dbcb5e00c1b051e8feaffbc

