A lightweight, concurrency-safe credential orchestration runtime for AI API systems.
🗝️ KeyMesh
KeyMesh is a high-performance runtime that multiplexes several API keys (e.g., OpenAI, Anthropic, or Gemini keys) across highly concurrent workloads. It maximizes aggregate throughput by managing rate limits, cooldowns, and scheduling strategies, without being tied to any specific LLM provider or SDK.
✨ Features
- 🚀 Maximized Throughput: Pool multiple lower-tier keys so they behave like a single higher-capacity endpoint.
- 🛡️ Concurrency Safe: Native asyncio and multi-threaded synchronous support, with fine-grained locks so keys can be acquired safely at high frequency.
- 🔌 Sync & Async Native: The same feature set is available in async-first runtimes and in standard synchronous/threaded architectures.
- 🔄 Pluggable Schedulers: Choose among RoundRobin, LeastBusy, or Weighted strategies.
- ❄️ Smart Cooldowns: Automatically skips rate-limited keys and reintroduces them after a configurable backoff.
- 📊 Health Monitoring: Tracks latency (as an exponential moving average, EMA), success rates, and consecutive failures to prune dead credentials.
- 💾 Flexible Storage: In-memory and persistent JSON backends for both async (MemoryStorage, JSONStorage) and sync (SyncMemoryStorage, SyncJSONStorage) runtimes.
- 🔌 Framework Agnostic: No dependency on the openai or anthropic SDKs. Use it with any HTTP client.
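The cooldown and health-monitoring features above can be pictured as a small per-key state object. This is a sketch under assumptions, not KeyMesh's internals: the class name KeyHealth, the smoothing factor alpha=0.2, and the pruning threshold max_consecutive_failures=3 are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyHealth:
    """Hypothetical per-key state: EMA latency, success counts, cooldown."""

    alpha: float = 0.2  # assumed EMA smoothing factor
    max_consecutive_failures: int = 3  # assumed threshold for pruning a key
    ema_latency: Optional[float] = None
    successes: int = 0
    failures: int = 0
    consecutive_failures: int = 0
    cooldown_until: float = 0.0

    def record_success(self, latency: float) -> None:
        # EMA update: new = alpha * sample + (1 - alpha) * old; first sample seeds it
        if self.ema_latency is None:
            self.ema_latency = latency
        else:
            self.ema_latency = self.alpha * latency + (1 - self.alpha) * self.ema_latency
        self.successes += 1
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        self.consecutive_failures += 1

    def start_cooldown(self, seconds: float, now: Optional[float] = None) -> None:
        # Skip this key until the backoff window has elapsed
        now = time.monotonic() if now is None else now
        self.cooldown_until = now + seconds

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 1.0

    @property
    def is_dead(self) -> bool:
        # Too many failures in a row: treat the credential as dead
        return self.consecutive_failures >= self.max_consecutive_failures

    def is_available(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return not self.is_dead and now >= self.cooldown_until
```

The optional now argument exists only so the cooldown logic can be exercised without waiting in real time.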
📦 Installation
The recommended way to install KeyMesh is with the uv package manager; a standard pip install also works.
# Using uv
uv add keymesh
# Standard pip
pip install keymesh
🚀 Quick Start
KeyMesh stays out of your network stack: you acquire a key, use it with your preferred SDK, and report the outcome. For high throughput and concurrency safety, initialize a single client once and pass the acquired key to it per request.
⚡ Asynchronous Example
import asyncio
import time
from openai import AsyncOpenAI, RateLimitError
from keymesh import KeyPool, SchedulerStrategy

# 1. Initialize a reusable LLM client once so its HTTP connection pool is shared.
#    Pass a placeholder key so construction does not depend on OPENAI_API_KEY
#    being set; the real key is supplied per request below.
client = AsyncOpenAI(api_key="placeholder")

async def make_request(pool: KeyPool):
    # 2. Acquire a key from the pool (non-blocking selection)
    key = await pool.acquire()
    start_time = time.monotonic()
    try:
        # 3. Create a request-scoped client with the acquired key
        scoped_client = client.with_options(api_key=key)
        response = await scoped_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello KeyMesh Async!"}],
        )
    except RateLimitError:
        # 4a. Rate limited: put the key on cooldown before it is reused
        await pool.mark_rate_limited(key, cooldown=60.0)
    except Exception as e:
        # 4b. Any other failure counts against the key's health
        await pool.mark_failed(key)
        print(f"Request failed: {e!r}")
    else:
        # 5. Success: release the key and record the observed latency
        await pool.release(key, latency=time.monotonic() - start_time)
        print(f"Response: {response.choices[0].message.content}")

async def main():
    pool = KeyPool(
        keys=["sk-key-1", "sk-key-2", "sk-key-3"],
        strategy=SchedulerStrategy.LEAST_BUSY,
    )
    try:
        await make_request(pool)
    finally:
        await pool.close()

asyncio.run(main())
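To see how one shared pool can hand out keys to many coroutines at once, here is a toy stand-in for KeyPool. It is not KeyMesh's implementation: ToyAsyncPool, its in-flight counters, and the fake_request helper are hypothetical, and selection follows a simple least-busy rule (the key with the fewest in-flight requests wins).

```python
import asyncio


class ToyAsyncPool:
    """Hypothetical least-busy pool guarded by a single asyncio.Lock."""

    def __init__(self, keys: list[str]) -> None:
        self._in_flight = {key: 0 for key in keys}
        self._lock = asyncio.Lock()
        self.peak = 0  # highest in-flight count observed on any single key

    async def acquire(self) -> str:
        async with self._lock:
            # Least busy: fewest in-flight requests; ties go to the first key
            key = min(self._in_flight, key=self._in_flight.__getitem__)
            self._in_flight[key] += 1
            self.peak = max(self.peak, self._in_flight[key])
            return key

    async def release(self, key: str) -> None:
        async with self._lock:
            self._in_flight[key] -= 1


async def fake_request(pool: ToyAsyncPool) -> str:
    key = await pool.acquire()
    try:
        await asyncio.sleep(0.01)  # stand-in for the real API call
        return key
    finally:
        await pool.release(key)


async def run_batch(n: int) -> tuple[list[str], int]:
    pool = ToyAsyncPool(["sk-key-1", "sk-key-2", "sk-key-3"])
    used = await asyncio.gather(*(fake_request(pool) for _ in range(n)))
    return list(used), pool.peak
```

With six concurrent requests over three keys, each key is used twice and no key ever carries more than two requests at once. In this toy the critical section contains no await, so it is already atomic on the event loop; the lock is what keeps it safe once the critical section awaits something, such as writing state to storage.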
🔌 Synchronous Example (Thread-Safe)
import time
from openai import OpenAI, RateLimitError
from keymesh import SyncKeyPool, SchedulerStrategy

# 1. Initialize a reusable LLM client once so its HTTP connection pool is shared.
#    Pass a placeholder key so construction does not depend on OPENAI_API_KEY
#    being set; the real key is supplied per request below.
client = OpenAI(api_key="placeholder")

def make_request(pool: SyncKeyPool):
    # 2. Acquire a key synchronously (blocking, thread-safe)
    key = pool.acquire()
    start_time = time.monotonic()
    try:
        # 3. Create a request-scoped client with the acquired key
        scoped_client = client.with_options(api_key=key)
        response = scoped_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello KeyMesh Sync!"}],
        )
    except RateLimitError:
        # 4a. Rate limited: put the key on cooldown before it is reused
        pool.mark_rate_limited(key, cooldown=60.0)
    except Exception as e:
        # 4b. Any other failure counts against the key's health
        pool.mark_failed(key)
        print(f"Request failed: {e!r}")
    else:
        # 5. Success: release the key and record the observed latency
        pool.release(key, latency=time.monotonic() - start_time)
        print(f"Response: {response.choices[0].message.content}")

def main():
    pool = SyncKeyPool(
        keys=["sk-key-1", "sk-key-2", "sk-key-3"],
        strategy=SchedulerStrategy.LEAST_BUSY,
    )
    try:
        make_request(pool)
    finally:
        pool.close()

main()
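A synchronous pool has to protect its state from real threads, where a read-modify-write on a shared counter can interleave. The sketch below uses a hypothetical round-robin ToySyncPool (not KeyMesh's code) guarded by threading.Lock and drives it from a ThreadPoolExecutor.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToySyncPool:
    """Hypothetical round-robin pool guarded by a threading.Lock."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = list(keys)
        self._next = 0
        self._lock = threading.Lock()
        self.handed_out: dict[str, int] = {key: 0 for key in keys}

    def acquire(self) -> str:
        # Reading the index, advancing it, and counting happen as one atomic step
        with self._lock:
            key = self._keys[self._next % len(self._keys)]
            self._next += 1
            self.handed_out[key] += 1
            return key


def run_threads(n_requests: int, n_workers: int = 8) -> dict[str, int]:
    pool = ToySyncPool(["sk-key-1", "sk-key-2", "sk-key-3"])
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # Each submitted call stands in for one request acquiring a key
        list(executor.map(lambda _: pool.acquire(), range(n_requests)))
    return pool.handed_out
```

Because the lock serializes every index update, 300 acquisitions spread exactly 100 per key regardless of how the threads are scheduled. Without it, `self._next += 1` is not atomic, and two threads could receive the same key.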
🔑 Key Management Integration Patterns
When load-balancing API requests across keys concurrently, never create a new client for every request (that discards the connection pool, so each request pays for fresh connection setup) and never assign client.api_key = key on a shared client (concurrent tasks overwrite each other's key, which is a race condition).
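The race is easy to reproduce without any network. In this sketch, a hypothetical SharedClient stands in for an SDK client. Each task assigns its key to the shared attribute, yields to the event loop (as a real request does while awaiting I/O), and then reads the attribute back. The scoped variant copies the client per request instead.

```python
import asyncio
import copy


class SharedClient:
    """Hypothetical stand-in for an SDK client with a mutable api_key."""

    def __init__(self) -> None:
        self.api_key = ""


async def unsafe_call(client: SharedClient, key: str) -> tuple[str, str]:
    client.api_key = key  # mutates the shared client, visible to every task
    await asyncio.sleep(0)  # yield, as a real request does while awaiting I/O
    return key, client.api_key  # (key we meant to use, key actually "sent")


async def scoped_call(client: SharedClient, key: str) -> tuple[str, str]:
    scoped = copy.copy(client)  # per-request copy, similar in spirit to with_options()
    scoped.api_key = key
    await asyncio.sleep(0)
    return key, scoped.api_key


async def demo() -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    client = SharedClient()
    keys = ["sk-key-1", "sk-key-2", "sk-key-3"]
    unsafe = await asyncio.gather(*(unsafe_call(client, k) for k in keys))
    scoped = await asyncio.gather(*(scoped_call(client, k) for k in keys))
    return list(unsafe), list(scoped)
```

Running asyncio.run(demo()) shows every unsafe task "sending" sk-key-3, the last key assigned, while each scoped call keeps its own key.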
Instead, use one of these three concurrency-safe patterns:
Pattern 1: Request-Scoped Client Overrides (with_options)
Recommended for the OpenAI Python SDK (v1 and later): with_options() returns a copy of the client configured with the new key, while sharing the underlying HTTP connection pool.
# Async
scoped_client = client.with_options(api_key=key)
response = await scoped_client.chat.completions.create(...)
# Sync
scoped_client = client.with_options(api_key=key)
response = scoped_client.chat.completions.create(...)
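The copy-on-override idea behind with_options() can be modeled in a few lines. This illustration uses hypothetical ToyClient and Transport classes, not the OpenAI SDK's code. dataclasses.replace builds a new frozen configuration object, while the expensive transport is passed through by reference, so every scoped copy shares one connection pool.

```python
from dataclasses import dataclass, replace


class Transport:
    """Hypothetical connection pool: expensive to create, safe to share."""


@dataclass(frozen=True)
class ToyClient:
    """Hypothetical client whose configuration never changes after creation."""

    api_key: str
    transport: Transport

    def with_options(self, **overrides) -> "ToyClient":
        # New client with overridden settings; the same Transport object is reused
        return replace(self, **overrides)
```

Because the base client is frozen, no task can change it for another, and the only per-request cost is a small configuration object.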
Pattern 2: Per-Request Custom Headers (extra_headers)
Injects the authorization key directly into the request headers, without changing any client-wide configuration.
# Async (with the sync client, drop the `await`)
response = await client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Query"}],
    extra_headers={"Authorization": f"Bearer {key}"},
)
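This pattern relies on the SDK merging per-request headers over the client's default headers, with the per-request value winning, which is how the OpenAI Python SDK treats extra_headers. A minimal model of that merge is shown below; build_headers is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def build_headers(
    default_headers: dict[str, str],
    extra_headers: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    # Later entries win, so a per-request Authorization replaces the default one.
    # Real HTTP header names are case-insensitive; this toy compares them exactly.
    return {**default_headers, **(extra_headers or {})}
```

The defaults dictionary is never modified, so concurrent requests can each send a different key through the same client.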
Pattern 3: Automated Lifecycle Context Managers
Wraps acquisition, release, latency timing, and failure tracking in a reusable Python context manager, so a key is either released or marked failed whenever the body completes or raises, and is never left checked out.
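import contextlib
import time
from collections.abc import AsyncIterator

from keymesh import KeyPool

@contextlib.asynccontextmanager
async def key_lifecycle(pool: KeyPool) -> AsyncIterator[str]:
    key = await pool.acquire()
    start = time.monotonic()
    try:
        yield key
    except Exception:
        # The body raised: record the failure, then propagate the error
        await pool.mark_failed(key)
        raise
    else:
        # The body completed: return the key and record its latency
        await pool.release(key, latency=time.monotonic() - start)

# Usage (async with must run inside a coroutine)
async def ask(pool: KeyPool, client) -> None:
    async with key_lifecycle(pool) as key:
        scoped_client = client.with_options(api_key=key)
        response = await scoped_client.chat.completions.create(...)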
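The same lifecycle can be written for SyncKeyPool with contextlib.contextmanager. This sketch assumes the synchronous methods mirror the async ones shown above (acquire(), release(key, latency=...), mark_failed(key)). The pool is typed through a hypothetical SyncPoolLike Protocol so the helper can be exercised with a fake pool.

```python
import contextlib
import time
from collections.abc import Iterator
from typing import Protocol


class SyncPoolLike(Protocol):
    """Hypothetical structural type for the SyncKeyPool methods used here."""

    def acquire(self) -> str: ...
    def release(self, key: str, latency: float) -> None: ...
    def mark_failed(self, key: str) -> None: ...


@contextlib.contextmanager
def sync_key_lifecycle(pool: SyncPoolLike) -> Iterator[str]:
    key = pool.acquire()
    start = time.monotonic()
    try:
        yield key
    except Exception:
        pool.mark_failed(key)  # record the failure, then re-raise it
        raise
    else:
        pool.release(key, latency=time.monotonic() - start)
```

Usage mirrors the async version: `with sync_key_lifecycle(pool) as key:` followed by the scoped-client call. Releasing in the `else` branch means a failure inside release() itself is not misreported as a failed request.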
🛠️ Architecture
KeyMesh follows a modular, thread-safe, and async-safe design:
- KeyPool / SyncKeyPool: The central async / sync orchestrators.
- Scheduler: Stateless selection logic for choosing the next key (e.g. RoundRobin, LeastBusy, Weighted).
- KeyState / SyncKeyState: Thread-safe runtime metrics tracked per API key.
- Storage / BaseSyncStorage: Pluggable persistence layers (In-Memory or JSON-backed) for both asynchronous and synchronous runtimes.
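Because the Scheduler is described as stateless selection logic, each strategy can be expressed as a pure function of the current key states. The sketch below is not KeyMesh's code: the KeyView snapshot, its in_flight and weight fields, and the function names are hypothetical. Weighted selection is shown as a random choice proportional to weight, which is one common interpretation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyView:
    """Hypothetical read-only snapshot of one key's runtime state."""

    key: str
    in_flight: int = 0
    weight: float = 1.0


def round_robin(keys: list[KeyView], counter: int) -> KeyView:
    # The caller owns the counter, so the function itself keeps no state
    return keys[counter % len(keys)]


def least_busy(keys: list[KeyView]) -> KeyView:
    # Fewest in-flight requests wins; ties go to the earliest key
    return min(keys, key=lambda k: k.in_flight)


def weighted(keys: list[KeyView], rng: random.Random) -> KeyView:
    # Pick with probability proportional to each key's weight
    return rng.choices(keys, weights=[k.weight for k in keys], k=1)[0]
```

Keeping the strategies stateless means the mutable data (in-flight counts, cooldowns) lives only in the per-key state, which is the part that needs locking.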
🛠️ Development
This project uses uv for development.
# Install dependencies
uv sync
# Run tests
uv run pytest
# Lint and type-check
uv run ruff check .
uv run mypy .
📄 License
MIT License. See LICENSE for details.
Download files
File details
Details for the file keymesh-0.1.2a0.tar.gz.
File metadata
- Download URL: keymesh-0.1.2a0.tar.gz
- Upload date:
- Size: 54.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3674a7b3150614b1109fcba19dd1eae390ca0897a19d0a1b530aa82c14707575 |
| MD5 | 90a8f99f2805334875ea4aeec7d2190d |
| BLAKE2b-256 | dbff0d0a83e314ff363e958496bd9ba21aa1cae9abc01a06dc52305cc150529a |
File details
Details for the file keymesh-0.1.2a0-py3-none-any.whl.
File metadata
- Download URL: keymesh-0.1.2a0-py3-none-any.whl
- Upload date:
- Size: 29.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92a89f83606177cdd6f63d4ffa41829621935f84175743e6be5cbd07e8214895 |
| MD5 | ad3738949bd2f7e59a5a40a29e704800 |
| BLAKE2b-256 | 66161aea142c602d7eae496cba861c78e4856a0fc4db71676de102859ba4973d |
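To check a downloaded file against the SHA256 digests listed above, you can hash it in chunks with Python's standard hashlib module. The helper names sha256_of and verify are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    # Read in 1 MiB chunks so large files are never loaded into memory at once
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str) -> bool:
    # Digests are compared in lowercase hex, the form hexdigest() returns
    return sha256_of(path) == expected.strip().lower()
```

For example, verify("keymesh-0.1.2a0.tar.gz", "3674a7b3150614b1109fcba19dd1eae390ca0897a19d0a1b530aa82c14707575") returns True only if the local sdist matches the published digest.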