🗝️ KeyMesh

Lightweight, concurrency-safe credential orchestration for AI API systems.

KeyMesh is a runtime that multiplexes API keys (e.g., OpenAI, Anthropic, Gemini) across highly concurrent workloads. It maximizes aggregate throughput by managing rate limits, cooldowns, and scheduling strategies, and it is not tied to any specific LLM provider or SDK.


✨ Features

  • 🚀 Maximized Throughput: Pool multiple lower-tier keys to behave as a single high-tier endpoint.
  • 🛡️ Concurrency Safe: Native asyncio and multi-threaded synchronous support, with fine-grained locks so keys can be acquired safely at high request rates.
  • 🔌 Sync & Async Native: Identical features available in both async-first runtimes and standard synchronous/threaded architectures.
  • 🔄 Pluggable Schedulers: Choose between RoundRobin, LeastBusy, or Weighted strategies.
  • ❄️ Smart Cooldowns: Automatically skips rate-limited keys and reintroduces them after a configurable backoff.
  • 📊 Health Monitoring: Tracks latency (EMA), success rates, and consecutive failures to prune dead credentials.
  • 💾 Flexible Storage: Memory and JSON persistent backends for both async (MemoryStorage, JSONStorage) and sync (SyncMemoryStorage, SyncJSONStorage) runtimes.
  • 🔌 Framework Agnostic: Zero dependencies on openai or anthropic SDKs. Use it with any HTTP client.
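
The cooldown and health-monitoring features above can be illustrated with a small sketch. This is not KeyMesh's internal `KeyState`; the `KeyHealth` class, its field names, the smoothing factor `alpha`, and the `max_failures` threshold are all assumptions chosen for illustration. It shows one plausible way to keep an exponential moving average (EMA) of latency, count consecutive failures, and hide a key until its cooldown deadline has passed.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyHealth:
    """Hypothetical per-key health record (illustrative, not KeyMesh's API)."""

    ema_latency: Optional[float] = None  # smoothed latency in seconds
    consecutive_failures: int = 0
    cooldown_until: float = 0.0          # deadline on the time.monotonic() clock

    def record_success(self, latency: float, alpha: float = 0.2) -> None:
        # EMA update: new = alpha * sample + (1 - alpha) * old
        if self.ema_latency is None:
            self.ema_latency = latency
        else:
            self.ema_latency = alpha * latency + (1 - alpha) * self.ema_latency
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def start_cooldown(self, seconds: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.cooldown_until = now + seconds

    def is_available(self, now: Optional[float] = None, max_failures: int = 3) -> bool:
        # A key is usable once its cooldown has expired and it has not
        # failed max_failures times in a row.
        now = time.monotonic() if now is None else now
        return now >= self.cooldown_until and self.consecutive_failures < max_failures
```

Passing `now` explicitly keeps the sketch easy to test; a real pool would more likely read the clock itself.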

📦 Installation

KeyMesh is developed with the uv package manager and can be installed with either uv or pip.

# Using uv
uv add keymesh

# Standard pip
pip install keymesh

🚀 Quick Start

KeyMesh stays out of your network stack. You acquire a key, use it with your preferred SDK, and report the outcome. For high throughput and concurrency safety, initialize a single client and pass the acquired key on each request.

⚡ Asynchronous Example

import asyncio
import time
from openai import AsyncOpenAI, RateLimitError
from keymesh import KeyPool, SchedulerStrategy

# 1. Initialize a reusable LLM client once (reuses the TCP connection pool).
#    The placeholder key is overridden per request; without it the SDK
#    requires OPENAI_API_KEY to be set in the environment.
client = AsyncOpenAI(api_key="placeholder")

async def make_request(pool: KeyPool):
    # 2. Acquire a key from the pool (non-blocking selection)
    key = await pool.acquire()

    start_time = time.monotonic()
    try:
        # 3. Create a request-scoped client with the acquired key
        scoped_client = client.with_options(api_key=key)
        response = await scoped_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello KeyMesh Async!"}]
        )
    except RateLimitError:
        # 4. Rate limited: put the key on cooldown for 60 seconds
        await pool.mark_rate_limited(key, cooldown=60.0)
    except Exception:
        # 5. Any other failure counts against the key's health
        await pool.mark_failed(key)
    else:
        # 6. Release key back to the pool on success with latency tracking
        await pool.release(key, latency=time.monotonic() - start_time)
        print(f"Response: {response.choices[0].message.content}")

async def main():
    pool = KeyPool(
        keys=["sk-key-1", "sk-key-2", "sk-key-3"],
        strategy=SchedulerStrategy.LEAST_BUSY
    )
    try:
        await make_request(pool)
    finally:
        await pool.close()

asyncio.run(main())

🔌 Synchronous Example (Thread-Safe)

import time
from openai import OpenAI, RateLimitError
from keymesh import SyncKeyPool, SchedulerStrategy

# 1. Initialize a reusable LLM client once.
#    The placeholder key is overridden per request; without it the SDK
#    requires OPENAI_API_KEY to be set in the environment.
client = OpenAI(api_key="placeholder")

def make_request(pool: SyncKeyPool):
    # 2. Acquire a key synchronously (blocking/thread-safe)
    key = pool.acquire()

    start_time = time.monotonic()
    try:
        # 3. Create a request-scoped client with the acquired key
        scoped_client = client.with_options(api_key=key)
        response = scoped_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello KeyMesh Sync!"}]
        )
    except RateLimitError:
        # 4. Rate limited: put the key on cooldown for 60 seconds
        pool.mark_rate_limited(key, cooldown=60.0)
    except Exception:
        # 5. Any other failure counts against the key's health
        pool.mark_failed(key)
    else:
        # 6. Release key back to the pool on success with latency tracking
        pool.release(key, latency=time.monotonic() - start_time)
        print(f"Response: {response.choices[0].message.content}")

def main():
    pool = SyncKeyPool(
        keys=["sk-key-1", "sk-key-2", "sk-key-3"],
        strategy=SchedulerStrategy.LEAST_BUSY
    )
    try:
        make_request(pool)
    finally:
        pool.close()

main()

🔑 Key Management Integration Patterns

When load-balancing API requests concurrently, never recreate the client on every request (which destroys the connection pool) and never mutate client.api_key = key globally (which causes race conditions across concurrent tasks).
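
The race described above can be reproduced without any SDK. In this minimal sketch, `FakeClient` is a hypothetical stand-in for a client object with a mutable `api_key` attribute, and `asyncio.sleep(0)` stands in for network I/O. Two tasks each set the shared attribute and then await; by the time the first task's request "goes out", the second task has already overwritten the key.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an SDK client with a mutable api_key."""

    def __init__(self) -> None:
        self.api_key = ""

    async def send(self) -> str:
        await asyncio.sleep(0)  # simulated I/O: yields control to other tasks
        return self.api_key     # the key the request actually used


async def unsafe_request(client: FakeClient, key: str) -> tuple:
    client.api_key = key        # global mutation visible to every task
    used = await client.send()
    return key, used            # (intended key, key actually used)


async def demo() -> list:
    client = FakeClient()
    return await asyncio.gather(
        unsafe_request(client, "key-A"),
        unsafe_request(client, "key-B"),
    )


results = asyncio.run(demo())
# Pairs where the request went out with a different key than intended.
mismatched = [(want, used) for want, used in results if want != used]
print(mismatched)
```

Any non-empty `mismatched` list means a request was billed to, and rate-limited against, the wrong key, which also corrupts the pool's per-key health data.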

Instead, use one of these three concurrency-safe patterns:

Pattern 1: Request-Scoped Client Overrides (with_options)

Recommended for the OpenAI Python SDK (v1 and later). with_options returns a copy of the client configured with the new key, while sharing the underlying connection pool.

# Async
scoped_client = client.with_options(api_key=key)
response = await scoped_client.chat.completions.create(...)

# Sync
scoped_client = client.with_options(api_key=key)
response = scoped_client.chat.completions.create(...)

Pattern 2: Per-Request Custom Headers (extra_headers)

Sets the Authorization header on the individual request, leaving the client-wide configuration untouched.

# Async shown; drop `await` when using the synchronous client
response = await client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Query"}],
    extra_headers={"Authorization": f"Bearer {key}"}
)

Pattern 3: Automated Lifecycle Context Managers

Encapsulates acquiring, releasing, timing, and error state tracking into reusable Python context managers to prevent key leaks.

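import time
import contextlib
from keymesh import KeyPool

@contextlib.asynccontextmanager
async def key_lifecycle(pool: KeyPool):
    key = await pool.acquire()
    start = time.monotonic()
    try:
        yield key
    except Exception:
        # The body failed: count it against the key, then re-raise
        await pool.mark_failed(key)
        raise
    else:
        # The body succeeded: release the key and record its latency
        await pool.release(key, latency=time.monotonic() - start)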

# Usage
async with key_lifecycle(pool) as key:
    scoped_client = client.with_options(api_key=key)
    response = await scoped_client.chat.completions.create(...)
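
A synchronous counterpart can be sketched the same way with `contextlib.contextmanager`. The sketch assumes only the three pool methods used in the synchronous quick start (`acquire`, `release`, `mark_failed`). `SyncPoolLike` and `RecordingPool` are hypothetical names introduced here so the example runs without KeyMesh installed; in real code you would pass a `SyncKeyPool`.

```python
import contextlib
import time
from typing import Iterator, List, Protocol


class SyncPoolLike(Protocol):
    """Pool methods assumed from the synchronous quick start."""

    def acquire(self) -> str: ...
    def release(self, key: str, latency: float) -> None: ...
    def mark_failed(self, key: str) -> None: ...


@contextlib.contextmanager
def sync_key_lifecycle(pool: SyncPoolLike) -> Iterator[str]:
    key = pool.acquire()
    start = time.monotonic()
    try:
        yield key
    except Exception:
        pool.mark_failed(key)  # body raised: count it against the key
        raise
    else:
        pool.release(key, latency=time.monotonic() - start)


class RecordingPool:
    """Hypothetical stub that records which pool methods were called."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def acquire(self) -> str:
        self.events.append("acquire")
        return "sk-key-1"

    def release(self, key: str, latency: float) -> None:
        self.events.append("release")

    def mark_failed(self, key: str) -> None:
        self.events.append("mark_failed")


pool = RecordingPool()

# Success path: the key is released with its latency.
with sync_key_lifecycle(pool) as key:
    pass  # call the SDK with `key` here

# Failure path: the key is marked failed and the error propagates.
try:
    with sync_key_lifecycle(pool) as key:
        raise RuntimeError("simulated API failure")
except RuntimeError:
    pass
```

Rate-limit handling is left out to keep the sketch short; a fuller version might catch the SDK's rate-limit exception first and call `mark_rate_limited` instead of `mark_failed`.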

🛠️ Architecture

KeyMesh follows a modular, thread-safe, and async-safe design:

  • KeyPool / SyncKeyPool: The central async / sync orchestrators.
  • Scheduler: Stateless selection logic for choosing the next key (e.g. RoundRobin, LeastBusy, Weighted).
  • KeyState / SyncKeyState: Thread-safe runtime metrics tracking per API key.
  • Storage / BaseSyncStorage: Pluggable persistence layers (In-Memory or JSON-backed) for both asynchronous and synchronous runtimes.
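
To make the scheduler component concrete, here is a minimal sketch of the three named strategies as plain functions. It is not KeyMesh's `Scheduler` implementation: the `KeyInfo` dataclass, its `in_flight` and `weight` fields, and the function names are assumptions for illustration only.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class KeyInfo:
    """Hypothetical per-key snapshot handed to a scheduler."""

    key: str
    in_flight: int = 0   # requests currently using this key
    weight: float = 1.0  # relative share for weighted selection


def round_robin(keys: List[KeyInfo]) -> Iterator[KeyInfo]:
    # Cycle through the keys in a fixed, repeating order.
    return itertools.cycle(keys)


def least_busy(keys: List[KeyInfo]) -> KeyInfo:
    # Pick the key with the fewest in-flight requests (first one wins ties).
    return min(keys, key=lambda k: k.in_flight)


def weighted(keys: List[KeyInfo], rng: random.Random) -> KeyInfo:
    # Pick a key at random, with probability proportional to its weight.
    return rng.choices(keys, weights=[k.weight for k in keys], k=1)[0]


keys = [KeyInfo("sk-key-1", in_flight=2),
        KeyInfo("sk-key-2", in_flight=0),
        KeyInfo("sk-key-3", in_flight=1)]

rr = round_robin(keys)
first_four = [next(rr).key for _ in range(4)]
busiest_avoided = least_busy(keys).key
```

Least-busy selection tends to suit requests with uneven latency, while weighted selection suits pools that mix keys with different rate-limit tiers.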

🛠️ Development

This project uses uv for development.

# Install dependencies
uv sync

# Run tests
uv run pytest

# Lint and type-check
uv run ruff check .
uv run mypy .

📄 License

MIT License. See LICENSE for details.
