
A multiplexer for Large Language Model APIs built on the OpenAI SDK. It combines quotas from multiple models and automatically uses fallback models when the primary models are rate limited.

Project description

Multiplexer LLM (Python)

Unlock the Power of Distributed AI 🚀

A lightweight Python library that combines the quotas of multiple LLM providers behind a single unified API. Seamlessly distribute your requests across providers hosting open source models for maximum throughput and reliability.

The Problem: Limited AI Resources

  • Rate Limit Errors: "Rate limit exceeded" responses stall your application
  • Limited Throughput: Single provider constraints limit your AI capabilities
  • Unpredictable Failures: Rate limits can occur at critical moments
  • Manual Intervention: Switching providers requires code changes

The Solution: Unified Access to Multiple Providers

  • Increased Throughput: Combine quotas from multiple open source LLM providers
  • Error Resilience: Automatic failover when one provider hits rate limits
  • Seamless Integration: Compatible with the OpenAI SDK for easy adoption
  • Smart Load Balancing: Weight-based distribution across providers for optimal performance

Key Benefits

  • 🚀 Scalable AI: Combine resources from multiple providers for enhanced capabilities
  • 🛡️ Error Prevention: Automatic failover minimizes rate limit failures
  • ⚡ High Availability: Seamless switching between providers ensures continuous operation
  • 🔌 OpenAI SDK Compatibility: Works with existing OpenAI SDK code
  • 📊 Usage Analytics: Track provider performance and rate limits

How It Works

Single Model:        [Model A: 10K RPM] ❌ Rate limit error at request 10,001
Multiple Providers:  [Provider 1: 10K] + [Provider 2: 15K] + [Provider 3: 20K] = 45K RPM ✅
Multiple Models:     [Model A: 10K] + [Model B: 50K] + [Model C: 15K] = 75K RPM ✅✅

Installation

pip install multiplexer-llm

The package requires Python 3.8+ and automatically installs the OpenAI Python SDK as a dependency.
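
To reproduce the exact release documented on this page, pin the version:

pip install multiplexer-llm==0.1.1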

Quick Start

import asyncio
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

async def main():
    # Create client instances for two providers hosting open source models
    model1 = AsyncOpenAI(
        api_key=os.getenv("MODEL1_API_KEY"),
        base_url="https://api.model1.com/v1/",
    )

    model2 = AsyncOpenAI(
        api_key=os.getenv("MODEL2_API_KEY"),
        base_url="https://api.model2.org/v1",
    )

    # Initialize multiplexer
    async with Multiplexer() as multiplexer:
        # Add models with selection weights: (client, weight, model_name)
        multiplexer.add_model(model1, 5, "model1-large")  # ~5/8 of requests
        multiplexer.add_model(model2, 3, "model2-base")   # ~3/8 of requests

        # Use like a regular OpenAI client
        completion = await multiplexer.chat.completions.create(
            model="placeholder",  # Will be overridden by selected model
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"},
            ],
        )

        print(completion.choices[0].message.content)
        print("Model usage stats:", multiplexer.get_stats())

# Run the async function
asyncio.run(main())
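
The get_stats() call at the end returns aggregate usage information for each registered model; the exact fields depend on the library version, but it lets you track how requests were distributed across providers and how often rate limits were hit.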

How Primary and Fallback Models Work

The multiplexer operates with a two-tier system:

Primary Models (add_model)

  • First choice: Used when available
  • Weight-based selection: Higher weights mean a higher probability of selection (e.g., weights 5 and 3 route roughly 5/8 and 3/8 of requests)

Fallback Models (add_fallback_model)

  • Backup safety net: Activated only when all primary models hit rate limits (see the sketch below)
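
As a minimal sketch of the two-tier setup (the provider names, base URLs, and environment variables below are placeholders, not real endpoints):

import asyncio
import os
from multiplexer_llm import Multiplexer
from openai import AsyncOpenAI

async def main():
    primary = AsyncOpenAI(
        api_key=os.getenv("PRIMARY_API_KEY"),
        base_url="https://api.primary-provider.com/v1",
    )
    backup = AsyncOpenAI(
        api_key=os.getenv("BACKUP_API_KEY"),
        base_url="https://api.backup-provider.com/v1",
    )

    async with Multiplexer() as multiplexer:
        # Primary model: serves traffic while it stays under its rate limit
        multiplexer.add_model(primary, 1, "primary-model")
        # Fallback model: used only when every primary model is rate limited
        multiplexer.add_fallback_model(backup, 1, "backup-model")

        completion = await multiplexer.chat.completions.create(
            model="placeholder",  # overridden by the selected model
            messages=[{"role": "user", "content": "Hello!"}],
        )
        print(completion.choices[0].message.content)

asyncio.run(main())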

API Examples

Creating a Multiplexer

from multiplexer_llm import Multiplexer

# Create multiplexer instance
multiplexer = Multiplexer()

# Or use as async context manager (recommended)
async with Multiplexer() as multiplexer:
    # Your code here
    pass

Adding Models

# Add a primary model
# Signature: add_model(client: AsyncOpenAI, weight: int, model_name: str)
multiplexer.add_model(model1, 5, "model1-large")

# Add a fallback model
# Signature: add_fallback_model(client: AsyncOpenAI, weight: int, model_name: str)
multiplexer.add_fallback_model(model2, 1, "model2-base")

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About Haven Network

Haven Network builds open-source tools to help online communities produce high-quality data for multi-modal AI, with a strong focus on local inference and data privacy.


Download files

Download the file for your platform.

Source Distribution

multiplexer_llm-0.1.1.tar.gz (16.1 kB)


Built Distribution


multiplexer_llm-0.1.1-py3-none-any.whl (10.1 kB)


File details

Details for the file multiplexer_llm-0.1.1.tar.gz.

File metadata

  • Download URL: multiplexer_llm-0.1.1.tar.gz
  • Upload date:
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for multiplexer_llm-0.1.1.tar.gz

  • SHA256: 9ed00a923eab56d65d0c6572242abe18fe21c85e90d786af7a2a88f871bbc2da
  • MD5: 462947a2b145ea11f51b01e82c825532
  • BLAKE2b-256: ac462485d786373b30bf4d98858fe962152063cc91d3fb6fd27760b3ce3308c4


File details

Details for the file multiplexer_llm-0.1.1-py3-none-any.whl.

File hashes

Hashes for multiplexer_llm-0.1.1-py3-none-any.whl

  • SHA256: 86720f1968beae4286c5efc85dfeacb2590daf5af40bc49d5168bc0ea8a99a68
  • MD5: 87bee1ea6f30f0604d2746bf16094126
  • BLAKE2b-256: b1ace803aafae7f164ada344e9224eacbc744cdd045ae9c626265bc7ad59821b

