mixture-llm

Lightweight Mixture-of-Agents pipeline framework.

Combine LLMs to beat the best single LLM.

The Mixture-of-Agents architecture achieved 65.1% on AlpacaEval 2.0 using only open-source models—surpassing GPT-4o's 57.5%. This library gives you the building blocks to construct these pipelines.

Install

pip install mixture-llm

Quick start

from mixture_llm import Propose, Aggregate, run

pipeline = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"]),
    Aggregate("gpt-5-nano-2025-08-07"),
]

result, history = await run(pipeline, "What is quantum computing?", my_client)
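
Since run is a coroutine, drive it with asyncio when you are not already inside an event loop. A minimal sketch, reusing pipeline from above and assuming my_client is any function matching the client signature shown under "Client examples" below:

import asyncio

result, history = asyncio.run(run(pipeline, "What is quantum computing?", my_client))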

Paper-accurate pipelines

Together MoA (65.1% AlpacaEval)

The benchmark-winning configuration from Wang et al. (2024): 3 layers, 6 diverse proposers, Qwen aggregator.

PROPOSERS = [
    "wizardlm-2-8x22b",
    "qwen1.5-110b-chat",
    "qwen1.5-72b-chat",
    "llama-3-70b-instruct",
    "mixtral-8x22b-instruct",
    "dbrx-instruct",
]

together_moa = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen1.5-110b-chat"),
]

MoA-Lite (59.3% AlpacaEval)

Cost-optimized 2-layer variant—still beats GPT-4o.

moa_lite = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen1.5-72b-chat"),
]

Self-MoA (+6.6% over standard MoA)

Li et al. (2025) showed that sampling one top model multiple times can outperform diverse model mixtures.

# Same model, multiple samples via temperature
self_moa = [
    Propose(["gpt-5-nano-2025-08-07"] * 6, temp=0.7),
    Aggregate("gpt-5-nano-2025-08-07"),
]

With robustness (shuffle + dropout)

Shuffle removes positional bias in the aggregator's context; Dropout improves robustness by randomly dropping responses.

robust_moa = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-70b", "gemini-2.5-flash"]),
    Shuffle(),
    Dropout(0.2),
    Aggregate("gpt-5-nano-2025-08-07"),
]

Steps

LLM steps — call models:

  • Propose(agents) — generate initial responses in parallel
  • Synthesize(agents) — each agent synthesizes all previous outputs
  • Aggregate(agent) — single model combines everything into final output
  • Refine(agents) — improve each response individually
  • Rank(agent, n) — select top n responses by quality
  • Vote(agent) — pick consensus answer

Transform steps — manipulate responses (composable with LLM steps; see the sketch after this list):

  • Shuffle() — randomize order (prevents position bias)
  • Dropout(rate) — randomly drop responses (improves robustness)
  • Sample(n) — random subset
  • Take(n) — first n responses
  • Filter(fn) — keep responses matching predicate
  • Map(fn) — transform each response
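
A sketch of such a composition, assuming the step behaviors listed above (model names are illustrative): generate proposals, keep the three strongest, reshuffle, then vote on the survivors.

ranked_vote = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b", "gemini-2.5-flash"]),
    Rank("gpt-5-nano-2025-08-07", 3),  # keep the 3 strongest responses
    Shuffle(),                         # re-randomize order before the vote
    Vote("claude-sonnet-4-5"),         # pick the consensus answer
]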

Configuration

Every LLM step accepts temp and max_tokens:

Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5"], temp=0.9, max_tokens=4096)

Override the synthesis prompt:

Aggregate("gpt-5-nano-2025-08-07", prompt="Pick the single best response and return it verbatim.")

Client examples

Your client is an async function with this signature:

async def client(model, messages, temp, max_tokens) -> tuple[str, int, int]:
    # Returns (response_text, input_tokens, output_tokens)
    ...
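
For testing pipeline wiring without API keys, any coroutine of that shape will do. A hypothetical stub that returns canned text and zero token counts (assumes OpenAI-style message dicts):

async def stub_client(model, messages, temp, max_tokens):
    # Echo the last message so downstream steps have content to combine
    return f"[{model}] {messages[-1]['content'][:80]}", 0, 0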

OpenAI SDK (OpenAI + Anthropic models)

import os

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()
anthropic_client = AsyncOpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key=os.environ["ANTHROPIC_API_KEY"],
)

async def multi_provider_client(model, messages, temp, max_tokens):
    client = anthropic_client if model.startswith("claude") else openai_client
    # GPT-5 models take max_completion_tokens instead of max_tokens, reject
    # a temperature override, and expose a reasoning_effort knob
    is_gpt5 = model.startswith("gpt-5")
    params = {"model": model, "messages": messages}
    if is_gpt5:
        params.update({"max_completion_tokens": max_tokens, "reasoning_effort": "minimal"})
    else:
        params.update({"max_tokens": max_tokens, "temperature": temp})
    resp = await client.chat.completions.create(**params)
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens

# Mix providers in one pipeline
pipeline = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "gpt-5-nano-2025-08-07"]),
    Aggregate("claude-sonnet-4-5"),
]

OpenRouter (access all models via one API)

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def openrouter_client(model, messages, temp, max_tokens):
    resp = await client.chat.completions.create(
        model=model, messages=messages, temperature=temp, max_tokens=max_tokens
    )
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens

# MoA-style proposer pool via OpenRouter (current stand-ins for the paper's models)
PROPOSERS = [
    "qwen/qwen-2.5-72b-instruct",
    "meta-llama/llama-3.3-70b-instruct",
    "mistralai/mixtral-8x22b-instruct",
]

together_moa_openrouter = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen/qwen-2.5-72b-instruct"),
]

Groq via LiteLLM (free tier)

Groq offers free access to several models. Great for experimentation.

from litellm import acompletion

async def groq_client(model, messages, temp, max_tokens):
    resp = await acompletion(
        model=f"groq/{model}", messages=messages, temperature=temp, max_tokens=max_tokens
    )
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens

# Free Groq models (check console.groq.com/docs/rate-limits for current list)
GROQ_FREE = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "qwen/qwen3-32b",
    "meta-llama/llama-4-scout-17b-16e-instruct",
]

free_moa = [
    Propose(GROQ_FREE, temp=0.7, max_tokens=512),
    Aggregate("llama-3.3-70b-versatile"),
]

# Self-MoA with Groq (single model, multiple samples)
free_self_moa = [
    Propose(["llama-3.3-70b-versatile"] * 4, temp=0.7),
    Aggregate("llama-3.3-70b-versatile"),
]

Examples

The examples/ directory contains tested, runnable scripts for different providers. See examples/EXAMPLES.md for detailed documentation.

  • openai_basic.py (OpenAI) — Basic MoA pattern (Propose → Aggregate), client setup, token tracking
  • openai_self_moa.py (OpenAI) — Self-MoA technique: one model sampled 6 times beats diverse mixtures
  • multi_provider.py (OpenAI + Anthropic) — Provider routing, Shuffle step to prevent position bias
  • openrouter_moa.py (OpenRouter) — 3-layer MoA (Propose → Synthesize → Aggregate), paper configuration
  • groq_free.py (Groq) — Free experimentation, LiteLLM integration, Dropout for robustness
  • with_history.py (Groq) — Pipeline debugging, Rank step, execution history inspection

# Install and run
pip install -e ".[examples]"
export OPENAI_API_KEY=sk-...
python examples/openai_basic.py

# Or try free with Groq
export GROQ_API_KEY=gsk_...
python examples/groq_free.py

Key findings from the research

  • Aggregator quality matters roughly twice as much as proposer quality — invest in your final model (see the sketch after this list)
  • 3 layers is the sweet spot — diminishing returns beyond this
  • Diversity vs quality tradeoff — Self-MoA shows a single great model can beat diverse mediocre ones
  • 6 proposers optimal — gains diminish after this point
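
A sketch of that asymmetric budget: cheap, diverse proposers with the strongest model reserved for aggregation. Model IDs are reused from the examples above and purely illustrative; mixing providers like this needs a routing client such as multi_provider_client.

budget_moa = [
    Propose(["llama-3.1-8b-instant", "qwen/qwen3-32b", "llama-3.3-70b-versatile"], temp=0.7, max_tokens=512),
    Aggregate("claude-sonnet-4-5"),  # invest here: the aggregator drives final quality
]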

References

  • Wang et al. "Mixture-of-Agents Enhances Large Language Model Capabilities" (2024) — arXiv:2406.04692
  • Li et al. "Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?" (2025) — arXiv:2502.00674

License

MIT
