# mixture-llm

Lightweight Mixture-of-Agents pipeline framework: combine LLMs to beat the best single LLM.
The Mixture-of-Agents architecture achieved 65.1% on AlpacaEval 2.0 using only open-source models—surpassing GPT-4o's 57.5%. This library gives you the building blocks to construct these pipelines.
## Install

```bash
pip install mixture-llm
```
## Quick start

```python
from mixture_llm import Propose, Aggregate, run

pipeline = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"]),
    Aggregate("gpt-5-nano-2025-08-07"),
]

result, history = await run(pipeline, "What is quantum computing?", my_client)
```

Here `my_client` is any async function that calls your LLM provider; see Client examples below.
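`run` is a coroutine, so outside a notebook or async REPL you need an event loop. A minimal sketch, using the quick-start `pipeline` and your own `my_client`:

```python
import asyncio

async def main():
    # Await the pipeline inside an async entry point.
    result, history = await run(pipeline, "What is quantum computing?", my_client)
    print(result)

asyncio.run(main())
```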
## Paper-accurate pipelines

### Together MoA (65.1% AlpacaEval)
The benchmark-winning configuration from Wang et al. (2024): 3 layers, 6 diverse proposers, Qwen aggregator.
```python
from mixture_llm import Propose, Synthesize, Aggregate

PROPOSERS = [
    "wizardlm-2-8x22b",
    "qwen1.5-110b-chat",
    "qwen1.5-72b-chat",
    "llama-3-70b-instruct",
    "mixtral-8x22b-instruct",
    "dbrx-instruct",
]

together_moa = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen1.5-110b-chat"),
]
```
### MoA-Lite (59.3% AlpacaEval)

Cost-optimized two-layer variant that still beats GPT-4o.
```python
moa_lite = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen1.5-72b-chat"),
]
```
### Self-MoA (+6.6% over standard MoA)
Li et al. (2025) showed that sampling one top model multiple times can outperform diverse model mixtures.
```python
# Same model, multiple samples via temperature
self_moa = [
    Propose(["gpt-5-nano-2025-08-07"] * 6, temp=0.7),
    Aggregate("gpt-5-nano-2025-08-07"),
]
```
### With robustness (shuffle + dropout)

Shuffling prevents position bias in the aggregator; dropout improves robustness by varying which responses it sees.
```python
robust_moa = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-70b", "gemini-2.5-flash"]),
    Shuffle(),
    Dropout(0.2),
    Aggregate("gpt-5-nano-2025-08-07"),
]
```
## Steps

**LLM steps** — call models:

- `Propose(agents)` — generate initial responses in parallel
- `Synthesize(agents)` — each agent synthesizes all previous outputs
- `Aggregate(agent)` — single model combines everything into final output
- `Refine(agents)` — improve each response individually
- `Rank(agent, n)` — select top `n` responses by quality
- `Vote(agent)` — pick consensus answer
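For example, a selection-style pipeline might combine several of these steps. A sketch, assuming `Rank`'s second argument is the number of responses to keep (per the signature above) and using a placeholder model name:

```python
# Hypothetical: over-generate, refine each draft, keep the best two, then aggregate.
ranked_moa = [
    Propose(["llama-3.3-70b-versatile"] * 4, temp=0.9),
    Refine(["llama-3.3-70b-versatile"] * 4),
    Rank("llama-3.3-70b-versatile", 2),
    Aggregate("llama-3.3-70b-versatile"),
]
```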
**Transform steps** — manipulate responses:

- `Shuffle()` — randomize order (prevents position bias)
- `Dropout(rate)` — randomly drop responses (improves robustness)
- `Sample(n)` — random subset
- `Take(n)` — first `n` responses
- `Filter(fn)` — keep responses matching predicate
- `Map(fn)` — transform each response
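A sketch of the functional steps, assuming `fn` receives each response's text:

```python
# Hypothetical: drop near-empty responses and normalize whitespace before aggregating.
filtered_moa = [
    Propose(PROPOSERS, temp=0.7),
    Filter(lambda r: len(r) > 50),  # keep only substantive responses
    Map(lambda r: r.strip()),       # tidy whitespace
    Aggregate("qwen1.5-110b-chat"),
]
```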
## Configuration

Every LLM step accepts `temp` and `max_tokens`:
Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5"], temp=0.9, max_tokens=4096)
Override the synthesis prompt:
Aggregate("gpt-5-nano-2025-08-07", prompt="Pick the single best response and return it verbatim.")
## Client examples

Your client is an async function with this signature:

```python
async def client(model, messages, temp, max_tokens) -> tuple[str, int, int]:
    # Returns (response_text, input_tokens, output_tokens)
    ...
```
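For tests or dry runs, a stub that satisfies this contract can be handy. A sketch; the token counts here are crude ~4-characters-per-token placeholders, not real usage figures:

```python
async def dummy_client(model, messages, temp, max_tokens):
    # Echo the last message back with the model name prefixed.
    text = f"[{model}] {messages[-1]['content']}"
    return text, len(str(messages)) // 4, len(text) // 4
```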
### OpenAI SDK (OpenAI + Anthropic models)

```python
import os

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()
anthropic_client = AsyncOpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key=os.environ["ANTHROPIC_API_KEY"],
)

async def multi_provider_client(model, messages, temp, max_tokens):
    client = anthropic_client if model.startswith("claude") else openai_client
    params = {"model": model, "messages": messages}
    if model.startswith("gpt-5"):
        # GPT-5 models take max_completion_tokens instead of max_tokens,
        # reject temperature, and accept a reasoning_effort setting
        params.update({"max_completion_tokens": max_tokens, "reasoning_effort": "minimal"})
    else:
        params.update({"max_tokens": max_tokens, "temperature": temp})
    resp = await client.chat.completions.create(**params)
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens
```
```python
# Mix providers in one pipeline
pipeline = [
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "gpt-5-nano-2025-08-07"]),
    Aggregate("claude-sonnet-4-5"),
]
```
### OpenRouter (access all models via one API)

```python
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def openrouter_client(model, messages, temp, max_tokens):
    resp = await client.chat.completions.create(
        model=model, messages=messages, temperature=temp, max_tokens=max_tokens
    )
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens
```
```python
# Together MoA models via OpenRouter
PROPOSERS = [
    "qwen/qwen-2.5-72b-instruct",
    "meta-llama/llama-3.3-70b-instruct",
    "mistralai/mixtral-8x22b-instruct",
]

together_moa_openrouter = [
    Propose(PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate("qwen/qwen-2.5-72b-instruct"),
]
```
### Groq via LiteLLM (free tier)
Groq offers free access to several models. Great for experimentation.
```python
from litellm import acompletion

async def groq_client(model, messages, temp, max_tokens):
    resp = await acompletion(
        model=f"groq/{model}", messages=messages, temperature=temp, max_tokens=max_tokens
    )
    return resp.choices[0].message.content, resp.usage.prompt_tokens, resp.usage.completion_tokens
```
```python
# Free Groq models (check console.groq.com/docs/rate-limits for current list)
GROQ_FREE = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "qwen/qwen3-32b",
    "meta-llama/llama-4-scout-17b-16e-instruct",
]

free_moa = [
    Propose(GROQ_FREE, temp=0.7, max_tokens=512),
    Aggregate("llama-3.3-70b-versatile"),
]

# Self-MoA with Groq (single model, multiple samples)
free_self_moa = [
    Propose(["llama-3.3-70b-versatile"] * 4, temp=0.7),
    Aggregate("llama-3.3-70b-versatile"),
]
```
## Examples

The `examples/` directory contains tested, runnable scripts for different providers. See `examples/EXAMPLES.md` for detailed documentation.
| Example | Provider | What You'll Learn |
|---|---|---|
| `openai_basic.py` | OpenAI | Basic MoA pattern (Propose → Aggregate), client setup, token tracking |
| `openai_self_moa.py` | OpenAI | Self-MoA technique — one model sampled 6 times beats diverse mixtures |
| `multi_provider.py` | OpenAI + Anthropic | Provider routing, Shuffle step to prevent position bias |
| `openrouter_moa.py` | OpenRouter | 3-layer MoA (Propose → Synthesize → Aggregate), paper configuration |
| `groq_free.py` | Groq | Free experimentation, LiteLLM integration, Dropout for robustness |
| `with_history.py` | Groq | Pipeline debugging, Rank step, execution history inspection |
```bash
# Install and run
pip install -e ".[examples]"
export OPENAI_API_KEY=sk-...
python examples/openai_basic.py

# Or try free with Groq
export GROQ_API_KEY=gsk_...
python examples/groq_free.py
```
## Key findings from the research
- Aggregator quality matters 2x more than proposer quality — invest in your final model
- 3 layers is the sweet spot — diminishing returns beyond this
- Diversity vs quality tradeoff — Self-MoA shows a single great model can beat diverse mediocre ones
- 6 proposers optimal — gains diminish after this point
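Taken together, the findings suggest a pipeline shape like the following sketch, where `SIX_PROPOSERS` and `STRONGEST_MODEL` are placeholders for your own model choices:

```python
# Hypothetical "by the findings" configuration: 6 proposers, 3 layers,
# and the strongest available model reserved for aggregation.
by_the_findings = [
    Propose(SIX_PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(SIX_PROPOSERS, temp=0.7, max_tokens=512),
    Synthesize(SIX_PROPOSERS, temp=0.7, max_tokens=512),
    Aggregate(STRONGEST_MODEL),
]
```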
## References
- Wang et al. "Mixture-of-Agents Enhances Large Language Model Capabilities" (2024) — arXiv:2406.04692
- Li et al. "Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?" (2025) — arXiv:2502.00674
## License
MIT
## File details

Details for the file `mixture_llm-0.1.3.tar.gz`.

### File metadata

- Download URL: mixture_llm-0.1.3.tar.gz
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `72d85880d754b54c4c49b25699b88906335154359ecc38846cdc94d35a08af7f` |
| MD5 | `8f961443ca06488158455461185f354d` |
| BLAKE2b-256 | `e85700e155fb298c3f7646e7fc84cb38f0aa92fe09246df6da07ecb07264e48d` |
### Provenance

The following attestation bundles were made for `mixture_llm-0.1.3.tar.gz`:

Publisher: `release-please.yaml` on `leonardosul/mixture-llm`

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mixture_llm-0.1.3.tar.gz
- Subject digest: 72d85880d754b54c4c49b25699b88906335154359ecc38846cdc94d35a08af7f
- Sigstore transparency entry: 1417277627
- Permalink: leonardosul/mixture-llm@9938230f6cc5530700fa1428b0d44671d2dc0bfc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/leonardosul
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yaml@9938230f6cc5530700fa1428b0d44671d2dc0bfc
- Trigger Event: push
## File details

Details for the file `mixture_llm-0.1.3-py3-none-any.whl`.

### File metadata

- Download URL: mixture_llm-0.1.3-py3-none-any.whl
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `08c07d49cc37182c2fba8aad9aacec5c1fb19b91ec694ae1a7c59850001ea83c` |
| MD5 | `9d72f12baa2f9d41fa4a728a1aefb303` |
| BLAKE2b-256 | `068473fd820283e72d186c511bb68d90adb72ee639642f35e4d90b60c1deb381` |
### Provenance

The following attestation bundles were made for `mixture_llm-0.1.3-py3-none-any.whl`:

Publisher: `release-please.yaml` on `leonardosul/mixture-llm`

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mixture_llm-0.1.3-py3-none-any.whl
- Subject digest: 08c07d49cc37182c2fba8aad9aacec5c1fb19b91ec694ae1a7c59850001ea83c
- Sigstore transparency entry: 1417277647
- Permalink: leonardosul/mixture-llm@9938230f6cc5530700fa1428b0d44671d2dc0bfc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/leonardosul
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yaml@9938230f6cc5530700fa1428b0d44671d2dc0bfc
- Trigger Event: push