Config-driven multi-agent debate council — add a reliability layer to any LLM pipeline
agent-council
A reliability layer for LLM pipelines. Multiple agents with distinct personas debate a topic across iterative rounds; a judge synthesizes a final verdict.
Drop it into any agent to pressure-test a decision before committing to it.
Inspired by mshumer/llmcouncil and the Mixture of Agents research.
Install
pip install agent-council
Requires Python ≥ 3.11.
Programmatic usage
This is the primary interface — use it inside your own agents and pipelines.
from agent_council import CouncilOrchestrator, MemberConfig, JudgeConfig

orchestrator = CouncilOrchestrator(
    members=[
        MemberConfig(
            id="analyst",
            name="The Analyst",
            provider="anthropic",
            model="claude-sonnet-4-6",
            persona="Rigorous analytical thinker. Evidence-based, structured.",
        ),
        MemberConfig(
            id="skeptic",
            name="The Skeptic",
            provider="openai",
            model="gpt-4o",
            persona="Challenge every assumption. Surface hidden risks.",
        ),
    ],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    rounds=3,
    early_exit_threshold=0.85,
)

# Call from inside an async function, or wrap the call in asyncio.run(...).
session, verdict = await orchestrator.run("Should we adopt microservices?")

print(verdict.verdict)           # synthesized conclusion
print(verdict.consensus_level)   # ConsensusLevel.STRONG / MODERATE / WEAK / NONE
print(verdict.consensus_score)   # float 0–1
print(verdict.key_agreements)    # list[str]
print(verdict.dissenting_views)  # list[str]
API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY) by default.
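Because keys come from the environment, a missing variable typically only shows up when the first model call fails. The sketch below checks for keys up front. The provider-to-variable mapping and the `missing_api_keys` helper are illustrative assumptions, not part of agent-council; only the variable names themselves come from this README.

```python
import os

# Assumed mapping of provider name -> environment variable.
# The variable names appear in this README; the mapping itself is illustrative.
PROVIDER_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def missing_api_keys(providers, environ=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    missing = []
    for provider in providers:
        var = PROVIDER_KEY_ENV.get(provider)
        # Providers without a known variable (e.g. a local Ollama server) are skipped.
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing
```

Calling `missing_api_keys(["anthropic", "openai"])` before `orchestrator.run(...)` lets a pipeline fail fast with a clear message instead of partway through a debate.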
Streaming callbacks
Callbacks fire as each member finishes — no waiting for a full round to complete. Both sync and async callbacks are supported.
def on_member(resp):
    print(f"[{resp.member_name}] {resp.stance} ({resp.confidence:.0%} confident)")

async def on_round(round_):
    await db.save_round(round_)  # async is fine too

session, verdict = await orchestrator.run(
    "Should we rewrite in Rust?",
    on_member_response=on_member,
    on_round_complete=on_round,
)
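Accepting both sync and async callbacks is commonly done by calling the function and awaiting the result only when it is awaitable. The following is a minimal sketch of that pattern; `fire_callback` and `demo` are hypothetical names, and this is not necessarily how agent-council dispatches callbacks internally.

```python
import asyncio
import inspect


async def fire_callback(callback, payload):
    """Call `callback(payload)`; await the result if it is awaitable."""
    if callback is None:
        return None
    result = callback(payload)
    if inspect.isawaitable(result):
        result = await result
    return result


async def demo():
    seen = []

    def sync_cb(item):
        seen.append(("sync", item))

    async def async_cb(item):
        await asyncio.sleep(0)  # stands in for real async work, e.g. a DB write
        seen.append(("async", item))

    await fire_callback(sync_cb, "r1")
    await fire_callback(async_cb, "r2")
    await fire_callback(None, "ignored")  # a missing callback is a no-op
    return seen
```

One consequence of this pattern is that a slow sync callback blocks the event loop, so long-running work is better placed in an async callback.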
Provider overrides
Override temperature, token limits, or use a custom Ollama endpoint:
from agent_council import (
    CouncilOrchestrator, MemberConfig, JudgeConfig,
    ProvidersConfig, AnthropicProviderConfig, OllamaProviderConfig,
)

orchestrator = CouncilOrchestrator(
    members=[...],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    provider_configs=ProvidersConfig(
        anthropic=AnthropicProviderConfig(temperature=0.5, max_tokens=1024),
        ollama=OllamaProviderConfig(base_url="http://my-ollama:11434"),
    ),
)
Result types
@dataclass
class FinalVerdict:
    topic: str
    verdict: str
    consensus_level: ConsensusLevel  # STRONG / MODERATE / WEAK / NONE
    consensus_score: float           # 0–1
    key_agreements: list[str]
    dissenting_views: list[str]
    rounds_completed: int
    early_exit: bool
    total_duration_seconds: float

@dataclass
class CouncilSession:
    session_id: str
    topic: str
    started_at: datetime
    rounds: list[DebateRound]        # full transcript
    verdict: FinalVerdict | None

@dataclass
class MemberResponse:
    member_id: str
    member_name: str
    round_number: int
    content: str                     # full response text
    stance: str                      # one-line position summary
    confidence: float                # 0–1
    changed_position: bool
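These fields make it straightforward to gate a downstream action on the verdict. The sketch below uses a local stand-in class with a subset of the `FinalVerdict` fields so that it runs without the package installed. `VerdictView`, `needs_human_review`, and the default thresholds are hypothetical choices for illustration, not library defaults.

```python
from dataclasses import dataclass, field


@dataclass
class VerdictView:
    """Stand-in with a subset of the FinalVerdict fields listed above."""
    verdict: str
    consensus_score: float
    dissenting_views: list[str] = field(default_factory=list)


def needs_human_review(v, min_score=0.7, max_dissent=1):
    """Hypothetical policy: escalate on low consensus or too many dissenting views."""
    return v.consensus_score < min_score or len(v.dissenting_views) > max_dissent
```

A real `FinalVerdict` exposes the same two attributes, so passing it to `needs_human_review` should work unchanged under that assumption.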
Why Reliability
LLMs can be confident and wrong. A single-shot response hides uncertainty and failure modes (hallucinations, missed trade‑offs, overfitting to the prompt). Agent Council creates informed disagreement and then reconciles it:
- Independent agents surface blind spots and competing views.
- Iterative rounds reward stable, consistent positions (consensus score).
- A neutral judge synthesizes agreements and dissent for transparent decision‑making.
What improves:
- Decision quality: fewer unexamined assumptions, clearer trade‑offs.
- Traceability: full transcript and structured verdict for audits/reviews.
- Safety: configurable early exit threshold to avoid premature consensus.
- Extensibility: mix providers/models (Anthropic, OpenAI, OpenRouter, Ollama).
Examples & Use Cases
- Architecture choices: “Monolith vs microservices for product X?”
- Launch reviews: “Are we production‑ready? What risks remain?”
- AI safety checks: “Could this prompt produce unsafe output?”
- Product strategy: “Should pricing move to usage‑based?”
- Code migration: “Rewrite to Rust? What are the costs/benefits?”
Programmatic snippet (minimal):
session, verdict = await CouncilOrchestrator(
    members=[
        MemberConfig(id="analyst", name="Analyst", provider="openrouter", model="anthropic/claude-3.5-sonnet"),
        MemberConfig(id="skeptic", name="Skeptic", provider="openrouter", model="openai/gpt-4o"),
    ],
    judge=JudgeConfig(provider="openrouter", model="anthropic/claude-3.5-sonnet"),
).run("Should we adopt microservices?")
print(verdict.verdict)
CLI:
export OPENROUTER_API_KEY=sk-or-...
council review "Remote work vs office?" -c config/council.yaml
HTTP (server):
pip install "agent-council[server]"
council serve
# POST http://127.0.0.1:8000/review {"topic":"Should we rewrite in Rust?"}
How the debate works
Round 1 — all members respond independently to the topic
Round 2..N — each member reads all peers' responses and may revise
→ early exit if consensus score ≥ threshold
Judge — reads full transcript, synthesizes final verdict
Consensus score = avg_confidence × (1 − changed_fraction)
Rewards both high confidence and stability across rounds.
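The formula can be worked through in a few lines. This is a sketch of the stated formula only, assuming the score is computed over the responses of a single round. The `Response` stand-in, the empty-round result of 0.0, and `should_exit_early` are illustrative assumptions rather than the library's actual code in debate.py.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Stand-in carrying the two MemberResponse fields the formula needs."""
    confidence: float        # 0-1
    changed_position: bool


def consensus_score(responses):
    """avg_confidence * (1 - changed_fraction) over one round's responses."""
    if not responses:
        return 0.0  # assumption: an empty round counts as no consensus
    avg_confidence = sum(r.confidence for r in responses) / len(responses)
    changed_fraction = sum(r.changed_position for r in responses) / len(responses)
    return avg_confidence * (1 - changed_fraction)


def should_exit_early(score, threshold=0.85):
    """Early exit applies when the score reaches the configured threshold."""
    return score >= threshold
```

For example, two members at 0.9 and 0.8 confidence where one changed position give 0.85 × (1 − 0.5) = 0.425, well below a 0.85 threshold. Two stable members at 0.9 each give 0.9 and would trigger an early exit.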
Adding a provider
Implement BaseModelAdapter (one method) and register it in the factory:
# agent_council/adapters/my_provider.py
from agent_council.adapters.base import BaseModelAdapter

class MyProviderAdapter(BaseModelAdapter):
    async def complete(self, system: str, user: str) -> str:
        # call your model here
        ...

# agent_council/adapters/__init__.py — add to build_adapter()
    case "myprovider":
        return MyProviderAdapter(member_cfg, provider_cfg)
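The same one-method interface can back an offline adapter for tests, so a debate can be exercised without network calls. The sketch below defines a local stand-in for `BaseModelAdapter` so that it runs on its own; the constructor shape is inferred from the `MyProviderAdapter(member_cfg, provider_cfg)` call above, and `CannedAdapter` is a hypothetical name.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseModelAdapter(ABC):
    """Local stand-in for agent_council.adapters.base.BaseModelAdapter (assumed shape)."""

    def __init__(self, member_cfg=None, provider_cfg=None):
        self.member_cfg = member_cfg
        self.provider_cfg = provider_cfg

    @abstractmethod
    async def complete(self, system: str, user: str) -> str:
        ...


class CannedAdapter(BaseModelAdapter):
    """Hypothetical offline adapter: returns a fixed reply and records the prompts."""

    def __init__(self, reply, member_cfg=None, provider_cfg=None):
        super().__init__(member_cfg, provider_cfg)
        self.reply = reply
        self.calls = []

    async def complete(self, system: str, user: str) -> str:
        self.calls.append((system, user))
        return self.reply
```

Recording the `(system, user)` pairs makes it possible to assert on the prompts a member receives in each round.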
Config file (optional)
For teams who prefer YAML over code:
# config/council.yaml
council:
  debate_rounds: 3
  early_exit_threshold: 0.85
members:
  - id: "analyst"
    name: "The Analyst"
    provider: "anthropic"
    model: "claude-sonnet-4-6"
    persona: "Rigorous analytical thinker."
  - id: "skeptic"
    name: "The Skeptic"
    provider: "openai"
    model: "gpt-4o"
    persona: "Challenge every assumption."
judge:
  provider: "anthropic"
  model: "claude-opus-4-6"
providers:
  # Optional: use OpenRouter with OpenAI-compatible models
  openrouter:
    api_key_env: "OPENROUTER_API_KEY"
    base_url: "https://openrouter.ai/api/v1"
    max_tokens: 2048
    temperature: 0.7
orchestrator = CouncilOrchestrator.from_config_file("config/council.yaml")
CLI (convenience)
# Install with CLI support (included by default)
pip install agent-council
export ANTHROPIC_API_KEY=sk-ant-...
council review "Is Python the best language for data science?"
council review "Should we rewrite in Rust?" --config config/council.yaml
council review "Remote work vs office?" --no-rounds
HTTP server (optional)
pip install "agent-council[server]"
council serve # http://127.0.0.1:8000
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness check |
| POST | /review | Run debate, return full JSON result |
| POST | /review/stream | Run debate, stream events via SSE |
Interactive docs at http://localhost:8000/docs.
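A standard-library client for `POST /review` might look like the sketch below. The path and the `{"topic": ...}` body come from this README; the helper names, the default timeout, and the assumption that the response body is JSON-decodable as a whole are illustrative.

```python
import json
import urllib.request


def build_review_request(topic, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST /review request with a JSON body."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_review(topic, base_url="http://127.0.0.1:8000", timeout=300):
    """Send the request and return the decoded JSON body (requires a running server)."""
    request = build_review_request(topic, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A debate involves several model calls per round, so a generous timeout is a sensible default for the blocking endpoint.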
SSE event stream (POST /review/stream):
data: {"event": "member_response", "data": {...}}
data: {"event": "round_complete", "data": {"round_number": 1, "consensus_score": 0.62}}
data: {"event": "verdict", "data": {...}}
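On the client side, each `data:` line can be decoded into an event name and payload. This sketch assumes every event arrives as a single `data:` line carrying a JSON object with `event` and `data` keys, as in the sample above. `parse_sse_events` is a hypothetical helper, not part of the package.

```python
import json


def parse_sse_events(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = json.loads(line[len("data:"):].strip())
        yield payload["event"], payload["data"]
```

Because it accepts any iterable of strings, the same function works on a decoded streaming response or on lines captured in a test fixture.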
Tracing
Enable JSON traces to inspect every response, round, and the final verdict. Two ways:
- Programmatic recorder
from agent_council.tracing import TraceRecorder
rec = TraceRecorder()
session, verdict = await orchestrator.run("Your topic", trace=rec)
path = rec.save("traces/")
print("Trace saved to", path)
- Build a trace from a finished session
from agent_council.tracing import TraceRecorder
rec = TraceRecorder.from_session(session, verdict)
rec.save("traces/")
Each trace is a single JSON file containing per-member responses, round summaries, and the final verdict.
Budget Awareness (optional)
Control spend with a simple per‑call budget guard. Set these env vars:
- COUNCIL_COST_PER_CALL_USD: estimated cost charged per model call.
- COUNCIL_MAX_BUDGET_USD: cap for the whole run; if exceeded and COUNCIL_BUDGET_HARD_STOP is true (default), the run stops.
- COUNCIL_BUDGET_HARD_STOP: set 0/false for a soft cap.
The guard wraps each provider adapter and checks the budget before each model call. For precise accounting, plug in your own adapter with exact token‑based pricing.
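The behavior described above can be sketched as a small counter that is charged before every model call. The environment variable names come from this README; the `BudgetGuard` and `BudgetExceeded` names, the defaults used when a variable is unset, and the exact comparison are assumptions, not the library's implementation.

```python
import os


class BudgetExceeded(RuntimeError):
    """Raised when a hard-stop budget would be exceeded by the next call."""


class BudgetGuard:
    def __init__(self, cost_per_call, max_budget, hard_stop=True):
        self.cost_per_call = cost_per_call
        self.max_budget = max_budget
        self.hard_stop = hard_stop
        self.spent = 0.0

    @classmethod
    def from_env(cls, environ=None):
        env = os.environ if environ is None else environ
        # Assumed defaults: zero cost, no cap, hard stop enabled.
        flag = env.get("COUNCIL_BUDGET_HARD_STOP", "true").strip().lower()
        return cls(
            cost_per_call=float(env.get("COUNCIL_COST_PER_CALL_USD", "0")),
            max_budget=float(env.get("COUNCIL_MAX_BUDGET_USD", "inf")),
            hard_stop=flag not in ("0", "false"),
        )

    def charge(self):
        """Account for one model call; call this before invoking the model."""
        projected = self.spent + self.cost_per_call
        if projected > self.max_budget and self.hard_stop:
            raise BudgetExceeded(
                f"call would bring spend to ${projected:.2f}, cap is ${self.max_budget:.2f}"
            )
        self.spent = projected
        return self.spent
```

With a soft cap the guard keeps counting past the limit, so the final `spent` value can still be logged or reported after the run.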
Project layout
agent_council/
├── __init__.py # public API
├── orchestrator.py # CouncilOrchestrator — primary entry point
├── member.py # prompt construction + JSON parsing
├── debate.py # round loop + consensus scoring
├── judge.py # synthesis + FinalVerdict
├── config.py # Pydantic config schema
├── types.py # result dataclasses
├── server.py # FastAPI app (optional)
└── adapters/
    ├── base.py
    ├── anthropic_adapter.py
    ├── openai_adapter.py
    └── ollama_adapter.py
License
MIT