Config-driven multi-agent debate council — add a reliability layer to any LLM pipeline

agent-council

A reliability layer for LLM pipelines. Multiple agents with distinct personas debate a topic across iterative rounds; a judge synthesizes a final verdict.

Drop it into any agent to pressure-test a decision before committing to it.

Inspired by mshumer/llmcouncil and the Mixture of Agents research.


Install

pip install agent-council

Requires Python ≥ 3.11.


Programmatic usage

This is the primary interface — use it inside your own agents and pipelines.

from agent_council import CouncilOrchestrator, MemberConfig, JudgeConfig

orchestrator = CouncilOrchestrator(
    members=[
        MemberConfig(
            id="analyst",
            name="The Analyst",
            provider="anthropic",
            model="claude-sonnet-4-6",
            persona="Rigorous analytical thinker. Evidence-based, structured.",
        ),
        MemberConfig(
            id="skeptic",
            name="The Skeptic",
            provider="openai",
            model="gpt-4o",
            persona="Challenge every assumption. Surface hidden risks.",
        ),
    ],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    rounds=3,
    early_exit_threshold=0.85,
)

# Run inside an async function, or wrap the call with asyncio.run(...) from synchronous code.
session, verdict = await orchestrator.run("Should we adopt microservices?")

print(verdict.verdict)           # synthesized conclusion
print(verdict.consensus_level)   # ConsensusLevel.STRONG / MODERATE / WEAK / NONE
print(verdict.consensus_score)   # float 0–1
print(verdict.key_agreements)    # list[str]
print(verdict.dissenting_views)  # list[str]

API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY) by default.
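
A pipeline can check for missing keys before starting a debate, rather than failing partway through a round. The following is a minimal sketch, not part of agent-council: `missing_api_keys` is a hypothetical helper, and the provider-to-variable mapping is an assumption based only on the two variable names mentioned above.

```python
import os

# Assumed mapping from provider name to the environment variable named above.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_api_keys(providers, environ=None):
    """Return env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    missing = []
    for provider in providers:
        var = PROVIDER_ENV_VARS.get(provider)
        # Providers without a known variable (e.g. a local endpoint) are skipped.
        if var is not None and not env.get(var) and var not in missing:
            missing.append(var)
    return missing


# Example with an explicit mapping instead of the real environment.
print(missing_api_keys(["anthropic", "openai"], {"ANTHROPIC_API_KEY": "set"}))
```

Passing an explicit mapping keeps the check easy to test; omit the second argument to read the real environment.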

Streaming callbacks

Callbacks fire as each member finishes — no waiting for a full round to complete. Both sync and async callbacks are supported.

def on_member(resp):
    print(f"[{resp.member_name}] {resp.stance} ({resp.confidence:.0%} confident)")

async def on_round(round_):
    await db.save_round(round_)   # async is fine too

session, verdict = await orchestrator.run(
    "Should we rewrite in Rust?",
    on_member_response=on_member,
    on_round_complete=on_round,
)
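
One common way to accept both sync and async callbacks is to call the function and await the result only when it is awaitable. The sketch below illustrates that pattern with the standard library; `fire_callback` is a hypothetical name, and this is not necessarily how agent-council dispatches callbacks internally.

```python
import asyncio
import inspect


async def fire_callback(callback, payload):
    """Call a sync or async callback; await the result only if it is awaitable."""
    if callback is None:
        return
    result = callback(payload)
    if inspect.isawaitable(result):
        await result


async def demo():
    seen = []

    def sync_cb(item):
        seen.append(("sync", item))

    async def async_cb(item):
        await asyncio.sleep(0)  # stands in for real async work such as a DB write
        seen.append(("async", item))

    await fire_callback(sync_cb, "r1")
    await fire_callback(async_cb, "r2")
    await fire_callback(None, "ignored")  # a missing callback is a no-op
    return seen


events = asyncio.run(demo())
```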

Provider overrides

Override temperature and token limits, or point at a custom Ollama endpoint:

from agent_council import (
    CouncilOrchestrator, MemberConfig, JudgeConfig,
    ProvidersConfig, AnthropicProviderConfig, OllamaProviderConfig,
)

orchestrator = CouncilOrchestrator(
    members=[...],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    provider_configs=ProvidersConfig(
        anthropic=AnthropicProviderConfig(temperature=0.5, max_tokens=1024),
        ollama=OllamaProviderConfig(base_url="http://my-ollama:11434"),
    ),
)

Result types

@dataclass
class FinalVerdict:
    topic: str
    verdict: str
    consensus_level: ConsensusLevel      # STRONG / MODERATE / WEAK / NONE
    consensus_score: float               # 0–1
    key_agreements: list[str]
    dissenting_views: list[str]
    rounds_completed: int
    early_exit: bool
    total_duration_seconds: float

@dataclass
class CouncilSession:
    session_id: str
    topic: str
    started_at: datetime
    rounds: list[DebateRound]            # full transcript
    verdict: FinalVerdict | None

@dataclass
class MemberResponse:
    member_id: str
    member_name: str
    round_number: int
    content: str                         # full response text
    stance: str                          # one-line position summary
    confidence: float                    # 0–1
    changed_position: bool
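
Because the verdict is structured, a calling agent can gate its next step on it. The sketch below is an illustration only: `should_proceed` and its thresholds are hypothetical, and a `SimpleNamespace` stands in for a real `FinalVerdict` so the example runs without any API calls.

```python
from types import SimpleNamespace


def should_proceed(verdict, min_score=0.7, max_dissent=1):
    """Gate a decision on the consensus score and the number of dissenting views.

    The thresholds are illustrative defaults, not values from agent-council.
    """
    return (
        verdict.consensus_score >= min_score
        and len(verdict.dissenting_views) <= max_dissent
    )


# Stand-in object using the same field names as FinalVerdict.
example = SimpleNamespace(
    consensus_score=0.82,
    dissenting_views=["Migration cost is underestimated"],
)

if should_proceed(example):
    print("proceed")
else:
    print("escalate to a human reviewer")
```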

Why Reliability

LLMs can be confident and wrong. A single-shot response hides uncertainty and failure modes (hallucinations, missed trade‑offs, overfitting to the prompt). Agent Council creates informed disagreement and then reconciles it:

  • Independent agents surface blind spots and competing views.
  • Iterative rounds reward stable, consistent positions (consensus score).
  • A neutral judge synthesizes agreements and dissent for transparent decision‑making.

What improves:

  • Decision quality: fewer unexamined assumptions, clearer trade‑offs.
  • Traceability: full transcript and structured verdict for audits/reviews.
  • Safety: configurable early exit threshold to avoid premature consensus.
  • Extensibility: mix providers/models (Anthropic, OpenAI, OpenRouter, Ollama).

Examples & Use Cases

  • Architecture choices: “Monolith vs microservices for product X?”
  • Launch reviews: “Are we production‑ready? What risks remain?”
  • AI safety checks: “Could this prompt produce unsafe output?”
  • Product strategy: “Should pricing move to usage‑based?”
  • Code migration: “Rewrite to Rust? What are the costs/benefits?”

Programmatic snippet (minimal):

session, verdict = await CouncilOrchestrator(
    members=[
        MemberConfig(id="analyst", name="Analyst", provider="openrouter", model="anthropic/claude-3.5-sonnet"),
        MemberConfig(id="skeptic", name="Skeptic", provider="openrouter", model="openai/gpt-4o"),
    ],
    judge=JudgeConfig(provider="openrouter", model="anthropic/claude-3.5-sonnet"),
).run("Should we adopt microservices?")
print(verdict.verdict)

CLI:

export OPENROUTER_API_KEY=sk-or-...
council review "Remote work vs office?" -c config/council.yaml

HTTP (server):

pip install "agent-council[server]"
council serve
# POST http://127.0.0.1:8000/review {"topic":"Should we rewrite in Rust?"}

How the debate works

Round 1 — all members respond independently to the topic
Round 2..N — each member reads all peers' responses and may revise
             → early exit if consensus score ≥ threshold
Judge — reads full transcript, synthesizes final verdict

Consensus score = avg_confidence × (1 − changed_fraction)

Rewards both high confidence and stability across rounds.
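
To make the formula concrete, here is a small worked sketch. It assumes the score is computed per round from each member's `confidence` and `changed_position` values; the function name and the input shape are illustrative, not the library's internal API.

```python
def consensus_score(confidences, changed_flags):
    """Compute avg_confidence * (1 - changed_fraction) for one round."""
    if not confidences or len(confidences) != len(changed_flags):
        raise ValueError("need one confidence and one changed flag per member")
    avg_confidence = sum(confidences) / len(confidences)
    changed_fraction = sum(1 for changed in changed_flags if changed) / len(changed_flags)
    return avg_confidence * (1 - changed_fraction)


# Four members, average confidence 0.8, one of four changed position:
# 0.8 * (1 - 0.25) is about 0.6, below a 0.85 threshold, so the debate continues.
unsettled = consensus_score([0.9, 0.8, 0.7, 0.8], [False, False, True, False])

# Two confident members, nobody changed: the score equals the average confidence.
settled = consensus_score([0.9, 0.9], [False, False])

threshold = 0.85
print(unsettled >= threshold, settled >= threshold)
```

A single member changing position lowers the score as much as a sizeable drop in average confidence, which is why the score settles only once positions stop moving.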


Adding a provider

Implement BaseModelAdapter (one method) and register it in the factory:

# agent_council/adapters/my_provider.py
from agent_council.adapters.base import BaseModelAdapter

class MyProviderAdapter(BaseModelAdapter):
    async def complete(self, system: str, user: str) -> str:
        # call your model here
        ...

# agent_council/adapters/__init__.py — add to build_adapter()
case "myprovider":
    return MyProviderAdapter(member_cfg, provider_cfg)
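
The same one-method interface makes it easy to write a canned adapter for offline tests. The sketch below is self-contained and therefore defines a local stand-in for `BaseModelAdapter`. The real base class may take constructor arguments, as the `build_adapter()` fragment above suggests, and `CannedAdapter` is a hypothetical name.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseModelAdapter(ABC):
    """Local stand-in for agent_council.adapters.base.BaseModelAdapter (assumed shape)."""

    @abstractmethod
    async def complete(self, system: str, user: str) -> str:
        ...


class CannedAdapter(BaseModelAdapter):
    """Hypothetical adapter that returns a fixed reply and records each prompt."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []

    async def complete(self, system: str, user: str) -> str:
        self.calls.append((system, user))
        return self.reply


adapter = CannedAdapter("stub reply")
reply = asyncio.run(
    adapter.complete("You are the Skeptic.", "Should we adopt microservices?")
)
```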

Config file (optional)

For teams who prefer YAML over code:

# config/council.yaml
council:
  debate_rounds: 3
  early_exit_threshold: 0.85

members:
  - id: "analyst"
    name: "The Analyst"
    provider: "anthropic"
    model: "claude-sonnet-4-6"
    persona: "Rigorous analytical thinker."

  - id: "skeptic"
    name: "The Skeptic"
    provider: "openai"
    model: "gpt-4o"
    persona: "Challenge every assumption."

judge:
  provider: "anthropic"
  model: "claude-opus-4-6"

providers:
  # Optional: use OpenRouter with OpenAI-compatible models
  openrouter:
    api_key_env: "OPENROUTER_API_KEY"
    base_url: "https://openrouter.ai/api/v1"
    max_tokens: 2048
    temperature: 0.7

orchestrator = CouncilOrchestrator.from_config_file("config/council.yaml")

CLI (convenience)

# Install with CLI support (included by default)
pip install agent-council

export ANTHROPIC_API_KEY=sk-ant-...
council review "Is Python the best language for data science?"
council review "Should we rewrite in Rust?" --config config/council.yaml
council review "Remote work vs office?" --no-rounds

HTTP server (optional)

pip install "agent-council[server]"
council serve                       # http://127.0.0.1:8000

Method  Path            Description
GET     /health         Liveness check
POST    /review         Run debate, return full JSON result
POST    /review/stream  Run debate, stream events via SSE

Interactive docs at http://localhost:8000/docs.
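
A client can call the `/review` endpoint with the standard library alone. The sketch below builds the request using the `{"topic": ...}` body shown earlier; `build_review_request` is a hypothetical helper, and the response schema is not assumed here beyond being JSON.

```python
import json
import urllib.request


def build_review_request(topic, base_url="http://127.0.0.1:8000"):
    """Build a POST /review request with a JSON body containing the topic."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_review_request("Should we rewrite in Rust?")

# To send it against a running `council serve`:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```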

SSE event stream (POST /review/stream):

data: {"event": "member_response", "data": {...}}
data: {"event": "round_complete",  "data": {"round_number": 1, "consensus_score": 0.62}}
data: {"event": "verdict",         "data": {...}}
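
Each line of the stream carries one JSON object after the `data:` prefix, so a consumer only needs to strip the prefix and decode. The sketch below is a minimal parser for that framing. `parse_sse_lines` is a hypothetical helper, and the verdict payload is left empty because its fields are elided above.

```python
import json


def parse_sse_lines(lines):
    """Yield (event, data) pairs from `data: {...}` lines; skip anything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and comments carry no payload
        payload = json.loads(line[len("data:"):].strip())
        yield payload["event"], payload["data"]


sample = [
    'data: {"event": "round_complete", "data": {"round_number": 1, "consensus_score": 0.62}}',
    "",
    'data: {"event": "verdict", "data": {}}',
]
events = list(parse_sse_lines(sample))
```

In a real client, `lines` would be the decoded lines of the streaming HTTP response rather than a list.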

Tracing

Enable JSON traces to inspect every response, round, and the final verdict. Two ways:

  1. Programmatic recorder

from agent_council.tracing import TraceRecorder

rec = TraceRecorder()
session, verdict = await orchestrator.run("Your topic", trace=rec)
path = rec.save("traces/")
print("Trace saved to", path)

  2. Build a trace from a finished session

from agent_council.tracing import TraceRecorder

rec = TraceRecorder.from_session(session, verdict)
rec.save("traces/")

Each trace is a single JSON file containing per-member responses, round summaries, and the final verdict.


Budget Awareness (optional)

Control spend with a simple per‑call budget guard. Set these env vars:

  • COUNCIL_COST_PER_CALL_USD — estimated cost charged per model call.
  • COUNCIL_MAX_BUDGET_USD — cap for the whole run; if the cap is exceeded and COUNCIL_BUDGET_HARD_STOP is true (the default), the run stops.
  • COUNCIL_BUDGET_HARD_STOP — set to 0 or false for a soft cap.

The budget guard wraps each provider adapter and checks the budget before each model call. For precise accounting, plug in your own adapter with exact token‑based pricing.
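
The per-call accounting can be sketched as follows. This is an illustration of the idea, not agent-council's implementation: `BudgetGuard` and `BudgetExceeded` are hypothetical names. The defaults for unset variables and the soft-cap behaviour (continue, but report that the cap was passed) are assumptions.

```python
import os


class BudgetExceeded(RuntimeError):
    """Raised when a hard cap would be exceeded by the next call."""


class BudgetGuard:
    def __init__(self, cost_per_call, max_budget, hard_stop=True):
        self.cost_per_call = cost_per_call
        self.max_budget = max_budget
        self.hard_stop = hard_stop
        self.spent = 0.0

    @classmethod
    def from_env(cls, environ=None):
        env = os.environ if environ is None else environ
        # Assumed defaults: zero cost, no cap, hard stop enabled.
        hard = env.get("COUNCIL_BUDGET_HARD_STOP", "true").strip().lower() not in ("0", "false")
        return cls(
            float(env.get("COUNCIL_COST_PER_CALL_USD", "0")),
            float(env.get("COUNCIL_MAX_BUDGET_USD", "inf")),
            hard,
        )

    def charge(self):
        """Call before each model call; return False once a soft cap is passed."""
        projected = self.spent + self.cost_per_call
        over = projected > self.max_budget
        if over and self.hard_stop:
            raise BudgetExceeded(f"{projected:.2f} USD exceeds cap {self.max_budget:.2f} USD")
        self.spent = projected
        return not over


guard = BudgetGuard.from_env(
    {"COUNCIL_COST_PER_CALL_USD": "0.25", "COUNCIL_MAX_BUDGET_USD": "0.5"}
)
```

With these values, two calls fit within the cap and a third would raise `BudgetExceeded` before the model is invoked.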


Project layout

agent_council/
├── __init__.py          # public API
├── orchestrator.py      # CouncilOrchestrator — primary entry point
├── member.py            # prompt construction + JSON parsing
├── debate.py            # round loop + consensus scoring
├── judge.py             # synthesis + FinalVerdict
├── config.py            # Pydantic config schema
├── types.py             # result dataclasses
├── server.py            # FastAPI app (optional)
└── adapters/
    ├── base.py
    ├── anthropic_adapter.py
    ├── openai_adapter.py
    └── ollama_adapter.py

License

MIT
