Config-driven multi-agent debate council — add a reliability layer to any LLM pipeline

agent-council

A reliability layer for LLM pipelines. Multiple agents with distinct personas debate a topic across iterative rounds; a judge synthesizes a final verdict.

Drop it into any agent to pressure-test a decision before committing to it.

Inspired by mshumer/llmcouncil and the Mixture of Agents research.


Install

pip install agent-council

Requires Python ≥ 3.11.


Programmatic usage

This is the primary interface — use it inside your own agents and pipelines.

from agent_council import CouncilOrchestrator, MemberConfig, JudgeConfig

orchestrator = CouncilOrchestrator(
    members=[
        MemberConfig(
            id="analyst",
            name="The Analyst",
            provider="anthropic",
            model="claude-sonnet-4-6",
            persona="Rigorous analytical thinker. Evidence-based, structured.",
        ),
        MemberConfig(
            id="skeptic",
            name="The Skeptic",
            provider="openai",
            model="gpt-4o",
            persona="Challenge every assumption. Surface hidden risks.",
        ),
    ],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    rounds=3,
    early_exit_threshold=0.85,
)

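# `await` needs an async context: run this inside a coroutine,
# or wrap the call in asyncio.run() from synchronous code.
session, verdict = await orchestrator.run("Should we adopt microservices?")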

print(verdict.verdict)           # synthesized conclusion
print(verdict.consensus_level)   # ConsensusLevel.STRONG / MODERATE / WEAK / NONE
print(verdict.consensus_score)   # float 0–1
print(verdict.key_agreements)    # list[str]
print(verdict.dissenting_views)  # list[str]

API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY) by default.
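
Because keys come from the environment, a missing variable typically surfaces only when the first model call fails. A small preflight check can catch that earlier. The sketch below is not part of agent-council: `PROVIDER_ENV_VARS` and `missing_api_keys` are hypothetical helpers, and the mapping covers only the two variables named above.

```python
import os

# Provider name -> environment variable, as listed in this README.
# Hypothetical helper table; extend it for other providers you use.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_api_keys(providers, environ=None):
    """Return env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    missing = []
    for provider in providers:
        var = PROVIDER_ENV_VARS.get(provider)
        # Providers without a known key (e.g. a local model) are skipped.
        if var is not None and not env.get(var) and var not in missing:
            missing.append(var)
    return missing
```

Calling `missing_api_keys(["anthropic", "openai"])` before building the orchestrator gives a list you can report in one error message instead of failing mid-debate.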

Streaming callbacks

Callbacks fire as each member finishes — no waiting for a full round to complete. Both sync and async callbacks are supported.

def on_member(resp):
    print(f"[{resp.member_name}] {resp.stance} ({resp.confidence:.0%} confident)")

async def on_round(round_):
    await db.save_round(round_)   # async is fine too; `db` stands in for your own storage client

session, verdict = await orchestrator.run(
    "Should we rewrite in Rust?",
    on_member_response=on_member,
    on_round_complete=on_round,
)
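
For readers curious how one entry point can accept both kinds of callback, a common pattern is to call the function and await the result only if it is awaitable. This is a generic sketch, not the library's actual dispatch code; `fire_callback` is a hypothetical name.

```python
import asyncio
import inspect
from typing import Any, Callable, Optional


async def fire_callback(callback: Optional[Callable[[Any], Any]], payload: Any) -> None:
    """Call callback(payload); await the return value if it is awaitable."""
    if callback is None:
        return
    result = callback(payload)
    if inspect.isawaitable(result):
        await result


async def _demo() -> list:
    seen = []

    def sync_cb(value):
        seen.append(("sync", value))

    async def async_cb(value):
        seen.append(("async", value))

    await fire_callback(sync_cb, 1)    # plain function: called directly
    await fire_callback(async_cb, 2)   # coroutine function: called, then awaited
    await fire_callback(None, 3)       # no callback registered: no-op
    return seen


demo_result = asyncio.run(_demo())
```

Checking the return value rather than the function itself also handles callables such as `functools.partial` objects that wrap a coroutine function.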

Provider overrides

Override the temperature or token limits, or point to a custom Ollama endpoint:

from agent_council import (
    CouncilOrchestrator, MemberConfig, JudgeConfig,
    ProvidersConfig, AnthropicProviderConfig, OllamaProviderConfig,
)

orchestrator = CouncilOrchestrator(
    members=[...],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    provider_configs=ProvidersConfig(
        anthropic=AnthropicProviderConfig(temperature=0.5, max_tokens=1024),
        ollama=OllamaProviderConfig(base_url="http://my-ollama:11434"),
    ),
)

Result types

@dataclass
class FinalVerdict:
    topic: str
    verdict: str
    consensus_level: ConsensusLevel      # STRONG / MODERATE / WEAK / NONE
    consensus_score: float               # 0–1
    key_agreements: list[str]
    dissenting_views: list[str]
    rounds_completed: int
    early_exit: bool
    total_duration_seconds: float

@dataclass
class CouncilSession:
    session_id: str
    topic: str
    started_at: datetime
    rounds: list[DebateRound]            # full transcript
    verdict: FinalVerdict | None

@dataclass
class MemberResponse:
    member_id: str
    member_name: str
    round_number: int
    content: str                         # full response text
    stance: str                          # one-line position summary
    confidence: float                    # 0–1
    changed_position: bool

Why Reliability

LLMs can be confident and wrong. A single-shot response hides uncertainty and failure modes (hallucinations, missed trade‑offs, overfitting to the prompt). Agent Council creates informed disagreement and then reconciles it:

  • Independent agents surface blind spots and competing views.
  • Iterative rounds reward stable, consistent positions (consensus score).
  • A neutral judge synthesizes agreements and dissent for transparent decision‑making.

What improves:

  • Decision quality: fewer unexamined assumptions, clearer trade‑offs.
  • Traceability: full transcript and structured verdict for audits/reviews.
  • Safety: a configurable early-exit threshold, so the debate stops early only once the consensus score is high enough.
  • Extensibility: mix providers/models (Anthropic, OpenAI, OpenRouter, Ollama).

Examples & Use Cases

  • Architecture choices: “Monolith vs microservices for product X?”
  • Launch reviews: “Are we production‑ready? What risks remain?”
  • AI safety checks: “Could this prompt produce unsafe output?”
  • Product strategy: “Should pricing move to usage‑based?”
  • Code migration: “Rewrite to Rust? What are the costs/benefits?”

Programmatic snippet (minimal):

session, verdict = await CouncilOrchestrator(
    members=[
        MemberConfig(id="analyst", name="Analyst", provider="openrouter", model="anthropic/claude-3.5-sonnet"),
        MemberConfig(id="skeptic", name="Skeptic", provider="openrouter", model="openai/gpt-4o"),
    ],
    judge=JudgeConfig(provider="openrouter", model="anthropic/claude-3.5-sonnet"),
).run("Should we adopt microservices?")
print(verdict.verdict)

CLI:

export OPENROUTER_API_KEY=sk-or-...
council review "Remote work vs office?" -c config/council.yaml

HTTP (server):

pip install "agent-council[server]"
council serve
# POST http://127.0.0.1:8000/review {"topic":"Should we rewrite in Rust?"}

How the debate works

Round 1 — all members respond independently to the topic
Round 2..N — each member reads all peers' responses and may revise
             → early exit if consensus score ≥ threshold
Judge — reads full transcript, synthesizes final verdict

Consensus score = avg_confidence × (1 − changed_fraction)

This rewards both high confidence and stability across rounds.
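
As a worked example of the formula: with three members at confidence 0.9, 0.8 and 0.7, of whom one changed position, avg_confidence is 0.8 and changed_fraction is 1/3, so the score is 0.8 × 2/3 ≈ 0.53, below the 0.85 threshold used earlier. The sketch below reproduces that arithmetic. It assumes the score is computed over the responses of a single round; the `Response` class and both function names are illustrative, not the library's internals.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Illustrative stand-in holding only the fields the formula needs."""
    confidence: float        # 0-1
    changed_position: bool   # did the member revise its stance this round?


def consensus_score(responses):
    """avg_confidence * (1 - changed_fraction) over one round's responses."""
    if not responses:
        return 0.0
    n = len(responses)
    avg_confidence = sum(r.confidence for r in responses) / n
    changed_fraction = sum(1 for r in responses if r.changed_position) / n
    return avg_confidence * (1 - changed_fraction)


def should_exit_early(responses, threshold=0.85):
    """Early exit applies when the score reaches the threshold."""
    return consensus_score(responses) >= threshold


round_2 = [Response(0.9, False), Response(0.8, False), Response(0.7, True)]
score = consensus_score(round_2)   # roughly 0.53
```

Note that a single member changing position scales the whole score down, so a two-member council can never exit early in a round where either member revises.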


Adding a provider

Implement BaseModelAdapter (one method) and register it in the factory:

# agent_council/adapters/my_provider.py
from agent_council.adapters.base import BaseModelAdapter

class MyProviderAdapter(BaseModelAdapter):
    async def complete(self, system: str, user: str) -> str:
        # call your model here
        ...

# agent_council/adapters/__init__.py: add to build_adapter()
case "myprovider":
    return MyProviderAdapter(member_cfg, provider_cfg)
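
The same one-method interface makes it easy to write an offline adapter for tests. The sketch below is self-contained, so it declares a local stand-in for `BaseModelAdapter` with the `complete` signature shown above; the real base class, and the constructor arguments the factory passes (`member_cfg`, `provider_cfg`), may differ. `CannedAdapter` is a hypothetical name.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseModelAdapter(ABC):
    """Local stand-in for agent_council.adapters.base.BaseModelAdapter (assumed shape)."""

    @abstractmethod
    async def complete(self, system: str, user: str) -> str:
        ...


class CannedAdapter(BaseModelAdapter):
    """Hypothetical offline adapter: returns a fixed reply and records each call."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = []

    async def complete(self, system: str, user: str) -> str:
        # No network access: remember the prompts and return the canned text.
        self.calls.append((system, user))
        return self.reply


adapter = CannedAdapter("canned reply")
reply = asyncio.run(adapter.complete("You are a skeptic.", "Adopt microservices?"))
```

Recording the prompts lets a test assert on what each member was actually asked without spending any API budget.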

Config file (optional)

For teams who prefer YAML over code:

# config/council.yaml
council:
  debate_rounds: 3
  early_exit_threshold: 0.85

members:
  - id: "analyst"
    name: "The Analyst"
    provider: "anthropic"
    model: "claude-sonnet-4-6"
    persona: "Rigorous analytical thinker."

  - id: "skeptic"
    name: "The Skeptic"
    provider: "openai"
    model: "gpt-4o"
    persona: "Challenge every assumption."

judge:
  provider: "anthropic"
  model: "claude-opus-4-6"

providers:
  # Optional: use OpenRouter with OpenAI-compatible models
  openrouter:
    api_key_env: "OPENROUTER_API_KEY"
    base_url: "https://openrouter.ai/api/v1"
    max_tokens: 2048
    temperature: 0.7

Load the file from Python:

orchestrator = CouncilOrchestrator.from_config_file("config/council.yaml")

CLI (convenience)

# Install with CLI support (included by default)
pip install agent-council

export ANTHROPIC_API_KEY=sk-ant-...
council review "Is Python the best language for data science?"
council review "Should we rewrite in Rust?" --config config/council.yaml
council review "Remote work vs office?" --no-rounds

HTTP server (optional)

pip install "agent-council[server]"
council serve                       # http://127.0.0.1:8000

| Method | Path           | Description                         |
|--------|----------------|-------------------------------------|
| GET    | /health        | Liveness check                      |
| POST   | /review        | Run debate, return full JSON result |
| POST   | /review/stream | Run debate, stream events via SSE   |

Interactive docs at http://localhost:8000/docs.
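
A minimal client sketch using only the standard library. It assumes the request body is the `{"topic": ...}` object shown in the earlier HTTP example and that the server is on the default address; the shape of the response is not assumed here, so check the interactive docs for it. `build_review_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_review_request(topic, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST /review request with a JSON body."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_review_request("Should we rewrite in Rust?")

# To send it against a running `council serve` instance:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```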

SSE event stream (POST /review/stream):

data: {"event": "member_response", "data": {...}}
data: {"event": "round_complete",  "data": {"round_number": 1, "consensus_score": 0.62}}
data: {"event": "verdict",         "data": {...}}
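
On the client side, each `data:` line can be decoded into an event name and payload. The sketch below assumes every event fits on a single `data:` line, as in the sample above; multi-line SSE messages would need a fuller parser. The empty `{}` payload in the sample stands in for the elided `{...}`, and `parse_sse_lines` is a hypothetical helper.

```python
import json


def parse_sse_lines(lines):
    """Yield (event, data) pairs from `data: {...}` lines; skip everything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = json.loads(line[len("data:"):].strip())
        yield payload["event"], payload["data"]


sample = [
    'data: {"event": "round_complete", "data": {"round_number": 1, "consensus_score": 0.62}}',
    "",
    'data: {"event": "verdict", "data": {}}',
]
events = list(parse_sse_lines(sample))
```

Because the function takes any iterable of strings, it can be fed the decoded lines of a streaming HTTP response directly.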

Tracing

Enable JSON traces to inspect every response, round, and the final verdict. Two ways:

  1. Programmatic recorder

from agent_council.tracing import TraceRecorder

rec = TraceRecorder()
session, verdict = await orchestrator.run("Your topic", trace=rec)
path = rec.save("traces/")
print("Trace saved to", path)

  2. Build a trace from a finished session

from agent_council.tracing import TraceRecorder

rec = TraceRecorder.from_session(session, verdict)
rec.save("traces/")

Each trace is a single JSON file containing per-member responses, round summaries, and the final verdict.


Budget Awareness (optional)

Control spend with a simple per‑call budget guard. Set these env vars:

  • COUNCIL_COST_PER_CALL_USD: estimated cost, in USD, charged for each model call.
  • COUNCIL_MAX_BUDGET_USD: cap for the whole run; if it is exceeded and COUNCIL_BUDGET_HARD_STOP is true (the default), the run stops.
  • COUNCIL_BUDGET_HARD_STOP: set to 0 or false for a soft cap.

This wraps each provider adapter and enforces the budget before calling the model. For precise accounting, plug in your own adapter with exact token‑based pricing.
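
To make the check-before-call behaviour concrete, here is a rough sketch of such a guard. It is not the library's implementation: `BudgetGuard`, `BudgetExceeded` and the defaults for unset variables are assumptions, and only the three variable names come from the list above.

```python
import os


class BudgetExceeded(RuntimeError):
    """Raised when a call would exceed the cap and hard stop is enabled."""


def _env_flag(value, default=True):
    # Unset means "use the default"; "0" or "false" turn the flag off.
    if value is None:
        return default
    return value.strip().lower() not in {"0", "false"}


class BudgetGuard:
    def __init__(self, cost_per_call, max_budget, hard_stop=True):
        self.cost_per_call = cost_per_call
        self.max_budget = max_budget
        self.hard_stop = hard_stop
        self.spent = 0.0

    @classmethod
    def from_env(cls, environ=None):
        env = os.environ if environ is None else environ
        return cls(
            cost_per_call=float(env.get("COUNCIL_COST_PER_CALL_USD", "0")),
            max_budget=float(env.get("COUNCIL_MAX_BUDGET_USD", "inf")),
            hard_stop=_env_flag(env.get("COUNCIL_BUDGET_HARD_STOP")),
        )

    def charge(self):
        """Account for one call before it is made.

        Returns True while within budget and False once over a soft cap;
        raises BudgetExceeded when over the cap with hard stop enabled.
        """
        projected = self.spent + self.cost_per_call
        over = projected > self.max_budget
        if over and self.hard_stop:
            raise BudgetExceeded(
                f"call would bring spend to ${projected:.2f}, "
                f"over the ${self.max_budget:.2f} cap"
            )
        self.spent = projected
        return not over
```

An adapter wrapper would call `guard.charge()` immediately before delegating to the wrapped adapter's `complete`, so a hard stop prevents the model call from being made at all.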


Project layout

agent_council/
├── __init__.py          # public API
├── orchestrator.py      # CouncilOrchestrator — primary entry point
├── member.py            # prompt construction + JSON parsing
├── debate.py            # round loop + consensus scoring
├── judge.py             # synthesis + FinalVerdict
├── config.py            # Pydantic config schema
├── types.py             # result dataclasses
├── server.py            # FastAPI app (optional)
└── adapters/
    ├── base.py
    ├── anthropic_adapter.py
    ├── openai_adapter.py
    └── ollama_adapter.py

License

MIT
