Config-driven multi-agent debate council — add a reliability layer to any LLM pipeline

agent-council

A reliability layer for LLM pipelines. Multiple agents with distinct personas debate a topic across iterative rounds; a judge synthesizes a final verdict.

Drop it into any agent to pressure-test a decision before committing to it.

Inspired by mshumer/llmcouncil and the Mixture of Agents research.

Install

pip install agent-council

Requires Python ≥ 3.11.

Programmatic usage

This is the primary interface — use it inside your own agents and pipelines.

from agent_council import CouncilOrchestrator, MemberConfig, JudgeConfig

orchestrator = CouncilOrchestrator(
    members=[
        MemberConfig(
            id="analyst",
            name="The Analyst",
            provider="anthropic",
            model="claude-sonnet-4-6",
            persona="Rigorous analytical thinker. Evidence-based, structured.",
        ),
        MemberConfig(
            id="skeptic",
            name="The Skeptic",
            provider="openai",
            model="gpt-4o",
            persona="Challenge every assumption. Surface hidden risks.",
        ),
    ],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    rounds=3,
    early_exit_threshold=0.85,
)

# Run this inside a coroutine, for example one started with asyncio.run(...)
session, verdict = await orchestrator.run("Should we adopt microservices?")

print(verdict.verdict)           # synthesized conclusion
print(verdict.consensus_level)   # ConsensusLevel.STRONG / MODERATE / WEAK / NONE
print(verdict.consensus_score)   # float 0–1
print(verdict.key_agreements)    # list[str]
print(verdict.dissenting_views)  # list[str]

API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY) by default.

Streaming callbacks

Callbacks fire as each member finishes — no waiting for a full round to complete. Both sync and async callbacks are supported.

def on_member(resp):
    print(f"[{resp.member_name}] {resp.stance} ({resp.confidence:.0%} confident)")

async def on_round(round_):
    await db.save_round(round_)   # async is fine too

session, verdict = await orchestrator.run(
    "Should we rewrite in Rust?",
    on_member_response=on_member,
    on_round_complete=on_round,
)
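
One way a runner can accept both kinds of callback is to call it and then await the result only if it is awaitable. The sketch below illustrates that general technique; it is not taken from agent-council's source, and `fire` and the payload strings are hypothetical names used only for the demonstration.

```python
import asyncio
import inspect

async def fire(callback, payload):
    """Invoke a sync or async callback with one payload (hypothetical helper)."""
    if callback is None:
        return
    result = callback(payload)
    # An `async def` callback returns a coroutine, which must be awaited.
    if inspect.isawaitable(result):
        await result

seen = []

def on_member(resp):
    seen.append(("sync", resp))

async def on_round(round_):
    await asyncio.sleep(0)  # stands in for real async work such as a DB write
    seen.append(("async", round_))

async def demo():
    await fire(on_member, "member-1")
    await fire(on_round, "round-1")
    await fire(None, "ignored")  # a missing callback is skipped

asyncio.run(demo())
print(seen)
```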

Provider overrides

Override temperature and token limits, or point to a custom Ollama endpoint:

from agent_council import (
    CouncilOrchestrator, MemberConfig, JudgeConfig,
    ProvidersConfig, AnthropicProviderConfig, OllamaProviderConfig,
)

orchestrator = CouncilOrchestrator(
    members=[...],
    judge=JudgeConfig(provider="anthropic", model="claude-opus-4-6"),
    provider_configs=ProvidersConfig(
        anthropic=AnthropicProviderConfig(temperature=0.5, max_tokens=1024),
        ollama=OllamaProviderConfig(base_url="http://my-ollama:11434"),
    ),
)

Result types

# Field summary of the result dataclasses.
# ConsensusLevel and DebateRound are defined by the package.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

@dataclass
class FinalVerdict:
    topic: str
    verdict: str
    consensus_level: ConsensusLevel      # STRONG / MODERATE / WEAK / NONE
    consensus_score: float               # 0–1
    key_agreements: list[str]
    dissenting_views: list[str]
    rounds_completed: int
    early_exit: bool
    total_duration_seconds: float

@dataclass
class CouncilSession:
    session_id: str
    topic: str
    started_at: datetime
    rounds: list[DebateRound]            # full transcript
    verdict: FinalVerdict | None

@dataclass
class MemberResponse:
    member_id: str
    member_name: str
    round_number: int
    content: str                         # full response text
    stance: str                          # one-line position summary
    confidence: float                    # 0–1
    changed_position: bool
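
A pipeline can use these fields to decide whether to act on a verdict. The sketch below is one possible gate, not part of the library: `should_proceed` and its thresholds are hypothetical, and `SimpleNamespace` objects stand in for a real `FinalVerdict` so the example runs without any API calls.

```python
from types import SimpleNamespace

def should_proceed(verdict, min_score=0.7, max_dissent=0):
    """Return True only when consensus is high and dissent is within bounds.

    Hypothetical helper; the default thresholds are illustrative.
    """
    enough_consensus = verdict.consensus_score >= min_score
    little_dissent = len(verdict.dissenting_views) <= max_dissent
    return enough_consensus and little_dissent

# Stand-ins exposing only the two FinalVerdict fields the gate reads.
strong = SimpleNamespace(consensus_score=0.9, dissenting_views=[])
split = SimpleNamespace(
    consensus_score=0.9,
    dissenting_views=["Migration cost is underestimated"],
)
weak = SimpleNamespace(consensus_score=0.4, dissenting_views=[])

for name, v in [("strong", strong), ("split", split), ("weak", weak)]:
    print(name, should_proceed(v))
```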

Why Reliability

LLMs can be confident and wrong. A single-shot response hides uncertainty and failure modes (hallucinations, missed trade-offs, overfitting to the prompt). Agent Council creates informed disagreement and then reconciles it:

Independent agents surface blind spots and competing views.
Iterative rounds reward stable, consistent positions (consensus score).
A neutral judge synthesizes agreements and dissent for transparent decision‑making.

What improves:

Decision quality: fewer unexamined assumptions, clearer trade‑offs.
Traceability: full transcript and structured verdict for audits/reviews.
Safety: a configurable early-exit threshold, so the debate stops early only when consensus is strong enough.
Extensibility: mix providers/models (Anthropic, OpenAI, OpenRouter, Ollama).

Examples & Use Cases

Architecture choices: “Monolith vs microservices for product X?”
Launch reviews: “Are we production‑ready? What risks remain?”
AI safety checks: “Could this prompt produce unsafe output?”
Product strategy: “Should pricing move to usage‑based?”
Code migration: “Rewrite to Rust? What are the costs/benefits?”

Programmatic snippet (minimal):

session, verdict = await CouncilOrchestrator(
    members=[
        MemberConfig(id="analyst", name="Analyst", provider="openrouter", model="anthropic/claude-3.5-sonnet"),
        MemberConfig(id="skeptic", name="Skeptic", provider="openrouter", model="openai/gpt-4o"),
    ],
    judge=JudgeConfig(provider="openrouter", model="anthropic/claude-3.5-sonnet"),
).run("Should we adopt microservices?")
print(verdict.verdict)

CLI:

export OPENROUTER_API_KEY=sk-or-...
council review "Remote work vs office?" -c config/council.yaml

HTTP (server):

pip install "agent-council[server]"
council serve
# POST http://127.0.0.1:8000/review {"topic":"Should we rewrite in Rust?"}

How the debate works

Round 1 — all members respond independently to the topic
Round 2..N — each member reads all peers' responses and may revise
             → early exit if consensus score ≥ threshold
Judge — reads full transcript, synthesizes final verdict
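
The flow above can be sketched as a loop with an early-exit check. This is a simplified reading, not the package's `debate.py`: `run_debate`, `respond`, and `mean` are hypothetical names, responses are reduced to bare confidence values, the judge step is omitted, and checking the threshold only from round 2 onward is an assumption based on where the early-exit note appears.

```python
def run_debate(respond, score, max_rounds, threshold):
    """Run up to max_rounds rounds; return (transcript, early_exit)."""
    transcript = []
    previous = None  # round 1 sees no peer responses
    for round_number in range(1, max_rounds + 1):
        current = respond(round_number, previous)
        transcript.append(current)
        # Assumed: the early-exit check applies to revision rounds only.
        if round_number >= 2 and score(current) >= threshold:
            return transcript, True
        previous = current
    return transcript, False

def respond(round_number, previous):
    # Toy members whose confidence grows each round.
    confidence = min(1.0, 0.3 * round_number)
    return [confidence, confidence]

def mean(values):
    return sum(values) / len(values)

transcript, early_exit = run_debate(respond, mean, max_rounds=5, threshold=0.85)
print(len(transcript), early_exit)
```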

Consensus score = avg_confidence × (1 − changed_fraction)

This rewards both high confidence and stability across rounds.
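
As a worked example of the formula, the sketch below computes the score for one round. `Response` is a hypothetical stand-in holding the two `MemberResponse` fields the formula needs. Which responses go into the average (assumed here: those of the latest round) and the result for an empty round are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Response:  # stand-in for MemberResponse
    confidence: float
    changed_position: bool

def consensus_score(responses):
    """avg_confidence * (1 - changed_fraction) over one round's responses."""
    if not responses:
        return 0.0  # assumed behaviour for an empty round
    n = len(responses)
    avg_confidence = sum(r.confidence for r in responses) / n
    changed_fraction = sum(r.changed_position for r in responses) / n
    return avg_confidence * (1 - changed_fraction)

round_two = [
    Response(confidence=0.9, changed_position=False),
    Response(confidence=0.8, changed_position=False),
    Response(confidence=0.7, changed_position=True),
]
score = consensus_score(round_two)
print(round(score, 3))
```

With an average confidence of 0.8 and one of three members changing position, the score is 0.8 × 2/3, which is below a 0.85 threshold, so the debate would continue.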

Adding a provider

Implement BaseModelAdapter (one method) and register it in the factory:

# agent_council/adapters/my_provider.py
from agent_council.adapters.base import BaseModelAdapter

class MyProviderAdapter(BaseModelAdapter):
    async def complete(self, system: str, user: str) -> str:
        # call your model here
        ...

# agent_council/adapters/__init__.py — add to build_adapter()
case "myprovider":
    return MyProviderAdapter(member_cfg, provider_cfg)

Config file (optional)

For teams who prefer YAML over code:

# config/council.yaml
council:
  debate_rounds: 3
  early_exit_threshold: 0.85

members:
  - id: "analyst"
    name: "The Analyst"
    provider: "anthropic"
    model: "claude-sonnet-4-6"
    persona: "Rigorous analytical thinker."

  - id: "skeptic"
    name: "The Skeptic"
    provider: "openai"
    model: "gpt-4o"
    persona: "Challenge every assumption."

judge:
  provider: "anthropic"
  model: "claude-opus-4-6"

providers:
  # Optional: use OpenRouter with OpenAI-compatible models
  openrouter:
    api_key_env: "OPENROUTER_API_KEY"
    base_url: "https://openrouter.ai/api/v1"
    max_tokens: 2048
    temperature: 0.7

orchestrator = CouncilOrchestrator.from_config_file("config/council.yaml")

CLI (convenience)

# Install with CLI support (included by default)
pip install agent-council

export ANTHROPIC_API_KEY=sk-ant-...
council review "Is Python the best language for data science?"
council review "Should we rewrite in Rust?" --config config/council.yaml
council review "Remote work vs office?" --no-rounds

HTTP server (optional)

pip install "agent-council[server]"
council serve                       # http://127.0.0.1:8000

Method	Path	Description
`GET`	`/health`	Liveness check
`POST`	`/review`	Run debate, return full JSON result
`POST`	`/review/stream`	Run debate, stream events via SSE

Interactive docs at http://localhost:8000/docs.
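
A client can call `/review` with nothing but the standard library. The sketch below only builds the request, so it runs without a server. The `{"topic": ...}` body comes from the example earlier in this README; `build_review_request` is a hypothetical helper, and the `Content-Type` header is an assumption about what the server expects.

```python
import json
import urllib.request

def build_review_request(topic, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST /review request."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_review_request("Should we rewrite in Rust?")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` sends it once `council serve` is running. The response schema is not documented here, so inspect the returned JSON before relying on specific keys.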

SSE event stream (POST /review/stream):

data: {"event": "member_response", "data": {...}}
data: {"event": "round_complete",  "data": {"round_number": 1, "consensus_score": 0.62}}
data: {"event": "verdict",         "data": {...}}
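
On the client side, each `data:` line carries one JSON object. The sketch below parses such lines from any iterable of strings; it handles only the single-line `data:` form shown above. `parse_sse_lines` is a hypothetical helper, and the `member_response` and `verdict` payloads in the sample are invented placeholders, since the stream's payload shapes are elided above.

```python
import json

def parse_sse_lines(lines):
    """Yield one decoded event dict per `data:` line, skipping other lines."""
    prefix = "data:"
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # blank separators, comments, other SSE fields
        yield json.loads(line[len(prefix):].strip())

sample = [
    'data: {"event": "member_response", "data": {"member_id": "analyst"}}',
    "",
    'data: {"event": "round_complete", "data": {"round_number": 1, "consensus_score": 0.62}}',
    "",
    'data: {"event": "verdict", "data": {"verdict": "placeholder"}}',
]

events = list(parse_sse_lines(sample))
for event in events:
    print(event["event"])
```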

Tracing

Enable JSON traces to inspect every response, round, and the final verdict. Two ways:

Programmatic recorder

from agent_council.tracing import TraceRecorder

rec = TraceRecorder()
session, verdict = await orchestrator.run("Your topic", trace=rec)
path = rec.save("traces/")
print("Trace saved to", path)

Build a trace from a finished session

from agent_council.tracing import TraceRecorder
rec = TraceRecorder.from_session(session, verdict)
rec.save("traces/")

Each trace is a single JSON file containing per-member responses, round summaries, and the final verdict.

Budget Awareness (optional)

Control spend with a simple per‑call budget guard. Set these env vars:

COUNCIL_COST_PER_CALL_USD — estimated cost charged per model call.
COUNCIL_MAX_BUDGET_USD — cap for the whole run; if exceeded and COUNCIL_BUDGET_HARD_STOP is true (default), the run stops.
COUNCIL_BUDGET_HARD_STOP — set to 0 or false for a soft cap.

This wraps each provider adapter and enforces the budget before calling the model. For precise accounting, plug in your own adapter with exact token‑based pricing.
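
A guard of this kind can be sketched as a counter that is charged before every model call. The environment variable names below come from the list above; everything else is an assumption: the `BudgetGuard` and `BudgetExceeded` names, the defaults used when a variable is unset, and the choice to compare the projected spend (rather than the spend so far) against the cap.

```python
class BudgetExceeded(RuntimeError):
    pass

class BudgetGuard:
    """Flat per-call budget counter (hypothetical, not the library's class)."""

    def __init__(self, cost_per_call, max_budget, hard_stop=True):
        self.cost_per_call = cost_per_call
        self.max_budget = max_budget
        self.hard_stop = hard_stop
        self.spent = 0.0

    @classmethod
    def from_env(cls, env):
        # `env` is any mapping, e.g. os.environ; defaults are assumptions.
        hard = env.get("COUNCIL_BUDGET_HARD_STOP", "true").strip().lower()
        return cls(
            cost_per_call=float(env.get("COUNCIL_COST_PER_CALL_USD", "0")),
            max_budget=float(env.get("COUNCIL_MAX_BUDGET_USD", "inf")),
            hard_stop=hard not in ("0", "false"),
        )

    def charge(self):
        """Call before each model call; raises under a hard cap."""
        projected = self.spent + self.cost_per_call
        if projected > self.max_budget and self.hard_stop:
            raise BudgetExceeded(f"{projected:.2f} USD exceeds {self.max_budget:.2f} USD")
        self.spent = projected

    @property
    def over_budget(self):
        return self.spent > self.max_budget

guard = BudgetGuard.from_env(
    {"COUNCIL_COST_PER_CALL_USD": "0.4", "COUNCIL_MAX_BUDGET_USD": "1.0"}
)
calls_made = 0
stopped = False
try:
    for _ in range(5):
        guard.charge()
        calls_made += 1  # the model call would happen here
except BudgetExceeded:
    stopped = True
print(calls_made, stopped)
```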

Project layout

agent_council/
├── __init__.py          # public API
├── orchestrator.py      # CouncilOrchestrator — primary entry point
├── member.py            # prompt construction + JSON parsing
├── debate.py            # round loop + consensus scoring
├── judge.py             # synthesis + FinalVerdict
├── config.py            # Pydantic config schema
├── types.py             # result dataclasses
├── server.py            # FastAPI app (optional)
└── adapters/
    ├── base.py
    ├── anthropic_adapter.py
    ├── openai_adapter.py
    └── ollama_adapter.py

License

MIT

Release history

0.2.1: Feb 27, 2026
0.2.0 (this version): Feb 27, 2026

Download files

Source Distribution

agent_council-0.2.0.tar.gz (21.9 kB)

Uploaded Feb 27, 2026 Source

Built Distribution

agent_council-0.2.0-py3-none-any.whl (24.9 kB)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file agent_council-0.2.0.tar.gz.

File metadata

Download URL: agent_council-0.2.0.tar.gz
Upload date: Feb 27, 2026
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for agent_council-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`0754e6f9f016cd4af516e65b74b196798d99a64caf61a3ef29e9bb8176d1e226`
MD5	`b869307d1091e8b135f93fb65c29f1af`
BLAKE2b-256	`50eb75c53e589c67a52e78c8d9ce198d8a7428e93123b6a4c7ec1046a9226a5d`

File details

Details for the file agent_council-0.2.0-py3-none-any.whl.

File metadata

Download URL: agent_council-0.2.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 24.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for agent_council-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c391652839c68a8ebafaa069a9467dbdf2c0360c644fc86c27978b89c6bfe2e6`
MD5	`6cc92ba37ed46391d70ef4fd32ad9b76`
BLAKE2b-256	`1f304b4cd768f9af353c78c3561de1f26fcff3aaf501a9f783461f2cb929f6d0`
