Multi-LLM Council Framework for adversarial debate, cross-validation, and structured decision-making

The LLM Council

$ council run drafter --mode arch "Design a mass hallucination prevention system"

                    ╔══════════════════════════════════════════════════════════╗
                    ║             ⚖️  THE LLM COUNCIL CONVENES  ⚖️              ║
                    ╚══════════════════════════════════════════════════════════╝

      ┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
      │  ┌───────────┐  │      │  ┌───────────┐  │      │  ┌───────────┐  │
      │  │ ╭───────╮ │  │      │  │ ╭───────╮ │  │      │  │ ╭───────╮ │  │
      │  │ │GPT-5.4│ │  │      │  │ │ CLAUDE│ │  │      │  │ │GEMINI │ │  │
      │  │ ╰───────╯ │  │      │  │ ╰───────╯ │  │      │  │ ╰───────╯ │  │
      │  │   ◉ ◉     │  │      │  │   ◉ ◉     │  │      │  │   ◉ ◉     │  │
      │  │    ⌣      │  │      │  │    ▽      │  │      │  │    ○      │  │
      │  └───────────┘  │      │  └───────────┘  │      │  └───────────┘  │
      │    JUDGE #1     │      │    JUDGE #2     │      │    JUDGE #3     │
      └────────┬────────┘      └────────┬────────┘      └────────┬────────┘
               │                        │                        │
               │ "I propose we use      │ "Actually, I must      │ "Interesting, but
               │  a vector database..." │  respectfully disagree"│  what about...?"
               │                        │                        │
               └────────────────────────┼────────────────────────┘
                                        ▼
                         ┌──────────────────────────────┐
                         │     🔥 ADVERSARIAL DEBATE 🔥   │
                         │                              │
                         │  GPT5.4: "Your approach has  │
                         │          a cold start issue" │
                         │                              │
                         │  CLAUDE: "Fair, but yours    │
                         │          doesn't scale"      │
                         │                              │
                         │  GEMINI: "Both valid. What   │
                         │          if we combine..."   │
                         └──────────────┬───────────────┘
                                        ▼
                         ┌──────────────────────────────┐
                         │      ✅ VERDICT REACHED ✅     │
                         │                              │
                         │   Synthesized best ideas     │
                         │   Schema-validated output    │
                         │   Confidence: 94%            │
                         └──────────────────────────────┘

[Council] Task completed in 45.2s | 3 judges | 2 debate rounds | Cost: $0.12

The LLM Council - Multiple AI models debating as judges

License: MIT | Python 3.10+ | OS: Cross-platform | Code style: ruff | Type checked: mypy

A multi-model orchestration package that runs adversarial council workflows across OpenAI, Anthropic, Google, Vertex, OpenRouter, and local CLIs.

This is not a Claude-only framework. Claude Code is one supported client and one supported provider path among several.

This release also includes a mode-aware execution path with runtime profiles, routed handoff, capability planning, and deterministic eval tooling. These additions extend the package runtime and make the public surface more explicit for planning, review, security, and research workflows.

Why Use a Council?

Single-model outputs have blind spots. By running multiple models in parallel and having them critique each other, the council:

  • Catches errors that any single model might miss
  • Reduces hallucination through cross-validation
  • Produces higher-quality outputs via adversarial refinement
  • Validates structure with JSON schema enforcement and retry logic

Features

| Feature | Description |
| --- | --- |
| Multi-Model Council | Run Claude, GPT-5.4, and Gemini in parallel via OpenRouter or direct APIs |
| Mode-Aware Runtime | drafter, critic, and planner honor runtime modes and execution profiles |
| Adversarial Critique | Built-in critique phase identifies weaknesses and blind spots |
| Schema Validation | JSON schema validation with automatic retry for structured outputs |
| Provider Agnostic | Swap between OpenRouter, direct APIs, or CLI-based providers |
| Deep Doctor | council doctor --deep checks real non-interactive generation readiness |
| Graceful Degradation | Automatic retry, fallback, and skip strategies for failures |
| Artifact Store | Persistent storage of drafts with tiered summarization |
| Eval Tooling | eval, eval-compare, and local-only eval-import-pr support reproducible checks |
| Secret-Safe Logging | Redaction pipeline prevents credential leakage |

Requirements

| Requirement | Details |
| --- | --- |
| Python | 3.10, 3.11, or 3.12 |
| OS | macOS, Linux, Windows (native or WSL) |
| Credentials | At least one provider credential or authenticated local CLI (see below) |

Supported Providers

| Provider | Environment Variable / Auth | Notes |
| --- | --- | --- |
| OpenRouter | OPENROUTER_API_KEY | Recommended - single key for all models |
| OpenAI | OPENAI_API_KEY | Direct OpenAI API access (GPT models) |
| Anthropic | ANTHROPIC_API_KEY | Direct Anthropic API access (Claude models) |
| Gemini API | GOOGLE_API_KEY or GEMINI_API_KEY | Direct Gemini API access |
| Vertex AI | GOOGLE_CLOUD_PROJECT or ANTHROPIC_VERTEX_PROJECT_ID + ADC | Enterprise GCP - Gemini + Claude |
| Claude Code | Local claude CLI login | CLI subprocess provider |
| Codex CLI | Local codex CLI login | CLI subprocess provider |
| Gemini CLI | Local gemini CLI login | CLI subprocess provider |

Installation

pip install the-llm-council

With specific providers:

# OpenRouter (recommended - single API key for all models)
pip install the-llm-council

# Direct APIs
pip install the-llm-council[anthropic,openai,gemini]

# Vertex AI (Enterprise GCP)
pip install the-llm-council[vertex]

# All providers
pip install the-llm-council[all]

# Development
pip install the-llm-council[dev]

Agent Skills and Plugins

The LLM Council is available as an Agent Skill following the open Agent Skills standard. It works across OpenAI Codex, Claude Code, Cursor, VS Code, and other skill-compatible agents.

OpenAI Codex

# Copy skills directory to Codex skills location
cp -r skills/council ~/.codex/skills/

Claude Code

# Step 1: Add the repo as a marketplace
/plugin marketplace add sherifkozman/the-llm-council

# Step 2: Install the plugin
/plugin install llm-council@the-llm-council

Once installed, the council skill is auto-invoked when relevant. You can also call it directly with the /council command:

/council drafter --mode impl "Build a login page with OAuth"

Other Agents (Cursor, VS Code, GitHub, etc.)

Copy the skills/council/ directory to your agent's skills folder. The skill follows the open Agent Skills spec and works with any compatible agent.

Quick Start

CLI Usage

# Set your API key
export OPENROUTER_API_KEY="your-key"

# Run a council task (v0.7.x syntax with modes)
council run drafter --mode impl "Build a login page with OAuth"

# Multi-model council (Claude + GPT-5.4 + Gemini debating)
council run drafter --mode arch "Design a caching layer" \
  --models "anthropic/claude-opus-4-6,openai/gpt-5.4,google/gemini-3.1-pro-preview"

# Or set via environment variable
export COUNCIL_MODELS="anthropic/claude-opus-4-6,openai/gpt-5.4,google/gemini-3.1-pro-preview"
council run drafter "Build a login page"

# OpenRouter model IDs keep their vendor namespace, such as google/...
# They are separate from council provider names such as gemini or gemini-cli.

# Code review with security analysis
council run critic --mode review "Review auth changes" --verbose

# Ask the router to choose the next subagent/mode, then follow through
council run router "Assess whether we should add a hosted vector store" --route

# Bound latency/cost for review-style runs
council run critic --mode review "Review auth changes" \
  --runtime-profile bounded \
  --reasoning-profile off

# Disable artifact storage for faster runs
council run drafter "Quick fix" --no-artifacts

# Get structured JSON output
council run planner "Add user authentication" --json

# Legacy aliases still work, but prefer the canonical subagent and --mode form shown here
council run drafter --mode impl "Build a login page"

Python API

import asyncio

from llm_council import Council
from llm_council.protocol.types import CouncilConfig


async def main() -> None:
    # With mode configuration
    config = CouncilConfig(providers=["openrouter"], mode="impl")
    council = Council(config=config)
    # council.run is awaited, so it has to be called from inside a coroutine
    result = await council.run(
        task="Build a login page with OAuth",
        subagent="drafter",
    )
    print(result.output)


asyncio.run(main())

Check Provider Health

council doctor

# Verify actual non-interactive generation readiness
# May incur API/CLI usage.
council doctor --deep --provider claude --provider gemini-cli --provider codex

The Codex CLI adapter runs nested codex exec calls under an isolated temporary HOME, copying only the Codex auth files required for login. That keeps council subprocesses from inheriting the parent Codex agent's MCP tools, plugins, or skills while preserving your local Codex authentication.
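
The isolation idea described above can be sketched in a few lines of Python. This is an illustration only, not the adapter's actual code: run_with_isolated_home and its auth_files parameter are hypothetical names, the copy step flattens paths for brevity, and the Python interpreter stands in for the real CLI so the example runs anywhere.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_isolated_home(
    cmd: list[str], auth_files: list[Path]
) -> subprocess.CompletedProcess:
    """Run cmd with HOME pointing at a throwaway directory.

    Only the files listed in auth_files are copied in, so the child process
    sees nothing else from the real home directory. (Hypothetical helper.)
    """
    with tempfile.TemporaryDirectory(prefix="council-home-") as tmp:
        tmp_home = Path(tmp)
        for src in auth_files:
            if src.is_file():
                # Simplification: files land at the top level of the temp HOME.
                shutil.copy2(src, tmp_home / src.name)
        env = {**os.environ, "HOME": str(tmp_home)}
        # A list argument without shell=True keeps this an exec-style call.
        return subprocess.run(
            cmd, env=env, capture_output=True, text=True, check=False
        )


# Demo: the child process reports the HOME it was given.
result = run_with_isolated_home(
    [sys.executable, "-c", "import os; print(os.environ['HOME'])"],
    auth_files=[],
)
isolated_home = result.stdout.strip()
```

The temporary directory is deleted as soon as the call returns, so nothing the child writes under its HOME outlives the run.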

Run Deterministic Evals

# Public deterministic runtime checks
council eval evals/runtime-baseline.yaml --providers openrouter

# Compare named variants on the same dataset
council eval-compare evals/runtime-baseline.yaml variants.yaml --providers openai

For private benchmark creation, import external PR review material into .council-private/ only:

council eval-import-pr owner/repo 123

That path is gitignored and intended for local-only evaluation inputs. Do not commit imported diffs, copied code, or review fixtures into tracked repo paths.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           LLM Council                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐ │
│  │    CLI      │───▶│  Council    │───▶│     Orchestrator        │ │
│  │  (typer)    │    │   (API)     │    │                         │ │
│  └─────────────┘    └─────────────┘    │  ┌───────────────────┐  │ │
│                                        │  │  Health Checker   │  │ │
│  ┌─────────────────────────────────┐   │  ├───────────────────┤  │ │
│  │        Provider Registry        │◀──│  │ Degradation Policy│  │ │
│  │  ┌─────────┐ ┌─────────┐       │   │  ├───────────────────┤  │ │
│  │  │OpenRouter│ │Anthropic│ ...  │   │  │  Artifact Store   │  │ │
│  │  └─────────┘ └─────────┘       │   │  └───────────────────┘  │ │
│  └─────────────────────────────────┘   └─────────────────────────┘ │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    Subagent Configs                          │   │
│  │  router | planner | researcher | drafter | critic | ...     │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                     JSON Schemas                             │   │
│  │  Validation & retry logic for structured outputs             │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Pipeline Flow

0. HEALTH CHECK (optional)
   └── Preflight check of all providers, skip unhealthy ones

1. PARALLEL DRAFTS
   ├── Provider A generates draft
   ├── Provider B generates draft
   ├── Provider C generates draft
   └── (Graceful degradation on failures)

2. ADVERSARIAL CRITIQUE
   └── Critic identifies weaknesses, contradictions, blind spots

3. SYNTHESIS
   └── Merge best elements, address critique, validate schema

4. VALIDATION
   └── JSON schema check with retry on failure

5. ARTIFACT STORAGE (optional)
   └── Store drafts and outputs for context management
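
As a rough illustration of steps 1 through 4, the sketch below wires stub functions together with asyncio. Everything in it is a stand-in: draft, critique, synthesize, validate, and run_pipeline are hypothetical names, the "flaky" provider simulates a failure, and the key check is a placeholder for real JSON Schema validation. It shows the control flow only, not the package's orchestrator.

```python
import asyncio
import json


async def draft(provider: str, task: str) -> str:
    """Stand-in for a provider call; 'flaky' simulates a provider failure."""
    if provider == "flaky":
        raise RuntimeError("provider unavailable")
    return f"[{provider}] draft for: {task}"


def critique(drafts: list[str]) -> str:
    """Stand-in critic; a real one would be another model call."""
    return f"reviewed {len(drafts)} drafts"


def synthesize(drafts: list[str], critique_text: str) -> str:
    """Stand-in synthesizer that emits JSON text."""
    return json.dumps({"summary": drafts[0], "critique": critique_text})


def validate(raw: str, required: set[str]) -> dict:
    """Minimal key check standing in for JSON Schema validation."""
    data = json.loads(raw)
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


async def run_pipeline(task: str, providers: list[str], max_retries: int = 3) -> dict:
    # 1. Parallel drafts. return_exceptions=True turns a failure into a
    #    returned value, so one bad provider does not abort the batch.
    results = await asyncio.gather(
        *(draft(p, task) for p in providers), return_exceptions=True
    )
    drafts = [r for r in results if isinstance(r, str)]
    if not drafts:
        raise RuntimeError("all providers failed")

    # 2. Adversarial critique over the surviving drafts.
    notes = critique(drafts)

    # 3 and 4. Synthesize, then validate, retrying when validation fails.
    last_error = None
    for _ in range(max_retries):
        try:
            return validate(synthesize(drafts, notes), {"summary", "critique"})
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
    raise RuntimeError(f"validation failed after {max_retries} attempts") from last_error


output = asyncio.run(run_pipeline("Design a caching layer", ["a", "flaky", "b"]))
```

Here the failing provider is simply dropped and the run continues with two drafts, which is the simplest form of the skip strategy.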

Subagents (v0.7.x)

Core Agents

| Subagent | Modes | Purpose | Example |
| --- | --- | --- | --- |
| drafter | impl, arch, test | Generate code, designs, tests | "Build the login page" |
| critic | review, security | Review and analyze | "Review this PR for security" |
| synthesizer | - | Merge and finalize | "Generate changelog for v1.2" |
| researcher | - | Technical research | "Research OAuth providers" |
| planner | plan, assess | Roadmaps and decisions | "Plan the auth implementation" |
| router | - | Classify and route tasks | "Is this a bug or feature?" |

Agent Modes

# drafter modes
council run drafter --mode impl "Build login page"     # Implementation (default)
council run drafter --mode arch "Design caching layer" # Architecture
council run drafter --mode test "Design test suite"    # Test design

# critic modes
council run critic --mode review "Review PR"           # Code review (default)
council run critic --mode security "Analyze auth"      # Security analysis

# planner modes
council run planner --mode plan "Plan implementation"  # Planning (default)
council run planner --mode assess "Redis vs Memcached" # Build vs buy

Legacy Aliases

Legacy names such as implementer, architect, reviewer, red-team, assessor, test-designer, and shipper still work for backwards compatibility, but public docs and examples now use the canonical subagents and modes.

Writing a Provider

Providers are pluggable via Python entry points. See the full Provider Development Guide for detailed instructions.

Quick Example

# DoctorResult is assumed to be exported from the same base module
from llm_council.providers.base import (
    DoctorResult,
    GenerateRequest,
    GenerateResponse,
    ProviderAdapter,
)


class MyProvider(ProviderAdapter):
    name = "myprovider"

    async def generate(self, request: GenerateRequest) -> GenerateResponse:
        # Call your backend here and wrap its reply
        return GenerateResponse(text="...", content="...")

    async def doctor(self) -> DoctorResult:
        # Report whether the provider is ready to serve requests
        return DoctorResult(ok=True, message="Healthy")

Register via pyproject.toml:

[project.entry-points."llm_council.providers"]
myprovider = "my_package.providers:MyProvider"

Reference Implementations

| Provider | Type | File |
| --- | --- | --- |
| OpenRouter | HTTP API | src/llm_council/providers/openrouter.py |
| Anthropic | Native SDK | src/llm_council/providers/anthropic.py |
| OpenAI | Native SDK | src/llm_council/providers/openai.py |
| Gemini API | Native SDK | src/llm_council/providers/gemini.py |
| Vertex AI | Native SDK | src/llm_council/providers/vertex.py |
| Claude Code | CLI subprocess | src/llm_council/providers/cli/claude_code.py |
| Codex CLI | CLI subprocess | src/llm_council/providers/cli/codex.py |
| Gemini CLI | CLI subprocess | src/llm_council/providers/cli/gemini_cli.py |

Configuration

Environment Variables

# OpenRouter (recommended - single key for all models)
export OPENROUTER_API_KEY="your-key"

# Direct APIs
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Vertex AI - Gemini (Enterprise GCP)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="global"  # optional, default for Gemini on Vertex
export VERTEX_AI_MODEL="gemini-3.1-pro-preview"  # optional

# Vertex AI - Claude (Enterprise GCP)
export ANTHROPIC_VERTEX_PROJECT_ID="your-project-id"
export CLOUD_ML_REGION="global"              # Claude uses global region
export ANTHROPIC_MODEL="claude-opus-4-6@20260301"  # model with version

# Auth for Vertex AI: gcloud auth application-default login OR
# export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa.json"

# Multi-model council: comma-separated OpenRouter model IDs
export COUNCIL_MODELS="anthropic/claude-opus-4-6,openai/gpt-5.4,google/gemini-3.1-pro-preview"

# Per-provider model override (v0.7.0+)
export OPENAI_MODEL="gpt-5.4"               # Override OpenAI default
export ANTHROPIC_MODEL="claude-opus-4-6"     # Override Anthropic default
export GEMINI_MODEL="gemini-3.1-pro-preview" # Override Gemini API default
export OPENROUTER_MODEL="anthropic/claude-opus-4-6"  # Override OpenRouter default

# Model pack overrides for specific task types
export COUNCIL_MODEL_FAST="anthropic/claude-haiku-4-5"        # Quick tasks
export COUNCIL_MODEL_REASONING="anthropic/claude-opus-4-6"    # Deep analysis
export COUNCIL_MODEL_CODE="openai/gpt-5.4"                   # Code generation
export COUNCIL_MODEL_CRITIC="anthropic/claude-sonnet-4-6"     # Adversarial critique
export COUNCIL_MODEL_GROUNDED="google/gemini-3.1-pro-preview" # Research tasks
export COUNCIL_MODEL_CODE_COMPLEX="anthropic/claude-opus-4-6" # Complex refactoring

Per-Subagent Reasoning Configuration (v0.3.0+)

Subagents can be configured with provider preferences, model overrides, and extended reasoning/thinking budgets in their YAML configs:

# src/llm_council/subagents/critic.yaml (security mode)
name: critic
model_pack: harsh_critic

# Provider preferences
providers:
  preferred: [anthropic, openai]
  fallback: [openrouter]
  exclude: [gemini]

# Model overrides per provider
models:
  anthropic: claude-opus-4-6
  openai: o3-mini
  gemini: gemini-3-pro

# Extended reasoning/thinking configuration
reasoning:
  enabled: true
  effort: high           # OpenAI o-series: low/medium/high
  budget_tokens: 32768   # Anthropic: 1024-128000
  thinking_level: high   # Google Gemini 3.x: minimal/low/medium/high

| Provider | Parameter | Values | Description |
| --- | --- | --- | --- |
| OpenAI | effort | low/medium/high | Reasoning effort for o-series models |
| Anthropic | budget_tokens | 1024-128000 | Extended thinking token budget |
| Gemini API | thinking_level | minimal/low/medium/high | Gemini 3.x thinking level |

Default Reasoning Tiers (v0.4.0+)

All subagents have pre-configured reasoning defaults based on task complexity:

| Tier | Subagents | Config | Use Case |
| --- | --- | --- | --- |
| High | drafter (arch), critic, planner | effort: high, budget_tokens: 16384 | Deep analysis, critical decisions |
| Medium | drafter (impl), researcher | effort: medium, budget_tokens: 8192 | Balanced code/research tasks |
| Disabled | router, synthesizer, drafter (test) | enabled: false | Fast tasks, no overhead |
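
The tiers can be read as a lookup keyed by subagent and, for drafter, by mode. The sketch below encodes the table that way. The values come from the table; the TIERS structure and the default_reasoning helper are hypothetical and do not reflect how the package stores its defaults.

```python
from typing import Optional

# Values copied from the tier table; the data structure is illustrative.
HIGH = {"effort": "high", "budget_tokens": 16384}
MEDIUM = {"effort": "medium", "budget_tokens": 8192}
DISABLED = {"enabled": False}

# A mode of None means "applies to every mode of this subagent".
TIERS = {
    ("drafter", "arch"): HIGH,
    ("critic", None): HIGH,
    ("planner", None): HIGH,
    ("drafter", "impl"): MEDIUM,
    ("researcher", None): MEDIUM,
    ("router", None): DISABLED,
    ("synthesizer", None): DISABLED,
    ("drafter", "test"): DISABLED,
}


def default_reasoning(subagent: str, mode: Optional[str] = None) -> dict:
    """Look up the exact (subagent, mode) pair first, then the subagent-wide entry."""
    for key in ((subagent, mode), (subagent, None)):
        if key in TIERS:
            return TIERS[key]
    raise KeyError(f"no reasoning default for {subagent!r} with mode {mode!r}")


print(default_reasoning("drafter", "arch"))
print(default_reasoning("critic", "security"))
```

Because drafter appears in all three tiers, it has no subagent-wide entry in this sketch and a mode must be supplied.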

Config File

# ~/.config/llm-council/config.yaml
providers:
  - name: openrouter
    default_model: anthropic/claude-opus-4-6
  - name: openai
    default_model: gpt-5.4
  - name: gemini
    default_model: gemini-3.1-pro-preview

defaults:
  providers:
    - openrouter
  timeout: 120
  max_retries: 3
  summary_tier: actions
  output_format: json  # or "rich" (default) — useful for non-interactive clients

Provider default_model is forwarded to each provider's constructor, overriding the hardcoded default. Per-provider env vars (e.g. OPENAI_MODEL) take precedence over the config file, and the --models CLI flag overrides both.
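
The precedence order can be pictured as a chain of fallbacks. The following sketch is an assumption-laden illustration: resolve_models is a hypothetical function, the environment variable name is derived as PROVIDER_MODEL to match the variables listed earlier, and the COUNCIL_MODELS variable is left out because its place in the order is not stated here.

```python
from typing import Mapping, Optional


def resolve_models(
    provider: str,
    cli_models: Optional[str],
    env: Mapping[str, str],
    config_defaults: Mapping[str, str],
    hardcoded_default: str,
) -> list[str]:
    """Pick models in the documented order: --models, env var, config file, built-in."""
    if cli_models:
        # --models is a comma-separated list and overrides everything else.
        return [m.strip() for m in cli_models.split(",") if m.strip()]
    env_value = env.get(f"{provider.upper()}_MODEL")
    if env_value:
        return [env_value]
    if provider in config_defaults:
        return [config_defaults[provider]]
    return [hardcoded_default]


config = {"openai": "gpt-5.4"}
print(resolve_models("openai", None, {}, config, "builtin-model"))
print(resolve_models("openai", None, {"OPENAI_MODEL": "o3-mini"}, config, "builtin-model"))
```

Passing env as a mapping rather than reading os.environ directly keeps the function easy to test.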

CLI Reference

council run <subagent> "<task>"    # Run a council task
council doctor                      # Check provider health
council config                      # Show configuration
council version                     # Show installed version

# Options
--mode             Agent mode (impl/arch/test for drafter, review/security for critic, etc.)
--providers, -p    Comma-separated provider list
--models, -m       Comma-separated OpenRouter model IDs for multi-model council
--files, -f        File paths as context (repeatable or comma-separated; 50KB/file, 200KB total)
--context, --system  Additional system context/instructions
--timeout, -t      Request timeout in seconds
--temperature      Model temperature (0.0-2.0)
--max-tokens       Max output tokens
--input, -i        Read task from file (use '-' for stdin)
--output, -o       Write output to file
--schema           Custom output schema JSON file
--dry-run          Show what would run without executing
--no-artifacts     Disable artifact storage
--json             Output structured JSON
--verbose, -v      Verbose output

File Context

Pass source files directly to council for review, implementation, or analysis:

# Comma-separated
council run critic --mode review --files src/auth.py,src/middleware.py "Review these files"

# Repeatable -f (cleaner for many files)
council run critic --mode review \
  -f src/auth.py \
  -f src/middleware.py \
  -f src/handler.py \
  "Review these files"

# Security audit
council run critic --mode security -f src/payment.py "Audit payment handler"

Limits: 50KB per file, 200KB total. Files exceeding limits are truncated with a warning.
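
One way such limits could be enforced is sketched below. The constants match the documented numbers, but the rest is assumed: 1KB is taken as 1024 bytes, apply_limits is a hypothetical name, and giving later files whatever total budget remains is a guess at the policy rather than the CLI's confirmed behavior.

```python
import warnings

MAX_FILE_BYTES = 50 * 1024    # 50KB per file
MAX_TOTAL_BYTES = 200 * 1024  # 200KB across all files


def apply_limits(files: dict[str, bytes]) -> dict[str, bytes]:
    """Truncate each file to the per-file cap and to the remaining total budget."""
    remaining = MAX_TOTAL_BYTES
    kept: dict[str, bytes] = {}
    for path, data in files.items():
        allowed = min(MAX_FILE_BYTES, remaining)
        if len(data) > allowed:
            warnings.warn(f"{path}: truncated from {len(data)} to {allowed} bytes")
            # Note: slicing bytes can split a multi-byte UTF-8 character.
            data = data[:allowed]
        kept[path] = data
        remaining -= len(data)
    return kept


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    limited = apply_limits({f"f{i}.py": b"x" * (60 * 1024) for i in range(5)})
sizes = [len(v) for v in limited.values()]
```

In this demo, five 60KB files are each cut to 50KB until the 200KB budget is used up, so the fifth file ends up empty.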

Development

# Clone the repository
git clone https://github.com/sherifkozman/the-llm-council.git
cd the-llm-council

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
mypy src/llm_council

Contributing

Contributions are welcome! See our Roadmap for planned features and Contributing Guide for details.

Quick Start

# Fork and clone
git clone https://github.com/YOUR_USERNAME/the-llm-council.git
cd the-llm-council

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/ && mypy src/llm_council

Contribution Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Test your changes (pytest)
  5. Lint your code (ruff check src/ && mypy src/llm_council)
  6. Commit with a clear message (git commit -m 'Add amazing feature')
  7. Push to your branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

What We're Looking For

  • New Providers: Add support for more LLM backends
  • New Subagents: Create specialized agents for specific tasks
  • Bug Fixes: Found a bug? We'd love a fix!
  • Documentation: Improvements to docs are always welcome
  • Tests: More test coverage is great

Security

For security concerns, please see our Security Policy or email vibecode@sherifkozman.com.

Key security features:

  • CLI adapters use exec-style subprocess (no shell injection)
  • Environment variable allowlisting prevents secret leakage
  • Path traversal protection in artifact storage
  • Configurable secret redaction in logs
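
Two of these measures, environment allowlisting and log redaction, are easy to illustrate. The sketch below is not the package's implementation: ALLOWED_ENV, REDACTIONS, filter_env, and redact are hypothetical, and the patterns only cover key shapes like the ones used in this README's examples.

```python
import re

# Hypothetical allowlist; the package's real list is not shown in this README.
ALLOWED_ENV = {"PATH", "HOME", "LANG"}

# Hypothetical (pattern, replacement) pairs.
REDACTIONS = [
    # Bare keys that start with "sk-", such as sk-ant-...
    (re.compile(r"sk-[A-Za-z0-9_-]{8,}"), "[REDACTED]"),
    # NAME=value pairs whose name contains KEY, TOKEN, or SECRET.
    (
        re.compile(
            r"([A-Z0-9_]*(?:KEY|TOKEN|SECRET)[A-Z0-9_]*\s*=\s*)\S+",
            re.IGNORECASE,
        ),
        r"\1[REDACTED]",
    ),
]


def filter_env(env: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted variables before handing env to a subprocess."""
    return {k: v for k, v in env.items() if k in ALLOWED_ENV}


def redact(text: str) -> str:
    """Replace anything that looks like a credential before it is logged."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


print(redact("auth failed for sk-ant-abc123def456"))
print(filter_env({"PATH": "/usr/bin", "OPENAI_API_KEY": "x"}))
```

An allowlist fails closed: a newly introduced secret variable is withheld from subprocesses by default, whereas a denylist would leak it until someone remembered to add it.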

License

MIT License - see LICENSE for details.

Acknowledgments

Built with:


When one model isn't enough, convene a council.

~ vibe coded by Sherif Kozman & The LLM Council ~

Download files

Source Distribution

the_llm_council-0.7.8.tar.gz (203.7 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Publisher: publish.yml on sherifkozman/the-llm-council

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 72fa80994eef2e3cc605bbcf11696eb0ac8588e402b5e8e20ac6e92158f45ffc |
| MD5 | 5d24ecf63eba35c873aa9103e41592ae |
| BLAKE2b-256 | 72764afe092808f89c53ca1a0e3193320b0e752e4beceb708402831b99fbef1e |

Built Distribution

the_llm_council-0.7.8-py3-none-any.whl (205.1 kB)

  • Publisher: publish.yml on sherifkozman/the-llm-council

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9d751eddf2d9f3650877c665f4ddbd5f07f406f536fee18e975a21635ed5e193 |
| MD5 | ab5431f4d1cfb971f38c8b270ff80113 |
| BLAKE2b-256 | 53c2d9c61e07f117c0f702c01563632d63d938bef7fe5304b37f111cf154ff2a |

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
