Multi-critic quality validation for agents, research, and configurations

Quorum — Reference Implementation

A working Python implementation of the Quorum multi-critic quality validation system.

Quorum evaluates artifacts (agent configurations, research documents, code) against domain-specific rubrics using specialized critics that are required to provide grounded evidence for every finding.


Quick Start

1. Install

cd reference-implementation
pip install -e .

Or without installing:

pip install -r requirements.txt
python -m quorum --help

2. Configure API Keys

Quorum uses LiteLLM as a universal provider layer, supporting Anthropic, OpenAI, Mistral, Groq, and 100+ other providers.

# Anthropic (Claude)
export ANTHROPIC_API_KEY=your-key-here

# OpenAI
export OPENAI_API_KEY=your-key-here

3. Run Your First Validation

# Validate the included example research document
quorum run --target examples/sample-research.md --depth quick

# Validate the example agent config
quorum run --target examples/sample-agent-config.yaml --rubric agent-config

# Use a specific rubric
quorum run --target my-research.md --rubric research-synthesis --depth standard

# Batch: validate all markdown files in a directory
quorum run --target ./docs/ --pattern "*.md" --rubric research-synthesis

# Cross-artifact: validate with a relationships manifest
quorum run --target my-spec.md --relationships quorum-relationships.yaml

Usage

Usage: quorum [OPTIONS] COMMAND [ARGS]...

  Quorum — Multi-critic quality validation.

Options:
  --version  Show the version and exit.
  -v         Enable debug logging
  --help     Show this message and exit.

Commands:
  run            Validate an artifact against a rubric
  rubrics list   List available built-in rubrics
  rubrics show   Show criteria for a specific rubric
  config init    Interactive first-run setup

quorum run

quorum run \
  --target <file-or-dir>              # required: artifact, directory, or glob to validate
  --depth quick|standard|thorough     # depth profile (default: quick)
  --rubric <name-or-path>             # rubric to use (auto-detected if omitted)
  --pattern "*.md"                    # filter files when --target is a directory
  --relationships <manifest.yaml>     # cross-artifact manifest for Phase 2 consistency checks
  --output-dir ./my-runs              # where to write outputs (default: ./quorum-runs/)
  --verbose                           # show full evidence for all findings

Exit codes:

  • 0 — PASS or PASS_WITH_NOTES (no blocking issues)
  • 1 — Error (bad arguments, missing file, API failure)
  • 2 — REVISE or REJECT (validation failed; artifact needs work)

Depth Profiles

Depth      Critics                                              Fix Loops            Use For
quick      correctness, completeness                            0                    Fast feedback, drafts
standard   correctness, completeness, security, code_hygiene    0                    Most work, PR reviews
thorough   all 4 shipped critics (more as they land)            1 (Fixer proposals)  Critical decisions, production changes

All depth profiles include the deterministic pre-screen (10 checks, no LLM cost) before any critics run.
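As a sketch of what a deterministic pre-screen check looks like, here is a hypothetical example; the check ID and rule below are illustrative assumptions, not the shipped PS-001–PS-010 implementations:

```python
# Hypothetical sketch of one deterministic pre-screen check (no LLM involved).
# The ID "PS-XXX" is a placeholder; real checks are numbered PS-001..PS-010.

def prescreen_nonempty(artifact_text: str) -> dict:
    """Fail if the artifact is empty or whitespace-only."""
    passed = bool(artifact_text.strip())
    return {
        "check_id": "PS-XXX",  # placeholder ID, not a real check number
        "passed": passed,
        "evidence": None if passed else "Artifact contains no non-whitespace text",
    }

result = prescreen_nonempty("# My research document\n")
```

Because checks like this are pure functions over the artifact text, they run instantly and cost nothing before any critic is dispatched.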

Edit quorum/configs/*.yaml to customize model assignments and critic panels.


Rubrics

Rubrics define what "good" looks like for a domain. Built-in rubrics:

Name                Domain                  Criteria
research-synthesis  Research documents      Citations, logic, completeness, causation
agent-config        Agent configurations    Model assignments, permissions, error handling
python-code         Python source files     25 criteria (PC-001–PC-025); auto-detected on .py files

# List available rubrics
quorum rubrics list

# Show rubric criteria
quorum rubrics show research-synthesis

Custom Rubrics

Create a JSON file matching this schema:

{
  "name": "My Custom Rubric",
  "domain": "my-domain",
  "version": "1.0",
  "criteria": [
    {
      "id": "CR-001",
      "criterion": "What to check",
      "severity": "HIGH",
      "evidence_required": "What proof must be shown",
      "why": "Why this matters"
    }
  ]
}

Then: quorum run --target my-file.txt --rubric ./my-rubric.json
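A quick way to sanity-check a custom rubric before running it is to validate the keys shown in the schema above. This sketch mirrors the example's required fields; it is not Quorum's own loader, and any severity levels beyond HIGH are outside what the example demonstrates:

```python
import json

# Minimal rubric validation sketch based on the schema example above.
# Required keys mirror that example; this is not Quorum's internal loader.

REQUIRED_CRITERION_KEYS = {"id", "criterion", "severity", "evidence_required", "why"}

def validate_rubric(rubric: dict) -> list[str]:
    errors = []
    for key in ("name", "domain", "version", "criteria"):
        if key not in rubric:
            errors.append(f"missing top-level key: {key}")
    for i, crit in enumerate(rubric.get("criteria", [])):
        missing = REQUIRED_CRITERION_KEYS - crit.keys()
        if missing:
            errors.append(f"criteria[{i}] missing: {sorted(missing)}")
    return errors

rubric = json.loads("""{
  "name": "My Custom Rubric",
  "domain": "my-domain",
  "version": "1.0",
  "criteria": [{"id": "CR-001", "criterion": "What to check",
                "severity": "HIGH", "evidence_required": "What proof must be shown",
                "why": "Why this matters"}]
}""")
```

An empty error list means the rubric has the shape the example expects.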


Outputs

Each quorum run creates a timestamped directory:

quorum-runs/
└── 20260223-143022-sample-research/
    ├── run-manifest.json              # Run parameters, flags, model config
    ├── artifact.txt                   # The artifact (copy)
    ├── rubric.json                    # Rubric used
    ├── prescreen.json                 # Deterministic pre-screen results (PS-001–PS-010)
    ├── critics/
    │   ├── correctness-findings.json
    │   ├── completeness-findings.json
    │   ├── security-findings.json
    │   ├── code_hygiene-findings.json
    │   └── cross_consistency-findings.json  # Phase 2 (if --relationships used)
    ├── verdict.json                   # Machine-readable verdict
    └── report.md                      # Human-readable report

For batch runs, each file gets its own timestamped sub-directory. A top-level batch-verdict.json summarizes the full run.
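For scripting on top of a run directory, the documented exit codes can be reproduced from the verdict. The verdict names and exit codes below come from the Exit codes section; that verdict.json carries a top-level "verdict" string is an assumption about the file layout:

```python
import json
from pathlib import Path

# Sketch: map a run's verdict to the documented exit codes.
# Verdict names and codes are taken from the Exit codes section above;
# the verdict.json field name "verdict" is an assumption.

EXIT_CODES = {
    "PASS": 0,
    "PASS_WITH_NOTES": 0,
    "REVISE": 2,
    "REJECT": 2,
}

def exit_code_for_verdict(verdict: str) -> int:
    return EXIT_CODES.get(verdict, 1)  # unknown verdict -> treat as error

def exit_code_for(verdict_path: Path) -> int:
    verdict = json.loads(verdict_path.read_text())["verdict"]
    return exit_code_for_verdict(verdict)
```

This mirrors what the CLI itself reports, so CI pipelines can gate on either the process exit code or the written verdict.json.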


Architecture

quorum run --target file --relationships manifest.yaml
  ↓
pipeline.py          load config, rubric, artifact
  ↓
prescreen.py         10 deterministic checks (PS-001–PS-010)
                     → prescreen.json (no LLM, runs instantly)
  ↓
supervisor.py        Phase 1: classify domain, dispatch critics concurrently
                     (ThreadPoolExecutor, max 4 critics in parallel;
                      batch files run concurrently max 3)
  ↓
correctness.py    }
completeness.py   }  each critic → LLM → structured findings (parallel)
security.py       }  (framework-grounded: OWASP ASVS, CWE, NIST SA-11,
code_hygiene.py   }   ISO 25010:2023, CISQ)
  ↓
fixer.py             Phase 1.5: proposes text replacements for CRITICAL/HIGH
                     (only if max_fix_loops > 0; thorough depth default: 1)
  ↓
cross_artifact.py    Phase 2: cross-artifact consistency critic
                     (only if --relationships provided)
                     receives Phase 1 findings as context — NOT verdicts
  ↓
aggregator.py        deduplicate, resolve conflicts, assign verdict
  ↓
output.py            terminal report + write run directory

The core principle: Every finding must have evidence (a quote, a tool result, a rubric citation). The Aggregator rejects ungrounded claims. This prevents LLM hand-waving.

Phase 1 vs Phase 2: Phase 1 critics evaluate each file independently. Phase 2 (Cross-Artifact Consistency) receives Phase 1 findings — not verdicts — as context and evaluates declared relationships between files. This keeps phases independent: Phase 2 sees what was found, not a judgment.
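The supervisor's concurrent dispatch described above can be sketched with a ThreadPoolExecutor capped at four workers. run_critic here is a stand-in for the real critic-to-LLM call, which this sketch does not reproduce:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of Phase 1 dispatch: up to 4 critics evaluated in parallel,
# matching the max-4 concurrency described in the architecture above.

def run_critic(name: str, artifact: str) -> dict:
    # Placeholder: a real critic calls the LLM and returns grounded findings.
    return {"critic": name, "findings": []}

def dispatch(critics: list[str], artifact: str) -> list[dict]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_critic, c, artifact) for c in critics]
        return [f.result() for f in futures]  # preserves submission order

results = dispatch(["correctness", "completeness"], "artifact text")
```

Collecting results in submission order keeps the findings lists deterministic even though the critics run concurrently.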


Configuration

Quorum uses YAML config files for depth profiles. See quorum/configs/:

# quorum/configs/quick.yaml
critics:
  - correctness
  - completeness

model_tier1: claude-opus-4     # Strong model (judgment-heavy roles)
model_tier2: claude-sonnet-4   # Efficient model (critic execution)

max_fix_loops: 0
depth_profile: quick
temperature: 0.1
max_tokens: 4096

Model names follow LiteLLM conventions — any provider LiteLLM supports works here.


Cross-Artifact Validation

When your project has multiple files that should agree with each other — a spec and its implementation, an API contract and its consumers, a config and its schema — use --relationships to declare those relationships and let Quorum check them.

Relationship Manifest

Create quorum-relationships.yaml:

version: "1.0"
relationships:
  - source: src/api_handler.py
    target: docs/api-spec.md
    type: implements
    description: "Handler must implement all endpoints declared in spec"

  - source: docs/api-spec.md
    target: src/api_handler.py
    type: documents
    description: "Spec must document all public endpoints in handler"

  - source: quorum/critics/security.py
    target: quorum/configs/standard.yaml
    type: delegates
    description: "Security critic is enabled in standard depth profile"

  - source: data/output-schema.json
    target: src/pipeline.py
    type: schema_contract
    description: "Pipeline output must conform to declared schema"

Relationship types: implements, documents, delegates, schema_contract
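A manifest like the one above can be sanity-checked after loading (e.g. with yaml.safe_load) into a plain dict. The field names mirror the example manifest; the validation rules themselves are an illustrative assumption, not Quorum's own checks:

```python
# Sketch: validate a relationships manifest once loaded into a dict.
# Field names and the four relationship types mirror the example above;
# these checks are illustrative, not Quorum's internal validation.

VALID_TYPES = {"implements", "documents", "delegates", "schema_contract"}

def validate_manifest(manifest: dict) -> list[str]:
    errors = []
    for i, rel in enumerate(manifest.get("relationships", [])):
        for key in ("source", "target", "type", "description"):
            if key not in rel:
                errors.append(f"relationships[{i}] missing '{key}'")
        if rel.get("type") not in VALID_TYPES:
            errors.append(f"relationships[{i}] has unknown type: {rel.get('type')!r}")
    return errors

manifest = {"version": "1.0", "relationships": [
    {"source": "src/api_handler.py", "target": "docs/api-spec.md",
     "type": "implements", "description": "Handler must implement spec"}]}
```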

Running with Relationships

quorum run \
  --target ./src/ \
  --relationships quorum-relationships.yaml \
  --depth standard

The Cross-Artifact Consistency critic evaluates each declared relationship, looking for mismatches, undocumented behavior, and broken contracts. Findings use a Locus model — each finding references both files with role annotations (source_role, target_role) and a source_hash to ensure the finding is pinned to the artifact version that was evaluated.
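The source_hash pinning described above amounts to hashing the artifact bytes so a finding stays tied to the exact version that was evaluated. SHA-256 and the role names below are assumptions for illustration; the docs do not specify the hash algorithm or the full Locus field set:

```python
import hashlib

# Sketch of source_hash pinning per the Locus model described above.
# SHA-256 is an assumed algorithm; the role-name values are hypothetical.

def source_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

finding = {
    "source_role": "implements",      # role annotation, per the Locus model
    "target_role": "implemented_by",  # hypothetical role name
    "source_hash": source_hash(b"def handler(): ...\n"),
}
```

If the source file changes, its hash no longer matches, which flags the finding as stale rather than silently applying it to a different version.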


Extending Quorum

Adding a New Critic

  1. Create quorum/critics/my_critic.py inheriting from BaseCritic
  2. Implement name, system_prompt, and build_prompt()
  3. Register it in the CRITIC_REGISTRY in quorum/agents/supervisor.py
  4. Add the name to your config's critics list

See quorum/critics/correctness.py for a complete example.
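Steps 1 and 2 can be sketched as follows. Since the real BaseCritic lives in the Quorum package, a minimal stand-in is defined here so the example is self-contained; its shape follows the attributes named in the steps above but is otherwise an assumption:

```python
# Sketch of a new critic, with a minimal stand-in for the real BaseCritic.
# The stand-in implements only the interface named in the steps above
# (name, system_prompt, build_prompt); everything else is assumed.

class BaseCritic:  # stand-in for Quorum's actual base class
    name: str = ""
    system_prompt: str = ""

    def build_prompt(self, artifact: str, rubric: dict) -> str:
        raise NotImplementedError

class MyCritic(BaseCritic):
    name = "my_critic"
    system_prompt = "You check artifacts for <my concern>. Cite evidence for every finding."

    def build_prompt(self, artifact: str, rubric: dict) -> str:
        criteria = "\n".join(c["criterion"] for c in rubric["criteria"])
        return f"Criteria:\n{criteria}\n\nArtifact:\n{artifact}"

prompt = MyCritic().build_prompt("artifact text", {"criteria": [{"criterion": "Check X"}]})
```

With the class registered and named in a config's critics list, the supervisor dispatches it alongside the built-in critics.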

Adding a Cross-Artifact Critic

The Cross-Artifact Consistency critic does not inherit from BaseCritic — it uses a separate CrossArtifactCritic base class with a different interface, because it operates on pairs/groups of files rather than a single artifact. It also receives Phase 1 findings as additional context via build_prompt().

New relationship types beyond the four built-ins (implements, documents, delegates, schema_contract) can be added by extending the relationship type registry.


License

MIT — see LICENSE for details.
