Skip to main content

Multi-critic quality validation for agents, research, and configurations

Project description

Quorum — Reference Implementation

A working Python implementation of the Quorum multi-critic quality validation system.

Quorum evaluates artifacts (agent configurations, research documents, code) against domain-specific rubrics using specialized critics that are required to provide grounded evidence for every finding.


Quick Start

1. Install

cd reference-implementation
pip install -e .

Or without installing:

pip install -r requirements.txt
python -m quorum --help

2. Configure API Keys

Quorum uses LiteLLM as its universal provider, supporting Anthropic, OpenAI, Mistral, Groq, and 100+ others.

# Anthropic (Claude)
export ANTHROPIC_API_KEY=your-key-here

# OpenAI
export OPENAI_API_KEY=your-key-here

3. Run Your First Validation

# Validate the included example research document
quorum run --target examples/sample-research.md --depth quick

# Validate the example agent config
quorum run --target examples/sample-agent-config.yaml --rubric agent-config

# Use a specific rubric
quorum run --target my-research.md --rubric research-synthesis --depth standard

# Batch: validate all markdown files in a directory
quorum run --target ./docs/ --pattern "*.md" --rubric research-synthesis

# Cross-artifact: validate with a relationships manifest
quorum run --target my-spec.md --relationships quorum-relationships.yaml

Usage

Usage: quorum [OPTIONS] COMMAND [ARGS]...

  Quorum — Multi-critic quality validation.

Options:
  --version  Show the version and exit.
  -v         Enable debug logging
  --help     Show this message and exit.

Commands:
  run            Validate an artifact against a rubric
  rubrics list   List available built-in rubrics
  rubrics show   Show criteria for a specific rubric
  config init    Interactive first-run setup

quorum run

quorum run \
  --target <file-or-dir>              # required: artifact, directory, or glob to validate
  --depth quick|standard|thorough     # depth profile (default: quick)
  --rubric <name-or-path>             # rubric to use (auto-detected if omitted)
  --pattern "*.md"                    # filter files when --target is a directory
  --relationships <manifest.yaml>     # cross-artifact manifest for Phase 2 consistency checks
  --output-dir ./my-runs              # where to write outputs (default: ./quorum-runs/)
  --verbose                           # show full evidence for all findings

Exit codes:

  • 0 — PASS or PASS_WITH_NOTES (no blocking issues)
  • 1 — Error (bad arguments, missing file, API failure)
  • 2 — REVISE or REJECT (validation failed; artifact needs work)

Depth Profiles

Depth Critics Fix Loops Use For
quick correctness, completeness 0 Fast feedback, drafts
standard correctness, completeness, security, code_hygiene 0 Most work, PR reviews
thorough all 4 shipped critics (more as they land) 1 (Fixer proposals) Critical decisions, production changes

All depth profiles include the deterministic pre-screen (10 checks, no LLM cost) before any critics run.

Edit quorum/configs/*.yaml to customize model assignments and critic panels.


Rubrics

Rubrics define what "good" looks like for a domain. Built-in rubrics:

Name Domain Criteria
research-synthesis Research documents Citations, logic, completeness, causation
agent-config Agent configurations Model assignments, permissions, error handling
python-code Python source files 25 criteria (PC-001–PC-025); auto-detected on .py files
# List available rubrics
quorum rubrics list

# Show rubric criteria
quorum rubrics show research-synthesis

Custom Rubrics

Create a JSON file matching this schema:

{
  "name": "My Custom Rubric",
  "domain": "my-domain",
  "version": "1.0",
  "criteria": [
    {
      "id": "CR-001",
      "criterion": "What to check",
      "severity": "HIGH",
      "evidence_required": "What proof must be shown",
      "why": "Why this matters"
    }
  ]
}

Then: quorum run --target my-file.txt --rubric ./my-rubric.json


Outputs

Each quorum run creates a timestamped directory:

quorum-runs/
└── 20260223-143022-sample-research/
    ├── run-manifest.json              # Run parameters, flags, model config
    ├── artifact.txt                   # The artifact (copy)
    ├── rubric.json                    # Rubric used
    ├── prescreen.json                 # Deterministic pre-screen results (PS-001–PS-010)
    ├── critics/
    │   ├── correctness-findings.json
    │   ├── completeness-findings.json
    │   ├── security-findings.json
    │   ├── code_hygiene-findings.json
    │   └── cross_consistency-findings.json  # Phase 2 (if --relationships used)
    ├── verdict.json                   # Machine-readable verdict
    └── report.md                      # Human-readable report

For batch runs, each file gets its own timestamped sub-directory. A top-level batch-verdict.json summarizes the full run.


Architecture

quorum run --target file --relationships manifest.yaml
  ↓
pipeline.py          load config, rubric, artifact
  ↓
prescreen.py         10 deterministic checks (PS-001–PS-010)
                     → prescreen.json (no LLM, runs instantly)
  ↓
supervisor.py        Phase 1: classify domain, dispatch critics concurrently
                     (ThreadPoolExecutor, max 4 critics in parallel;
                      batch files run concurrently max 3)
  ↓
correctness.py    }
completeness.py   }  each critic → LLM → structured findings (parallel)
security.py       }  (framework-grounded: OWASP ASVS, CWE, NIST SA-11,
code_hygiene.py   }   ISO 25010:2023, CISQ)
  ↓
fixer.py             Phase 1.5: proposes text replacements for CRITICAL/HIGH
                     (only if max_fix_loops > 0; thorough depth default: 1)
  ↓
cross_artifact.py    Phase 2: cross-artifact consistency critic
                     (only if --relationships provided)
                     receives Phase 1 findings as context — NOT verdicts
  ↓
aggregator.py        deduplicate, resolve conflicts, assign verdict
  ↓
output.py            terminal report + write run directory

The core principle: Every finding must have evidence (a quote, a tool result, a rubric citation). The Aggregator rejects ungrounded claims. This prevents LLM hand-waving.

Phase 1 vs Phase 2: Phase 1 critics evaluate each file independently. Phase 2 (Cross-Artifact Consistency) receives Phase 1 findings — not verdicts — as context and evaluates declared relationships between files. This keeps phases independent: Phase 2 sees what was found, not a judgment.


Configuration

Quorum uses YAML config files for depth profiles. See quorum/configs/:

# quorum/configs/quick.yaml
critics:
  - correctness
  - completeness

model_tier1: claude-opus-4     # Strong model (judgment-heavy roles)
model_tier2: claude-sonnet-4   # Efficient model (critic execution)

max_fix_loops: 0
depth_profile: quick
temperature: 0.1
max_tokens: 4096

Model names follow LiteLLM conventions — any provider LiteLLM supports works here.


Cross-Artifact Validation

When your project has multiple files that should agree with each other — a spec and its implementation, an API contract and its consumers, a config and its schema — use --relationships to declare those relationships and let Quorum check them.

Relationship Manifest

Create quorum-relationships.yaml:

version: "1.0"
relationships:
  - source: src/api_handler.py
    target: docs/api-spec.md
    type: implements
    description: "Handler must implement all endpoints declared in spec"

  - source: docs/api-spec.md
    target: src/api_handler.py
    type: documents
    description: "Spec must document all public endpoints in handler"

  - source: quorum/critics/security.py
    target: quorum/configs/standard.yaml
    type: delegates
    description: "Security critic is enabled in standard depth profile"

  - source: data/output-schema.json
    target: src/pipeline.py
    type: schema_contract
    description: "Pipeline output must conform to declared schema"

Relationship types: implements, documents, delegates, schema_contract

Running with Relationships

quorum run \
  --target ./src/ \
  --relationships quorum-relationships.yaml \
  --depth standard

The Cross-Artifact Consistency critic evaluates each declared relationship, looking for mismatches, undocumented behavior, and broken contracts. Findings use a Locus model — each finding references both files with role annotations (source_role, target_role) and a source_hash to ensure the finding is pinned to the artifact version that was evaluated.


Extending Quorum

Adding a New Critic

  1. Create quorum/critics/my_critic.py inheriting from BaseCritic
  2. Implement name, system_prompt, and build_prompt()
  3. Register it in quorum/agents/supervisor.pyCRITIC_REGISTRY
  4. Add the name to your config's critics list

See quorum/critics/correctness.py for a complete example.

Adding a Cross-Artifact Critic

The Cross-Artifact Consistency critic does not inherit from BaseCritic — it uses a separate CrossArtifactCritic base class with a different interface, because it operates on pairs/groups of files rather than a single artifact. It also receives Phase 1 findings as additional context via build_prompt().

Supported relationship types for manifest declarations: implements, documents, delegates, schema_contract. New types can be added by extending the relationship type registry.


License

MIT — see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quorum_validator-0.7.2.tar.gz (195.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

quorum_validator-0.7.2-py3-none-any.whl (137.1 kB view details)

Uploaded Python 3

File details

Details for the file quorum_validator-0.7.2.tar.gz.

File metadata

  • Download URL: quorum_validator-0.7.2.tar.gz
  • Upload date:
  • Size: 195.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quorum_validator-0.7.2.tar.gz
Algorithm Hash digest
SHA256 b773b0eae3f0dd3a0fc72767d99d8ea2f4fc25f1245322d8227ef8bf55a0b19c
MD5 366e3d62688f62bdb7d5c5c3f0753510
BLAKE2b-256 43200c5f900efc38d1266cb7b0477876004b604d97a9d0697b5ea0c5ff5582b0

See more details on using hashes here.

File details

Details for the file quorum_validator-0.7.2-py3-none-any.whl.

File metadata

File hashes

Hashes for quorum_validator-0.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c2e06f7b98e50b55c6ebf88c24512e1b0ec995f27e75ef9fa8a953e7889a7871
MD5 9a9224b3f1bf4c4d629fb599e66dbee5
BLAKE2b-256 2bc4015245a15cd24463618df8e7705d855e518f1cd300efdba337bcb5ff25c1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page