Multi-critic quality validation for agents, research, and configurations

Quorum — Reference Implementation

A working Python implementation of the Quorum multi-critic quality validation system.

Quorum evaluates artifacts (agent configurations, research documents, code) against domain-specific rubrics using specialized critics that are required to provide grounded evidence for every finding.

Quick Start

1. Install

cd reference-implementation
pip install -e .

Or without installing:

pip install -r requirements.txt
python -m quorum --help

2. Configure API Keys

Quorum routes all model calls through LiteLLM, which supports Anthropic, OpenAI, Mistral, Groq, and 100+ other providers.

# Anthropic (Claude)
export ANTHROPIC_API_KEY=your-key-here

# OpenAI
export OPENAI_API_KEY=your-key-here

3. Run Your First Validation

# Validate the included example research document
quorum run --target examples/sample-research.md --depth quick

# Validate the example agent config
quorum run --target examples/sample-agent-config.yaml --rubric agent-config

# Use a specific rubric
quorum run --target my-research.md --rubric research-synthesis --depth standard

# Batch: validate all markdown files in a directory
quorum run --target ./docs/ --pattern "*.md" --rubric research-synthesis

# Cross-artifact: validate with a relationships manifest
quorum run --target my-spec.md --relationships quorum-relationships.yaml

Usage

Usage: quorum [OPTIONS] COMMAND [ARGS]...

  Quorum — Multi-critic quality validation.

Options:
  --version  Show the version and exit.
  -v         Enable debug logging
  --help     Show this message and exit.

Commands:
  run            Validate an artifact against a rubric
  rubrics list   List available built-in rubrics
  rubrics show   Show criteria for a specific rubric
  config init    Interactive first-run setup

`quorum run`

quorum run [OPTIONS]

  --target <file-or-dir>              # required: artifact, directory, or glob to validate
  --depth quick|standard|thorough     # depth profile (default: quick)
  --rubric <name-or-path>             # rubric to use (auto-detected if omitted)
  --pattern "*.md"                    # filter files when --target is a directory
  --relationships <manifest.yaml>     # cross-artifact manifest for Phase 2 consistency checks
  --output-dir ./my-runs              # where to write outputs (default: ./quorum-runs/)
  --verbose                           # show full evidence for all findings

Exit codes:

0 — PASS or PASS_WITH_NOTES (no blocking issues)
1 — Error (bad arguments, missing file, API failure)
2 — REVISE or REJECT (validation failed; artifact needs work)
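
For scripting or CI, the exit codes above can be mapped to an action. The following is a minimal sketch that assumes `quorum` is on the PATH; the helper names `interpret_exit_code` and `run_quorum` are illustrative and not part of Quorum.

```python
import subprocess

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "pass",      # PASS or PASS_WITH_NOTES
    1: "error",     # bad arguments, missing file, API failure
    2: "rejected",  # REVISE or REJECT
}


def interpret_exit_code(code: int) -> str:
    """Map a quorum exit code to a coarse outcome label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_quorum(target: str, depth: str = "quick") -> str:
    """Run `quorum run` on one target and return the outcome label."""
    result = subprocess.run(
        ["quorum", "run", "--target", target, "--depth", depth],
        check=False,  # non-zero exits are expected outcomes, not exceptions
    )
    return interpret_exit_code(result.returncode)
```

Keeping 1 and 2 distinct lets a pipeline retry on tool errors while still blocking on genuine validation failures.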

Depth Profiles

Depth	Critics	Fix Loops	Use For
`quick`	correctness, completeness	0	Fast feedback, drafts
`standard`	correctness, completeness, security, code_hygiene	0	Most work, PR reviews
`thorough`	all 4 shipped critics (more as they land)	1 (Fixer proposals)	Critical decisions, production changes

All depth profiles include the deterministic pre-screen (10 checks, no LLM cost) before any critics run.

Edit quorum/configs/*.yaml to customize model assignments and critic panels.

Rubrics

Rubrics define what "good" looks like for a domain. Built-in rubrics:

Name	Domain	Criteria
`research-synthesis`	Research documents	Citations, logic, completeness, causation
`agent-config`	Agent configurations	Model assignments, permissions, error handling
`python-code`	Python source files	25 criteria (PC-001–PC-025); auto-detected on `.py` files

# List available rubrics
quorum rubrics list

# Show rubric criteria
quorum rubrics show research-synthesis

Custom Rubrics

Create a JSON file matching this schema:

{
  "name": "My Custom Rubric",
  "domain": "my-domain",
  "version": "1.0",
  "criteria": [
    {
      "id": "CR-001",
      "criterion": "What to check",
      "severity": "HIGH",
      "evidence_required": "What proof must be shown",
      "why": "Why this matters"
    }
  ]
}

Then: quorum run --target my-file.txt --rubric ./my-rubric.json
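
Before passing a custom rubric to --rubric, it can help to check its shape locally. The sketch below assumes that every field shown in the schema above is required, which this document does not state explicitly. `check_rubric` is a hypothetical helper, not part of Quorum.

```python
import json

# Field names copied from the schema above; treating all of them as
# required is an assumption.
RUBRIC_FIELDS = ("name", "domain", "version", "criteria")
CRITERION_FIELDS = ("id", "criterion", "severity", "evidence_required", "why")


def check_rubric(rubric: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing field: {f}" for f in RUBRIC_FIELDS if f not in rubric]
    seen_ids = set()
    for i, criterion in enumerate(rubric.get("criteria", [])):
        for f in CRITERION_FIELDS:
            if f not in criterion:
                problems.append(f"criteria[{i}] missing field: {f}")
        cid = criterion.get("id")
        if cid is not None and cid in seen_ids:
            problems.append(f"duplicate criterion id: {cid}")
        seen_ids.add(cid)
    return problems


rubric = json.loads("""
{
  "name": "My Custom Rubric",
  "domain": "my-domain",
  "version": "1.0",
  "criteria": [
    {"id": "CR-001", "criterion": "What to check", "severity": "HIGH",
     "evidence_required": "What proof must be shown", "why": "Why this matters"}
  ]
}
""")
print(check_rubric(rubric))  # prints []
```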

Outputs

Each quorum run creates a timestamped directory:

quorum-runs/
└── 20260223-143022-sample-research/
    ├── run-manifest.json              # Run parameters, flags, model config
    ├── artifact.txt                   # The artifact (copy)
    ├── rubric.json                    # Rubric used
    ├── prescreen.json                 # Deterministic pre-screen results (PS-001–PS-010)
    ├── critics/
    │   ├── correctness-findings.json
    │   ├── completeness-findings.json
    │   ├── security-findings.json
    │   ├── code_hygiene-findings.json
    │   └── cross_consistency-findings.json  # Phase 2 (if --relationships used)
    ├── verdict.json                   # Machine-readable verdict
    └── report.md                      # Human-readable report

For batch runs, each file gets its own timestamped sub-directory. A top-level batch-verdict.json summarizes the full run.
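
Because run directory names start with a timestamp, sorting them lexicographically also sorts them chronologically. A minimal sketch for locating the newest single-file run and loading its verdict follows. `latest_verdict` is a hypothetical helper, and the contents of verdict.json are treated as an opaque dictionary because its schema is not described here.

```python
import json
from pathlib import Path


def latest_verdict(runs_dir: str = "quorum-runs") -> dict:
    """Load verdict.json from the most recent run directory under runs_dir."""
    run_dirs = sorted(
        p for p in Path(runs_dir).iterdir()
        if p.is_dir() and (p / "verdict.json").is_file()
    )
    if not run_dirs:
        raise FileNotFoundError(f"no completed runs found in {runs_dir}")
    # Timestamp-prefixed names sort chronologically, so the last is newest.
    return json.loads((run_dirs[-1] / "verdict.json").read_text(encoding="utf-8"))
```

Pass a different `runs_dir` if the run was started with --output-dir.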

Architecture

quorum run --target file --relationships manifest.yaml
  ↓
pipeline.py          load config, rubric, artifact
  ↓
prescreen.py         10 deterministic checks (PS-001–PS-010)
                     → prescreen.json (no LLM, runs instantly)
  ↓
supervisor.py        Phase 1: classify domain, dispatch critics concurrently
                     (ThreadPoolExecutor, max 4 critics in parallel;
                      batch files run concurrently max 3)
  ↓
correctness.py    }
completeness.py   }  each critic → LLM → structured findings (parallel)
security.py       }  (framework-grounded: OWASP ASVS, CWE, NIST SA-11,
code_hygiene.py   }   ISO 25010:2023, CISQ)
  ↓
fixer.py             Phase 1.5: proposes text replacements for CRITICAL/HIGH
                     (only if max_fix_loops > 0; thorough depth default: 1)
  ↓
cross_artifact.py    Phase 2: cross-artifact consistency critic
                     (only if --relationships provided)
                     receives Phase 1 findings as context — NOT verdicts
  ↓
aggregator.py        deduplicate, resolve conflicts, assign verdict
  ↓
output.py            terminal report + write run directory

The core principle: Every finding must have evidence (a quote, a tool result, a rubric citation). The Aggregator rejects ungrounded claims, so a critic cannot report a problem it is unable to point to in the artifact.
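
The grounding rule can be pictured as a filter that runs before the verdict logic. This is an illustrative sketch only: the `Finding` fields and the `drop_ungrounded` function are assumptions, not the Aggregator's actual code.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    criterion_id: str   # e.g. a rubric criterion such as "CR-001"
    severity: str
    description: str
    evidence: str = ""  # a quote, tool result, or rubric citation


def drop_ungrounded(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (grounded, rejected) based on non-empty evidence."""
    grounded = [f for f in findings if f.evidence.strip()]
    rejected = [f for f in findings if not f.evidence.strip()]
    return grounded, rejected


findings = [
    Finding("CR-001", "HIGH", "Claim lacks a source", evidence='Line 12: "studies show"'),
    Finding("CR-002", "HIGH", "Feels incomplete"),  # no evidence, so it is rejected
]
grounded, rejected = drop_ungrounded(findings)
print(len(grounded), len(rejected))  # prints: 1 1
```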

Phase 1 vs Phase 2: Phase 1 critics evaluate each file independently. Phase 2 (Cross-Artifact Consistency) receives Phase 1 findings — not verdicts — as context and evaluates declared relationships between files. This keeps phases independent: Phase 2 sees what was found, not a judgment.
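
The Phase 1 dispatch in the diagram, with up to four critics running in parallel, corresponds to a standard `ThreadPoolExecutor` pattern. The sketch below uses stand-in critic functions instead of LLM calls; `dispatch_critics` and the critic callables are hypothetical and only illustrate the concurrency shape.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Critic = Callable[[str], list[str]]  # takes artifact text, returns findings


def dispatch_critics(artifact: str, critics: dict[str, Critic],
                     max_workers: int = 4) -> dict[str, list[str]]:
    """Run every critic on the artifact concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, artifact) for name, fn in critics.items()}
        # result() blocks until that critic finishes and re-raises its exception.
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in critics; real critics would call an LLM here.
critics = {
    "correctness": lambda text: ["unsupported claim"] if "always" in text else [],
    "completeness": lambda text: [] if text.strip() else ["artifact is empty"],
}
results = dispatch_critics("This approach always works.", critics)
print(results)
```

Threads fit this workload because each critic spends most of its time waiting on a network response rather than using the CPU.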

Configuration

Quorum uses YAML config files for depth profiles. See quorum/configs/:

# quorum/configs/quick.yaml
critics:
  - correctness
  - completeness

model_tier1: claude-opus-4     # Strong model (judgment-heavy roles)
model_tier2: claude-sonnet-4   # Efficient model (critic execution)

max_fix_loops: 0
depth_profile: quick
temperature: 0.1
max_tokens: 4096

Model names follow LiteLLM conventions — any provider LiteLLM supports works here.

Cross-Artifact Validation

When your project has multiple files that should agree with each other — a spec and its implementation, an API contract and its consumers, a config and its schema — use --relationships to declare those relationships and let Quorum check them.

Relationship Manifest

Create quorum-relationships.yaml:

version: "1.0"
relationships:
  - source: src/api_handler.py
    target: docs/api-spec.md
    type: implements
    description: "Handler must implement all endpoints declared in spec"

  - source: docs/api-spec.md
    target: src/api_handler.py
    type: documents
    description: "Spec must document all public endpoints in handler"

  - source: quorum/critics/security.py
    target: quorum/configs/standard.yaml
    type: delegates
    description: "Security critic is enabled in standard depth profile"

  - source: data/output-schema.json
    target: src/pipeline.py
    type: schema_contract
    description: "Pipeline output must conform to declared schema"

Relationship types: implements, documents, delegates, schema_contract
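
A manifest can be sanity-checked before a run. The sketch assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that each entry needs the four keys shown above, which is an assumption. `check_manifest` is a hypothetical helper.

```python
# Types copied from the list above.
RELATIONSHIP_TYPES = {"implements", "documents", "delegates", "schema_contract"}
# Treating all four keys as required is an assumption.
REQUIRED_KEYS = ("source", "target", "type", "description")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems in a parsed relationships manifest."""
    problems = []
    for i, rel in enumerate(manifest.get("relationships", [])):
        missing = [k for k in REQUIRED_KEYS if k not in rel]
        if missing:
            problems.append(f"relationships[{i}] missing: {', '.join(missing)}")
        rel_type = rel.get("type")
        if rel_type is not None and rel_type not in RELATIONSHIP_TYPES:
            problems.append(f"relationships[{i}] unknown type: {rel_type}")
    return problems


manifest = {
    "version": "1.0",
    "relationships": [
        {"source": "src/api_handler.py", "target": "docs/api-spec.md",
         "type": "implements", "description": "Handler must implement all endpoints"},
        {"source": "a.md", "target": "b.md", "type": "mirrors"},  # deliberately invalid
    ],
}
for problem in check_manifest(manifest):
    print(problem)
```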

Running with Relationships

quorum run \
  --target ./src/ \
  --relationships quorum-relationships.yaml \
  --depth standard

The Cross-Artifact Consistency critic evaluates each declared relationship, looking for mismatches, undocumented behavior, and broken contracts. Findings use a Locus model — each finding references both files with role annotations (source_role, target_role) and a source_hash to ensure the finding is pinned to the artifact version that was evaluated.
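
One way to picture a Locus-style finding is a record that names both files, their roles, and a content hash. Only `source_role`, `target_role`, and `source_hash` come from the description above; the other field names, the `is_stale` method, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes (hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class CrossArtifactFinding:
    source_path: str
    target_path: str
    source_role: str   # e.g. "implementation"
    target_role: str   # e.g. "specification"
    source_hash: str   # pins the finding to the evaluated version
    description: str

    def is_stale(self, current_source: bytes) -> bool:
        """True if the source file changed since the finding was produced."""
        return content_hash(current_source) != self.source_hash


original = b"def list_users(): ...\n"
finding = CrossArtifactFinding(
    source_path="src/api_handler.py",
    target_path="docs/api-spec.md",
    source_role="implementation",
    target_role="specification",
    source_hash=content_hash(original),
    description="Spec declares DELETE /users but handler has no such route",
)
print(finding.is_stale(original))              # prints: False
print(finding.is_stale(original + b"# edit"))  # prints: True
```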

Extending Quorum

Adding a New Critic

1. Create quorum/critics/my_critic.py inheriting from BaseCritic
2. Implement name, system_prompt, and build_prompt()
3. Register it in quorum/agents/supervisor.py → CRITIC_REGISTRY
4. Add the name to your config's critics list

See quorum/critics/correctness.py for a complete example.
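
The steps above might look roughly like this. `BaseCritic` is defined here as a stand-in so the example runs on its own; the real class may have a different constructor and method signatures, so check quorum/critics/correctness.py before copying. Only the member names `name`, `system_prompt`, and `build_prompt()` come from the steps above. The `build_prompt` parameters, the `ReadabilityCritic` class, and the registry shape are assumptions.

```python
from abc import ABC, abstractmethod


class BaseCritic(ABC):
    """Stand-in for Quorum's BaseCritic; the real interface may differ."""

    name: str
    system_prompt: str

    @abstractmethod
    def build_prompt(self, artifact_text: str, rubric: dict) -> str:
        """Return the user prompt sent to the model for this artifact."""


class ReadabilityCritic(BaseCritic):
    name = "readability"
    system_prompt = (
        "You review documents for readability. "
        "Quote the exact passage that supports every finding."
    )

    def build_prompt(self, artifact_text: str, rubric: dict) -> str:
        # List each rubric criterion so the model knows what to check.
        criteria = "\n".join(
            f"- {c['id']}: {c['criterion']}" for c in rubric.get("criteria", [])
        )
        return f"Criteria:\n{criteria}\n\nArtifact:\n{artifact_text}"


# Hypothetical registry shape: critic name -> class.
CRITIC_REGISTRY = {ReadabilityCritic.name: ReadabilityCritic}

critic = CRITIC_REGISTRY["readability"]()
prompt = critic.build_prompt(
    "Some text.", {"criteria": [{"id": "CR-001", "criterion": "What to check"}]}
)
print(prompt)
```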

Adding a Cross-Artifact Critic

The Cross-Artifact Consistency critic does not inherit from BaseCritic — it uses a separate CrossArtifactCritic base class with a different interface, because it operates on pairs/groups of files rather than a single artifact. It also receives Phase 1 findings as additional context via build_prompt().

Supported relationship types for manifest declarations: implements, documents, delegates, schema_contract. New types can be added by extending the relationship type registry.

License

MIT — see LICENSE for details.


This version: 0.5.2, released Mar 9, 2026


quorum-validator 0.5.2
