Multi-critic quality validation for agents, research, and configurations
Project description
Quorum — Reference Implementation
A working Python implementation of the Quorum multi-critic quality validation system.
Quorum evaluates artifacts (agent configurations, research documents, code) against domain-specific rubrics using specialized critics that are required to provide grounded evidence for every finding.
Quick Start
1. Install
cd reference-implementation
pip install -e .
Or without installing:
pip install -r requirements.txt
python -m quorum --help
2. Configure API Keys
Quorum uses LiteLLM as its universal provider, supporting Anthropic, OpenAI, Mistral, Groq, and 100+ others.
# Anthropic (Claude)
export ANTHROPIC_API_KEY=your-key-here
# OpenAI
export OPENAI_API_KEY=your-key-here
3. Run Your First Validation
# Validate the included example research document
quorum run --target examples/sample-research.md --depth quick
# Validate the example agent config
quorum run --target examples/sample-agent-config.yaml --rubric agent-config
# Use a specific rubric
quorum run --target my-research.md --rubric research-synthesis --depth standard
# Batch: validate all markdown files in a directory
quorum run --target ./docs/ --pattern "*.md" --rubric research-synthesis
# Cross-artifact: validate with a relationships manifest
quorum run --target my-spec.md --relationships quorum-relationships.yaml
Usage
Usage: quorum [OPTIONS] COMMAND [ARGS]...
Quorum — Multi-critic quality validation.
Options:
--version Show the version and exit.
-v Enable debug logging
--help Show this message and exit.
Commands:
run Validate an artifact against a rubric
rubrics list List available built-in rubrics
rubrics show Show criteria for a specific rubric
config init Interactive first-run setup
quorum run
quorum run \
--target <file-or-dir> # required: artifact, directory, or glob to validate
--depth quick|standard|thorough # depth profile (default: quick)
--rubric <name-or-path> # rubric to use (auto-detected if omitted)
--pattern "*.md" # filter files when --target is a directory
--relationships <manifest.yaml> # cross-artifact manifest for Phase 2 consistency checks
--output-dir ./my-runs # where to write outputs (default: ./quorum-runs/)
--verbose # show full evidence for all findings
Exit codes:
0— PASS or PASS_WITH_NOTES (no blocking issues)1— Error (bad arguments, missing file, API failure)2— REVISE or REJECT (validation failed; artifact needs work)
Depth Profiles
| Depth | Critics | Fix Loops | Use For |
|---|---|---|---|
quick |
correctness, completeness | 0 | Fast feedback, drafts |
standard |
correctness, completeness, security, code_hygiene | 0 | Most work, PR reviews |
thorough |
all 4 shipped critics (more as they land) | 1 (Fixer proposals) | Critical decisions, production changes |
All depth profiles include the deterministic pre-screen (10 checks, no LLM cost) before any critics run.
Edit quorum/configs/*.yaml to customize model assignments and critic panels.
Rubrics
Rubrics define what "good" looks like for a domain. Built-in rubrics:
| Name | Domain | Criteria |
|---|---|---|
research-synthesis |
Research documents | Citations, logic, completeness, causation |
agent-config |
Agent configurations | Model assignments, permissions, error handling |
python-code |
Python source files | 25 criteria (PC-001–PC-025); auto-detected on .py files |
# List available rubrics
quorum rubrics list
# Show rubric criteria
quorum rubrics show research-synthesis
Custom Rubrics
Create a JSON file matching this schema:
{
"name": "My Custom Rubric",
"domain": "my-domain",
"version": "1.0",
"criteria": [
{
"id": "CR-001",
"criterion": "What to check",
"severity": "HIGH",
"evidence_required": "What proof must be shown",
"why": "Why this matters"
}
]
}
Then: quorum run --target my-file.txt --rubric ./my-rubric.json
Outputs
Each quorum run creates a timestamped directory:
quorum-runs/
└── 20260223-143022-sample-research/
├── run-manifest.json # Run parameters, flags, model config
├── artifact.txt # The artifact (copy)
├── rubric.json # Rubric used
├── prescreen.json # Deterministic pre-screen results (PS-001–PS-010)
├── critics/
│ ├── correctness-findings.json
│ ├── completeness-findings.json
│ ├── security-findings.json
│ ├── code_hygiene-findings.json
│ └── cross_consistency-findings.json # Phase 2 (if --relationships used)
├── verdict.json # Machine-readable verdict
└── report.md # Human-readable report
For batch runs, each file gets its own timestamped sub-directory. A top-level batch-verdict.json summarizes the full run.
Architecture
quorum run --target file --relationships manifest.yaml
↓
pipeline.py load config, rubric, artifact
↓
prescreen.py 10 deterministic checks (PS-001–PS-010)
→ prescreen.json (no LLM, runs instantly)
↓
supervisor.py Phase 1: classify domain, dispatch critics concurrently
(ThreadPoolExecutor, max 4 critics in parallel;
batch files run concurrently max 3)
↓
correctness.py }
completeness.py } each critic → LLM → structured findings (parallel)
security.py } (framework-grounded: OWASP ASVS, CWE, NIST SA-11,
code_hygiene.py } ISO 25010:2023, CISQ)
↓
fixer.py Phase 1.5: proposes text replacements for CRITICAL/HIGH
(only if max_fix_loops > 0; thorough depth default: 1)
↓
cross_artifact.py Phase 2: cross-artifact consistency critic
(only if --relationships provided)
receives Phase 1 findings as context — NOT verdicts
↓
aggregator.py deduplicate, resolve conflicts, assign verdict
↓
output.py terminal report + write run directory
The core principle: Every finding must have evidence (a quote, a tool result, a rubric citation). The Aggregator rejects ungrounded claims. This prevents LLM hand-waving.
Phase 1 vs Phase 2: Phase 1 critics evaluate each file independently. Phase 2 (Cross-Artifact Consistency) receives Phase 1 findings — not verdicts — as context and evaluates declared relationships between files. This keeps phases independent: Phase 2 sees what was found, not a judgment.
Configuration
Quorum uses YAML config files for depth profiles. See quorum/configs/:
# quorum/configs/quick.yaml
critics:
- correctness
- completeness
model_tier1: claude-opus-4 # Strong model (judgment-heavy roles)
model_tier2: claude-sonnet-4 # Efficient model (critic execution)
max_fix_loops: 0
depth_profile: quick
temperature: 0.1
max_tokens: 4096
Model names follow LiteLLM conventions — any provider LiteLLM supports works here.
Cross-Artifact Validation
When your project has multiple files that should agree with each other — a spec and its implementation, an API contract and its consumers, a config and its schema — use --relationships to declare those relationships and let Quorum check them.
Relationship Manifest
Create quorum-relationships.yaml:
version: "1.0"
relationships:
- source: src/api_handler.py
target: docs/api-spec.md
type: implements
description: "Handler must implement all endpoints declared in spec"
- source: docs/api-spec.md
target: src/api_handler.py
type: documents
description: "Spec must document all public endpoints in handler"
- source: quorum/critics/security.py
target: quorum/configs/standard.yaml
type: delegates
description: "Security critic is enabled in standard depth profile"
- source: data/output-schema.json
target: src/pipeline.py
type: schema_contract
description: "Pipeline output must conform to declared schema"
Relationship types: implements, documents, delegates, schema_contract
Running with Relationships
quorum run \
--target ./src/ \
--relationships quorum-relationships.yaml \
--depth standard
The Cross-Artifact Consistency critic evaluates each declared relationship, looking for mismatches, undocumented behavior, and broken contracts. Findings use a Locus model — each finding references both files with role annotations (source_role, target_role) and a source_hash to ensure the finding is pinned to the artifact version that was evaluated.
Extending Quorum
Adding a New Critic
- Create
quorum/critics/my_critic.pyinheriting fromBaseCritic - Implement
name,system_prompt, andbuild_prompt() - Register it in
quorum/agents/supervisor.py→CRITIC_REGISTRY - Add the name to your config's
criticslist
See quorum/critics/correctness.py for a complete example.
Adding a Cross-Artifact Critic
The Cross-Artifact Consistency critic does not inherit from BaseCritic — it uses a separate CrossArtifactCritic base class with a different interface, because it operates on pairs/groups of files rather than a single artifact. It also receives Phase 1 findings as additional context via build_prompt().
Supported relationship types for manifest declarations: implements, documents, delegates, schema_contract. New types can be added by extending the relationship type registry.
License
MIT — see LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file quorum_validator-0.7.2.tar.gz.
File metadata
- Download URL: quorum_validator-0.7.2.tar.gz
- Upload date:
- Size: 195.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b773b0eae3f0dd3a0fc72767d99d8ea2f4fc25f1245322d8227ef8bf55a0b19c
|
|
| MD5 |
366e3d62688f62bdb7d5c5c3f0753510
|
|
| BLAKE2b-256 |
43200c5f900efc38d1266cb7b0477876004b604d97a9d0697b5ea0c5ff5582b0
|
File details
Details for the file quorum_validator-0.7.2-py3-none-any.whl.
File metadata
- Download URL: quorum_validator-0.7.2-py3-none-any.whl
- Upload date:
- Size: 137.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c2e06f7b98e50b55c6ebf88c24512e1b0ec995f27e75ef9fa8a953e7889a7871
|
|
| MD5 |
9a9224b3f1bf4c4d629fb599e66dbee5
|
|
| BLAKE2b-256 |
2bc4015245a15cd24463618df8e7705d855e518f1cd300efdba337bcb5ff25c1
|