Correctover 6-Dimensional Verification adapter for Patronus AI evaluation framework
Correctover Patronus Adapter
6-Dimensional Verification for Patronus AI Evaluation Framework
Bridges Correctover's deterministic verification engine into the Patronus AI evaluation framework. Run Correctover's 87 rules across 6 dimensions on any LLM output, with recomputable proof hashes.
Failover ≠ Correctover.™
What This Does
Patronus AI evaluates LLM outputs. Correctover verifies them. This adapter lets you run Correctover's verification as Patronus evaluators — so you can combine Correctover's deterministic checks with Patronus's experiment tracking, tracing, and benchmarking.
| Patronus Evaluators | Correctover Verification |
|---|---|
| Hallucination detection (Lynx) | Structure + Schema validation |
| Toxicity / PII detection | Integrity checks (forbidden patterns) |
| Context relevance (Lynx) | Identity verification (semantic relevance) |
| Custom judges (Glider) | Full 6-dim deterministic verification |
| This adapter | Adds: recomputable proof hashes, drift detection, failover signals |
Quick Start
Installation
pip install correctover-patronus
Full 6-Dimension Verification (Recommended)
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Configure verification rules
config = CorrectoverConfig(
    structure_rules={"format": "json", "required_keys": ["answer", "confidence"]},
    schema_rules={
        "require_json": True,
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },
    integrity_rules={
        "forbidden_patterns": [r"Traceback", r"Error:\s", r"Exception:"]
    },
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 2000},
    identity_rules={"min_similarity": 0.3},
)
evaluator = CorrectoverEvaluator(config=config)

# Evaluate an LLM output
result = evaluator.evaluate(
    task_input="What is 2+2?",
    task_output='{"answer": "4", "confidence": 0.95}',
)
print(f"Verdict: {result.text_output}")  # "pass"
print(f"Score: {result.score:.2f}")      # 1.00
print(f"Passed: {result.pass_}")         # True
print(f"Proof: {result.metadata['correctover_proof_hash'][:16]}...")
Individual Dimensions
Each dimension is also available as a standalone Patronus evaluator:
from correctover_patronus import (
    correctover_structure,
    correctover_schema,
    correctover_identity,
    correctover_integrity,
)

# Check structure only
result = correctover_structure(
    evaluated_model_input="What is 2+2?",
    evaluated_model_output="The answer is 4.",
)

# Check integrity (forbidden patterns)
result = correctover_integrity(
    evaluated_model_input="Process data",
    evaluated_model_output="Traceback (most recent call last): Error",
)
print(f"Passed: {result.pass_}")  # False
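To make the integrity dimension concrete, here is a minimal sketch of a forbidden-pattern scan using only the standard library. It is not the adapter's implementation: the helper name `find_forbidden` is hypothetical, and it assumes each pattern is applied with `re.search` over the whole output.

```python
import re


def find_forbidden(output: str, patterns: list[str]) -> list[str]:
    """Return every pattern that matches somewhere in the output."""
    return [p for p in patterns if re.search(p, output)]


# Same patterns as the Quick Start configuration.
patterns = [r"Traceback", r"Error:\s", r"Exception:"]

hits = find_forbidden("Traceback (most recent call last): Error", patterns)
passed = not hits  # any match fails the check in this sketch
print(hits, passed)
```

Only `Traceback` matches here: the trailing `Error` has no colon after it, so `Error:\s` does not fire. Anchoring patterns this tightly keeps ordinary prose that merely mentions an error from failing the check.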
Using in Patronus Experiments
from patronus import init
from patronus.experiments import run_experiment
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

init(api_key="your-patronus-api-key")

config = CorrectoverConfig(
    structure_rules={"format": "json"},
    integrity_rules={"forbidden_patterns": ["Traceback", "Error:"]},
)
correctover_eval = CorrectoverEvaluator(config=config)

def my_task(row, **kwargs):
    # Your LLM call here
    return call_my_llm(row["task_input"])

experiment = run_experiment(
    project_name="My Agent Evaluation",
    dataset=[
        {"task_input": "What is 2+2?", "gold_answer": "4"},
        {"task_input": "Capital of France?", "gold_answer": "Paris"},
    ],
    task=my_task,
    evaluators=[correctover_eval],
)
experiment.to_csv("./results.csv")
The correctover_full Shorthand
For quick integration without configuration:
from correctover_patronus import correctover_full

result = correctover_full(
    evaluated_model_input="Tell me about AI",
    evaluated_model_output="AI is a field of computer science focused on building intelligent systems.",
)
print(f"Verdict: {result.text_output}, Score: {result.score:.2f}")
The 6 Dimensions
| # | Dimension | What It Checks | Failure Signal |
|---|---|---|---|
| D1 | Structure | JSON parseability, required keys, minimum length | Output is malformed or missing required structure |
| D2 | Schema | Field types, constraints, required values | Type mismatch (e.g., "confidence": "high" instead of 0.95) |
| D3 | Latency | Response time within acceptable bounds | Provider degradation, network issues |
| D4 | Cost | Token consumption within budget | Runaway token usage, inefficient prompting |
| D5 | Identity | Output semantically relevant to input | Hallucination drift, provider confusion |
| D6 | Integrity | No forbidden patterns (errors, stack traces, PII) | Leaked internals, unsafe content |
Each dimension produces a score (0.0–1.0) and a pass/fail verdict. The 6 dimensions aggregate into an overall verdict:
- pass — all 6 dimensions pass (confidence = 1.0)
- partial — confidence ≥ threshold (default 0.6) but some dimensions fail
- fail — confidence < threshold, or critical dimensions fail catastrophically
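The three verdicts above can be sketched as a small aggregation function. This is an illustration under stated assumptions, not Correctover's actual logic: the name `aggregate_verdict` is hypothetical, confidence is assumed to be the fraction of dimensions that passed (which gives 1.0 when all six pass), and the "critical dimensions fail catastrophically" rule is left out because the text does not define it.

```python
def aggregate_verdict(
    passed_by_dimension: dict[str, bool],
    min_confidence: float = 0.6,
) -> tuple[str, float]:
    """Combine per-dimension pass/fail flags into (verdict, confidence).

    ASSUMPTION: confidence is the share of dimensions that passed.
    """
    total = len(passed_by_dimension)
    confidence = sum(passed_by_dimension.values()) / total
    if all(passed_by_dimension.values()):
        return "pass", confidence
    if confidence >= min_confidence:
        return "partial", confidence
    return "fail", confidence


dims = ["structure", "schema", "latency", "cost", "identity", "integrity"]
all_ok = {d: True for d in dims}
two_failed = {**all_ok, "latency": False, "cost": False}
three_failed = {**two_failed, "identity": False}

for case in (all_ok, two_failed, three_failed):
    print(aggregate_verdict(case))
```

With the default threshold of 0.6, four passing dimensions out of six still clear the bar as `partial`, while three out of six fall to `fail`.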
Recomputable Proof
Every evaluation produces a proof hash — a SHA-256 digest binding the input, rules, and verdict. Anyone can recompute the same hash independently:
# The proof hash is in the result metadata
proof_hash = result.metadata["correctover_proof_hash"]
# Anyone with the same inputs + rules gets the same hash
# This proves the verdict was computed correctly
This is the foundation of the Correctover standard: verification is reproducible, not just reported.
- Standards: STANDARDS.md
- Conformance Board: api.babyblueviper.com/conformance
- Independent Verification: All verdicts independently verified by babyblueviper1
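The idea of a recomputable proof can be illustrated with a short sketch. The payload layout, the canonical JSON settings, and the function name `sketch_proof_hash` are all assumptions made for illustration; the hashes it produces will not match the ones the adapter emits unless the real canonicalization happens to be identical.

```python
import hashlib
import json


def sketch_proof_hash(task_input: str, rules: dict, verdict: str) -> str:
    """SHA-256 over a canonical JSON encoding of input, rules, and verdict.

    ASSUMPTION: sorted keys and compact separators stand in for whatever
    canonical form the Correctover standard actually specifies.
    """
    payload = {"input": task_input, "rules": rules, "verdict": verdict}
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rules_a = {"latency_rules": {"max_ms": 5000}, "cost_rules": {"max_tokens": 2000}}
rules_b = {"cost_rules": {"max_tokens": 2000}, "latency_rules": {"max_ms": 5000}}

h1 = sketch_proof_hash("What is 2+2?", rules_a, "pass")
h2 = sketch_proof_hash("What is 2+2?", rules_b, "pass")  # same rules, other key order
h3 = sketch_proof_hash("What is 2+2?", rules_a, "fail")
print(h1 == h2, h1 == h3)
```

Sorting keys before hashing is what makes the digest independent of dictionary ordering, so two parties holding the same input, rules, and verdict arrive at the same value, while any change to the verdict produces a different one.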
Configuration Reference
from correctover_patronus import CorrectoverConfig

config = CorrectoverConfig(
    # D1: Structure rules
    structure_rules={
        "format": "json",             # "json" or "text"
        "required_keys": ["answer"],  # Required top-level keys (JSON mode)
        "min_length": 1,              # Minimum output length (text mode)
    },
    # D2: Schema rules
    schema_rules={
        "require_json": True,         # Fail if output isn't valid JSON
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },
    # D3: Latency rules
    latency_rules={
        "max_ms": 5000,               # Maximum acceptable latency
    },
    # D4: Cost rules
    cost_rules={
        "max_tokens": 2000,           # Maximum token budget per call
    },
    # D5: Identity rules
    identity_rules={
        "min_similarity": 0.3,        # Minimum input-output similarity
    },
    # D6: Integrity rules
    integrity_rules={
        "forbidden_patterns": [       # Regex patterns that must NOT appear
            r"Traceback \(most recent call last\)",
            r"Error:\s",
            r"Exception:\s",
            r"Internal Server Error",
        ],
    },
    # Aggregate settings
    min_confidence=0.6,               # Below this → verdict="fail"
    # Metadata (for proof package)
    provider_name="openai",
    model_name="gpt-4",
    agent_role="Researcher",
    task_description="Research task",
)
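As a rough guide to how the `schema_rules` shown above might be interpreted, the sketch below validates a JSON output against the `fields` mapping. It is an assumption-laden illustration, not the adapter's code: `check_schema` and `TYPE_MAP` are hypothetical names, and only the two type names that appear in this README (`string`, `number`) are handled.

```python
import json

# ASSUMPTION: how the rule type names map onto Python types.
TYPE_MAP = {"string": str, "number": (int, float)}


def check_schema(raw_output: str, schema_rules: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the check passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"] if schema_rules.get("require_json") else []
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    problems = []
    for name, spec in schema_rules.get("fields", {}).items():
        if name not in data:
            if spec.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        expected = TYPE_MAP.get(spec.get("type"))
        if expected is None:
            continue  # type name not covered by this sketch
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


rules = {
    "require_json": True,
    "fields": {
        "answer": {"type": "string", "required": True},
        "confidence": {"type": "number", "required": True},
    },
}
print(check_schema('{"answer": "4", "confidence": 0.95}', rules))
print(check_schema('{"answer": "4", "confidence": "high"}', rules))
```

The second call reproduces the type mismatch listed for D2 in the dimensions table, where `"confidence"` arrives as `"high"` instead of a number.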
Result Metadata
Every EvaluationResult includes rich metadata:
| Key | Description |
|---|---|
| correctover_verdict | pass, partial, or fail |
| correctover_confidence | Aggregate confidence score (0.0–1.0) |
| correctover_drift_score | Output drift magnitude (1.0 - confidence) |
| correctover_proof_hash | SHA-256 recomputable proof |
| correctover_input_hash | SHA-256 of canonical input |
| correctover_verification_latency_ms | Verification time (typically < 1ms) |
| correctover_should_failover | Whether failover is recommended |
| correctover_{dim}_passed | Per-dimension pass/fail |
| correctover_{dim}_score | Per-dimension score (0.0–1.0) |
| correctover_{dim}_detail | Per-dimension human-readable detail |
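Because the per-dimension keys follow the `correctover_{dim}_passed` pattern, failing dimensions can be pulled out of a metadata dictionary without hard-coding dimension names. The sketch below assumes the `_passed` values are booleans and uses a hand-written sample dictionary; `failed_dimensions` is a hypothetical helper, not part of the package.

```python
import re

_PASSED_KEY = re.compile(r"^correctover_(?P<dim>.+)_passed$")


def failed_dimensions(metadata: dict) -> list[str]:
    """List the dimensions whose correctover_{dim}_passed flag is False."""
    failed = []
    for key, value in metadata.items():
        match = _PASSED_KEY.match(key)
        # ASSUMPTION: the flag is a real bool, so compare by identity.
        if match and value is False:
            failed.append(match.group("dim"))
    return sorted(failed)


# Hand-written sample; real metadata comes from result.metadata.
sample_metadata = {
    "correctover_verdict": "partial",
    "correctover_confidence": 0.75,
    "correctover_drift_score": 0.25,
    "correctover_structure_passed": True,
    "correctover_schema_passed": False,
    "correctover_integrity_passed": False,
    "correctover_schema_detail": "confidence: expected number",
}
print(failed_dimensions(sample_metadata))
```

The sample also follows the documented relationship between the two aggregate fields: the drift score is 1.0 minus the confidence.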
Requirements
- Python 3.10+
- correctover-crewai >= 0.1.0
- patronus >= 0.1.10
Performance
| Metric | Value |
|---|---|
| P50 verification latency | 22 μs |
| SDK size | 586 KB |
| Rules evaluated | 87 |
| Dimensions | 6 |
| Supported providers | 7 |
Links
- Correctover Standards: STANDARDS.md
- Conformance Board: api.babyblueviper.com/conformance
- PyPI: correctover-patronus
- GitHub: Correctover/correctover-patronus
License
Apache 2.0
Correctover — Failover ≠ Correctover™
File details
Details for the file correctover_patronus-0.1.0.tar.gz.
File metadata
- Download URL: correctover_patronus-0.1.0.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 071314c7089432b4674d5fd18d8d77256e851920652c34b07a568a13fe1441e0 |
| MD5 | 39cb228dbcc0c1798d2450f78e136fc6 |
| BLAKE2b-256 | aa257432706461e1a513e0bf31fb8c9986dd9f6d40dfbbbd81705714dfa88ad1 |
File details
Details for the file correctover_patronus-0.1.0-py3-none-any.whl.
File metadata
- Download URL: correctover_patronus-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | db1d7fcade58ca16e89a1640bfb82c0c58754ddfa7a1a23f0016a1f60bbd4dda |
| MD5 | a7d25ab7e0734296b86c000b7b81c224 |
| BLAKE2b-256 | d899a2d9c7eb50a5bb1549bf65696fdaf2bfbc204ace588b8269d2802b031aaa |