Correctover 6-Dimensional Verification adapter for Patronus AI evaluation framework

Correctover Patronus Adapter

6-Dimensional Verification for Patronus AI Evaluation Framework

Bridges Correctover's deterministic verification engine into the Patronus AI evaluation framework. Run Correctover's 87 rules across 6 dimensions on any LLM output, with recomputable proof hashes.

Failover ≠ Correctover.


What This Does

Patronus AI evaluates LLM outputs. Correctover verifies them. This adapter lets you run Correctover's verification as Patronus evaluators — so you can combine Correctover's deterministic checks with Patronus's experiment tracking, tracing, and benchmarking.

| Patronus Evaluators | Correctover Verification |
| --- | --- |
| Hallucination detection (Lynx) | Structure + Schema validation |
| Toxicity / PII detection | Integrity checks (forbidden patterns) |
| Context relevance (Lynx) | Identity verification (semantic relevance) |
| Custom judges (Glider) | Full 6-dim deterministic verification |
| This adapter | Adds: recomputable proof hashes, drift detection, failover signals |

Quick Start

Installation

pip install correctover-patronus

Full 6-Dimension Verification (Recommended)

from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Configure verification rules
config = CorrectoverConfig(
    structure_rules={"format": "json", "required_keys": ["answer", "confidence"]},
    schema_rules={
        "require_json": True,
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },
    integrity_rules={
        "forbidden_patterns": [r"Traceback", r"Error:\s", r"Exception:"]
    },
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 2000},
    identity_rules={"min_similarity": 0.3},
)

evaluator = CorrectoverEvaluator(config=config)

# Evaluate an LLM output
result = evaluator.evaluate(
    task_input="What is 2+2?",
    task_output='{"answer": "4", "confidence": 0.95}',
)

print(f"Verdict: {result.text_output}")     # "pass"
print(f"Score: {result.score:.2f}")          # 1.00
print(f"Passed: {result.pass_}")             # True
print(f"Proof: {result.metadata['correctover_proof_hash'][:16]}...")

Individual Dimensions

Each dimension is also available as a standalone Patronus evaluator:

from correctover_patronus import (
    correctover_structure,
    correctover_schema,
    correctover_identity,
    correctover_integrity,
)

# Check structure only
result = correctover_structure(
    evaluated_model_input="What is 2+2?",
    evaluated_model_output="The answer is 4.",
)

# Check integrity (forbidden patterns)
result = correctover_integrity(
    evaluated_model_input="Process data",
    evaluated_model_output="Traceback (most recent call last): Error",
)
print(f"Passed: {result.pass_}")  # False
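
To make the integrity result above concrete, here is a minimal sketch of a forbidden-pattern check written with Python's standard `re` module. It is not the adapter's implementation; the helper name `find_forbidden` is hypothetical, and the sketch assumes patterns are applied with `re.search` against the raw output string.

```python
import re


def find_forbidden(output: str, forbidden_patterns: list[str]) -> list[str]:
    """Return the patterns that match anywhere in the output (hypothetical helper)."""
    return [p for p in forbidden_patterns if re.search(p, output)]


patterns = [r"Traceback", r"Error:\s", r"Exception:"]

# Same output string as the integrity example above.
hits = find_forbidden("Traceback (most recent call last): Error", patterns)
passed = not hits  # the check passes only when nothing matched

print(hits)    # ['Traceback']
print(passed)  # False
```

Only `Traceback` matches here: the trailing `Error` has no colon after it, so `Error:\s` does not fire.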

Using in Patronus Experiments

from patronus import init
from patronus.experiments import run_experiment
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

init(api_key="your-patronus-api-key")

config = CorrectoverConfig(
    structure_rules={"format": "json"},
    integrity_rules={"forbidden_patterns": ["Traceback", "Error:"]},
)
correctover_eval = CorrectoverEvaluator(config=config)

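def my_task(row, **kwargs):
    # call_my_llm is a placeholder: replace it with your own LLM call.
    # It is not provided by patronus or correctover_patronus.
    return call_my_llm(row["task_input"])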

experiment = run_experiment(
    project_name="My Agent Evaluation",
    dataset=[
        {"task_input": "What is 2+2?", "gold_answer": "4"},
        {"task_input": "Capital of France?", "gold_answer": "Paris"},
    ],
    task=my_task,
    evaluators=[correctover_eval],
)
experiment.to_csv("./results.csv")

The correctover_full Shorthand

For quick integration without configuration:

from correctover_patronus import correctover_full

result = correctover_full(
    evaluated_model_input="Tell me about AI",
    evaluated_model_output="AI is a field of computer science focused on building intelligent systems.",
)
print(f"Verdict: {result.text_output}, Score: {result.score:.2f}")

The 6 Dimensions

| # | Dimension | What It Checks | Failure Signal |
| --- | --- | --- | --- |
| D1 | Structure | JSON parseability, required keys, minimum length | Output is malformed or missing required structure |
| D2 | Schema | Field types, constraints, required values | Type mismatch (e.g., "confidence": "high" instead of 0.95) |
| D3 | Latency | Response time within acceptable bounds | Provider degradation, network issues |
| D4 | Cost | Token consumption within budget | Runaway token usage, inefficient prompting |
| D5 | Identity | Output semantically relevant to input | Hallucination drift, provider confusion |
| D6 | Integrity | No forbidden patterns (errors, stack traces, PII) | Leaked internals, unsafe content |
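
As an illustration of the D2 type-mismatch case, the sketch below validates a JSON output against a `fields` dictionary shaped like the `schema_rules` shown in Quick Start. It is an assumption-laden sketch, not the engine's code: the `check_fields` helper and the `TYPE_MAP` table are hypothetical, and only the `type` and `required` keys are handled.

```python
import json
from numbers import Number

# Hypothetical mapping from rule type names to Python types.
TYPE_MAP = {"string": str, "number": Number, "boolean": bool}


def check_fields(output: str, fields: dict) -> list[str]:
    """Return human-readable schema problems; an empty list means no problems."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for name, rule in fields.items():
        if name not in data:
            if rule.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        value = data[name]
        expected = TYPE_MAP[rule["type"]]
        # bool is a subclass of int in Python, so reject it for "number".
        bool_as_number = rule["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or bool_as_number:
            problems.append(
                f"{name}: expected {rule['type']}, got {type(value).__name__}"
            )
    return problems


fields = {
    "answer": {"type": "string", "required": True},
    "confidence": {"type": "number", "required": True},
}

ok = check_fields('{"answer": "4", "confidence": 0.95}', fields)
bad = check_fields('{"answer": "4", "confidence": "high"}', fields)
print(ok)   # []
print(bad)  # ['confidence: expected number, got str']
```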

Each dimension produces a score (0.0–1.0) and a pass/fail verdict. The 6 dimensions aggregate into an overall verdict:

  • pass — all 6 dimensions pass (confidence = 1.0)
  • partial — confidence ≥ threshold (default 0.6) but some dimensions fail
  • fail — confidence < threshold, or critical dimensions fail catastrophically
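
The three verdicts above can be sketched as a small function. This is a simplified reading of the rules, not the engine's logic: the function name `aggregate_verdict` is hypothetical, the confidence value is taken as an input because the aggregation formula is not specified here, and the "critical dimensions fail catastrophically" case is deliberately left out because its criteria are not documented in this README.

```python
def aggregate_verdict(
    passed: dict[str, bool],
    confidence: float,
    min_confidence: float = 0.6,
) -> str:
    """Map per-dimension results and a confidence score to a verdict (sketch)."""
    if all(passed.values()):
        return "pass"      # every dimension passed
    if confidence >= min_confidence:
        return "partial"   # some dimensions failed, confidence still acceptable
    return "fail"          # confidence fell below the threshold


# Dimension names used as keys here are illustrative.
all_ok = {d: True for d in ("structure", "schema", "latency", "cost", "identity", "integrity")}
one_failed = {**all_ok, "integrity": False}

print(aggregate_verdict(all_ok, 1.0))      # pass
print(aggregate_verdict(one_failed, 0.8))  # partial
print(aggregate_verdict(one_failed, 0.4))  # fail
```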

Recomputable Proof

Every evaluation produces a proof hash — a SHA-256 digest binding the input, rules, and verdict. Anyone can recompute the same hash independently:

# The proof hash is in the result metadata
proof_hash = result.metadata["correctover_proof_hash"]

# Anyone with the same inputs + rules gets the same hash
# This proves the verdict was computed correctly

This is the foundation of the Correctover standard: verification is reproducible, not just reported.
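
To show why such a hash is recomputable, here is a minimal sketch that binds an input, a rule set, and a verdict into one SHA-256 digest. The canonicalization (sorted-key compact JSON) and the payload field names are assumptions made for illustration; the actual Correctover proof format is not specified in this README, so this sketch will not reproduce real `correctover_proof_hash` values.

```python
import hashlib
import json


def proof_hash(task_input: str, rules: dict, verdict: str) -> str:
    """Hash a canonical JSON encoding of input, rules, and verdict (sketch)."""
    payload = {"input": task_input, "rules": rules, "verdict": verdict}
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary insertion order, so equal content always hashes equally.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rules_a = {"structure": {"format": "json"}, "latency": {"max_ms": 5000}}
rules_b = {"latency": {"max_ms": 5000}, "structure": {"format": "json"}}  # same rules, different order

h1 = proof_hash("What is 2+2?", rules_a, "pass")
h2 = proof_hash("What is 2+2?", rules_b, "pass")
h3 = proof_hash("What is 2+2?", rules_a, "fail")

print(h1 == h2)  # True: key order does not matter
print(h1 == h3)  # False: a different verdict changes the digest
```

The design point is that every party must agree on the canonical encoding; any difference in serialization would produce a different digest even for identical content.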

Configuration Reference

from correctover_patronus import CorrectoverConfig

config = CorrectoverConfig(
    # D1: Structure rules
    structure_rules={
        "format": "json",           # "json" or "text"
        "required_keys": ["answer"], # Required top-level keys (JSON mode)
        "min_length": 1,            # Minimum output length (text mode)
    },

    # D2: Schema rules
    schema_rules={
        "require_json": True,        # Fail if output isn't valid JSON
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },

    # D3: Latency rules
    latency_rules={
        "max_ms": 5000,              # Maximum acceptable latency
    },

    # D4: Cost rules
    cost_rules={
        "max_tokens": 2000,          # Maximum token budget per call
    },

    # D5: Identity rules
    identity_rules={
        "min_similarity": 0.3,       # Minimum input-output similarity
    },

    # D6: Integrity rules
    integrity_rules={
        "forbidden_patterns": [      # Regex patterns that must NOT appear
            r"Traceback \(most recent call last\)",
            r"Error:\s",
            r"Exception:\s",
            r"Internal Server Error",
        ],
    },

    # Aggregate settings
    min_confidence=0.6,              # Below this → verdict="fail"

    # Metadata (for proof package)
    provider_name="openai",
    model_name="gpt-4",
    agent_role="Researcher",
    task_description="Research task",
)

Result Metadata

Every EvaluationResult includes rich metadata:

| Key | Description |
| --- | --- |
| correctover_verdict | pass, partial, or fail |
| correctover_confidence | Aggregate confidence score (0.0–1.0) |
| correctover_drift_score | Output drift magnitude (1.0 - confidence) |
| correctover_proof_hash | SHA-256 recomputable proof |
| correctover_input_hash | SHA-256 of canonical input |
| correctover_verification_latency_ms | Verification time (typically < 1ms) |
| correctover_should_failover | Whether failover is recommended |
| correctover_{dim}_passed | Per-dimension pass/fail |
| correctover_{dim}_score | Per-dimension score (0.0–1.0) |
| correctover_{dim}_detail | Per-dimension human-readable detail |
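
One way to consume this metadata is to collect the dimensions that failed before acting on the failover flag. The sketch below works on a plain dictionary using the key pattern from the table; the helper `failed_dimensions` is hypothetical, and the sample values (including the dimension names substituted for `{dim}`) are made up for illustration.

```python
import re

# Matches keys of the form correctover_{dim}_passed and captures {dim}.
PASSED_KEY = re.compile(r"^correctover_(?P<dim>.+)_passed$")


def failed_dimensions(metadata: dict) -> list[str]:
    """Return the sorted names of dimensions whose *_passed flag is False."""
    failed = []
    for key, value in metadata.items():
        match = PASSED_KEY.match(key)
        if match and value is False:
            failed.append(match.group("dim"))
    return sorted(failed)


# Illustrative metadata; real results carry more keys.
sample = {
    "correctover_verdict": "partial",
    "correctover_confidence": 0.75,
    "correctover_should_failover": False,
    "correctover_structure_passed": True,
    "correctover_integrity_passed": False,
}

print(failed_dimensions(sample))  # ['integrity']
if sample["correctover_should_failover"]:
    print("switch provider")
```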

Requirements

  • Python 3.10+
  • correctover-crewai >= 0.1.0
  • patronus >= 0.1.10

Performance

| Metric | Value |
| --- | --- |
| P50 verification latency | 22 μs |
| SDK size | 586 KB |
| Rules evaluated | 87 |
| Dimensions | 6 |
| Supported providers | 7 |

License

Apache 2.0


Correctover — Failover ≠ Correctover™
