ReasonCritic

A self-verification model that critiques agent output. It doesn't generate — it flags errors.

Overview

ReasonCritic is a verification model trained to detect bugs, security issues, logic errors, and style problems in code generated by AI agents. Unlike generative models, it focuses exclusively on critique: given code, it returns a structured result containing a PASS/FAIL verdict, a confidence score, a list of issues, and actionable suggestions.

Data Sources

v-Fable verification phase: 62.2% of traces contain verification steps, which are extracted as (code, pass/fail) pairs
Glint error/recovery pairs: 3,725 examples of agent mistakes and their corrections

Architecture

Base model: Qwen3-7B
Training: Three-stage pipeline (contrastive → LoRA → DPO)
Output: Structured verification result with verdict, confidence, issues, and suggestions

Installation

pip install -e .

# With DPO training support:
pip install -e ".[dpo]"

# With development tools:
pip install -e ".[dev]"

Quick Start

CLI

# Verify a code snippet
critic verify --code "def add(a, b): return a + b"

# Verify a file
critic verify --file app.py

# Verify an agent trace
critic verify --trace trace.jsonl

# Train the critic model
critic train --data pairs.jsonl --model Qwen/Qwen3-7B

# Start the API server
critic serve --port 8000

Python API

from reason_critic import ReasonCritic, VerificationResult

# Initialize critic
critic = ReasonCritic(backend="local", model_name="reason-critic-7b")

# Verify code
result = critic.verify(
    code="def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)",
    language="python",
)

print(f"Verdict: {result.pass_fail}")      # PASS or FAIL
print(f"Confidence: {result.confidence}")  # 0.0 to 1.0
print(f"Issues: {result.issues}")          # List of issues
print(f"Suggestions: {result.suggestions}") # List of suggestions

Verify an Agent Step

step = {
    "index": 0,
    "type": "code_generation",
    "code": "for i in range(11):\n    print(data[i])",
    "name": "process_data",
}
step_result = critic.verify_step(step, context="Processing user data")
print(step_result.result.pass_fail)  # FAIL (off-by-one)
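
To make the expected FAIL concrete, the following is a minimal sketch of why the loop in this step is an off-by-one error. It assumes `data` holds 10 items; the length is not stated above, so treat it as an illustration only. The helper `run_step` is hypothetical and not part of ReasonCritic.

```python
# Assumption: `data` has 10 elements; the step above does not state its length.
data = list(range(10))


def run_step(items):
    """Replay the loop from the step and report how far it gets."""
    seen = []
    try:
        for i in range(11):  # i runs 0..10, one past the last valid index (9)
            seen.append(items[i])
    except IndexError:
        return seen, "IndexError"
    return seen, "ok"


seen, status = run_step(data)
print(len(seen), status)  # 10 IndexError
```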

Verify a Full Agent Run

run = {
    "id": "run-abc123",
    "steps": [
        {"index": 0, "type": "generation", "code": "x = 1", "name": "init"},
        {"index": 1, "type": "generation", "code": "y = x + 1", "name": "compute"},
    ]
}
run_result = critic.verify_run(run)
print(f"Overall: {run_result.overall_verdict}")  # PASS or FAIL
print(f"Steps passed: {run_result.num_passed}/{len(run_result.step_verifications)}")

Generate-then-Verify Pipeline

from reason_critic.pipeline import GenerateVerifyPipeline, GeneratorWrapper
from reason_critic import ReasonCritic

pipeline = GenerateVerifyPipeline(
    generator=GeneratorWrapper(model_name="Qwen/Qwen3-7B"),
    critic=ReasonCritic(backend="local", model_name="reason-critic-7b"),
    max_attempts=3,
)

result = pipeline.generate_and_verify(
    task="Write a function that checks if a string is a palindrome",
    language="python",
)

print(f"Passed: {result.passed}")
print(f"Attempts: {result.total_attempts}")
print(f"Final code:\n{result.final_code}")

If verification fails, the pipeline feeds issues back to the generator for re-generation, up to max_attempts cycles.
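
The retry behaviour described above can be sketched as a plain loop. This is not the library's implementation: `generate_and_verify`, `PipelineOutcome`, and the two stub callables are hypothetical names used only to show how issues from a failed verification could be fed back into the next generation attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PipelineOutcome:
    """Hypothetical result container mirroring the fields printed above."""
    passed: bool
    total_attempts: int
    final_code: str


def generate_and_verify(
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str], Tuple[bool, List[str]]],
    task: str,
    max_attempts: int = 3,
) -> PipelineOutcome:
    feedback: List[str] = []  # issues from the previous failed attempt
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        passed, issues = verify(code)
        if passed:
            return PipelineOutcome(True, attempt, code)
        feedback = issues  # hand the critic's issues back to the generator
    return PipelineOutcome(False, max_attempts, code)


# Stub generator: produces a buggy draft first, a fixed one once it gets feedback.
def toy_generate(task: str, feedback: List[str]) -> str:
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


# Stub critic: fails any code that subtracts.
def toy_verify(code: str) -> Tuple[bool, List[str]]:
    if "a - b" in code:
        return False, ["Subtraction instead of addition"]
    return True, []


outcome = generate_and_verify(toy_generate, toy_verify, "Write an add function")
print(outcome.passed, outcome.total_attempts)  # True 2
```

Passing the generator and critic as callables keeps the loop independent of any particular model backend.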

API Server

# Start the server
critic serve --port 8000

Endpoints

`POST /verify` — Verify code

{
    "code": "def add(a, b): return a - b",
    "context": "Addition function",
    "language": "python"
}

Response:

{
    "pass_fail": "FAIL",
    "confidence": 0.92,
    "issues": ["Subtraction instead of addition"],
    "suggestions": ["Use + instead of -"],
    "explanation": "Function uses subtraction where addition is expected",
    "language": "python"
}
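
For calling this endpoint from Python, a minimal client sketch using only the standard library might look like the following. It assumes the server started with `critic serve --port 8000` is reachable at `http://localhost:8000` and accepts a JSON body with a `Content-Type: application/json` header; `build_verify_request` and `verify_remote` are hypothetical helper names, not part of the package.

```python
import json
import urllib.request


def build_verify_request(
    base_url: str, code: str, context: str = "", language: str = "python"
) -> urllib.request.Request:
    """Build the POST /verify request without sending it."""
    payload = {"code": code, "context": context, "language": language}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/verify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def verify_remote(base_url: str, code: str, context: str = "",
                  language: str = "python", timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_verify_request(base_url, code, context, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_verify_request(
    "http://localhost:8000",  # assumed address of a local `critic serve`
    "def add(a, b): return a - b",
    context="Addition function",
)
print(request.get_method(), request.full_url)  # POST http://localhost:8000/verify
```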

`POST /verify/step` — Verify a single step

{
    "step": {
        "index": 0,
        "type": "code_generation",
        "code": "for i in range(11): print(data[i])",
        "name": "loop_data"
    },
    "context": "Processing array"
}

`POST /verify/run` — Verify a full agent run

{
    "run": {
        "id": "run-123",
        "steps": [
            {"index": 0, "type": "generation", "code": "x = 1"},
            {"index": 1, "type": "generation", "code": "y = x / 0"}
        ]
    },
    "context": "Data processing pipeline"
}

`POST /pipeline` — Generate-then-verify

{
    "task": "Write a sorting function",
    "max_attempts": 3,
    "language": "python"
}

`GET /health` — Health check

{
    "status": "healthy",
    "model": "reason-critic-7b",
    "backend": "local"
}

Training Pipeline

Three-Stage Training

ReasonCritic uses a three-stage training pipeline:

Stage 1: Contrastive Learning. Train on paired correct and incorrect versions of code so the model learns to tell them apart.
Stage 2: LoRA Fine-Tuning. Fine-tune efficiently with Low-Rank Adaptation, which trains small adapter matrices instead of all model weights.
Stage 3: DPO Alignment. Apply Direct Preference Optimization to steer the model toward preferred verification outputs.
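
As a point of reference for Stage 3, the standard DPO objective for a single preference pair can be written in a few lines. This is a sketch of the textbook loss, not code taken from ReasonCritic; whether the trainer uses exactly this form, and the `beta` value shown, are assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical value; the source does not give one
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    # How much more the policy favours each response than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With identical log-probabilities the margin is zero and the loss is log(2).
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931
```

The loss falls as the policy raises the probability of the preferred critique relative to the rejected one, measured against the reference model.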

Data Preparation

from reason_critic.data_prep import (
    extract_verification_pairs,
    generate_incorrect_versions,
    create_contrastive_pairs,
    load_glint_error_recovery,
)

# Extract from agent traces
examples = extract_verification_pairs(traces)

# Generate buggy versions for contrastive learning
buggy = generate_incorrect_versions(correct_code, num_versions=3)

# Create pairs
pair = create_contrastive_pairs(correct_code, incorrect_code)

# Load Glint error/recovery data
glint_examples = load_glint_error_recovery("glint_data.jsonl")

Bug Templates

generate_incorrect_versions applies systematic bug-introduction strategies:

Bug Type	Description
`off_by_one`	Off-by-one errors in loop bounds
`wrong_operator`	Swapped comparison operators
`missing_none_check`	Missing None check before attribute access
`forgotten_await`	Missing await on async call
`mutable_default`	Mutable default arguments
`shadowed_variable`	Variable shadowing in inner scope
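
To illustrate what one of these templates could do, here is a minimal sketch of a `wrong_operator`-style mutation built on the standard `ast` module. It is not the implementation behind `generate_incorrect_versions`; the function name `inject_wrong_operator` and the swap table are hypothetical, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast

# Hypothetical swap table: each comparison maps to a plausible "wrong" neighbour.
_SWAPS = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


class _SwapFirstComparison(ast.NodeTransformer):
    """Swap the first swappable comparison operator reached during traversal."""

    def __init__(self) -> None:
        self.swapped = False

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)  # handle nested comparisons first
        for index, op in enumerate(node.ops):
            replacement = _SWAPS.get(type(op))
            if replacement is not None and not self.swapped:
                node.ops[index] = replacement()
                self.swapped = True
        return node


def inject_wrong_operator(source: str) -> str:
    """Return `source` with one comparison operator swapped (unchanged if none)."""
    tree = ast.parse(source)
    mutated = _SwapFirstComparison().visit(tree)
    return ast.unparse(mutated)


correct_code = (
    "def factorial(n):\n"
    "    if n <= 1:\n"
    "        return 1\n"
    "    return n * factorial(n - 1)"
)
print(inject_wrong_operator(correct_code))
```

Working on the syntax tree rather than on raw text avoids accidentally rewriting operators that appear inside strings or comments.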

Training

from reason_critic.trainer import TrainingConfig, run_three_stage_pipeline

config = TrainingConfig(
    model_name="Qwen/Qwen3-7B",
    output_dir="./reason-critic-output",
    contrastive_epochs=3,
    lora_epochs=2,
    dpo_epochs=1,
)

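# `examples` and `pairs` are assumed to come from the Data Preparation step:
# `examples` from extract_verification_pairs, `pairs` from create_contrastive_pairs.
results = run_three_stage_pipeline(examples, pairs, output_dir="./output", config=config)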

Or via CLI:

critic train --data pairs.jsonl --model Qwen/Qwen3-7B --stage all
critic train --data pairs.jsonl --stage contrastive
critic train --data pairs.jsonl --stage lora
critic train --data pairs.jsonl --stage dpo

Benchmarks

The project includes 130 verification benchmark tasks across 4 categories:

Category	Count	Description
Code Correctness	50	Off-by-one, wrong operators, missing checks, mutations, async bugs
Security Issues	30	SQL injection, XSS, CSRF, command injection, crypto weaknesses
Logic Errors	30	Condition order, inverted logic, De Morgan's law, scope issues
Style Issues	20	Missing docs, magic numbers, god objects, naming, logging

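import json
from pathlib import Path

import reason_critic
from reason_critic.benchmarks import BENCHMARK_CATEGORIES

# Resolve the benchmark directory relative to the installed package rather than
# the calling script (assumes benchmarks/ ships inside the reason_critic package).
benchmarks_dir = Path(reason_critic.__file__).parent / "benchmarks"

for category in BENCHMARK_CATEGORIES:
    path = benchmarks_dir / category / "tasks.json"
    tasks = json.loads(path.read_text())
    print(f"{category}: {len(tasks)} tasks")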
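
Once verdicts have been collected for the benchmark tasks, per-category accuracy can be computed with a small helper. The sketch below is only an illustration: the schema of `tasks.json` is not documented here, so the `(category, expected, predicted)` record layout and the function name `score_by_category` are assumptions, and the sample records are made up for the example rather than real results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (category, expected verdict, predicted verdict).
Record = Tuple[str, str, str]


def score_by_category(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of matching verdicts for each category."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, expected, predicted in records:
        totals[category] += 1
        if expected == predicted:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Made-up records, for illustration only.
records = [
    ("code_correctness", "FAIL", "FAIL"),
    ("code_correctness", "PASS", "FAIL"),
    ("security_issues", "FAIL", "FAIL"),
]
print(score_by_category(records))  # {'code_correctness': 0.5, 'security_issues': 1.0}
```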

Architecture

ReasonCritic
├── critic.py        # Core verification model + backends (local, API, hybrid)
├── data_prep.py     # Training data preparation from traces
├── trainer.py       # Three-stage training pipeline
├── pipeline.py      # Generate-then-verify pipeline
├── server.py        # FastAPI server
├── cli.py           # CLI interface
└── benchmarks/      # Verification benchmark tasks
    ├── code_correctness/  # 50 tasks
    ├── security_issues/   # 30 tasks
    ├── logic_errors/      # 30 tasks
    └── style_issues/      # 20 tasks

Backends

Local: Load model via transformers/Unsloth for local inference
API: Call a remote verification service
Hybrid: Try local first, fall back to API for low-confidence results
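
The hybrid strategy can be sketched as a simple confidence gate. This is an illustration under stated assumptions, not the code in critic.py: `hybrid_verify`, the stub backends, and the `min_confidence` threshold of 0.7 are all hypothetical.

```python
from typing import Callable, Dict

# A backend is modelled here as any callable mapping code to a result dict.
Backend = Callable[[str], Dict]


def hybrid_verify(code: str, local: Backend, remote: Backend,
                  min_confidence: float = 0.7) -> Dict:
    """Use the local result unless its confidence falls below the threshold."""
    result = local(code)
    if result["confidence"] >= min_confidence:
        return {**result, "backend": "local"}
    # Low confidence: defer to the remote verification service.
    return {**remote(code), "backend": "api"}


# Stub backends standing in for the local model and the remote service.
def local_stub(code: str) -> Dict:
    return {"pass_fail": "PASS", "confidence": 0.55}


def remote_stub(code: str) -> Dict:
    return {"pass_fail": "FAIL", "confidence": 0.92}


result = hybrid_verify("def add(a, b): return a - b", local_stub, remote_stub)
print(result["backend"], result["pass_fail"])  # api FAIL
```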

VerificationResult Schema

@dataclass
class VerificationResult:
    pass_fail: str         # "PASS" or "FAIL"
    confidence: float      # 0.0 to 1.0
    issues: list[str]      # List of detected issues
    suggestions: list[str] # List of suggested fixes
    explanation: str       # Brief explanation
    language: str          # Programming language
    raw_output: str        # Raw model output
    model_name: str        # Model that produced this result
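
When consuming results over the API, it can help to parse the JSON response into the same shape and validate it. The sketch below uses a local mirror of the schema above, not the class shipped in `reason_critic`; the default values and the `result_from_json` helper are assumptions, added because the example `/verify` response omits `raw_output` and `model_name`.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class VerificationResult:
    """Local mirror of the schema; defaults are assumptions for optional fields."""
    pass_fail: str
    confidence: float
    issues: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    explanation: str = ""
    language: str = ""
    raw_output: str = ""
    model_name: str = ""


def result_from_json(text: str) -> VerificationResult:
    """Parse a JSON response, ignore unknown keys, and check basic invariants."""
    data = json.loads(text)
    known = {f.name for f in fields(VerificationResult)}
    result = VerificationResult(**{k: v for k, v in data.items() if k in known})
    if result.pass_fail not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected verdict: {result.pass_fail!r}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence!r}")
    return result


response_text = json.dumps({
    "pass_fail": "FAIL",
    "confidence": 0.92,
    "issues": ["Subtraction instead of addition"],
    "suggestions": ["Use + instead of -"],
    "explanation": "Function uses subtraction where addition is expected",
    "language": "python",
})
parsed = result_from_json(response_text)
print(parsed.pass_fail, parsed.confidence)  # FAIL 0.92
```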

Running Tests

pip install -e ".[dev]"
pytest tests/ -v

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project	Description
Anvil	Self-verified coding agent
VerifyLoop	Plan→Execute→Verify→Recover framework
ErrorRecovery	Self-healing middleware (3,725 error patterns)
FableForge-14B	The fine-tuned 14B model (4-stage training)
ShellWhisperer	1.5B edge agent (phone/RPi, 50ms)
ReasonCritic	Verification model (130 benchmark tasks)
TraceCompiler	Compile traces → LoRA skills
AgentRuntime	Persistent agent daemon (systemd for AI)
AgentSwarm	Multi-agent from real trace transitions
AgentTelemetry	Datadog for agents (token tracking, costs)
BenchAgent	HumanEval for tool-use (107 tasks)
AgentDev	VSCode extension with verification
TraceViz	Trace replay visualizer (Next.js)
AgentSkills	npm for agent behaviors
AgentCurriculum	5-stage progressive training
AgentFuzzer	Adversarial testing for agents
AgentConstitution	Safety guardrails from traces
CostOptimizer	Token cost reduction (50-80%)
AgentProfiler	Behavioral fingerprinting
TrajectoryDistiller	Trace→training data pipeline
Fable5-Dataset	HuggingFace dataset release

Project details

Version 0.1.0, released Jun 14, 2026

Download files

Source Distribution

reason_critic-0.1.0.tar.gz (33.2 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

reason_critic-0.1.0-py3-none-any.whl (28.8 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file reason_critic-0.1.0.tar.gz.

File metadata

Download URL: reason_critic-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 33.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for reason_critic-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`82396e0b06e16d186f2ab39a6d1c039a7be5f15930e6e01cc0fdf5d2ddec3852`
MD5	`3d1538c941f3d6ec69b054db54acc208`
BLAKE2b-256	`7a09fe6b2ccf05952f1a4d60e0f35d5420a0c351c8ce0d36854f3d2818a078c5`

File details

Details for the file reason_critic-0.1.0-py3-none-any.whl.

File metadata

Download URL: reason_critic-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 28.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for reason_critic-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`45ce5705b5d021c4a1a26fc18f365111fbfc7045e53999e42970500542734586`
MD5	`79ee59e01e53971a8295fd8dea51d0a5`
BLAKE2b-256	`45390e9b2dae3915686c3b8e9a4f8c3afffff5f5b4ccf1bb2b010f9907650848`
