
Agent framework implementing Plan → Execute → Verify → Recover with trained verification


VerifyLoop

License: MIT | Python 3.10+ | Tests

The Instagram moment for agents. Plan → Execute → Verify → Recover.

VerifyLoop is an agent framework whose verify step uses a trained model rather than a prompt. Most other agent frameworks verify with the same LLM that generated the code, which is like asking the person who wrote the bug to confirm there's no bug.

Architecture

┌─────────────────────────────────────────────────────────┐
│                     AgentPipeline                        │
│                                                          │
│  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────┐ │
│  │  PLAN    │───▶│ EXECUTE  │───▶│ VERIFY  │───▶│ DONE │ │
│  │         │    │          │    │         │    │  ✓   │ │
│  └─────────┘    └──────────┘    └────┬────┘    └──────┘ │
│                                      │                    │
│                               ┌──────▼──────┐            │
│                               │  Confidence  │            │
│                               │   < 0.8 ?    │            │
│                               └──────┬──────┘            │
│                                      │ Yes               │
│                               ┌──────▼──────┐            │
│                               │  RECOVER    │            │
│                               │  Fix errors │            │
│                               └──────┬──────┘            │
│                                      │                    │
│                              Loop back to EXECUTE         │
└─────────────────────────────────────────────────────────┘
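
The control flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not VerifyLoop's implementation: `run_loop` and the `execute`, `verify`, and `recover` callables are hypothetical stand-ins, and the 0.8 threshold and iteration bound simply mirror the documented defaults.

```python
from typing import Callable


def run_loop(
    plan: list[str],
    execute: Callable[[list[str]], str],
    verify: Callable[[str], float],
    recover: Callable[[list[str], str], list[str]],
    confidence_threshold: float = 0.8,
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Return (status, iterations_used) for a bounded execute/verify/recover loop."""
    for iteration in range(1, max_iterations + 1):
        output = execute(plan)
        confidence = verify(output)
        if confidence >= confidence_threshold:
            return "completed", iteration
        # Below the threshold: revise the plan, then loop back to EXECUTE.
        plan = recover(plan, output)
    return "failed", max_iterations


# Toy stand-ins (assumptions): the first attempt is "broken",
# and recovery appends a "fix" step to the plan.
def toy_execute(plan: list[str]) -> str:
    return "ok" if "fix" in plan else "broken"


def toy_verify(output: str) -> float:
    return 0.95 if output == "ok" else 0.3


def toy_recover(plan: list[str], output: str) -> list[str]:
    return plan + ["fix"]


status, iterations = run_loop(["edit app.py"], toy_execute, toy_verify, toy_recover)
print(status, iterations)  # completed 2
```

Bounding the loop with `max_iterations` means a verifier that never reaches the threshold ends in a `failed` status instead of looping forever.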

Why VerifyLoop is different

| Feature | Other Agents | VerifyLoop |
|---|---|---|
| Verification | LLM prompt (same model) | Trained ReasonCritic model |
| Error recovery | Retry or re-prompt | Pattern-matched recovery strategies |
| Confidence scoring | None or vibes | Numeric confidence threshold |
| Recovery loop | None or ad-hoc | Structured Plan→Exec→Verify→Recover |
| Token tracking | Best-effort | Built-in per-phase tracking |

Quick Start

Install

pip install verifyloop

CLI

# Run a task
vl run "add authentication to app.py"

# Run from a task file
vl run --task-file tasks/fix_bug.json

# Interactive mode (confirm each step)
vl run --interactive "refactor the database layer"

# Specify models
vl run --model gpt-4o --verify-model reason-critic-7b "write tests"

# Dry run (plan only, don't execute)
vl run --dry-run "create a REST API"

# Limit iterations
vl run --max-iterations 3 "fix the flaky test"

# Docker sandbox for bash commands
vl run --sandbox "install dependencies and run tests"

Python API

import asyncio
from verifyloop import AgentPipeline, PipelineConfig

async def main():
    config = PipelineConfig(
        model="gpt-4o",
        verify_model="reason-critic-7b",
        max_iterations=5,
        confidence_threshold=0.8,
    )

    pipeline = AgentPipeline(config)

    # Stream events
    async def on_event(event, data):
        print(f"[{event}] {data}")

    pipeline.on_event(on_event)

    result = await pipeline.run(
        task="Add a hello() function to app.py",
        context="Python project with a Flask web app",
    )

    print(f"Status: {result.status}")
    print(f"Steps: {len(result.steps)}")
    print(f"Duration: {result.duration_seconds:.2f}s")

asyncio.run(main())
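
For readers wondering how an async callback registered with `on_event` might be driven, here is a minimal sketch. `EventEmitter`, its `emit` method, and the event names `plan` and `verify` are assumptions made for illustration; the library's real dispatch logic and event names may differ.

```python
import asyncio
from typing import Awaitable, Callable

EventCallback = Callable[[str, dict], Awaitable[None]]


class EventEmitter:
    """Hypothetical stand-in for a pipeline's callback registry."""

    def __init__(self) -> None:
        self._callbacks: list[EventCallback] = []

    def on_event(self, callback: EventCallback) -> None:
        self._callbacks.append(callback)

    async def emit(self, event: str, data: dict) -> None:
        # Await callbacks in registration order so output stays deterministic.
        for callback in self._callbacks:
            await callback(event, data)


received: list[tuple[str, dict]] = []


async def collect(event: str, data: dict) -> None:
    received.append((event, data))


async def demo() -> None:
    emitter = EventEmitter()
    emitter.on_event(collect)
    await emitter.emit("plan", {"steps": 2})
    await emitter.emit("verify", {"confidence": 0.91})


asyncio.run(demo())
print(received)
```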

Individual Components

import asyncio
from verifyloop import PlanGenerator, Executor, Verifier, VerifierConfig, Recoverer

async def main():
    # Each component can be used on its own, outside AgentPipeline
    planner = PlanGenerator(model="gpt-4o")
    plan = await planner.generate_plan("Fix the login bug in auth.py")

    # Run a shell command in the working directory
    executor = Executor(working_dir=".")
    step = await executor.bash("pytest tests/")

    # Verify the state of auth.py against the expected content
    verifier = Verifier(VerifierConfig(verify_model="reason-critic-7b"))
    result = await verifier.verify_file_state("auth.py", expected_content="def login()")

    # Ask for a recovery strategy for a given error
    recoverer = Recoverer(model="gpt-4o")
    recovery = await recoverer.recover("FileNotFoundError: auth.py not found")

asyncio.run(main())

API Reference

PipelineConfig

| Field | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4o" | LLM model for planning/recovery |
| verify_model | str | "reason-critic-7b" | Trained verification model |
| max_iterations | int | 5 | Max Plan→Execute→Verify loops |
| confidence_threshold | float | 0.8 | Minimum confidence to accept result |
| max_recovery_attempts | int | 3 | Max recovery attempts per iteration |
| working_dir | str | "." | Working directory for file ops |
| dry_run | bool | False | Plan only, don't execute |
| interactive | bool | False | Confirm each step before execution |
| sandbox | bool | False | Run bash in Docker container |
| sandbox_image | str | "python:3.11-slim" | Docker image for sandbox |
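
To make the defaults concrete, the fields above can be mirrored in a standard-library dataclass. This is only a sketch: `PipelineConfigSketch` is a hypothetical name, the real `PipelineConfig` may be built differently, and the validation rules in `__post_init__` are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfigSketch:
    """Stand-in mirroring the documented fields and defaults."""

    model: str = "gpt-4o"
    verify_model: str = "reason-critic-7b"
    max_iterations: int = 5
    confidence_threshold: float = 0.8
    max_recovery_attempts: int = 3
    working_dir: str = "."
    dry_run: bool = False
    interactive: bool = False
    sandbox: bool = False
    sandbox_image: str = "python:3.11-slim"

    def __post_init__(self) -> None:
        # Assumed validation: the threshold is a probability-like score.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be within [0.0, 1.0]")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")


config = PipelineConfigSketch(max_iterations=3)
print(config.max_iterations, config.confidence_threshold)  # 3 0.8
```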

AgentPipeline

pipeline = AgentPipeline(config)

# Run a task
result: AgentRun = await pipeline.run(task, context, max_iterations)

# Register event callbacks
pipeline.on_event(callback)  # async def callback(event: str, data: dict)

# Access token usage
print(pipeline.token_usage)
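
Per-phase token tracking can be pictured as a small accumulator keyed by phase name. The sketch below is an assumption about the general shape, not the library's `TokenUsage` type: `PhaseTokens`, `TokenTracker`, and `record` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PhaseTokens:
    prompt: int = 0
    completion: int = 0

    @property
    def total(self) -> int:
        return self.prompt + self.completion


class TokenTracker:
    """Hypothetical accumulator of prompt/completion tokens per phase."""

    def __init__(self) -> None:
        self.by_phase: dict[str, PhaseTokens] = {}

    def record(self, phase: str, prompt: int, completion: int) -> None:
        usage = self.by_phase.setdefault(phase, PhaseTokens())
        usage.prompt += prompt
        usage.completion += completion

    @property
    def total(self) -> int:
        return sum(usage.total for usage in self.by_phase.values())


tracker = TokenTracker()
tracker.record("plan", prompt=120, completion=40)
tracker.record("verify", prompt=80, completion=10)
tracker.record("plan", prompt=30, completion=5)
print(tracker.by_phase["plan"].total, tracker.total)  # 195 285
```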

AgentRun

| Field | Type | Description |
|---|---|---|
| task | str | Original task description |
| steps | list[Step] | All plan/execute/verify/recover steps |
| status | RunStatus | pending / planning / executing / verifying / recovering / completed / failed |
| token_usage | TokenUsage | Prompt + completion token counts |
| duration_seconds | float | Total wall-clock time |
| iteration | int | Which iteration completed |
| metadata | dict | Additional metadata |

Executor

executor = Executor(working_dir=".", sandbox=False)

# Tools
result = await executor.bash("ls -la")
result = await executor.read("app.py")
result = await executor.write("new_file.py", content)
result = await executor.edit("app.py", old_content, new_content)
result = await executor.web_search("python requests library")
result = await executor.web_fetch("https://example.com/docs")

# File history and rollback
history = executor.get_file_history("app.py")
executor.rollback_file("app.py")
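
One way file history and rollback could work is to snapshot a file's previous content before every write. The sketch below illustrates that idea only: `FileHistory` is a hypothetical helper, and how `Executor` actually stores history is not documented here.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FileHistory:
    """Hypothetical helper: snapshot a file before each write so it can be undone."""

    def __init__(self) -> None:
        self._snapshots: dict[Path, list[Optional[str]]] = {}

    def write(self, path: Path, content: str) -> None:
        # Record the previous content (None if the file did not exist yet).
        previous = path.read_text() if path.exists() else None
        self._snapshots.setdefault(path, []).append(previous)
        path.write_text(content)

    def get_file_history(self, path: Path) -> list[Optional[str]]:
        return list(self._snapshots.get(path, []))

    def rollback_file(self, path: Path) -> bool:
        history = self._snapshots.get(path)
        if not history:
            return False  # nothing to undo
        previous = history.pop()
        if previous is None:
            path.unlink()  # the undone write had created the file
        else:
            path.write_text(previous)
        return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "app.py"
    history = FileHistory()
    history.write(target, "v1")
    history.write(target, "v2")
    rolled_back = history.rollback_file(target)
    restored = target.read_text()

print(rolled_back, restored)  # True v1
```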

Verifier

verifier = Verifier(VerifierConfig(
    verify_model="reason-critic-7b",
    confidence_threshold=0.8,
    prefer_trained_model=True,
))

# Verification methods
result = await verifier.verify_code_edits(plan, execute_steps)
result = await verifier.verify_bash_output("pytest", output, expected="passed")
result = await verifier.verify_file_state("app.py", expected_content="def hello")
result = await verifier.verify_tests("pytest tests/", working_dir=".")

Recoverer

recoverer = Recoverer(model="gpt-4o", max_recovery_attempts=3)

# Recovery with pattern matching
recovery = await recoverer.recover(
    error="SyntaxError: invalid syntax",
    context="File: app.py, Line 42",
    attempt=1,
)

# Pattern types: edit, create, retry, simplify, analyze
print(recovery.recovery_type)   # "edit"
print(recovery.recovery_attempt) # "Fix syntax error in the file"
print(recovery.exhausted)        # False

# Check if retry is worthwhile
should_retry = recoverer.should_retry("TimeoutError", attempt=2)  # True
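
The pattern-matching idea can be sketched as a table of regular expressions mapped to recovery types. The five type names come from the listing above; the patterns, the suggested actions, the meaning of `exhausted`, and the retry rule are all assumptions for illustration, not the library's actual strategy table.

```python
import re
from dataclasses import dataclass

# Hypothetical table: regex on the error text -> (recovery type, suggested action).
RECOVERY_PATTERNS = [
    (re.compile(r"SyntaxError|IndentationError"), "edit", "Fix syntax error in the file"),
    (re.compile(r"FileNotFoundError|No such file"), "create", "Create the missing file"),
    (re.compile(r"TimeoutError|ConnectionError"), "retry", "Retry the failed step"),
    (re.compile(r"RecursionError|MemoryError"), "simplify", "Simplify the approach"),
]


@dataclass
class RecoverySketch:
    recovery_type: str
    recovery_attempt: str
    exhausted: bool


def recover(error: str, attempt: int = 1, max_recovery_attempts: int = 3) -> RecoverySketch:
    # Assumption: the budget is exhausted once the attempt number reaches the maximum.
    exhausted = attempt >= max_recovery_attempts
    for pattern, recovery_type, action in RECOVERY_PATTERNS:
        if pattern.search(error):
            return RecoverySketch(recovery_type, action, exhausted)
    # No pattern matched: fall back to analysis (where an LLM call might go).
    return RecoverySketch("analyze", "Analyze the error for a root cause", exhausted)


def should_retry(error: str, attempt: int, max_recovery_attempts: int = 3) -> bool:
    # Assumption: only transient errors are retried, and only while attempts remain.
    transient = re.search(r"TimeoutError|ConnectionError", error) is not None
    return transient and attempt < max_recovery_attempts


print(recover("SyntaxError: invalid syntax").recovery_type)  # edit
print(should_retry("TimeoutError", attempt=2))  # True
```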

InMemoryStore / FileStore

from verifyloop import InMemoryStore, FileStore

# In-memory (default)
memory = InMemoryStore()
await memory.store("key", {"data": "value"})
result = await memory.retrieve("key")
results = await memory.search("value")

# Persistent file storage
memory = FileStore(base_dir=".verifyloop_memory")
await memory.store("key", {"data": "value"}, namespace="project1")
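
The store interface above (`store`, `retrieve`, `search`, plus an optional `namespace`) can be approximated with a dictionary. This is a sketch under assumptions: `InMemoryStoreSketch` is a hypothetical class, the `"default"` namespace is invented, and the substring search over JSON text is one guess at what `search` might do.

```python
import asyncio
import json
from typing import Any


class InMemoryStoreSketch:
    """Hypothetical stand-in: namespaced key/value store with substring search."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    async def store(self, key: str, value: Any, namespace: str = "default") -> None:
        self._data[(namespace, key)] = value

    async def retrieve(self, key: str, namespace: str = "default") -> Any:
        return self._data.get((namespace, key))

    async def search(self, query: str, namespace: str = "default") -> list[Any]:
        # Match against the JSON form of each value; real search may differ.
        return [
            value
            for (ns, _key), value in self._data.items()
            if ns == namespace and query in json.dumps(value)
        ]


async def demo() -> tuple[Any, list[Any]]:
    memory = InMemoryStoreSketch()
    await memory.store("key", {"data": "value"})
    await memory.store("other", {"data": "unrelated"})
    return await memory.retrieve("key"), await memory.search("value")


found, matches = asyncio.run(demo())
print(found, matches)
```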

ConversationContext

from verifyloop.memory import ConversationContext

ctx = ConversationContext()
ctx.add_message("user", "Fix the bug in main.py")
ctx.add_file_context("main.py", "def broken():\n    return 1/0")

# Build context string for LLM
context = ctx.build_context_string()
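
As a rough picture of what building a context string involves, the sketch below flattens messages and file contents into one prompt. `ConversationContextSketch` is a hypothetical class, and the layout it produces (role prefixes, `--- path ---` headers, ordering) is an assumption; the library's real format is not shown in this document.

```python
class ConversationContextSketch:
    """Hypothetical stand-in that flattens messages and files into one string."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []
        self.files: dict[str, str] = {}

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def add_file_context(self, path: str, content: str) -> None:
        self.files[path] = content

    def build_context_string(self) -> str:
        # Messages first, then each file under its own header (assumed layout).
        parts = [f"{role}: {content}" for role, content in self.messages]
        for path, content in self.files.items():
            parts.append(f"--- {path} ---\n{content}")
        return "\n".join(parts)


ctx = ConversationContextSketch()
ctx.add_message("user", "Fix the bug in main.py")
ctx.add_file_context("main.py", "def broken():\n    return 1/0")
print(ctx.build_context_string())
```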

Configuration

Environment Variables

| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key (for GPT models) |
| ANTHROPIC_API_KEY | Anthropic API key (for Claude models) |
| VERIFYLOOP_VERIFY_MODEL | Override the verification model |
| VERIFYLOOP_CONFIDENCE | Override confidence threshold (0.0-1.0) |
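
The two `VERIFYLOOP_*` overrides could be resolved roughly as follows. This is a sketch: `resolve_verify_settings` is a hypothetical helper, and both the precedence (environment over defaults) and the range check are assumptions based on the descriptions above.

```python
import os
from typing import Mapping, Optional


def resolve_verify_settings(
    env: Optional[Mapping[str, str]] = None,
    default_model: str = "reason-critic-7b",
    default_confidence: float = 0.8,
) -> tuple[str, float]:
    """Apply VERIFYLOOP_* overrides on top of the defaults."""
    source = os.environ if env is None else env
    model = source.get("VERIFYLOOP_VERIFY_MODEL", default_model)
    raw = source.get("VERIFYLOOP_CONFIDENCE")
    confidence = default_confidence if raw is None else float(raw)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    return model, confidence


# Pass a plain dict instead of os.environ to keep the example deterministic.
print(resolve_verify_settings({"VERIFYLOOP_CONFIDENCE": "0.9"}))
```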

Task File Format

{
  "task": "Add authentication to app.py",
  "context": "Flask application with a login route",
  "model": "gpt-4o",
  "verify_model": "reason-critic-7b",
  "max_iterations": 3
}
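
A task file in this format can be checked before it is handed to the CLI. The helper below is a hypothetical sketch: `parse_task` is not part of VerifyLoop, and treating `task` as required and unknown keys as errors is an assumption about how strict a loader should be.

```python
import json

# Keys taken from the example task file above.
ALLOWED_KEYS = {"task", "context", "model", "verify_model", "max_iterations"}


def parse_task(text: str) -> dict:
    """Parse task-file JSON and reject missing or unknown keys."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("task file must contain a JSON object")
    task = data.get("task")
    if not isinstance(task, str) or not task.strip():
        raise ValueError("task file needs a non-empty 'task' string")
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return data


example = '{"task": "Add authentication to app.py", "max_iterations": 3}'
task = parse_task(example)
print(task["task"], task["max_iterations"])
```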

Comparison with Other Agent Frameworks

vs. AutoGPT / BabyAGI

| Aspect | AutoGPT | VerifyLoop |
|---|---|---|
| Planning | Single prompt | Decomposed substeps with tool estimation |
| Verification | None | Trained model with confidence scoring |
| Recovery | Basic retry | Pattern-matched strategies (5 types) |
| Loop control | Infinite loop risk | Bounded iterations + convergence check |

vs. LangChain Agents

| Aspect | LangChain | VerifyLoop |
|---|---|---|
| Verification | LLM-as-judge (same model) | Dedicated trained verification model |
| Structured output | Optional | Enforced via Pydantic models |
| Recovery | Chain retries | Typed recovery with strategy selection |
| Token tracking | Callback-based | Built-in per-phase tracking |

vs. Claude Code / Cursor

| Aspect | Claude Code | VerifyLoop |
|---|---|---|
| Verification | Same model self-review | Dedicated ReasonCritic model |
| Recovery | Re-prompt | Pattern-matched with LLM fallback |
| Programmatic | Limited CLI | Full Python API + CLI |
| Extensibility | Plugin system | Tool interface + plugin system |

Verification Model: ReasonCritic

The key differentiator is ReasonCritic, a model trained specifically for verification:

  1. Not a prompt — It's a model fine-tuned on verification tasks (code review, test analysis, output comparison)
  2. Falls back gracefully — If ReasonCritic is unavailable, falls back to a general LLM with structured verification prompts
  3. Confidence scoring — Numeric 0-1 confidence score, not binary pass/fail
  4. Actionable failures — Every failure comes with fix suggestions, not just "it broke"
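
The fallback and scoring behaviour described in the list can be sketched as follows. Everything here is illustrative: `VerificationSketch`, `verify`, and the two scorer functions are hypothetical, and the toy heuristic merely stands in for a real model or an LLM with a structured verification prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Scorer = Callable[[str], tuple[float, list[str]]]


@dataclass
class VerificationSketch:
    confidence: float  # 0.0 to 1.0, not a binary pass/fail
    source: str  # which verifier produced the score
    suggestions: list[str] = field(default_factory=list)

    def passed(self, threshold: float = 0.8) -> bool:
        return self.confidence >= threshold


def verify(output: str, trained: Optional[Scorer], fallback: Scorer) -> VerificationSketch:
    # Prefer the trained verifier; fall back if it is missing or unreachable.
    if trained is not None:
        try:
            confidence, suggestions = trained(output)
            return VerificationSketch(confidence, "trained", suggestions)
        except ConnectionError:
            pass
    confidence, suggestions = fallback(output)
    return VerificationSketch(confidence, "fallback", suggestions)


def unavailable(output: str) -> tuple[float, list[str]]:
    raise ConnectionError("verification model not reachable")


def prompt_based(output: str) -> tuple[float, list[str]]:
    # Toy heuristic standing in for a general LLM verification prompt.
    if "passed" in output:
        return 0.9, []
    return 0.2, ["Inspect the failing test output"]


result = verify("1 failed", unavailable, prompt_based)
print(result.source, result.confidence, result.passed(), result.suggestions)
```

Returning suggestions alongside the score keeps a failed verification actionable, since the recovery step receives something more specific than a bare failure.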

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

| Project | Description |
|---|---|
| Anvil | Self-verified coding agent |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error patterns) |
| FableForge-14B | The fine-tuned 14B model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |

