Agent framework implementing Plan → Execute → Verify → Recover with trained verification

VerifyLoop

The Instagram moment for agents. Plan → Execute → Verify → Recover.

VerifyLoop is an agent framework whose verify step uses a trained model, not a prompt. Most agent frameworks verify with the same LLM that generated the code, which is like asking the person who wrote the bug to confirm there's no bug.

Architecture

┌──────────────────────────────────────────────────────────┐
│                      AgentPipeline                       │
│                                                          │
│  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────┐  │
│  │  PLAN   │───▶│ EXECUTE  │───▶│ VERIFY  │───▶│ DONE │  │
│  │         │    │          │    │         │    │  ✓   │  │
│  └─────────┘    └──────────┘    └────┬────┘    └──────┘  │
│                                      │                   │
│                               ┌──────▼──────┐            │
│                               │  Confidence │            │
│                               │   < 0.8 ?   │            │
│                               └──────┬──────┘            │
│                                      │ Yes               │
│                               ┌──────▼──────┐            │
│                               │   RECOVER   │            │
│                               │  Fix errors │            │
│                               └──────┬──────┘            │
│                                      │                   │
│                            Loop back to EXECUTE          │
└──────────────────────────────────────────────────────────┘
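The control flow in the diagram can be sketched as a plain bounded loop. This is a minimal illustration, not VerifyLoop's implementation: `run_loop`, `LoopResult`, and the `plan`/`execute`/`verify`/`recover` callables are hypothetical stand-ins, while the 0.8 threshold and the iteration cap of 5 mirror the documented defaults.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str        # "completed" or "failed"
    iterations: int    # how many execute/verify rounds ran
    confidence: float  # last confidence returned by the verifier


def run_loop(
    task: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[list[str]], str],
    verify: Callable[[str], float],
    recover: Callable[[list[str], str], list[str]],
    confidence_threshold: float = 0.8,
    max_iterations: int = 5,
) -> LoopResult:
    """Hypothetical Plan -> Execute -> Verify -> Recover loop (illustrative only)."""
    steps = plan(task)                  # PLAN once up front
    confidence = 0.0
    for iteration in range(1, max_iterations + 1):
        output = execute(steps)         # EXECUTE the current steps
        confidence = verify(output)     # VERIFY returns a score in [0, 1]
        if confidence >= confidence_threshold:
            return LoopResult("completed", iteration, confidence)
        steps = recover(steps, output)  # RECOVER, then loop back to EXECUTE
    return LoopResult("failed", max_iterations, confidence)
```

Capping the loop with `max_iterations` is what keeps a verifier that never reaches the threshold from spinning forever.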

Why VerifyLoop is different

Feature	Other Agents	VerifyLoop
Verification	LLM prompt (same model)	Trained ReasonCritic model
Error recovery	Retry or re-prompt	Pattern-matched recovery strategies
Confidence scoring	None or vibes	Numeric confidence threshold
Recovery loop	None or ad-hoc	Structured Plan→Exec→Verify→Recover
Token tracking	Best-effort	Built-in per-phase tracking
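Per-phase token tracking can be as simple as keeping prompt and completion counters keyed by phase name. The sketch below is a hypothetical illustration; `PhaseTokenTracker` and `record` are not part of VerifyLoop's documented API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class PhaseTokenTracker:
    """Hypothetical tracker: accumulates token counts per pipeline phase."""

    def __init__(self) -> None:
        # Missing phases start at zero thanks to TokenUsage's defaults
        self.by_phase: dict[str, TokenUsage] = defaultdict(TokenUsage)

    def record(self, phase: str, prompt_tokens: int, completion_tokens: int) -> None:
        usage = self.by_phase[phase]
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def total(self) -> TokenUsage:
        # Sum every phase into one overall TokenUsage
        overall = TokenUsage()
        for usage in self.by_phase.values():
            overall.prompt_tokens += usage.prompt_tokens
            overall.completion_tokens += usage.completion_tokens
        return overall
```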

Quick Start

Install

pip install verifyloop

CLI

# Run a task
vl run "add authentication to app.py"

# Run from a task file
vl run --task-file tasks/fix_bug.json

# Interactive mode (confirm each step)
vl run --interactive "refactor the database layer"

# Specify models
vl run --model gpt-4o --verify-model reason-critic-7b "write tests"

# Dry run (plan only, don't execute)
vl run --dry-run "create a REST API"

# Limit iterations
vl run --max-iterations 3 "fix the flaky test"

# Docker sandbox for bash commands
vl run --sandbox "install dependencies and run tests"

Python API

import asyncio
from verifyloop import AgentPipeline, PipelineConfig

async def main():
    config = PipelineConfig(
        model="gpt-4o",
        verify_model="reason-critic-7b",
        max_iterations=5,
        confidence_threshold=0.8,
    )

    pipeline = AgentPipeline(config)

    # Stream events
    async def on_event(event, data):
        print(f"[{event}] {data}")

    pipeline.on_event(on_event)

    result = await pipeline.run(
        task="Add a hello() function to app.py",
        context="Python project with a Flask web app",
    )

    print(f"Status: {result.status}")
    print(f"Steps: {len(result.steps)}")
    print(f"Duration: {result.duration_seconds:.2f}s")

asyncio.run(main())
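`pipeline.on_event` registers async callbacks that receive `(event, data)`. A minimal dispatcher with the same callback shape might look like this; `EventBus` and `emit` are hypothetical names, and VerifyLoop's own dispatch may differ (for example, in ordering or error handling).

```python
import asyncio
from typing import Any, Awaitable, Callable

Callback = Callable[[str, dict[str, Any]], Awaitable[None]]


class EventBus:
    """Hypothetical dispatcher mirroring the on_event(callback) pattern."""

    def __init__(self) -> None:
        self._callbacks: list[Callback] = []

    def on_event(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    async def emit(self, event: str, data: dict[str, Any]) -> None:
        # Await each registered callback in registration order
        for callback in self._callbacks:
            await callback(event, data)


async def demo_events() -> list[str]:
    seen: list[str] = []

    async def on_event(event: str, data: dict[str, Any]) -> None:
        seen.append(f"[{event}] {data}")

    bus = EventBus()
    bus.on_event(on_event)
    await bus.emit("plan", {"steps": 2})
    await bus.emit("verify", {"confidence": 0.9})
    return seen


print(asyncio.run(demo_events()))
```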

Individual Components

import asyncio
from verifyloop import PlanGenerator, Executor, Verifier, VerifierConfig, Recoverer

async def main():
    # Use components individually; each method is a coroutine, so call it inside an async function
    planner = PlanGenerator(model="gpt-4o")
    plan = await planner.generate_plan("Fix the login bug in auth.py")

    executor = Executor(working_dir=".")
    step = await executor.bash("pytest tests/")

    verifier = Verifier(VerifierConfig(verify_model="reason-critic-7b"))
    result = await verifier.verify_file_state("auth.py", expected_content="def login()")

    recoverer = Recoverer(model="gpt-4o")
    recovery = await recoverer.recover("FileNotFoundError: auth.py not found")

asyncio.run(main())

API Reference

`PipelineConfig`

Field	Type	Default	Description
`model`	`str`	`"gpt-4o"`	LLM model for planning/recovery
`verify_model`	`str`	`"reason-critic-7b"`	Trained verification model
`max_iterations`	`int`	`5`	Max Plan→Execute→Verify loops
`confidence_threshold`	`float`	`0.8`	Minimum confidence to accept result
`max_recovery_attempts`	`int`	`3`	Max recovery attempts per iteration
`working_dir`	`str`	`"."`	Working directory for file ops
`dry_run`	`bool`	`False`	Plan only, don't execute
`interactive`	`bool`	`False`	Confirm each step before execution
`sandbox`	`bool`	`False`	Run bash in Docker container
`sandbox_image`	`str`	`"python:3.11-slim"`	Docker image for sandbox
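For reference, the defaults in the table above map naturally onto a dataclass. This is an illustrative mirror of the documented fields, not the package's actual `PipelineConfig`; the name `PipelineConfigSketch` and the range checks in `__post_init__` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfigSketch:
    """Hypothetical mirror of the documented PipelineConfig defaults."""
    model: str = "gpt-4o"
    verify_model: str = "reason-critic-7b"
    max_iterations: int = 5
    confidence_threshold: float = 0.8
    max_recovery_attempts: int = 3
    working_dir: str = "."
    dry_run: bool = False
    interactive: bool = False
    sandbox: bool = False
    sandbox_image: str = "python:3.11-slim"

    def __post_init__(self) -> None:
        # Assumed validation: the threshold is a probability and the loop must be bounded
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0.0, 1.0]")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
```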

`AgentPipeline`

pipeline = AgentPipeline(config)

# Run a task
result: AgentRun = await pipeline.run(task, context, max_iterations)

# Register event callbacks
pipeline.on_event(callback)  # async def callback(event: str, data: dict)

# Access token usage
print(pipeline.token_usage)

`AgentRun`

Field	Type	Description
`task`	`str`	Original task description
`steps`	`list[Step]`	All plan/execute/verify/recover steps
`status`	`RunStatus`	`pending` / `planning` / `executing` / `verifying` / `recovering` / `completed` / `failed`
`token_usage`	`TokenUsage`	Prompt + completion token counts
`duration_seconds`	`float`	Total wall-clock time
`iteration`	`int`	Iteration on which the run finished
`metadata`	`dict`	Additional metadata
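The `status` values listed above form a small state machine. A hypothetical enum with the same string values, plus an assumed transition table that follows the architecture diagram, could look like this; the transition rules are an illustration, not documented behavior.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "pending"
    PLANNING = "planning"
    EXECUTING = "executing"
    VERIFYING = "verifying"
    RECOVERING = "recovering"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions, following the Plan -> Execute -> Verify -> Recover diagram.
# PLANNING -> COMPLETED covers a dry run, which plans but never executes.
ALLOWED = {
    RunStatus.PENDING: {RunStatus.PLANNING},
    RunStatus.PLANNING: {RunStatus.EXECUTING, RunStatus.COMPLETED, RunStatus.FAILED},
    RunStatus.EXECUTING: {RunStatus.VERIFYING, RunStatus.FAILED},
    RunStatus.VERIFYING: {RunStatus.COMPLETED, RunStatus.RECOVERING, RunStatus.FAILED},
    RunStatus.RECOVERING: {RunStatus.EXECUTING, RunStatus.FAILED},
    RunStatus.COMPLETED: set(),
    RunStatus.FAILED: set(),
}


def can_transition(current: RunStatus, nxt: RunStatus) -> bool:
    return nxt in ALLOWED[current]
```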

`Executor`

executor = Executor(working_dir=".", sandbox=False)

# Tools
result = await executor.bash("ls -la")
result = await executor.read("app.py")
result = await executor.write("new_file.py", content)
result = await executor.edit("app.py", old_content, new_content)
result = await executor.web_search("python requests library")
result = await executor.web_fetch("https://example.com/docs")

# File history and rollback
history = executor.get_file_history("app.py")
executor.rollback_file("app.py")
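`get_file_history` and `rollback_file` suggest that the executor snapshots a file before each write. One minimal way to get that behavior, assuming a per-path stack of previous contents kept in memory, is shown below; `FileHistory` is a hypothetical class, not VerifyLoop's executor.

```python
from __future__ import annotations

from pathlib import Path


class FileHistory:
    """Hypothetical snapshot-before-write helper with single-step rollback."""

    def __init__(self) -> None:
        # path -> stack of previous contents; None marks "file did not exist yet"
        self._history: dict[str, list[str | None]] = {}

    def write(self, path: str, content: str) -> None:
        p = Path(path)
        previous = p.read_text() if p.exists() else None
        self._history.setdefault(path, []).append(previous)
        p.write_text(content)

    def get_file_history(self, path: str) -> list[str | None]:
        return list(self._history.get(path, []))

    def rollback_file(self, path: str) -> bool:
        stack = self._history.get(path)
        if not stack:
            return False  # nothing to undo
        previous = stack.pop()
        p = Path(path)
        if previous is None:
            p.unlink(missing_ok=True)  # the write created the file, so remove it
        else:
            p.write_text(previous)
        return True
```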

`Verifier`

verifier = Verifier(VerifierConfig(
    verify_model="reason-critic-7b",
    confidence_threshold=0.8,
    prefer_trained_model=True,
))

# Verification methods
result = await verifier.verify_code_edits(plan, execute_steps)
result = await verifier.verify_bash_output("pytest", output, expected="passed")
result = await verifier.verify_file_state("app.py", expected_content="def hello")
result = await verifier.verify_tests("pytest tests/", working_dir=".")
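Each `verify_*` call returns a result carrying a numeric confidence rather than a bare pass/fail. A hypothetical result type and acceptance check follow; the field names `issues` and `suggestions` are assumptions, and `naive_bash_check` is a toy substring match standing in for the trained model, not how ReasonCritic scores output.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationResultSketch:
    """Hypothetical verifier output: a score plus actionable detail."""
    confidence: float  # 0.0 (certainly wrong) to 1.0 (certainly right)
    issues: list[str] = field(default_factory=list)       # what looked wrong
    suggestions: list[str] = field(default_factory=list)  # how to fix it

    def passed(self, threshold: float = 0.8) -> bool:
        # Accept only when confidence meets the threshold (0.8 mirrors the documented default)
        return self.confidence >= threshold


def naive_bash_check(output: str, expected: str) -> VerificationResultSketch:
    """Toy stand-in for a trained verifier: substring match, not a learned model."""
    if expected in output:
        return VerificationResultSketch(confidence=0.9)
    return VerificationResultSketch(
        confidence=0.2,
        issues=[f"expected {expected!r} not found in output"],
        suggestions=["inspect the failing command output and re-run"],
    )
```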

`Recoverer`

recoverer = Recoverer(model="gpt-4o", max_recovery_attempts=3)

# Recovery with pattern matching
recovery = await recoverer.recover(
    error="SyntaxError: invalid syntax",
    context="File: app.py, Line 42",
    attempt=1,
)

# Pattern types: edit, create, retry, simplify, analyze
print(recovery.recovery_type)   # "edit"
print(recovery.recovery_attempt) # "Fix syntax error in the file"
print(recovery.exhausted)        # False

# Check if retry is worthwhile
should_retry = recoverer.should_retry("TimeoutError", attempt=2)  # True
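Pattern-matched recovery amounts to mapping error text onto one of the five recovery types (`edit`, `create`, `retry`, `simplify`, `analyze`) before falling back to an LLM. The regexes, the mapping, and the `RecoverySketch` type below are assumptions chosen for illustration; VerifyLoop's actual pattern table is not documented here.

```python
import re
from dataclasses import dataclass

# Assumed error-pattern table; the first match wins
PATTERNS = [
    (re.compile(r"SyntaxError|IndentationError"), "edit", "Fix syntax error in the file"),
    (re.compile(r"FileNotFoundError|No such file"), "create", "Create the missing file"),
    (re.compile(r"TimeoutError|ConnectionError"), "retry", "Retry the operation"),
    (re.compile(r"MemoryError|RecursionError"), "simplify", "Simplify the approach"),
]
TRANSIENT = re.compile(r"TimeoutError|ConnectionError")


@dataclass
class RecoverySketch:
    recovery_type: str
    recovery_attempt: str
    exhausted: bool


def recover(error: str, attempt: int, max_attempts: int = 3) -> RecoverySketch:
    if attempt > max_attempts:
        return RecoverySketch("analyze", "Give up and report the error", exhausted=True)
    for pattern, kind, action in PATTERNS:
        if pattern.search(error):
            return RecoverySketch(kind, action, exhausted=False)
    # No known pattern: hand the error to an LLM for analysis
    return RecoverySketch("analyze", "Analyze the error with an LLM", exhausted=False)


def should_retry(error: str, attempt: int, max_attempts: int = 3) -> bool:
    # Only transient errors are worth retrying, and only while attempts remain
    return attempt < max_attempts and bool(TRANSIENT.search(error))
```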

`InMemoryStore` / `FileStore`

from verifyloop import InMemoryStore, FileStore

# In-memory (default)
memory = InMemoryStore()
await memory.store("key", {"data": "value"})
result = await memory.retrieve("key")
results = await memory.search("value")

# Persistent file storage
memory = FileStore(base_dir=".verifyloop_memory")
await memory.store("key", {"data": "value"}, namespace="project1")
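The store interface is small: async `store`, `retrieve`, and `search`, with an optional namespace. A minimal in-memory version follows, assuming `search` is a case-insensitive substring match over JSON-serialized values (the real matching rule is not documented); `InMemoryStoreSketch` is a hypothetical name.

```python
import asyncio
import json
from typing import Any, Optional


class InMemoryStoreSketch:
    """Hypothetical namespaced key-value store with the same async method names."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    async def store(self, key: str, value: Any, namespace: str = "default") -> None:
        self._data[(namespace, key)] = value

    async def retrieve(self, key: str, namespace: str = "default") -> Optional[Any]:
        return self._data.get((namespace, key))

    async def search(self, query: str, namespace: str = "default") -> list[Any]:
        # Assumed matching rule: case-insensitive substring over the JSON form
        q = query.lower()
        return [
            value
            for (ns, _key), value in self._data.items()
            if ns == namespace and q in json.dumps(value).lower()
        ]


async def demo_store() -> None:
    memory = InMemoryStoreSketch()
    await memory.store("key", {"data": "value"})
    print(await memory.retrieve("key"))  # {'data': 'value'}
    print(await memory.search("value"))  # [{'data': 'value'}]


asyncio.run(demo_store())
```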

`ConversationContext`

from verifyloop.memory import ConversationContext

ctx = ConversationContext()
ctx.add_message("user", "Fix the bug in main.py")
ctx.add_file_context("main.py", "def broken():\n    return 1/0")

# Build context string for LLM
context = ctx.build_context_string()
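`build_context_string` flattens messages and file snapshots into one prompt string. A hypothetical version follows, assuming files are rendered before the conversation and that a character budget drops the oldest messages first; neither ordering nor truncation policy is documented, and `ConversationContextSketch` is not VerifyLoop's class.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationContextSketch:
    """Hypothetical context builder; not VerifyLoop's ConversationContext."""
    max_chars: int = 4000
    messages: list[tuple[str, str]] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def add_file_context(self, path: str, content: str) -> None:
        self.files[path] = content  # the latest snapshot of each file wins

    def build_context_string(self) -> str:
        file_part = "".join(f"### File: {p}\n{c}\n\n" for p, c in self.files.items())
        lines = [f"{role}: {content}" for role, content in self.messages]
        # Drop the oldest messages until everything fits the budget
        while lines and len(file_part) + len("\n".join(lines)) > self.max_chars:
            lines.pop(0)
        return file_part + "\n".join(lines)
```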

Configuration

Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key (for GPT models)
`ANTHROPIC_API_KEY`	Anthropic API key (for Claude models)
`VERIFYLOOP_VERIFY_MODEL`	Override the verification model
`VERIFYLOOP_CONFIDENCE`	Override confidence threshold (0.0-1.0)
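The two `VERIFYLOOP_*` variables act as overrides on top of configured values. A sketch of how such overrides could be read and validated follows; `resolve_verify_settings` is a hypothetical helper, and both the precedence (environment wins over the passed-in value) and the rejection of out-of-range thresholds are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_verify_settings(
    verify_model: str = "reason-critic-7b",
    confidence: float = 0.8,
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, float]:
    """Hypothetical helper: apply VERIFYLOOP_* overrides to the given values."""
    env = os.environ if env is None else env
    model = env.get("VERIFYLOOP_VERIFY_MODEL", verify_model)
    raw = env.get("VERIFYLOOP_CONFIDENCE")
    if raw is not None:
        value = float(raw)  # raises ValueError on non-numeric input
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"VERIFYLOOP_CONFIDENCE must be in [0.0, 1.0], got {value}")
        confidence = value
    return model, confidence
```

Passing `env` explicitly keeps the function testable without mutating the real process environment.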

Task File Format

{
  "task": "Add authentication to app.py",
  "context": "Flask application with a login route",
  "model": "gpt-4o",
  "verify_model": "reason-critic-7b",
  "max_iterations": 3
}
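A task file is plain JSON whose keys match the pipeline options. A loader sketch that requires `task` and fills the rest from the documented defaults follows; the `TaskSpec` and `load_task` names, the empty default for `context`, and the rejection of unknown keys are assumptions, not `vl`'s actual behavior.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TaskSpec:
    """Hypothetical parsed task file."""
    task: str
    context: str = ""
    model: str = "gpt-4o"
    verify_model: str = "reason-critic-7b"
    max_iterations: int = 5


def load_task(text: str) -> TaskSpec:
    data = json.loads(text)
    if "task" not in data:
        raise ValueError("task file must contain a 'task' field")
    # Reject typos such as "max_iteration" instead of silently ignoring them
    known = {f.name for f in fields(TaskSpec)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return TaskSpec(**data)
```

To read from disk, pass the file's text, e.g. `load_task(Path("tasks/fix_bug.json").read_text())` with `from pathlib import Path`.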

Comparison with Other Agent Frameworks

vs. AutoGPT / BabyAGI

Aspect	AutoGPT	VerifyLoop
Planning	Single prompt	Decomposed substeps with tool estimation
Verification	None	Trained model with confidence scoring
Recovery	Basic retry	Pattern-matched strategies (5 types)
Loop control	Infinite loop risk	Bounded iterations + convergence check

vs. LangChain Agents

Aspect	LangChain	VerifyLoop
Verification	LLM-as-judge (same model)	Dedicated trained verification model
Structured output	Optional	Enforced via Pydantic models
Recovery	Chain retries	Typed recovery with strategy selection
Token tracking	Callback-based	Built-in per-phase tracking

vs. Claude Code / Cursor

Aspect	Claude Code	VerifyLoop
Verification	Same model self-review	Dedicated ReasonCritic model
Recovery	Re-prompt	Pattern-matched with LLM fallback
Programmatic	Limited CLI	Full Python API + CLI
Extensibility	Plugin system	Tool interface + plugin system

Verification Model: ReasonCritic

ReasonCritic is the key differentiator: a model trained specifically for verification.

Not a prompt: it is fine-tuned on verification tasks (code review, test analysis, output comparison).
Graceful fallback: if ReasonCritic is unavailable, VerifyLoop falls back to a general LLM with structured verification prompts.
Confidence scoring: a numeric 0-1 confidence score, not a binary pass/fail.
Actionable failures: every failure comes with fix suggestions, not just "it broke".
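The fallback described above is a try-the-specialist-first pattern. A minimal sketch follows, with both verifiers as hypothetical async callables; `verify_with_fallback` is not a VerifyLoop function, and treating `ConnectionError` and `TimeoutError` as "unavailable" is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

Verify = Callable[[str], Awaitable[float]]


async def verify_with_fallback(
    output: str, trained: Verify, general: Verify, prefer_trained_model: bool = True
) -> tuple[float, str]:
    """Hypothetical: return (confidence, source), falling back to a general LLM verifier."""
    if prefer_trained_model:
        try:
            return await trained(output), "trained"
        except (ConnectionError, TimeoutError):
            pass  # trained model unreachable: fall through to the prompt-based verifier
    return await general(output), "fallback"


async def demo_fallback() -> tuple[float, str]:
    async def offline(_: str) -> float:
        raise ConnectionError("ReasonCritic endpoint unreachable")

    async def prompt_based(_: str) -> float:
        return 0.7

    return await verify_with_fallback("5 passed", offline, prompt_based)


print(asyncio.run(demo_fallback()))  # (0.7, 'fallback')
```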

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project	Description
Anvil	Self-verified coding agent
VerifyLoop	Plan→Execute→Verify→Recover framework
ErrorRecovery	Self-healing middleware (3,725 error patterns)
FableForge-14B	The fine-tuned 14B model (4-stage training)
ShellWhisperer	1.5B edge agent (phone/RPi, 50ms)
ReasonCritic	Verification model (130 benchmark tasks)
TraceCompiler	Compile traces → LoRA skills
AgentRuntime	Persistent agent daemon (systemd for AI)
AgentSwarm	Multi-agent from real trace transitions
AgentTelemetry	Datadog for agents (token tracking, costs)
BenchAgent	HumanEval for tool-use (107 tasks)
AgentDev	VSCode extension with verification
TraceViz	Trace replay visualizer (Next.js)
AgentSkills	npm for agent behaviors
AgentCurriculum	5-stage progressive training
AgentFuzzer	Adversarial testing for agents
AgentConstitution	Safety guardrails from traces
CostOptimizer	Token cost reduction (50-80%)
AgentProfiler	Behavioral fingerprinting
TrajectoryDistiller	Trace→training data pipeline
Fable5-Dataset	HuggingFace dataset release

Project details

This version

0.1.0

Jun 14, 2026

Download files

Source Distribution

verifyloop-0.1.0.tar.gz (26.1 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

verifyloop-0.1.0-py3-none-any.whl (26.5 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file verifyloop-0.1.0.tar.gz.

File metadata

Download URL: verifyloop-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 26.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for verifyloop-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1feb422a6834ee46dffc138020a76579e1dd9ee4784034541bcc7d3b79e18455`
MD5	`fc67d0fb58db6feeb74d030073323fe0`
BLAKE2b-256	`f52ea70b5ded3f750e1a36c2059d0ff712eec38a871a78fce8e675f6d0a51fc6`

File details

Details for the file verifyloop-0.1.0-py3-none-any.whl.

File metadata

Download URL: verifyloop-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 26.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for verifyloop-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3314f971935bd9637ab41708a8e0642f41c40dfad31a74f89b9b61abf6a32ae8`
MD5	`fa809cb0105444f03362017ce4b1719f`
BLAKE2b-256	`231340a713ecf2c23e68307dd24b5ffe8eb522c71b3f31a0660eea1594458183`
