Agent framework implementing Plan → Execute → Verify → Recover with trained verification
VerifyLoop
The Instagram moment for agents. Plan → Execute → Verify → Recover.
VerifyLoop is an agent framework where the verify step uses a trained model, not a prompt. Most agent frameworks that verify at all do so with the same LLM that generated the code. That's like asking the person who wrote the bug to confirm there's no bug.
Architecture
┌─────────────────────────────────────────────────────────┐
│                      AgentPipeline                      │
│                                                         │
│  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────┐ │
│  │  PLAN   │───▶│ EXECUTE  │───▶│ VERIFY  │───▶│ DONE │ │
│  │         │    │          │    │         │    │  ✓   │ │
│  └─────────┘    └──────────┘    └────┬────┘    └──────┘ │
│                                      │                  │
│                               ┌──────▼──────┐           │
│                               │ Confidence  │           │
│                               │   < 0.8 ?   │           │
│                               └──────┬──────┘           │
│                                      │ Yes              │
│                               ┌──────▼──────┐           │
│                               │   RECOVER   │           │
│                               │ Fix errors  │           │
│                               └──────┬──────┘           │
│                                      │                  │
│                            Loop back to EXECUTE         │
└─────────────────────────────────────────────────────────┘
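The control flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not VerifyLoop's implementation: `run_loop` and the `fake_*` helpers are hypothetical stand-ins, and only the 0.8 threshold and the bounded iteration count come from this README.

```python
from typing import Callable, Tuple


def run_loop(
    plan: str,
    execute: Callable[[str], str],
    verify: Callable[[str], float],
    recover: Callable[[str, str], str],
    confidence_threshold: float = 0.8,
    max_iterations: int = 5,
) -> Tuple[str, int]:
    """Return (status, iterations_used) for a bounded verify loop."""
    for iteration in range(1, max_iterations + 1):
        output = execute(plan)
        confidence = verify(output)
        if confidence >= confidence_threshold:
            return "completed", iteration
        # Below the threshold: revise the plan, then loop back to execute.
        plan = recover(plan, output)
    return "failed", max_iterations


# Toy stand-ins (hypothetical): the first attempt fails, the recovered plan passes.
def fake_execute(plan: str) -> str:
    return "ok" if "fixed" in plan else "error"


def fake_verify(output: str) -> float:
    return 0.95 if output == "ok" else 0.3


def fake_recover(plan: str, output: str) -> str:
    return plan + " (fixed)"


status, used = run_loop("add hello()", fake_execute, fake_verify, fake_recover)
print(status, used)  # completed 2
```

Bounding the loop with `max_iterations` is what keeps a task that never reaches the threshold from running forever.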
Why VerifyLoop is different
| Feature | Other Agents | VerifyLoop |
|---|---|---|
| Verification | LLM prompt (same model) | Trained ReasonCritic model |
| Error recovery | Retry or re-prompt | Pattern-matched recovery strategies |
| Confidence scoring | None or vibes | Numeric confidence threshold |
| Recovery loop | None or ad-hoc | Structured Plan→Exec→Verify→Recover |
| Token tracking | Best-effort | Built-in per-phase tracking |
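Per-phase token tracking could look roughly like the sketch below. `PhaseUsage` and `PhaseTokenTracker` are hypothetical names used for illustration; the README only states that prompt and completion token counts are tracked per phase.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PhaseUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class PhaseTokenTracker:
    """Accumulates token counts keyed by phase name (plan, execute, verify, recover)."""

    def __init__(self) -> None:
        self._usage: Dict[str, PhaseUsage] = {}

    def record(self, phase: str, prompt_tokens: int, completion_tokens: int) -> None:
        usage = self._usage.setdefault(phase, PhaseUsage())
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def total_for(self, phase: str) -> int:
        # Phases that were never recorded count as zero.
        return self._usage.get(phase, PhaseUsage()).total

    def grand_total(self) -> int:
        return sum(usage.total for usage in self._usage.values())


tracker = PhaseTokenTracker()
tracker.record("plan", prompt_tokens=120, completion_tokens=80)
tracker.record("verify", prompt_tokens=50, completion_tokens=10)
tracker.record("plan", prompt_tokens=30, completion_tokens=20)
print(tracker.total_for("plan"), tracker.grand_total())  # 250 310
```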
Quick Start
Install
pip install verifyloop
CLI
# Run a task
vl run "add authentication to app.py"
# Run from a task file
vl run --task-file tasks/fix_bug.json
# Interactive mode (confirm each step)
vl run --interactive "refactor the database layer"
# Specify models
vl run --model gpt-4o --verify-model reason-critic-7b "write tests"
# Dry run (plan only, don't execute)
vl run --dry-run "create a REST API"
# Limit iterations
vl run --max-iterations 3 "fix the flaky test"
# Docker sandbox for bash commands
vl run --sandbox "install dependencies and run tests"
Python API
import asyncio
from verifyloop import AgentPipeline, PipelineConfig

async def main():
    config = PipelineConfig(
        model="gpt-4o",
        verify_model="reason-critic-7b",
        max_iterations=5,
        confidence_threshold=0.8,
    )
    pipeline = AgentPipeline(config)

    # Stream events
    async def on_event(event, data):
        print(f"[{event}] {data}")

    pipeline.on_event(on_event)

    result = await pipeline.run(
        task="Add a hello() function to app.py",
        context="Python project with a Flask web app",
    )
    print(f"Status: {result.status}")
    print(f"Steps: {len(result.steps)}")
    print(f"Duration: {result.duration_seconds:.2f}s")

asyncio.run(main())
Individual Components
import asyncio
from verifyloop import PlanGenerator, Executor, Verifier, VerifierConfig, Recoverer

# Use components individually (wrapped in a coroutine, since `await` needs one)
async def main():
    planner = PlanGenerator(model="gpt-4o")
    plan = await planner.generate_plan("Fix the login bug in auth.py")

    executor = Executor(working_dir=".")
    step = await executor.bash("pytest tests/")

    verifier = Verifier(VerifierConfig(verify_model="reason-critic-7b"))
    result = await verifier.verify_file_state("auth.py", expected_content="def login()")

    recoverer = Recoverer(model="gpt-4o")
    recovery = await recoverer.recover("FileNotFoundError: auth.py not found")

asyncio.run(main())
API Reference
PipelineConfig
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `"gpt-4o"` | LLM model for planning/recovery |
| `verify_model` | `str` | `"reason-critic-7b"` | Trained verification model |
| `max_iterations` | `int` | `5` | Max Plan→Execute→Verify loops |
| `confidence_threshold` | `float` | `0.8` | Minimum confidence to accept result |
| `max_recovery_attempts` | `int` | `3` | Max recovery attempts per iteration |
| `working_dir` | `str` | `"."` | Working directory for file ops |
| `dry_run` | `bool` | `False` | Plan only, don't execute |
| `interactive` | `bool` | `False` | Confirm each step before execution |
| `sandbox` | `bool` | `False` | Run bash in Docker container |
| `sandbox_image` | `str` | `"python:3.11-slim"` | Docker image for sandbox |
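To make the defaults concrete, here is a stand-alone mirror of the table as a dataclass. It is a sketch only: `PipelineConfigSketch` is a hypothetical name, the real `PipelineConfig` may be implemented differently (the README mentions Pydantic models elsewhere), and the two validation rules are assumptions rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfigSketch:
    # Field names and defaults copied from the table above.
    model: str = "gpt-4o"
    verify_model: str = "reason-critic-7b"
    max_iterations: int = 5
    confidence_threshold: float = 0.8
    max_recovery_attempts: int = 3
    working_dir: str = "."
    dry_run: bool = False
    interactive: bool = False
    sandbox: bool = False
    sandbox_image: str = "python:3.11-slim"

    def __post_init__(self) -> None:
        # Assumed validation: confidence is a 0.0-1.0 score, loops are bounded.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")


strict = PipelineConfigSketch(confidence_threshold=0.9, max_iterations=3)
print(strict.verify_model, strict.confidence_threshold)  # reason-critic-7b 0.9
```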
AgentPipeline
pipeline = AgentPipeline(config)
# Run a task
result: AgentRun = await pipeline.run(task, context, max_iterations)
# Register event callbacks
pipeline.on_event(callback) # async def callback(event: str, data: dict)
# Access token usage
print(pipeline.token_usage)
AgentRun
| Field | Type | Description |
|---|---|---|
| `task` | `str` | Original task description |
| `steps` | `list[Step]` | All plan/execute/verify/recover steps |
| `status` | `RunStatus` | pending / planning / executing / verifying / recovering / completed / failed |
| `token_usage` | `TokenUsage` | Prompt + completion token counts |
| `duration_seconds` | `float` | Total wall-clock time |
| `iteration` | `int` | Which iteration completed |
| `metadata` | `dict` | Additional metadata |
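The status values listed above map naturally onto an enum. The sketch below is illustrative: `RunStatusSketch` and `is_terminal` are hypothetical names, and treating `completed` and `failed` as the only terminal states is an assumption based on the names alone.

```python
from enum import Enum


class RunStatusSketch(str, Enum):
    PENDING = "pending"
    PLANNING = "planning"
    EXECUTING = "executing"
    VERIFYING = "verifying"
    RECOVERING = "recovering"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: a run stops changing state once it is completed or failed.
TERMINAL_STATES = frozenset({RunStatusSketch.COMPLETED, RunStatusSketch.FAILED})


def is_terminal(status: RunStatusSketch) -> bool:
    return status in TERMINAL_STATES


print(is_terminal(RunStatusSketch("verifying")))  # False
print(is_terminal(RunStatusSketch("completed")))  # True
```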
Executor
executor = Executor(working_dir=".", sandbox=False)
# Tools
result = await executor.bash("ls -la")
result = await executor.read("app.py")
result = await executor.write("new_file.py", content)
result = await executor.edit("app.py", old_content, new_content)
result = await executor.web_search("python requests library")
result = await executor.web_fetch("https://example.com/docs")
# File history and rollback
history = executor.get_file_history("app.py")
executor.rollback_file("app.py")
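One way file history and rollback can work is to snapshot the previous content before every write. The class below is a self-contained sketch of that idea, not the `Executor` implementation: `FileHistorySketch` is a hypothetical name, and its synchronous `write` differs from the async API shown above.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class FileHistorySketch:
    def __init__(self, working_dir: str) -> None:
        self.root = Path(working_dir)
        # Per file: content before each write (None means the file did not exist).
        self._history: Dict[str, List[Optional[str]]] = {}

    def write(self, name: str, content: str) -> None:
        path = self.root / name
        previous = path.read_text() if path.exists() else None
        self._history.setdefault(name, []).append(previous)
        path.write_text(content)

    def get_file_history(self, name: str) -> List[Optional[str]]:
        return list(self._history.get(name, []))

    def rollback_file(self, name: str) -> bool:
        """Undo the most recent write; return False if there is nothing to undo."""
        snapshots = self._history.get(name)
        if not snapshots:
            return False
        previous = snapshots.pop()
        path = self.root / name
        if previous is None:
            path.unlink(missing_ok=True)  # the file was created by that write
        else:
            path.write_text(previous)
        return True


with tempfile.TemporaryDirectory() as tmp:
    files = FileHistorySketch(tmp)
    files.write("app.py", "v1")
    files.write("app.py", "v2")
    files.rollback_file("app.py")
    restored = (Path(tmp) / "app.py").read_text()
    print(restored)  # v1
```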
Verifier
verifier = Verifier(VerifierConfig(
    verify_model="reason-critic-7b",
    confidence_threshold=0.8,
    prefer_trained_model=True,
))
# Verification methods
result = await verifier.verify_code_edits(plan, execute_steps)
result = await verifier.verify_bash_output("pytest", output, expected="passed")
result = await verifier.verify_file_state("app.py", expected_content="def hello")
result = await verifier.verify_tests("pytest tests/", working_dir=".")
Recoverer
recoverer = Recoverer(model="gpt-4o", max_recovery_attempts=3)
# Recovery with pattern matching
recovery = await recoverer.recover(
    error="SyntaxError: invalid syntax",
    context="File: app.py, Line 42",
    attempt=1,
)
# Pattern types: edit, create, retry, simplify, analyze
print(recovery.recovery_type) # "edit"
print(recovery.recovery_attempt) # "Fix syntax error in the file"
print(recovery.exhausted) # False
# Check if retry is worthwhile
should_retry = recoverer.should_retry("TimeoutError", attempt=2) # True
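Pattern-matched recovery can be pictured as a table of regular expressions mapped to the five recovery types. The sketch below is an assumption-laden illustration: the patterns, the descriptions other than the syntax-error one, the `analyze` default, and the `exhausted` rule are all invented here, and the real `Recoverer` may choose strategies differently.

```python
import re
from dataclasses import dataclass


@dataclass
class RecoverySketch:
    recovery_type: str
    recovery_attempt: str
    exhausted: bool


# Hypothetical pattern table: (regex, recovery type, description).
_PATTERNS = [
    (re.compile(r"SyntaxError|IndentationError"), "edit", "Fix syntax error in the file"),
    (re.compile(r"FileNotFoundError|No such file"), "create", "Create the missing file"),
    (re.compile(r"TimeoutError|ConnectionError"), "retry", "Retry the step"),
    (re.compile(r"RecursionError|MemoryError"), "simplify", "Simplify the approach"),
]


def recover(error: str, attempt: int = 1, max_recovery_attempts: int = 3) -> RecoverySketch:
    # Assumption: attempts are 1-based and exhausted once they exceed the limit.
    exhausted = attempt > max_recovery_attempts
    for pattern, recovery_type, description in _PATTERNS:
        if pattern.search(error):
            return RecoverySketch(recovery_type, description, exhausted)
    # No pattern matched: fall back to a general analysis step.
    return RecoverySketch("analyze", "Analyze the error in more depth", exhausted)


recovery = recover("SyntaxError: invalid syntax", attempt=1)
print(recovery.recovery_type, recovery.exhausted)  # edit False
```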
InMemoryStore / FileStore
from verifyloop import InMemoryStore, FileStore
# In-memory (default)
memory = InMemoryStore()
await memory.store("key", {"data": "value"})
result = await memory.retrieve("key")
results = await memory.search("value")
# Persistent file storage
memory = FileStore(base_dir=".verifyloop_memory")
await memory.store("key", {"data": "value"}, namespace="project1")
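For intuition, an in-memory store with this call shape can be sketched in a few lines. `InMemoryStoreSketch` is a hypothetical stand-in: the namespace default and the substring-based `search` are assumptions, and the real stores may index and match differently.

```python
import asyncio
import json
from typing import Any, Dict, List, Optional


class InMemoryStoreSketch:
    def __init__(self) -> None:
        # namespace -> key -> value
        self._data: Dict[str, Dict[str, Any]] = {}

    async def store(self, key: str, value: Any, namespace: str = "default") -> None:
        self._data.setdefault(namespace, {})[key] = value

    async def retrieve(self, key: str, namespace: str = "default") -> Optional[Any]:
        return self._data.get(namespace, {}).get(key)

    async def search(self, query: str, namespace: str = "default") -> List[str]:
        """Return keys whose name or JSON-serialized value contains the query."""
        items = self._data.get(namespace, {}).items()
        return [
            key
            for key, value in items
            if query in key or query in json.dumps(value, default=str)
        ]


async def demo() -> List[str]:
    memory = InMemoryStoreSketch()
    await memory.store("key", {"data": "value"})
    return await memory.search("value")


found = asyncio.run(demo())
print(found)  # ['key']
```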
ConversationContext
from verifyloop.memory import ConversationContext
ctx = ConversationContext()
ctx.add_message("user", "Fix the bug in main.py")
ctx.add_file_context("main.py", "def broken():\n return 1/0")
# Build context string for LLM
context = ctx.build_context_string()
Configuration
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key (for GPT models) |
| `ANTHROPIC_API_KEY` | Anthropic API key (for Claude models) |
| `VERIFYLOOP_VERIFY_MODEL` | Override the verification model |
| `VERIFYLOOP_CONFIDENCE` | Override confidence threshold (0.0-1.0) |
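The two `VERIFYLOOP_*` overrides can be resolved against defaults as in the sketch below. `resolve_verify_settings` is a hypothetical helper, not part of the package, and rejecting out-of-range values with `ValueError` is an assumption; the README does not say how the library handles invalid values.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_verify_settings(
    env: Optional[Mapping[str, str]] = None,
    default_model: str = "reason-critic-7b",
    default_confidence: float = 0.8,
) -> Tuple[str, float]:
    """Return (verify_model, confidence_threshold) after applying env overrides."""
    env = os.environ if env is None else env
    model = env.get("VERIFYLOOP_VERIFY_MODEL", default_model)
    raw_confidence = env.get("VERIFYLOOP_CONFIDENCE")
    if raw_confidence is None:
        return model, default_confidence
    confidence = float(raw_confidence)  # raises ValueError on non-numeric input
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("VERIFYLOOP_CONFIDENCE must be between 0.0 and 1.0")
    return model, confidence


# Passing a dict instead of os.environ keeps the example deterministic.
settings = resolve_verify_settings({"VERIFYLOOP_CONFIDENCE": "0.9"})
print(settings)  # ('reason-critic-7b', 0.9)
```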
Task File Format
{
  "task": "Add authentication to app.py",
  "context": "Flask application with a login route",
  "model": "gpt-4o",
  "verify_model": "reason-critic-7b",
  "max_iterations": 3
}
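A task file like the one above can be checked before it is handed to the CLI. The sketch below assumes that `task` is required, that the other four keys are optional, and that unknown keys should be rejected; none of these rules are confirmed by the README, and `parse_task` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Keys that appear in the example task file.
ALLOWED_KEYS = {"task", "context", "model", "verify_model", "max_iterations"}


def parse_task(text: str) -> Dict[str, Any]:
    """Parse task-file JSON and apply the assumed validation rules."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("task file must contain a JSON object")
    task = data.get("task")
    if not isinstance(task, str) or not task.strip():
        raise ValueError('task file needs a non-empty "task" string')
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return data


example = '{"task": "Add authentication to app.py", "max_iterations": 3}'
parsed = parse_task(example)
print(parsed["task"], parsed["max_iterations"])  # Add authentication to app.py 3
```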
Comparison with Other Agent Frameworks
vs. AutoGPT / BabyAGI
| Aspect | AutoGPT | VerifyLoop |
|---|---|---|
| Planning | Single prompt | Decomposed substeps with tool estimation |
| Verification | None | Trained model with confidence scoring |
| Recovery | Basic retry | Pattern-matched strategies (5 types) |
| Loop control | Infinite loop risk | Bounded iterations + convergence check |
vs. LangChain Agents
| Aspect | LangChain | VerifyLoop |
|---|---|---|
| Verification | LLM-as-judge (same model) | Dedicated trained verification model |
| Structured output | Optional | Enforced via Pydantic models |
| Recovery | Chain retries | Typed recovery with strategy selection |
| Token tracking | Callback-based | Built-in per-phase tracking |
vs. Claude Code / Cursor
| Aspect | Claude Code | VerifyLoop |
|---|---|---|
| Verification | Same model self-review | Dedicated ReasonCritic model |
| Recovery | Re-prompt | Pattern-matched with LLM fallback |
| Programmatic | Limited CLI | Full Python API + CLI |
| Extensibility | Plugin system | Tool interface + plugin system |
Verification Model: ReasonCritic
ReasonCritic is the key differentiator. VerifyLoop uses it as a dedicated model trained specifically for verification:
- Not a prompt — It's a model fine-tuned on verification tasks (code review, test analysis, output comparison)
- Falls back gracefully — If ReasonCritic is unavailable, falls back to a general LLM with structured verification prompts
- Confidence scoring — Numeric 0-1 confidence score, not binary pass/fail
- Actionable failures — Every failure comes with fix suggestions, not just "it broke"
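The properties listed above (numeric confidence, fix suggestions, graceful fallback) can be combined in one small result type. This is a sketch under stated assumptions: `VerificationSketch` and `verify_with_fallback` are hypothetical names, and treating `ConnectionError` as the "model unavailable" signal is an illustrative choice, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VerificationSketch:
    confidence: float  # numeric score between 0 and 1, not a binary pass/fail
    suggestions: List[str] = field(default_factory=list)
    source: str = "trained"

    def passed(self, threshold: float = 0.8) -> bool:
        return self.confidence >= threshold


Checker = Callable[[str], VerificationSketch]


def verify_with_fallback(output: str, trained: Optional[Checker], fallback: Checker) -> VerificationSketch:
    if trained is not None:
        try:
            return trained(output)
        except ConnectionError:
            pass  # trained model unavailable: fall through to the general LLM
    result = fallback(output)
    result.source = "fallback"
    return result


def unavailable_model(output: str) -> VerificationSketch:
    raise ConnectionError("verification model not reachable")


def prompt_based_check(output: str) -> VerificationSketch:
    return VerificationSketch(0.4, ["Add the missing return statement"])


result = verify_with_fallback("def hello(): pass", unavailable_model, prompt_based_check)
print(result.source, result.passed(), result.suggestions)
# fallback False ['Add the missing return statement']
```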
License
MIT
Ecosystem
Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:
| Project | Description |
|---|---|
| Anvil | Self-verified coding agent |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error patterns) |
| FableForge-14B | The fine-tuned 14B model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |