The open-source, self-verified coding agent. Generate → Execute → Verify → Recover.

Anvil — The Self-Verified Coding Agent

License: MIT · Python 3.10+ · Tests

Generate → Execute → Verify → Recover

Every other open agent generates and hopes. Anvil generates, runs, checks, and fixes — because it was trained on 210,000 examples of real agents doing exactly that.

This isn't prompt engineering. This is behavior engineering.


Why Anvil?

| Other Agents | Anvil |
|---|---|
| Generate code and hope it works | Generate code, then verify it works |
| No error recovery | Self-healing with 3 retry attempts |
| One-shot output | Iterative Plan→Execute→Verify→Recover loop |
| No cost awareness | Token tracking + model routing for cost optimization |
| Black box | Full session tracking, verify reports, telemetry |
| Requires expensive API | Runs fully local with ShellWhisperer (1.5B) |

The Verification Loop

   ┌──────┐     ┌──────┐     ┌──────┐     ┌──────┐
   │ PLAN │────▶│ EXEC │────▶│VERIFY│────▶│ DONE │
   └──────┘     └──────┘     └──┬───┘     └──────┘
                                 │ Fail
                                 ▼
                            ┌──────┐
                            │RECOVR│────▶ back to EXEC
                            └──────┘

Anvil doesn't just write code. It verifies every change:

  1. Syntax check — Does the code parse?
  2. Test run — Do the tests pass?
  3. Lint check — Is the code clean?
  4. Import check — Are dependencies valid?
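As a rough illustration of the first and last checks in this list, the sketch below uses only the Python standard library. The function names `check_syntax` and `check_imports` are hypothetical and are not taken from Anvil's `verify/pipeline.py`; the test and lint steps would typically shell out to external tools and are left out here.

```python
import ast
import importlib.util


def check_syntax(source: str) -> list[str]:
    """Return a list of syntax problems; an empty list means the code parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    return []


def check_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that cannot be located.

    Assumes `source` already parses; run check_syntax first.
    """
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # find_spec returns None when no importable module is found
            if importlib.util.find_spec(top) is None and top not in missing:
                missing.append(top)
    return missing
```

Both functions return a list of problems rather than raising, so a caller can collect results from every check before deciding whether to attempt a fix.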

If verification fails, Anvil diagnoses the error, generates a fix, and re-verifies, for up to 3 retry cycles. This isn't optional; it's the core loop.
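The control flow described above can be sketched in a few lines. This is a minimal illustration, not Anvil's `engine.py`: `run_with_verification`, `Outcome`, and the toy `verify`/`recover` callables are all assumptions made for the example, and a real recover step would call a model rather than edit a string.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_RETRIES = 3  # the "up to 3 retry cycles" limit from the text


@dataclass
class Outcome:
    ok: bool                 # True if the final verification passed
    attempts: int            # recovery cycles actually used
    errors: list[str] = field(default_factory=list)


def run_with_verification(
    code: str,
    verify: Callable[[str], list[str]],
    recover: Callable[[str, list[str]], str],
    max_retries: int = MAX_RETRIES,
) -> tuple[str, Outcome]:
    """Verify `code`; while it fails, ask `recover` for a fix and re-verify."""
    errors = verify(code)
    attempts = 0
    while errors and attempts < max_retries:
        code = recover(code, errors)
        attempts += 1
        errors = verify(code)
    return code, Outcome(ok=not errors, attempts=attempts, errors=errors)


# Toy stand-ins so the loop can be exercised without a model.
def toy_verify(code: str) -> list[str]:
    return ["found BUG marker"] if "BUG" in code else []


def toy_recover(code: str, errors: list[str]) -> str:
    return code.replace("BUG", "", 1)  # "fixes" one marker per cycle
```

With the toy callables, an input containing two markers succeeds after two recovery cycles, while an input with five markers exhausts the three retries and is reported as failed with its remaining errors attached.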

Quick Start

pip install fableforge-anvil-agent

# Run with local model (ollama)
anvil run "Add error handling to main.py"

# Run with API model
anvil run -m gpt-4o "Refactor the auth module"

# Interactive chat with verification
anvil chat

# Verify existing code
anvil verify src/

# Start as persistent daemon
anvil daemon --port 8765

# List past sessions
anvil sessions

The Name

Anvil — where code gets forged, hammered, and tested until it holds.

Every blacksmith knows: you don't just shape metal on the anvil. You test it. You strike it, check it, and if it's not right, you heat it again and hammer it until it is. That's what this agent does with code.

Other agents shape and ship. Anvil shapes, verifies, and only then ships.

Architecture

anvil/
├── core/
│   ├── engine.py           # Plan→Execute→Verify→Recover loop
│   ├── config.py           # 7-layer configuration system
│   └── session.py          # Full session tracking + persistence
├── tools/
│   └── executor.py         # Bash, Read, Write, Edit, Grep, Glob, LS
├── verify/
│   └── pipeline.py         # Syntax, test, lint, import verification
├── models/
│   └── registry.py         # Local (ollama), OpenAI, Anthropic + cost tracking
├── integrations/
│   ├── verifyloop.py       # VerifyLoop framework integration
│   ├── error_recovery.py   # ErrorRecovery engine integration
│   ├── agent_swarm.py      # AgentSwarm coordination integration
│   └── cost_optimizer.py   # CostOptimizer routing integration
├── daemon/
│   └── server.py           # Persistent HTTP daemon mode
├── tui/
│   └── dashboard.py        # Rich terminal dashboard
└── cli.py                  # run, chat, verify, daemon, sessions, models

The FableForge Ecosystem

Anvil is the flagship product of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

| Project | What It Does |
|---|---|
| Anvil | Self-verified coding agent (this one) |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error examples) |
| FableForge-14B | The fine-tuned model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills.org | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |

Configuration

Create .anvil.json in your project root:

{
  "model": {
    "model": "local",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "verify": {
    "enabled": true,
    "auto_recover": true,
    "max_retries": 3,
    "check_syntax": true,
    "check_tests": true,
    "check_lint": true
  },
  "tools": {
    "allow_shell": true,
    "sandbox": false
  },
  "safety": {
    "constitution_enabled": true,
    "blocked_commands": ["rm -rf /", "mkfs"],
    "require_confirmation_for": ["git push", "DROP TABLE"]
  },
  "cost": {
    "max_cost_per_session_usd": 5.0,
    "route_by_complexity": true,
    "simple_model": "local",
    "complex_model": "gpt-4o"
  }
}
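To show how a project-level file like this might be layered over built-in defaults, here is a minimal sketch. It models only two layers, not the 7-layer system mentioned in the architecture listing, and `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names rather than Anvil's actual API.

```python
import copy
import json
from pathlib import Path

# Hypothetical defaults for illustration; Anvil's real defaults may differ.
DEFAULTS = {
    "model": {"model": "local", "temperature": 0.2, "max_tokens": 4096},
    "verify": {"enabled": True, "auto_recover": True, "max_retries": 3},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which nested keys from `override` win over `base`."""
    merged = copy.deepcopy(base)  # never mutate the caller's defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(project_root: str = ".") -> dict:
    """Read .anvil.json from `project_root` if present and layer it over DEFAULTS."""
    path = Path(project_root) / ".anvil.json"
    if not path.is_file():
        return deep_merge(DEFAULTS, {})
    return deep_merge(DEFAULTS, json.loads(path.read_text(encoding="utf-8")))
```

Merging key by key means a project file that sets only `verify.max_retries` keeps every other default intact instead of replacing the whole `verify` block.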

Daemon Mode

Run Anvil as a persistent server:

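# Start the daemon
anvil daemon --port 8765

# Then submit a task over HTTP (from a second shell if the daemon stays in the foreground)
curl -X POST http://localhost:8765/run \
  -H "Content-Type: application/json" \
  -d '{"task": "Add input validation to all API endpoints"}'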
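The same request can be issued from Python with the standard library. This sketch mirrors the curl call only; `build_run_request` and `submit_task` are hypothetical helper names, and because the response schema is not documented here, the body is returned as raw text rather than parsed into fields.

```python
import json
import urllib.request


def build_run_request(
    task: str, host: str = "localhost", port: int = 8765
) -> urllib.request.Request:
    """Build the same POST /run request as the curl command above."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(task: str, timeout: float = 300.0) -> str:
    """Send the task to a running daemon and return the response body as text."""
    request = build_run_request(task)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `submit_task("Add input validation to all API endpoints")` requires a daemon listening on port 8765; the generous timeout is a guess, on the assumption that a verified run may take minutes.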

Model Backends

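| Model | Type | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|---|
| local (fableforge-14b) | Local | Free | Free |
| gpt-4o | API | $2.50 | $10.00 |
| gpt-4o-mini | API | $0.15 | $0.60 |
| o3-mini | API | $1.10 | $4.40 |
| claude-3.5-sonnet | API | $3.00 | $15.00 |
| claude-3.5-haiku | API | $0.80 | $4.00 |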
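As a worked example of how per-token prices translate into session cost, the sketch below copies the rates from the table and checks a running total against a budget such as the `max_cost_per_session_usd` value of 5.0 in the configuration example. The helpers `estimate_cost` and `within_budget` are hypothetical, and provider prices change, so the numbers are only as current as the table.

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "local": (0.0, 0.0),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o3-mini": (1.10, 4.40),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one call."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def within_budget(spent_usd: float, next_call_usd: float, max_cost_usd: float = 5.0) -> bool:
    """True if making the next call keeps the session at or under its budget."""
    return spent_usd + next_call_usd <= max_cost_usd
```

For instance, a gpt-4o call with 10,000 input tokens and 2,000 output tokens costs 10,000 × $2.50 / 1M + 2,000 × $10.00 / 1M = $0.025 + $0.020 = $0.045, while the same call on the local model costs nothing.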

How It's Different

Trained on Real Behavior

The FableForge model was trained on 210K examples from real agent traces:

  • 87.7% planning rate: agents plan before they act
  • 39.5% error recovery rate: agents that hit errors and recover
  • 1,311-step trace: the Boeing 747 trace shows why agents need a persistent runtime
  • 31 tools mapped: transition matrices drive swarm coordination
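The last point refers to transition matrices over tool calls. One plausible way to derive such a matrix from traces is to count which tool follows which and normalize each row; the sketch below does that. The trace format (a list of tool names per session) and both function names are assumptions, and how AgentSwarm actually consumes the result is not described in this README.

```python
from collections import Counter, defaultdict


def transition_counts(traces: list[list[str]]) -> dict[str, Counter]:
    """Count how often each tool is immediately followed by each other tool."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(counts: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Normalize each row of counts into probabilities that sum to 1."""
    probabilities = {}
    for tool, followers in counts.items():
        total = sum(followers.values())
        probabilities[tool] = {name: n / total for name, n in followers.items()}
    return probabilities


# Two made-up traces using tool names from the executor listing.
EXAMPLE_TRACES = [
    ["Read", "Edit", "Bash"],
    ["Read", "Grep", "Read", "Edit"],
]
```

In the two example traces, `Read` is followed by `Edit` twice and by `Grep` once, so the `Read` row becomes roughly 0.67 for `Edit` and 0.33 for `Grep`.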

Verification Is Not Optional

Other agents: "Here's the code, hope it works."

Anvil: "Here's the code. I ran it. Tests pass. Lint is clean. Imports resolve. Here's the proof."

Self-Healing

When verification fails, Anvil doesn't just report the error. It reads the error, generates a fix, applies it, and re-verifies. This is the ErrorRecovery engine with 3,725 real error examples baked in.

Ecosystem Integration

Anvil doesn't work alone. It's wired into the full FableForge stack:

  • VerifyLoop → Sophisticated multi-step verification
  • ErrorRecovery → Pattern-matched error resolution from real traces
  • AgentSwarm → Multi-agent coordination via transition matrices
  • CostOptimizer → Automatic model routing based on task complexity
  • AgentConstitution → Safety guardrails from analysis of real traces
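For the safety guardrails, the `safety` block in the configuration example lists `blocked_commands` and `require_confirmation_for`. A minimal sketch of how a shell command could be gated on those two lists follows. The substring matching, the `Decision` enum, and `classify_command` are assumptions for illustration; the real matching rules are not documented here.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CONFIRM = "confirm"


def classify_command(
    command: str, blocked: list[str], needs_confirmation: list[str]
) -> Decision:
    """Decide whether a shell command may run, using plain substring matching."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    if any(pattern in normalized for pattern in blocked):
        return Decision.BLOCK
    lowered = normalized.lower()
    if any(pattern.lower() in lowered for pattern in needs_confirmation):
        return Decision.CONFIRM
    return Decision.ALLOW


# Values copied from the "safety" block of the configuration example.
BLOCKED = ["rm -rf /", "mkfs"]
NEEDS_CONFIRMATION = ["git push", "DROP TABLE"]
```

Substring matching is deliberately conservative: with these lists, `rm -rf /tmp/build` is blocked as well, because it contains `rm -rf /`. A production guardrail would likely parse the command instead of comparing strings.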

License

MIT

Built With

  • 210,000+ real agent traces from the Fable-5 dataset collection
  • 87.7% planning rate behavioral signal
  • 39.5% error recovery success rate
  • 303 tool calls in a single session (Boeing 747 trace)
  • 5 specialized micro-models (ShellWhisperer, ReasonCritic, etc.)

Anvil: Forge your code. Verify it holds.
