
The open-source, self-verified coding agent. Generate → Execute → Verify → Recover.



Anvil — The Self-Verified Coding Agent

License: MIT | Python 3.10+

Generate → Execute → Verify → Recover

Most open agents generate code and hope it works. Anvil generates, runs, checks, and fixes, because its FableForge model was trained on 210,000 examples of real agents doing exactly that.

This isn't prompt engineering. This is behavior engineering.


Why Anvil?

| Other agents | Anvil |
|---|---|
| Generate code and hope it works | Generate code, then verify it works |
| No error recovery | Self-healing with 3 retry attempts |
| One-shot output | Iterative Plan→Execute→Verify→Recover loop |
| No cost awareness | Token tracking + model routing for cost optimization |
| Black box | Full session tracking, verify reports, telemetry |
| Requires expensive API | Runs fully local with ShellWhisperer (1.5B) |

The Verification Loop

   ┌──────┐     ┌──────┐     ┌──────┐     ┌──────┐
   │ PLAN │────▶│ EXEC │────▶│VERIFY│────▶│ DONE │
   └──────┘     └──────┘     └──┬───┘     └──────┘
                                 │ Fail
                                 ▼
                            ┌──────┐
                            │RECOVR│────▶ back to EXEC
                            └──────┘

Anvil doesn't just write code. It verifies every change:

  1. Syntax check — Does the code parse?
  2. Test run — Do the tests pass?
  3. Lint check — Is the code clean?
  4. Import check — Are dependencies valid?
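The syntax and import checks can be approximated with Python's standard library alone. The following is a hypothetical sketch, not the code in verify/pipeline.py: check_syntax and check_imports are illustrative names, and the test and lint steps (which would shell out to tools such as pytest or a linter) are left out.

```python
import ast
import importlib.util


def check_syntax(source: str) -> list[str]:
    """Syntax check: return parse errors (an empty list means the code parses)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    return []


def check_imports(source: str) -> list[str]:
    """Import check: return top-level modules that cannot be found.

    Assumes the source already passed check_syntax, since it parses it again.
    """
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level not in missing and importlib.util.find_spec(top_level) is None:
                missing.append(top_level)
    return missing


if __name__ == "__main__":
    snippet = "import os\nimport not_a_real_module_xyz\n"
    print(check_syntax(snippet))   # []
    print(check_imports(snippet))  # ['not_a_real_module_xyz']
```

importlib.util.find_spec only locates a module without executing it, so the import check has no side effects on the checked project.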

If verification fails, Anvil diagnoses the error, generates a fix, and re-verifies, repeating for up to 3 retry cycles (max_retries in the verify configuration). Verification is not an add-on; it is the core loop.
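The retry behaviour described above amounts to a bounded loop. This minimal sketch assumes hypothetical verify and recover callables (the names are not taken from Anvil's API). In Anvil the fix comes from the model and the ErrorRecovery engine; here a toy recover function just appends a closing parenthesis so the loop can be run end to end.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VerifyResult:
    ok: bool
    errors: list[str] = field(default_factory=list)


def run_with_recovery(
    code: str,
    verify: Callable[[str], VerifyResult],
    recover: Callable[[str, list[str]], str],
    max_retries: int = 3,
) -> tuple[str, VerifyResult, int]:
    """Verify; on failure, recover and re-verify, at most max_retries times.

    Returns the final code, the last verification result, and the number
    of recovery attempts used.
    """
    result = verify(code)
    attempts = 0
    while not result.ok and attempts < max_retries:
        attempts += 1
        code = recover(code, result.errors)  # diagnose the errors and produce a fix
        result = verify(code)                # re-verify the patched code
    return code, result, attempts


def syntax_only_verify(code: str) -> VerifyResult:
    """Stand-in verifier that only checks whether the code parses."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return VerifyResult(ok=False, errors=[exc.msg])
    return VerifyResult(ok=True)


def close_paren_recover(code: str, errors: list[str]) -> str:
    """Toy stand-in for a model-generated fix: append a closing parenthesis."""
    return code + ")"


if __name__ == "__main__":
    fixed, result, attempts = run_with_recovery(
        "print('hello'", syntax_only_verify, close_paren_recover
    )
    print(fixed, result.ok, attempts)  # print('hello') True 1
```

Returning the attempt count alongside the result makes it easy to report how many recovery cycles a change needed, which is the kind of data a verify report would record.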

Quick Start

pip install fableforge-anvil-agent

# Run with local model (ollama)
anvil run "Add error handling to main.py"

# Run with API model
anvil run -m gpt-4o "Refactor the auth module"

# Interactive chat with verification
anvil chat

# Verify existing code
anvil verify src/

# Start as persistent daemon
anvil daemon --port 8765

# List past sessions
anvil sessions

The Name

Anvil — where code gets forged, hammered, and tested until it holds.

Every blacksmith knows: you don't just shape metal on the anvil. You test it. You strike it, check it, and if it's not right, you heat it again and hammer it until it is. That's what this agent does with code.

Other agents shape and ship. Anvil shapes, verifies, and only then ships.

Architecture

anvil/
├── core/
│   ├── engine.py          # Plan→Execute→Verify→Recover loop
│   ├── config.py          # 7-layer configuration system
│   └── session.py          # Full session tracking + persistence
├── tools/
│   └── executor.py         # Bash, Read, Write, Edit, Grep, Glob, LS
├── verify/
│   └── pipeline.py         # Syntax, test, lint, import verification
├── models/
│   └── registry.py          # Local (ollama), OpenAI, Anthropic + cost tracking
├── integrations/
│   ├── verifyloop.py        # VerifyLoop framework integration
│   ├── error_recovery.py   # ErrorRecovery engine integration
│   ├── agent_swarm.py      # AgentSwarm coordination integration
│   └── cost_optimizer.py   # CostOptimizer routing integration
├── daemon/
│   └── server.py            # Persistent HTTP daemon mode
├── tui/
│   └── dashboard.py         # Rich terminal dashboard
└── cli.py                  # run, chat, verify, daemon, sessions, models
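tools/executor.py is described only by its tool list (Bash, Read, Write, Edit, Grep, Glob, LS). One common way to structure such an executor is a name-to-function registry. The sketch below is hypothetical: it implements only Read, Write, Grep, and LS with pathlib and re, and omits Bash, Edit, Glob, and any sandboxing.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable

# Registry mapping tool names to the functions that implement them.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("Read")
def read_file(path: str) -> str:
    return Path(path).read_text()


@tool("Write")
def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"


@tool("Grep")
def grep(pattern: str, path: str) -> str:
    """Return matching lines as 'lineno:text', one per line."""
    regex = re.compile(pattern)
    lines = Path(path).read_text().splitlines()
    return "\n".join(f"{i}:{line}" for i, line in enumerate(lines, 1) if regex.search(line))


@tool("LS")
def list_dir(path: str = ".") -> str:
    return "\n".join(sorted(p.name for p in Path(path).iterdir()))


def execute(name: str, **kwargs: str) -> str:
    """Look up a registered tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = str(Path(workdir) / "notes.txt")
        execute("Write", path=target, content="alpha\nbeta\n")
        print(execute("Grep", pattern="^b", path=target))  # 2:beta
        print(execute("LS", path=workdir))                 # notes.txt
```

Keeping every tool behind a single execute entry point gives one place to add logging, session tracking, or safety checks before any tool runs.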

The FableForge Ecosystem

Anvil is the flagship product of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

| Project | What It Does |
|---|---|
| Anvil | Self-verified coding agent (this one) |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error examples) |
| FableForge-14B | The fine-tuned model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills.org | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |

Configuration

Create .anvil.json in your project root:

{
  "model": {
    "model": "local",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "verify": {
    "enabled": true,
    "auto_recover": true,
    "max_retries": 3,
    "check_syntax": true,
    "check_tests": true,
    "check_lint": true
  },
  "tools": {
    "allow_shell": true,
    "sandbox": false
  },
  "safety": {
    "constitution_enabled": true,
    "blocked_commands": ["rm -rf /", "mkfs"],
    "require_confirmation_for": ["git push", "DROP TABLE"]
  },
  "cost": {
    "max_cost_per_session_usd": 5.0,
    "route_by_complexity": true,
    "simple_model": "local",
    "complex_model": "gpt-4o"
  }
}
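The architecture lists a 7-layer configuration system without naming the layers. As a hedged sketch of the general idea, the code below shows just two layers, hypothetical built-in defaults and the project's .anvil.json, combined with a recursive merge so that a project file only needs the keys it changes. DEFAULTS, deep_merge, and load_config are illustrative names.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical built-in defaults, mirroring part of the example .anvil.json.
DEFAULTS: dict[str, Any] = {
    "model": {"model": "local", "temperature": 0.2, "max_tokens": 4096},
    "verify": {"enabled": True, "auto_recover": True, "max_retries": 3},
    "cost": {"max_cost_per_session_usd": 5.0},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which override wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(project_root: str = ".") -> dict[str, Any]:
    """Layer the project's .anvil.json (if present) over the defaults."""
    path = Path(project_root) / ".anvil.json"
    project = json.loads(path.read_text()) if path.exists() else {}
    return deep_merge(DEFAULTS, project)
```

With a project file containing only {"verify": {"max_retries": 5}}, load_config would return the defaults with max_retries raised to 5 and every other key unchanged. Additional layers (user-level files, environment variables, CLI flags) would be further deep_merge calls applied in order of precedence.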

Daemon Mode

Run Anvil as a persistent server:

anvil daemon --port 8765
curl -X POST http://localhost:8765/run \
  -H "Content-Type: application/json" \
  -d '{"task": "Add input validation to all API endpoints"}'
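For scripting, the same request can be sent from Python with the standard library. This sketch mirrors the curl call exactly (POST to /run with a JSON body containing task). The README does not document the response format, so run_task returns the raw body as text; build_run_request and run_task are hypothetical names.

```python
import json
import urllib.request


def build_run_request(task: str, host: str = "localhost", port: int = 8765) -> urllib.request.Request:
    """Build the same POST /run request as the curl example."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task: str, port: int = 8765, timeout: float = 600.0) -> str:
    """Send a task to a running daemon and return the raw response body."""
    request = build_run_request(task, port=port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the daemon running, run_task("Add input validation to all API endpoints") returns whatever the daemon sends back. The generous timeout reflects that a full plan, execute, verify, and recover cycle can take minutes.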

Model Backends

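| Model | Type | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|---|
| local (fableforge-14b) | Local | Free | Free |
| gpt-4o | API | $2.50 | $10.00 |
| gpt-4o-mini | API | $0.15 | $0.60 |
| o3-mini | API | $1.10 | $4.40 |
| claude-3.5-sonnet | API | $3.00 | $15.00 |
| claude-3.5-haiku | API | $0.80 | $4.00 |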
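The prices translate directly into per-call costs: cost = input_tokens × input price / 1M + output_tokens × output price / 1M. For example, a gpt-4o call with 100,000 input and 20,000 output tokens costs $0.25 + $0.20 = $0.45. The sketch below turns that formula into a per-session budget check (max_cost_per_session_usd) and a choice between simple_model and complex_model, using keys from the example .anvil.json. CostTracker, choose_model, and especially the keyword heuristic for "complex" are hypothetical, not how CostOptimizer actually decides; prices are copied from the table.

```python
# USD per 1M tokens as (input, output), copied from the Model Backends table.
PRICES: dict[str, tuple[float, float]] = {
    "local": (0.0, 0.0),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o3-mini": (1.10, 4.40),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: tokens times the per-million price, for input and output."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


class BudgetExceeded(RuntimeError):
    pass


class CostTracker:
    """Accumulate spend for one session and enforce a USD limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        # The call has already happened, so its cost is added before the check.
        self.spent_usd += estimate_cost(model, input_tokens, output_tokens)
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f}")
        return self.spent_usd


def choose_model(task: str, cost_config: dict, default_model: str = "local") -> str:
    """Pick simple_model or complex_model from the "cost" config section.

    Made-up heuristic: long tasks, or tasks mentioning broad changes, are complex.
    """
    if not cost_config.get("route_by_complexity"):
        return default_model
    broad_markers = ("refactor", "all ", "migrate", "architecture")
    is_complex = len(task.split()) > 30 or any(m in task.lower() for m in broad_markers)
    return cost_config["complex_model"] if is_complex else cost_config["simple_model"]
```

Under this heuristic, "Refactor the auth module" would go to gpt-4o and "Add error handling to main.py" to the local model.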

How It's Different

Trained on Real Behavior

The FableForge model was trained on 210K examples from real agent traces:

  • 87.7% planning rate: agents plan before they act
  • 39.5% error recovery rate: agents that hit errors and recover
  • 1,311-step trace: the "Boeing 747" trace suggests agents need a persistent runtime
  • 31 tools mapped: transition matrices drive swarm coordination
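The README does not say how the transition matrices are built. One standard reading, sketched here as an assumption, is a first-order Markov estimate: count how often tool B directly follows tool A across traces and normalise each row into probabilities. transition_matrix and most_likely_next are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Optional


def transition_matrix(traces: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next tool | current tool) from sequences of tool names.

    Each row is normalised so its probabilities sum to 1.
    """
    counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    matrix: dict[str, dict[str, float]] = {}
    for tool, row in counts.items():
        total = sum(row.values())
        matrix[tool] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def most_likely_next(matrix: dict[str, dict[str, float]], tool: str) -> Optional[str]:
    """Return the most probable successor of tool, or None if it has none."""
    row = matrix.get(tool)
    return max(row, key=row.get) if row else None
```

For the traces [["Read", "Edit", "Bash"], ["Read", "Grep", "Read", "Edit"]], Read is followed by Edit with probability 2/3 and by Grep with probability 1/3. A coordinator could use such probabilities to decide which specialist agent to hand off to next.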

Verification Is Not Optional

Other agents: "Here's the code, hope it works."

Anvil: "Here's the code. I ran it. Tests pass. Lint is clean. Imports resolve. Here's the proof."

Self-Healing

When verification fails, Anvil doesn't just report the error. It reads the error, generates a fix, applies it, and re-verifies. This is the ErrorRecovery engine with 3,725 real error examples baked in.
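Pattern-matched error resolution can be pictured as a table of regular expressions that map raw error output to a diagnosis and a repair hint, which the model then turns into a concrete patch. The three patterns and the Diagnosis fields below are hypothetical stand-ins; the 3,725-example ErrorRecovery catalogue itself is not reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnosis:
    kind: str        # category of the error
    detail: str      # the captured name or message
    suggestion: str  # what a fix should do


# Hypothetical (pattern, kind, suggestion template) entries, tried in order.
PATTERNS = [
    (re.compile(r"ModuleNotFoundError: No module named '([\w.]+)'"),
     "missing_module", "install {0} or correct the import"),
    (re.compile(r"NameError: name '(\w+)' is not defined"),
     "undefined_name", "define or import {0} before use"),
    (re.compile(r"SyntaxError: (.+)"),
     "syntax", "fix the syntax error: {0}"),
]


def diagnose(error_text: str) -> Optional[Diagnosis]:
    """Return the first matching diagnosis, or None if no pattern applies."""
    for pattern, kind, template in PATTERNS:
        match = pattern.search(error_text)
        if match:
            detail = match.group(1)
            return Diagnosis(kind, detail, template.format(detail))
    return None
```

When no pattern matches, diagnose returns None, which is the natural point to fall back to asking the model to reason about the raw error.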

Ecosystem Integration

Anvil doesn't work alone. It's wired into the full FableForge stack:

  • VerifyLoop → Sophisticated multi-step verification
  • ErrorRecovery → Pattern-matched error resolution from real traces
  • AgentSwarm → Multi-agent coordination via transition matrices
  • CostOptimizer → Automatic model routing based on task complexity
  • AgentConstitution → Safety guardrails from analysis of real traces
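The blocked_commands and require_confirmation_for lists in the safety section of .anvil.json suggest a simple pre-execution gate for shell commands. The sketch below is a hypothetical illustration of applying those lists, not the AgentConstitution implementation, whose trace-derived rules are not described here; Verdict and check_command are illustrative names.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    BLOCK = "block"


# The "safety" section from the example .anvil.json.
SAFETY = {
    "constitution_enabled": True,
    "blocked_commands": ["rm -rf /", "mkfs"],
    "require_confirmation_for": ["git push", "DROP TABLE"],
}


def check_command(command: str, safety: dict) -> Verdict:
    """Classify a shell command using the blocked and confirmation lists.

    Plain substring matching is only illustrative: it over-blocks
    ("rm -rf /tmp/x" contains "rm -rf /") and is easy to evade
    (quoting, variables, aliases).
    """
    # Treated as enabled unless the config explicitly turns it off.
    if not safety.get("constitution_enabled", True):
        return Verdict.ALLOW
    normalized = " ".join(command.split())  # collapse runs of whitespace
    if any(p in normalized for p in safety.get("blocked_commands", [])):
        return Verdict.BLOCK
    if any(p in normalized for p in safety.get("require_confirmation_for", [])):
        return Verdict.CONFIRM
    return Verdict.ALLOW
```

Checking the block list before the confirmation list means a command that matches both is refused outright rather than offered for confirmation.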

License

MIT

Built With

  • 210,000+ real agent traces from the Fable-5 dataset collection
  • 87.7% planning rate behavioral signal
  • 39.5% error recovery success rate
  • 303 tool calls in a single session (Boeing 747 trace)
  • 5 specialized micro-models (ShellWhisperer, ReasonCritic, etc.)

Anvil: Forge your code. Verify it holds.



Download files


Source Distribution

fableforge_anvil_agent-0.3.0.tar.gz (135.7 kB)

Uploaded Source

Built Distribution


fableforge_anvil_agent-0.3.0-py3-none-any.whl (121.1 kB)

Uploaded Python 3

File details

Details for the file fableforge_anvil_agent-0.3.0.tar.gz.

File metadata

  • Download URL: fableforge_anvil_agent-0.3.0.tar.gz
  • Size: 135.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fableforge_anvil_agent-0.3.0.tar.gz
Algorithm Hash digest
SHA256 958779c1da102bb05810ee24cf8cff1fef87f44d2fac155298946c62daf9d169
MD5 af5415d62f56b5c64743183f3c2323c5
BLAKE2b-256 9b5422f8bda9af4a6c0f8359e9e1c6e0347bc9a4aee49a62c815991bc9213566


Provenance

The following attestation bundles were made for fableforge_anvil_agent-0.3.0.tar.gz:

Publisher: release.yml on KingLabsA/anvil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fableforge_anvil_agent-0.3.0-py3-none-any.whl.


File hashes

Hashes for fableforge_anvil_agent-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8f896bd4275a7f35acad0c714e729ae92c512d4da07c79d4d5071ffccbdf0d3e
MD5 b509f729ea2212834132c387195e1440
BLAKE2b-256 449ce38aa69b2608ace1979420e12b6e590db69601180096ef310a625cd36a3a


Provenance

The following attestation bundles were made for fableforge_anvil_agent-0.3.0-py3-none-any.whl:

Publisher: release.yml on KingLabsA/anvil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
