Skip to main content

The open-source, self-verified coding agent. Generate → Execute → Verify → Recover.

Project description

Anvil

Anvil — The Self-Verified Coding Agent

License: MIT Python 3.10+ Tests

Generate → Execute → Verify → Recover

Every other open agent generates and hopes. Anvil generates, runs, checks, and fixes — because it was trained on 210,000 examples of real agents doing exactly that.

This isn't prompt engineering. This is behavior engineering.


Why Anvil?

Other Agents Anvil
Generate code and hope it works Generate code, then verify it works
No error recovery Self-healing with 3 retry attempts
One-shot output Iterative Plan→Execute→Verify→Recover loop
No cost awareness Token tracking + model routing for cost optimization
Black box Full session tracking, verify reports, telemetry
Requires expensive API Runs fully local with ShellWhisperer (1.5B)

The Verification Loop

   ┌──────┐     ┌──────┐     ┌──────┐     ┌──────┐
   │ PLAN │────▶│ EXEC │────▶│VERIFY│────▶│ DONE │
   └──────┘     └──────┘     └──┬───┘     └──────┘
                                 │ Fail
                                 ▼
                            ┌──────┐
                            │RECOVR│────▶ back to EXEC
                            └──────┘

Anvil doesn't just write code. It verifies every change:

  1. Syntax check — Does the code parse?
  2. Test run — Do the tests pass?
  3. Lint check — Is the code clean?
  4. Import check — Are dependencies valid?

If verification fails, Anvil diagnoses the error, generates a fix, and re-verifies. Up to 3 retry cycles. This isn't optional — it's the core loop.

Quick Start

pip install anvil-agent

# Run with local model (ollama)
anvil run "Add error handling to main.py"

# Run with API model
anvil run -m gpt-4o "Refactor the auth module"

# Interactive chat with verification
anvil chat

# Verify existing code
anvil verify src/

# Start as persistent daemon
anvil daemon --port 8765

# List past sessions
anvil sessions

The Name

Anvil — where code gets forged, hammered, and tested until it holds.

Every blacksmith knows: you don't just shape metal on the anvil. You test it. You strike it, check it, and if it's not right, you heat it again and hammer it until it is. That's what this agent does with code.

Other agents shape and ship. Anvil shapes, verifies, and only then ships.

Architecture

anvil/
├── core/
│   ├── engine.py          # Plan→Execute→Verify→Recover loop
│   ├── config.py          # 7-layer configuration system
│   └── session.py          # Full session tracking + persistence
├── tools/
│   └── executor.py         # Bash, Read, Write, Edit, Grep, Glob, LS
├── verify/
│   └── pipeline.py         # Syntax, test, lint, import verification
├── models/
│   └── registry.py          # Local (ollama), OpenAI, Anthropic + cost tracking
├── integrations/
│   ├── verifyloop.py        # VerifyLoop framework integration
│   ├── error_recovery.py   # ErrorRecovery engine integration
│   ├── agent_swarm.py      # AgentSwarm coordination integration
│   └── cost_optimizer.py   # CostOptimizer routing integration
├── daemon/
│   └── server.py            # Persistent HTTP daemon mode
├── tui/
│   └── dashboard.py         # Rich terminal dashboard
└── cli.py                  # run, chat, verify, daemon, sessions, models

The FableForge Ecosystem

Anvil is the flagship product of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project What It Does
Anvil Self-verified coding agent (this one)
VerifyLoop Plan→Execute→Verify→Recover framework
ErrorRecovery Self-healing middleware (3,725 error examples)
FableForge-14B The fine-tuned model (4-stage training)
ShellWhisperer 1.5B edge agent (phone/RPi, 50ms)
ReasonCritic Verification model (130 benchmark tasks)
TraceCompiler Compile traces → LoRA skills
AgentRuntime Persistent agent daemon (systemd for AI)
AgentSwarm Multi-agent from real trace transitions
AgentTelemetry Datadog for agents (token tracking, costs)
BenchAgent HumanEval for tool-use (107 tasks)
AgentDev VSCode extension with verification
TraceViz Trace replay visualizer (Next.js)
AgentSkills.org npm for agent behaviors
AgentCurriculum 5-stage progressive training
AgentFuzzer Adversarial testing for agents
AgentConstitution Safety guardrails from traces
CostOptimizer Token cost reduction (50-80%)
AgentProfiler Behavioral fingerprinting
TrajectoryDistiller Trace→training data pipeline
Fable5-Dataset HuggingFace dataset release

Configuration

Create .anvil.json in your project root:

{
  "model": {
    "model": "local",
    "temperature": 0.2,
    "max_tokens": 4096
  },
  "verify": {
    "enabled": true,
    "auto_recover": true,
    "max_retries": 3,
    "check_syntax": true,
    "check_tests": true,
    "check_lint": true
  },
  "tools": {
    "allow_shell": true,
    "sandbox": false
  },
  "safety": {
    "constitution_enabled": true,
    "blocked_commands": ["rm -rf /", "mkfs"],
    "require_confirmation_for": ["git push", "DROP TABLE"]
  },
  "cost": {
    "max_cost_per_session_usd": 5.0,
    "route_by_complexity": true,
    "simple_model": "local",
    "complex_model": "gpt-4o"
  }
}

Daemon Mode

Run Anvil as a persistent server:

anvil daemon --port 8765
curl -X POST http://localhost:8765/run \
  -H "Content-Type: application/json" \
  -d '{"task": "Add input validation to all API endpoints"}'

Model Backends

Model Type Input $/1M Output $/1M
local (fableforge-14b) Local Free Free
gpt-4o API $2.50 $10.00
gpt-4o-mini API $0.15 $0.60
o3-mini API $1.10 $4.40
claude-3.5-sonnet API $3.00 $15.00
claude-3.5-haiku API $0.80 $4.00

How It's Different

Trained on Real Behavior

The FableForge model was trained on 210K examples from real agent traces:

  • 87.7% planning rate — agents plan before they act
  • 39.5% error recovery rate — agents that hit errors and recover
  • 1,311-step trace — the Boeing 747 trace proves agents need persistent runtime
  • 31 tools mapped — transition matrices drive swarm coordination

Verification Is Not Optional

Other agents: "Here's the code, hope it works."

Anvil: "Here's the code. I ran it. Tests pass. Lint is clean. Imports resolve. Here's the proof."

Self-Healing

When verification fails, Anvil doesn't just report the error. It reads the error, generates a fix, applies it, and re-verifies. This is the ErrorRecovery engine with 3,725 real error examples baked in.

Ecosystem Integration

Anvil doesn't work alone. It's wired into the full FableForge stack:

  • VerifyLoop → Sophisticated multi-step verification
  • ErrorRecovery → Pattern-matched error resolution from real traces
  • AgentSwarm → Multi-agent coordination via transition matrices
  • CostOptimizer → Automatic model routing based on task complexity
  • AgentConstitution → Safety guardrails from analysis of real traces

License

MIT

Built With

  • 210,000+ real agent traces from the Fable-5 dataset collection
  • 87.7% planning rate behavioral signal
  • 39.5% error recovery success rate
  • 303 tool calls in a single session (Boeing 747 trace)
  • 5 specialized micro-models (ShellWhisperer, ReasonCritic, etc.)

Anvil: Forge your code. Verify it holds.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fableforge_anvil_agent-0.2.1.tar.gz (125.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fableforge_anvil_agent-0.2.1-py3-none-any.whl (112.6 kB view details)

Uploaded Python 3

File details

Details for the file fableforge_anvil_agent-0.2.1.tar.gz.

File metadata

  • Download URL: fableforge_anvil_agent-0.2.1.tar.gz
  • Upload date:
  • Size: 125.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fableforge_anvil_agent-0.2.1.tar.gz
Algorithm Hash digest
SHA256 e33cf66b1840d473cb82da3042f749a9470b507e62aafd5d984cb8051f0c3fe3
MD5 e5a3c96fd105754ebc78eac9820621ea
BLAKE2b-256 8f3244c3e0ee5a56dc17cb6879aa7faedafe1cc35a28c4c7e60d5f7cc73e37bb

See more details on using hashes here.

Provenance

The following attestation bundles were made for fableforge_anvil_agent-0.2.1.tar.gz:

Publisher: release.yml on KingLabsA/anvil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fableforge_anvil_agent-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for fableforge_anvil_agent-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ec7361cecd100fba1e00a31574c3c8a54b28d11114af6759a9710b937d1dd05e
MD5 f5758fe2271ca3545f6181d5ad90cf96
BLAKE2b-256 13923d24d2eadb02bf652ff1a4187de43212ad861033c9d1e741e388484ec37a

See more details on using hashes here.

Provenance

The following attestation bundles were made for fableforge_anvil_agent-0.2.1-py3-none-any.whl:

Publisher: release.yml on KingLabsA/anvil

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page