Skip to main content

Self-improving agents with closed-loop learning — agents that learn to get it right

Project description

CannyForge

Reliability memory for tool-using LLM agents.

CannyForge watches your agent make mistakes, learns corrections, and injects them as SystemMessages before each LLM call. Your agent gets better over time — no retraining required.

Agent makes errors → CannyForge learns corrections → Agent stops repeating them

CannyForge demo: 80% → 100% on real LLM

Quick Start (LangGraph)

from cannyforge import CannyForge
from cannyforge.adapters.langgraph import CannyForgeMiddleware
from langgraph.prebuilt import create_react_agent

forge = CannyForge()
middleware = CannyForgeMiddleware(forge)
agent = create_react_agent(model, tools,
    pre_model_hook=middleware.before_model,
    post_model_hook=middleware.after_model)

# Just run tasks. CannyForge records errors via after_model.
# After learning, before_model injects corrections as SystemMessages.

v0.3 Direction

The benchmark branch is focused on three adoption-oriented improvements:

  • better reliability loops for tool-using agents
  • broader benchmark coverage for arguments, multi-step execution, and recovery
  • portable learned skills for assistant and workflow reuse

This is aimed at modern agent stacks, including LangGraph, LangChain-style tool agents, CrewAI-style orchestration, MCP-connected assistants, and personal assistant workflows.

See docs/v0.3-public.md for the public summary.

How It Works

  1. Record errorsafter_model detects tool failures and records them
  2. Learn correctionsrun_learning_cycle() clusters errors and generates specific correction text (template or LLM-generated)
  3. Inject correctionsbefore_model prepends a SystemMessage with all active corrections before each LLM call
  4. Track effectiveness — corrections that prevent recurrence are kept; ineffective ones can be regenerated

The correction is specific and actionable:

[CANNYFORGE] Learned rules for this request:
- When the task involves report, summary, sales, use `generate_report`, NOT `get_data`.
  Example: "Create a summary of Q4 sales performance"

Demo: 60% → 100% on Real LLM

pip install langgraph langchain-openai
# Set LLM_API_KEY in .env
python scenarios/demo_cannyforge.py

This runs 15 ambiguous tool-selection tasks twice:

  • Phase 1: baseline without corrections — records errors
  • Learning: generates corrections from observed errors
  • Phase 2: same tasks with correction injection — accuracy improves

Real output with DeepSeek:

Phase 1 accuracy: 9/15 (60%)
Phase 2 accuracy: 15/15 (100%)
Tasks fixed:
  - Restart the staging server -> execute_action
  - Send an alert to the on-call team -> execute_action
  - Deploy the latest build to production -> execute_action
  - Create a summary of Q4 sales performance -> generate_report
  - Write up a status report for this sprint -> generate_report
  - Generate a monthly uptime report -> generate_report

No simulated errors. No hand-crafted rules. Real LLM decisions, real corrections from the pipeline.

Install

pip install cannyforge           # from PyPI

Or from source:

git clone https://github.com/cannyforge/cannyforge.git
cd cannyforge
pip install -e .

Scenarios

Script Purpose
scenarios/demo_cannyforge.py Canonical demo — full pipeline: baseline → learn → improve
scenarios/demo_langgraph_tool_use.py Minimal quickstart — 3-line integration
scenarios/demo.py Animated terminal demo (internal skill system)

Older demo scripts are in scenarios/archive/ for reference.

Framework Coverage

Surface Current repo path
LangGraph middleware cannyforge/adapters/langgraph.py
LangChain adapter cannyforge/adapters/langchain.py
CrewAI adapter cannyforge/adapters/crewai.py
MCP server cannyforge/mcp_server.py
Bundled assistant skills cannyforge/bundled_skills/

CannyForge is designed to sit on top of existing agent frameworks rather than replace them. The public focus for v0.3 is improved tool routing, argument quality, multi-step execution, and reusable skills.

Core Architecture

Corrections Pipeline (LangGraph integration)

cannyforge/corrections.py    — Correction dataclass + CorrectionGenerator
cannyforge/adapters/langgraph.py — CannyForgeMiddleware (pre/post model hooks)
cannyforge/knowledge.py      — KnowledgeBase stores corrections + rules
cannyforge/learning.py       — PatternDetector + LearningEngine
cannyforge/core.py           — CannyForge orchestrator

CorrectionGenerator turns error clusters into actionable text:

  • Template mode (no LLM): groups by (wrong_tool, right_tool), extracts keywords, formats guidance
  • LLM mode: sends error cluster to LLM asking for a generalized rule covering unseen tasks

CannyForgeMiddleware hooks into LangGraph's create_react_agent:

  • before_model: injects always-on corrections + conditional rules as a SystemMessage
  • after_model: records tool failures, tracks correction effectiveness

Internal Skill System

CannyForge also includes a declarative skill system for standalone use (without LangGraph):

  • Skills defined via SKILL.md files (AgentSkills.io spec)
  • Three-tier execution: custom handler → LLM multi-step → template fallback
  • PATTERN_LIBRARY with condition-based rules for internal context signals
  • Rule lifecycle: ACTIVE → PROBATION → DORMANT → resurrection
from cannyforge import CannyForge
forge = CannyForge()
result = forge.execute("Write an email about the 3 PM meeting")

See scenarios/demo.py for the animated terminal demo of this path.

How Learning Works

1. Error Recording

# Via middleware (automatic):
agent = create_react_agent(llm, tools, post_model_hook=middleware.after_model)

# Or manual:
forge.learning_engine.record_error(
    skill_name="tool_use",
    task_description="Create a Q4 summary",
    error_type="WrongToolError",
    error_message="Called get_data instead of generate_report",
    context_snapshot={...},
)

2. Learning Cycle

metrics = forge.run_learning_cycle(min_frequency=2, min_confidence=0.3)
# Produces:
#   - Condition-based rules (for internal skill system)
#   - Corrections (for LangGraph injection)

3. Correction Injection

corrections = forge.knowledge_base.get_corrections("tool_use")
# [Correction(content="When task involves report, summary... use generate_report, NOT get_data")]

# Automatically injected by middleware.before_model() as a SystemMessage

Run Tests

pytest tests/ -v

Project Structure

cannyforge/
├── cannyforge/
│   ├── core.py              # CannyForge orchestrator
│   ├── corrections.py       # Correction + CorrectionGenerator
│   ├── knowledge.py         # KnowledgeBase, Rules, Conditions, Actions
│   ├── learning.py          # PatternDetector, LearningEngine
│   ├── skills.py            # Declarative skill system
│   ├── llm.py               # LLM providers (Claude, OpenAI, DeepSeek)
│   ├── tools.py             # Tool definitions and execution
│   ├── storage.py           # Storage backends (JSON, SQLite)
│   └── adapters/
│       └── langgraph.py     # LangGraph middleware (pre/post model hooks)
│
├── scenarios/
│   ├── demo_cannyforge.py   # Canonical demo (corrections pipeline)
│   ├── demo_langgraph_tool_use.py  # Minimal quickstart
│   └── demo.py              # Animated demo (internal skill system)
│
├── tests/                   # Test suite
└── skills/                  # Built-in skill definitions (SKILL.md)

Further Reading

License

Licensed under BSL 1.1. Free to use in production, but you may not offer CannyForge as a competing hosted service. Converts to Apache 2.0 on 2030-03-01. See LICENSE for full terms.

For commercial licensing inquiries: cannyforge@gmail.com


CannyForge — Your agent makes fewer repeated mistakes over time, with measurable evidence.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cannyforge-0.3.0.tar.gz (114.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cannyforge-0.3.0-py3-none-any.whl (98.0 kB view details)

Uploaded Python 3

File details

Details for the file cannyforge-0.3.0.tar.gz.

File metadata

  • Download URL: cannyforge-0.3.0.tar.gz
  • Upload date:
  • Size: 114.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ee70324b78f6b53a806e22a45002ec4396362e3b4592520476c29ed4227040ad
MD5 aa69749a1522ae960ad86c4abdfb1467
BLAKE2b-256 342ba019297db37ca24115fbe1f989381501771d6706c2ccbfb0ae138b767cb4

See more details on using hashes here.

File details

Details for the file cannyforge-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: cannyforge-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 98.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 67a95a95eaf3c77ed6005a0e9a8cbc18f3fc9967308b42551888b926c8d96689
MD5 587aeb6a1809965b66d32be436b4f3c6
BLAKE2b-256 c8268111a312e4820d02754d25734208b411b3e883bdcf6368cf765849d371c2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page