Skip to main content

Self-improving agents with closed-loop learning — agents that learn to get it right

Project description

CannyForge

Reliability memory for tool-using LLM agents.

CannyForge watches your agent make mistakes, learns corrections, and injects them as SystemMessages before each LLM call. Your agent gets better over time — no retraining required.

Agent makes errors → CannyForge learns corrections → Agent stops repeating them

Quick Start (LangGraph)

from cannyforge import CannyForge
from cannyforge.adapters.langgraph import CannyForgeMiddleware
from langgraph.prebuilt import create_react_agent

forge = CannyForge()
middleware = CannyForgeMiddleware(forge)
agent = create_react_agent(model, tools,
    pre_model_hook=middleware.before_model,
    post_model_hook=middleware.after_model)

# Just run tasks. CannyForge records errors via after_model.
# After learning, before_model injects corrections as SystemMessages.

How It Works

  1. Record errorsafter_model detects tool failures and records them
  2. Learn correctionsrun_learning_cycle() clusters errors and generates specific correction text (template or LLM-generated)
  3. Inject correctionsbefore_model prepends a SystemMessage with all active corrections before each LLM call
  4. Track effectiveness — corrections that prevent recurrence are kept; ineffective ones can be regenerated

The correction is specific and actionable:

[CANNYFORGE] Learned rules for this request:
- When the task involves report, summary, sales, use `generate_report`, NOT `get_data`.
  Example: "Create a summary of Q4 sales performance"

Demo: 60% → 100% on Real LLM

pip install langgraph langchain-openai
# Set LLM_API_KEY in .env
python scenarios/demo_cannyforge.py

This runs 15 ambiguous tool-selection tasks twice:

  • Phase 1: baseline without corrections — records errors
  • Learning: generates corrections from observed errors
  • Phase 2: same tasks with correction injection — accuracy improves

Real output with DeepSeek:

Phase 1 accuracy: 9/15 (60%)
Phase 2 accuracy: 15/15 (100%)
Tasks fixed:
  - Restart the staging server -> execute_action
  - Send an alert to the on-call team -> execute_action
  - Deploy the latest build to production -> execute_action
  - Create a summary of Q4 sales performance -> generate_report
  - Write up a status report for this sprint -> generate_report
  - Generate a monthly uptime report -> generate_report

No simulated errors. No hand-crafted rules. Real LLM decisions, real corrections from the pipeline.

Install

pip install cannyforge           # from PyPI

Or from source:

git clone https://github.com/cannyforge/cannyforge.git
cd cannyforge
pip install -e .

Scenarios

Script Purpose
scenarios/demo_cannyforge.py Canonical demo — full pipeline: baseline → learn → improve
scenarios/demo_langgraph_tool_use.py Minimal quickstart — 3-line integration
scenarios/demo.py Animated terminal demo (internal skill system)

Older demo scripts are in scenarios/archive/ for reference.

Core Architecture

Corrections Pipeline (LangGraph integration)

cannyforge/corrections.py    — Correction dataclass + CorrectionGenerator
cannyforge/adapters/langgraph.py — CannyForgeMiddleware (pre/post model hooks)
cannyforge/knowledge.py      — KnowledgeBase stores corrections + rules
cannyforge/learning.py       — PatternDetector + LearningEngine
cannyforge/core.py           — CannyForge orchestrator

CorrectionGenerator turns error clusters into actionable text:

  • Template mode (no LLM): groups by (wrong_tool, right_tool), extracts keywords, formats guidance
  • LLM mode: sends error cluster to LLM asking for a generalized rule covering unseen tasks

CannyForgeMiddleware hooks into LangGraph's create_react_agent:

  • before_model: injects always-on corrections + conditional rules as a SystemMessage
  • after_model: records tool failures, tracks correction effectiveness

Internal Skill System

CannyForge also includes a declarative skill system for standalone use (without LangGraph):

  • Skills defined via SKILL.md files (AgentSkills.io spec)
  • Three-tier execution: custom handler → LLM multi-step → template fallback
  • PATTERN_LIBRARY with condition-based rules for internal context signals
  • Rule lifecycle: ACTIVE → PROBATION → DORMANT → resurrection
from cannyforge import CannyForge
forge = CannyForge()
result = forge.execute("Write an email about the 3 PM meeting")

See scenarios/demo.py for the animated terminal demo of this path.

How Learning Works

1. Error Recording

# Via middleware (automatic):
agent = create_react_agent(llm, tools, post_model_hook=middleware.after_model)

# Or manual:
forge.learning_engine.record_error(
    skill_name="tool_use",
    task_description="Create a Q4 summary",
    error_type="WrongToolError",
    error_message="Called get_data instead of generate_report",
    context_snapshot={...},
)

2. Learning Cycle

metrics = forge.run_learning_cycle(min_frequency=2, min_confidence=0.3)
# Produces:
#   - Condition-based rules (for internal skill system)
#   - Corrections (for LangGraph injection)

3. Correction Injection

corrections = forge.knowledge_base.get_corrections("tool_use")
# [Correction(content="When task involves report, summary... use generate_report, NOT get_data")]

# Automatically injected by middleware.before_model() as a SystemMessage

Run Tests

pytest tests/ -v

Project Structure

cannyforge/
├── cannyforge/
│   ├── core.py              # CannyForge orchestrator
│   ├── corrections.py       # Correction + CorrectionGenerator
│   ├── knowledge.py         # KnowledgeBase, Rules, Conditions, Actions
│   ├── learning.py          # PatternDetector, LearningEngine
│   ├── skills.py            # Declarative skill system
│   ├── llm.py               # LLM providers (Claude, OpenAI, DeepSeek)
│   ├── tools.py             # Tool definitions and execution
│   ├── storage.py           # Storage backends (JSON, SQLite)
│   └── adapters/
│       └── langgraph.py     # LangGraph middleware (pre/post model hooks)
│
├── scenarios/
│   ├── demo_cannyforge.py   # Canonical demo (corrections pipeline)
│   ├── demo_langgraph_tool_use.py  # Minimal quickstart
│   └── demo.py              # Animated demo (internal skill system)
│
├── tests/                   # Test suite
└── skills/                  # Built-in skill definitions (SKILL.md)

Further Reading

License

Licensed under BSL 1.1. Free to use in production, but you may not offer CannyForge as a competing hosted service. Converts to Apache 2.0 on 2030-03-01. See LICENSE for full terms.

For commercial licensing inquiries: cannyforge@gmail.com


CannyForge — Your agent makes fewer repeated mistakes over time, with measurable evidence.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cannyforge-0.2.1.tar.gz (101.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cannyforge-0.2.1-py3-none-any.whl (89.8 kB view details)

Uploaded Python 3

File details

Details for the file cannyforge-0.2.1.tar.gz.

File metadata

  • Download URL: cannyforge-0.2.1.tar.gz
  • Upload date:
  • Size: 101.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.2.1.tar.gz
Algorithm Hash digest
SHA256 18ebb1b90c54c4a40f5a1d4c2eba1c576d51925311de43e9be727742c876d2d3
MD5 8c59864abe77e71f64ddf5eacdd6bb3f
BLAKE2b-256 223f7a26e8a9c7aef7f7a93253540c45b62b0f9a48be1126946c894515969863

See more details on using hashes here.

File details

Details for the file cannyforge-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: cannyforge-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 89.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 38a83951d97151c6d0ccfe400fec4b726b3bf7d79273830d34943e1130809d63
MD5 c87b0950025b23814427db918e2b726c
BLAKE2b-256 29342c0e543ab50628928a5101fa96e7bf8690df09bcd7bb47457d2f30914f11

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page