Self-improving agents with closed-loop learning — agents that learn to get it right
CannyForge
Reliability memory for tool-using LLM agents.
CannyForge watches your agent make mistakes, learns corrections, and injects them as SystemMessages before each LLM call. Your agent gets better over time — no retraining required.
Agent makes errors → CannyForge learns corrections → Agent stops repeating them
Quick Start (LangGraph)
from cannyforge import CannyForge
from cannyforge.adapters.langgraph import CannyForgeMiddleware
from langgraph.prebuilt import create_react_agent

forge = CannyForge()
middleware = CannyForgeMiddleware(forge)

# `model` and `tools` are your own chat model and tool list.
agent = create_react_agent(
    model,
    tools,
    pre_model_hook=middleware.before_model,
    post_model_hook=middleware.after_model,
)

# Just run tasks. CannyForge records errors via after_model.
# After learning, before_model injects corrections as SystemMessages.
v0.3 Direction
The benchmark branch is focused on three adoption-oriented improvements:
- better reliability loops for tool-using agents
- broader benchmark coverage for arguments, multi-step execution, and recovery
- portable learned skills for assistant and workflow reuse
This is aimed at modern agent stacks, including LangGraph, LangChain-style tool agents, CrewAI-style orchestration, MCP-connected assistants, and personal assistant workflows.
See docs/v0.3-public.md for the public summary.
How It Works
- Record errors: `after_model` detects tool failures and records them
- Learn corrections: `run_learning_cycle()` clusters errors and generates specific correction text (template or LLM-generated)
- Inject corrections: `before_model` prepends a SystemMessage with all active corrections before each LLM call
- Track effectiveness: corrections that prevent recurrence are kept; ineffective ones can be regenerated
The correction is specific and actionable:
[CANNYFORGE] Learned rules for this request:
- When the task involves report, summary, sales, use `generate_report`, NOT `get_data`.
Example: "Create a summary of Q4 sales performance"
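As a minimal sketch of how a correction block like the one above could be rendered and prepended to a conversation: the `Rule` dataclass, `render_corrections`, and `inject` below are hypothetical names used for illustration, and messages are plain `(role, text)` tuples rather than LangChain message objects. CannyForge's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for a learned correction (not CannyForge's class)."""
    keywords: list[str]
    right_tool: str
    wrong_tool: str
    example: str


def render_corrections(rules: list[Rule]) -> str:
    """Format rules in the same shape as the sample correction text."""
    lines = ["[CANNYFORGE] Learned rules for this request:"]
    for rule in rules:
        topics = ", ".join(rule.keywords)
        lines.append(
            f"- When the task involves {topics}, "
            f"use `{rule.right_tool}`, NOT `{rule.wrong_tool}`."
        )
        lines.append(f'Example: "{rule.example}"')
    return "\n".join(lines)


def inject(messages: list[tuple[str, str]], rules: list[Rule]) -> list[tuple[str, str]]:
    """Return a new message list with the corrections prepended as a system message."""
    if not rules:
        return list(messages)
    return [("system", render_corrections(rules))] + list(messages)


rules = [
    Rule(
        keywords=["report", "summary", "sales"],
        right_tool="generate_report",
        wrong_tool="get_data",
        example="Create a summary of Q4 sales performance",
    )
]
prompt = inject([("user", "Summarize last week's sales")], rules)
print(prompt[0][1])
```

Returning a new list instead of mutating the input keeps the stored conversation unchanged, so the corrections only affect what the model sees on that call.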
Demo: 60% → 100% on Real LLM
pip install langgraph langchain-openai
# Set LLM_API_KEY in .env
python scenarios/demo_cannyforge.py
This runs 15 ambiguous tool-selection tasks twice:
- Phase 1: baseline without corrections — records errors
- Learning: generates corrections from observed errors
- Phase 2: same tasks with correction injection — accuracy improves
Real output with DeepSeek:
Phase 1 accuracy: 9/15 (60%)
Phase 2 accuracy: 15/15 (100%)
Tasks fixed:
- Restart the staging server -> execute_action
- Send an alert to the on-call team -> execute_action
- Deploy the latest build to production -> execute_action
- Create a summary of Q4 sales performance -> generate_report
- Write up a status report for this sprint -> generate_report
- Generate a monthly uptime report -> generate_report
No simulated errors. No hand-crafted rules. Real LLM decisions, real corrections from the pipeline.
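The reported percentages follow directly from the counts. A small sketch of the arithmetic, assuming no task regressed between phases (so the number of fixed tasks equals the difference in correct counts); the `accuracy` helper is illustrative and not part of the demo script:

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


TOTAL_TASKS = 15
phase1_correct = 9    # baseline, from the output above
phase2_correct = 15   # with correction injection, from the output above

phase1 = accuracy(phase1_correct, TOTAL_TASKS)
phase2 = accuracy(phase2_correct, TOTAL_TASKS)
fixed = phase2_correct - phase1_correct  # matches the six tasks listed above

print(f"{phase1:.0f}% -> {phase2:.0f}%, {fixed} tasks fixed")
# prints "60% -> 100%, 6 tasks fixed"
```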
Install
pip install cannyforge # from PyPI
Or from source:
git clone https://github.com/cannyforge/cannyforge.git
cd cannyforge
pip install -e .
Scenarios
| Script | Purpose |
|---|---|
| scenarios/demo_cannyforge.py | Canonical demo: full pipeline (baseline → learn → improve) |
| scenarios/demo_langgraph_tool_use.py | Minimal quickstart: 3-line integration |
| scenarios/demo.py | Animated terminal demo (internal skill system) |
Older demo scripts are in scenarios/archive/ for reference.
Framework Coverage
| Surface | Current repo path |
|---|---|
| LangGraph middleware | cannyforge/adapters/langgraph.py |
| LangChain adapter | cannyforge/adapters/langchain.py |
| CrewAI adapter | cannyforge/adapters/crewai.py |
| MCP server | cannyforge/mcp_server.py |
| Bundled assistant skills | cannyforge/bundled_skills/ |
CannyForge is designed to sit on top of existing agent frameworks rather than replace them. The public focus for v0.3 is improved tool routing, argument quality, multi-step execution, and reusable skills.
Core Architecture
Corrections Pipeline (LangGraph integration)
cannyforge/corrections.py — Correction dataclass + CorrectionGenerator
cannyforge/adapters/langgraph.py — CannyForgeMiddleware (pre/post model hooks)
cannyforge/knowledge.py — KnowledgeBase stores corrections + rules
cannyforge/learning.py — PatternDetector + LearningEngine
cannyforge/core.py — CannyForge orchestrator
CorrectionGenerator turns error clusters into actionable text:
- Template mode (no LLM): groups by `(wrong_tool, right_tool)`, extracts keywords, formats guidance
- LLM mode: sends error cluster to LLM asking for a generalized rule covering unseen tasks
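Template mode as described above (group, extract keywords, format) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `CorrectionGenerator`: the error records are plain dicts with hypothetical `task`, `wrong_tool`, and `right_tool` keys, the stopword list is ad hoc, and the keyword heuristic is a naive frequency count.

```python
from collections import Counter, defaultdict

# Ad hoc stopword list, chosen only for this example
STOPWORDS = {"a", "an", "the", "of", "for", "to", "this", "up"}


def template_corrections(errors, top_k=3):
    """Group errors by (wrong_tool, right_tool) and format one rule per group."""
    groups = defaultdict(list)
    for err in errors:
        groups[(err["wrong_tool"], err["right_tool"])].append(err["task"])

    rules = []
    for (wrong, right), tasks in groups.items():
        # Count each keyword once per task, so the count is the number of tasks sharing it
        counts = Counter()
        for task in tasks:
            words = {w.strip(".,").lower() for w in task.split()}
            counts.update(words - STOPWORDS)
        # Most frequent first; ties broken alphabetically for a stable result
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords = [word for word, _ in ranked[:top_k]]
        rules.append(
            f"When the task involves {', '.join(keywords)}, "
            f"use `{right}`, NOT `{wrong}`."
        )
    return rules


# Illustrative input: task texts come from the demo; the wrong tool is assumed
errors = [
    {"task": "Create a summary of Q4 sales performance",
     "wrong_tool": "get_data", "right_tool": "generate_report"},
    {"task": "Write up a status report for this sprint",
     "wrong_tool": "get_data", "right_tool": "generate_report"},
    {"task": "Generate a monthly uptime report",
     "wrong_tool": "get_data", "right_tool": "generate_report"},
]
rules = template_corrections(errors)
print(rules[0])
```

A frequency count like this also picks up generic verbs such as "create", which is one reason a real keyword extractor, or the LLM mode, would likely produce cleaner rules.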
CannyForgeMiddleware hooks into LangGraph's create_react_agent:
- `before_model`: injects always-on corrections + conditional rules as a SystemMessage
- `after_model`: records tool failures, tracks correction effectiveness
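One way to picture effectiveness tracking is a pair of counters per correction. The sketch below is hypothetical: `CorrectionStats`, `needs_regeneration`, and both thresholds are invented for illustration and are not taken from CannyForge's code.

```python
from dataclasses import dataclass


@dataclass
class CorrectionStats:
    """Hypothetical per-correction counters (not CannyForge's schema)."""
    applied: int = 0    # times the correction was injected
    recurred: int = 0   # times the targeted error happened anyway

    def record(self, error_recurred: bool) -> None:
        """Register one model call in which the correction was injected."""
        self.applied += 1
        if error_recurred:
            self.recurred += 1

    @property
    def effectiveness(self) -> float:
        """Share of injections after which the error did not recur."""
        if self.applied == 0:
            return 0.0
        return 1.0 - self.recurred / self.applied


def needs_regeneration(stats: CorrectionStats,
                       min_applied: int = 5,
                       min_effectiveness: float = 0.5) -> bool:
    """Flag a correction once it has enough trials and still fails too often."""
    return stats.applied >= min_applied and stats.effectiveness < min_effectiveness


stats = CorrectionStats()
for recurred in [True, True, False, True, True]:
    stats.record(recurred)
print(needs_regeneration(stats))
```

Requiring a minimum number of trials before judging a correction avoids discarding it after a single unlucky call.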
Internal Skill System
CannyForge also includes a declarative skill system for standalone use (without LangGraph):
- Skills defined via `SKILL.md` files (AgentSkills.io spec)
- Three-tier execution: custom handler → LLM multi-step → template fallback
- PATTERN_LIBRARY with condition-based rules for internal context signals
- Rule lifecycle: ACTIVE → PROBATION → DORMANT → resurrection
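The rule lifecycle in the last bullet can be read as a small state machine. The source names only the states and their order, so the events and transition triggers below are assumptions made for illustration, including the choice that resurrection returns a rule to ACTIVE.

```python
from enum import Enum


class RuleState(Enum):
    ACTIVE = "active"
    PROBATION = "probation"
    DORMANT = "dormant"


class Event(Enum):
    """Hypothetical triggers; the real lifecycle conditions may differ."""
    SUCCESS = "success"                # rule applied, error did not occur
    FAILURE = "failure"                # rule applied, error occurred anyway
    ERROR_RETURNED = "error_returned"  # error pattern reappeared while dormant


TRANSITIONS = {
    (RuleState.ACTIVE, Event.FAILURE): RuleState.PROBATION,
    (RuleState.PROBATION, Event.SUCCESS): RuleState.ACTIVE,
    (RuleState.PROBATION, Event.FAILURE): RuleState.DORMANT,
    (RuleState.DORMANT, Event.ERROR_RETURNED): RuleState.ACTIVE,  # resurrection
}


def step(state: RuleState, event: Event) -> RuleState:
    """Return the next state; unlisted (state, event) pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


state = RuleState.ACTIVE
history = [state]
for event in [Event.FAILURE, Event.FAILURE, Event.ERROR_RETURNED]:
    state = step(state, event)
    history.append(state)

print(" -> ".join(s.name for s in history))
# prints "ACTIVE -> PROBATION -> DORMANT -> ACTIVE"
```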
from cannyforge import CannyForge
forge = CannyForge()
result = forge.execute("Write an email about the 3 PM meeting")
See scenarios/demo.py for the animated terminal demo of this path.
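The three-tier execution order listed above (custom handler → LLM multi-step → template fallback) amounts to a fallback chain. A minimal sketch, assuming each tier is a callable that returns `None` or raises when it cannot handle the task; `run_tiers` and the three stub tiers are hypothetical and do not reflect how `forge.execute` is implemented.

```python
from typing import Callable, Optional

Tier = Callable[[str], Optional[str]]


def run_tiers(task: str, tiers: list[tuple[str, Tier]]) -> tuple[str, str]:
    """Try each tier in order; return (tier_name, result) from the first that succeeds."""
    for name, tier in tiers:
        try:
            result = tier(task)
        except Exception:
            result = None  # a crashing tier counts as "no result"; fall through
        if result is not None:
            return name, result
    raise RuntimeError("no tier produced a result")


def custom_handler(task: str) -> Optional[str]:
    """Stub: pretends no custom handler is registered for this task."""
    return None


def llm_multi_step(task: str) -> Optional[str]:
    """Stub: simulates an unavailable LLM backend."""
    raise ConnectionError("no LLM configured")


def template_fallback(task: str) -> Optional[str]:
    """Stub: always produces a templated answer."""
    return f"[template] {task}"


name, result = run_tiers(
    "Write an email about the 3 PM meeting",
    [("custom", custom_handler), ("llm", llm_multi_step), ("template", template_fallback)],
)
print(name, result)
```

Ordering the tiers from most specific to most generic means the template tier acts as a guaranteed last resort when neither a handler nor an LLM is available.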
How Learning Works
1. Error Recording
# Via middleware (automatic):
agent = create_react_agent(llm, tools, post_model_hook=middleware.after_model)

# Or manual:
forge.learning_engine.record_error(
    skill_name="tool_use",
    task_description="Create a Q4 summary",
    error_type="WrongToolError",
    error_message="Called get_data instead of generate_report",
    context_snapshot={...},
)
2. Learning Cycle
metrics = forge.run_learning_cycle(min_frequency=2, min_confidence=0.3)
# Produces:
# - Condition-based rules (for internal skill system)
# - Corrections (for LangGraph injection)
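To give a feel for what the `min_frequency` and `min_confidence` arguments might control, here is a hedged sketch of threshold filtering over recorded error types. The definition of confidence used here (an error type's share of all recorded errors) is an assumption, as are the `select_patterns` helper and the `TimeoutError` and `BadArgumentError` labels; the actual `PatternDetector` may compute these differently.

```python
from collections import Counter


def select_patterns(error_types, min_frequency=2, min_confidence=0.3):
    """Keep error types that occur often enough and make up a large enough share."""
    counts = Counter(error_types)
    total = len(error_types)
    selected = {}
    for error_type, count in counts.items():
        confidence = count / total  # assumed definition: share of all recorded errors
        if count >= min_frequency and confidence >= min_confidence:
            selected[error_type] = confidence
    return selected


# Illustrative data: 4 wrong-tool errors, 1 timeout, 2 bad-argument errors
observed = ["WrongToolError"] * 4 + ["TimeoutError"] * 1 + ["BadArgumentError"] * 2
patterns = select_patterns(observed)
print(sorted(patterns))
```

With these inputs only `WrongToolError` passes both thresholds: the timeout occurs once, and the bad-argument errors make up 2 of 7 records, which is below 0.3.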
3. Correction Injection
corrections = forge.knowledge_base.get_corrections("tool_use")
# [Correction(content="When task involves report, summary... use generate_report, NOT get_data")]
# Automatically injected by middleware.before_model() as a SystemMessage
Run Tests
pytest tests/ -v
Project Structure
cannyforge/
├── cannyforge/
│   ├── core.py              # CannyForge orchestrator
│   ├── corrections.py       # Correction + CorrectionGenerator
│   ├── knowledge.py         # KnowledgeBase, Rules, Conditions, Actions
│   ├── learning.py          # PatternDetector, LearningEngine
│   ├── skills.py            # Declarative skill system
│   ├── llm.py               # LLM providers (Claude, OpenAI, DeepSeek)
│   ├── tools.py             # Tool definitions and execution
│   ├── storage.py           # Storage backends (JSON, SQLite)
│   └── adapters/
│       └── langgraph.py     # LangGraph middleware (pre/post model hooks)
│
├── scenarios/
│   ├── demo_cannyforge.py              # Canonical demo (corrections pipeline)
│   ├── demo_langgraph_tool_use.py      # Minimal quickstart
│   └── demo.py                         # Animated demo (internal skill system)
│
├── tests/                   # Test suite
└── skills/                  # Built-in skill definitions (SKILL.md)
License
Licensed under BSL 1.1. Free to use in production, but you may not offer CannyForge as a competing hosted service. Converts to Apache 2.0 on 2030-03-01. See LICENSE for full terms.
For commercial licensing inquiries: cannyforge@gmail.com
CannyForge — Your agent makes fewer repeated mistakes over time, with measurable evidence.