Self-improving agents with closed-loop learning — agents that learn to get it right

These details have not been verified by PyPI

Project links

Project description

CannyForge

Reliability memory for tool-using LLM agents.

CannyForge watches your agent make mistakes, learns corrections, and injects them as SystemMessages before each LLM call. Your agent gets better over time — no retraining required.

Agent makes errors → CannyForge learns corrections → Agent stops repeating them

CannyForge demo: 80% → 100% on real LLM

Quick Start (LangGraph)

from cannyforge import CannyForge
from cannyforge.adapters.langgraph import CannyForgeMiddleware
from langgraph.prebuilt import create_react_agent

forge = CannyForge()
middleware = CannyForgeMiddleware(forge)
agent = create_react_agent(model, tools,
    pre_model_hook=middleware.before_model,
    post_model_hook=middleware.after_model)

# Just run tasks. CannyForge records errors via after_model.
# After learning, before_model injects corrections as SystemMessages.

v0.3 Direction

The benchmark branch is focused on three adoption-oriented improvements:

better reliability loops for tool-using agents
broader benchmark coverage for arguments, multi-step execution, and recovery
portable learned skills for assistant and workflow reuse

This is aimed at modern agent stacks, including LangGraph, LangChain-style tool agents, CrewAI-style orchestration, MCP-connected assistants, and personal assistant workflows.

See docs/v0.3-public.md for the public summary.

How It Works

Record errors — after_model detects tool failures and records them
Learn corrections — run_learning_cycle() clusters errors and generates specific correction text (template or LLM-generated)
Inject corrections — before_model prepends a SystemMessage with all active corrections before each LLM call
Track effectiveness — corrections that prevent recurrence are kept; ineffective ones can be regenerated

The correction is specific and actionable:

[CANNYFORGE] Learned rules for this request:
- When the task involves report, summary, sales, use `generate_report`, NOT `get_data`.
  Example: "Create a summary of Q4 sales performance"

Demo: 60% → 100% on Real LLM

pip install langgraph langchain-openai
# Set LLM_API_KEY in .env
python scenarios/demo_cannyforge.py

This runs 15 ambiguous tool-selection tasks twice:

Phase 1: baseline without corrections — records errors
Learning: generates corrections from observed errors
Phase 2: same tasks with correction injection — accuracy improves

Real output with DeepSeek:

Phase 1 accuracy: 9/15 (60%)
Phase 2 accuracy: 15/15 (100%)
Tasks fixed:
  - Restart the staging server -> execute_action
  - Send an alert to the on-call team -> execute_action
  - Deploy the latest build to production -> execute_action
  - Create a summary of Q4 sales performance -> generate_report
  - Write up a status report for this sprint -> generate_report
  - Generate a monthly uptime report -> generate_report

No simulated errors. No hand-crafted rules. Real LLM decisions, real corrections from the pipeline.

Install

pip install cannyforge           # from PyPI

Or from source:

git clone https://github.com/cannyforge/cannyforge.git
cd cannyforge
pip install -e .

Scenarios

Script	Purpose
`scenarios/demo_cannyforge.py`	Canonical demo — full pipeline: baseline → learn → improve
`scenarios/demo_langgraph_tool_use.py`	Minimal quickstart — 3-line integration
`scenarios/demo.py`	Animated terminal demo (internal skill system)

Older demo scripts are in scenarios/archive/ for reference.

Framework Coverage

Surface	Current repo path
LangGraph middleware	`cannyforge/adapters/langgraph.py`
LangChain adapter	`cannyforge/adapters/langchain.py`
CrewAI adapter	`cannyforge/adapters/crewai.py`
MCP server	`cannyforge/mcp_server.py`
Bundled assistant skills	`cannyforge/bundled_skills/`

CannyForge is designed to sit on top of existing agent frameworks rather than replace them. The public focus for v0.3 is improved tool routing, argument quality, multi-step execution, and reusable skills.

Core Architecture

Corrections Pipeline (LangGraph integration)

cannyforge/corrections.py    — Correction dataclass + CorrectionGenerator
cannyforge/adapters/langgraph.py — CannyForgeMiddleware (pre/post model hooks)
cannyforge/knowledge.py      — KnowledgeBase stores corrections + rules
cannyforge/learning.py       — PatternDetector + LearningEngine
cannyforge/core.py           — CannyForge orchestrator

CorrectionGenerator turns error clusters into actionable text:

Template mode (no LLM): groups by (wrong_tool, right_tool), extracts keywords, formats guidance
LLM mode: sends error cluster to LLM asking for a generalized rule covering unseen tasks

CannyForgeMiddleware hooks into LangGraph's create_react_agent:

before_model: injects always-on corrections + conditional rules as a SystemMessage
after_model: records tool failures, tracks correction effectiveness

Internal Skill System

CannyForge also includes a declarative skill system for standalone use (without LangGraph):

Skills defined via SKILL.md files (AgentSkills.io spec)
Three-tier execution: custom handler → LLM multi-step → template fallback
PATTERN_LIBRARY with condition-based rules for internal context signals
Rule lifecycle: ACTIVE → PROBATION → DORMANT → resurrection

from cannyforge import CannyForge
forge = CannyForge()
result = forge.execute("Write an email about the 3 PM meeting")

See scenarios/demo.py for the animated terminal demo of this path.

How Learning Works

1. Error Recording

# Via middleware (automatic):
agent = create_react_agent(llm, tools, post_model_hook=middleware.after_model)

# Or manual:
forge.learning_engine.record_error(
    skill_name="tool_use",
    task_description="Create a Q4 summary",
    error_type="WrongToolError",
    error_message="Called get_data instead of generate_report",
    context_snapshot={...},
)

2. Learning Cycle

metrics = forge.run_learning_cycle(min_frequency=2, min_confidence=0.3)
# Produces:
#   - Condition-based rules (for internal skill system)
#   - Corrections (for LangGraph injection)

3. Correction Injection

corrections = forge.knowledge_base.get_corrections("tool_use")
# [Correction(content="When task involves report, summary... use generate_report, NOT get_data")]

# Automatically injected by middleware.before_model() as a SystemMessage

Run Tests

pytest tests/ -v

Project Structure

cannyforge/
├── cannyforge/
│   ├── core.py              # CannyForge orchestrator
│   ├── corrections.py       # Correction + CorrectionGenerator
│   ├── knowledge.py         # KnowledgeBase, Rules, Conditions, Actions
│   ├── learning.py          # PatternDetector, LearningEngine
│   ├── skills.py            # Declarative skill system
│   ├── llm.py               # LLM providers (Claude, OpenAI, DeepSeek)
│   ├── tools.py             # Tool definitions and execution
│   ├── storage.py           # Storage backends (JSON, SQLite)
│   └── adapters/
│       └── langgraph.py     # LangGraph middleware (pre/post model hooks)
│
├── scenarios/
│   ├── demo_cannyforge.py   # Canonical demo (corrections pipeline)
│   ├── demo_langgraph_tool_use.py  # Minimal quickstart
│   └── demo.py              # Animated demo (internal skill system)
│
├── tests/                   # Test suite
└── skills/                  # Built-in skill definitions (SKILL.md)

License

Licensed under BSL 1.1. Free to use in production, but you may not offer CannyForge as a competing hosted service. Converts to Apache 2.0 on 2030-03-01. See LICENSE for full terms.

For commercial licensing inquiries: cannyforge@gmail.com

CannyForge — Your agent makes fewer repeated mistakes over time, with measurable evidence.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.3.0

Apr 10, 2026

0.2.1

Mar 10, 2026

0.2.0

Mar 2, 2026

0.1.1

Mar 2, 2026

0.1.0

Mar 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cannyforge-0.3.0.tar.gz (114.3 kB view details)

Uploaded Apr 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cannyforge-0.3.0-py3-none-any.whl (98.0 kB view details)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file cannyforge-0.3.0.tar.gz.

File metadata

Download URL: cannyforge-0.3.0.tar.gz
Upload date: Apr 10, 2026
Size: 114.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`ee70324b78f6b53a806e22a45002ec4396362e3b4592520476c29ed4227040ad`
MD5	`aa69749a1522ae960ad86c4abdfb1467`
BLAKE2b-256	`342ba019297db37ca24115fbe1f989381501771d6706c2ccbfb0ae138b767cb4`

See more details on using hashes here.

File details

Details for the file cannyforge-0.3.0-py3-none-any.whl.

File metadata

Download URL: cannyforge-0.3.0-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 98.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cannyforge-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`67a95a95eaf3c77ed6005a0e9a8cbc18f3fc9967308b42551888b926c8d96689`
MD5	`587aeb6a1809965b66d32be436b4f3c6`
BLAKE2b-256	`c8268111a312e4820d02754d25734208b411b3e883bdcf6368cf765849d371c2`

See more details on using hashes here.

cannyforge 0.3.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

CannyForge

Quick Start (LangGraph)

v0.3 Direction

How It Works

Demo: 60% → 100% on Real LLM

Install

Scenarios

Framework Coverage

Core Architecture

Corrections Pipeline (LangGraph integration)

Internal Skill System

How Learning Works

1. Error Recording

2. Learning Cycle

3. Correction Injection

Run Tests

Project Structure

Further Reading

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes