Experience memory layer for coding agents — learn from mistakes, inject proven fixes

Project Forge

An experience memory layer for coding agents.


What is Forge?

Forge is an experience memory layer that sits between your coding agent sessions. It is not an orchestrator, not a harness — it is the long-term memory that agents lack.

The problem: LLM coding agents (Claude Code, Cursor, etc.) start every session from scratch. They repeat the same mistakes, forget yesterday's fixes, and can't learn from past failures.

Forge solves this by:

  1. Remembering — automatically captures failures, decisions, and rules from every session
  2. Learning — uses reinforcement learning (Q-values) to measure which experiences actually help
  3. Injecting — feeds the most useful experiences into the next session's context, ranked by proven effectiveness

It runs as Claude Code hooks — zero manual intervention after setup.

Session Start               Mid-session                 Session End
  forge resume                 forge detect                forge writeback
  ↓                            ↓                           ↓
  Load Q-ranked patterns       Match stderr to DB          Parse transcript
  ↓                            ↓                           ↓
  Inject into context          Warn: "Seen this before,    Update Q-values
  "Last time this failed,       try: use async with"       Record experiment
   here's the fix (Q:0.8)"                                 Auto-promote

Installation

Step 1: Install

# pip
pip install git+https://github.com/gksl5355/Project-Forge.git

# or uv (faster)
uv tool install git+https://github.com/gksl5355/Project-Forge.git

Important: forge must be on your system PATH. Virtual-env-only installs will not work with hooks.

Step 2: Setup

forge setup

This one command:

  • Creates the experience database (~/.forge/forge.db)
  • Installs learning hooks (session start/end/failure detection)
  • Installs guard hooks (secret detection, --no-verify blocking)
  • Installs team skills (spawn-team, doctor, debate, ralph)
  • Patches ~/.claude/settings.json (append-only, creates backup)

The command prints the planned changes and asks for confirmation:

=== Forge Setup ===

Hooks & Settings:
  + hooks.SessionStart: resume.sh
  + hooks.SessionEnd: writeback.sh
  + hooks.PostToolUse: detect.sh
  = env.AGENT_TEAMS = 1 (ok)           ← existing value preserved
  ! env.SOME_KEY = X (recommends: Y)   ← conflict shown, not overwritten

Skills:
  + ~/.claude/skills/spawn-team/

Proceed? [Y/n]:

To skip the confirmation prompt, run forge setup -y.
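
The +, = and ! markers in the preview suggest a merge that adds missing keys, keeps matching ones, and reports conflicts without overwriting them. The following is a minimal sketch of that idea for the env section only, assuming a flat string-to-string mapping. The name plan_env_patch and the exact report wording are hypothetical, not Forge's actual code.

```python
def plan_env_patch(
    existing: dict[str, str], recommended: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return (merged env, report lines) without overwriting existing values."""
    merged = dict(existing)  # never mutate the caller's settings
    report: list[str] = []
    for key, value in recommended.items():
        if key not in existing:
            merged[key] = value  # append-only: add what is missing
            report.append(f"+ env.{key} = {value}")
        elif existing[key] == value:
            report.append(f"= env.{key} = {value} (ok)")
        else:
            # Conflict: keep the user's value and only report the recommendation.
            report.append(f"! env.{key} = {existing[key]} (recommends: {value})")
    return merged, report


merged, report = plan_env_patch(
    {"AGENT_TEAMS": "1", "SOME_KEY": "X"},
    {"AGENT_TEAMS": "1", "SOME_KEY": "Y"},
)
print("\n".join(report))
```

Because the function returns a new dictionary and a report, a caller can show the report first and write the merged result only after the user confirms.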

Step 3: Done

Start coding. Forge learns automatically from every session.

# Check your Forge Score after a few sessions
forge score

# View with full breakdown
forge score --detail

For developers (editable install)

git clone https://github.com/gksl5355/Project-Forge.git
cd Project-Forge
pip install -e ".[dev]"     # or: make dev
forge setup

Features

Automatic Experience Learning

Every session goes through a learn → remember → inject cycle:

Phase    Hook             What happens
Start    forge resume     Loads top experiences by Q-value, injects into agent context
During   forge detect     Matches stderr/failures against known patterns, warns in real-time
End      forge writeback  Parses transcript, extracts new failures, updates Q-values

No manual intervention needed. Forge gets smarter with every session.

Q-Value Learning (MemRL)

Based on MemRL — each experience has a Q-value that measures how useful it actually is:

Q ← Q + α(reward - Q)

reward = 1.0  →  Warning helped (failure was avoided)
reward = 0.0  →  Warning ignored (same error repeated)

Time decay: Q *= (1 - 0.005)^days_since_last_used

High-Q experiences get shown first. Low-Q ones fade away. The system self-corrects.
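
The update rule and the time decay above can be written as two small functions. This is a sketch rather than Forge's own code: the function names are hypothetical, the default alpha of 0.1 matches the basic settings, and 0.005 is the decay rate quoted in the formula.

```python
def update_q(q: float, reward: float, alpha: float = 0.1) -> float:
    """EMA update: Q <- Q + alpha * (reward - Q)."""
    return q + alpha * (reward - q)


def decay_q(q: float, days_since_last_used: float, rate: float = 0.005) -> float:
    """Time decay: Q *= (1 - rate) ** days_since_last_used."""
    return q * (1.0 - rate) ** days_since_last_used


q = 0.5
q = update_q(q, reward=1.0)  # warning helped, Q moves toward 1
q = update_q(q, reward=0.0)  # warning ignored, Q moves toward 0
stale_q = decay_q(q, days_since_last_used=30)  # unused for a month
print(round(q, 3), round(stale_q, 3))
```

With a small alpha, a single session moves Q only slightly, so an experience needs repeated evidence before it rises or falls in the ranking.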

Forge Score

One number that tells you how well Forge is working:

forge score
# === Forge Score (workspace: default) ===
#
#   Forge Score:     0.68 / 1.00
#
#   Learning effect (QWHR):  0.72
#   Context hit rate:        0.65
#   Token efficiency:        0.58
#   Patterns: 47 | Sessions: 23

forge score --detail     # full breakdown with routing, circuit breaker, etc.

The score is computed from 8 internal metrics, weighted and optimized through parameter sweeps. You don't need to know the formula — just watch the number go up.

Smart Context Injection

Forge doesn't dump all experiences at once. It ranks them using:

  • Q-value — proven effectiveness
  • Recency — recent failures weighted higher (configurable decay)
  • Relevance — tag overlap with current session

The top experiences are formatted in a token-efficient format and injected at session start.
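
One way to combine the three ranking signals is a weighted sum followed by a greedy fill under a token budget. The sketch below assumes that approach. The weights, the recency decay, the Jaccard tag overlap, and the word-count token estimate are all illustrative assumptions; only the max_tokens and l0_max_entries defaults come from the basic settings.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    pattern: str
    hint: str
    q: float
    days_since_seen: float
    tags: set[str] = field(default_factory=set)


def score(exp: Experience, session_tags: set[str],
          w_q: float = 0.6, w_recency: float = 0.2, w_relevance: float = 0.2,
          recency_decay: float = 0.05) -> float:
    """Weighted sum of Q-value, recency, and tag overlap (weights are assumptions)."""
    recency = (1.0 - recency_decay) ** exp.days_since_seen
    union = exp.tags | session_tags
    relevance = len(exp.tags & session_tags) / len(union) if union else 0.0
    return w_q * exp.q + w_recency * recency + w_relevance * relevance


def select(experiences: list[Experience], session_tags: set[str],
           max_tokens: int = 3000, max_entries: int = 50) -> list[Experience]:
    """Take the best-scoring experiences that fit the entry and token limits."""
    ranked = sorted(experiences, key=lambda e: score(e, session_tags), reverse=True)
    chosen: list[Experience] = []
    used = 0
    for exp in ranked:
        if len(chosen) >= max_entries:
            break
        cost = len(f"{exp.pattern} {exp.hint}".split())  # crude token estimate
        if used + cost > max_tokens:
            continue  # skip entries that do not fit, keep trying smaller ones
        chosen.append(exp)
        used += cost
    return chosen
```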

Adaptive Warning Formats

Forge automatically tests different warning formats (A/B testing) and converges on whichever format actually helps your agent more:

  • Essential: [WARN] pattern → hint (minimal tokens)
  • Annotated: [WARN] pattern Q:0.75 → hint (balanced)
  • Concise: [WARN] pattern Q:0.75 → hint_short (default)
  • Detailed: Full stats with seen/helped counts

Guard Hooks

Protective hooks that prevent common agent failure modes:

Hook                 What it does
block-no-verify.sh   Blocks --no-verify to prevent bypassing pre-commit hooks
guard-secrets.sh     Detects API keys, tokens, private keys in writes
suggest-compact.sh   Suggests /compact after 50+ tool calls
cost-tracker.sh      Logs session metrics for efficiency tracking
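
To illustrate the kind of check a secret guard performs, here is a small regex-based scanner. It is a sketch under stated assumptions, not the contents of guard-secrets.sh: the three patterns are common, widely used heuristics, and find_secrets is a hypothetical name.

```python
import re

# Illustrative patterns only; a real guard would likely cover more cases.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "generic_token": re.compile(
        r"\b(?:api[_-]?key|token|secret)\b\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of all patterns that match the text about to be written."""
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(text)]


hits = find_secrets('API_KEY = "abcdefghijklmnop1234"')
if hits:
    print("blocked write, possible secrets:", ", ".join(hits))
```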

Circuit Breaker

Automatically detects when a session is stuck in a failure loop:

  • Tracks consecutive failures and total tool calls per session
  • Trips when limits are exceeded (configurable)
  • Resets on success
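
The three behaviors above fit in a small state object. In this sketch the limit values are placeholders rather than Forge's defaults, and it assumes that a success resets only the consecutive-failure counter while the total tool-call count keeps growing for the session.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    max_consecutive_failures: int = 3  # placeholder limit
    max_tool_calls: int = 200          # placeholder limit
    consecutive_failures: int = 0
    tool_calls: int = 0

    @property
    def tripped(self) -> bool:
        """True once either limit has been exceeded."""
        return (self.consecutive_failures > self.max_consecutive_failures
                or self.tool_calls > self.max_tool_calls)

    def record(self, success: bool) -> bool:
        """Record one tool call and report whether the breaker is tripped."""
        self.tool_calls += 1
        if success:
            self.consecutive_failures = 0  # reset on success
        else:
            self.consecutive_failures += 1
        return self.tripped


breaker = CircuitBreaker(max_consecutive_failures=2)
for ok in (False, False, False):
    if breaker.record(ok):
        print("stuck in a failure loop, stop and rethink")
```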

Model Routing

Forge learns which model works best for each task type:

quick tasks  → claude-haiku-4-5      (fast, cheap)
standard     → claude-sonnet-4-6     (balanced)
deep tasks   → claude-opus-4-6       (thorough)
review       → claude-sonnet-4-6     (good at review)

Routing accuracy improves as more session data accumulates.
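
A simple way to picture routing that improves with data: start from the table above, record outcomes per task type and model, and switch once enough runs exist. The sketch below assumes a plain success-rate comparison. The Router class, the min_runs threshold, and the selection rule are hypothetical and may differ from what routing.py does.

```python
from collections import defaultdict

DEFAULT_ROUTES = {
    "quick": "claude-haiku-4-5",
    "standard": "claude-sonnet-4-6",
    "deep": "claude-opus-4-6",
    "review": "claude-sonnet-4-6",
}


class Router:
    def __init__(self, min_runs: int = 5) -> None:
        self.min_runs = min_runs
        # (task_type, model) -> [successes, runs]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, model: str, success: bool) -> None:
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1

    def choose(self, task_type: str) -> str:
        """Pick the model with the best observed success rate, else the default."""
        rates = {
            model: successes / runs
            for (task, model), (successes, runs) in self.stats.items()
            if task == task_type and runs >= self.min_runs
        }
        if not rates:
            return DEFAULT_ROUTES[task_type]
        return max(rates, key=rates.get)


router = Router(min_runs=2)
print(router.choose("quick"))  # no data yet, falls back to the default route
```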

Team Orchestration Support

Forge integrates with /spawn-team to learn from multi-agent runs:

forge recommend --complexity MEDIUM
# → sonnet:2+haiku:1 (3 runs, success: 85%, confidence: medium)

forge resume --team-brief
# → Recent team failures + recommended config

Global Promotion

When a pattern appears in 2+ projects, Forge automatically promotes it to a global experience that benefits all workspaces.
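
The promotion rule amounts to counting distinct projects per pattern. A minimal sketch, assuming observations arrive as (project, pattern) pairs; patterns_to_promote is a hypothetical name and the input shape is an assumption.

```python
from collections import defaultdict
from collections.abc import Iterable


def patterns_to_promote(
    observations: Iterable[tuple[str, str]], min_projects: int = 2
) -> list[str]:
    """Return patterns seen in at least min_projects distinct projects."""
    projects_by_pattern: dict[str, set[str]] = defaultdict(set)
    for project, pattern in observations:
        projects_by_pattern[pattern].add(project)
    return sorted(
        pattern
        for pattern, projects in projects_by_pattern.items()
        if len(projects) >= min_projects
    )


seen = [
    ("api", "Session is closed"),
    ("worker", "Session is closed"),
    ("api", "ModuleNotFoundError"),
    ("api", "ModuleNotFoundError"),  # same project twice does not count
]
print(patterns_to_promote(seen))
```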

Commands

Everyday

Command               Description
forge setup           Full setup (DB + hooks + skills + settings)
forge score           View your Forge Score
forge score --detail  Full score breakdown
forge config          View/change settings
forge stats           Workspace statistics

Data Management

Command                Description
forge record failure   Record a failure pattern with hint
forge record decision  Record a decision with rationale
forge record rule      Record a project rule (block/warn/log)
forge list             List experiences by type
forge detail PATTERN   Detailed view of a pattern
forge search -t TAG    Search by tag
forge edit             Edit existing records

Analysis

Command           Description
forge trend       Fitness trend over time
forge recommend   Team config recommendation
forge decay       Apply time decay to stale patterns
forge promote ID  Promote to global or knowledge
forge ingest      Ingest team run data
forge dedup       Merge duplicate patterns

Hooks (automatic, not manually called)

Command          Trigger       Description
forge resume     SessionStart  Inject experiences into context
forge detect     PostToolUse   Real-time failure matching
forge writeback  SessionEnd    Learn from session transcript

Configuration

forge config                    # view basic settings
forge config --set alpha=0.15   # change a setting
forge config --advanced         # view all parameters (40+)

Basic settings (~/.forge/config.yml):

max_tokens: 3000          # max context tokens for injection
l0_max_entries: 50         # max patterns to show
llm_model: claude-haiku-4-5-20251001
alpha: 0.1                 # EMA learning rate
routing_enabled: true      # model routing on/off
circuit_breaker_enabled: true

All settings are optional. Defaults are pre-optimized.
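
Since every setting is optional, an override such as alpha=0.15 only needs to replace one default. The sketch below shows that layering with a frozen dataclass. Settings and apply_override are hypothetical names (Forge's own class is ForgeConfig, whose interface is not shown here), and the boolean parsing rule is an assumption.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the basic settings listed above.
    max_tokens: int = 3000
    l0_max_entries: int = 50
    llm_model: str = "claude-haiku-4-5-20251001"
    alpha: float = 0.1
    routing_enabled: bool = True
    circuit_breaker_enabled: bool = True


def apply_override(settings: Settings, assignment: str) -> Settings:
    """Apply one 'key=value' override and return a new Settings object."""
    key, _, raw = assignment.partition("=")
    types = {f.name: f.type for f in fields(settings)}
    if key not in types:
        raise KeyError(f"unknown setting: {key}")
    kind = types[key]
    if kind is bool:
        value = raw.strip().lower() in {"1", "true", "yes", "on"}
    else:
        value = kind(raw)  # int, float, or str conversion
    return replace(settings, **{key: value})


settings = apply_override(Settings(), "alpha=0.15")
print(settings.alpha, settings.max_tokens)
```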

Architecture

forge/
├── cli.py              # Typer CLI (all commands)
├── config.py           # ForgeConfig + YAML loading
├── engines/            # Core engines
│   ├── resume.py       # Session start: context injection
│   ├── detect.py       # Mid-session: failure matching
│   ├── writeback.py    # Session end: learning
│   ├── fitness.py      # Forge Score computation
│   ├── routing.py      # Model routing
│   ├── prompt_optimizer.py  # A/B testing, hint scoring
│   ├── sweep.py        # Parameter optimization
│   └── ...
├── core/               # Core logic
│   ├── qvalue.py       # Q-value EMA updates
│   ├── context.py      # L0/L1 context formatting
│   ├── circuit_breaker.py
│   └── ...
├── storage/            # SQLite storage
│   ├── db.py           # Schema, connections
│   ├── models.py       # Dataclass models
│   └── queries.py      # Raw SQL queries
├── hooks/              # Shell hook templates
└── skills/             # Bundled SKILL.md files

Data flow:

Agent session
  ↓ SessionStart hook
forge resume → DB query → context injection
  ↓ PostToolUse hook (on failure)
forge detect → pattern match → real-time warning
  ↓ SessionEnd hook
forge writeback → transcript parse → Q update → experiment record
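
The middle step of this flow, matching stderr against stored patterns, can be sketched with an in-memory SQLite database. The table name, its columns, and the case-insensitive substring match are assumptions for illustration. The real schema (v5) and matching logic are not described here.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical failures table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE failures (pattern TEXT PRIMARY KEY, hint TEXT, q REAL)")
    conn.executemany(
        "INSERT INTO failures VALUES (?, ?, ?)",
        [
            ("Session is closed", "use async with", 0.8),
            ("ModuleNotFoundError", "install the package in the active environment", 0.4),
        ],
    )
    return conn


def detect(conn: sqlite3.Connection, stderr: str) -> list[str]:
    """Return warnings for every stored pattern found in stderr, best Q first."""
    rows = conn.execute(
        "SELECT pattern, hint, q FROM failures ORDER BY q DESC"
    ).fetchall()
    lowered = stderr.lower()
    return [
        f"[WARN] {pattern} Q:{q:.2f} → {hint}"
        for pattern, hint, q in rows
        if pattern.lower() in lowered
    ]


conn = make_db()
for warning in detect(conn, "RuntimeError: Session is closed"):
    print(warning)
```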

What Gets Installed

Component       Location                 Purpose
Experience DB   ~/.forge/forge.db        SQLite: failures, decisions, rules, experiments
Learning hooks  ~/.forge/hooks/*.sh      Session start/end/failure detection
Guard hooks     ~/.forge/hooks/*.sh      Secret guard, no-verify block, compact suggest
Team skills     ~/.claude/skills/        spawn-team, doctor, debate, ralph
Settings patch  ~/.claude/settings.json  Hook registration (append-only, backup created)
Config          ~/.forge/config.yml      Optional overrides (created on first forge config --set)

Metrics

Metric                 Value
Tests                  1,203 (all passing)
Source modules         40
Test files             42
Lines of code          ~8,900
DB schema              v5
External dependencies  2 (typer, pyyaml)
Python                 3.12+

Tech Stack

  • Python 3.12+ — runtime
  • SQLite — built-in DB, zero config, no external server
  • Typer — CLI framework
  • PyYAML — config parsing

Acknowledgements

  • MemRL — EMA-based Q-value learning. Core insight: Q measures "hint usefulness", not "failure severity"
  • OpenViking (ByteDance) — L0/L1/L2 layered context loading for token efficiency
  • Claude Code — Hook system that makes automatic learning possible

License

MIT
