Production-ready multi-agent CLI with unified memory

forge — AI Coding Agent Framework

Two modes, one CLI. forge new → directed graph orchestration. forge compile → intent-to-production pipeline with formal verification, mutation testing, and runtime profiling.

Install

Via npm (recommended)

npm install -g @tearbin/forge
forge --version
forge setup  # interactive: pick provider, enter API key, test

Via npx (no install)

npx @tearbin/forge

Via pip (coming soon — package name claim in progress)

pip install forge-cli   # ← available once the PyPI package name is resolved
forge --version
forge setup

From source

cd ~/forge
pip install -e .
forge --version
forge setup

Two Modes

Command	What it does
`forge new "build a Stripe billing app"`	Directed graph: orchestrator → spec → executor → review gate
`forge compile "build a URL shortener"`	Pipeline: interview → rules → failing tests → code → review → integration
`forge compile --edit "rename get_user"`	Edit pipeline: blast radius → extract rules → delta → test delta

Both modes share the same LLM backend, TUI, skills system, and memory model.

forge new — Directed Graph Orchestration

User Prompt
    ↓
Orchestrator       ← decides: new or continue, handles failures/escalation
    ↓
Spec Generator     ← asks clarifying Qs, writes SPEC.md (git-tracked, append-only)
    ↓
Executor           ← TDD: tests first, then implementation, against SPEC.md
    ↓
Review Gate        ← checkpoint: pass → done, fail → retry executor
    ↓
Orchestrator       ← done or escalate to user
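The loop above can be sketched as a small driver function. This is a minimal sketch under stated assumptions, not Forge's implementation: `run_graph`, the shared `state` dict, the `review_passed`/`review_reason` keys and the `max_retries` limit are all hypothetical. The orchestrator's "new or continue" decision is left out.

```python
from typing import Callable

# Hypothetical node type: each node receives the shared state dict and returns it.
Node = Callable[[dict], dict]


def run_graph(spec_gen: Node, executor: Node, review_gate: Node,
              state: dict, max_retries: int = 2) -> str:
    """Walk spec -> executor -> review gate; return "done" or "escalate"."""
    state = spec_gen(state)                      # clarifying Qs + SPEC.md
    for attempt in range(max_retries + 1):
        state = executor(state)                  # tests first, then implementation
        state = review_gate(state)               # checkpoint
        if state.get("review_passed"):
            return "done"                        # orchestrator finishes
        # Review failed: hand the reason back to the executor and retry.
        state["retry_reason"] = state.get("review_reason", "unspecified")
        state["attempt"] = attempt + 1
    return "escalate"                            # orchestrator escalates to the user
```

Bounding the retries keeps a persistently failing review gate from looping forever; once the bound is hit, control returns to the orchestrator, which escalates.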

Memory Model (4-tier, per-project SQLite)

Tier	Scope	Duration	Storage
`short`	Current task	Session	In-memory dict
`mid`	Current project	Days	`~/.forge/projects/<id>/memory.db`
`episodic`	Across sessions	Weeks	Same DB
`long`	Cross-project	Months	`~/.forge/memory/`

Every spec write inserts a new version row; existing rows are never overwritten. forge continue loads the latest version and resumes from the saved orchestrator state.
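An append-only spec history maps naturally onto SQLite with the standard `sqlite3` module. The sketch below is illustrative only: the `spec_versions` table, its columns and the helper names are hypothetical, and it uses an in-memory database by default instead of the per-project `memory.db`.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS spec_versions (
    version    INTEGER PRIMARY KEY AUTOINCREMENT,
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; a real project DB would be memory.db."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_spec(conn: sqlite3.Connection, body: str) -> int:
    """Append a new version row. Rows are only ever inserted, never updated."""
    with conn:  # commits the transaction on success
        cur = conn.execute("INSERT INTO spec_versions (body) VALUES (?)", (body,))
    return cur.lastrowid


def latest_spec(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Return (version, body) for the newest spec, or None if there is none."""
    return conn.execute(
        "SELECT version, body FROM spec_versions ORDER BY version DESC LIMIT 1"
    ).fetchone()
```

Because nothing is updated in place, any earlier spec version can still be read back by its `version` number.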

Failure Lanes (explicit)

Every node can emit:

done → follow normal edge
blocked → review_gate emits review_fail → executor retries with reason
failed → orchestrator decides: retry / escalate / block
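The three lanes can be sketched as a routing function. This is a hypothetical illustration: `Outcome`, `NORMAL_EDGES` and `route` are not Forge names, and the orchestrator's "block" option is not modeled. This sketch only chooses between retry and escalate based on a retry budget.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    FAILED = "failed"


# Normal edges of the graph above (hypothetical node ids).
NORMAL_EDGES = {
    "orchestrator": "spec_gen",
    "spec_gen": "executor",
    "executor": "review_gate",
    "review_gate": "orchestrator",
}


def route(node: str, outcome: Outcome, retries_left: int) -> str:
    """Pick the next step for a node's outcome."""
    if outcome is Outcome.DONE:
        return NORMAL_EDGES[node]      # follow the normal edge
    if outcome is Outcome.BLOCKED:
        return "executor"              # review_fail: executor retries with the reason
    # FAILED: the orchestrator decides; this sketch only retries or escalates.
    return "retry" if retries_left > 0 else "escalate"
```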

forge compile — Intent to Production Pipeline

Describe what you want in plain English. Forge interviews you, writes failing tests from formal rules, writes the code to pass them, runs 10 parallel static analysis gates, and integrates everything into a runnable project.

Intent (plain English)
    ↓
Stage 1: Interviewer         adversarial Q&A — extracts entities, relationships, rules
    ↓
Stage 2: Flow Designer        → UserFlowTree (screen-by-screen flows)
    ↓
Stage 3: Contract Writer      → APIContract[] (request/response/error for every action)
    ↓
Stage 4: Rule Compiler        → RuleSet (IF/THEN constraints, no prose)
    ↓
Gate 1: Human Review          y/n — reject sends feedback back to Rule Compiler
    ↓
Stage 5: Schema Designer      → DatabaseSchema (PostgreSQL — tables, FKs, indexes, audit columns)
    ↓
Stage 6: Test Writer          → FailingTestSuite (one test per rule) + Hypothesis strategies
    ↓
Stage 7: Mutation Testing     ← mutmut — hard gate if score < 70%
    ↓
Stage 8: Coder                → passing code (pyupgrade + pyright + crosshair in TDD loop)
    ↓
Stage 9: Integrator           → requirements.txt + supply chain scan + SBOM
    ↓
Stage 10: Reviewer            → 10 parallel gates: bandit, semgrep, radon, vulture, griffe,
    │                            deptry, pyright, mutmut, pip-audit, crosshair
    ↓
Gate 2: Human Review          y/n — reject sends feedback back to Coder
    ↓
Stage 11: Profiling           ← memray (memory) + pyspy (CPU) — advisory feedback
    ↓
Done: project_dir/
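Both human gates follow the same pattern: a rejection sends feedback to the stage that produced the artifact (Gate 1 to the Rule Compiler, Gate 2 to the Coder). Here is a minimal sketch, assuming a hypothetical `human_gate` helper and an arbitrary `max_rounds` cap that the pipeline description above does not specify:

```python
from typing import Callable, Tuple


def human_gate(produce: Callable[[str], str],
               review: Callable[[str], Tuple[bool, str]],
               max_rounds: int = 3) -> str:
    """Regenerate an artifact until the reviewer approves it.

    produce(feedback) -> artifact            e.g. the Rule Compiler's output
    review(artifact)  -> (approved, feedback) the y/n gate plus a reason
    """
    feedback = ""
    for _ in range(max_rounds):
        artifact = produce(feedback)         # first round gets empty feedback
        approved, feedback = review(artifact)
        if approved:
            return artifact
    raise RuntimeError(f"gate rejected {max_rounds} times; last feedback: {feedback}")
```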

The 10 Hard Gates

Every gate is blocking: a single failure stops the pipeline. No LLM synthesis happens until all gates pass.

#	Gate	Tool	Threshold
1	Security vulnerabilities	`bandit`	HIGH/CRITICAL → BLOCK
2	Hardcoded secrets	`semgrep` (p/secrets)	HIGH → BLOCK
3	Complexity	`radon`	CC > 10 → BLOCK
4	Dead code	`vulture`	≥80% confidence → BLOCK
5	Contract drift	`griffe`	Any drift → BLOCK
6	Missing dependencies	`deptry` (DEP001)	Missing dep → BLOCK
7	Type errors	`pyright`	Any error → BLOCK
8	Test quality	`mutmut`	Score < 70% → BLOCK
9	Supply chain CVEs	`pip-audit`	HIGH/CRITICAL → BLOCK
10	Contract violations	`crosshair`	Any violation → BLOCK
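The gate table reduces to "run every check concurrently, pass only if all pass". The sketch below covers two of the thresholds. It makes several assumptions: the tool invocations and output parsing are omitted (the checks take already-parsed numbers), `GateResult` and the helper names are hypothetical, and the mutation score is assumed to be killed mutants divided by total mutants.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def complexity_gate(cc_by_function: Dict[str, int], limit: int = 10) -> GateResult:
    """radon-style gate: any function with CC > limit blocks."""
    offenders = sorted(f for f, cc in cc_by_function.items() if cc > limit)
    return GateResult("complexity", not offenders, ", ".join(offenders))


def mutation_gate(killed: int, total: int, minimum: float = 0.70) -> GateResult:
    """mutmut-style gate: assumed score = killed / total, must be >= minimum."""
    score = killed / total if total else 0.0
    return GateResult("test quality", score >= minimum, f"score={score:.0%}")


def run_gates(gates: List[Callable[[], GateResult]]) -> Tuple[bool, List[GateResult]]:
    """Run every gate concurrently; the pipeline passes only if all of them pass."""
    with ThreadPoolExecutor(max_workers=max(len(gates), 1)) as pool:
        results = list(pool.map(lambda gate: gate(), gates))  # keeps input order
    return all(r.passed for r in results), results
```

All gates still run after one fails, so the reviewer sees the full list of violations rather than only the first.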

CLI

forge compile "build a URL shortener with analytics"
forge compile "a REST API" --auto-approve
forge compile "a blog" --test-prompt "posts" --test-prompt "auth"
forge compile "a service" --sandbox              # E2B sandbox isolation
forge compile "a service" --nats-url nats://localhost:4222
forge compile "a blog" --workdir ./myproject

Edit Pipeline — Change Existing Code

forge compile --edit "add OAuth to the API" --workdir ./myproject
forge compile --edit "rename get_user to fetch_user" --workdir ./myproject
forge compile --edit "change beta user discount from 20% to 30%" --workdir ./myproject

Stage 1: ChangeIntent          ← what is changing, what must not
    ↓
Stage 2: ImpactAnalyst        ← jedi blast-radius: all callers of changed function
    ↓
Stage 3: RuleExtractor        ← extract IF/THEN rules from source by observation
    ↓
Stage 4: DeltaCompiler        ← PRESERVE / CHANGE / NEW / REMOVE per rule
    ↓
Stage 5: BlastChecker         ← verify delta stays within declared blast radius
    ↓
Stage 6: TestDeltaWriter      ← failing tests for CHANGE rules, regression for PRESERVE
    ↓
    Coder (edit-mode TDD loop)
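The DeltaCompiler's four labels can be illustrated by comparing the rules extracted from the current source with the rules wanted after the change. This is a simplified sketch: rules are plain strings keyed by a hypothetical rule id, the second example rule is invented, and only the test mapping the text states (CHANGE gets failing tests, PRESERVE gets regression tests) is encoded.

```python
from enum import Enum
from typing import Dict


class Delta(Enum):
    PRESERVE = "PRESERVE"
    CHANGE = "CHANGE"
    NEW = "NEW"
    REMOVE = "REMOVE"


def compile_delta(existing: Dict[str, str], desired: Dict[str, str]) -> Dict[str, Delta]:
    """Label every rule id seen in either the old or the new rule set."""
    delta: Dict[str, Delta] = {}
    for rule_id in existing.keys() | desired.keys():
        if rule_id not in desired:
            delta[rule_id] = Delta.REMOVE
        elif rule_id not in existing:
            delta[rule_id] = Delta.NEW
        elif existing[rule_id] == desired[rule_id]:
            delta[rule_id] = Delta.PRESERVE
        else:
            delta[rule_id] = Delta.CHANGE
    return delta


# Test kinds stated in the pipeline above; NEW/REMOVE handling is not described there.
TESTS_FOR = {Delta.CHANGE: "failing test", Delta.PRESERVE: "regression test"}

# Hypothetical rules for "change beta user discount from 20% to 30%".
existing = {
    "beta_discount": "IF user.is_beta THEN discount = 20%",
    "free_tier_limit": "IF plan == free THEN max_urls = 100",
}
desired = {**existing, "beta_discount": "IF user.is_beta THEN discount = 30%"}
```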

The BlastChecker raises BlastCheckError if the delta touches files outside the declared impact surface. The user must approve an expanded scope or narrow the change.
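At its core the check is a set difference between the files the delta touches and the declared impact surface. Below is a minimal sketch that reuses the `BlastCheckError` name from the text. The `blast_check` function is hypothetical, and file paths are compared exactly as given, with no normalization.

```python
from typing import Set


class BlastCheckError(Exception):
    """Raised when a delta touches files outside the declared impact surface."""


def blast_check(changed_files: Set[str], impact_surface: Set[str]) -> bool:
    """Return True if every changed file is inside the impact surface."""
    outside = changed_files - impact_surface
    if outside:
        raise BlastCheckError(
            "delta touches files outside the impact surface: "
            + ", ".join(sorted(outside))
        )
    return True
```

Listing the out-of-scope files in the error gives the user what they need to either approve the wider scope or narrow the change.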

Agents

Agent	Stage	Input	Output
`InterviewerAgent`	1	Plain intent	`IntentDocument`
`FlowDesignerAgent`	2	`IntentDocument`	`UserFlowTree`
`ContractWriterAgent`	3	`UserFlowTree`	`list[APIContract]`
`RuleCompilerAgent`	4	`list[APIContract]` + `UserFlowTree`	`RuleSet`
`SchemaDesignerAgent`	5	`IntentDocument`	`DatabaseSchema`
`TestWriterAgent`	6	`RuleSet`	`FailingTestSuite` + Hypothesis strategies
`CoderAgent`	8	`FailingTestSuite` + `RuleSet`	Code files + passing tests
`IntegratorAgent`	9	All of the above	`requirements.txt` + SBOM + supply chain
`ReviewerAgent`	10	`RuleSet` + project files	`ReviewResult` (violations + LLM synthesis)
`ImpactAnalystAgent`	2 (edit)	`ChangeIntent` + codebase	`ImpactSurface` (jedi blast-radius)
`RuleExtractorAgent`	3 (edit)	`ImpactSurface` + source	`list[Rule]` (PRESERVE)
`DeltaCompilerAgent`	4 (edit)	`list[Rule]` + `ChangeIntent`	`DeltaRuleSet`
`BlastCheckerAgent`	5 (edit)	`DeltaRuleSet` + `ImpactSurface`	bool (raises `BlastCheckError`)
`TestDeltaWriterAgent`	6 (edit)	`DeltaRuleSet`	`FailingTestSuite`
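Each row of the table is a typed transformation from one artifact to the next. One way to express that contract is a generic `Protocol`, which pyright can check statically. This is a sketch only. The `Agent` protocol, the dataclass fields and `ToyFlowDesigner` are hypothetical stand-ins, not Forge's real classes.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, TypeVar

InT = TypeVar("InT", contravariant=True)
OutT = TypeVar("OutT", covariant=True)


class Agent(Protocol[InT, OutT]):
    """Hypothetical stage interface: one typed artifact in, the next one out."""

    def run(self, data: InT) -> OutT: ...


@dataclass
class IntentDocument:
    entities: List[str]                                  # hypothetical field


@dataclass
class UserFlowTree:
    screens: List[str] = field(default_factory=list)     # hypothetical field


class ToyFlowDesigner:
    """Stand-in for FlowDesignerAgent with trivial logic: one screen per entity."""

    def run(self, data: IntentDocument) -> UserFlowTree:
        return UserFlowTree(screens=[f"{name} list" for name in data.entities])


def run_stage(agent: Agent[IntentDocument, UserFlowTree], doc: IntentDocument) -> UserFlowTree:
    # A type checker verifies that `agent` consumes and produces the right types.
    return agent.run(doc)
```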

Code Intelligence (22 tools)

Formal verification, semantic analysis, runtime intelligence, mutation testing, supply chain, and quality gates are all wired in as structured tool integrations and surfaced in the TUI's CodeIntelligenceScreen (open it with /ci from the command palette, or with forge tui --screen ci).

Tool	Role	Mode	TUI
`bandit`	Security analysis	Hard gate	✓
`semgrep`	Secrets + security rules	Hard gate (p/secrets)	✓
`pip_audit`	CVE scanning	Hard gate — no HIGH/CRITICAL CVEs	✓
`pyupgrade`	Python version upgrade	Auto-fix	✓
`radon`	Complexity analysis	Hard gate CC > 10	✓
`vulture`	Dead code detection	Hard gate ≥80% confidence	✓
`griffe`	API contract drift	Hard gate	✓
`deptry`	Dependency analysis	Missing dep = hard gate	✓
`cyclonedx`	SBOM generation	Artifact — CycloneDX JSON per build	✓
`crosshair`	Static contract proving	Hard gate — no violations	✓
`hypothesis`	Property-based test generation	Advisory — enriches test coverage	✓
`mutmut`	Mutation testing	Hard gate — score ≥ 70%	✓
`jedi`	Semantic callers/inference	Blast-radius analysis	✓
`rope`	Semantic refactoring	Safe renames/moves	✓
`memray`	Memory profiling	Advisory feedback	✓
`pyspy`	CPU sampling	Advisory feedback	✓
`otel`	OpenTelemetry tracing	Traces on test failure
`pyright`	Type checking	Hard gate — wired in coder
`ctags`	Symbol index	Navigation
`atlas`	Schema migration	Migration files
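jedi resolves names semantically (imports, aliases, type inference), which is why it is used for blast-radius analysis. For contrast, here is a syntax-only approximation built on the standard `ast` module. It matches calls by exact name, so it skips `get_users` where a substring grep would not. It cannot tell whether two calls named `get_user` refer to the same function. The helper `find_call_sites` is hypothetical and not part of Forge.

```python
import ast
from typing import List


def find_call_sites(source: str, func_name: str) -> List[int]:
    """Return sorted line numbers where `func_name(...)` or `obj.func_name(...)` is called."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name):
            name = target.id            # plain call: get_user(...)
        elif isinstance(target, ast.Attribute):
            name = target.attr          # attribute call: repo.get_user(...)
        else:
            name = None                 # e.g. calling the result of another call
        if name == func_name:
            lines.append(node.lineno)
    return sorted(lines)
```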

Skills

Skills are reusable knowledge stored in ~/.forge/skills/ and ~/.hermes/skills/. They teach agents how to use the tools — thresholds, patterns, error interpretation. They are loaded on demand per agent role.

forge --list-skills
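On-demand loading could look roughly like the sketch below. The two search directories come from the text above. Everything else is a labeled assumption: the `<dir>/<name>/SKILL.md` layout, the first-directory-wins precedence, and the helper names `load_skill` and `load_role_skills` are hypothetical, not Forge's documented format.

```python
from pathlib import Path
from typing import Dict, Optional, Sequence

# Search order taken from the text above; the per-skill layout is hypothetical.
SKILL_DIRS = (Path.home() / ".forge" / "skills", Path.home() / ".hermes" / "skills")


def load_skill(name: str, dirs: Sequence[Path] = SKILL_DIRS) -> Optional[str]:
    """Return the text of the first <dir>/<name>/SKILL.md found, else None."""
    for base in dirs:
        candidate = base / name / "SKILL.md"   # hypothetical file layout
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return None


def load_role_skills(names: Sequence[str], dirs: Sequence[Path] = SKILL_DIRS) -> Dict[str, str]:
    """Load only the skills a given agent role asks for; skip missing ones."""
    loaded = {name: load_skill(name, dirs) for name in names}
    return {name: text for name, text in loaded.items() if text is not None}
```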

Agent Role Skills

Skill	Teaches
`reviewer`	10-gate hierarchy, evidence formatting, blocking failure strings
`test-writer`	Property-based vs example-based thinking, mutmut loop, Hypothesis patterns
`coder`	TDD discipline, minimal fix, pyupgrade + crosshair in TDD loop

Quality Gate Skills

Skill	Teaches
`bandit`	B301/B303/B310/B608 patterns, fix templates, rejection strings
`radon`	CC sources (nested conditionals, boolean expressions), refactor patterns
`vulture`	80% confidence rule, speculative code detection
`pyright`	TypedDict/Generic/Protocol patterns, type annotation discipline
`mutmut`	Survivor interpretation, test-to-kill mapping, threshold philosophy
`crosshair`	PEP 316 contract syntax, counterexample reading
`hypothesis`	Strategy map, invariant testing, `@given`/`@settings` patterns
`pip-audit`	CVE severity mapping, auto-fix strategy
`griffe`	Drift detection, contract comparison logic
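For reference, a PEP 316 contract is written as `pre:` and `post:` lines in the docstring, with `__return__` standing for the return value. The function below is a hypothetical example, echoing the discount edit used earlier. Running `crosshair check` on the module asks CrossHair to search for inputs that satisfy the preconditions but violate the postcondition.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in whole cents (rounding the discount down).

    pre: price_cents >= 0
    pre: 0 <= percent <= 100
    post: 0 <= __return__ <= price_cents
    """
    return price_cents - (price_cents * percent) // 100
```

Integer cents avoid floating-point rounding in the contract, and the postcondition holds because the discount can never exceed the price when `percent` is at most 100.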

Semantic Skills

Skill	Teaches
`jedi`	find_callers/get_inference vs grep, reference fields, limitations
`rope`	rename/extract/inline vs string-replace, safe refactor pattern

Stack Skills

Skill	Teaches
`fastapi`	Project structure, route patterns, Pydantic v2, dependency injection
`postgres`	Soft deletes, audit logging, JSONB, multi-tenancy, indexing

TUI

Launch with forge tui — full-screen Textual interface with 6 screens.

Screens

Screen	Command	What it does
`HomeScreen`	Default	Session browser, new session, project picker
`ChatScreen`	`/chat`	Free-form LLM chat with 29 tools wired
`GraphScreen`	`/graph`	Directed graph execution: DAG visualizer + event stream
`ProductCompilerScreen`	`/compile`	Full 11-stage intent-to-production pipeline
`CodeIntelligenceScreen`	`/ci`	22 code intelligence tools, run individually or by category
`ModelPicker`	`Ctrl+A`	Switch models/providers from any screen

Home Screen

┌─────────────────────────────────────────────────────────┐
│  Home                                    [Ctrl+P] CMDS │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Recent Sessions                                        │
│  > forge new "build a Stripe billing app"   2h ago      │
│  > forge compile "a URL shortener"          yesterday   │
│  > edit "rename get_user to fetch_user"    3 days ago   │
│                                                         │
│  Model: MiniMax (minimax_openai / default)              │
│  Provider: OpenAI-compatible · Streaming: ✓ · Tools: ✓  │
│                                                         │
│  [Ctrl+A] Model   [Ctrl+P] Commands   [Enter] New       │
└─────────────────────────────────────────────────────────┘

Chat Screen

┌─────────────────────────────────────────────────────────┐
│  Chat — forge new: Stripe billing app          [Ctrl+P]  │
├────────────────────────────────────┬────────────────────┤
│                                    │ Tools             │
│ > build a Stripe billing app       │ [base] 7 tools    │
│                                    │ [code] 22 tools   │
│ ✓ Plan created: 3 files           │                   │
│ ✓ Tests passing                   │ Sessions: 4      │
│ ✓ 847 tokens · $0.02              │ Tokens: 12.4K     │
│                                    │ Cost: $0.31      │
│ [Streaming tokens...]              │                   │
├────────────────────────────────────┴────────────────────┤
│ > _                                                     │
└─────────────────────────────────────────────────────────┘

29 tools available in Chat: bash, read, write, search, grep, glob, patch (7 base) + bandit, radon_cc, vulture, semgrep, pip_audit, pyupgrade, deptry, cyclonedx_sbom, jedi_find_callers, jedi_infer, jedi_complete, jedi_signatures, rope_rename, rope_extract, rope_inline, rope_move, crosshair, hypothesis, mutmut, griffe, memray, pyspy (22 code intel).

Session persistence: any past session can be restored, and token usage and cost are tracked per session.

Graph Screen

┌─────────────────────────────────────────────────────────┐
│  Graph — forge new                              [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  DAG                                    Ver: 3   ① 1   │
│  ───                                                    │
│  orchestrator  [*] ← active                            │
│       │                                                │
│  spec_gen      [ ]                                      │
│       │                                                │
│  executor      [✓]                                      │
│       │                                                │
│  review_gate   [✓]  PASS                                │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  Event Stream                                           │
│  ───────────                                            │
│  10:42:01  orchestrator › spec_gen › executor           │
│  10:42:03  executor › review_gate › PASS               │
│  10:42:04  orchestrator › done                          │
├─────────────────────────────────────────────────────────┤
│  [Enter] Start   [Ctrl+R] Reset   [Ctrl+V] Spec   [Esc] │
└─────────────────────────────────────────────────────────┘

Code Intelligence Screen

┌─────────────────────────────────────────────────────────┐
│  Code Intelligence                              [Ctrl+P]│
├──────────────────┬──────────────────┬──────────────────┤
│ 🔒 Security      │ Results          │ Summary           │
│ [Bandit       ]  │ ─────────────── │ ──────────────── │
│ [Semgrep      ]  │ HIGH: 2 issues  │ Last run: 10:44   │
│ [pip-audit    ]  │                  │ Pass: 6  Fail: 4  │
│ [pyupgrade    ]  │ src/api.py:42   │ Total: 847 lines  │
│                  │   B301: pickl…  │ CC avg: 4.2       │
│ 📊 Complexity    │                  │ Vulture: 94% conf │
│ [radon_cc     ]  │ src/auth.py:18  │                   │
│ [vulture      ]  │   CC = 14 ⚠     │ [Export Markdown] │
│ [deptry       ]  │                  │                   │
│                  │ MEDIUM: 1 issue  │                   │
│ 📦 Dependency    │ src/billing.py   │                   │
│ [cyclonedx    ]  │   missing dep   │                   │
│                  │                  │                   │
├──────────────────┴──────────────────┴──────────────────┤
│ [Ctrl+R] Run selected   [Ctrl+G] Run grid   [Esc] Back │
└─────────────────────────────────────────────────────────┘

Tool categories:

🔒 Security: Bandit, Semgrep, pip-audit, pyupgrade
📊 Complexity: Radon CC, Vulture, Deptry
📦 Dependency: CycloneDX SBOM
🧠 Semantic: Jedi (find callers, infer, complete, signatures) + Rope (rename, extract, inline, move)
🔬 Advanced: Crosshair, Hypothesis, Mutmut, Griffe, Memray, py-spy

Each tool shows: installed status, PASS/FAIL badge, severity-sorted results with file:line references. Full output exported to ~/.forge/ci-report-*.md.
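Severity-sorted output with file:line references can be rendered as a Markdown table along these lines. This is a sketch under assumptions: the `Finding` fields, the severity ranking and `render_report` are hypothetical. Writing the result to the report file is left out.

```python
from dataclasses import dataclass
from typing import List

# Lower rank sorts first; unknown severities go last.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass(frozen=True)
class Finding:
    tool: str
    severity: str
    path: str
    line: int
    message: str


def render_report(findings: List[Finding]) -> str:
    """Render findings as a Markdown table, most severe first, with file:line refs."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER.get(f.severity, 99), f.path, f.line),
    )
    rows = ["| Severity | Tool | Location | Message |", "|---|---|---|---|"]
    rows += [f"| {f.severity} | {f.tool} | {f.path}:{f.line} | {f.message} |" for f in ordered]
    return "\n".join(rows) + "\n"
```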

Product Compiler Screen

┌─────────────────────────────────────────────────────────┐
│  Compile — URL shortener                        [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Stage 5/11: Schema Designer                           │
│  ────────────────────────────────────────              │
│                                                         │
│  PostgreSQL schema — 3 tables                          │
│                                                         │
│  urls(id, original_url, short_code PK, click_count,     │
│       created_at, expires_at, user_id FK)              │
│                                                         │
│  users(id, email, plan, created_at)                     │
│                                                         │
│  analytics(id, url_id FK, ip, user_agent, clicked_at)  │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  [Ctrl+G] Gate Approve   [Ctrl+R] Gate Reject   [Esc]   │
└─────────────────────────────────────────────────────────┘

11 stages with 2 human gates wired into the TUI (Gate 1: after rules, Gate 2: after review).

Keybindings (all screens)

Key	Action
`Ctrl+A`	Model picker
`Ctrl+P`	Command palette (fuzzy search)
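`Ctrl+G`	Gate approve (ProductCompilerScreen) / run grid (CodeIntelligenceScreen)
`Ctrl+R`	Gate reject (ProductCompilerScreen) / reset graph (GraphScreen) / run selected (CodeIntelligenceScreen)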
`Ctrl+C`	Cancel running task
`Escape`	Back / home
`Enter`	Start graph run (GraphScreen)
`Ctrl+V`	View spec (GraphScreen)
`Ctrl+E`	View executor output (GraphScreen)
`Ctrl+K`	Clear results (CodeIntelligenceScreen)

LLM Providers

Provider	SDK	Streaming	Tools
OpenAI	`openai`	✓	✓
Anthropic	`anthropic`	✓	✓
MiniMax (OpenAI compat)	`openai` + HTTP	✓	✓
DeepSeek	`openai` + `base_url`	✓	✓
Qwen	`openai` + `base_url`	✓	✓
Groq	`openai` + `base_url`	✓	✓
Ollama	`openai` + localhost	✓	✓

Config ~/.forge/config.yaml:

provider: minimax_openai
model: default
api_key: ...
base_url: ...   # optional, for proxy/custom endpoints

Runtime override:

forge new "my idea" --provider deepseek --model deepseek-chat
forge compile "API" --model-routing "coder=anthropic/claude-sonnet-4"
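The override rules can be sketched as two small helpers. The `role=provider/model` shape comes from the example above. Allowing several comma-separated entries is an assumption of this sketch, as are the helper names `parse_model_routing` and `resolve_settings`. Model ids that themselves contain `/` are kept whole, because only the first `/` is split.

```python
from typing import Dict, Optional, Tuple


def parse_model_routing(spec: str) -> Dict[str, Tuple[str, str]]:
    """Parse "role=provider/model[,role=provider/model...]" into {role: (provider, model)}."""
    routes: Dict[str, Tuple[str, str]] = {}
    for entry in filter(None, (part.strip() for part in spec.split(","))):
        role, sep, target = entry.partition("=")
        provider, slash, model = target.partition("/")   # split on the first "/" only
        if not (sep and slash and role.strip() and provider.strip() and model.strip()):
            raise ValueError(f"expected role=provider/model, got {entry!r}")
        routes[role.strip()] = (provider.strip(), model.strip())
    return routes


def resolve_settings(config: Dict[str, str], cli: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Flags actually passed on the command line override config.yaml values."""
    return {**config, **{k: v for k, v in cli.items() if v is not None}}
```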

Run Tests

cd ~/forge
pytest tests/ -v

Project Status

Version	Highlights
v0.9 ✅	Full TUI power wired: 6 screens (Home, Chat, Graph, Compile, CI, ModelPicker), GraphScreen DAG visualizer, 29-tool ChatScreen, CodeIntelligenceScreen (22 tools, 5 categories), ProductCompilerScreen gate wiring, session persistence, streaming, token/cost tracking
v0.8 ✅	10 hard gates, 20 code intelligence tools, edit pipeline (6 stages), 17 skills, jedi/rope semantic layer, mutation testing, supply chain SBOM, OpenTelemetry tracing
v0.7 ✅	Product Compiler: 9-agent pipeline, 2 human gates, E2B sandbox, NATS messaging, full TUI wiring
v0.6 ✅	SDK migration: `openai` + `anthropic` Python SDKs, Textual TUI, model picker, command palette
v0.5 ✅	Surgical patch editing, git-aware FileService, LSP integration, exponential backoff
v0.4 ✅	Subagent delegation, MCP session management, structured pytest failure parsing
v0.3 ✅	Multi-session state restore, TDD-first executor, skill auto-loading

Release history

Version	Released
0.9.5	May 4, 2026
0.9.4	May 4, 2026
0.9.3	May 4, 2026
0.9.2	May 4, 2026
0.9.1 (this version)	May 4, 2026
0.9.0	May 4, 2026

Download files

Source Distribution

forge_tui-0.9.1.tar.gz (346.4 kB)

Uploaded May 4, 2026 Source

Built Distribution

forge_tui-0.9.1-py3-none-any.whl (250.0 kB)

Uploaded May 4, 2026 Python 3

File details

Details for the file forge_tui-0.9.1.tar.gz.

File metadata

Download URL: forge_tui-0.9.1.tar.gz
Upload date: May 4, 2026
Size: 346.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for forge_tui-0.9.1.tar.gz
Algorithm	Hash digest
SHA256	`3ecc4bf26db0d3cd12e5dc68bb255a94283bf25f9a9ada2f43a59453d81188ef`
MD5	`d50e7e3a6e96a11b0597344bd66c40b0`
BLAKE2b-256	`db675a0df7ec0075f45891ffbedf514ce37a0bb94748b06a91ce79c1fc3766f5`


File details

Details for the file forge_tui-0.9.1-py3-none-any.whl.

File metadata

Download URL: forge_tui-0.9.1-py3-none-any.whl
Upload date: May 4, 2026
Size: 250.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for forge_tui-0.9.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31ba013f57f0819345e6a1159e00db30bee1f6fdf01c2fb6071e427345e832c0`
MD5	`4dca50b76bad5db228df5a8ab70d3454`
BLAKE2b-256	`73640aad104f234ea494749380585036dda78a263d4cf20a0d832829753442a0`

