Production-ready multi-agent CLI with unified memory
forge — AI Coding Agent Framework
Two modes, one CLI. forge new → directed graph orchestration. forge compile → intent-to-production pipeline with formal verification, mutation testing, and runtime profiling.
Install
Via npm (recommended)
npm install -g @tearbin/forge
forge --version
forge setup # interactive: pick provider, enter API key, test
Via npx (no install)
npx @tearbin/forge
Via pip (coming soon — package name claim in progress)
pip install forge-cli # ← available once the PyPI package name is resolved
forge --version
forge setup
From source
cd ~/forge
pip install -e .
forge --version
forge setup
Two Modes
| Command | What it does |
|---|---|
| forge new "build a Stripe billing app" | Directed graph: orchestrator → spec → executor → review gate |
| forge compile "build a URL shortener" | Pipeline: interview → rules → failing tests → code → review → integration |
| forge compile --edit "rename get_user" | Edit pipeline: blast radius → extract rules → delta → test delta |
Both modes share the same LLM backend, TUI, skills system, and memory model.
forge new — Directed Graph Orchestration
User Prompt
↓
Orchestrator ← decides: new or continue, handles failures/escalation
↓
Spec Generator ← asks clarifying Qs, writes SPEC.md (git-tracked, append-only)
↓
Executor ← TDD: tests first, then implementation, against SPEC.md
↓
Review Gate ← checkpoint: pass → done, fail → retry executor
↓
Orchestrator ← done or escalate to user
Memory Model (4-tier, per-project SQLite)
| Tier | Scope | Duration | Storage |
|---|---|---|---|
| short | Current task | Session | In-memory dict |
| mid | Current project | Days | ~/.forge/projects/<id>/memory.db |
| episodic | Across sessions | Weeks | Same DB |
| long | Cross-project | Months | ~/.forge/memory/ |
Every spec write is a new version row (never overwrite). forge continue loads the latest and resumes from orchestrator state.
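A minimal sketch of the append-only versioning described above, using Python's built-in sqlite3 module. The table name `spec_versions`, its columns, and the helper names are assumptions for illustration; the actual schema of memory.db is not documented here.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a memory database with a hypothetical spec table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS spec_versions (
            version INTEGER PRIMARY KEY AUTOINCREMENT,
            body    TEXT NOT NULL,
            created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def write_spec(conn: sqlite3.Connection, body: str) -> int:
    """Append a new version row; existing rows are never updated."""
    cur = conn.execute("INSERT INTO spec_versions (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def latest_spec(conn: sqlite3.Connection):
    """Return (version, body) for the newest spec, or None if none exist."""
    return conn.execute(
        "SELECT version, body FROM spec_versions ORDER BY version DESC LIMIT 1"
    ).fetchone()
```

Because writes only ever insert, earlier versions stay available for diffing or rollback, and resuming a session reduces to reading the row with the highest version.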
Failure Lanes (explicit)
Every node can emit:
- done → follow normal edge
- blocked → review_gate emits review_fail → executor retries with reason
- failed → orchestrator decides: retry / escalate / block
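One way to read these lanes is as a small routing function over the graph nodes. The sketch below is an illustration only: the node names come from the graph shown in this README, while `Signal`, `NORMAL_EDGES`, and `next_node` are hypothetical names, and the retry/escalate/block decision inside the orchestrator is left out.

```python
from enum import Enum


class Signal(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    FAILED = "failed"


# Normal edges of the forge new graph, in the order the diagram draws them.
NORMAL_EDGES = {
    "orchestrator": "spec_gen",
    "spec_gen": "executor",
    "executor": "review_gate",
    "review_gate": "orchestrator",
}


def next_node(node: str, signal: Signal) -> str:
    """Pick the next node for a (node, signal) pair under the assumed reading."""
    if signal is Signal.DONE:
        return NORMAL_EDGES[node]
    if signal is Signal.BLOCKED:
        # A review failure sends work back to the executor with a reason attached.
        return "executor"
    # FAILED: hand control to the orchestrator, which decides what happens next.
    return "orchestrator"
```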
forge compile — Intent to Production Pipeline
Describe what you want in plain English. Forge interviews you, writes failing tests from formal rules, writes the code to pass them, runs 10 parallel static analysis gates, and integrates everything into a runnable project.
Intent (plain English)
↓
Stage 1: Interviewer adversarial Q&A — extracts entities, relationships, rules
↓
Stage 2: Flow Designer → UserFlowTree (screen-by-screen flows)
↓
Stage 3: Contract Writer → APIContract[] (request/response/error for every action)
↓
Stage 4: Rule Compiler → RuleSet (IF/THEN constraints, no prose)
↓
Gate 1: Human Review y/n — reject sends feedback back to Rule Compiler
↓
Stage 5: Schema Designer → DatabaseSchema (PostgreSQL — tables, FKs, indexes, audit columns)
↓
Stage 6: Test Writer → FailingTestSuite (one test per rule) + Hypothesis strategies
↓
Stage 7: Mutation Testing ← mutmut — hard gate if score < 70%
↓
Stage 8: Coder → passing code (pyupgrade + pyright + crosshair in TDD loop)
↓
Stage 9: Integrator → requirements.txt + supply chain scan + SBOM
↓
Stage 10: Reviewer → 10 parallel gates: bandit, semgrep, radon, vulture, griffe,
│ deptry, pyright, mutmut, pip-audit, crosshair
↓
Gate 2: Human Review y/n — reject sends feedback back to Coder
↓
Stage 11: Profiling ← memray (memory) + pyspy (CPU) — advisory feedback
↓
Done: project_dir/
The 10 Hard Gates
Every gate blocks the pipeline. No LLM synthesis happens until all gates pass.
| # | Gate | Tool | Threshold |
|---|---|---|---|
| 1 | Security vulnerabilities | bandit | HIGH/CRITICAL → BLOCK |
| 2 | Hardcoded secrets | semgrep (p/secrets) | HIGH → BLOCK |
| 3 | Complexity | radon | CC > 10 → BLOCK |
| 4 | Dead code | vulture | ≥80% confidence → BLOCK |
| 5 | Contract drift | griffe | Any drift → BLOCK |
| 6 | Missing dependencies | deptry (DEP001) | Missing dep → BLOCK |
| 7 | Type errors | pyright | Any error → BLOCK |
| 8 | Test quality | mutmut | Score < 70% → BLOCK |
| 9 | Supply chain CVEs | pip-audit | HIGH/CRITICAL → BLOCK |
| 10 | Contract violations | crosshair | Any violation → BLOCK |
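To make the thresholds concrete, here is a small sketch that evaluates four of the ten gates against already-collected measurements. `GateReport` and `blocking_gates` are hypothetical names, and the sketch assumes each tool's output has been reduced to a single number beforehand; it does not invoke the tools themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateReport:
    """Illustrative summary of tool output; field names are assumptions."""
    max_cyclomatic_complexity: int  # highest CC reported by radon
    mutation_score: float           # mutmut score, in percent
    type_errors: int                # error count from pyright
    high_or_critical_cves: int      # HIGH/CRITICAL findings from pip-audit


def blocking_gates(report: GateReport) -> list[str]:
    """Return the names of gates that would block, using the table's thresholds."""
    blocked = []
    if report.max_cyclomatic_complexity > 10:
        blocked.append("radon")
    if report.mutation_score < 70.0:
        blocked.append("mutmut")
    if report.type_errors > 0:
        blocked.append("pyright")
    if report.high_or_critical_cves > 0:
        blocked.append("pip-audit")
    return blocked
```

Note the boundary cases: a complexity of exactly 10 and a mutation score of exactly 70% both pass, since the table blocks only on CC > 10 and score < 70%.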
CLI
forge compile "build a URL shortener with analytics"
forge compile "a REST API" --auto-approve
forge compile "a blog" --test-prompt "posts" --test-prompt "auth"
forge compile "a service" --sandbox # E2B sandbox isolation
forge compile "a service" --nats-url nats://localhost:4222
forge compile "a blog" --workdir ./myproject
Edit Pipeline — Change Existing Code
forge compile --edit "add OAuth to the API" --workdir ./myproject
forge compile --edit "rename get_user to fetch_user" --workdir ./myproject
forge compile --edit "change beta user discount from 20% to 30%" --workdir ./myproject
Stage 1: ChangeIntent ← what is changing, what must not
↓
Stage 2: ImpactAnalyst ← jedi blast-radius: all callers of changed function
↓
Stage 3: RuleExtractor ← extract IF/THEN rules from source by observation
↓
Stage 4: DeltaCompiler ← PRESERVE / CHANGE / NEW / REMOVE per rule
↓
Stage 5: BlastChecker ← verify delta stays within declared blast radius
↓
Stage 6: TestDeltaWriter ← failing tests for CHANGE rules, regression for PRESERVE
↓
Coder (edit-mode TDD loop)
The BlastChecker raises BlastCheckError if the delta touches files outside the declared impact surface. The user must approve an expanded scope or narrow the change.
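The check itself can be pictured as a set difference between the files a delta touches and the declared impact surface. This is a sketch under assumptions: only the name BlastCheckError comes from the description above, while the `check_blast_radius` function, its arguments, and the `outside` attribute are hypothetical.

```python
class BlastCheckError(Exception):
    """Raised when a delta touches files outside the declared impact surface."""

    def __init__(self, outside):
        self.outside = sorted(outside)
        super().__init__(
            f"delta touches files outside the impact surface: {self.outside}"
        )


def check_blast_radius(delta_files, impact_surface) -> bool:
    """Return True if every touched file is inside the impact surface."""
    outside = set(delta_files) - set(impact_surface)
    if outside:
        raise BlastCheckError(outside)
    return True
```

Carrying the offending paths on the exception gives a caller enough information to ask the user whether to expand the scope or narrow the change.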
Agents
| Agent | Stage | Input | Output |
|---|---|---|---|
| InterviewerAgent | 1 | Plain intent | IntentDocument |
| FlowDesignerAgent | 2 | IntentDocument | UserFlowTree |
| ContractWriterAgent | 3 | UserFlowTree | list[APIContract] |
| RuleCompilerAgent | 4 | list[APIContract] + UserFlowTree | RuleSet |
| SchemaDesignerAgent | 5 | IntentDocument | DatabaseSchema |
| TestWriterAgent | 6 | RuleSet | FailingTestSuite + Hypothesis strategies |
| CoderAgent | 8 | FailingTestSuite + RuleSet | Code files + passing tests |
| IntegratorAgent | 9 | All of the above | requirements.txt + SBOM + supply chain |
| ReviewerAgent | 10 | RuleSet + project files | ReviewResult (violations + LLM synthesis) |
| ImpactAnalystAgent | 2 (edit) | ChangeIntent + codebase | ImpactSurface (jedi blast-radius) |
| RuleExtractorAgent | 3 (edit) | ImpactSurface + source | list[Rule] (PRESERVE) |
| DeltaCompilerAgent | 4 (edit) | list[Rule] + ChangeIntent | DeltaRuleSet |
| BlastCheckerAgent | 5 (edit) | DeltaRuleSet + ImpactSurface | bool (raises BlastCheckError) |
| TestDeltaWriterAgent | 6 (edit) | DeltaRuleSet | FailingTestSuite |
Code Intelligence (22 tools)
Formal verification, semantic analysis, runtime intelligence, mutation testing, supply chain, and quality gates — all wired as structured tool integrations, surfaced in the TUI CodeIntelligenceScreen (accessible via /ci command palette or forge tui --screen ci).
| Tool | Role | Mode | TUI |
|---|---|---|---|
| bandit | Security analysis | Hard gate | ✓ |
| semgrep | Secrets + security rules | Hard gate (p/secrets) | ✓ |
| pip_audit | CVE scanning | Hard gate — no HIGH/CRITICAL CVEs | ✓ |
| pyupgrade | Python version upgrade | Auto-fix | ✓ |
| radon | Complexity analysis | Hard gate CC > 10 | ✓ |
| vulture | Dead code detection | Hard gate ≥80% confidence | ✓ |
| griffe | API contract drift | Hard gate | ✓ |
| deptry | Dependency analysis | Missing dep = hard gate | ✓ |
| cyclonedx | SBOM generation | Artifact — CycloneDX JSON per build | ✓ |
| crosshair | Static contract proving | Hard gate — no violations | ✓ |
| hypothesis | Property-based test generation | Advisory — enriches test coverage | ✓ |
| mutmut | Mutation testing | Hard gate — score ≥ 70% | ✓ |
| jedi | Semantic callers/inference | Blast-radius analysis | ✓ |
| rope | Semantic refactoring | Safe renames/moves | ✓ |
| memray | Memory profiling | Advisory feedback | ✓ |
| pyspy | CPU sampling | Advisory feedback | ✓ |
| otel | OpenTelemetry tracing | Traces on test failure | |
| pyright | Type checking | Hard gate — wired in coder | |
| ctags | Symbol index | Navigation | |
| atlas | Schema migration | Migration files | |
Skills
Skills are reusable knowledge stored in ~/.forge/skills/ and ~/.hermes/skills/. They teach agents how to use the tools — thresholds, patterns, error interpretation. They are loaded on demand per agent role.
forge --list-skills
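As a rough illustration of on-demand loading from the two directories, the sketch below resolves a skill by name, taking the first match. The one-file-per-skill layout, the `.md` suffix, the search order, and the `find_skill` helper are all assumptions; the on-disk skill format is not specified in this README.

```python
from pathlib import Path

# Search locations named in this README; the precedence order is an assumption.
SKILL_DIRS = (
    Path.home() / ".forge" / "skills",
    Path.home() / ".hermes" / "skills",
)


def find_skill(name: str, search_dirs=SKILL_DIRS):
    """Return the path of the first file matching the skill name, or None."""
    for directory in search_dirs:
        candidate = directory / f"{name}.md"  # hypothetical file naming
        if candidate.is_file():
            return candidate
    return None
```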
Agent Role Skills
| Skill | Teaches |
|---|---|
| reviewer | 10-gate hierarchy, evidence formatting, blocking failure strings |
| test-writer | Property-based vs example-based thinking, mutmut loop, Hypothesis patterns |
| coder | TDD discipline, minimal fix, pyupgrade + crosshair in TDD loop |
Quality Gate Skills
| Skill | Teaches |
|---|---|
| bandit | B301/B303/B310/B608 patterns, fix templates, rejection strings |
| radon | CC sources (nested conditionals, boolean expressions), refactor patterns |
| vulture | 80% confidence rule, speculative code detection |
| pyright | TypedDict/Generic/Protocol patterns, type annotation discipline |
| mutmut | Survivor interpretation, test-to-kill mapping, threshold philosophy |
| crosshair | PEP 316 contract syntax, counterexample reading |
| hypothesis | Strategy map, invariant testing, @given/@settings patterns |
| pip-audit | CVE severity mapping, auto-fix strategy |
| griffe | Drift detection, contract comparison logic |
Semantic Skills
| Skill | Teaches |
|---|---|
| jedi | find_callers/get_inference vs grep, reference fields, limitations |
| rope | rename/extract/inline vs string-replace, safe refactor pattern |
Stack Skills
| Skill | Teaches |
|---|---|
| fastapi | Project structure, route patterns, Pydantic v2, dependency injection |
| postgres | Soft deletes, audit logging, JSONB, multi-tenancy, indexing |
TUI
Launch with forge tui — full-screen Textual interface with 6 screens.
Screens
| Screen | Command | What it does |
|---|---|---|
| HomeScreen | Default | Session browser, new session, project picker |
| ChatScreen | /chat | Free-form LLM chat with 29 tools wired |
| GraphScreen | /graph | Directed graph execution: DAG visualizer + event stream |
| ProductCompilerScreen | /compile | Full 11-stage intent-to-production pipeline |
| CodeIntelligenceScreen | /ci | 22 code intelligence tools, run individually or by category |
| ModelPicker | Ctrl+A | Switch models/providers from any screen |
Home Screen
┌─────────────────────────────────────────────────────────┐
│ Home [Ctrl+P] CMDS │
├─────────────────────────────────────────────────────────┤
│ │
│ Recent Sessions │
│ > forge new "build a Stripe billing app" 2h ago │
│ > forge compile "a URL shortener" yesterday │
│ > edit "rename get_user to fetch_user" 3 days ago │
│ │
│ Model: MiniMax (minimax_openai / default) │
│ Provider: OpenAI-compatible · Streaming: ✓ · Tools: ✓ │
│ │
│ [Ctrl+A] Model [Ctrl+P] Commands [Enter] New │
└─────────────────────────────────────────────────────────┘
Chat Screen
┌─────────────────────────────────────────────────────────┐
│ Chat — forge new: Stripe billing app [Ctrl+P] │
├────────────────────────────────────┬────────────────────┤
│ │ Tools │
│ > build a Stripe billing app │ [base] 7 tools │
│ │ [code] 22 tools │
│ ✓ Plan created: 3 files │ │
│ ✓ Tests passing │ Sessions: 4 │
│ ✓ 847 tokens · $0.02 │ Tokens: 12.4K │
│ │ Cost: $0.31 │
│ [Streaming tokens...] │ │
├────────────────────────────────────┴────────────────────┤
│ > _ │
└─────────────────────────────────────────────────────────┘
29 tools available in Chat: bash, read, write, search, grep, glob, patch (7 base) + bandit, radon_cc, vulture, semgrep, pip_audit, pyupgrade, deptry, cyclonedx_sbom, jedi_find_callers, jedi_infer, jedi_complete, jedi_signatures, rope_rename, rope_extract, rope_inline, rope_move, crosshair, hypothesis, mutmut, griffe, memray, pyspy (22 code intel).
Session persistence: restore any past session, token/cost tracking per session.
Graph Screen
┌─────────────────────────────────────────────────────────┐
│ Graph — forge new [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│ │
│ DAG Ver: 3 ① 1 │
│ ─── │
│ orchestrator [*] ← active │
│ │ │
│ spec_gen [ ] │
│ │ │
│ executor [✓] │
│ │ │
│ review_gate [✓] PASS │
│ │
├─────────────────────────────────────────────────────────┤
│ Event Stream │
│ ─────────── │
│ 10:42:01 orchestrator › spec_gen › executor │
│ 10:42:03 executor › review_gate › PASS │
│ 10:42:04 orchestrator › done │
├─────────────────────────────────────────────────────────┤
│ [Enter] Start [Ctrl+R] Reset [Ctrl+V] Spec [Esc] │
└─────────────────────────────────────────────────────────┘
Code Intelligence Screen
┌─────────────────────────────────────────────────────────┐
│ Code Intelligence [Ctrl+P]│
├──────────────────┬──────────────────┬──────────────────┤
│ 🔒 Security │ Results │ Summary │
│ [Bandit ] │ ─────────────── │ ──────────────── │
│ [Semgrep ] │ HIGH: 2 issues │ Last run: 10:44 │
│ [pip-audit ] │ │ Pass: 6 Fail: 4 │
│ [pyupgrade ] │ src/api.py:42 │ Total: 847 lines │
│ │ B301: pickl… │ CC avg: 4.2 │
│ 📊 Complexity │ │ Vulture: 94% conf │
│ [radon_cc ] │ src/auth.py:18 │ │
│ [vulture ] │ CC = 14 ⚠ │ [Export Markdown] │
│ [deptry ] │ │ │
│ │ MEDIUM: 1 issue │ │
│ 📦 Dependency │ src/billing.py │ │
│ [cyclonedx ] │ missing dep │ │
│ │ │ │
├──────────────────┴──────────────────┴──────────────────┤
│ [Ctrl+R] Run selected [Ctrl+G] Run grid [Esc] Back │
└─────────────────────────────────────────────────────────┘
Tool categories:
- 🔒 Security: Bandit, Semgrep, pip-audit, pyupgrade
- 📊 Complexity: Radon CC, Vulture, Deptry
- 📦 Dependency: CycloneDX SBOM
- 🧠 Semantic: Jedi (find callers, infer, complete, signatures) + Rope (rename, extract, inline, move)
- 🔬 Advanced: Crosshair, Hypothesis, Mutmut, Griffe, Memray, py-spy
Each tool shows: installed status, PASS/FAIL badge, severity-sorted results with file:line references. Full output exported to ~/.forge/ci-report-*.md.
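The severity-sorted display can be approximated with a plain sort key. This is a sketch, not the TUI's implementation: the `Finding` fields, the severity ranking, and the output format are assumptions modeled on the screen mock-up above.

```python
from dataclasses import dataclass

# Lower rank sorts first; unknown severities fall to the end.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass(frozen=True)
class Finding:
    severity: str
    path: str
    line: int
    message: str


def sort_findings(findings):
    """Order findings by severity, then by file path and line number."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)),
            f.path,
            f.line,
        ),
    )


def format_finding(finding: Finding) -> str:
    """Render one finding with a file:line reference."""
    return f"{finding.severity}: {finding.path}:{finding.line} {finding.message}"
```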
Product Compiler Screen
┌─────────────────────────────────────────────────────────┐
│ Compile — URL shortener [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│ │
│ Stage 5/11: Schema Designer │
│ ──────────────────────────────────────── │
│ │
│ PostgreSQL schema — 3 tables │
│ │
│ urls(id, original_url, short_code PK, click_count, │
│ created_at, expires_at, user_id FK) │
│ │
│ users(id, email, plan, created_at) │
│ │
│ analytics(id, url_id FK, ip, user_agent, clicked_at) │
│ │
├─────────────────────────────────────────────────────────┤
│ [Ctrl+G] Gate Approve [Ctrl+R] Gate Reject [Esc] │
└─────────────────────────────────────────────────────────┘
11 stages with 2 human gates wired into the TUI (Gate 1: after rules, Gate 2: after review).
Keybindings (all screens)
| Key | Action |
|---|---|
| Ctrl+A | Model picker |
| Ctrl+P | Command palette (fuzzy search) |
| Ctrl+G | Gate approve |
| Ctrl+R | Gate reject / reset graph |
| Ctrl+C | Cancel running task |
| Escape | Back / home |
| Enter | Start graph run (GraphScreen) |
| Ctrl+V | View spec (GraphScreen) |
| Ctrl+E | View executor output (GraphScreen) |
| Ctrl+K | Clear results (CodeIntelligenceScreen) |
LLM Providers
| Provider | SDK | Streaming | Tools |
|---|---|---|---|
| OpenAI | openai | ✓ | ✓ |
| Anthropic | anthropic | ✓ | ✓ |
| MiniMax (OpenAI compat) | openai + HTTP | ✓ | ✓ |
| DeepSeek | openai + base_url | ✓ | ✓ |
| Qwen | openai + base_url | ✓ | ✓ |
| Groq | openai + base_url | ✓ | ✓ |
| Ollama | openai + localhost | ✓ | ✓ |
Config ~/.forge/config.yaml:
provider: minimax_openai
model: default
api_key: ...
base_url: ... # optional, for proxy/custom endpoints
Runtime override:
forge new "my idea" --provider deepseek --model deepseek-chat
forge compile "API" --model-routing "coder=anthropic/claude-sonnet-4"
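The routing value in the last command appears to follow a role=provider/model shape. The sketch below parses that shape into a mapping; `parse_model_routing` is a hypothetical helper, and accepting several comma-separated entries is an assumption, since only a single entry is shown above.

```python
def parse_model_routing(value: str) -> dict[str, tuple[str, str]]:
    """Parse 'role=provider/model' entries into {role: (provider, model)}."""
    routing: dict[str, tuple[str, str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        role, equals, target = entry.partition("=")
        provider, slash, model = target.partition("/")
        if not (equals and slash and role and provider and model):
            raise ValueError(f"expected role=provider/model, got {entry!r}")
        routing[role.strip()] = (provider.strip(), model.strip())
    return routing
```

Splitting the target on the first slash only means a model name that itself contains a slash stays intact in the second element.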
Run Tests
cd ~/forge
pytest tests/ -v
Project Status
| Version | Highlights |
|---|---|
| v0.9 ✅ | Full TUI power wired: 6 screens (Home, Chat, Graph, Compile, CI, ModelPicker), GraphScreen DAG visualizer, 29-tool ChatScreen, CodeIntelligenceScreen (22 tools, 5 categories), ProductCompilerScreen gate wiring, session persistence, streaming, token/cost tracking |
| v0.8 ✅ | 10 hard gates, 20 code intelligence tools, edit pipeline (6 stages), 17 skills, jedi/rope semantic layer, mutation testing, supply chain SBOM, OpenTelemetry tracing |
| v0.7 ✅ | Product Compiler: 9-agent pipeline, 2 human gates, E2B sandbox, NATS messaging, full TUI wire |
| v0.6 ✅ | SDK migration: openai + anthropic Python SDKs, Textual TUI, model picker, command palette |
| v0.5 ✅ | Surgical patch editing, git-aware FileService, LSP integration, exponential backoff |
| v0.4 ✅ | Subagent delegation, MCP session management, structured pytest failure parsing |
| v0.3 ✅ | Multi-session state restore, TDD-first executor, skill auto-loading |