
Production-ready multi-agent CLI with unified memory


forge — AI Coding Agent Framework

Two modes, one CLI. forge new → directed graph orchestration. forge compile → intent-to-production pipeline with formal verification, mutation testing, and runtime profiling.


Install

Via npm (recommended)

npm install -g @tearbin/forge
forge --version
forge setup  # interactive: pick provider, enter API key, test

Via npx (no install)

npx @tearbin/forge

Via pip (coming soon — package name claim in progress)

pip install forge-cli   # ← available once the PyPI package name is resolved
forge --version
forge setup

From source

cd ~/forge
pip install -e .
forge --version
forge setup

Two Modes

| Command | What it does |
| --- | --- |
| forge new "build a Stripe billing app" | Directed graph: orchestrator → spec → executor → review gate |
| forge compile "build a URL shortener" | Pipeline: interview → rules → failing tests → code → review → integration |
| forge compile --edit "rename get_user" | Edit pipeline: blast radius → extract rules → delta → test delta |

Both modes share the same LLM backend, TUI, skills system, and memory model.


forge new — Directed Graph Orchestration

User Prompt
    ↓
Orchestrator       ← decides: new or continue, handles failures/escalation
    ↓
Spec Generator     ← asks clarifying Qs, writes SPEC.md (git-tracked, append-only)
    ↓
Executor           ← TDD: tests first, then implementation, against SPEC.md
    ↓
Review Gate        ← checkpoint: pass → done, fail → retry executor
    ↓
Orchestrator       ← done or escalate to user

Memory Model (4-tier, per-project SQLite)

| Tier | Scope | Duration | Storage |
| --- | --- | --- | --- |
| short | Current task | Session | In-memory dict |
| mid | Current project | Days | ~/.forge/projects/<id>/memory.db |
| episodic | Across sessions | Weeks | Same DB |
| long | Cross-project | Months | ~/.forge/memory/ |

Every spec write is a new version row (never overwrite). forge continue loads the latest and resumes from orchestrator state.
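
The append-only versioning described above can be sketched with the standard library's sqlite3 module. This is a minimal illustration, not forge's actual schema: the table name spec_versions, its columns, and the helper names are all assumptions made for the example.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) spec_versions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spec_versions ("
        " version INTEGER PRIMARY KEY AUTOINCREMENT,"
        " project_id TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def write_spec(conn, project_id, body):
    """Append a new version row; existing rows are never updated."""
    cur = conn.execute(
        "INSERT INTO spec_versions (project_id, body) VALUES (?, ?)",
        (project_id, body),
    )
    conn.commit()
    return cur.lastrowid


def latest_spec(conn, project_id):
    """Return (version, body) of the newest spec, or None if there is none."""
    return conn.execute(
        "SELECT version, body FROM spec_versions"
        " WHERE project_id = ? ORDER BY version DESC LIMIT 1",
        (project_id,),
    ).fetchone()
```

Because writes only ever insert, resuming a project reduces to reading the highest version for that project, and earlier versions stay available for comparison.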

Failure Lanes (explicit)

Every node can emit:

  • done → follow normal edge
  • blocked → review_gate emits review_fail → executor retries with reason
  • failed → orchestrator decides: retry / escalate / block
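
The three lanes can be read as a small routing function. The sketch below is illustrative only: the node names are taken from the Graph Screen, while the Outcome enum, the NORMAL_EDGES table, and the choice to send a blocked result from any node other than review_gate to the orchestrator are assumptions.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    FAILED = "failed"


# Normal edges of the directed graph (assumed from the diagram above).
NORMAL_EDGES = {
    "orchestrator": "spec_gen",
    "spec_gen": "executor",
    "executor": "review_gate",
    "review_gate": "orchestrator",
}


def next_node(node, outcome):
    """Pick the next node to run for a given node and its emitted outcome."""
    if outcome is Outcome.DONE:
        return NORMAL_EDGES[node]
    if outcome is Outcome.BLOCKED and node == "review_gate":
        # review_fail: the executor retries with the reviewer's reason.
        return "executor"
    # failed (and, by assumption, any other blocked case) goes back to the
    # orchestrator, which decides whether to retry, escalate, or block.
    return "orchestrator"
```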

forge compile — Intent to Production Pipeline

Describe what you want in plain English. Forge interviews you, writes failing tests from formal rules, writes the code to pass them, runs 10 parallel static analysis gates, and integrates everything into a runnable project.

Intent (plain English)
    ↓
Stage 1: Interviewer         adversarial Q&A — extracts entities, relationships, rules
    ↓
Stage 2: Flow Designer        → UserFlowTree (screen-by-screen flows)
    ↓
Stage 3: Contract Writer      → APIContract[] (request/response/error for every action)
    ↓
Stage 4: Rule Compiler        → RuleSet (IF/THEN constraints, no prose)
    ↓
Gate 1: Human Review          y/n — reject sends feedback back to Rule Compiler
    ↓
Stage 5: Schema Designer      → DatabaseSchema (PostgreSQL — tables, FKs, indexes, audit columns)
    ↓
Stage 6: Test Writer          → FailingTestSuite (one test per rule) + Hypothesis strategies
    ↓
Stage 7: Mutation Testing     ← mutmut — hard gate if score < 70%
    ↓
Stage 8: Coder                → passing code (pyupgrade + pyright + crosshair in TDD loop)
    ↓
Stage 9: Integrator           → requirements.txt + supply chain scan + SBOM
    ↓
Stage 10: Reviewer            → 10 parallel gates: bandit, semgrep, radon, vulture, griffe,
    │                            deptry, pyright, mutmut, pip-audit, crosshair
    ↓
Gate 2: Human Review          y/n — reject sends feedback back to Coder
    ↓
Stage 11: Profiling           ← memray (memory) + py-spy (CPU) — advisory feedback
    ↓
Done: project_dir/
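
To make "IF/THEN constraints, no prose" and "one test per rule" concrete, here is a minimal sketch of how a rule could be represented and checked. The Rule class, the violated helper, and the example rule are hypothetical; the source does not describe the real RuleSet format.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # hypothetical: a plain dict standing in for application state


@dataclass(frozen=True)
class Rule:
    rule_id: str
    when: Callable[[State], bool]  # the IF part
    then: Callable[[State], bool]  # the THEN part


def violated(rules, state):
    """Return the ids of rules whose IF holds but whose THEN does not."""
    return [r.rule_id for r in rules if r.when(state) and not r.then(state)]


# Hypothetical rule for a URL shortener:
# IF a link is expired THEN the redirect is refused with HTTP 410.
RULES = [
    Rule("R1", when=lambda s: s["expired"], then=lambda s: s["status"] == 410),
]
```

A failing test for R1 would then assert that violated(RULES, state) is empty for a state produced by the not-yet-written code; the test goes green once the implementation returns 410 for expired links.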

The 10 Hard Gates

Every gate blocks the pipeline. No LLM synthesis happens until all gates pass.

| # | Gate | Tool | Threshold |
| --- | --- | --- | --- |
| 1 | Security vulnerabilities | bandit | HIGH/CRITICAL → BLOCK |
| 2 | Hardcoded secrets | semgrep (p/secrets) | HIGH → BLOCK |
| 3 | Complexity | radon | CC > 10 → BLOCK |
| 4 | Dead code | vulture | ≥80% confidence → BLOCK |
| 5 | Contract drift | griffe | Any drift → BLOCK |
| 6 | Missing dependencies | deptry (DEP001) | Missing dep → BLOCK |
| 7 | Type errors | pyright | Any error → BLOCK |
| 8 | Test quality | mutmut | Score < 70% → BLOCK |
| 9 | Supply chain CVEs | pip-audit | HIGH/CRITICAL → BLOCK |
| 10 | Contract violations | crosshair | Any violation → BLOCK |
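
The gate logic in the table can be sketched as independent checks run in parallel, with the pipeline blocking if any check fails. The example below does not call the real tools; GateResult, the two gate functions, and run_gates are hypothetical names, and treating an empty mutant set as a score of 0 is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    gate: str
    passed: bool
    detail: str = ""


def mutation_gate(killed, total, threshold=0.70):
    """Gate 8: block when the mutation score is below the threshold."""
    score = killed / total if total else 0.0  # assumption: no mutants means 0
    return GateResult("mutmut", score >= threshold, f"score={score:.0%}")


def complexity_gate(cc_by_function, limit=10):
    """Gate 3: block when any function has cyclomatic complexity above the limit."""
    offenders = sorted(name for name, cc in cc_by_function.items() if cc > limit)
    return GateResult("radon", not offenders, ", ".join(offenders))


def run_gates(gates):
    """Run zero-argument gate callables in parallel; return (all, blocking)."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda gate: gate(), gates))
    blocked = [r for r in results if not r.passed]
    return results, blocked
```

A caller would proceed to the next stage only when the second return value is empty, which mirrors the rule that every gate blocks the pipeline.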

CLI

forge compile "build a URL shortener with analytics"
forge compile "a REST API" --auto-approve
forge compile "a blog" --test-prompt "posts" --test-prompt "auth"
forge compile "a service" --sandbox              # E2B sandbox isolation
forge compile "a service" --nats-url nats://localhost:4222
forge compile "a blog" --workdir ./myproject

Edit Pipeline — Change Existing Code

forge compile --edit "add OAuth to the API" --workdir ./myproject
forge compile --edit "rename get_user to fetch_user" --workdir ./myproject
forge compile --edit "change beta user discount from 20% to 30%" --workdir ./myproject

Stage 1: ChangeIntent         ← what is changing, what must not
    ↓
Stage 2: ImpactAnalyst        ← jedi blast-radius: all callers of changed function
    ↓
Stage 3: RuleExtractor        ← extract IF/THEN rules from source by observation
    ↓
Stage 4: DeltaCompiler        ← PRESERVE / CHANGE / NEW / REMOVE per rule
    ↓
Stage 5: BlastChecker         ← verify delta stays within declared blast radius
    ↓
Stage 6: TestDeltaWriter      ← failing tests for CHANGE rules, regression for PRESERVE
    ↓
    Coder (edit-mode TDD loop)

The BlastChecker raises BlastCheckError if the delta touches files outside the declared impact surface. The user must approve an expanded scope or narrow the change.
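
The check itself amounts to a set difference between the files a delta touches and the declared impact surface. A minimal sketch follows; BlastCheckError is the name used above, while check_blast_radius and its arguments are hypothetical.

```python
class BlastCheckError(Exception):
    """Raised when a delta touches files outside the declared impact surface."""


def check_blast_radius(delta_files, impact_surface):
    """Return True if every touched file is inside the impact surface."""
    outside = sorted(set(delta_files) - set(impact_surface))
    if outside:
        raise BlastCheckError(
            f"delta touches files outside the declared impact surface: {outside}"
        )
    return True
```

Listing the offending files in the error gives the user what they need to either approve the wider scope or narrow the change.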


Agents

| Agent | Stage | Input | Output |
| --- | --- | --- | --- |
| InterviewerAgent | 1 | Plain intent | IntentDocument |
| FlowDesignerAgent | 2 | IntentDocument | UserFlowTree |
| ContractWriterAgent | 3 | UserFlowTree | list[APIContract] |
| RuleCompilerAgent | 4 | list[APIContract] + UserFlowTree | RuleSet |
| SchemaDesignerAgent | 5 | IntentDocument | DatabaseSchema |
| TestWriterAgent | 6 | RuleSet | FailingTestSuite + Hypothesis strategies |
| CoderAgent | 8 | FailingTestSuite + RuleSet | Code files + passing tests |
| IntegratorAgent | 9 | All of the above | requirements.txt + SBOM + supply chain |
| ReviewerAgent | 10 | RuleSet + project files | ReviewResult (violations + LLM synthesis) |
| ImpactAnalystAgent | 2 (edit) | ChangeIntent + codebase | ImpactSurface (jedi blast-radius) |
| RuleExtractorAgent | 3 (edit) | ImpactSurface + source | list[Rule] (PRESERVE) |
| DeltaCompilerAgent | 4 (edit) | list[Rule] + ChangeIntent | DeltaRuleSet |
| BlastCheckerAgent | 5 (edit) | DeltaRuleSet + ImpactSurface | bool (raises BlastCheckError) |
| TestDeltaWriterAgent | 6 (edit) | DeltaRuleSet | FailingTestSuite |

Code Intelligence (22 tools)

Formal verification, semantic analysis, runtime intelligence, mutation testing, supply chain, and quality gates are all wired as structured tool integrations and surfaced in the TUI CodeIntelligenceScreen (open it with /ci in the command palette or with forge tui --screen ci).

| Tool | Role | Mode | TUI |
| --- | --- | --- | --- |
| bandit | Security analysis | Hard gate | |
| semgrep | Secrets + security rules | Hard gate (p/secrets) | |
| pip_audit | CVE scanning | Hard gate: no HIGH/CRITICAL CVEs | |
| pyupgrade | Python version upgrade | Auto-fix | |
| radon | Complexity analysis | Hard gate: CC > 10 | |
| vulture | Dead code detection | Hard gate: ≥80% confidence | |
| griffe | API contract drift | Hard gate | |
| deptry | Dependency analysis | Missing dep = hard gate | |
| cyclonedx | SBOM generation | Artifact: CycloneDX JSON per build | |
| crosshair | Static contract proving | Hard gate: no violations | |
| hypothesis | Property-based test generation | Advisory: enriches test coverage | |
| mutmut | Mutation testing | Hard gate: score ≥ 70% | |
| jedi | Semantic callers/inference | Blast-radius analysis | |
| rope | Semantic refactoring | Safe renames/moves | |
| memray | Memory profiling | Advisory feedback | |
| pyspy | CPU sampling | Advisory feedback | |
| otel | OpenTelemetry tracing | Traces on test failure | |
| pyright | Type checking | Hard gate: wired in coder | |
| ctags | Symbol index | Navigation | |
| atlas | Schema migration | Migration files | |

Skills

Skills are reusable knowledge stored in ~/.forge/skills/ and ~/.hermes/skills/. They teach agents how to use the tools — thresholds, patterns, error interpretation. They are loaded on demand per agent role.

forge --list-skills

Agent Role Skills

| Skill | Teaches |
| --- | --- |
| reviewer | 10-gate hierarchy, evidence formatting, blocking failure strings |
| test-writer | Property-based vs example-based thinking, mutmut loop, Hypothesis patterns |
| coder | TDD discipline, minimal fix, pyupgrade + crosshair in TDD loop |

Quality Gate Skills

| Skill | Teaches |
| --- | --- |
| bandit | B301/B303/B310/B608 patterns, fix templates, rejection strings |
| radon | CC sources (nested conditionals, boolean expressions), refactor patterns |
| vulture | 80% confidence rule, speculative code detection |
| pyright | TypedDict/Generic/Protocol patterns, type annotation discipline |
| mutmut | Survivor interpretation, test-to-kill mapping, threshold philosophy |
| crosshair | PEP 316 contract syntax, counterexample reading |
| hypothesis | Strategy map, invariant testing, @given/@settings patterns |
| pip-audit | CVE severity mapping, auto-fix strategy |
| griffe | Drift detection, contract comparison logic |

Semantic Skills

| Skill | Teaches |
| --- | --- |
| jedi | find_callers/get_inference vs grep, reference fields, limitations |
| rope | rename/extract/inline vs string-replace, safe refactor pattern |

Stack Skills

| Skill | Teaches |
| --- | --- |
| fastapi | Project structure, route patterns, Pydantic v2, dependency injection |
| postgres | Soft deletes, audit logging, JSONB, multi-tenancy, indexing |

TUI

Launch with forge tui — full-screen Textual interface with 6 screens.

Screens

| Screen | Command | What it does |
| --- | --- | --- |
| HomeScreen | Default | Session browser, new session, project picker |
| ChatScreen | /chat | Free-form LLM chat with 29 tools wired |
| GraphScreen | /graph | Directed graph execution: DAG visualizer + event stream |
| ProductCompilerScreen | /compile | Full 11-stage intent-to-production pipeline |
| CodeIntelligenceScreen | /ci | 22 code intelligence tools, run individually or by category |
| ModelPicker | Ctrl+A | Switch models/providers from any screen |

Home Screen

┌─────────────────────────────────────────────────────────┐
│  Home                                    [Ctrl+P] CMDS │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Recent Sessions                                        │
│  > forge new "build a Stripe billing app"   2h ago      │
│  > forge compile "a URL shortener"          yesterday   │
│  > edit "rename get_user to fetch_user"    3 days ago   │
│                                                         │
│  Model: MiniMax (minimax_openai / default)              │
│  Provider: OpenAI-compatible · Streaming: ✓ · Tools: ✓  │
│                                                         │
│  [Ctrl+A] Model   [Ctrl+P] Commands   [Enter] New       │
└─────────────────────────────────────────────────────────┘

Chat Screen

┌─────────────────────────────────────────────────────────┐
│  Chat — forge new: Stripe billing app          [Ctrl+P]  │
├────────────────────────────────────┬────────────────────┤
│                                    │ Tools             │
│ > build a Stripe billing app       │ [base] 7 tools    │
│                                    │ [code] 22 tools   │
│ ✓ Plan created: 3 files           │                   │
│ ✓ Tests passing                   │ Sessions: 4      │
│ ✓ 847 tokens · $0.02              │ Tokens: 12.4K     │
│                                    │ Cost: $0.31      │
│ [Streaming tokens...]              │                   │
├────────────────────────────────────┴────────────────────┤
│ > _                                                     │
└─────────────────────────────────────────────────────────┘

29 tools available in Chat: bash, read, write, search, grep, glob, patch (7 base) + bandit, radon_cc, vulture, semgrep, pip_audit, pyupgrade, deptry, cyclonedx_sbom, jedi_find_callers, jedi_infer, jedi_complete, jedi_signatures, rope_rename, rope_extract, rope_inline, rope_move, crosshair, hypothesis, mutmut, griffe, memray, pyspy (22 code intel).

Session persistence: restore any past session, token/cost tracking per session.

Graph Screen

┌─────────────────────────────────────────────────────────┐
│  Graph — forge new                              [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  DAG                                    Ver: 3   ① 1   │
│  ───                                                    │
│  orchestrator  [*] ← active                            │
│       │                                                │
│  spec_gen      [ ]                                      │
│       │                                                │
│  executor      [✓]                                      │
│       │                                                │
│  review_gate   [✓]  PASS                                │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  Event Stream                                           │
│  ───────────                                            │
│  10:42:01  orchestrator › spec_gen › executor           │
│  10:42:03  executor › review_gate › PASS               │
│  10:42:04  orchestrator › done                          │
├─────────────────────────────────────────────────────────┤
│  [Enter] Start   [Ctrl+R] Reset   [Ctrl+V] Spec   [Esc] │
└─────────────────────────────────────────────────────────┘

Code Intelligence Screen

┌─────────────────────────────────────────────────────────┐
│  Code Intelligence                              [Ctrl+P]│
├──────────────────┬──────────────────┬──────────────────┤
│ 🔒 Security      │ Results          │ Summary           │
│ [Bandit       ]  │ ─────────────── │ ──────────────── │
│ [Semgrep      ]  │ HIGH: 2 issues  │ Last run: 10:44   │
│ [pip-audit    ]  │                  │ Pass: 6  Fail: 4  │
│ [pyupgrade    ]  │ src/api.py:42   │ Total: 847 lines  │
│                  │   B301: pickl…  │ CC avg: 4.2       │
│ 📊 Complexity    │                  │ Vulture: 94% conf │
│ [radon_cc     ]  │ src/auth.py:18  │                   │
│ [vulture      ]  │   CC = 14 ⚠     │ [Export Markdown] │
│ [deptry       ]  │                  │                   │
│                  │ MEDIUM: 1 issue  │                   │
│ 📦 Dependency    │ src/billing.py   │                   │
│ [cyclonedx    ]  │   missing dep   │                   │
│                  │                  │                   │
├──────────────────┴──────────────────┴──────────────────┤
│ [Ctrl+R] Run selected   [Ctrl+G] Run grid   [Esc] Back │
└─────────────────────────────────────────────────────────┘

Tool categories:

  • 🔒 Security: Bandit, Semgrep, pip-audit, pyupgrade
  • 📊 Complexity: Radon CC, Vulture, Deptry
  • 📦 Dependency: CycloneDX SBOM
  • 🧠 Semantic: Jedi (find callers, infer, complete, signatures) + Rope (rename, extract, inline, move)
  • 🔬 Advanced: Crosshair, Hypothesis, Mutmut, Griffe, Memray, py-spy

Each tool shows: installed status, PASS/FAIL badge, severity-sorted results with file:line references. Full output exported to ~/.forge/ci-report-*.md.
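
Severity-sorted output with file:line references can be sketched as a sort over a small findings record. The Finding fields, the severity ranking, and the output format here are assumptions for illustration, not the format forge writes to its report files.

```python
from dataclasses import dataclass

# Assumed ranking: lower number sorts first; unknown severities sort last.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass(frozen=True)
class Finding:
    tool: str
    severity: str
    path: str
    line: int
    message: str


def sort_findings(findings):
    """Order findings by severity, then by file path and line number."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)),
            f.path,
            f.line,
        ),
    )


def format_finding(finding):
    """Render one finding as 'SEVERITY path:line [tool] message'."""
    return (
        f"{finding.severity} {finding.path}:{finding.line} "
        f"[{finding.tool}] {finding.message}"
    )
```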

Product Compiler Screen

┌─────────────────────────────────────────────────────────┐
│  Compile — URL shortener                        [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Stage 5/11: Schema Designer                           │
│  ────────────────────────────────────────              │
│                                                         │
│  PostgreSQL schema — 3 tables                          │
│                                                         │
│  urls(id, original_url, short_code PK, click_count,     │
│       created_at, expires_at, user_id FK)              │
│                                                         │
│  users(id, email, plan, created_at)                     │
│                                                         │
│  analytics(id, url_id FK, ip, user_agent, clicked_at)  │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  [Ctrl+G] Gate Approve   [Ctrl+R] Gate Reject   [Esc]   │
└─────────────────────────────────────────────────────────┘

11 stages with 2 human gates wired into the TUI (Gate 1: after rules, Gate 2: after review).

Keybindings (all screens)

| Key | Action |
| --- | --- |
| Ctrl+A | Model picker |
| Ctrl+P | Command palette (fuzzy search) |
| Ctrl+G | Gate approve |
| Ctrl+R | Gate reject / reset graph |
| Ctrl+C | Cancel running task |
| Escape | Back / home |
| Enter | Start graph run (GraphScreen) |
| Ctrl+V | View spec (GraphScreen) |
| Ctrl+E | View executor output (GraphScreen) |
| Ctrl+K | Clear results (CodeIntelligenceScreen) |

LLM Providers

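| Provider | SDK | Streaming | Tools |
| --- | --- | --- | --- |
| OpenAI | openai | | |
| Anthropic | anthropic | | |
| MiniMax (OpenAI compat) | openai + HTTP | | |
| DeepSeek | openai + base_url | | |
| Qwen | openai + base_url | | |
| Groq | openai + base_url | | |
| Ollama | openai + localhost | | |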

Config (~/.forge/config.yaml):

provider: minimax_openai
model: default
api_key: ...
base_url: ...   # optional, for proxy/custom endpoints

Runtime override:

forge new "my idea" --provider deepseek --model deepseek-chat
forge compile "API" --model-routing "coder=anthropic/claude-sonnet-4"
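
The --model-routing value above has the shape role=provider/model. As a rough sketch of how such a string could be parsed and applied per agent role, consider the following; the function names are hypothetical, and accepting several comma-separated entries is an assumption that the source does not confirm.

```python
def parse_model_routing(spec):
    """Parse 'role=provider/model' entries into {role: (provider, model)}.

    Assumption: multiple entries may be separated by commas.
    """
    routes = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        role, eq, target = item.partition("=")
        provider, slash, model = target.partition("/")
        role, provider, model = role.strip(), provider.strip(), model.strip()
        if not (eq and slash and role and provider and model):
            raise ValueError(f"expected role=provider/model, got {item!r}")
        routes[role] = (provider, model)
    return routes


def resolve_model(role, routes, default):
    """Use the routed (provider, model) for a role, else the configured default."""
    return routes.get(role, default)
```

With the command shown above, the coder role would resolve to ("anthropic", "claude-sonnet-4") while every other role falls back to the provider and model from config.yaml.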

Run Tests

cd ~/forge
pytest tests/ -v

Project Status

| Version | Highlights |
| --- | --- |
| v0.9 ✅ | Full TUI power wired: 6 screens (Home, Chat, Graph, Compile, CI, ModelPicker), GraphScreen DAG visualizer, 29-tool ChatScreen, CodeIntelligenceScreen (22 tools, 5 categories), ProductCompilerScreen gate wiring, session persistence, streaming, token/cost tracking |
| v0.8 ✅ | 10 hard gates, 20 code intelligence tools, edit pipeline (6 stages), 17 skills, jedi/rope semantic layer, mutation testing, supply chain SBOM, OpenTelemetry tracing |
| v0.7 ✅ | Product Compiler: 9-agent pipeline, 2 human gates, E2B sandbox, NATS messaging, full TUI wire |
| v0.6 ✅ | SDK migration: openai + anthropic Python SDKs, Textual TUI, model picker, command palette |
| v0.5 ✅ | Surgical patch editing, git-aware FileService, LSP integration, exponential backoff |
| v0.4 ✅ | Subagent delegation, MCP session management, structured pytest failure parsing |
| v0.3 ✅ | Multi-session state restore, TDD-first executor, skill auto-loading |
