Production-ready multi-agent CLI with unified memory

forge — AI Coding Agent Framework

Two modes, one CLI. forge new → directed graph orchestration. forge compile → intent-to-production pipeline with formal verification, mutation testing, and runtime profiling.


Install

Via npm (recommended)

npm install -g @tearbin/forge
forge --version
forge setup  # interactive: pick provider, enter API key, test

Via npx (no install)

npx @tearbin/forge

Via pip (coming soon — package name claim in progress)

pip install forge-cli   # ← available once the PyPI package name is resolved
forge --version
forge setup

From source

cd ~/forge
pip install -e .
forge --version
forge setup

Two Modes

Command                                  What it does
forge new "build a Stripe billing app"   Directed graph: orchestrator → spec → executor → review gate
forge compile "build a URL shortener"    Pipeline: interview → rules → failing tests → code → review → integration
forge compile --edit "rename get_user"   Edit pipeline: blast radius → extract rules → delta → test delta

Both modes share the same LLM backend, TUI, skills system, and memory model.


forge new — Directed Graph Orchestration

User Prompt
    ↓
Orchestrator       ← decides: new or continue, handles failures/escalation
    ↓
Spec Generator     ← asks clarifying Qs, writes SPEC.md (git-tracked, append-only)
    ↓
Executor           ← TDD: tests first, then implementation, against SPEC.md
    ↓
Review Gate        ← checkpoint: pass → done, fail → retry executor
    ↓
Orchestrator       ← done or escalate to user

Memory Model (4-tier, per-project SQLite)

Tier      Scope            Duration  Storage
short     Current task     Session   In-memory dict
mid       Current project  Days      ~/.forge/projects/<id>/memory.db
episodic  Across sessions  Weeks     Same DB
long      Cross-project    Months    ~/.forge/memory/

Every spec write is a new version row (never overwrite). forge continue loads the latest and resumes from orchestrator state.
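
The append-only rule can be illustrated with a small SQLite sketch. The table name, columns, and function names below are assumptions made for illustration; the actual layout of memory.db is not documented here.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: one row per spec version, never updated in place.
SCHEMA = """
CREATE TABLE IF NOT EXISTS spec_versions (
    version INTEGER PRIMARY KEY AUTOINCREMENT,
    body    TEXT NOT NULL,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a memory database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_spec(conn: sqlite3.Connection, body: str) -> int:
    """Append a new version row and return its version number."""
    cur = conn.execute("INSERT INTO spec_versions (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def latest_spec(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Return (version, body) of the newest row, or None if no spec exists."""
    return conn.execute(
        "SELECT version, body FROM spec_versions ORDER BY version DESC LIMIT 1"
    ).fetchone()
```

Because writes only ever insert, every earlier version stays readable, and resuming a project reduces to reading the row with the highest version.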

Failure Lanes (explicit)

Every node can emit:

  • done → follow normal edge
  • blocked → review_gate emits review_fail → executor retries with reason
  • failed → orchestrator decides: retry / escalate / block
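
The three lanes can be sketched as a small routing function. This is an illustrative model only: the node names follow the Graph Screen labels, while Outcome, NEXT_NODE, and route are hypothetical names rather than Forge's internal API.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    FAILED = "failed"


# Normal edges of the graph, in the order shown in the diagram above.
NEXT_NODE = {
    "orchestrator": "spec_gen",
    "spec_gen": "executor",
    "executor": "review_gate",
    "review_gate": "orchestrator",
}


def route(node: str, outcome: Outcome) -> str:
    """Return the node that should run next for a given outcome."""
    if outcome is Outcome.DONE:
        # done: follow the normal edge.
        return NEXT_NODE[node]
    if outcome is Outcome.BLOCKED:
        # blocked: the failed review goes back to the executor for a retry.
        return "executor"
    # failed: hand control to the orchestrator, which decides what to do.
    return "orchestrator"
```

Keeping the lanes in one function makes it explicit that only the orchestrator chooses between retry, escalate, and block.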

forge compile — Intent to Production Pipeline

Describe what you want in plain English. Forge interviews you, writes failing tests from formal rules, writes the code to pass them, runs 10 parallel static analysis gates, and integrates everything into a runnable project.

Intent (plain English)
    ↓
Stage 1: Interviewer         adversarial Q&A — extracts entities, relationships, rules
    ↓
Stage 2: Flow Designer        → UserFlowTree (screen-by-screen flows)
    ↓
Stage 3: Contract Writer      → APIContract[] (request/response/error for every action)
    ↓
Stage 4: Rule Compiler        → RuleSet (IF/THEN constraints, no prose)
    ↓
Gate 1: Human Review          y/n — reject sends feedback back to Rule Compiler
    ↓
Stage 5: Schema Designer      → DatabaseSchema (PostgreSQL — tables, FKs, indexes, audit columns)
    ↓
Stage 6: Test Writer          → FailingTestSuite (one test per rule) + Hypothesis strategies
    ↓
Stage 7: Mutation Testing     ← mutmut — hard gate if score < 70%
    ↓
Stage 8: Coder                → passing code (pyupgrade + pyright + crosshair in TDD loop)
    ↓
Stage 9: Integrator           → requirements.txt + supply chain scan + SBOM
    ↓
Stage 10: Reviewer            → 10 parallel gates: bandit, semgrep, radon, vulture, griffe,
    │                            deptry, pyright, mutmut, pip-audit, crosshair
    ↓
Gate 2: Human Review          y/n — reject sends feedback back to Coder
    ↓
Stage 11: Profiling           ← memray (memory) + pyspy (CPU) — advisory feedback
    ↓
Done: project_dir/

The 10 Hard Gates

Every gate blocks the pipeline. No LLM synthesis happens until all gates pass.

#   Gate                      Tool                 Threshold
1   Security vulnerabilities  bandit               HIGH/CRITICAL → BLOCK
2   Hardcoded secrets         semgrep (p/secrets)  HIGH → BLOCK
3   Complexity                radon                CC > 10 → BLOCK
4   Dead code                 vulture              ≥80% confidence → BLOCK
5   Contract drift            griffe               Any drift → BLOCK
6   Missing dependencies      deptry (DEP001)      Missing dep → BLOCK
7   Type errors               pyright              Any error → BLOCK
8   Test quality              mutmut               Score < 70% → BLOCK
9   Supply chain CVEs         pip-audit            HIGH/CRITICAL → BLOCK
10  Contract violations       crosshair            Any violation → BLOCK
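
Two of the numeric thresholds above (gates 3 and 8) can be expressed as plain predicates. The sketch below is a hedged illustration: GateResult and the function names are hypothetical, the inputs are assumed to have been parsed from tool output already, and treating a run with zero mutants as a score of 0 is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GateResult:
    gate: str
    blocked: bool
    detail: str


def check_complexity(max_cc: int, limit: int = 10) -> GateResult:
    """Gate 3: block when the highest cyclomatic complexity exceeds the limit."""
    return GateResult("complexity", max_cc > limit, f"max CC = {max_cc}")


def check_mutation_score(killed: int, total: int, minimum: float = 70.0) -> GateResult:
    """Gate 8: block when the share of killed mutants is below the minimum."""
    # Assumption: no mutants means no evidence of test quality, so score 0.
    score = 100.0 * killed / total if total else 0.0
    return GateResult("test quality", score < minimum, f"score = {score:.1f}%")


def pipeline_blocked(results: List[GateResult]) -> bool:
    """The pipeline stops if any single gate blocks."""
    return any(result.blocked for result in results)
```

Note the boundary cases: a complexity of exactly 10 and a mutation score of exactly 70% both pass, matching the strict comparisons in the table.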

CLI

forge compile "build a URL shortener with analytics"
forge compile "a REST API" --auto-approve
forge compile "a blog" --test-prompt "posts" --test-prompt "auth"
forge compile "a service" --sandbox              # E2B sandbox isolation
forge compile "a service" --nats-url nats://localhost:4222
forge compile "a blog" --workdir ./myproject

Edit Pipeline — Change Existing Code

forge compile --edit "add OAuth to the API" --workdir ./myproject
forge compile --edit "rename get_user to fetch_user" --workdir ./myproject
forge compile --edit "change beta user discount from 20% to 30%" --workdir ./myproject

Stage 1: ChangeIntent          ← what is changing, what must not
    ↓
Stage 2: ImpactAnalyst        ← jedi blast-radius: all callers of changed function
    ↓
Stage 3: RuleExtractor        ← extract IF/THEN rules from source by observation
    ↓
Stage 4: DeltaCompiler        ← PRESERVE / CHANGE / NEW / REMOVE per rule
    ↓
Stage 5: BlastChecker         ← verify delta stays within declared blast radius
    ↓
Stage 6: TestDeltaWriter      ← failing tests for CHANGE rules, regression for PRESERVE
    ↓
    Coder (edit-mode TDD loop)
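
The four DeltaCompiler labels can be read as a comparison between the rules extracted from the current source and the rules the change intends to hold afterwards. A minimal sketch, assuming rules are keyed by an id and compared as plain strings; Delta and compile_delta are hypothetical names, not Forge's API.

```python
from enum import Enum
from typing import Dict


class Delta(Enum):
    PRESERVE = "PRESERVE"
    CHANGE = "CHANGE"
    NEW = "NEW"
    REMOVE = "REMOVE"


def compile_delta(current: Dict[str, str], target: Dict[str, str]) -> Dict[str, Delta]:
    """Label every rule id by comparing current rules with intended rules."""
    labels: Dict[str, Delta] = {}
    for rule_id in current.keys() | target.keys():
        if rule_id not in target:
            labels[rule_id] = Delta.REMOVE      # exists today, dropped by the edit
        elif rule_id not in current:
            labels[rule_id] = Delta.NEW         # introduced by the edit
        elif current[rule_id] != target[rule_id]:
            labels[rule_id] = Delta.CHANGE      # same rule, different behaviour
        else:
            labels[rule_id] = Delta.PRESERVE    # must keep passing unchanged
    return labels
```

Under this reading, the discount edit shown earlier would mark only the discount rule as CHANGE and leave every other extracted rule as PRESERVE, which is what drives the split between failing tests and regression tests in Stage 6.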

The BlastChecker raises BlastCheckError if the delta touches files outside the declared impact surface. The user must approve an expanded scope or narrow the change.
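
At its core this check is a set difference. The sketch below assumes the delta and the impact surface are both available as sets of file paths; the BlastCheckError class here is a stand-in with the same name, and check_blast_radius is a hypothetical helper.

```python
from typing import Iterable, Set


class BlastCheckError(Exception):
    """Raised when a change touches files outside the declared impact surface."""

    def __init__(self, outside: Iterable[str]):
        self.outside = sorted(outside)
        super().__init__("outside declared blast radius: " + ", ".join(self.outside))


def check_blast_radius(delta_files: Set[str], impact_surface: Set[str]) -> bool:
    """Return True if every touched file is inside the impact surface."""
    outside = delta_files - impact_surface
    if outside:
        raise BlastCheckError(outside)
    return True
```

Carrying the offending paths on the exception gives the caller what it needs to ask the user whether to expand the scope or narrow the change.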


Agents

Agent                 Stage     Input                            Output
InterviewerAgent      1         Plain intent                     IntentDocument
FlowDesignerAgent     2         IntentDocument                   UserFlowTree
ContractWriterAgent   3         UserFlowTree                     list[APIContract]
RuleCompilerAgent     4         list[APIContract] + UserFlowTree RuleSet
SchemaDesignerAgent   5         IntentDocument                   DatabaseSchema
TestWriterAgent       6         RuleSet                          FailingTestSuite + Hypothesis strategies
CoderAgent            8         FailingTestSuite + RuleSet       Code files + passing tests
IntegratorAgent       9         All of the above                 requirements.txt + SBOM + supply chain
ReviewerAgent         10        RuleSet + project files          ReviewResult (violations + LLM synthesis)
ImpactAnalystAgent    2 (edit)  ChangeIntent + codebase          ImpactSurface (jedi blast-radius)
RuleExtractorAgent    3 (edit)  ImpactSurface + source           list[Rule] (PRESERVE)
DeltaCompilerAgent    4 (edit)  list[Rule] + ChangeIntent        DeltaRuleSet
BlastCheckerAgent     5 (edit)  DeltaRuleSet + ImpactSurface     bool (raises BlastCheckError)
TestDeltaWriterAgent  6 (edit)  DeltaRuleSet                     FailingTestSuite

Code Intelligence (22 tools)

Formal verification, semantic analysis, runtime intelligence, mutation testing, supply chain, and quality gates — all wired as structured tool integrations, surfaced in the TUI CodeIntelligenceScreen (accessible via /ci command palette or forge tui --screen ci).

Tool        Role                            Mode                               TUI
bandit      Security analysis               Hard gate
semgrep     Secrets + security rules        Hard gate (p/secrets)
pip_audit   CVE scanning                    Hard gate: no HIGH/CRITICAL CVEs
pyupgrade   Python version upgrade          Auto-fix
radon       Complexity analysis             Hard gate CC > 10
vulture     Dead code detection             Hard gate ≥80% confidence
griffe      API contract drift              Hard gate
deptry      Dependency analysis             Missing dep = hard gate
cyclonedx   SBOM generation                 Artifact: CycloneDX JSON per build
crosshair   Static contract proving         Hard gate: no violations
hypothesis  Property-based test generation  Advisory: enriches test coverage
mutmut      Mutation testing                Hard gate: score ≥ 70%
jedi        Semantic callers/inference      Blast-radius analysis
rope        Semantic refactoring            Safe renames/moves
memray      Memory profiling                Advisory feedback
pyspy       CPU sampling                    Advisory feedback
otel        OpenTelemetry tracing           Traces on test failure
pyright     Type checking                   Hard gate: wired in coder
ctags       Symbol index                    Navigation
atlas       Schema migration                Migration files

Skills

Skills are reusable knowledge stored in ~/.forge/skills/ and ~/.hermes/skills/. They teach agents how to use the tools — thresholds, patterns, error interpretation. They are loaded on demand per agent role.

forge --list-skills
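
On-demand loading from more than one skills directory can be sketched as a first-match lookup. Everything about the on-disk format here is an assumption: the sketch pretends each skill is a single `<name>.md` file and that earlier directories take precedence, which may not match how Forge actually stores or resolves skills.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def find_skill(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first '<name>.md' found in the search directories, if any."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.md"
        if candidate.is_file():
            return candidate
    return None


def load_skills(names: Iterable[str], search_dirs: Iterable[Path]) -> Dict[str, str]:
    """Read only the requested skills; missing ones are skipped silently."""
    dirs: List[Path] = [Path(d) for d in search_dirs]
    loaded: Dict[str, str] = {}
    for name in names:
        path = find_skill(name, dirs)
        if path is not None:
            loaded[name] = path.read_text(encoding="utf-8")
    return loaded
```

A reviewer agent would then call something like `load_skills(["reviewer", "bandit"], [Path.home() / ".forge/skills", Path.home() / ".hermes/skills"])` and receive only the text it needs for its role.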

Agent Role Skills

Skill Teaches
reviewer 10-gate hierarchy, evidence formatting, blocking failure strings
test-writer Property-based vs example-based thinking, mutmut loop, Hypothesis patterns
coder TDD discipline, minimal fix, pyupgrade + crosshair in TDD loop

Quality Gate Skills

Skill Teaches
bandit B301/B303/B310/B608 patterns, fix templates, rejection strings
radon CC sources (nested conditionals, boolean expressions), refactor patterns
vulture 80% confidence rule, speculative code detection
pyright TypedDict/Generic/Protocol patterns, type annotation discipline
mutmut Survivor interpretation, test-to-kill mapping, threshold philosophy
crosshair PEP 316 contract syntax, counterexample reading
hypothesis Strategy map, invariant testing, @given/@settings patterns
pip-audit CVE severity mapping, auto-fix strategy
griffe Drift detection, contract comparison logic

Semantic Skills

Skill Teaches
jedi find_callers/get_inference vs grep, reference fields, limitations
rope rename/extract/inline vs string-replace, safe refactor pattern

Stack Skills

Skill Teaches
fastapi Project structure, route patterns, Pydantic v2, dependency injection
postgres Soft deletes, audit logging, JSONB, multi-tenancy, indexing

TUI

Launch with forge tui — full-screen Textual interface with 6 screens.

Screens

Screen                  Command  What it does
HomeScreen              Default  Session browser, new session, project picker
ChatScreen              /chat    Free-form LLM chat with 29 tools wired
GraphScreen             /graph   Directed graph execution: DAG visualizer + event stream
ProductCompilerScreen   /compile Full 11-stage intent-to-production pipeline
CodeIntelligenceScreen  /ci      22 code intelligence tools, run individually or by category
ModelPicker             Ctrl+A   Switch models/providers from any screen

Home Screen

┌─────────────────────────────────────────────────────────┐
│  Home                                    [Ctrl+P] CMDS │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Recent Sessions                                        │
│  > forge new "build a Stripe billing app"   2h ago      │
│  > forge compile "a URL shortener"          yesterday   │
│  > edit "rename get_user to fetch_user"    3 days ago   │
│                                                         │
│  Model: MiniMax (minimax_openai / default)              │
│  Provider: OpenAI-compatible · Streaming: ✓ · Tools: ✓  │
│                                                         │
│  [Ctrl+A] Model   [Ctrl+P] Commands   [Enter] New       │
└─────────────────────────────────────────────────────────┘

Chat Screen

┌─────────────────────────────────────────────────────────┐
│  Chat — forge new: Stripe billing app          [Ctrl+P]  │
├────────────────────────────────────┬────────────────────┤
│                                    │ Tools             │
│ > build a Stripe billing app       │ [base] 7 tools    │
│                                    │ [code] 22 tools   │
│ ✓ Plan created: 3 files           │                   │
│ ✓ Tests passing                   │ Sessions: 4      │
│ ✓ 847 tokens · $0.02              │ Tokens: 12.4K     │
│                                    │ Cost: $0.31      │
│ [Streaming tokens...]              │                   │
├────────────────────────────────────┴────────────────────┤
│ > _                                                     │
└─────────────────────────────────────────────────────────┘

29 tools available in Chat: bash, read, write, search, grep, glob, patch (7 base) + bandit, radon_cc, vulture, semgrep, pip_audit, pyupgrade, deptry, cyclonedx_sbom, jedi_find_callers, jedi_infer, jedi_complete, jedi_signatures, rope_rename, rope_extract, rope_inline, rope_move, crosshair, hypothesis, mutmut, griffe, memray, pyspy (22 code intel).

Session persistence: restore any past session, token/cost tracking per session.

Graph Screen

┌─────────────────────────────────────────────────────────┐
│  Graph — forge new                              [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  DAG                                    Ver: 3   ① 1   │
│  ───                                                    │
│  orchestrator  [*] ← active                            │
│       │                                                │
│  spec_gen      [ ]                                      │
│       │                                                │
│  executor      [✓]                                      │
│       │                                                │
│  review_gate   [✓]  PASS                                │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  Event Stream                                           │
│  ───────────                                            │
│  10:42:01  orchestrator › spec_gen › executor           │
│  10:42:03  executor › review_gate › PASS               │
│  10:42:04  orchestrator › done                          │
├─────────────────────────────────────────────────────────┤
│  [Enter] Start   [Ctrl+R] Reset   [Ctrl+V] Spec   [Esc] │
└─────────────────────────────────────────────────────────┘

Code Intelligence Screen

┌─────────────────────────────────────────────────────────┐
│  Code Intelligence                              [Ctrl+P]│
├──────────────────┬──────────────────┬──────────────────┤
│ 🔒 Security      │ Results          │ Summary           │
│ [Bandit       ]  │ ─────────────── │ ──────────────── │
│ [Semgrep      ]  │ HIGH: 2 issues  │ Last run: 10:44   │
│ [pip-audit    ]  │                  │ Pass: 6  Fail: 4  │
│ [pyupgrade    ]  │ src/api.py:42   │ Total: 847 lines  │
│                  │   B301: pickl…  │ CC avg: 4.2       │
│ 📊 Complexity    │                  │ Vulture: 94% conf │
│ [radon_cc     ]  │ src/auth.py:18  │                   │
│ [vulture      ]  │   CC = 14 ⚠     │ [Export Markdown] │
│ [deptry       ]  │                  │                   │
│                  │ MEDIUM: 1 issue  │                   │
│ 📦 Dependency    │ src/billing.py   │                   │
│ [cyclonedx    ]  │   missing dep   │                   │
│                  │                  │                   │
├──────────────────┴──────────────────┴──────────────────┤
│ [Ctrl+R] Run selected   [Ctrl+G] Run grid   [Esc] Back │
└─────────────────────────────────────────────────────────┘

Tool categories:

  • 🔒 Security: Bandit, Semgrep, pip-audit, pyupgrade
  • 📊 Complexity: Radon CC, Vulture, Deptry
  • 📦 Dependency: CycloneDX SBOM
  • 🧠 Semantic: Jedi (find callers, infer, complete, signatures) + Rope (rename, extract, inline, move)
  • 🔬 Advanced: Crosshair, Hypothesis, Mutmut, Griffe, Memray, py-spy

Each tool shows: installed status, PASS/FAIL badge, severity-sorted results with file:line references. Full output exported to ~/.forge/ci-report-*.md.
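
Severity-sorted results with file:line references can be sketched as follows. The Finding fields, the severity ranking, and the Markdown layout are assumptions for illustration; the real report format written to ~/.forge/ci-report-*.md is not specified here.

```python
from dataclasses import dataclass
from typing import List

# Lower rank sorts first; unknown severities sort last.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass(frozen=True)
class Finding:
    tool: str
    severity: str
    path: str
    line: int
    message: str


def sort_findings(findings: List[Finding]) -> List[Finding]:
    """Order findings by severity, then by file path and line number."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)), f.path, f.line),
    )


def to_markdown(findings: List[Finding]) -> str:
    """Render the sorted findings as a Markdown table."""
    lines = ["| Severity | Tool | Location | Message |", "| --- | --- | --- | --- |"]
    for f in sort_findings(findings):
        lines.append(f"| {f.severity} | {f.tool} | {f.path}:{f.line} | {f.message} |")
    return "\n".join(lines)
```

Sorting on a tuple keeps the output stable across runs, so two reports for the same code can be compared line by line.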

Product Compiler Screen

┌─────────────────────────────────────────────────────────┐
│  Compile — URL shortener                        [Ctrl+P]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Stage 5/11: Schema Designer                           │
│  ────────────────────────────────────────              │
│                                                         │
│  PostgreSQL schema — 3 tables                          │
│                                                         │
│  urls(id, original_url, short_code PK, click_count,     │
│       created_at, expires_at, user_id FK)              │
│                                                         │
│  users(id, email, plan, created_at)                     │
│                                                         │
│  analytics(id, url_id FK, ip, user_agent, clicked_at)  │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  [Ctrl+G] Gate Approve   [Ctrl+R] Gate Reject   [Esc]   │
└─────────────────────────────────────────────────────────┘

11 stages with 2 human gates wired into the TUI (Gate 1: after rules, Gate 2: after review).

Keybindings (all screens)

Key Action
Ctrl+A Model picker
Ctrl+P Command palette (fuzzy search)
Ctrl+G Gate approve
Ctrl+R Gate reject / reset graph
Ctrl+C Cancel running task
Escape Back / home
Enter Start graph run (GraphScreen)
Ctrl+V View spec (GraphScreen)
Ctrl+E View executor output (GraphScreen)
Ctrl+K Clear results (CodeIntelligenceScreen)

LLM Providers

Provider                 SDK                 Streaming  Tools
OpenAI                   openai
Anthropic                anthropic
MiniMax (OpenAI compat)  openai + HTTP
DeepSeek                 openai + base_url
Qwen                     openai + base_url
Groq                     openai + base_url
Ollama                   openai + localhost

Config ~/.forge/config.yaml:

provider: minimax_openai
model: default
api_key: ...
base_url: ...   # optional, for proxy/custom endpoints

Runtime override:

forge new "my idea" --provider deepseek --model deepseek-chat
forge compile "API" --model-routing "coder=anthropic/claude-sonnet-4"
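
The `--model-routing` value in the example has the shape `role=provider/model`. A hedged sketch of how such a string could be parsed is shown below; parse_model_routing is a hypothetical helper, and accepting several comma-separated entries is an assumption that the example above does not confirm.

```python
from typing import Dict, Tuple


def parse_model_routing(spec: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'role=provider/model' entries into {role: (provider, model)}."""
    routes: Dict[str, Tuple[str, str]] = {}
    # Assumption: multiple entries, if supported at all, are comma-separated.
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        role, equals, target = entry.partition("=")
        provider, slash, model = target.partition("/")
        role, provider, model = role.strip(), provider.strip(), model.strip()
        if not (equals and slash and role and provider and model):
            raise ValueError(f"expected role=provider/model, got {entry!r}")
        routes[role] = (provider, model)
    return routes
```

Splitting on the first `/` only means model names that themselves contain a slash stay intact in the model part.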

Run Tests

cd ~/forge
pytest tests/ -v

Project Status

Version Highlights
v0.9 ✅ Full TUI power wired: 6 screens (Home, Chat, Graph, Compile, CI, ModelPicker), GraphScreen DAG visualizer, 29-tool ChatScreen, CodeIntelligenceScreen (22 tools, 5 categories), ProductCompilerScreen gate wiring, session persistence, streaming, token/cost tracking
v0.8 ✅ 10 hard gates, 20 code intelligence tools, edit pipeline (6 stages), 17 skills, jedi/rope semantic layer, mutation testing, supply chain SBOM, OpenTelemetry tracing
v0.7 ✅ Product Compiler: 9-agent pipeline, 2 human gates, E2B sandbox, NATS messaging, full TUI wire
v0.6 ✅ SDK migration: openai + anthropic Python SDKs, Textual TUI, model picker, command palette
v0.5 ✅ Surgical patch editing, git-aware FileService, LSP integration, exponential backoff
v0.4 ✅ Subagent delegation, MCP session management, structured pytest failure parsing
v0.3 ✅ Multi-session state restore, TDD-first executor, skill auto-loading
