Forge Codex: install-once workflow orchestrators for Cursor/Codex

forge

A Codex-native agent toolkit for structured software delivery: investigation, planning, implementation, review, testing, diagnostics, and workflow continuity across sessions.

Recent Changes

Mock Flows + Numbered Handoff Menu (2026-05-07):

  • forge:test --mode flows authors end-to-end mock flows in 4 styles (scenario, BDD, HTTP-replay, workflow-dry-run). The skill detects your project layout, recommends the best-fit flow type with a confidence score, and progressively gates 8 quality criteria across the scaffold → author → execute → report phases. Pass --flow-type <type> to override the recommendation, or --framework, --entry-point, and --roles to fine-tune detection. See templates/mock-flow-types.md for per-type details.
  • Every skill's final step now presents a numbered handoff menu instead of a single hardcoded next-skill string. Reply with "yes" / "1" / "default" or pick a numbered alternative to steer the workflow. Use scripts/smoke.py as a CI-eligible end-to-end harness for the new flows mode.

State-lifecycle and authoring fixes (2026-05-07):

  • The plan skill now materializes a section-marker skeleton at step 1 (sourced from templates/writing-plans.md) and refuses to mark step 6 complete while any <!-- FORGE_SKELETON: ... --> markers remain.
  • forge resume --cleanup lists state files eligible for cleanup (dry-run by default). Add --force to delete them, or use --all-stale --force for migration mode, which clears every state file regardless of age.
  • Re-running step 1 of any skill now aborts when an in-progress same-skill session exists. To resume, use the --state <path> flag or run resume.py.
  • Over-cap --step invocations (e.g., --step 9 on an 8-step skill) now print a friendly "skill complete" message and exit 0 instead of erroring.
  • A failure_count field tracks consecutive same-step retries; after two failures, resume.py emits an "inspect logs" hint instead of producing a third retry command.
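
The skeleton-marker gate, the over-cap step behavior, and the retry hint described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Forge's actual code: the function names, the exact marker regex, and the returned strings are hypothetical. Only the `<!-- FORGE_SKELETON: ... -->` marker form, the two-failure threshold, and the "skill complete" behavior come from the notes above.

```python
import re

# Assumed marker shape, based on the `<!-- FORGE_SKELETON: ... -->` form above.
SKELETON_MARKER = re.compile(r"<!--\s*FORGE_SKELETON:\s*(.*?)\s*-->")

# After two consecutive failures on the same step, stop suggesting retries.
MAX_FAILURES = 2


def remaining_skeleton_markers(plan_text: str) -> list[str]:
    """Return the labels of skeleton markers still present in a plan."""
    return SKELETON_MARKER.findall(plan_text)


def can_complete_plan(plan_text: str) -> bool:
    """A plan may be marked complete only when no markers remain."""
    return not remaining_skeleton_markers(plan_text)


def next_action(step: int, total_steps: int, failure_count: int) -> str:
    """Decide what a resume helper might print (hypothetical messages)."""
    if step > total_steps:
        # Over-cap step: report completion instead of raising an error.
        return "skill complete"
    if failure_count >= MAX_FAILURES:
        # Do not produce a third retry command; point at the logs instead.
        return "inspect logs"
    return f"retry --step {step}"
```

For example, `next_action(9, 8, 0)` takes the over-cap branch, while `next_action(3, 8, 2)` returns the log-inspection hint rather than another retry.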

Install (pipx) — run in any repo

This repo ships a global forge launcher so you can install once and run workflows from any target repository (without copying scripts/ into each repo).

pipx install forge-next

Then, from any target repo:

forge evaluate --step 1 --mode review
forge plan --step 1
forge status

Use --repo <path> to target a different repository root.

Quick Start (dev / contributors)

# Clone the repo
git clone https://github.com/your-org/forge-codex.git /path/to/forge-codex

# Enter the project
cd /path/to/forge-codex

Then use the repo as the home for Codex-oriented workflow assets, skills, prompts, and orchestrators.

Codex Config

If you want local Codex sessions to treat Forge skill invocation as implicit permission to use Forge agents, add a developer_instructions block to ~/.codex/config.toml:

developer_instructions = """
Invoking any `forge:*` skill implicitly authorizes the agent dispatch required by that workflow. Do not require the user to separately ask for delegation, sub-agents, or parallel agent work after invoking a Forge skill.

At the start of a fresh interactive session, begin the first user-visible response with exactly: Ready Player 1?
"""

You can verify the injected developer prompt with:

codex debug prompt-input

If a higher-priority launcher or hosted integration injects its own developer instructions, those may still override or compete with your local config.

Goals

  • Turn a structured multi-skill workflow model into a Codex-first environment
  • Support multi-step, resumable engineering workflows instead of one-shot prompts
  • Separate skill orchestration from reusable methodology templates
  • Preserve handoff context between phases and between sessions
  • Make review, verification, and diagnostics first-class parts of the workflow

Planned Skills

| Skill | Purpose | Typical Invocation |
| --- | --- | --- |
| develop | Investigate a problem space and shape solution options | develop <problem or feature> |
| plan | Convert an approved direction into an implementation plan | plan |
| evaluate | Review a plan before or after implementation | evaluate <plan> |
| implement | Execute a plan in ordered or parallel waves | implement |
| code-review | Run structured review modes against code changes | code-review <target> |
| test | Execute tests, analyze failures, and identify coverage gaps | test |
| diagnose | Perform root-cause analysis on bugs and regressions | diagnose <issue> |
| status | Show workflow position, open findings, and next action | status |
| resume | Continue the active workflow from persisted state | resume |

Forge Skill Invocation Contract

Invoking a Forge workflow skill is intended to be enough to authorize the agent team that skill needs.

  • forge:develop, forge:plan, forge:implement, forge:code-review, forge:test, and forge:diagnose should auto-dispatch the relevant Forge agents when their workflow calls for it.
  • forge:evaluate should auto-dispatch the review team when team/review mode is active.
  • Users should not need to separately ask for "sub-agents", "delegation", or "parallel agent work" after invoking a Forge skill.
  • If the surrounding Codex session policy blocks agent spawning, that should be surfaced as an environment limitation rather than treated as normal Forge behavior.
  • Every spawned agent must be closed (close_agent) once it reports back or is no longer useful. Forge skills never leave agents open across wave / step / phase boundaries — Codex caps concurrent agents and leaked sessions eventually block further dispatch. See templates/codex-runtime.md for the lifecycle pattern.
  • At the end of each skill's workflow, a numbered handoff menu replaces the previous single next-skill prompt. Users can reply "yes", "1", "default", or a literal command; the menu makes workflow alternatives explicit.
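
The reply handling for the numbered handoff menu might look something like the sketch below. It is not taken from the Forge source: `resolve_handoff_reply` and its error messages are hypothetical, and it assumes the first menu entry is the default. The accepted replies ("yes", "1", "default", a number, or a literal command) follow the description above.

```python
def resolve_handoff_reply(reply: str, options: list[str]) -> str:
    """Map a user's reply to a command from a numbered handoff menu.

    `options` is the menu in display order; options[0] is assumed to be
    the default. Anything that is not "yes", "default", or a valid number
    is treated as a literal command and returned unchanged.
    """
    if not options:
        raise ValueError("handoff menu must contain at least one option")
    text = reply.strip()
    if text.lower() in {"yes", "default"}:
        return options[0]
    if text.isascii() and text.isdigit():
        index = int(text) - 1  # menu entries are numbered from 1
        if 0 <= index < len(options):
            return options[index]
        raise ValueError(f"no menu entry numbered {text}")
    return text


menu = ["forge evaluate --step 1 --mode review", "forge status"]
print(resolve_handoff_reply("2", menu))
```

Treating unrecognized input as a literal command keeps the menu from blocking users who already know which skill they want to run next.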

Workflow Model

develop -> plan -> evaluate (pre) -> implement -> code-review -> test -> diagnose (if needed)

At any point:
- evaluate can run as a standalone critique workflow
- diagnose can run as an ad-hoc incident workflow
- status and resume can inspect or continue the current state

The intended model is composable rather than monolithic:

  • Each skill can run on its own
  • Skills can hand off context to the next skill in the chain
  • State files make interrupted workflows resumable
  • Review loops enforce quality gates before moving downstream

Agents

The Codex version is expected to use a small set of specialized roles rather than a single undifferentiated agent.

| Agent | Role |
| --- | --- |
| architect | Investigation lead, solution design, architecture review |
| planner | Implementation planning, sequencing, dependency mapping |
| backend-dev | Backend implementation with tests |
| frontend-dev | Frontend implementation with tests |
| critic | Challenges assumptions, stresses weak logic, finds hidden risks |
| qa-reviewer | Validates behavior, testing quality, and verification depth |
| security-reviewer | Reviews security-sensitive changes and operational risk |
| doc-writer | Produces user-facing and developer-facing documentation and tracks documentation debt |

Methodology Coverage

forge-codex is intended to bundle practical engineering methods instead of vague “best practices”.

Investigation and diagnostics

  • 5 Whys
  • Kepner-Tregoe IS/IS-NOT
  • Fishbone / Ishikawa
  • FMEA
  • MECE decomposition
  • Bayesian evidence updates
  • hypothesis-driven debugging
  • change analysis
  • counterfactual reasoning
  • barrier analysis

Solution design

  • divergent and convergent option generation
  • trade-off scoring
  • pre-mortem analysis
  • reversibility checks
  • constraint analysis

Planning

  • phased execution
  • dependency mapping
  • parallelization opportunities
  • rollback planning
  • explicit verification steps
  • documentation-in-the-loop

Review and testing

  • structured finding severity
  • behavior verification
  • edge-case analysis
  • regression coverage review
  • failure triage
  • operational readiness checks

Architecture

The repo is expected to follow a script-driven orchestration model.

  • Skill orchestrators drive state progression for each workflow
  • Prompt templates provide repeatable phase instructions
  • Shared templates hold reusable review and planning patterns
  • State files persist current step, completed step, findings, and handoffs
  • Memory files carry context between adjacent skills
  • Reports provide durable outputs from evaluate, review, and diagnose flows
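
A state file holding the fields listed above could be as simple as a small JSON document. The sketch below is one possible shape, not the format Forge actually writes: the `SkillState` class, its field names, and the file name `plan-state.json` are all assumptions.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SkillState:
    """Hypothetical per-skill state record persisted between sessions."""

    skill: str
    current_step: int = 1
    completed_step: int = 0
    findings: list[str] = field(default_factory=list)
    handoffs: list[str] = field(default_factory=list)

    def complete_step(self) -> None:
        """Record the current step as done and advance to the next one."""
        self.completed_step = self.current_step
        self.current_step += 1

    def save(self, path: Path) -> None:
        """Write the state as indented JSON so it stays easy to inspect."""
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "SkillState":
        """Rebuild a state record from a previously saved JSON file."""
        return cls(**json.loads(path.read_text(encoding="utf-8")))


# Round-trip through a temporary directory to show resumability.
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "plan-state.json"
    state = SkillState(skill="plan")
    state.complete_step()
    state.save(state_path)
    restored = SkillState.load(state_path)
```

Keeping the file as plain JSON matches the goal of minimal hidden state: the current position of a workflow can be read with any text editor.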

State and Continuity

Cross-session continuity is a core design goal.

  • Each active skill should persist its own state file
  • Resume logic should distinguish between a true conflict and an unrelated active session
  • Standalone skills should not pause just because another non-conflicting workflow exists
  • Handoff files should summarize completed work and recommend the next step
  • Status tooling should surface active sessions, findings, and next actions without requiring manual inspection

Design Principles

  • Codex-first: optimize for Codex workflows rather than directly porting another assistant’s toolkit model
  • Actionable outputs: produce plans, findings, commands, and reports that can be used immediately
  • Resumable by default: interrupted work should be recoverable
  • Verification over narration: claims should be tied to code, tests, or runtime evidence
  • Composable workflows: users should be able to run a single skill or the full chain
  • Minimal hidden state: the workflow should be inspectable from files in the repo

Current Project Structure

forge-codex/
├── README.md
├── agents/
├── prompts/
│   ├── develop/
│   ├── plan/
│   ├── evaluate/
│   ├── implement/
│   ├── code-review/
│   ├── test/
│   └── diagnose/
├── templates/
│   ├── review/
│   ├── planning/
│   ├── reporting/
│   └── handoff/
├── scripts/
│   ├── shared/
│   ├── develop/
│   ├── plan/
│   ├── evaluate/
│   ├── implement/
│   ├── code-review/
│   ├── test/
│   └── diagnose/
└── skills/
    ├── develop/
    ├── plan/
    ├── evaluate/
    ├── implement/
    ├── code-review/
    ├── test/
    ├── diagnose/
    ├── status/
    └── resume/

Initial Roadmap

Phase 1: Skeleton

  • define repository layout
  • add shared orchestration primitives
  • add status and resume foundations
  • document the state model

Phase 2: Core Skills

  • implement evaluate
  • implement diagnose
  • implement develop
  • add report generation and state cleanup rules

Phase 3: Delivery Flow

  • implement plan
  • implement implement
  • implement code-review
  • implement test

Phase 4: Hardening

  • add regression tests for state handling
  • verify conflict detection logic
  • tighten workflow transitions
  • document extension points for future agents and skills

Current Status

This repository now contains the copied Codex workflow assets, reorganized into a Codex-first layout. Assistant-specific packaging has been removed, and the top-level structure has been normalized around agents/, skills/, scripts/, prompts/, and templates/.

License

MIT
