Forge Codex: install-once workflow orchestrators for Cursor/Codex

Project description

forge

A Codex-native agent toolkit for structured software delivery: investigation, planning, implementation, review, testing, diagnostics, and workflow continuity across sessions.

Recent Changes

Mock Flows + Numbered Handoff Menu (2026-05-07):

forge:test --mode flows authors end-to-end mock flows in four styles (scenario, BDD, HTTP-replay, workflow-dry-run). The skill detects your project layout, recommends the best-fit flow type with a confidence score, and progressively gates 8 quality criteria across the scaffold → author → execute → report phases. Pass --flow-type <type> to override the recommendation, or --framework, --entry-point, and --roles to fine-tune detection. See templates/mock-flow-types.md for per-type details.
Every skill's final step now presents a numbered handoff menu instead of a single hardcoded next-skill string. Reply with "yes" / "1" / "default" or pick a numbered alternative to steer the workflow. Use scripts/smoke.py as a CI-eligible end-to-end harness for the new flows mode.
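
The reply handling described above could be sketched roughly as follows. This is an illustration, not Forge's actual code: resolve_handoff, DEFAULT_REPLIES, and the sample menu entries are hypothetical names chosen for the example, and it assumes the first menu entry is the default.

```python
from typing import Optional, Sequence

# Replies that select the first (default) menu entry, per the text above.
DEFAULT_REPLIES = {"yes", "1", "default"}


def resolve_handoff(reply: str, options: Sequence[str]) -> Optional[str]:
    """Map a user reply to one of the numbered menu options.

    Returns the chosen command, or None when the reply is not recognized.
    """
    if not options:
        return None
    text = reply.strip().lower()
    if text in DEFAULT_REPLIES:
        return options[0]
    if text.isdecimal():
        index = int(text) - 1  # menu entries are numbered from 1
        if 0 <= index < len(options):
            return options[index]
        return None
    # A literal command that matches a menu entry is also accepted.
    for option in options:
        if option.lower() == text:
            return option
    return None


# Hypothetical menu; real menus are produced by each skill.
menu = ["forge evaluate --step 1 --mode review", "forge plan --step 1", "forge status"]
print(resolve_handoff("yes", menu))
print(resolve_handoff("2", menu))
print(resolve_handoff("7", menu))
```

Returning None for an unrecognized reply leaves it to the caller to re-prompt rather than guess at the user's intent.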

State-lifecycle and authoring fixes (2026-05-07):

The plan skill now materializes a section-marker skeleton at step 1 (sourced from templates/writing-plans.md) and refuses to mark step 6 complete while any of those section markers remain.
forge resume --cleanup lists state files eligible for cleanup (dry-run by default). Add --force to delete; --all-stale --force for migration mode (clears every state file regardless of age).
Re-running step 1 of any skill now aborts when an in-progress same-skill session exists. To resume, use the --state <path> flag or run resume.py.
Over-cap --step invocations (e.g., --step 9 on an 8-step skill) now print a friendly "skill complete" message and exit 0 instead of erroring.
A failure_count field tracks consecutive same-step retries; after two failures, resume.py emits an "inspect logs" hint instead of producing a third retry command.
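
Two of the rules above, the over-cap --step behavior and the failure_count retry limit, can be sketched as below. Only failure_count and the threshold of two come from this README. SkillState, check_step, next_action, the message wording, and the shape of the retry command are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SkillState:
    """Hypothetical in-memory view of a skill's state file."""
    skill: str
    total_steps: int
    failure_count: int = 0  # consecutive failures of the same step


def check_step(state: SkillState, step: int) -> tuple[int, str]:
    """Return (exit_code, message) for a requested --step value."""
    if step > state.total_steps:
        # Over-cap requests are reported as completion, not as an error.
        return 0, f"{state.skill}: skill complete"
    return 0, f"{state.skill}: running step {step} of {state.total_steps}"


def next_action(state: SkillState, step: int, max_failures: int = 2) -> str:
    """Suggest a retry command, or a hint once the retry budget is spent."""
    if state.failure_count >= max_failures:
        return "inspect logs"
    return f"forge {state.skill} --step {step}"  # assumed command shape


state = SkillState(skill="plan", total_steps=8)
print(check_step(state, 9))
state.failure_count = 2
print(next_action(state, 3))
```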

Install (pipx) — run in any repo

This repo ships a global forge launcher so you can install once and run workflows from any target repository (without copying scripts/ into each repo).

pipx install forge-next

Then, from any target repo:

forge evaluate --step 1 --mode review
forge plan --step 1
forge status

Use --repo <path> to target a different repository root.

Quick Start (dev / contributors)

# Clone the repo
git clone https://github.com/your-org/forge-codex.git /path/to/forge-codex

# Enter the project
cd /path/to/forge-codex

Then use the repo as the home for Codex-oriented workflow assets, skills, prompts, and orchestrators.

Codex Config

If you want local Codex sessions to treat Forge skill invocation as implicit permission to use Forge agents, add a developer_instructions block to ~/.codex/config.toml:

developer_instructions = """
Invoking any `forge:*` skill implicitly authorizes the agent dispatch required by that workflow. Do not require the user to separately ask for delegation, sub-agents, or parallel agent work after invoking a Forge skill.

At the start of a fresh interactive session, begin the first user-visible response with exactly: Ready Player 1?
"""

You can verify the injected developer prompt with:

codex debug prompt-input

If a higher-priority launcher or hosted integration injects its own developer instructions, those may still override or compete with your local config.

Goals

Turn a structured multi-skill workflow model into a Codex-first environment
Support multi-step, resumable engineering workflows instead of one-shot prompts
Separate skill orchestration from reusable methodology templates
Preserve handoff context between phases and between sessions
Make review, verification, and diagnostics first-class parts of the workflow

Planned Skills

Skill	Purpose	Typical Invocation
develop	Investigate a problem space and shape solution options	`develop <problem or feature>`
plan	Convert an approved direction into an implementation plan	`plan`
evaluate	Review a plan before or after implementation	`evaluate <plan>`
implement	Execute a plan in ordered or parallel waves	`implement`
code-review	Run structured review modes against code changes	`code-review <target>`
test	Execute tests, analyze failures, and identify coverage gaps	`test`
diagnose	Perform root-cause analysis on bugs and regressions	`diagnose <issue>`
status	Show workflow position, open findings, and next action	`status`
resume	Continue the active workflow from persisted state	`resume`

Forge Skill Invocation Contract

Invoking a Forge workflow skill is intended to be enough to authorize the agent team that skill needs.

forge:develop, forge:plan, forge:implement, forge:code-review, forge:test, and forge:diagnose should auto-dispatch the relevant Forge agents when their workflow calls for it.
forge:evaluate should auto-dispatch the review team when team/review mode is active.
Users should not need to separately ask for "sub-agents", "delegation", or "parallel agent work" after invoking a Forge skill.
If the surrounding Codex session policy blocks agent spawning, that should be surfaced as an environment limitation rather than treated as normal Forge behavior.
Every spawned agent must be closed (close_agent) once it reports back or is no longer useful. Forge skills never leave agents open across wave / step / phase boundaries — Codex caps concurrent agents and leaked sessions eventually block further dispatch. See templates/codex-runtime.md for the lifecycle pattern.
At the end of each skill's workflow, a numbered handoff menu replaces the previous single next-skill prompt. Users can reply "yes", "1", "default", or a literal command; the menu makes workflow alternatives explicit.
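
The close-every-agent rule maps naturally onto a context manager. The sketch below uses a stand-in runtime: close_agent is the only name taken from this README, while FakeRuntime, spawn_agent, and dispatched are hypothetical and exist only to make the example runnable. See templates/codex-runtime.md for the pattern Forge actually uses.

```python
from contextlib import contextmanager


class FakeRuntime:
    """Stand-in for the agent runtime, used only for illustration."""

    def __init__(self):
        self.open_agents = set()
        self._next_id = 0

    def spawn_agent(self, role):  # hypothetical method name
        self._next_id += 1
        agent_id = f"{role}-{self._next_id}"
        self.open_agents.add(agent_id)
        return agent_id

    def close_agent(self, agent_id):
        self.open_agents.discard(agent_id)


@contextmanager
def dispatched(runtime, roles):
    """Spawn one agent per role and close all of them on exit, even on error."""
    agents = []
    try:
        for role in roles:
            agents.append(runtime.spawn_agent(role))
        yield agents
    finally:
        for agent_id in agents:
            runtime.close_agent(agent_id)


runtime = FakeRuntime()
with dispatched(runtime, ["critic", "qa-reviewer"]) as wave:
    print(len(runtime.open_agents))  # agents are open inside the wave
print(len(runtime.open_agents))  # none remain once the wave ends
```

Tying agent lifetime to a with block means a failure in the middle of a wave still releases every agent, which is the property the contract asks for.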

Workflow Model

develop -> plan -> evaluate (pre) -> implement -> code-review -> test -> diagnose (if needed)

At any point:
- evaluate can run as a standalone critique workflow
- diagnose can run as an ad-hoc incident workflow
- status and resume can inspect or continue the current state

The intended model is composable rather than monolithic:

Each skill can run on its own
Skills can hand off context to the next skill in the chain
State files make interrupted workflows resumable
Review loops enforce quality gates before moving downstream
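
The chain above can be expressed as plain data plus a lookup. This is an illustrative sketch only: CHAIN and next_skill are hypothetical names, and treating failed tests as the trigger for diagnose is an assumption about what "if needed" means.

```python
from typing import Optional

# Default chain from the diagram above; diagnose is entered only when needed.
CHAIN = ["develop", "plan", "evaluate", "implement", "code-review", "test"]


def next_skill(current: str, tests_failed: bool = False) -> Optional[str]:
    """Return the default successor of a skill, or None if there is none."""
    if current == "test":
        # Assumption: diagnose follows test only when something failed.
        return "diagnose" if tests_failed else None
    if current in CHAIN:
        return CHAIN[CHAIN.index(current) + 1]
    # status, resume, and a standalone diagnose have no fixed successor.
    return None


print(next_skill("develop"))
print(next_skill("test", tests_failed=True))
```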

Agents

The Codex version is expected to use a small set of specialized roles rather than a single undifferentiated agent.

Agent	Role
architect	Investigation lead, solution design, architecture review
planner	Implementation planning, sequencing, dependency mapping
backend-dev	Backend implementation with tests
frontend-dev	Frontend implementation with tests
critic	Challenges assumptions, stresses weak logic, finds hidden risks
qa-reviewer	Validates behavior, testing quality, and verification depth
security-reviewer	Reviews security-sensitive changes and operational risk
doc-writer	Produces user-facing and developer-facing documentation and tracks documentation debt

Methodology Coverage

forge-codex is intended to bundle practical engineering methods instead of vague “best practices”.

Investigation and diagnostics

5 Whys
Kepner-Tregoe IS/IS-NOT
Fishbone / Ishikawa
FMEA
MECE decomposition
Bayesian evidence updates
hypothesis-driven debugging
change analysis
counterfactual reasoning
barrier analysis

Solution design

divergent and convergent option generation
trade-off scoring
pre-mortem analysis
reversibility checks
constraint analysis

Planning

phased execution
dependency mapping
parallelization opportunities
rollback planning
explicit verification steps
documentation-in-the-loop

Review and testing

structured finding severity
behavior verification
edge-case analysis
regression coverage review
failure triage
operational readiness checks

Architecture

The repo is expected to follow a script-driven orchestration model.

Skill orchestrators drive state progression for each workflow
Prompt templates provide repeatable phase instructions
Shared templates hold reusable review and planning patterns
State files persist current step, completed step, findings, and handoffs
Memory files carry context between adjacent skills
Reports provide durable outputs from evaluate, review, and diagnose flows

State and Continuity

Cross-session continuity is a core design goal.

Each active skill should persist its own state file
Resume logic should distinguish between a true conflict and an unrelated active session
Standalone skills should not pause just because another non-conflicting workflow exists
Handoff files should summarize completed work and recommend the next step
Status tooling should surface active sessions, findings, and next actions without requiring manual inspection
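
A per-skill state file and the conflict rule above might look like the following. The JSON layout, the file naming ("<skill>.json"), the "status" field, and the helper names save_state and find_conflict are all assumptions for illustration. The actual on-disk format is not specified in this README.

```python
import json
import tempfile
from pathlib import Path


def save_state(state_dir: Path, skill: str, state: dict) -> Path:
    """Write one JSON state file per skill (file naming is an assumption)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{skill}.json"
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def find_conflict(state_dir: Path, skill: str):
    """Return the in-progress state for the same skill, or None.

    Sessions that belong to other skills are not treated as conflicts.
    """
    path = state_dir / f"{skill}.json"
    if not path.exists():
        return None
    state = json.loads(path.read_text(encoding="utf-8"))
    return state if state.get("status") == "in_progress" else None


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp) / "state"
    save_state(state_dir, "plan", {
        "status": "in_progress",
        "current_step": 3,
        "completed_steps": [1, 2],
        "findings": [],
        "handoff": None,
    })
    print(find_conflict(state_dir, "plan") is not None)  # same skill is active
    print(find_conflict(state_dir, "diagnose") is None)  # unrelated skill may start
```

Keeping one file per skill makes the "true conflict versus unrelated session" distinction a simple lookup by skill name.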

Design Principles

Codex-first: optimize for Codex workflows, not a direct port of another assistant’s toolkit model
Actionable outputs: produce plans, findings, commands, and reports that can be used immediately
Resumable by default: interrupted work should be recoverable
Verification over narration: claims should be tied to code, tests, or runtime evidence
Composable workflows: users should be able to run a single skill or the full chain
Minimal hidden state: the workflow should be inspectable from files in the repo

Current Project Structure

forge-codex/
├── README.md
├── agents/
├── prompts/
│   ├── develop/
│   ├── plan/
│   ├── evaluate/
│   ├── implement/
│   ├── code-review/
│   ├── test/
│   └── diagnose/
├── templates/
│   ├── review/
│   ├── planning/
│   ├── reporting/
│   └── handoff/
├── scripts/
│   ├── shared/
│   ├── develop/
│   ├── plan/
│   ├── evaluate/
│   ├── implement/
│   ├── code-review/
│   ├── test/
│   └── diagnose/
└── skills/
    ├── develop/
    ├── plan/
    ├── evaluate/
    ├── implement/
    ├── code-review/
    ├── test/
    ├── diagnose/
    ├── status/
    └── resume/

Initial Roadmap

Phase 1: Skeleton

define repository layout
add shared orchestration primitives
add status and resume foundations
document the state model

Phase 2: Core Skills

implement the evaluate skill
implement the diagnose skill
implement the develop skill
add report generation and state cleanup rules

Phase 3: Delivery Flow

implement the plan skill
implement the implement skill
implement the code-review skill
implement the test skill

Phase 4: Hardening

add regression tests for state handling
verify conflict detection logic
tighten workflow transitions
document extension points for future agents and skills

Current Status

This repository now contains the copied Codex workflow assets, reorganized into a Codex-first layout. Assistant-specific packaging has been removed, and the top-level structure has been normalized around agents/, skills/, scripts/, prompts/, and templates/.

License

MIT

Project details

Release history

Version	Release date
0.10.7	May 19, 2026
0.9.9	May 19, 2026
0.9.8	May 19, 2026
0.9.7	May 18, 2026
0.9.4	May 17, 2026
0.9.3	May 17, 2026
0.9.2	May 17, 2026
0.9.1	May 16, 2026
0.9.0	May 15, 2026
0.7.2	May 15, 2026
0.7.1	May 14, 2026
0.7.0	May 11, 2026
0.6.4	May 11, 2026
0.6.3	May 11, 2026
0.6.0	May 10, 2026
0.4.2	May 9, 2026
0.4.1	May 9, 2026
0.4.0	May 9, 2026
0.3.3	May 9, 2026
0.3.1	May 9, 2026
0.3.0	May 9, 2026
0.2.0	May 9, 2026
0.1.4	May 9, 2026
0.1.3	May 8, 2026
0.1.1 (this version)	May 8, 2026

Download files

Source Distribution

forge_next-0.1.1.tar.gz (217.5 kB view details)

Uploaded May 8, 2026 Source

Built Distribution

forge_next-0.1.1-py3-none-any.whl (265.6 kB view details)

Uploaded May 8, 2026 Python 3

File details

Details for the file forge_next-0.1.1.tar.gz.

File metadata

Download URL: forge_next-0.1.1.tar.gz
Upload date: May 8, 2026
Size: 217.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for forge_next-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`7dc27fa8d31ba56d24b2b14d190d3b7fc51dfe7511253bab411ba10f0b88a66e`
MD5	`9f8965255aee8f43c2def75c75502933`
BLAKE2b-256	`0bb2bdf75f75dc0f440d7514337cb5dfbd67a447f0c02111632d020619557b48`

File details

Details for the file forge_next-0.1.1-py3-none-any.whl.

File metadata

Download URL: forge_next-0.1.1-py3-none-any.whl
Upload date: May 8, 2026
Size: 265.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for forge_next-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`59d7b295306c5026fd83da09333bafd8757af1da3b955fb39a82c872458fe08f`
MD5	`e0f2c1e371582497ff026afc63ffa269`
BLAKE2b-256	`d81da74225c6e1c380aa53c803fedccf98b1471c3843c6a51b6820c8946acddc`
