Skip to main content

Local agent manager/worker conveyor control plane for Codex sessions.

Project description

Codex Terminal Manager

A Mac-first prototype for letting one Codex session supervise and gently steer another Codex session running in a terminal.

The goal is not full autonomy. The goal is lightweight supervision for Codex tasks that mostly need progress checks, occasional nudges, test reruns, or clean stop/resume handling.

Motivating Principles

  • Project workflows often need nudging. Some Codex tasks do not need a second implementer; they need a manager that keeps watching, asks for status at the right time, unblocks predictable terminal prompts, and gently steers the worker back toward the user's goal.
  • Useful acceptance criteria often emerge during the work. Even when the user starts with a plan, implementation reveals new edge cases, missing tests, unclear polish requirements, and follow-up decisions. The manager should help discover, record, defer, satisfy, and audit these emergent acceptance criteria instead of assuming the whole checklist can be known up front.
  • Supervision should be durable and replayable. Manager observations, nudges, interrupts, handoffs, compaction requests, and final decisions should leave enough structured history to understand why the worker was pushed and what evidence supported finishing or continuing the task.

Burden Of Proof

Before declaring work complete, try to disprove the change. Identify the strongest realistic failure mode, verify it with a command, test, trace, screenshot, audit record, diff, or direct inspection, and include that evidence in the final handoff. Treat done, tests passed, worker claims, passing happy-path tests, generated summaries, and optimistic UI as claims, not proof. Treat unverified assumptions as blockers or explicit follow-ups.

See docs/agent-evidence-playbook.md for the repo-specific evidence ladder, receipt options, and final handoff shape agents should use when closing out work.

Architecture

Supervision is built on three primitives: sessions, tasks, and bindings.

  • Worker session. A Codex session running inside a named tmux session. Workers own a rollout JSONL on disk (~/.codex/sessions/.../rollout-*.jsonl) which Agent Conveyor ingests for state inference.
  • Manager session. A Codex session running anywhere — Ghostty, iTerm2, Terminal.app, or a web terminal. The manager does not need to run inside tmux. Its job is to call conveyor commands, read their JSON output, and decide whether to nudge, interrupt, finish, or wait.
  • Task. A unit of supervised work. A task has a goal and optional summary/manager instructions.
  • Binding. A row that ties one worker session and one manager session to one task. Bindings are explicit and durable.

The manager Codex drives the supervision loop by calling conveyor cycle <task> repeatedly. Each cycle ingests new events from the worker's rollout, captures the worker's tmux pane as a shadow signal, and returns structured JSON. The manager reads that JSON and decides what to do.

tmux owns the worker PTY. Ghostty, iTerm2, Terminal.app, or ttyd can be viewers, but they should not be the source of truth for orchestration.

manager terminal
  Codex manager session
    |
    | runs conveyor commands (cycle, session-nudge, ...)
    v
tmux session: codex-worker-a
  pane 1: Codex worker session
  pane 2: optional dev server, tests, or logs

.codex-workers/
  workerctl.db         <- authoritative SQLite control plane
  worker-a/            <- ignored runtime artifacts
    status.json
    transcript.txt
    events.jsonl

Non-Goals

  • Cross-platform support.
  • Browser-first orchestration.
  • Full terminal emulator automation.
  • Autonomous merging or destructive git actions.
  • Managing many workers at once.

Install

For users, install Agent Conveyor with pipx:

pipx install git+https://github.com/neonwatty/codex-terminal-manager.git
conveyor install-skills
conveyor doctor

The package name is agent-conveyor. The installed CLI exposes both conveyor and the compatibility command workerctl. conveyor install-skills installs the manage-codex-workers and codex-review skills into $CODEX_HOME/skills or ~/.codex/skills. The codex-review install includes the guarded review helper used by the QA and PR closeout flows.

For contributors working from this checkout, use the local installer instead:

scripts/install-local --write
export PATH="$PWD/bin:$PATH"

conveyor doctor reports local dependency health (tmux, codex, etc.). conveyor db-doctor initializes and checks the SQLite control-plane database. Before publishing agent-conveyor to TestPyPI or PyPI, use docs/package-release.md.

After install, the intended Codex app entry point is natural language. Open a new Codex app session in the repo and say:

Use the manage-codex-workers skill.

Set up a Codex app Ralph loop for issue CTL.
Require adversarial proof before another worker iteration.

The installed skill should choose names, create the no-tmux binding with create-disposable-binding, point the worker at worker-inbox, and use loop-status plus telemetry receipts before reporting that the loop is ready.

Dispatch is core infrastructure for supervised worker/manager pairs. The pair workflow starts a detached Dispatch watch process by default so worker completion is routed to the bound manager mechanically. For manually bound pairs, run Dispatch in a separate shell:

conveyor dispatch --watch --dispatcher-id dispatch-local

Use conveyor qa-plan dispatch-completion for a bounded verification flow, or conveyor qa-plan ralph-loop for the repeated PR/CI/merge/context-clear dogfood loop. Use conveyor qa-plan adversarial-triggers to verify natural-language manager prompts activate Ralph-loop adversarial gates. Use conveyor qa-plan goalbuddy-conveyor when a broad request should become sequential GoalBuddy child boards with PR/CI/merge receipts. For manual QA, launch the dashboard with Dispatch enforcement so the page can show live proof:

conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard

Quickstart

The fastest way to start a worker and register it is a single command:

# One command: spawn codex in tmux, wait for it to come up, register as worker
conveyor start-worker --name foo --cwd "$PWD" --task "Refactor auth"

# Register a manager. Managers do not need to run inside tmux.
MGR_PID=$$   # if your current shell is the manager; otherwise find its pid
conveyor register-manager --name foo-mgr --pid $MGR_PID --cwd "$PWD"

# Create a task and bind the pair to it.
conveyor tasks --create my-task --goal "Refactor auth"
conveyor bind --task my-task --worker foo --manager foo-mgr

# Start Dispatch in another shell so worker completion wakes the manager.
conveyor dispatch --watch --dispatcher-id dispatch-local

# One observation cycle. Returns JSON.
conveyor cycle my-task

# Optionally nudge the worker through its tmux pane.
conveyor session-nudge foo "What's your current state?"

# When the task is complete:
conveyor finish-task my-task --reason "auth refactor merged" --capture-transcript-before-stop --stop-manager --stop-worker
conveyor unbind --task my-task
conveyor deregister foo
conveyor deregister foo-mgr

For manual registration of a pre-existing Codex session:

# Start a Codex worker inside a fresh tmux session.
tmux new-session -d -s codex-foo
tmux send-keys -t codex-foo "codex" Enter
# (Wait a moment for codex to come up.)
WORKER_PID=$(pgrep -f "codex.*--sandbox" | head -1)

# Register the worker. lsof auto-discovers the rollout JSONL from the pid.
conveyor register-worker --name foo --pid $WORKER_PID \
  --cwd "$PWD" --tmux-session codex-foo

If lsof discovery fails (e.g. the codex session was started ephemerally), pass the rollout path explicitly with --codex-session ~/.codex/sessions/.../rollout-...-<uuid>.jsonl.

To register a manager session that's already running:

# If the codex is already running and you know its pid:
conveyor register-manager --name my-mgr --pid 28975

# register-manager runs `lsof -p <pid>` to find the rollout JSONL.
# If the codex hasn't written its rollout yet (no input typed),
# you'll get a hint asking you to type something in the codex prompt and retry.

# Or pass --codex-session explicitly to bypass the lsof probe:
conveyor register-manager --name my-mgr --pid 28975 \
    --codex-session /path/to/rollout.jsonl

Note: lsof is the canonical pid→rollout lookup. find -newermt is unreliable because filesystem mtime resolution and parsing of "X minutes ago" varies — lsof reads the open fd directly.

For low-risk verification without a real task, conveyor start-test <name> creates a worker, asks it to update only its ignored status.json, and leaves the tmux session attached:

conveyor start-test live-test --cwd "$PWD" --accept-trust --open
tmux attach -t codex-live-test

Commands

Sessions and binding

  • start-worker --name N [--cwd D] [--task "..."] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N] — Spawn Codex in a fresh tmux session and register it as a worker in one call. The fastest way to start a supervised worker. Internally: tmux new-session
    • codex + poll for rollout + register-worker.
  • start-manager --name N [--cwd D] [--task T] [--task-goal G] [--worker W] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N] — Spawn Codex in a fresh tmux session and register it as a manager in one call. Mirrors start-worker but uses a manager bootstrap prompt instead of a worker task prompt. When --task, --task-goal, and --worker are supplied, the bootstrap is ready for late attach: it names the task, goal, worker session, and concrete manager-config, cycle, manager-ack, and worker-ack commands. If manager config has already been recorded for the task, the bootstrap tells the manager to start with cycle instead of asking setup questions again. Without those flags, the bootstrap asks the manager to collect the missing supervision details before cycling.
  • pair --task T --worker-name W --manager-name M [--cwd D] [--task-prompt PROMPT] [--task-goal GOAL] [--task-summary S] [--manager-objective O] [--manager-guideline G ...] [--manager-acceptance A ...] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N] [--dispatcher-id ID] [--no-dispatch] — One-shot: spawn worker + manager and bind to a task in a single command. Combines start-worker + start-manager + bind. The task is looked up or created (if --task-goal is provided); if the task does not exist and no goal is given, an error is raised with a hint. The worker receives the optional --task-prompt as its initial Codex prompt; the manager receives a manager bootstrap prompt with the task, goal, worker name, manager configuration status, and cycle commands. pair records a default guided manager config before launching the manager, so retries against an existing task do not fall back into setup-question mode. If manager config flags are supplied (--manager-mode, --manager-objective, repeated --manager-guideline, --manager-acceptance, --manager-reference, or manager permission flags), those values are merged into the seeded config and the bootstrap tells the manager to start supervising with cycle instead of asking setup questions first. Manager acceptance entries are also seeded into the living acceptance criteria ledger when they do not already exist for the task. By default pair starts a detached dispatch --watch process after successful worker/manager setup, bind, and run creation. Use --dispatcher-id to set its identity or --no-dispatch for isolated/manual workflows. A live dispatch heartbeat is reused only when it has the same dispatcher id; otherwise pair starts the requested dispatcher so audit receipts keep the configured identity. If the manager or bind fails after the worker is spawned, the worker remains registered and can be cleaned up with conveyor deregister. Use --accept-trust only for directories you intentionally trust; it retries Enter during startup discovery so fresh workspaces do not stall before registration.
  • register-worker --name N [--pid P | --codex-session PATH] [--cwd D] [--tmux-session S] — Register an already-running Codex session as a worker. Rollout JSONL is auto-discovered from the pid via lsof unless --codex-session is given.
  • register-manager --name N ... — Same arguments; tmux is not required.
  • deregister <name> — Mark a session gone. Refuses if the session is bound to an active task.
  • sessions [--role worker|manager] [--state active|gone|all] [--include-legacy] [--name N ...] [--redact-identity-token] — List registered sessions. By default, sessions shows active registered sessions and hides Phase 1 backfill rows (legacy pre-redesign workers/managers, identified by pid IS NULL) plus rows marked state='gone'. Pass --state all to show every row, or --state gone to inspect only gone rows:
    conveyor sessions                    # active registered sessions only
    conveyor sessions --state active     # explicit equivalent of the default
    conveyor sessions --state gone       # gone sessions only
    conveyor sessions --state all        # active, gone, and legacy rows
    conveyor sessions --name <session> --redact-identity-token
    
    For shareable QA evidence, prefer repeating --name for just the sessions in scope and include --redact-identity-token; unfiltered output can include unrelated active sessions and their registration tokens.
  • tasks [--create NAME --goal G --summary S] — List or create tasks.
  • create-disposable-binding TASK [--worker NAME] [--manager NAME] [--template TEMPLATE | --required-before-continue TYPE] [--adversarial] — Create a no-tmux manager/worker binding for real Ralph-loop slices. The helper creates the task when missing, marks it managed, writes valid Codex rollout JSONL files, registers worker and manager sessions with tmux_session=null, binds them, optionally creates a template-backed or custom Ralph-loop policy run, and prints replay commands for Dispatch, worker inbox polling, and loop-status.
  • discover [QUERY] [--all] [--limit N] / search [QUERY] — Search tasks, registered sessions, active bindings, and recent telemetry in one JSON result. Use this for conversational setup when a manager or Codex session needs to present likely worker/manager/task connection options instead of asking the user for generated names:
    conveyor discover dashboard
    conveyor search "auth refactor"
    
    The output includes tasks, sessions, bindings, telemetry, and suggestions; suggestions may include a ready-to-run conveyor bind command or next-step prompts to register the missing worker or manager.
  • handoff <task> --summary S [--next-step N ...] [--payload-json JSON] — Persist a compact worker handoff for the task. Use this when a worker is becoming managed or before a long context transition so the manager can read progress and likely next steps from SQLite.
  • manager-config <task> [--mode light|guided|strict] [--objective O] [--guideline G ...] [--acceptance A ...] [--reference R ...] [--permit CATEGORY.ACTION ...] [--tool TOOL ...] [--epilogue STEP ...] [--nudge-on-completion MODE] [--require-acks] [--allow-pr] [--allow-merge-green] [--allow-worker-compact-clear] — Persist the manager's supervision contract: what to check against, how structured the loop should be, acceptance criteria, source references, and categorized permissions. With no recorded config it creates the default guided config; with no mutating flags after that it prints the current config. Use --questions from a manager Codex session to get a stable JSON question schema to ask the user in chat, then save the answers with noninteractive flags. Use --interactive only as a terminal fallback when a human is running conveyor directly. --permit grants taxonomy permissions such as repo.open_pr, verification.run_pytest, context.spawn_reviewer, communication.notify_operator, or worker_session.compact. Use --tool to record expected verification/context tools, --epilogue for required built-in finish steps (run-tools, draft-pr, subagent-review, record-handoff), --nudge-on-completion for continuation review behavior (off, ask-operator, auto-review, auto-proceed), and --require-acks when cycle/finish-task should fail closed until both sides acknowledge. Legacy flat flags and --permissions-json keys (create_pr, merge_green_pr, worker_compact_clear, plus older allow_* aliases) are still accepted and normalized into the categorized taxonomy.
  • criteria <task> — Track emergent acceptance criteria discovered during supervision. Managers should add useful proposed criteria, accept must-have items, defer follow-ups, and mark criteria satisfied only when worker receipts and verification cover them.
    conveyor criteria my-task --list
    conveyor criteria my-task --list --status accepted
    conveyor criteria my-task --add --criterion "..." --source worker_proposed --status proposed
    conveyor criteria my-task --add --criterion "..." --source manager_inferred --status accepted
    conveyor criteria my-task --accept 12 --rationale "Must-have for this task"
    conveyor criteria my-task --satisfy <id> --evidence-json '{"command":"...","status":"pass"}'
    conveyor criteria my-task --defer 13 --rationale "Follow-up after this task"
    conveyor criteria my-task --reject 14 --rationale "Duplicate or out of scope"
    
    Replace placeholder ... values with the actual criterion and verification command. Use worker_proposed for criteria proposed by the worker. Use manager_inferred for criteria inferred from manager config, cycle evidence, or manager inspection; manager_config is not a valid criteria source. To add a criterion and satisfy that same row after verification:
    criterion_id=$(conveyor criteria my-task --add --criterion "Targeted prompt tests pass" --source worker_proposed --status proposed | python3 -c 'import json,sys; print(json.load(sys.stdin)["affected_criterion"]["id"])')
    conveyor criteria my-task --satisfy "$criterion_id" --evidence-json '{"command":"python3 -m unittest tests.test_workerctl.ManagerBootstrapPromptTests -v","status":"pass"}'
    
    For mutation responses, treat affected_criterion as the authoritative receipt for the row changed by that command. When a manager applies multiple criteria changes, run criteria <task> --list before final audit or other decisions; the list command is the canonical task-level criteria state.
  • criteria-plan <task> --from-text ...|--from-worker-response PATH|--from-stdin [--json] — Draft reviewed criteria --add commands from a worker response that separates must-have current-task criteria from deferred follow-ups. This helper is read-only: it resolves the task and prints suggestions, but does not mutate acceptance criteria, events, or commands.
    conveyor criteria-plan my-task --from-worker-response response.md --json
    
  • manager-permission <task> <CATEGORY.ACTION|CATEGORY> [--list] [--require] [--require-handoff] — Check and audit whether the saved manager config allows a categorized action, or list granted actions in a category. Use --require when a manager command should fail closed. Use --require-handoff before worker compact/clear style instructions so visible context is persisted first.
  • worker-ack <task> --from-stdin|--json [--correlation-id ID] / manager-ack <task> --from-stdin|--json [--correlation-id ID] — Persist or read the latest structured acknowledgement from the worker or manager. Acks are revisioned and exposed to cycle, replay, and audit so startup contract drift can be distinguished from later drift.
  • continuation <task> --submit worker|manager --from-stdin [--correlation-id ID] — Record independent worker/manager "what's next" proposals for a completion turn. The worker proposal must be written first, and manager-side reads are redacted until the manager submits its own proposal.
  • continuation <task> --review --from-stdin [--correlation-id ID] — Record a structured reviewer verdict over the paired continuation proposals. This requires context.spawn_reviewer permission and reviewer separation metadata (subagent_run.reviewer_session_id distinct from the manager and manager_rollout_access=false). Divergent reviews are routed for operator attention unless --nudge-on-completion auto-proceed is configured.
  • continuation-reviewer <task> --correlation-id ID --reviewer-session-id ID --manager-session-id ID --reviewer-command ... — Run a reviewer command with the allowed read-only context on stdin, capture reviewer metadata, and persist the structured review. The context includes paired proposals, acceptance criteria, manager config summary, diff metadata, and recent PR metadata; it does not include manager rollout context. Reviewer commands run from an isolated temporary cwd with a stripped environment and, on macOS, through sandbox-exec. The sandbox keeps the targeted denial of bound worker/manager rollout files plus the active control database and sidecars, and also denies direct reads of the active .codex-workers state root so legacy session files, transcripts, capture metadata, task state, and exports are not available through filesystem reads. The allowed reviewer context still arrives on stdin, and replay/audit/export commands outside this reviewer subprocess are unchanged. Sandbox setup failures, reviewer command failures, timeouts, or invalid JSON are recorded as verdict=stop, not silent approvals. Use --dry-run to inspect the exact context without running the command.
  • continuation <task> --list [--as-role all|worker|manager|reviewer] [--include-payload] — List continuation proposals and reviews with role-aware payload redaction.
  • record-decision <task> <wait|nudge|interrupt|escalate|stop|inspect> --reason R [--cycle-id N] [--payload-json JSON] — Persist a manager decision and print its id. Use this before strict mutating commands that require --decision-id.
  • compact-worker <task> --reason R [--clear] [--prompt-only] — Convenience wrapper that records a nudge manager decision, then sends Codex /compact to the worker through the same strict audited path as request-worker-compact. Use --clear to send /clear.
  • request-worker-compact <task> --decision-id N --strict-decisions — Send Codex /compact to the worker through the audited path. Use --clear to send /clear, or --prompt-only to send an explanatory prompt instead. Fails closed unless worker_compact_clear is enabled in manager config and a worker handoff exists. Records a durable command and audit events before/after sending the worker instruction. --dry-run still records the command in commands, replay, and mutation-audit with dry_run: true and sent: false.
  • bind --task T --worker W --manager M — Create the task binding.
  • unbind --task T — End the active binding for a task.
  • finish-task <task> [--reason R] [--require-criteria-audit] [--require-acks] [--require-epilogue] [--require-adversarial-proof] [--stop-manager] [--stop-worker] [--capture-transcript-before-stop] — Mark a task done. Leaves the manager terminal open by default for review. With --require-criteria-audit, fails before finishing if any acceptance criteria for the task are still accepted; proposed, satisfied, deferred, and rejected criteria do not block. With --require-acks, fails if worker or manager acknowledgement is missing. With --require-epilogue, fails if any configured epilogue step is not succeeded. With --require-adversarial-proof, fails before finishing unless the task has at least one satisfied criterion with evidence_type=adversarial_check and non-empty failure_mode, check, and result fields; use this when tests passed is not enough by itself. With --capture-transcript-before-stop, captures transcript segments for any worker/manager sessions being stopped before killing tmux sessions; capture failure fails before stop side effects.
  • stop-task <task> [--reason R] [--stop-worker] — Force-stop a task's manager (and optionally the worker), recording the reason in the audit payload.
  • stop <session> — Stop a tmux-backed worker or manager session by name. This works for both legacy worker records and session-table workers/managers. For a completed task with an active binding, prefer an idempotent cleanup pass with finish-task <task> --stop-manager --stop-worker so the task audit records the cleanup against the binding.

Observation

  • dashboard [--task T] [--ensure-dispatch] [--dispatcher-id ID] [--host 127.0.0.1] [--port 8797] — Launch the local live supervision cockpit. The dashboard binds to loopback by default, uses the TypeScript backend to shell out to conveyor JSON commands, and attaches interactive terminals to tmux-backed worker/manager sessions through a WebSocket PTY bridge. It includes browser bootstrap controls for creating a task, starting a worker/manager pair with conveyor pair, auto-attaching the terminals, attach/bind controls, and audited action receipts for cycle, nudge, interrupt, finish, and export. With --ensure-dispatch, launch also ensures a Dispatch watch process using the supplied --dispatcher-id when provided, reusing only a fresh heartbeat from that same dispatcher id. Use --dry-run --json to inspect the launch command.

  • cycle <task> [--busy-wait-seconds N] — One observation cycle. Idempotent. Runs ingest, computes worker state from the JSON event stream, captures the tmux pane as a shadow signal, writes a manager_cycles row, and returns a JSON dict the manager Codex consumes. The status_payload includes:

    • worker_alive / manager_alive — booleans computed by probing the registered session pids (os.kill(pid, 0)). False when the session's pid is NULL (legacy backfill) or the process has exited — useful for detecting silently-dead workers between cycles.
    • last_event_subtype — the subtype of the most recent codex_events row for the worker, or null if no events exist.
    • task_completedtrue iff last_event_subtype is "task_complete". Disambiguates "worker finished cleanly" from "worker idle but never started."
    • manager_context — the latest manager-config, worker/manager acknowledgements, handoff, and acceptance_criteria records for the task, so each manager loop can reference the saved objective, living acceptance criteria, categorized permissions, expected tools, acked contract, worker progress, and next steps. manager_context.acceptance_criteria groups criteria by status, includes summary counts, and exposes open as accepted criteria that still need proof before finishing. manager_context.criteria_negotiation is advisory: when needed is true, the manager should ask the worker for must-have current-task criteria versus follow-up criteria, then record the result with conveyor criteria. The field does not send nudges or mutate criteria automatically.

    The cycle subcommand accepts --busy-wait-seconds N (default: 90) to tune the pane-signal classifier's stuck-busy threshold. Lower values flag stalls faster but increase false positives on long-running real work:

    conveyor cycle my-task                          # default 90s threshold
    conveyor cycle my-task --busy-wait-seconds 30   # tighter detection
    
  • ingest <session> — Pull new events from a session's rollout JSONL into the codex_events table. Tracks a byte offset, so subsequent runs only pick up new events.

  • tail <session> [--limit N] [--subtype T] [--include-content] — Print the most recent events for a session, newest first. Text payload fields are redacted by default; use --include-content only when stdout is redirected or verbatim text is intentionally needed.

  • divergences <task> [--limit N] — Cycles whose shadow pane signal flagged a notable pattern (trust prompt, rate-limit prompt, approval prompt, etc.). Useful for auditing the shadow signal against the JSON state.

  • dispatch [--once|--watch] [--limit N] [--interval SECONDS] [--dispatcher-id ID] [--type notify_manager|nudge_worker|worker_task_complete] [--watch-iterations N] [--lease-seconds N] [--dry-run] [--json] — Run Dispatch, the mechanical routing/actuation role. worker_task_complete routing reads from codex_events, records deduplicated routed_notifications keyed by the source event, and notifies the bound manager without deciding task success. Explicit notify_manager and nudge_worker command rows are atomically claimed, executed, and recorded through command_attempts with conservative side-effect metadata. --watch repeats polling with heartbeat telemetry; --watch-iterations bounds a watch run for scripts and verification; --lease-seconds tunes command claim recovery; --once performs one pass.

  • enqueue-notify-manager <task> --message "..." [--correlation-id C] [--required-permission P] [--idempotency-key K] [--json] — Queue a notify_manager command row for Dispatch to claim and deliver to the bound manager.

  • enqueue-nudge-worker <task> --message "..." [--correlation-id C] [--required-permission P] [--idempotency-key K] [--json] — Queue a nudge_worker command row for Dispatch to claim and deliver to the bound worker. Use this dispatcher-backed route instead of session-nudge when the worker is registered without tmux; the worker then receives the message through worker-inbox.

  • session-inbox <session> [--consume-next] [--wait] [--timeout N] [--interval N] [--limit N] [--json] — List or consume unconsumed routed notifications addressed to a registered session. Text output includes the pending count, signal type, delivery mode, source/target sessions, delivered timestamp, and correlation id. Use --consume-next --wait --json for Codex app long-polling; consumed items emit dispatch_inbox_consumed telemetry.

  • manager-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N] [--limit N] [--json] — Resolve the task's bound manager session and read its dispatcher inbox.

  • worker-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N] [--limit N] [--json] — Resolve the task's bound worker session and read its dispatcher inbox.

Actuation

  • session-nudge <name> "<text>" [--dry-run] — Send text plus Enter to the session's tmux pane. Requires the session to have been registered with --tmux-session. Managers running outside tmux cannot receive nudges; only workers do.
  • session-interrupt <name> [--key K] [--followup T] [--dry-run] — Send an interrupt key (default C-c). Optional --followup text after the interrupt.

Audit

  • audit <task> — Events history for a task. Lists events-table rows only. audit --json redacts stored terminal/transcript content unless --include-content is passed.
  • replay <task> [--format compact|timeline|transcript|full-transcript] [--role all|worker|manager] [--limit N] [--include-content] — Render a chronological, human-readable reconstruction of the task. Cycle entries include [pane pattern: <pattern_id>] when the shadow signal flagged something. full-transcript is blocked unless --include-content is passed.
  • mutation-audit <task> — Manager decisions and their consequences.
  • events <name> — Worker events log.
  • commands [--task T] [--type T] [--state S] [--attempts] — Durable side-effect commands log. Use --attempts to include per-dispatcher attempt history.
  • epilogue <task> --step run-tools|draft-pr|subagent-review|record-handoff [--json] [--correlation-id ID] — Run one configured epilogue step and record its durable state. Use --list or --status to inspect configured steps and latest run results.
  • telemetry [--run RUN] [--task TASK] [--search QUERY] [--summary] [--json] — Query local structured telemetry events, search them with SQLite FTS, or print aggregate counts for a run/task. telemetry snapshot --task <task> --json prints the task-scoped dashboard overview contract.
  • telemetry task <task> --json — Print a task-scoped telemetry triage view: recent cycle history, last successful cycle, worker/manager liveness, decisions, commands, failed cycles/commands, ingest skipped/error summaries, pane capture failures and notable patterns, open criteria counts, telemetry counts, and retained storage counts. Raw transcript, pane, prompt, criterion, command payload, and command result bodies are not included.
  • telemetry failures --json — Print an operator failure triage view across tasks: recent failed cycles, failed commands, ingest errors/skipped lines, pane capture failures, open accepted criteria, active task/session health, and retained storage counts without raw transcript or prompt content. Use --task, --run, --active-only, or --window 2h to narrow the failure view for recency or active-task triage.
  • telemetry metrics --window 24h --json — Print bounded JSON rollups for local telemetry and related tables: active tasks/sessions, cycle and command success/failure counts, ingest/skipped-line totals, criteria counts, reconcile drift counts, export counts, and retained capture/transcript bytes.
  • export-task <task> [--zip] — Dump task status, audit, prompts, and transcript metadata into an export bundle. Exports include telemetry-events.json, telemetry-summary.json, and telemetry-report.md; see docs/local-telemetry-workflow.md.

Administration

  • doctor — Local dependency and tmux health check.
  • doctor-self — Verify the current Codex session can self-register.
  • db-doctor — SQLite schema health check.
  • reconcile [--apply] [--stale-cycles-seconds N] — Report (and optionally fix) dead-pid sessions, dangling bindings, and stuck tasks. Default stale-cycle threshold is 3600 seconds (1h); override with --stale-cycles-seconds N to catch tasks where the manager has been silent for shorter intervals. JSON output.
  • prune [--keep-latest N] [--dry-run] — Drop old transcript content while preserving metadata.
  • transcript-prune <task> [--keep-latest N] — Same, scoped to a task.
  • transcript-capture <task> [--role R] [--mode M] — Capture deduplicated transcript segments. JSON output redacts raw captured terminal output by default.
  • transcript-show <task> [--role R] [--include-content] — Show stored transcript segment metadata. Segment text is redacted unless --include-content is passed.
  • qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor> — Print a repeatable manual QA checklist.
  • qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB] — Run a deterministic no-tmux QA harness and save a JSON receipt. ralph-loop-guardrails proves max-iteration cutoff, missing-evidence cutoff, fresh retry delivery after structured adversarial_check evidence, and the pr_ci_merge_loop preset evidence gate. generic-loop-template proves the visual_diff_loop template blocks before visual evidence, rejects unstructured adversarial evidence, and delivers only after required visual receipts plus structured adversarial proof exist. generic-loop-template-browser runs the same visual_diff_loop gate proof with a browser-rendered static HTML candidate screenshot, recording browser backend, viewport, candidate HTML, screenshot, visual diff, and structured adversarial evidence in the saved receipt. It uses the repo's Node Playwright dependency and requires Chromium to be installed and launchable; when unavailable, it fails with the browser-backed QA helper message. test-coverage-loop proves the test_coverage_loop template blocks before coverage evidence, rejects malformed adversarial evidence, and delivers only after a structured coverage receipt plus adversarial proof exist. build-clear-loop proves the non-coverage build_then_clear template blocks before build_passed and cleanup receipts, still blocks after build evidence alone, and delivers only after both build and cleanup evidence exist.
  • loop-triggers --list|--classify PROMPT [--json] — List the controlled natural-language loop triggers or classify a manager/operator prompt before creating a loop policy or continuation gate. Approved trigger phrases include the adversarial Ralph-loop, iteration, finish, worker-directed proof, and manager-created adversarial criteria gates; generic caution like "be careful and run tests" intentionally does not arm those gates.
  • loop-templates --list|--show TEMPLATE|--create-run TASK --template TEMPLATE — List generic loop templates or create a template-backed loop policy run. Template-backed runs use the same Dispatch guardrails as Ralph-loop presets: max_iterations prevents over-looping, and required_before_continue evidence blocks a manager continuation before worker delivery until matching satisfied criterion evidence exists. ralph-loop-presets remains as a compatibility alias for the current Ralph-loop QA flows. The built-in visual_diff_loop template requires reference_artifact, candidate_screenshot, visual_diff_report, diff_below_threshold, and adversarial_check evidence before a manager-requested next visual pass can reach the worker. Quality-oriented templates (pr_ci_merge_loop, test_coverage_loop, and visual_diff_loop) also expose an artifact_requirements["adversarial_check"] object requiring failure_mode, check, and result fields.
  • loop-status TASK --run RUN [--json] — Summarize a Ralph-loop run for manager review: policy template, iteration bounds, command states, routed notifications, worker inbox backlog, evidence types, consumed-inbox telemetry, failure counts, and a recommendation.

For real vertical slices, start with the Ralph loop operator guide in docs/qa/ralph-loop-operator-guide.md. It explains the controlled natural-language triggers, Dispatch authority model, worker inbox polling, required evidence, adversarial proof, loop-status, and telemetry review pass bar. Use create-disposable-binding when the manager and worker are Codex app or other no-tmux sessions and you want the same Dispatch rails without manual task/session/bind setup.

  • enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N — Queue a manager-requested next loop pass for Dispatch. The command refuses same/current iteration requests before they become pending queue rows, while Dispatch also blocks any stale same/current iteration command that reaches the queue. Max-iteration and missing-evidence refusals remain Dispatch policy receipts. JSON output includes loop_policy; delivered manager/worker inbox payloads include the same loop_policy plus enriched ralph_loop metadata so tmux and Codex app sessions can see the template, cleanup policy, required evidence, artifact requirements, and recommended tools.
  • loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE — Record a run-qualified evidence receipt for a loop policy. Use loop-evidence visual-diff to compare PNG screenshots, write an optional diff/report artifact, and record visual_diff_report plus diff_below_threshold as satisfied only when the computed score is within threshold.
  • loop-evidence adversarial-check TASK --loop-run RUN --iteration N --failure-mode F --check C --result R — Record first-class adversarial proof for a loop iteration. Use it when a manager or worker tried to disprove the iteration before continuing. The receipt is stored as evidence_type=adversarial_check with structured failure_mode, check, and result metadata and can satisfy Ralph-loop continuation policy. See docs/qa/adversarial-proof.md for the receipt shape and how it maps to manager prompts, Ralph-loop evidence, Dispatch blocking, and audited finish.
  • qa-plan goalbuddy-conveyor — Print the reusable natural-language starter prompt and QA contract for autonomous GoalBuddy conveyor runs. Use it when a manager should split broad work into one parent board plus sequential vertical-slice child boards with PR/CI/merge receipts, satisfied-on-main proof, and adversarial review gates. See docs/qa/goalbuddy-conveyor.md.
  • ralph-loop-presets --list|--show PRESET|--create-run TASK --preset PRESET — List saved Ralph-loop guardrail templates or create a preset-backed ralph_loop policy run.
  • import-compat — Dry-run or import existing .codex-workers/<worker>/ artifacts into SQLite.

Worker setup

  • create <name> --cwd D --task "..." — Full worker creation: spawn a tmux session, start Codex, send the initial worker contract.
  • start <name> --cwd D — Start a plain Codex session inside tmux without registering a worker. Useful when you want to register it manually later.
  • start-test <name> — Low-risk verification worker that only updates its ignored status.json.

QA Plans

Print repeatable live QA checklists from the CLI:

conveyor qa-plan self-management
conveyor qa-plan emergent-criteria
conveyor qa-plan emergent-criteria --json
conveyor qa-plan tmux-errors
conveyor qa-plan dispatch-completion
conveyor qa-plan ralph-loop
conveyor qa-plan adversarial-triggers
conveyor qa-plan goalbuddy-conveyor
conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
conveyor loop-templates --list --json
conveyor loop-templates --show visual_diff_loop --json
conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
conveyor ralph-loop-presets --list --json

General loop templates let operators create policy-backed runs without adding bespoke Dispatch behavior for each loop shape. For example, conveyor loop-templates --create-run qa-task --template visual_diff_loop creates a visual-diff loop run whose required_before_continue evidence must be recorded before the manager's next visual pass can reach the worker. Existing ralph-loop-presets commands remain compatible aliases over the same template-backed guardrails.

For natural-language control, run conveyor loop-triggers --classify "<manager prompt>" --json before automatically creating policy or continuation gates. Only a matched controlled trigger should create a Ralph-loop policy, require adversarial_check before continuation, require finish-task --require-adversarial-proof, or record worker-proposed adversarial proof. The executable receipt check is conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json.

The emergent-criteria scenario covers a real worker/manager pair, criteria negotiation, audited finish gating, replay/export evidence, and --stop-manager --stop-worker cleanup verification. It also includes an optional criteria-plan step for drafting reviewed criteria commands from the worker's separated must-have and follow-up response.

The tmux-errors scenario covers read-only JSON degradation, mutating command failures, pane capture degradation, stop failures, and reconcile recovery when tmux is unavailable or a disposable tmux target disappears.

The dispatch-completion scenario covers the issue #113 completion-routing flow: a worker task_complete signal is read from codex_events, Dispatch records and deduplicates a routed notification, the bound manager receives a mechanical wake-up, duplicate-route races emit suppressed telemetry without an extra send, and audit/replay/dashboard surfaces show readable, chronological dispatch evidence.

The ralph-loop scenario covers the issue #152 managed delivery loop: the manager runs the same seed prompt through at least two iterations, requires criteria and epilogue evidence, gates PR creation, CI monitoring/fixing, green merge, handoff, and worker clear on explicit permissions, and proves the second iteration starts after audited clear in fresh-worker isolation. Replay iterations start with an inspect-first guard: if the previous iteration's work is already merged, the worker records that state and stops without making replacement edits or opening another PR unless something is actually missing. The same QA plan also covers preset-backed guardrails such as pr_ci_merge_loop, where Dispatch blocks another worker iteration until required pr_url, ci_green, merge, and adversarial_check evidence exists.

Terminal helpers

  • open <name> — Open a macOS terminal window attached to a registered worker.
  • open-worker <task> — Open a terminal window for a task's worker without spelling raw tmux session names.
  • open-manager <task> — Same, for the task's manager.

Low-level worker actions (legacy worker-name-keyed)

These commands operate against workers by name and predate the manual-binding path. They remain useful for direct access against backfilled workers and for debugging.

  • list — List known workers.
  • status <name> — Print worker status as JSON.
  • idle-check <name> — Classify worker freshness and recommend an action.
  • capture <name> [--include-content] — Capture recent terminal output. Default output is metadata only; pass --include-content only when verbatim pane text is intentionally needed.
  • nudge <name> "<text>" — Legacy worker-directory nudge. For managed session pairs, prefer session-nudge <name> "<text>"; nudge falls back to session-name delivery when no legacy worker directory exists.
  • interrupt <name> — Send an explicit interrupt key.
  • stop <name> — Stop a worker tmux session.
  • update-status <name> — Update a worker status contract.
  • classify --text "..." — Debug the busy-wait pattern classifier.

Manager Loop Pattern

A manager Codex drives supervision by calling conveyor cycle <task> repeatedly. Each call:

  1. Runs ingest against the worker's rollout JSONL (idempotent; picks up only new bytes since the last cycle).
  2. Computes the worker's current state from the JSON event stream (busy, idle, or unknown). task_started/user_message set the session to busy; task_complete sets it to idle; everything else does not change state.
  3. Captures the worker's tmux pane (if attached) and runs the legacy pattern detector — surfaces trust_prompt, rate_limit_prompt, enter_to_confirm, etc. as notable_pane_pattern. This is the shadow signal: best-effort and supplementary.
  4. Writes a row to manager_cycles so the full observation history is replayable via conveyor replay <task>.
  5. Returns a structured JSON dict.

The manager parses the JSON, decides whether to act, and optionally calls conveyor session-nudge / session-interrupt. Then it loops.

conveyor cycle auth-refactor
# {
#   "kind": "session_cycle",
#   "task": "auth-refactor",
#   "worker_session": "auth-worker",
#   "manager_session": "auth-mgr",
#   "ingest": { "new_events": 3, "new_offset": 12345 },
#   "state": "busy",
#   "last_state_event_at": "2026-05-11T14:32:11Z",
#   "staleness_seconds": 4.2,
#   "notable_pane_pattern": "trust_prompt",
#   "pane_signal": {
#     "captured": true,
#     "classifier": { "pattern": "trust_prompt", ... },
#     "notable_pattern": "trust_prompt"
#   },
#   "manager_context": {
#     "manager_config": {...},
#     "worker_handoff": {...},
#     "acceptance_criteria": {
#       "summary": {"proposed": 1, "accepted": 2, "satisfied": 0, "deferred": 1, "rejected": 0},
#       "open": [...],
#       "proposed": [...],
#       "satisfied": [...],
#       "deferred": [...],
#       "rejected": [...]
#     },
#     "criteria_negotiation": {
#       "needed": true,
#       "reason": "no_criteria",
#       "prompt": "Please propose 2-4 acceptance criteria for the current slice...",
#       "suggested_actions": [...]
#     }
#   },
#   "cycle_id": 17,
#   ...
# }

If notable_pane_pattern is set the manager can branch on it directly — e.g., on trust_prompt send Enter via session-nudge rather than waiting on staleness_seconds. On IngestError (rollout missing or rotated), cycle records a state='failed' row before re-raising so the audit trail still captures the attempt.

Audit convention. Mutating commands (session-nudge, session-interrupt) write to the events table. Observation and dedicated-table commands (cycle, ingest) write to their own tables (manager_cycles, codex_events) — those tables ARE the audit trail. The plain-text conveyor audit <task> lists events rows only; cycle observations show up via conveyor replay <task> and the manager_cycles table.

Phase 6 Polish

Recent additions to streamline worker setup and observability:

  • start-worker convenience command for spawn-and-register in one call.
  • reconcile --stale-cycles-seconds N to customize the stale-cycle threshold.
  • Observability: terminal_capture_error / terminal_fresh fields in status/idle JSON; rollback_error in nudge/interrupt audit payloads; skipped_lines in cycle output's ingest field; stderr warnings on malformed event lines and audit-insert failures.

Phase 7 polish (2026-05-11)

Three quality-of-life additions following Phase 6 dogfood:

  • sessions --state — by default, conveyor sessions now hides Phase 1 backfill rows (pid IS NULL) and rows marked state='gone'. Use --state all to inspect every row, --state gone for completed/dead registrations, or --state active for the default view.
  • worker_alive / manager_alive in cycle output — every conveyor cycle JSON now includes these booleans, computed by os.kill(pid, 0) against the registered session pids. Surfaces silently-dead workers between cycles.
  • cycle --busy-wait-seconds N — exposes the pane-signal classifier's stuck-busy threshold (previously hard-coded at 90s) as a per-cycle flag.

Phase 8 classifier improvements (2026-05-12)

  • Recent event suppression for long_running_interruptible — the classifier now weighs recent_event_count (from ingest.new_events) alongside status_age_seconds. When a worker is actively emitting events (>= 10/cycle), the long_running_interruptible flag is suppressed—the worker is healthy despite stale status.json. This stops false positives on long-running tools (e.g. test suites, large file reads) that stay busy but quiet on status updates.

Dispatch and completion contracts

Dispatch is the mechanical core infrastructure between workers and managers. It routes facts and executes queued side effects; it does not decide whether work is correct, finish tasks, satisfy criteria, choose strategy, merge PRs, or route to human operators.

Current dispatch state:

  • dispatch --once routes bound worker task_complete signals from codex_events, not the pane classifier.
  • Routed completion notifications are deduplicated by source event id, recorded in routed_notifications, and threaded with correlation_id.
  • The session inbox is the same routed_notifications stream addressed by target_session_id: tmux push is optional transport. Codex app-based sessions should long-poll with manager-inbox --consume-next --wait --json or worker-inbox --consume-next --wait --json.
  • Template-backed continue_iteration deliveries include loop_policy in the inbox payload, with template name, current/max iteration, cleanup policy, required evidence, artifact requirements, and recommended tools. Codex app-based workers receive the same policy context by polling that tmux workers receive by push.
  • A target with a tmux session records delivery_mode='push' after successful tmux delivery. A target without tmux records delivery_mode='pull_required' and remains unconsumed until the addressed session polls and consumes it.
  • Consuming a mailbox item records dispatch_inbox_consumed telemetry with the notification id, signal type, delivery mode, target session role, and poll count, so manager/worker dispatcher handoffs are visible in audit evidence.
  • If doctor-self --json reports workerctl_on_path=false inside a Codex app session, run conveyor ... from the repository root or install the local wrapper with scripts/install-local --write. Its inside_tmux check describes the shell running doctor-self; for Codex app evidence, prefer the rollout JSONL path, lsof lookup, and the registration role.
  • When a live drill ingests a whole rollout, Dispatch may route older completion signals before the target proof turn. Either ingest after the target worker turn or have the manager consume/review older completion signals before deciding on the current one.
  • Explicit notify_manager and nudge_worker command rows can be processed by Dispatch with atomic claim/lease metadata, durable command_attempts, invalid-payload failure before side effects, and conservative tmux side-effect started/completed flags.
  • dispatch --watch continuously repeats the same mechanical polling loop with dispatcher identity and heartbeat telemetry; --watch-iterations N bounds the run and --lease-seconds N controls when attempted command claims become recoverable.
  • Replay/audit surfaces include routed notifications, command attempts, and correlation chains where the data exists. Routed notification replay includes delivery mode, source/target sessions, delivered timestamp, consumed-by session, and consumed timestamp. The dashboard groups bound-task dispatch correlation chains with command state, attempt counts, notification counts, inbox pending/consumed counts, decision/cycle ids, source event ids, suppressed-signal visibility, chronological ordering, and side-effect risk.
  • Dashboard manual QA should use conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard and visually confirm the Dispatch active banner, dispatcher id, heartbeat age, iteration, processed count, dry-run/live state, completion/routing/cycle conversation lane entries, command claim/attempt/delivery entries, inbox pending/consumed counts, pull-required notification evidence where applicable, and stale or not-observed warnings.

The adjacent completion-contract surfaces are separate from Dispatch:

  • Worker and manager acknowledgements persist the startup contract and can gate cycle/finish-task.
  • Epilogues are named post-completion steps that can gate finish-task.
  • Continuations persist worker-first and manager-independent "what's next" proposals plus a recorded reviewer verdict. The CLI enforces ordering, redaction, permission checks, reviewer separation metadata, and can run an independent restricted-context reviewer command through continuation-reviewer. Reviewer execution is additionally isolated with a temporary cwd, stripped environment, and macOS sandbox-exec denial of bound rollout/database reads plus direct reads under the active .codex-workers state root. That broader state-root denial applies only to the continuation-reviewer subprocess; normal replay, audit, export, and telemetry generation paths remain outside that sandbox.

Schema

SQLite database at .codex-workers/workerctl.db. Key tables:

  • sessions — Unified worker/manager registration.
  • bindings — Task ↔ worker session ↔ manager session.
  • tasks — Task records.
  • codex_events — Per-session JSONL events ingested from rollout files.
  • manager_cycles — One row per cycle invocation, with the full JSON payload as status_json.
  • events — Actuation audit log (session_nudged, session_interrupted, etc.).
  • commands — Durable side-effect command log, including Dispatch claim/lease metadata for queued command execution.
  • command_attempts — Per-dispatcher command execution attempts with side-effect started/completed flags and result/error payloads.
  • routed_notifications — Mechanical worker/manager routed facts and command delivery records, deduped and linked by correlation_id.
  • task_acknowledgements — Revisioned worker/manager startup contract acknowledgements.
  • epilogue_runs — Durable state for configured post-completion epilogue steps.
  • task_continuations / continuation_reviews — Worker/manager continuation proposals and reviewer verdicts for "what's next" review flows.
  • workers, managers — Legacy tables retained for read-only history.

conveyor db-doctor reports schema health. conveyor reconcile reports runtime drift (dead-pid sessions, dangling bindings, stuck tasks); add --apply to fix.

Migration from the Legacy Path

Earlier prototypes used a worker-first promotion flow where a worker was created first and a manager was then spawned to supervise it. Those legacy commands have been retired. The new path inverts the model: register two already-running Codex sessions, create a task, bind them, and let the manager Codex drive observation via cycle.

The legacy database tables (workers, managers) remain readable via audit, replay, and export-task for historical reference, but no kept CLI command writes to them. To resume work on a legacy task, call finish-task on it and start fresh via register-worker + register-manager + bind.

Tests

Release-candidate deterministic gate:

scripts/rc-check --skip-live-smoke-repeat

Full local release-candidate gate:

scripts/rc-check --with-live-smoke-repeat

Underlying deterministic checks:

python3 -m unittest discover -s tests -v
scripts/check-resource-warnings
python3 -m py_compile scripts/workerctl scripts/check-resource-warnings workerctl/*.py
scripts/package-smoke

For local parallel experiments, prefer:

scripts/run-unittests-isolated

This gives the process a temporary WORKERCTL_STATE_ROOT and a test namespace. The standard CI job remains serial.

GitHub Actions runs scripts/rc-check --skip-live-smoke-repeat and scripts/package-smoke on every push and pull request. The live smoke repeat remains local/manual because hosted runners may not have codex. The ResourceWarning gate intentionally fails on any ResourceWarning text in test output so finalization-time resource warnings cannot be hidden by a zero unittest exit status.

Live local smoke gate:

scripts/live-smoke

The live smoke requires macOS, tmux, codex, and rg. It starts disposable Codex worker/manager sessions, exercises pair, cycle, session-nudge, criteria mutation, transcript capture before stop, replay, mutation audit, and export, then verifies cleanup with sessions --state active and reconcile. It writes evidence under docs/live-qa-artifacts/ and should leave no active smoke sessions, tmux panes, dangling bindings, or stuck tasks.

For the focused manual coverage pass, use docs/manual-qa-checklist.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_conveyor-0.1.0.tar.gz (400.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agent_conveyor-0.1.0-py3-none-any.whl (239.6 kB view details)

Uploaded Python 3

File details

Details for the file agent_conveyor-0.1.0.tar.gz.

File metadata

  • Download URL: agent_conveyor-0.1.0.tar.gz
  • Upload date:
  • Size: 400.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_conveyor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 356bd2f506a0c75427b181e171a8b085546c9974c8f43fc5732d79672a9363be
MD5 ab449ae183d512c3751c418cac78d0f2
BLAKE2b-256 2566a98716e17b31cf2c5064923adc912a4f3cdf4085e7fe0149c22b746911ae

See more details on using hashes here.

Provenance

The following attestation bundles were made for agent_conveyor-0.1.0.tar.gz:

Publisher: publish.yml on neonwatty/codex-terminal-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_conveyor-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agent_conveyor-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 239.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_conveyor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 51d5d02e2867ba1cdd1b1cca575c053f69e38092e8869a51e17207952881269a
MD5 ec9031b8df799013ee80b061fc507766
BLAKE2b-256 4f310658e2a259d9b423dab62a12b16b99ef598bb526dc7137474364d40adb51

See more details on using hashes here.

Provenance

The following attestation bundles were made for agent_conveyor-0.1.0-py3-none-any.whl:

Publisher: publish.yml on neonwatty/codex-terminal-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page