CLI toolchain for authoring and running deterministic, resumable agent workflows.

checkpointflow

Deterministic, resumable workflows for agents, operators, and shells.

checkpointflow lets you define a workflow as a portable state machine in YAML, validate it before execution, run it through a stable CLI, pause for explicit external input, and resume later without relying on hidden chat history or process-local state.

It is built for workflows that mix:

  • local CLI steps
  • structured branching
  • human approval
  • bounded agent judgment
  • durable handoff across shells, tools, and operators

Why checkpointflow

Most agent workflows break down in the same places:

  • important state lives only in a chat thread
  • pause and resume are implicit instead of modeled
  • outputs are hard to validate before taking the next step
  • another agent or operator cannot reliably continue the work

checkpointflow addresses these failure points by making the workflow contract explicit:

  • workflows are written as versioned YAML documents
  • inputs, outputs, and resume events are validated with JSON Schema
  • stdout returns stable machine-readable JSON envelopes
  • waiting is explicit and durable through await_event
  • state is persisted under ~/.checkpointflow/ by default
  • workflow files can live anywhere on disk

What You Get

  • A narrow, predictable runtime contract that is easy for agents to drive.
  • Stable CLI commands for init, validate, run, resume, inspect, status, cancel, flows, guide, and gui.
  • Explicit pause/resume semantics instead of hidden conversational continuation.
  • Deterministic control flow driven by persisted inputs and step outputs.
  • A default user-scoped state store at ~/.checkpointflow/.
  • A workflow format that is simple enough to derive from human-agent conversation.

Installation

Install with uv:

uv tool install checkpointflow

Then verify the CLI is available:

cpf --help
cpf guide

If you want to use it inside an existing Python project:

uv add checkpointflow

Quick Start

Create a workflow file anywhere you want, for example workflow.yaml:

schema_version: checkpointflow/v1
workflow:
  id: publish_confluence_change
  name: Publish Confluence change
  version: 1.0.0
  inputs:
    type: object
    required: [page_id, source_file]
    properties:
      page_id:
        type: string
      source_file:
        type: string

  steps:
    - id: plan
      kind: cli
      command: confpub plan --page-id ${inputs.page_id} --file ${inputs.source_file} --format json
      outputs:
        type: object
        required: [plan_file, summary]
        properties:
          plan_file: { type: string }
          summary: { type: string }

    - id: approval
      kind: await_event
      audience: user
      event_name: change_approval
      prompt: Review the proposed page update and approve or reject it.
      input_schema:
        type: object
        required: [decision]
        properties:
          decision:
            type: string
            enum: [approve, reject]
          comment:
            type: string
      transitions:
        - when: ${event.decision == "approve"}
          next: apply
        - when: ${event.decision == "reject"}
          next: rejected

    - id: apply
      kind: cli
      command: confpub apply --plan ${steps.plan.outputs.plan_file} --format json
      outputs:
        type: object
        required: [page_url]
        properties:
          page_url: { type: string }

    - id: rejected
      kind: end
      result:
        status: rejected

  outputs:
    type: object
    properties:
      page_url:
        from: steps.apply.outputs.page_url

Create an input file:

{
  "page_id": "123456",
  "source_file": "docs/architecture.md"
}

Validate the workflow:

cpf validate -f workflow.yaml

Run it:

cpf run -f workflow.yaml --input @input.json

If the workflow reaches a wait point, checkpointflow exits with code 40 and prints a waiting envelope on stdout:

{
  "schema_version": "checkpointflow-run/v1",
  "ok": true,
  "command": "run",
  "status": "waiting",
  "exit_code": 40,
  "run_id": "01JQEXAMPLE00000000000002",
  "current_step_id": "approval",
  "wait": {
    "kind": "external_event",
    "audience": "user",
    "event_name": "change_approval",
    "prompt": "Review the proposed page update and approve or reject it.",
    "input_schema": {
      "type": "object",
      "required": ["decision"],
      "properties": {
        "decision": {
          "type": "string",
          "enum": ["approve", "reject"]
        },
        "comment": {
          "type": "string"
        }
      }
    },
    "instructions": [
      "Ask the intended audience for input using the prompt.",
      "Collect JSON that matches input_schema.",
      "Resume with the provided run_id and event_name."
    ],
    "resume": {
      "command": "cpf resume --run-id 01JQEXAMPLE00000000000002 --event change_approval --input @response.json"
    }
  }
}
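
A driver (agent, script, or CI job) can treat exit code 40 plus the stdout envelope as its whole view of the pause. The following is a minimal sketch of that reading step, assuming only the envelope fields shown above; the helper name describe_wait is hypothetical and not part of checkpointflow.

```python
import json
from typing import Optional

WAITING_EXIT_CODE = 40  # documented exit code for "waiting for external event"


def describe_wait(stdout_text: str, exit_code: int) -> Optional[dict]:
    """Return the fields needed to resume, or None if the run is not waiting.

    Hypothetical helper: it reads only fields that appear in the sample envelope.
    """
    if exit_code != WAITING_EXIT_CODE:
        return None
    envelope = json.loads(stdout_text)
    if envelope.get("status") != "waiting":
        return None
    wait = envelope["wait"]
    return {
        "run_id": envelope["run_id"],
        "event_name": wait["event_name"],
        "prompt": wait["prompt"],
        "required": wait["input_schema"].get("required", []),
        "resume_command": wait["resume"]["command"],
    }


# Trimmed copy of the envelope shown above, used as sample input.
sample_stdout = json.dumps({
    "schema_version": "checkpointflow-run/v1",
    "ok": True,
    "command": "run",
    "status": "waiting",
    "exit_code": 40,
    "run_id": "01JQEXAMPLE00000000000002",
    "current_step_id": "approval",
    "wait": {
        "kind": "external_event",
        "audience": "user",
        "event_name": "change_approval",
        "prompt": "Review the proposed page update and approve or reject it.",
        "input_schema": {"type": "object", "required": ["decision"]},
        "resume": {
            "command": "cpf resume --run-id 01JQEXAMPLE00000000000002 "
                       "--event change_approval --input @response.json"
        },
    },
})

info = describe_wait(sample_stdout, exit_code=40)
print(info["prompt"])
print(info["resume_command"])
```

Because everything needed to continue is in the envelope, the driver does not have to remember anything from the original run invocation.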

Resume with explicit input:

cpf resume --run-id 01JQEXAMPLE00000000000002 --event change_approval --input @response.json
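
Before resuming, it can help to check the response locally against the input_schema from the waiting envelope. The sketch below is an illustration only: check_payload is a hypothetical helper that covers just the required, string type, and enum rules used in this example, and it is not a full JSON Schema validator or the validation checkpointflow itself performs.

```python
import json
import tempfile
from pathlib import Path

# input_schema copied from the approval step above.
INPUT_SCHEMA = {
    "type": "object",
    "required": ["decision"],
    "properties": {
        "decision": {"type": "string", "enum": ["approve", "reject"]},
        "comment": {"type": "string"},
    },
}


def check_payload(payload: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the payload passed.

    Hypothetical helper: checks only `required`, string `type`, and `enum`.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in payload:
            continue
        value = payload[key]
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        elif "enum" in rules and value not in rules["enum"]:
            problems.append(f"{key} must be one of {rules['enum']}")
    return problems


payload = {"decision": "approve", "comment": "Looks good."}
problems = check_payload(payload, INPUT_SCHEMA)

# Write response.json only when the local check passes.
with tempfile.TemporaryDirectory() as tmp:
    response_file = Path(tmp) / "response.json"
    if not problems:
        response_file.write_text(json.dumps(payload, indent=2))
    written = json.loads(response_file.read_text())

print(problems)
```

A local check like this only catches obvious mistakes early; the resume command remains the authority on whether the event payload is accepted.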

Core Ideas

Explicit pause and resume

checkpointflow uses a single pause primitive:

  • await_event

That same primitive covers:

  • user approval
  • agent decision
  • callback from another system
  • manual operator continuation

This keeps the runtime small and the resume contract uniform.

Deterministic orchestration

Control flow is driven by persisted workflow inputs, validated event payloads, and step outputs. The orchestrator does not depend on hidden prompt state, wall-clock reads during evaluation, or direct network results inside expressions.
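
The idea can be illustrated with a pure function: given the same persisted event payload, it always selects the same next step. This is a sketch under stated assumptions, not the runtime's evaluator. It does not parse the ${...} expression syntax; the transitions from the Quick Start are rewritten by hand into a hypothetical field/equals/next form.

```python
from typing import Optional

# Hand-written equivalent of the approval step's transitions (hypothetical shape).
TRANSITIONS = [
    {"field": "decision", "equals": "approve", "next": "apply"},
    {"field": "decision", "equals": "reject", "next": "rejected"},
]


def next_step(event: dict, transitions: list) -> Optional[str]:
    """Pick the first transition whose condition matches the event payload.

    The result depends only on the arguments: no clock reads, no network
    calls, no hidden state.
    """
    for transition in transitions:
        if event.get(transition["field"]) == transition["equals"]:
            return transition["next"]
    return None


approved = next_step({"decision": "approve"}, TRANSITIONS)
rejected = next_step({"decision": "reject", "comment": "Wrong page."}, TRANSITIONS)
print(approved, rejected)
```

Replaying a run from its stored inputs and events therefore follows the same path every time.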

Agent-agnostic execution

A workflow can be driven by:

  • Codex
  • Claude Code
  • GitHub Copilot agents
  • CI jobs
  • shell scripts
  • a human operator

The workflow remains the same because the runtime contract remains the same.

Stable CLI Surface

cpf init
cpf guide
cpf validate -f workflow.yaml
cpf flows                             # list available workflows
cpf flows --detail <id-or-name>       # show details for one workflow
cpf run -f workflow.yaml --input @input.json
cpf resume --run-id <run_id> --event <event_name> --input @event.json
cpf status --run-id <run_id>
cpf inspect --run-id <run_id>
cpf cancel --run-id <run_id> --reason "..."
cpf gui
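
When driving the CLI from a script, building argument lists is usually safer than concatenating strings. The following sketch uses only the flags listed above for run and resume; the function names are hypothetical, and nothing is executed here.

```python
import shlex


def run_argv(workflow_file: str, input_file: str) -> list:
    """Argument list for `cpf run`, using the flags documented above."""
    return ["cpf", "run", "-f", workflow_file, "--input", f"@{input_file}"]


def resume_argv(run_id: str, event_name: str, input_file: str) -> list:
    """Argument list for `cpf resume`, using the flags documented above."""
    return [
        "cpf", "resume",
        "--run-id", run_id,
        "--event", event_name,
        "--input", f"@{input_file}",
    ]


run_cmd = shlex.join(run_argv("workflow.yaml", "input.json"))
resume_cmd = shlex.join(
    resume_argv("01JQEXAMPLE00000000000002", "change_approval", "response.json")
)
print(run_cmd)
print(resume_cmd)
```

The lists can be passed to subprocess.run directly; shlex.join is used here only to show the equivalent shell command.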

Workflow Model

The runtime is intentionally opinionated.

The core step kinds are:

  • cli — run a shell command
  • await_event — pause for external input
  • end — terminate with a result

Additional step kinds for composition and control flow:

  • api — HTTP request with JSON response capture
  • switch — conditional branching
  • foreach — iterate over a list
  • parallel — concurrent branch execution
  • workflow — invoke a sub-workflow

See cpf guide for full documentation of each step kind.

Storage Model

Workflow location and runtime state are separate concerns.

  • Workflow files can live anywhere on disk.
  • Runtime state defaults to ~/.checkpointflow/.
  • Run metadata lives in ~/.checkpointflow/runs.db.
  • Per-run artifacts live under ~/.checkpointflow/runs/<run_id>/.
  • The workflow source path is recorded as metadata, but it does not determine where state is stored.

This makes workflows easy to keep in repos, temp directories, shared folders, or generated output locations without coupling them to runtime storage.
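
The default layout described above can be sketched with pathlib. The helper state_paths and its root parameter are hypothetical and exist only for illustration; the sketch computes paths and does not read or create anything.

```python
from pathlib import Path
from typing import Optional


def state_paths(run_id: str, root: Optional[Path] = None) -> dict:
    """Compute where state for one run would live under the default layout.

    `root` is a hypothetical parameter for illustration; when omitted, the
    documented default ~/.checkpointflow/ is used.
    """
    base = root if root is not None else Path.home() / ".checkpointflow"
    return {
        "root": base,
        "runs_db": base / "runs.db",          # run metadata
        "run_dir": base / "runs" / run_id,    # per-run artifacts
    }


paths = state_paths("01JQEXAMPLE00000000000002")
for name, path in paths.items():
    print(f"{name}: {path}")
```

Note that the workflow file path does not appear anywhere in this calculation, which is the separation the section describes.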

Exit Codes

0   Success
10  Validation error
20  Runtime error
30  Step failed
40  Waiting for external event
50  Cancelled
60  Persistence error
70  Concurrency or lock error
80  Unsupported feature
90  Internal error

40 is not a failure. It means the workflow is paused and ready to be resumed with an explicit event payload.
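
A calling script can encode the table above so that 40 is handled as a pause, not an error. This is a minimal sketch: the enum member names and the classify helper are hypothetical, and treating 50 as its own "cancelled" outcome is a choice made for this example.

```python
from enum import IntEnum


class CpfExit(IntEnum):
    """Exit codes from the table above; member names are illustrative."""
    SUCCESS = 0
    VALIDATION_ERROR = 10
    RUNTIME_ERROR = 20
    STEP_FAILED = 30
    WAITING = 40
    CANCELLED = 50
    PERSISTENCE_ERROR = 60
    CONCURRENCY_ERROR = 70
    UNSUPPORTED_FEATURE = 80
    INTERNAL_ERROR = 90


def classify(code: int) -> str:
    """Map a process exit code to a coarse outcome for the caller."""
    try:
        exit_code = CpfExit(code)
    except ValueError:
        return "unknown"  # not one of the documented codes
    if exit_code is CpfExit.SUCCESS:
        return "done"
    if exit_code is CpfExit.WAITING:
        return "paused"
    if exit_code is CpfExit.CANCELLED:
        return "cancelled"
    return "failed"


for code in (0, 30, 40, 50):
    print(code, classify(code))
```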

Why Not X?

| Tool | What it does well | Where checkpointflow differs |
| --- | --- | --- |
| Bash scripts | Simple, ubiquitous | No pause/resume, no schema validation, no structured handoff between operators |
| GitHub Actions / CI | Great for build pipelines | Tied to Git events; no interactive pause for human/agent judgment mid-run |
| AWS Step Functions | Durable, scalable orchestration | Cloud-locked, heavyweight; overkill for local agent-driven workflows |
| Temporal | Long-running workflows with retries | Requires a server cluster; designed for distributed systems, not CLI-driven agent loops |
| Arazzo (OpenAPI Workflows) | API call sequencing with OpenAPI | Focused on HTTP orchestration; no CLI steps, no agent audience, no local state |
| LangGraph / CrewAI | Multi-agent orchestration | State lives in memory/chat; implicit continuation; hard to validate, resume, or hand off |
| Plain prompt chaining | Zero setup | State trapped in context window; no durability, no explicit pause, no schema contract |

checkpointflow occupies a specific gap: deterministic orchestration with explicit pause/resume, schema-validated handoff, and agent-agnostic driving through a local CLI. It is not trying to replace cloud workflow engines or agent frameworks. It is the layer that makes a workflow portable, resumable, and auditable without requiring infrastructure.

Who It Is For

checkpointflow is a good fit if you want to:

  • turn a conversation into a durable workflow file
  • run mixed human-agent workflows without hidden continuation
  • encode operational runbooks that can be resumed later
  • keep the reliable parts of an agent workflow outside the prompt
  • hand work from one agent or operator to another safely

It is not trying to be a general-purpose programming language or a heavyweight distributed orchestration platform.

Development

This project is developed with strict TDD and uses uv as the package manager.

uv sync --group dev
uv run python scripts/install_git_hooks.py
uv run pytest
uv run ruff check .
uv run mypy

GitHub Actions enforces the same quality gate in CI:

  • uv lock --check
  • ruff format --check
  • ruff check
  • mypy
  • cross-platform pytest
  • uv build plus a built-wheel smoke test

License

MIT
