Workflow-as-code journey testing compiler

Journey SDK

With AI, testing is the new coding.

Overview

Journey SDK is an AI-assisted workflow-as-code toolkit for testing long user journeys with branches, many steps, third-party services, and channels. Those journeys have always been painful to automate: teams duplicate similar test cases, rerun slow setup, babysit long waits, wire throwaway inboxes and callbacks, and hand-maintain browser selectors. Journey turns that into one Python journey that compiles paths, replays from durable steps, resumes long waits, uses Journey Cloud tools, and integrates natively with AI coding assistants.

The core value is:

  • One journey spec for all paths: use branch() inside ordinary Python so one journey compiles into the executable cases users can take, without copying shared setup into separate tests.
  • Replay from a step: use branch(start_from=...) so later branch cases start from a known durable step boundary instead of replaying expensive browser, account, cart, or service setup.
  • Cloud tools for tests outside the browser: use helpers under journeysdk.tools for hosted inboxes, webhook endpoints, browser prompting, Docker snapshots, and other documented tool surfaces while app-specific integrations keep their own code.
  • Interrupt long waits, resume later: run with --state so a test can stop while waiting on async work or a third-party service and continue later from saved step state.
  • AI-generated steps with page.prompt(...): describe browser behavior in natural language, use prompt memory for faster repeat runs, and keep tests editable by the same AI coding assistants that write application code.

That makes Journey SDK useful for flows such as:

  • testing checkout paths such as card versus wallet payment from the same cart setup
  • waiting for email, SMS, voice, webhook, payment, or third-party side effects without keeping a laptop busy
  • asking an LLM-driven browser step to complete UI work while prompt memory reduces repeated model calls
  • iterating on one failed late step without rerunning the whole journey from the beginning

Who it's for

  • developers and test engineers who want one Python journey for all meaningful user paths
  • QA teams replacing duplicated browser/API/channel tests with compiled journey cases
  • platform teams testing lifecycle flows that cross email, SMS, voice agents, payments, webhooks, and third-party APIs
  • AI coding agents that need to generate, run, and iterate on tests while implementing features

AI Agent Support

Use the journey-developer skill when an AI coding agent needs to create, execute, debug, or maintain Journey SDK journeys.

Install

Install The Python Package

Install Journey SDK into an existing environment:

pip install journey-sdk

Or add it to a uv-managed project:

uv add journey-sdk

For authoring, import only the Journey primitives you use:

from journeysdk import branch, journey, step

Install The CLI

Run the CLI once without installing it:

uvx --from journey-sdk journey --help

Install a persistent CLI with uv:

uv tool install journey-sdk
journey --help

If your shell cannot find journey yet, refresh the shell PATH hook:

uv tool update-shell

Install the CLI inside a virtual environment with pip:

python -m pip install journey-sdk
journey --help

Use the CLI from a project-local environment:

uv add journey-sdk
uv run journey --help

Playwright and LangChain are included in the default install. The first browser step automatically downloads Chromium in the active environment, so there is no separate playwright install step for the standard Journey SDK flow.

See docs/00-installation-and-cli.md for the full CLI installation guide, local editable installs, and local wheel smoke testing.

Develop Locally

uv sync --extra dev
uv run pytest

See CONTRIBUTING.md for editable-install workflows, the local package smoke test, and the manual publish checklist.

Authoring model

Write one journey in sequential Python with step, branch, and branch(start_from=...), add optional step retries via step(..., retry=..., retry_delay=..., retry_from=...), and run it with --state when progress must survive interruptions. Decorate module-level journey entrypoints with @journey. Journey SDK compiles that authoring flow into linear executable cases, so teams can cover branching user paths without duplicating test code.

Step functions are plain callables: pass every required input as explicit arguments, and return any value that later steps or resumed runs must reuse. The step boundary is the durable unit: successful steps can be reused, interrupted or retried steps restart from the top with saved inputs, and branch(start_from=...) creates a replay anchor that lets each branch reuse the same saved setup.

For example, one checkout journey can create a cart once, exercise card and wallet payment paths from that cart, use page.prompt(...) to drive the browser, wait for email and SMS, then verify the returned order id:

from journeysdk import branch, journey, step
from journeysdk.tools.email import get_email_inbox
from journeysdk.tools.playwright import open_page


def checkout(cart, inbox, method) -> dict[str, object]:
    page = open_page(app_checkout_url(cart))
    try:
        return page.prompt(
            f"Check out with {method}. Send receipts to {inbox.address}.",
            memory="checkout",
            output={"order_id": "The id of the created order."},
        )
    finally:
        page.__exit__(None, None, None)


@journey
def checkout_journey() -> None:
    inbox = step(get_email_inbox())
    cart = step(create_cart, inbox.address)

    if branch(start_from=cart):
        order = step(checkout, cart, inbox, "card")
    elif branch(start_from=cart):
        order = step(checkout, cart, inbox, "wallet")

    messages = step(wait_for_email_and_sms, order["order_id"], inbox)
    step(mark_order_ready, order["order_id"], messages)

get_email_inbox() and open_page() are documented SDK tools. Functions such as create_cart, wait_for_email_and_sms, and mark_order_ready are app-specific integration code. Voice agents, SMS, WhatsApp, payments, and third-party APIs should stay app-specific unless the docs describe an official helper.

Retryable steps can poll for async effects, rerun from the step itself, or replay from an earlier step. They are retried when they raise an exception and retry is greater than 0. The explicit defaults are retry=0, retry_delay=5, and retry_from=None; when retries are enabled and retry_from is omitted, the current step is retried.
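As a rough mental model of these defaults, the sketch below retries a callable in-process. It assumes that retry counts extra attempts after the first one, which the paragraph above implies but does not state outright. run_with_retry is a hypothetical helper, not part of journeysdk, and it models only retrying the current step (retry_from=None), not replay from an earlier step.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    fn: Callable[[], T],
    retry: int = 0,
    retry_delay: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Toy model of step(..., retry=..., retry_delay=...) for the current step only.

    Assumption: retry is the number of extra attempts after the first one.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception:
            # With retry=0 (the default) the first exception propagates.
            if attempt > retry:
                raise
            # Wait before the next attempt, like retry_delay between polls.
            sleep(retry_delay)
```

Injecting sleep keeps the sketch testable; a polling step that raises until an async effect appears would succeed on the first attempt after the effect lands.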

Glossary

  • Journey: one decorated Python function that describes the full user journey under test.
  • Case: one linear executable path compiled from a journey, with one inline if branch() / elif branch() arm selected at each point where the journey splits.
  • Step: one step(...) call and the plain Python function it runs.
  • Step boundary: the boundary before and after a step where Journey can save progress, stop, retry, or resume.
  • State file: the --state file that stores selected cases, completed case reports, active progress, saved step bindings, and branch-anchor snapshots.
  • Saved step binding: stored step inputs, metadata, and optional result that Journey can use when replaying or resuming.
  • Dirty step: the step that had started but had not completed when execution was interrupted.
  • Replay: rerunning part of a case from a step boundary while reusing saved values before that boundary.
  • Replay boundary: the step index where replay starts.
  • Replay anchor: the step label reported for a targeted branch run or used by retry and branch replay.
  • Branch-anchor snapshot: saved records, step bindings, retry counters, and attempt counters captured after an anchor step reaches post-exit.
  • Branch: an inline if branch(): ... elif branch(): ... arm that compiles into a separate case.
  • Targeted run: a --step LABEL run that executes the one case reaching that label and stops after it. A reported replay_anchor identifies the branch step anchor, but targeted runs do not skip directly to that anchor.
  • Step lifecycle: initialization, execution, storage, pre-exit, exit, and post-exit for one step attempt.
  • Develop-step pause: a --develop-step LABEL stop at pre-exit after the selected step has been stored and before returned handles are exited, used for quick edit-run loops.
  • Pause action: continue or retry after a develop-step pause.
  • Rehydration: storing and restoring values that cross replay boundaries.
  • Rehydratable value: a value with __store__ and __restore__ hooks for custom replay storage.
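To make replay, replay boundary, and saved step binding concrete, here is a toy model in plain Python. It is not how journeysdk is implemented: it assumes each step takes the previous step's value, and it keeps saved bindings in a dict keyed by step index instead of a state file.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def run_case(steps: list[Step], saved: dict[int, Any], replay_from: int = 0) -> Any:
    """Toy replay: reuse saved values before the replay boundary, rerun the rest."""
    value: Any = None
    for index, fn in enumerate(steps):
        if index < replay_from and index in saved:
            value = saved[index]  # reuse the saved step binding
        else:
            value = fn(value)
            saved[index] = value  # store the result at the step boundary
    return value
```

After one full run, calling run_case(steps, saved, replay_from=2) skips the first two callables and reruns only the third, reading its input from the saved binding of the step before the boundary.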

Journey Rehydration Protocol

When retries, --state, or step-started branches need to reuse a step value across a replay boundary, Journey rehydrates that value from SDK-managed saved step bindings. Any step argument or return value that may cross one of those boundaries must be pickle-serializable or implement the Journey rehydration protocol:

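class ExternalState:
    def __init__(self, resource_id: str):
        self.resource_id = resource_id  # illustrative field

    def __store__(self, context):
        # Return a pickle-serializable payload for this value.
        return {"resource_id": self.resource_id}

    @classmethod
    def __restore__(cls, payload, context):
        # Rebuild the step value from the stored payload.
        return cls(payload["resource_id"])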

__store__(context) returns a pickle-serializable payload. Journey stores the payload together with an importable reference to the value's class, so custom rehydratable classes must be defined at module top level, not inside a function. __restore__(payload, context) receives that payload and returns the restored step value.
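The sketch below imitates that storage shape with pickle and importlib so the top-level requirement is visible: the class is found again by module and qualified name, which fails for classes defined inside a function because their __qualname__ contains <locals>. Counter, store_value, and restore_value are hypothetical names for illustration; Journey's real storage format is not documented here, and context is simply passed as None.

```python
import importlib
import pickle
from typing import Any


class Counter:
    """Hypothetical rehydratable value used for this sketch."""

    def __init__(self, count: int) -> None:
        self.count = count

    def __store__(self, context: Any) -> dict[str, int]:
        return {"count": self.count}

    @classmethod
    def __restore__(cls, payload: dict[str, int], context: Any) -> "Counter":
        return cls(payload["count"])


def store_value(value: Any, context: Any = None) -> bytes:
    cls = type(value)
    # Keep an importable class reference next to the pickle-serializable payload.
    record = (cls.__module__, cls.__qualname__, value.__store__(context))
    return pickle.dumps(record)


def restore_value(blob: bytes, context: Any = None) -> Any:
    module_name, qualname, payload = pickle.loads(blob)
    obj: Any = importlib.import_module(module_name)
    for part in qualname.split("."):
        # A function-local class has "<locals>" in its qualname, so this lookup fails.
        obj = getattr(obj, part)
    return obj.__restore__(payload, context)
```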

The context object describes where and why the value is being stored or restored. Use context.artifact_root for larger file artifacts. Inspect context.boundary_kind and context.boundary_id when a value needs different behavior for active state, step bindings, or branch-anchor snapshots.

Restored values should be usable as later step inputs. For values backed by live external resources, store enough data to reopen the resource explicitly in the next step instead of trying to pickle the live resource itself. Official tools follow this pattern: JourneyPlaywrightPage stores browser state, and later steps reopen it with open_page(saved_page).
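Before enabling --state, retries, or branch(start_from=...), a quick way to find step values that will need the rehydration protocol is to try pickling them. is_picklable is a hypothetical helper, not a journeysdk API; it checks only the plain pickle path and never calls __store__.

```python
import pickle
import threading
from typing import Any


def is_picklable(value: Any) -> bool:
    """Return True if pickle can serialize value (no rehydration hooks involved)."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Plain data can cross replay boundaries as-is; a live lock cannot.
print(is_picklable({"order_id": "ord_123", "items": [1, 2]}))  # True
print(is_picklable(threading.Lock()))  # False
```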

Step Lifecycle

Official tools that open live resources inside a step should return an object with the standard context-manager __exit__(exc_type, exc, traceback) method. Each step attempt has six phases:

  1. Initialization: Journey restores saved values, calls __restore__ hooks when needed, and resolves the arguments passed to the step function.
  2. Execution: Journey calls the step function. The function may succeed, fail, retry, or be interrupted.
  3. Storage: Journey calls __store__ hooks when needed and stores the step inputs plus the returned value in the state file.
  4. Pre-exit: --develop-step pauses here after a matched step, with returned handles still live.
  5. Exit: Journey discovers returned __exit__ handles and closes them before the next step runs.
  6. Post-exit: a graceful CLI Ctrl-C stops here after the completed step has been saved and exited.

In noninteractive --develop-step mode, Journey stores the returned value, pauses at pre-exit, then closes returned handles before the command exits. With --develop-step --interactive, Journey shows the continue/retry prompt while those handles are still live, then closes them after the user chooses continue or retry, or cancels the prompt.

Use this pattern when a tool owns a resource that should not outlive the step attempt:

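class ResourceHandle:
    def __init__(self):
        # acquire_resource() stands in for your tool's own resource setup.
        self._resource = acquire_resource()
        self._closed = False

    def do_work(self):
        return self._resource.do_work()

    def __exit__(self, exc_type, exc, traceback):
        # Idempotent: a second call is a no-op.
        if self._closed:
            return
        self._closed = True
        self._resource.close()


def open_resource():
    return ResourceHandle()


def use_resource():
    handle = open_resource()
    handle.do_work()
    # Returning the handle lets Journey call __exit__ at the step's exit phase.
    return handle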

Journey looks for lifecycle handles in the direct step result and inside built-in tuple, list, and dict containers. It de-duplicates handles by object identity and calls __exit__ in reverse discovery order. On successful step returns, __exit__ receives (None, None, None). Journey ignores the return value, so __exit__ cannot suppress cleanup failures.

The important constraint is visibility: Journey only auto-exits handles it can see in the returned value graph. A live local resource that is not returned is outside this protocol. Either return the handle, return a container that contains it, or close it explicitly with local try / finally code.
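The discovery rule can be modeled in a few lines. This is a sketch of the documented behavior, not Journey's code; it assumes nested containers are searched recursively and that only dict values (not keys) are inspected.

```python
from typing import Any


def discover_handles(result: Any) -> list[Any]:
    """Collect objects with __exit__ from a step result, deduplicated by identity."""
    found: list[Any] = []
    seen: set[int] = set()

    def visit(value: Any) -> None:
        if id(value) in seen:
            return  # already visited: de-duplicate by object identity
        seen.add(id(value))
        if hasattr(value, "__exit__"):
            found.append(value)
        elif isinstance(value, (tuple, list)):
            for item in value:
                visit(item)
        elif isinstance(value, dict):
            for item in value.values():
                visit(item)

    visit(result)
    return found


def exit_handles(result: Any) -> None:
    """Call __exit__ in reverse discovery order; return values are ignored."""
    for handle in reversed(discover_handles(result)):
        handle.__exit__(None, None, None)
```

A handle created inside the step but left out of the returned value never reaches discover_handles, which is why it must be returned or closed locally.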

Keep lifecycle methods idempotent, and close only resources owned by that tool call. If the step returns a value that must survive retries, --state, or branch replay, that value should also implement the Journey rehydration protocol above; do not rely on pickling live resources. JourneyPlaywrightPage is the canonical example because it implements both protocols: __exit__ closes the live browser objects at step exit, while __store__ / __restore__ save enough browser state for a later step to reopen the page explicitly.

Official tools are ordinary Python helpers that return step callables or serializable helper values. For example, the webhook tool can acquire a Journey Cloud-hosted endpoint before the app under test sends to it:

from journeysdk import step
from journeysdk.tools.webhook import get_webhook_endpoint, wait_for_webhook_request

endpoint = step(get_webhook_endpoint(path="/invoice-paid"))
step(send_invoice_paid_callback, endpoint.url)
request_payload = step(
    wait_for_webhook_request(path="/invoice-paid", timeout=1, poll_interval=0.1),
    endpoint,
    retry=3,
    retry_delay=1,
)

The official email tool follows the same step-oriented model and uses the default hosted inbox assigned to the active Journey Cloud API key:

from journeysdk import step
from journeysdk.tools.email import get_email_inbox, send_email, wait_for_email

inbox = step(get_email_inbox())
step(send_email(subject="Welcome", text_body="Hello from Journey"))
message = step(
    wait_for_email(subject_contains="Welcome", timeout=1, poll_interval=0.1),
    inbox,
)
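The waiting helpers take timeout and poll_interval arguments, which suggests a poll-until-timeout shape. A generic version for app-specific checks might look like the sketch below. poll_until is hypothetical, not the SDK's implementation, and it assumes the step should fail with TimeoutError so that step(..., retry=...) can try the whole wait again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]],
    timeout: float,
    poll_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch until it returns a non-None value or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            # Raising lets a retrying step try the whole wait again.
            raise TimeoutError(f"nothing arrived within {timeout}s")
        sleep(poll_interval)
```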

The Docker tool can start a local Compose app as a step value and pair a step anchor with an exact rollback of container filesystems and Docker-managed volume contents, so each branch that starts from that anchor sees the same app state. DockerComposeStack already implements the rehydration protocol:

from journeysdk import branch, step
from journeysdk.tools.docker import run_docker

stack = step(run_docker(compose_file="docker-compose.yml"))
baseline = step(capture_baseline_state, stack)
if branch(start_from=baseline):
    step(mutate_compose_app, stack)
elif branch(start_from=baseline):
    step(assert_compose_logs, stack)

Current Docker snapshots are intentionally strict: bind mounts, external volumes, read-only mounts, and multi-container services are rejected so restore can stay exact and predictable.

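A retried step can also replay from an earlier step by passing that step's value as retry_from. Here, invoice_paid is retried up to 15 times with a 2-second delay, and each retry replays from the create_subscription step:

from journeysdk import step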

created = step(create_subscription)
step(
    invoice_paid,
    created,
    retry=15,
    retry_delay=2,
    retry_from=created,
)

The Playwright tool packages one page into a resumable step value:

from journeysdk.tools.playwright import (
    JourneyPlaywrightPage,
    open_page,
)

def login_and_capture_session():
    page = open_page("https://app.example/login")
    page.get_by_role("button", name="Sign in").click()
    page.wait_for_url("**/dashboard")
    return page

def assert_dashboard(session: JourneyPlaywrightPage) -> JourneyPlaywrightPage:
    page = open_page(session)
    assert page.url.endswith("/dashboard")
    return page

The same live page can also run a bounded LLM action loop. By default, page.prompt(...) returns a plain string. Pass output=... when you want LangChain structured output as a dictionary:

from journeysdk.tools.playwright import open_page

def capture_popup_title() -> dict[str, object]:
    page = open_page("https://app.example/login")
    try:
        return page.prompt(
            'click on a "Sign in" button and get the title of the opened popup',
            model="anthropic:claude-sonnet-4-5",
            memory="sign-in-popup",
            output={
                "popup_title": "The title of the opened popup.",
            },
        )
    finally:
        page.__exit__(None, None, None)

Set provider credentials with the provider's normal environment variables such as OPENAI_API_KEY or ANTHROPIC_API_KEY, and either pass a LangChain model identifier like model="anthropic:claude-sonnet-4-5" or set JOURNEY_PLAYWRIGHT_PROMPT_MODEL. The optional memory="sign-in-popup" argument stores a replayable fast path from successful runs in sign-in-popup.memory.md beside the journey source; pass --no-memory when you want a run to ignore and avoid updating prompt memory, or --no-memory-update when you want to read existing memory without writing new updates. The optional output={...} argument maps field names to descriptions or JSON-schema fragments and stores a dict[str, object] return value instead of plain text. If the browser task cannot be completed because the page shows a blocking app state, such as a locked account or invalid credentials, page.prompt(...) raises RuntimeError instead of returning successful prompt output.
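The memory options combine as follows. memory_plan is a hypothetical helper that only restates the rules above (the file name, its location beside the journey source, and the effect of --no-memory and --no-memory-update); it is not part of the SDK, and it does not model the detail that updates come only from successful runs.

```python
from pathlib import Path
from typing import NamedTuple


class MemoryPlan(NamedTuple):
    path: Path
    read: bool
    write: bool


def memory_plan(
    journey_source: str,
    memory: str,
    no_memory: bool = False,
    no_memory_update: bool = False,
) -> MemoryPlan:
    """Where prompt memory lives and whether a run reads or updates it."""
    path = Path(journey_source).with_name(f"{memory}.memory.md")
    if no_memory:
        return MemoryPlan(path, read=False, write=False)  # ignore and do not update
    if no_memory_update:
        return MemoryPlan(path, read=True, write=False)  # read existing memory only
    return MemoryPlan(path, read=True, write=True)
```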

Interrupted executions can also be resumed with journey --state run.state. When state persistence is enabled, Journey stores the step inputs and outputs it may need to replay later, so those values must be pickle-serializable. In the CLI, the first Ctrl-C during an active step lets that step finish storage and exit before the command stops; the next run continues after that step. Press Ctrl-C a second time to stop inside the dirty step, which restarts from the top later with the same inputs. The same replay rule applies to steps that may be replayed because of retries or branch(start_from=...). The state file is kept after the run finishes, so rerunning the same command can reuse that saved progress; delete the file when you want to start fresh.
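The two-stage Ctrl-C rule follows a common signal-handling pattern. A toy version with the standard signal module might look like this. TwoStageInterrupt is hypothetical and only illustrates the rule: the first SIGINT asks the run to stop at the next step boundary, and the second raises KeyboardInterrupt inside the running step.

```python
import signal
from types import FrameType
from typing import Optional


class TwoStageInterrupt:
    """First Ctrl-C: stop after the current step. Second Ctrl-C: stop now."""

    def __init__(self) -> None:
        self.stop_requested = False

    def handle(self, signum: int, frame: Optional[FrameType]) -> None:
        if self.stop_requested:
            # Second press: abort inside the dirty step.
            raise KeyboardInterrupt
        # First press: let the current step finish storage and exit.
        self.stop_requested = True

    def install(self) -> None:
        signal.signal(signal.SIGINT, self.handle)
```

A runner using this pattern would check stop_requested after each step's post-exit phase and stop cleanly when it is set.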

How it works

  1. Write one journey spec in Python using journey, step, branch, and documented helpers from journeysdk.tools.
  2. Run journey, which compiles branch choices into linear executable cases and executes them.
  3. Use branch(start_from=...), retries, and state files to replay from durable step boundaries instead of rerunning every expensive setup step.
  4. Use --state when a long test may be interrupted while waiting on async work or a third-party service.
  5. Use --step or --develop-step when you only want the case that reaches one target step label.
  6. Use page.prompt(..., memory=...) when a browser step is easier to describe than hand-maintain with selectors.

Journey-owned CLI output is emitted on stdout through the Journey logger. The default pretty format is meant for humans at a terminal, for example:

      create_customer_profile  ok attempt=1 duration=0.012s

Use --output structured for the logfmt-style [journey] time=... component=... event=... format, or --output jsonl when tooling should consume newline-delimited JSON log records. Use --log-level debug|info|warning|error|off to tune output. The default is info; --log-level off suppresses all Journey-owned output.
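For tooling that consumes --output jsonl, reading the stream comes down to decoding one JSON object per line. The sketch assumes records carry an event field, mirroring the event=... key of the structured format; check real output before relying on specific field names. The sample lines are made up.

```python
import json
from typing import Any, Iterable, Iterator


def read_journey_records(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded record per non-empty line of JSONL output."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample lines, not real Journey output.
sample = ['{"event": "demo_a"}', "", '{"event": "demo_b"}']
events = [record.get("event") for record in read_journey_records(sample)]
print(events)  # ['demo_a', 'demo_b']
```

In practice the lines would come from the stdout of a `uv run journey --output jsonl` subprocess.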

CLI commands discover functions annotated with @journey in the current directory. Use --file to scope to one file, --journey to scope to one decorated function name, and --step to execute only the single flow that reaches a target step label. A targeted run still starts from the selected case's beginning; a replay_anchor in the report identifies the branch step anchor but does not mean Journey skipped shared setup.

Use --develop-step to run that same single case in development mode. By default it executes one target step, stores state, prints the paused result, and exits, so coding agents can iterate with synchronous command calls. Run the same --develop-step LABEL --state dev.state command to retry that step from its replay boundary, or target the next step with the same state file to continue. Add --interactive to keep the current process open and prompt after each paused step.

Develop-step retries are unlimited and do not spend the step's configured step(..., retry=...) budget. Each retry or continue reloads and recompiles the journey file first, so edits to the current step, later steps, or future journey structure are picked up. If the already-run part of the selected case changed, Journey starts that case over so the reused prefix is not stale.
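One way to picture the stale-prefix check is a fingerprint over the code of the steps that already ran. The sketch below is hypothetical: Journey does not document how it detects changes, and here each already-run step is represented only by its source text.

```python
import hashlib
from typing import Sequence


def prefix_fingerprint(step_sources: Sequence[str]) -> str:
    """Hash the source of every already-run step, in order."""
    digest = hashlib.sha256()
    for source in step_sources:
        encoded = source.encode("utf-8")
        # Length-prefix each entry so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def must_restart_case(saved_fingerprint: str, current_prefix: Sequence[str]) -> bool:
    """Start the case over when code that already ran has changed."""
    return prefix_fingerprint(current_prefix) != saved_fingerprint
```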

Core principles

  • One journey spec for all paths: author the full user journey once and let branch() compile the executable cases.
  • Replay from a step: use branch(start_from=...), retries, and targeted runs to reuse saved setup from durable step boundaries.
  • Interrupt long waits, resume later: keep long journeys restartable by saving progress between steps with --state.
  • Cloud tools for external tests: integrate hosted inboxes, webhooks, browser pages, Docker snapshots, and app-specific channel or service code without forcing them into a custom DSL.
  • AI-generated steps with page.prompt(...): describe browser work in natural language and let prompt memory make repeat runs faster.
  • Native with AI coding assistants: keep tests in ordinary Python files so coding agents can generate, edit, run, and debug them beside application code.

Quick start

Execute all compiled cases:

uv run journey

The default output shows the compiled cases first, then a concise execution timeline. Add --output structured when you need logfmt fields, or --output jsonl for one parseable JSON object per line.

Execute with persisted state so Ctrl-C can be resumed later:

uv run journey --state run.state

Execute only the case that reaches a target step label:

uv run journey --step assert_local_file_contents

Execute one target case in development mode and stop after the target step:

uv run journey --develop-step assert_local_file_contents --state dev.state

Rerun that command to retry the same step after editing code. To continue, target the next step with the same state file. For a human prompt loop, add --interactive:

uv run journey --develop-step assert_local_file_contents --state dev.state --interactive

The cloud webhook and email helpers use JOURNEY_CLOUD_API_KEY and JOURNEY_CLOUD_BASE_URL at execution time. Point those variables at your hosted cloud control plane or any compatible service:

export JOURNEY_CLOUD_API_KEY=journey-demo-key
export JOURNEY_CLOUD_BASE_URL=https://journey-cloud.example.test

The official webhook and email SDK tools require Journey Cloud; the SDK no longer hosts local webhooks or talks directly to SMTP/IMAP servers.
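Journey Cloud authenticates control-plane calls with Authorization: Bearer $JOURNEY_CLOUD_API_KEY. If your own code needs to reach the same service, a minimal sketch of building the request settings from those environment variables follows. cloud_request_settings is a hypothetical helper, and no endpoint paths are assumed.

```python
import os
from typing import Mapping


def cloud_request_settings(env: Mapping[str, str] = os.environ) -> tuple[str, dict[str, str]]:
    """Return (base_url, headers) for Journey Cloud control-plane calls."""
    api_key = env.get("JOURNEY_CLOUD_API_KEY")
    base_url = env.get("JOURNEY_CLOUD_BASE_URL")
    if not api_key or not base_url:
        raise RuntimeError("set JOURNEY_CLOUD_API_KEY and JOURNEY_CLOUD_BASE_URL")
    # Bearer-token scheme used for SDK control-plane calls.
    headers = {"Authorization": f"Bearer {api_key}"}
    return base_url.rstrip("/"), headers
```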

Journey Cloud authenticates SDK control-plane calls with Authorization: Bearer $JOURNEY_CLOUD_API_KEY, and ownership should work the same way across all Journey Cloud tools: the first API key that reserves a cloud-managed handle becomes its owner. A webhook path, mail inbox, or similar cloud-managed identifier therefore belongs to the API key that claimed it first, and other API keys should not be able to reserve or manage that handle afterward.
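The first-claim ownership rule can be modeled as an in-memory registry. HandleRegistry is a hypothetical toy; Journey Cloud's actual storage and error responses are not described here, and the handle string format in the usage is invented for illustration.

```python
import threading


class HandleRegistry:
    """First API key to reserve a handle owns it; other keys are refused."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._lock = threading.Lock()

    def reserve(self, handle: str, api_key: str) -> bool:
        """Reserve handle for api_key; True if api_key owns it afterward."""
        with self._lock:
            # setdefault keeps the first owner and ignores later claims.
            owner = self._owners.setdefault(handle, api_key)
            return owner == api_key

    def can_manage(self, handle: str, api_key: str) -> bool:
        with self._lock:
            return self._owners.get(handle) == api_key
```

The lock makes check-and-claim atomic, so two keys racing for the same webhook path cannot both become its owner.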

Testing

Run the full framework suite from this root:

uv run pytest

Smoke test the built package and CLI locally:

./scripts/smoke_test_package.sh

See docs/README.md for the runnable handbook. It starts with one journey spec for all paths, then walks through replay from a step, retries, interrupting long waits and resuming later, browser automation with page.prompt(...), Journey Cloud tools, and debugging failure modes with code, commands, and expected CLI output.

Download files

Source Distribution

journey_sdk-0.1.0.tar.gz (156.5 kB)

Uploaded Source

Built Distribution

journey_sdk-0.1.0-py3-none-any.whl (112.7 kB)

Uploaded Python 3

File details

Details for the file journey_sdk-0.1.0.tar.gz.

File metadata

  • Download URL: journey_sdk-0.1.0.tar.gz
  • Size: 156.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.14

File hashes

Hashes for journey_sdk-0.1.0.tar.gz
Algorithm Hash digest
SHA256 af4db898d09e6038d4bebd711ec3291070fc74992d085fdd7acb15cce08f8a19
MD5 14716f355911aa63adae5bb3548ac70e
BLAKE2b-256 5c828e4b129688d67f0f47ca352c89647b738fa86e4b512ad1ba832a4d52e46a

File details

Details for the file journey_sdk-0.1.0-py3-none-any.whl.

File hashes

Hashes for journey_sdk-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1f93627784133713c9fb04d91f6b1617d9766a60944aa9d5857bdcfb206d3f93
MD5 92c74259a9aadf4936ba35b678b7c4a3
BLAKE2b-256 dd7581c9f55b8a3d7d7237871217687b2eb73f9a3ee8160413058733be4e4862
