Wendell CLI for playbook-generated agent test suites and run uploads
Wendell CLI
Wendell is a Python CLI for evaluating agents against Wendell-managed worlds and scenario packs.
The first version is advisory by default: it reports scores, captures traces, and exits with status 0 unless the project explicitly enables blocking gates.
Intended split
- Wendell system: creates worlds, versions scenario packs, owns rubrics, stores traces, and reports regressions.
- Wendell CI runner: runs in a repo or CI job, fetches or reads a scenario pack, invokes the customer's agent adapter, captures traces, evaluates gates, uploads results, and prints a concise summary.
Install
Install the latest published CLI:
uv tool install wendell
Alternative installers:
pipx install wendell
python3 -m pip install --user wendell
The packaged CLI includes Wendell's local compiler/runtime engine. It does not
require an OpenAI, OpenRouter, Anthropic, or other LLM provider key to install,
register, compile, or run tests. Your own agent_command may need provider
credentials if your agent uses an LLM.
For unreleased changes from current main:
python3 -m pip install "git+https://github.com/croppia/wendellai.git#subdirectory=wendell-ci"
For release validation from this monorepo, run:
scripts/smoke_wendell_production_app_install.sh
That smoke test creates a clean app directory, installs the packaged wendell CLI
into a fresh virtualenv, runs wendell init, and runs wendell test without
using repo-local PYTHONPATH. The smoke adapter asserts the customer-facing
payload uses wendell.agent_input.v1 and does not expose rubrics, hidden facts,
source lineage, terminal outcomes, or success/failure criteria.
The package installs the wendell command. It also installs the temporary local-wendell alias for existing local dogfood scripts.
Local development
cd wendell-ci
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest
wendell --help
Login
For a first-time Wendell runner, install and register the CLI:
curl -fsSL https://www.wendellai.com/install | bash
wendell register
Registration creates a short-lived runner session, asks for email/password and agent details, and stores a scoped local runner credential.
For local development with an existing InkPass API key, store it once:
wendell login --api-key-stdin --validate
wendell auth status
Credentials are stored at ~/.config/wendell/credentials.json by default, or $WENDELL_CONFIG_HOME/credentials.json when set. The directory is created with 0700 permissions and the credentials file with 0600 permissions.
CI should continue to use WENDELL_INKPASS_API_KEY; environment variables take precedence over stored credentials.
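The lookup order described above can be sketched as follows. This is a minimal illustration, not Wendell's actual implementation: the function names are hypothetical, and the `api_key` field inside credentials.json is an assumption, since the file's schema is not documented here.

```python
import json
import os
from pathlib import Path


def credentials_path(env=os.environ):
    """Return the credentials file location described above."""
    config_home = env.get("WENDELL_CONFIG_HOME")
    if config_home:
        return Path(config_home) / "credentials.json"
    return Path.home() / ".config" / "wendell" / "credentials.json"


def resolve_api_key(env=os.environ):
    """Prefer the environment variable, then fall back to the stored file."""
    key = env.get("WENDELL_INKPASS_API_KEY")
    if key:
        return key
    path = credentials_path(env)
    if path.is_file():
        # Assumption: the stored JSON keeps the key under "api_key".
        return json.loads(path.read_text()).get("api_key")
    return None
```

Passing a plain dict as `env` makes the precedence easy to check without touching the real environment.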
Minimal config
project = "support-agent"
mode = "advisory"
api_key_env = "WENDELL_INKPASS_API_KEY"
world = "world_support_ops_v1"
world_version = "2026-05-08.1"
scenario_pack = "smoke"
scenario_pack_version = "1.0.0"
agent_command = "python examples/simple_cli_agent.py"
upload_traces = true
[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
Production app test loop
Use wendell test as the local and CI entry point for production agents. Unlike
the legacy top-level command, wendell test requires a real agent_command and
will not silently fall back to a demo agent.
Fast start from an application repo:
wendell init --project support-agent
export WENDELL_APP_AGENT_COMMAND="python scripts/run_my_agent.py"
wendell test --config wendell.toml
Your adapter receives a production-facing JSON payload. Wendell does not send the scoring rubric, hidden facts, expected outcomes, or success/failure criteria to the agent process.
{
"schema_version": "wendell.agent_input.v1",
"task": "Respond as a support agent for Support Agent.",
"policies": ["Inspect the request before completing the workflow."],
"transcript": [{"speaker": "customer", "text": "I need help with this request."}],
"available_tools": [
{
"name": "workflow_console.inspect_request",
"arguments": {"case_id": "str"},
"description": "inspect request"
}
],
"case": {"case_id": "case_123", "request": "I need help with this request."},
"scenario": {"id": "case_123", "kind": "realistic"}
}
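An adapter can defensively check the payload before using it. The sketch below verifies the schema version and scans for keys that look like scoring internals. The key names in `FORBIDDEN_KEYS` are hypothetical guesses for illustration; the README only says such data is not sent, not what it would be called.

```python
EXPECTED_SCHEMA = "wendell.agent_input.v1"

# Hypothetical key names; not taken from Wendell's schema.
FORBIDDEN_KEYS = {
    "rubric", "rubrics", "hidden_facts",
    "expected_outcomes", "success_criteria", "failure_criteria",
}


def find_forbidden_keys(value, path=""):
    """Recursively collect the paths of dict keys listed in FORBIDDEN_KEYS."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in FORBIDDEN_KEYS:
                found.append(child_path)
            found.extend(find_forbidden_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_forbidden_keys(child, f"{path}[{index}]"))
    return found


def check_payload(payload):
    """Raise ValueError if the payload has an unexpected shape."""
    if payload.get("schema_version") != EXPECTED_SCHEMA:
        raise ValueError("unexpected schema_version")
    leaked = find_forbidden_keys(payload)
    if leaked:
        raise ValueError(f"payload exposes scoring data: {leaked}")
```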
The adapter must print JSON:
{
"message": "I inspected the request and recorded the required evidence.",
"tool_calls": [
{"name": "workflow_console.inspect_request", "args": {"case_id": "case_123"}}
],
"metrics": {}
}
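To make the input and output shapes concrete, here is a minimal adapter sketch. It assumes the payload arrives as JSON on standard input, which this README does not state; check the generated scripts/wendell_agent_adapter.py for the actual transport. The decision logic is a placeholder standing in for a call to your production agent, so do not treat it as a production test.

```python
import json
import sys


def respond(payload):
    """Build a response in the output shape shown above."""
    case_id = payload.get("case", {}).get("case_id")
    tool_names = {tool["name"] for tool in payload.get("available_tools", [])}
    tool_calls = []
    # Placeholder logic: a real adapter would invoke the production agent
    # and map its actions back to Wendell tool names.
    if case_id and "workflow_console.inspect_request" in tool_names:
        tool_calls.append(
            {"name": "workflow_console.inspect_request", "args": {"case_id": case_id}}
        )
    return {
        "message": "Inspected the request.",
        "tool_calls": tool_calls,
        "metrics": {},
    }


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON payload (assumed to be on stdin) and print one JSON response."""
    payload = json.load(stdin)
    json.dump(respond(payload), stdout)
    stdout.write("\n")
```

To use a file like this as an agent_command, call `main()` under the usual `if __name__ == "__main__":` guard at the bottom of the script.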
wendell init creates:
- playbook.md: the business behavior contract to edit with your team
- .wendell/suite.json: the generated, reviewable test suite
- scripts/wendell_agent_adapter.py: the adapter boundary for your production agent
- wendell.toml: the CI-ready test config
The generated adapter is intentionally not a fake passing agent. It requires
WENDELL_APP_AGENT_COMMAND or replacement code that calls your production agent
and maps its real actions back to Wendell tool names. For install-only smoke
validation, run wendell init --example-agent; do not use that adapter as a
production test.
To compile an existing Playbook instead:
wendell playbook compile \
--source ./playbook.md \
--name "Support Agent Regression" \
--workflow-summary "Evaluate support agents against the approved workflow." \
--output ./.wendell/support-agent-suite.json \
--config-output ./wendell.toml \
--agent-command "python scripts/run_agent_adapter.py"
wendell test --config wendell.toml
wendell test --config wendell.toml --json
wendell playbook compile is a local, deterministic bootstrap path. It turns a
markdown Playbook into a reviewable customer-input suite JSON using the same
Playbook assertion compiler that powers generated suites. Teams can commit the
generated suite, review it in code review, and run it in CI.
For CI, set mode = "blocking" so failed gates return a nonzero exit code:
project = "support-agent"
mode = "blocking"
worldsim_input = "configs/customer_inputs/support_agent_suite.json"
agent_command = "python scripts/run_agent_adapter.py"
upload_traces = false
[gates]
suite_min_score = 0.90
scenario_min_score = 0.85
critical_failures_allowed = 0
Failure output includes the failed scenario, gate reason, incomplete workflow steps, Playbook rule id, assertion id, trajectory event indexes, and fix prompts when the suite provides assertions.
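The relationship between gates, mode, and exit code can be sketched as below. This is an illustration under stated assumptions, not the CLI's scoring code: the suite score is assumed to be the mean of scenario scores, the nonzero exit code is assumed to be 1, and both function names are hypothetical.

```python
def evaluate_gates(scenario_scores, critical_failures, gates):
    """Return a list of gate failure messages (empty when every gate passes)."""
    if not scenario_scores:
        raise ValueError("no scenario scores to evaluate")
    failures = []
    # Assumption: the suite score is the mean of the scenario scores.
    suite_score = sum(scenario_scores.values()) / len(scenario_scores)
    if suite_score < gates["suite_min_score"]:
        failures.append(f"suite score {suite_score:.2f} below minimum")
    for scenario_id, score in sorted(scenario_scores.items()):
        if score < gates["scenario_min_score"]:
            failures.append(f"scenario {scenario_id} score {score:.2f} below minimum")
    if critical_failures > gates["critical_failures_allowed"]:
        failures.append(f"{critical_failures} critical failures exceed allowance")
    return failures


def exit_code(mode, failures):
    """Advisory mode always exits 0; blocking mode exits 1 when any gate failed."""
    return 1 if mode == "blocking" and failures else 0


gates = {"suite_min_score": 0.90, "scenario_min_score": 0.85, "critical_failures_allowed": 0}
failures = evaluate_gates({"refund": 0.95, "escalation": 0.80}, 0, gates)
print(failures, exit_code("blocking", failures))
```

Here the scenario names `refund` and `escalation` are made up for the example.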
What to commit
Commit these files:
- playbook.md
- .wendell/suite.json
- wendell.toml
- your production adapter, usually scripts/wendell_agent_adapter.py
Do not commit secrets, raw customer transcripts, API keys, or generated run outputs. Keep production credentials in your CI secret store.
GitHub Actions
name: Wendell Agent Tests
on:
pull_request:
push:
branches: [main]
jobs:
wendell:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install Wendell
run: python -m pip install wendell
- name: Run Wendell
env:
WENDELL_APP_AGENT_COMMAND: python scripts/run_my_agent.py
run: wendell test --config wendell.toml
For hosted Wendell reporting, keep upload_traces = true and provide
WENDELL_INKPASS_API_KEY from CI secrets. Add api_url only when targeting a
staging, preview, or local API.
Wendell dogfood suite
This repo includes a production-style dogfood suite for Wendell's own Playbook-to-test-suite behavior:
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
The suite lives at
configs/customer_inputs/wendell_playbook_compiler_quality_input.json and is
derived from configs/playbooks/wendell_playbook_compiler_quality.md.
The real Wendell Playbook compiler adapter is expected to pass:
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
The intentionally broken agent is expected to fail with rule-linked evidence:
uv run --project wendell-ci --extra dev --with-editable . wendell test \
--config wendell.dogfood.toml \
--agent-command "python examples/wendell_playbook_compiler_broken_agent.py"
The intended production flow is:
- local-wendell runs on the developer machine or CI worker.
- It fetches the pinned world, scenario pack, rubrics, and scoring contract from Wendell.
- It invokes the agent locally through an adapter such as agent_command.
- It captures local traces, redacts them, and uploads traces/results back to Wendell.
- Advisory mode exits 0; blocking mode exits nonzero only when explicitly enabled.
Remote uploads authenticate with an InkPass API key sent as X-API-Key. The key needs at least wendell:worlds:read, wendell:runs:create, and wendell:runs:read.
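As a rough illustration of the header and scope requirements, the sketch below builds (but does not send) an authenticated request with the standard library and reports which required scopes a key lacks. The URL is a placeholder and the helper names are hypothetical; the real upload endpoint and request body are not documented here.

```python
import urllib.request

REQUIRED_SCOPES = {
    "wendell:worlds:read",
    "wendell:runs:create",
    "wendell:runs:read",
}


def missing_scopes(granted):
    """Return the required scopes that are absent from the granted ones."""
    return sorted(REQUIRED_SCOPES - set(granted))


def build_upload_request(url, api_key, body):
    """Build a POST request that carries the key in the X-API-Key header."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and key; nothing is sent over the network here.
request = build_upload_request("https://example.invalid/runs", "example-key", b"{}")
print(request.get_method(), missing_scopes(["wendell:runs:read"]))
```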
Local worldsim dogfood config
While Wendell Cloud APIs are still taking shape, the runner can call the existing local worldsim.services layer directly:
project = "workspace-access-agent"
mode = "advisory"
world = "workspace_access_support"
scenario_pack = "smoke"
worldsim_input = "../configs/customer_inputs/workspace_access_support_input.json"
agent = "careful"
[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
Supported local built-in agents are careful and risky.