Wendell CLI for playbook-generated agent test suites and run uploads
Wendell CLI
Wendell is a Python CLI for running production agents against hosted Wendell test suites.
The CLI is advisory by default: it reports scores, captures traces, and exits with status 0 unless the project explicitly enables blocking gates. In blocking mode, wendell run returns a nonzero exit code when gates fail.
Intended split
- Wendell system: turns reviewed Playbooks into hosted test suites, owns rubrics, stores traces, and reports regressions.
- Wendell CLI runner: runs in a repo or CI job, fetches scenario work from a published suite, invokes the customer's agent adapter, uploads turns/results, and prints a concise summary.
Install
Install the latest published CLI:
uv tool install --force wendell
Alternative installers:
pipx install --force wendell
python3 -m pip install --user --upgrade wendell
The packaged CLI does not require an OpenAI, OpenRouter, Anthropic, or other LLM
provider key to install, register, create Playbook drafts, or run hosted suites.
Wendell's hosted service generates suites from reviewed Playbooks. Your own
agent_command may need provider credentials if your agent uses an LLM.
For unreleased changes from current main:
python3 -m pip install "git+https://github.com/croppia/wendellai.git#subdirectory=wendell-ci"
For release validation from this monorepo, run:
scripts/smoke_wendell_production_app_install.sh
The smoke test creates a clean app directory, installs the packaged wendell CLI
into a fresh virtualenv, and exercises the hosted-suite contract path against a
local mock API without using a repo-local PYTHONPATH. Its adapter asserts that
the customer-facing payload uses wendell.agent_input.v1 and does not expose
rubrics, hidden facts, source lineage, terminal outcomes, or success/failure
criteria. The smoke test also asserts that the public package does not ship the
internal worldsim package.
The package installs the wendell command only. Customer-facing examples and
CI jobs should use wendell run, not compatibility aliases.
Local development
cd wendell-ci
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest
wendell --help
Login
For a first-time Wendell runner, install and register the CLI:
curl -fsSL https://www.wendellai.com/install | bash
wendell register
Registration creates a short-lived runner session, asks for email/password and agent details, and stores a scoped local runner credential.
For local development with an existing InkPass API key, store it once:
wendell login --api-key-stdin --validate
wendell auth status
Credentials are stored at ~/.config/wendell/credentials.json by default, or $WENDELL_CONFIG_HOME/credentials.json when set. The directory is created with 0700 permissions and the credentials file with 0600 permissions.
CI should continue to use WENDELL_INKPASS_API_KEY; environment variables take precedence over stored credentials.
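The lookup order above (environment variable first, then the stored credentials file) can be sketched as follows. This is an illustration, not the CLI's implementation: `credentials_path`, `write_credentials`, and `resolve_api_key` are hypothetical helpers, and the `api_key` field name inside credentials.json is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_path(env: Mapping[str, str]) -> Path:
    """Return the credentials location described in this README."""
    config_home = env.get("WENDELL_CONFIG_HOME")
    if config_home:
        return Path(config_home) / "credentials.json"
    return Path.home() / ".config" / "wendell" / "credentials.json"


def write_credentials(path: Path, api_key: str) -> None:
    """Store a key with 0700 directory and 0600 file permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)
    # Open with 0600 from the start so the key is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"api_key": api_key}, handle)  # hypothetical field name
    os.chmod(path, 0o600)  # enforce 0600 even if umask or an old file differs


def resolve_api_key(env: Mapping[str, str]) -> Optional[str]:
    """Environment variable takes precedence over stored credentials."""
    from_env = env.get("WENDELL_INKPASS_API_KEY")
    if from_env:
        return from_env
    path = credentials_path(env)
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    return data.get("api_key")  # hypothetical field name
```

Passing the environment as a mapping keeps the precedence rule easy to test without touching the real os.environ.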
Minimal config
project = "support-agent"
mode = "advisory"
agent = "support_agent"
agent_command = "python scripts/wendell_agent_adapter.py"
agent_timeout_seconds = 120
upload_traces = true
[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
When you run a hosted suite with wendell run --suite, Wendell resolves the
hosted world, version, scenario pack, and scoring contract from the published
suite.
Production app test loop
Use wendell run as the local and CI entry point for production agents.
wendell suites configure --suite refund-agent-regression --project support-agent
wendell run --suite refund-agent-regression --config wendell.toml
Your adapter receives a production-facing JSON payload. Wendell does not send the scoring rubric, hidden facts, expected outcomes, or success/failure criteria to the agent process.
{
"schema_version": "wendell.agent_input.v1",
"task": "Respond as a support agent for Support Agent.",
"policies": ["Inspect the request before completing the workflow."],
"transcript": [{"speaker": "customer", "text": "I need help with this request."}],
"available_tools": [
{
"name": "workflow_console.inspect_request",
"arguments": {"case_id": "str"},
"description": "inspect request"
}
],
"case": {"case_id": "case_123", "request": "I need help with this request."},
"scenario": {"id": "case_123", "kind": "realistic"}
}
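An adapter might start by checking the schema version and collecting the tool names it is allowed to call. A minimal sketch: `read_agent_input` is a hypothetical helper, and how the payload reaches your process (stdin, a file, or an argument) is not specified in this README, so the function takes the raw JSON text.

```python
import json
from typing import Any

EXPECTED_SCHEMA = "wendell.agent_input.v1"


def read_agent_input(raw: str) -> tuple[dict[str, Any], set[str]]:
    """Parse an agent_input payload and return it with the allowed tool names."""
    payload = json.loads(raw)
    version = payload.get("schema_version")
    if version != EXPECTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version!r}")
    tool_names = {tool["name"] for tool in payload.get("available_tools", [])}
    return payload, tool_names
```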
The adapter must print JSON:
{
"message": "I inspected the request and recorded the required evidence.",
"tool_calls": [
{"name": "workflow_console.inspect_request", "args": {"case_id": "case_123"}}
],
"metrics": {}
}
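Before returning, an adapter can check its own output against the shape above. This is a hypothetical local check; it treats metrics as optional, which is an assumption since the README only shows it as an empty object.

```python
import json
from typing import Any


def validate_adapter_output(raw: str) -> dict[str, Any]:
    """Check adapter stdout against the message/tool_calls/metrics shape."""
    result = json.loads(raw)
    if not isinstance(result.get("message"), str):
        raise ValueError("message must be a string")
    tool_calls = result.get("tool_calls")
    if not isinstance(tool_calls, list):
        raise ValueError("tool_calls must be a list")
    for call in tool_calls:
        if not isinstance(call, dict) or not isinstance(call.get("name"), str):
            raise ValueError(f"each tool call needs a string name: {call!r}")
        if not isinstance(call.get("args"), dict):
            raise ValueError(f"tool call args must be an object: {call!r}")
    # Assumption: metrics may be omitted; if present it must be an object.
    if not isinstance(result.get("metrics", {}), dict):
        raise ValueError("metrics must be an object")
    return result
```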
wendell suites configure creates:
- wendell.toml: the hosted-suite run config
- scripts/wendell_agent_adapter.py: the adapter boundary for your production agent
The generated adapter is intentionally not a fake passing agent. It requires
WENDELL_APP_AGENT_COMMAND or replacement code that calls your production agent
and maps its real actions back to Wendell tool names. Wendell no longer
generates a passing example adapter because that makes validation look real
without testing the customer's agent.
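The step of mapping your agent's real actions back to Wendell tool names might look like this. `TOOL_MAP`, the internal action name `inspect_request`, and `to_wendell_tool_calls` are hypothetical placeholders for your agent's own vocabulary; only the `name`/`args` output shape comes from this README.

```python
from typing import Any, Iterable

# Hypothetical: your production agent's internal action names -> Wendell tool names.
TOOL_MAP = {
    "inspect_request": "workflow_console.inspect_request",
}


def to_wendell_tool_calls(
    actions: Iterable[tuple[str, dict[str, Any]]],
    available_tools: set[str],
) -> list[dict[str, Any]]:
    """Translate real agent actions into Wendell tool_calls, rejecting unknown tools."""
    calls = []
    for action_name, args in actions:
        tool_name = TOOL_MAP.get(action_name)
        if tool_name is None or tool_name not in available_tools:
            raise KeyError(f"no Wendell tool mapped for action {action_name!r}")
        calls.append({"name": tool_name, "args": dict(args)})
    return calls
```

Raising on an unmapped action, rather than silently dropping it, keeps the adapter from hiding real agent behavior from the scorer.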
Create, review, and publish hosted suites with the wendell playbook and
wendell suites commands documented in the SDK docs. New customer projects
should use hosted suite creation and wendell run; the public CLI package does
not ship the internal local compiler/runtime.
For CI, set mode = "blocking" so failed gates return a nonzero exit code:
project = "support-agent"
mode = "blocking"
agent = "support_agent"
agent_command = "python scripts/wendell_agent_adapter.py"
agent_timeout_seconds = 120
upload_traces = true
[gates]
suite_min_score = 0.90
scenario_min_score = 0.85
critical_failures_allowed = 0
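The gate thresholds can be approximated locally as below. This is an assumption-laden sketch: scoring happens server-side, and the exact way Wendell combines these thresholds is not documented here. It treats a run as failing when the suite score is below suite_min_score, any scenario is below scenario_min_score, or critical failures exceed critical_failures_allowed. The exit code 1 is illustrative; the README only promises "nonzero".

```python
from dataclasses import dataclass


@dataclass
class Gates:
    suite_min_score: float
    scenario_min_score: float
    critical_failures_allowed: int


def gate_failures(
    suite_score: float,
    scenario_scores: dict[str, float],
    critical_failures: int,
    gates: Gates,
) -> list[str]:
    """Return human-readable reasons for every gate that fails."""
    reasons = []
    if suite_score < gates.suite_min_score:
        reasons.append(f"suite score {suite_score:.2f} < {gates.suite_min_score:.2f}")
    for scenario, score in sorted(scenario_scores.items()):
        if score < gates.scenario_min_score:
            reasons.append(f"{scenario}: {score:.2f} < {gates.scenario_min_score:.2f}")
    if critical_failures > gates.critical_failures_allowed:
        reasons.append(
            f"{critical_failures} critical failure(s) > {gates.critical_failures_allowed} allowed"
        )
    return reasons


def exit_code(mode: str, reasons: list[str]) -> int:
    """Advisory runs exit 0; blocking runs exit nonzero (1 here) when a gate fails."""
    return 1 if mode == "blocking" and reasons else 0
```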
Failure output includes the failed scenario, gate reason, incomplete workflow steps, Playbook rule id, assertion id, trajectory event indexes, and fix prompts when the suite provides assertions.
agent_timeout_seconds controls the maximum duration for one adapter invocation.
The default is 120 seconds; increase it for production agents that need more
time for hosted tools or long-running workflow turns.
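One way to picture the per-invocation limit: the runner starts the adapter, hands it the payload, and stops waiting after agent_timeout_seconds. This sketch assumes the payload is written to the adapter's stdin and JSON is read from its stdout; that transport is an assumption, not documented behavior, and `invoke_adapter` is a hypothetical helper.

```python
import json
import shlex
import subprocess
from typing import Any


def invoke_adapter(
    command: str, payload: dict[str, Any], timeout_seconds: float = 120
) -> dict[str, Any]:
    """Run one adapter invocation, bounded by timeout_seconds."""
    try:
        completed = subprocess.run(
            shlex.split(command),
            input=json.dumps(payload),  # assumption: payload arrives on stdin
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # subprocess kills the child on expiry
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"adapter exceeded {timeout_seconds}s") from exc
    return json.loads(completed.stdout)
```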
What to commit
Commit these files:
- wendell.toml
- your production adapter, usually scripts/wendell_agent_adapter.py
Do not commit secrets, raw customer transcripts, API keys, or generated run outputs. Keep production credentials in your CI secret store.
GitHub Actions
name: Wendell Agent Tests
on:
  pull_request:
  push:
    branches: [main]
jobs:
  wendell:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - name: Install Wendell
        run: |
          python -m pip install --upgrade pip
          python -m pip install wendell
      - name: Run Wendell
        env:
          WENDELL_INKPASS_API_KEY: ${{ secrets.WENDELL_INKPASS_API_KEY }}
          WENDELL_APP_AGENT_COMMAND: python scripts/run_my_agent.py
        run: |
          set +e
          wendell run \
            --suite refund-agent-regression \
            --config wendell.toml \
            --json | tee wendell-run.json
          status=${PIPESTATUS[0]}
          set -e
          python - <<'PY'
          import json
          import os
          from pathlib import Path

          summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
          if not summary_path:
              raise SystemExit(0)
          payload = json.loads(Path("wendell-run.json").read_text(encoding="utf-8"))
          remote = payload.get("remote") if isinstance(payload.get("remote"), dict) else {}
          suite = payload.get("suite") if isinstance(payload.get("suite"), dict) else {}
          lines = [
              "## Wendell report",
              "",
              f"- Gate: `{payload.get('decision', 'unknown')}`",
              f"- Run: `{remote.get('run_id', 'unknown')}`",
          ]
          if suite.get("suite_score") is not None:
              lines.append(f"- Score: `{suite['suite_score']}`")
          if remote.get("private_report_url"):
              lines.append(f"- Private report: {remote['private_report_url']}")
          with Path(summary_path).open("a", encoding="utf-8") as summary:
              summary.write("\n".join(lines) + "\n")
          PY
          exit "$status"
For hosted Wendell reporting, keep upload_traces = true and provide
WENDELL_INKPASS_API_KEY from CI secrets. Add api_url only when targeting a
staging, preview, or local API.
The JSON output includes remote.private_report_url when Wendell can create a
private dashboard handoff link for the run.
For a newly published hosted suite, create the repo-local config and adapter boundary first:
wendell suites show refund-agent-regression
wendell suites configure --suite refund-agent-regression --project refund-agent
After a hosted run uploads, inspect its private status and report:
wendell runs watch run_abc123
wendell runs report run_abc123
Wendell dogfood suite
This repo includes a production-style dogfood suite for Wendell's own Playbook-to-test-suite behavior:
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
The suite lives at
configs/customer_inputs/wendell_playbook_compiler_quality_input.json and is
derived from configs/playbooks/wendell_playbook_compiler_quality.md.
The real Wendell Playbook compiler adapter is expected to pass:
uv run --project wendell-ci --extra dev --with-editable . wendell test --config wendell.dogfood.toml
The intentionally broken agent is expected to fail with rule-linked evidence:
uv run --project wendell-ci --extra dev --with-editable . wendell test \
--config wendell.dogfood.toml \
--agent-command "python examples/wendell_playbook_compiler_broken_agent.py"
The intended production flow is:
- wendell run runs on the developer machine or CI worker.
- It fetches pinned public scenario work and tool schemas from Wendell.
- It invokes the agent locally through an adapter such as agent_command.
- It captures local traces and uploads agent turns/results back to Wendell for server-side scoring.
- Advisory mode exits 0; blocking mode, which must be enabled explicitly, exits nonzero when gates fail.
Remote uploads authenticate with an InkPass API key sent as X-API-Key. The key needs at least wendell:worlds:read, wendell:runs:create, and wendell:runs:read.
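If your own tooling wants a pre-flight check, the required scopes and header can be expressed as below. This is a hypothetical sketch: how to list the scopes granted to a key is not documented here, so the caller supplies them, and no Wendell endpoint paths are assumed.

```python
REQUIRED_SCOPES = frozenset(
    {"wendell:worlds:read", "wendell:runs:create", "wendell:runs:read"}
)


def missing_scopes(granted: set[str]) -> list[str]:
    """Return the required scopes a key lacks, sorted for stable output."""
    return sorted(REQUIRED_SCOPES - set(granted))


def auth_headers(api_key: str) -> dict[str, str]:
    """InkPass API keys are sent in the X-API-Key header."""
    return {"X-API-Key": api_key}
```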
Offline dogfood config
For Wendell development and restricted offline environments, the runner can
call the local compiler/runtime directly with worldsim_input. Hosted customer
suites should omit this field and use wendell run --suite.
Legacy local commands such as wendell init, wendell test, and
wendell playbook compile are for Wendell development and require the internal
worldsim package from this monorepo. They are not part of the public
production workflow.
project = "workspace-access-agent"
mode = "advisory"
world = "workspace_access_support"
scenario_pack = "smoke"
worldsim_input = "../configs/customer_inputs/workspace_access_support_input.json"
agent = "careful"
[gates]
suite_min_score = 0.80
scenario_min_score = 0.75
critical_failures_allowed = 0
Supported local built-in agents are careful and risky.
Download files
File details
Details for the file wendell-0.1.18.tar.gz.
File metadata
- Download URL: wendell-0.1.18.tar.gz
- Upload date:
- Size: 48.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d7a6e990526e5207b831f0690ae0b88f35faac455ef5a11fdd7892ba2040636 |
| MD5 | a96bd543f0b196ad89d6bd833b00e3cf |
| BLAKE2b-256 | 6eb31a1f115b2aaa7997a50eb0c0fe5243f6c72f03fd097651eb29c9aca6044f |
File details
Details for the file wendell-0.1.18-py3-none-any.whl.
File metadata
- Download URL: wendell-0.1.18-py3-none-any.whl
- Upload date:
- Size: 32.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 720030a2d02ae0ec5f6876745d2634dcb51b4e231c117ce9307e0f72520a0f63 |
| MD5 | a23c2881862d31a2c6aa0bab590722c4 |
| BLAKE2b-256 | d596d593c857d0c4fec7678f1743b9994d7e64e24ba17d452a702afe1d432a40 |