LLM-driven BDD test authoring for Robot Framework — turn an intention + live app into a .robot suite

aitester-bdd

LLM-driven BDD test authoring for Robot Framework. Give it a story and a live web app; an agent explores the target via the agent-browser CLI, then writes a deterministic .robot suite with selectors grounded in the actual DOM it observed — or files a bug report when the system is broken in a way that prevents authoring.

What it is

A Robot Framework library that turns a plain-English intention into a deterministic, executable .robot test suite. No LLM is in the loop at run time: the authored suite is plain Robot Framework code that runs reproducibly and consumes no tokens on PR gates.

What's novel

  • Intention → .robot suite: An agent loop drives the live target by shelling out to agent-browser (Playwright under the hood), then writes a Robot Framework suite with selectors grounded in the snapshots it actually took.
  • Bug-report exit channel: When the SUT is broken in a way that prevents authoring (missing UI, broken auth flow, untestable terminal state), the agent writes triage/<story>.md rather than inventing selectors.
  • Three pluggable runtime backends: agent-browser (default, zero-install) / playwright (in-process speed) / nodriver (bot-detection-resistant). The same .robot runs on any of them.
  • AOP failure aspect: Each failed rule ships with an AI-written natural-language diagnosis (SUT-vs-test classification) plus a full MDP trajectory in walk_log.jsonl.
  • Rule DAG with parent-child composition: Ported from WISE RPA BDD. Position-determined state checks (guard vs observation), retry-with-redo, and scope inheritance, all expressed via Given/When/Then.

Status

Alpha. Authoring verified end-to-end on public sites (example.com, en.wikipedia.org, the-internet.herokuapp.com) and on a real internal SPA (login + chat + tool-rendering verification).

How fast is it?

Authoring runs a headless DeepAgents agent on Claude Opus 4.7 that shells out to the agent-browser CLI. Typical wall time for a single suite:

Site / scope                                             Steps   Wall time
example.com smoke (heading + link)                       9       ~27s
en.wikipedia.org search + article check (5 assertions)   27      ~70s
Real SPA login + chat + multi-rule verification          50-80   2-3 min

The agent batches multiple agent-browser subcommands per shell call (open && snapshot && get count ...) so ~1 LLM round-trip handles 2-4 browser ops. Most remaining wall-time is SUT-bound (waiting for the app's own LLM to stream a response) — not authoring overhead.
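The batching idea can be illustrated with a small helper that joins several CLI invocations into one shell line. This is only a sketch: the helper name `batch_commands` is hypothetical, and the argument shapes for `open`, `snapshot`, and `get count` are assumptions based on the subcommand names mentioned above, not a reference for the agent-browser CLI.

```python
import shlex


def batch_commands(subcommands, cli="agent-browser"):
    """Join several CLI invocations into one shell line with '&&'.

    Each entry in `subcommands` is the argument list for one invocation.
    '&&' stops the chain at the first command that exits non-zero.
    """
    parts = [shlex.join([cli, *args]) for args in subcommands]
    return " && ".join(parts)


# Argument shapes below are assumptions for illustration only.
line = batch_commands([
    ["open", "https://example.com"],
    ["snapshot"],
    ["get", "count", "h1"],
])
print(line)
```

Because the chain is a single shell call, the agent sees the combined output of all three operations in one tool result, which is where the saving in LLM round-trips comes from.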

Quick start

# 1. Install
pip install aitester-bdd
npm i -g agent-browser

# 2. Point at an LLM endpoint. Defaults assume claude-code-proxy:
export AITESTER_LLM_MODEL=cc/claude-opus-4-7
export OPENAI_BASE_URL=http://localhost:20128/v1
export OPENAI_API_KEY=placeholder

# 3. Author a suite from a story
aitester author \
  --story "Open the homepage, search for 'BDD', verify the article heading and a paragraph containing 'BDD'." \
  --base-url https://en.wikipedia.org \
  --out wiki_smoke.robot

# 4. Run it (no LLM at run time)
aitester run wiki_smoke.robot

For visibility into the agent's exploration, add --debug to aitester author. Every LLM turn and shell call streams to stderr with timestamps.

Sidecar files are written to <output_dir>/:

  • walk_log.jsonl — every MDP transition (rule_enter / before_action / after_action / state_check / dismiss / emit / rule_exit)
  • failures.jsonl — failure context + AI diagnosis for every failed rule
  • emit.jsonl — explicit And I emit "..." captures (intention-driven; only when the story is a diagnostic probe)
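Since the sidecar files are JSON Lines, they can be summarized with a few lines of Python. The sketch below assumes each record is a JSON object; the record schema is not documented here, so the key used for grouping (shown as `"event"` in the usage comment) is a guess and may need adjusting.

```python
import json
from collections import Counter


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by(records, key):
    """Count records by the value under `key`; absent keys count as '<missing>'."""
    return Counter(str(rec.get(key, "<missing>")) for rec in records)


# Usage (path and key name are assumptions):
#   with open("out/walk_log.jsonl", encoding="utf-8") as fh:
#       print(count_by(parse_jsonl(fh), "event"))
```

A quick count like this is enough to see, for example, whether a run spent most of its transitions on state checks or on actions before opening the full log.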

Three runtime backends, one authored suite

AITESTER_BROWSER= picks the driver at run-time:

Backend         Default?   Setup                                             Best for
agent-browser   yes        none (CLI ships its own browser)                  most cases; same driver for authoring and running, zero install friction
playwright      no         aitester init-browser (once)                      action-heavy tests where subprocess latency matters
nodriver        no         pip install aitester-bdd[stealth] + Edge/Chrome   bot-detected sites (DataDome / Cloudflare BM / etc.)

Same .robot runs on any of the three.
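One way to exercise that claim is to run the same suite once per backend from a small script. The sketch below assumes that the accepted values of AITESTER_BROWSER are exactly the backend names in the table and that `aitester run` signals failure through a non-zero exit code; both are assumptions, and the helper names are hypothetical.

```python
import os
import subprocess

# Assumed to match the values AITESTER_BROWSER accepts.
BACKENDS = ("agent-browser", "playwright", "nodriver")


def run_env(backend, base_env=None):
    """Return a copy of the environment with AITESTER_BROWSER set."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["AITESTER_BROWSER"] = backend
    return env


def run_suite(suite, backend):
    """Run `aitester run <suite>` on one backend and return its exit code."""
    result = subprocess.run(["aitester", "run", suite], env=run_env(backend))
    return result.returncode


# Usage:
#   for backend in BACKENDS:
#       print(backend, run_suite("wiki_smoke.robot", backend))
```

Building the environment in a separate function keeps the backend choice out of the parent process, so a CI job can run the three backends in sequence without leaking the variable between runs.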

Architecture (one paragraph)

The LLM is the author, not the runtime. At authoring time, a DeepAgents/LangGraph agent reads SKILL.md as its system prompt, drives the live target by shelling out to the agent-browser CLI (via DeepAgents' LocalShellBackend.execute tool), and emits a .robot file with selectors grounded in real snapshots — or writes a bug report when the system is broken in a way that prevents authoring. The agent batches multiple agent-browser subcommands per shell call (open && snapshot && get count …) so each LLM round-trip drives multiple browser ops. At run time, plain Robot Framework executes the suite via one of three pluggable browser backends; no LLM in the loop. Failures fire an AOP diagnose aspect that hands the LLM the MDP trajectory plus snapshot and asks "why?" — short natural-language diagnoses land on RuleResult.ai_diagnosis and failures.jsonl. The walker, gotcha-fixes, and AspectRegistry are ported from the WISE RPA BDD skill.
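The failure aspect described above can be pictured as a hook that runs only when a rule fails and attaches a diagnosis to the result. The following is a conceptual sketch, not the library's code: `FailureAspects`, `run_rule`, and `stub_diagnose` are hypothetical names, the trajectory record shape is invented for illustration, and only the `ai_diagnosis` field name comes from the text. A stub stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RuleResult:
    name: str
    passed: bool
    ai_diagnosis: Optional[str] = None  # field name taken from the text above


@dataclass
class FailureAspects:
    """Hypothetical stand-in for an aspect registry; not the library's class."""
    hooks: List[Callable[[str, list], str]] = field(default_factory=list)

    def on_failure(self, hook):
        """Register a hook that is called only when a rule fails."""
        self.hooks.append(hook)
        return hook


def run_rule(name, action, trajectory, aspects):
    """Run `action`; on an exception, let the failure hooks attach a diagnosis."""
    try:
        action()
        return RuleResult(name, passed=True)
    except Exception as exc:
        trajectory.append({"event": "error", "detail": str(exc)})
        notes = [hook(name, trajectory) for hook in aspects.hooks]
        return RuleResult(name, passed=False, ai_diagnosis="; ".join(notes) or None)


aspects = FailureAspects()


@aspects.on_failure
def stub_diagnose(rule_name, trajectory):
    # A real aspect would send the trajectory and a snapshot to an LLM.
    return f"{rule_name} failed after {len(trajectory)} recorded steps"
```

Keeping the diagnosis in a hook means the rule's own code stays free of LLM calls, which matches the stated split between a deterministic run and an LLM that is consulted only after a failure.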

License

MIT
