AI-driven UI automation testing framework with pluggable platform adapters.

vibe-tester

AI-driven UI automation testing for desktop and web apps — Cucumber-style tests, pluggable platform adapters, ships with the AI assets your coding agent needs to author and run them.

Status: alpha. Public API may change. The windows-desktop and web adapters are implemented; the macos adapter is a stub.

What it does

- Lets you describe a UI test in natural language (in Copilot Chat, Claude CLI, Cursor, …) and generates a runnable Gherkin .feature file using real element locators from your project's element store.
- Executes scenarios at any granularity (one feature, all of them, or a tag expression) and produces a Markdown report plus optional JSON output for the AI to parse.
- Walks your app interactively with you to record UI element paths into a YAML store the executor can resolve.

The framework ships AI assets (agents, skills, an AGENTS.md template) and a deterministic CLI (vibe-tester). It does not embed an LLM and does not run an MCP server: your AI tool of choice provides the intelligence, and the CLI is the integration surface.

Do I need an AI agent?

No. The framework is a Cucumber/behave runner with a UI-automation adapter and a YAML element vocabulary — you can author and run tests entirely by hand. The shipped agents are productivity multipliers, not runtime dependencies.

Capability	AI needed?
Run tests (`vibe-tester run …`)	No
Write `.feature` files by hand using `elements.yaml`	No
Element collection — basic capture (`vibe-tester collect …`)	No
Element collection — interactive "navigate to the next page" loop	Recommended (agent)
Project customizations (`features/hooks/`, `features/steps/steps.py`)	No
Visual regression baselines + assertions	No
`@setup:` / `@clean:` tag-driven scenario isolation	No
Markdown / JSON reports	No
Translating a natural-language request → `.feature`	Yes (Test Writer)
Structured root-cause analysis on a failed scenario	Yes (Test Debugger)
Auto-proposing `@clean:` tags + handler stubs from element `role:`	Yes (Test Writer)
Detecting unmapped step phrases + scaffolding custom-step stubs	Yes (Test Writer)

Bottom line: CLI + framework run standalone. The agents add natural-language authoring and structured failure triage. If you don't have Copilot / Claude CLI / Cursor available, skip the .github/agents/ prompts and write .feature files directly — every step phrase the runner accepts is documented in the uia-assertions and element-locators skill files (also shipped to your project, plain Markdown, readable without an LLM).

Install

# default — every adapter that ships today
pip install vibe-tester

# pick one (smaller install); the quotes stop shells such as zsh from globbing the brackets
pip install "vibe-tester[windows-desktop]"

# pick several
pip install "vibe-tester[windows-desktop,web]"

Extra	Drives	Status
`windows-desktop`	WinUI3 / Win32 / WPF / WebView2 / tray / shell menu	Implemented
`web`	Browser SUTs (Playwright)	Implemented
`macos`	macOS-native SUTs	Stub

Quickstart

# 1. Create a fresh test project (or scaffold into an existing folder)
mkdir my-app-tests
cd my-app-tests
vibe-tester init

# 2. Capture your SUT (interactive — your app should be running)
vibe-tester collect

# 3. Ask your AI agent (Copilot Chat / Claude CLI / …) to write a test:
#    "Write a smoke test that opens Settings and verifies the title."
#    The Test Writer agent uses elements.yaml + the framework's CLI.

# 4. Run it
vibe-tester run

Project layout — one project = one SUT:

After step 1 your project looks like:

my-app-tests/
├── AGENTS.md                  # AI instructions for this project
├── .github/
│   ├── agents/                # element-collector, test-writer, test-runner, test-debugger
│   └── skills/                # element-locators, uia-assertions, web-locators,
│                              # web-assertions, image-testing, custom-steps,
│                              # failure-diagnosis (adapter-relevant ones only)
└── features/
    ├── environment.py         # framework glue — do not edit
    └── steps/
        └── _framework.py      # framework glue — do not edit

After step 2 the element store is created at the project root; the feature files and optional folders shown below are added as you write tests:

my-app-tests/
├── elements.yaml              # the element vocabulary your tests use
├── features/
│   ├── *.feature              # Gherkin tests (the AI writes these)
│   ├── baselines/             # visual regression PNGs (optional)
│   ├── steps/
│   │   ├── _framework.py      # framework glue — do not edit
│   │   └── steps.py           # your custom step defs (optional)
│   └── hooks/                 # optional
│       ├── environment.py     # your before/after hooks
│       └── handlers.py        # @setup: / @clean: tag handlers
└── ...

The project root is the SUT — there's no nested per-app folder.

Multiple SUTs (aggregation mode)

Some products span more than one surface — say an admin desktop tool whose changes must show up in a sibling website. vibe-tester lets you keep each surface as its own focused single-SUT project, then add an aggregation root on top that orchestrates integration scenarios across both. Layout:

my-product-tests/                ← aggregation root (NO elements.yaml here)
├── features/                    ← integration scenarios only
│   ├── environment.py           # framework glue — do not edit
│   ├── *.feature                # uses `on "<sut>"` per-step prefix
│   └── steps/
│       ├── _framework.py        # framework glue — do not edit
│       └── steps.py             # integration custom steps (optional)
├── admin-tool/                  ← child SUT #1 — full single-SUT layout
│   ├── elements.yaml
│   └── features/
│       └── ...
└── customer-site/               ← child SUT #2 — full single-SUT layout
    ├── elements.yaml
    └── features/
        └── ...

Mode is auto-detected when behave starts:

Project root has…	Mode
`elements.yaml`	single SUT
no root `elements.yaml`, but at least one child folder has one	aggregation
neither	uninitialized
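The detection rule in the table can be sketched in a few lines of Python. This is an illustration of the rule as stated here, not the framework's own code: the function name `detect_mode` is hypothetical, and looking only at direct child folders is an assumption.

```python
from pathlib import Path
import tempfile

STORE_NAME = "elements.yaml"  # element store file name used throughout this README


def detect_mode(root: Path) -> str:
    """Classify a project root using the rules in the table above."""
    if (root / STORE_NAME).is_file():
        return "single SUT"
    # Only direct child folders are inspected; how deep the real framework
    # looks is an assumption of this sketch.
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / STORE_NAME).is_file():
            return "aggregation"
    return "uninitialized"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(detect_mode(root))  # empty folder
        (root / "admin-tool").mkdir()
        (root / "admin-tool" / STORE_NAME).write_text("app: {}\n")
        print(detect_mode(root))  # one child folder has a store
```

Note that a root-level elements.yaml takes precedence in this sketch: the root check runs first, so a folder with both a root store and child stores is treated as a single SUT.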

Integration scenarios use a per-step on "<sut>" prefix to name the target SUT — the value matches the app.name declared inside that child's elements.yaml, not the folder name:

Feature: Admin change shows up on the customer site

  Scenario: Editing a theme propagates within 5 seconds
    Given on "admin-tool" the app is open
    When  on "admin-tool" I click "themes.edit_button"
    And   on "admin-tool" I type "Sunset" into "themes.name_input"
    Then  on "customer-site" element "homepage.theme_banner" should be visible

The framework lazy-launches each SUT on first reference and shuts them all down once the run finishes. Five integration phrasings ship out of the box (the app is open, I click, I type … into, should exist, should be visible); for anything beyond that, write custom steps in features/steps/steps.py and look up the active SUT via context.suts.get("<name>").
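A custom integration step might start by resolving the SUT handle and failing loudly when the name is unknown. The sketch below assumes that context.suts behaves like a dict (so that get returns None for a missing name); the helper name `require_sut` and the step phrasing in the comment are hypothetical, and what you can do with the returned handle depends on the adapter, which this README does not describe.

```python
from types import SimpleNamespace


def require_sut(context, name: str):
    """Return the handle for the named SUT, or fail the step with a clear message."""
    sut = context.suts.get(name)  # assumes dict-like behaviour: None when missing
    if sut is None:
        raise AssertionError(f'no SUT named "{name}" is registered for this run')
    return sut


# In features/steps/steps.py the helper would be called from a behave step, e.g.:
#
#   from behave import when
#
#   @when('on "{sut_name}" I refresh the view')   # hypothetical phrasing
#   def step_refresh(context, sut_name):
#       sut = require_sut(context, sut_name)
#       ...  # drive the SUT through whatever API its adapter exposes

if __name__ == "__main__":
    # Stand-in for behave's context object, for illustration only.
    fake_context = SimpleNamespace(suts={"admin-tool": object()})
    print(require_sut(fake_context, "admin-tool") is not None)
```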

Running vibe-tester run from the aggregation root executes only the integration features at that root. To run a single child SUT's own tests in isolation, cd into that child and run there — each child is itself a fully-functional single-SUT project.

@setup: / @clean: handlers and @requires: flag-based skips are single-SUT only — there's no one "active adapter" to scope them to in aggregation mode.

CLI reference

Command	What it does
`vibe-tester init [--target] [--adapter] [--overwrite] [--json]`	Scaffold a project from shipped assets
`vibe-tester list adapters [--json]`	Show installed adapters
`vibe-tester list features [--json]`	List `.feature` files
`vibe-tester list elements [--details] [--json]`	Print the project's element vocabulary
`vibe-tester collect [--name] [--kind]`	Interactive element capture
`vibe-tester run [--feature|--tag] [--scenario] [--json]`	Execute behave + emit Markdown / JSON report

Every command except the interactive collect accepts --json for machine-readable output (intended for the AI agent to parse). Default output is human-friendly Rich tables and Markdown reports under ./results/.
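For scripts that consume the machine-readable output, a thin wrapper along these lines may be enough. It assumes that with --json the CLI prints a single JSON document on standard output; the shape of that document is not documented here, so the sketch only parses it. The `@smoke` tag in the comment is a hypothetical example.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]):
    """Run a command and parse its standard output as JSON."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    # The exit code is deliberately not checked: whether a failing scenario
    # makes the CLI exit non-zero is not stated in this README.
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # With vibe-tester installed, the call would look like:
    #   report = run_json(["vibe-tester", "run", "--tag", "@smoke", "--json"])
    # A stand-in command keeps this sketch runnable anywhere.
    demo = run_json([sys.executable, "-c", "print('{\"ok\": true}')"])
    print(demo)
```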

How the AI assets work

vibe-tester init drops four agents and the adapter-relevant skills into .github/ plus an AGENTS.md at the project root. Any AI coding tool that follows the AGENTS.md convention — Copilot, Claude CLI, Cursor, etc. — will pick them up automatically. Skills are filtered by the adapter(s) you scaffold: a web-only project won't get uia-assertions, and a windows-desktop-only project won't get web-locators.

Agents (one each):

Agent	Use when
Element Collector	Adding the SUT or new pages to it
Test Writer	Authoring `.feature` files from a natural-language ask
Test Runner	Executing tests and producing a Markdown report
Test Debugger	A test failed and you want a structured RCA

Skills:

Skill	Adapter	Topic
element-locators	windows-desktop	UIA locator syntax, dot-notation, element store schema
uia-assertions	windows-desktop	All assertion types the Windows adapter supports
web-locators	web	Playwright locator strategy and element store schema
web-assertions	web	All assertion types the web adapter supports
image-testing	any	Visual regression / baseline strategy
custom-steps	any	Authoring project-level custom Gherkin step definitions
failure-diagnosis	any	RCA methodology + known-issues catalog
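The filtering described above can be pictured as a lookup over the skills table. The mapping below is copied from that table; the function name `skills_for` is hypothetical and the logic is a sketch of the stated behaviour, not the scaffolder's actual code.

```python
# Skill -> adapter mapping, copied from the table above.
SKILLS = {
    "element-locators": "windows-desktop",
    "uia-assertions": "windows-desktop",
    "web-locators": "web",
    "web-assertions": "web",
    "image-testing": "any",
    "custom-steps": "any",
    "failure-diagnosis": "any",
}


def skills_for(adapters: set[str]) -> list[str]:
    """Return the skills a project scaffolded with these adapters would receive."""
    return [
        skill
        for skill, adapter in SKILLS.items()
        if adapter == "any" or adapter in adapters
    ]


if __name__ == "__main__":
    print(skills_for({"web"}))
```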

Spec-first delegation: handing a task to a coding agent

Use this workflow when you want to delegate a feature to a coding agent (Copilot, Claude CLI, Cursor, …) and have a .feature file serve as the binding acceptance contract — written and approved before coding starts, untouched while coding happens, and proven green when you come back.

What you get out of vibe-tester

vibe-tester is built around two files that, together, give you a spec you can sign off on up front:

- A Gherkin .feature file: what the feature must do, in business language. It references UI elements by name only ("the Save Theme button"), never by selector.
- An elements.yaml entry per referenced element: the locator the agent commits to creating (AutomationId=btn_save_theme, data-testid=themes-save, …).

Because the .feature file holds no locators, freezing it after your approval does not constrain how the UI is built. Because every locator the test will ever try to use is declared in elements.yaml before code is written, the agent has no room to redefine "done" later — the test will fail unless the built UI exposes exactly those locators.

The workflow, step by step

1. You describe the task in plain English to your AI agent.
2. The agent drafts two files and shows them to you:
   - features/<feature>.feature, with the scenarios written in semantic names.
   - New entries appended to elements.yaml, with locator strings for every element the scenarios reference.
3. You review and approve both files. Edit the prose, add missing scenarios, rename anything that smells like implementation detail. Approve when it reads like the acceptance criteria you'd write yourself.
4. The agent codes against the approved contract. It may write product code, step glue, and unit tests, but it does not edit the approved .feature file. Treat it as locked.
5. The agent runs vibe-tester run. A scenario passes only if the live UI exposes the locator declared in elements.yaml. Mismatches surface as test failures, not as silent edits.
6. You come back to a Markdown report under ./results/ and decide whether to ship.

If you want belt-and-suspenders enforcement, commit the approved .feature and elements.yaml in their own PR and protect them with a CI check that fails on any subsequent change to either file without a --rewrite-acceptance reason recorded in the commit.
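Such a CI check could be as small as the sketch below. It assumes the override is recorded as a commit-message line of the form `--rewrite-acceptance: <reason>`, which is one possible reading of the sentence above rather than a documented format; the function names are hypothetical, and gathering the changed paths and the commit message is left to your CI system.

```python
PROTECTED_NAMES = ("elements.yaml",)
PROTECTED_SUFFIXES = (".feature",)
OVERRIDE_MARKER = "--rewrite-acceptance"


def is_protected(path: str) -> bool:
    """True for any .feature file and for any elements.yaml, at any depth."""
    name = path.rsplit("/", 1)[-1]
    return name in PROTECTED_NAMES or name.endswith(PROTECTED_SUFFIXES)


def violations(changed_files: list[str], commit_message: str) -> list[str]:
    """Return protected files changed without a recorded override reason."""
    touched = [p for p in changed_files if is_protected(p)]
    if not touched:
        return []
    # Require some text after the marker so a bare flag does not count as a reason.
    for line in commit_message.splitlines():
        _, found, reason = line.partition(OVERRIDE_MARKER)
        if found and reason.strip(" :="):
            return []
    return touched


if __name__ == "__main__":
    print(violations(["features/themes.feature", "src/app.py"], "feat: add themes"))
```

A CI job would fail the build whenever `violations` returns a non-empty list.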

How the agent picks locators before the UI exists

The instinct is to discover a locator by inspecting a built UI. That forces the test to be written after coding, which destroys its value as a prior commitment.

vibe-tester's workflow inverts this: the agent declares the locator string in the same act that promises to render the element. The locator file becomes a forward-looking contract — "I will ship a button whose AutomationId is btn_save_theme" — not a recording of what happened to be built. The implementation must satisfy the contract, not the other way round.

This is reliable on every stack where the agent controls the source of the locator string:

Your stack	Pre-commit a locator?	How
Web (React / Vue / Svelte / plain HTML)	Yes	Use a `data-testid` convention
WinUI 3 / WPF / UWP	Yes	Set `AutomationProperties.AutomationId` explicitly
Win32 / MFC	Mostly	Owned controls via control ID; wrap shell UI
iOS / Android native	Yes	`accessibilityIdentifier` / `contentDescription`
Closed-source 3rd-party widgets	Wrap first	Locate the wrapper you control
Auto-generated framework IDs (e.g. Angular)	Forbid	Require an explicit testid via lint

Starting from your project type

Greenfield (the agent is also writing the app from scratch). The easiest case. Tell your agent in AGENTS.md to (a) adopt one naming convention for locators (e.g. every interactive element gets data-testid shaped as <feature>-<role>-<purpose>) and (b) add a lint rule that fails the build on any interactive element missing the attribute. From there, every feature PR appends to elements.yaml before any code is written.

Brownfield with an existing elements.yaml. Point the agent at the file and tell it to follow the existing convention for new elements. The store itself is the example the agent reasons from.

Brownfield without an elements.yaml yet. Run vibe-tester collect once against the current build as a one-time baseline. The agent then has both a snapshot of what exists and a sample of the project's locator style to imitate. After that single pass the project behaves like the case above.

What to watch for

Three failure modes are worth naming up front:

- Locator typos. The agent writes data-testid="save-theme" in elements.yaml but ships JSX with save_theme or no testid at all. The corresponding scenario will fail on element lookup, which is the point, but you should treat that failure as the agent breaking its own contract, not as a flaky test.
- Convention drift. Across many features the agent invents slightly different naming schemes. Add a one-line CI check that greps elements.yaml for entries that don't match your convention regex; drift becomes a build failure rather than review burden.
- Semantic names that leak implementation. "the third div in the sidebar" is a locator in disguise. Keep names role-based ("Recently used themes list") so the spec stays implementation-agnostic and the agent retains room to build the UI well.
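The convention-drift check can be sketched without knowing the store's schema by scanning the raw text for data-testid values. Both regular expressions below are assumptions: the convention pattern encodes the `<feature>-<role>-<purpose>` example as three or more lowercase hyphen-separated segments, and the extraction pattern assumes locators appear as `data-testid=value`, quoted or not. Adjust both to your project.

```python
import re

# <feature>-<role>-<purpose>, read here as three or more lowercase,
# hyphen-separated segments (an assumption of this sketch).
CONVENTION = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+){2,}")
# Matches data-testid=value or data-testid="value" anywhere in the text.
TESTID = re.compile(r"""data-testid=["']?([^"'\s\]]+)""")


def nonconforming_testids(store_text: str) -> list[str]:
    """Return every data-testid value in the text that breaks the convention."""
    return [
        value
        for value in TESTID.findall(store_text)
        if not CONVENTION.fullmatch(value)
    ]


if __name__ == "__main__":
    sample = (
        "save: data-testid=themes-button-save\n"
        'legacy: data-testid="save_theme"\n'
    )
    print(nonconforming_testids(sample))
```

In CI, read elements.yaml as text, pass it to `nonconforming_testids`, and fail the build if the result is non-empty.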

Architecture (one paragraph)

A user project is one SUT with one element store (elements.yaml at the project root). Its app.kind (e.g. windows-desktop) tells the executor which adapter to use. The CLI dispatches to that adapter for collect / launch / click / screenshot operations; the core layer is adapter-agnostic and never imports an adapter directly. New platforms plug in by adding a sub-package under vibe_tester/adapters/. Aggregation projects layer an integration coordinator on top — multiple sibling single-SUT projects under a parent, integration features at the parent driving them via an on "<sut>" per-step prefix; child adapters are launched lazily and shut down together at suite end. See doc/design/architecture.md for the full picture.
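The lazy-launch and shared-shutdown behaviour described for aggregation projects follows a common pattern, sketched below. Every name here (`Adapter`, `SutPool`, `acquire`, `FakeAdapter`) is hypothetical and chosen for illustration; the real coordinator and adapter interface live in the framework and may look quite different.

```python
from typing import Callable, Dict, Protocol


class Adapter(Protocol):
    """Hypothetical minimal adapter surface, for this sketch only."""

    def launch(self) -> None: ...
    def shutdown(self) -> None: ...


class SutPool:
    """Launch each SUT on first reference; shut all of them down together."""

    def __init__(self, factories: Dict[str, Callable[[], Adapter]]):
        self._factories = factories
        self._running: Dict[str, Adapter] = {}

    def acquire(self, name: str) -> Adapter:
        if name not in self._running:
            adapter = self._factories[name]()  # KeyError for an unknown SUT name
            adapter.launch()
            self._running[name] = adapter
        return self._running[name]

    def shutdown_all(self) -> None:
        # Reverse launch order is a choice made for this sketch.
        for name in reversed(list(self._running)):
            self._running.pop(name).shutdown()


class FakeAdapter:
    """Stand-in adapter that records calls in a shared log."""

    def __init__(self, name: str, log: list):
        self.name, self.log = name, log

    def launch(self) -> None:
        self.log.append(f"launch {self.name}")

    def shutdown(self) -> None:
        self.log.append(f"shutdown {self.name}")
```

Because the pool only holds factories, nothing is started until a step first names a SUT, which mirrors the behaviour described in the aggregation section.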

Contributing

This repo is the framework itself. See AGENTS.md for dev-context guidance (rules, layout, common tasks). Bug reports and PRs welcome at https://github.com/Haroldlei/vibe-tester.

License: MIT.

Release history

- 0.1.0rc4 (pre-release): May 11, 2026
- 0.1.0rc3 (pre-release, this version): May 8, 2026
- 0.1.0rc2 (pre-release): May 8, 2026

