AI-driven UI automation testing framework with pluggable platform adapters.

vibe-tester

AI-driven UI automation testing for desktop and (soon) web apps: Cucumber-style tests, pluggable platform adapters, and the AI assets your coding agent needs to author and run them.

Status: alpha. Public API may change. The Windows desktop adapter is scaffolded but not yet implemented.


What it does

  1. Lets you describe a UI test in natural language (in Copilot Chat, Claude CLI, Cursor, …) and generates a runnable Gherkin .feature file using real element locators from your project's element store.
  2. Executes scenarios at any granularity (one feature, one app, everything, or a tag expression) and produces a Markdown report plus optional JSON output for the AI to parse.
  3. Walks your app interactively with you to record UI element paths into a YAML store the executor can resolve.

The framework ships AI assets (agents, skills, an AGENTS.md template) and a deterministic CLI (vibe-tester). It does not embed an LLM and does not run an MCP server: your AI tool of choice provides the intelligence, and the CLI is the integration surface.


Do I need an AI agent?

No. The framework is a Cucumber/behave runner with a UI-automation adapter and a YAML element vocabulary — you can author and run tests entirely by hand. The shipped agents are productivity multipliers, not runtime dependencies.

| Capability | AI needed? |
|---|---|
| Run tests (vibe-tester run …) | No |
| Write .feature files by hand using elements.yaml | No |
| Element collection: basic capture (vibe-tester collect …) | No |
| Element collection: interactive "navigate to the next page" loop | Recommended (agent) |
| Per-SUT customizations (hooks/handlers.py, hooks/steps.py) | No |
| Visual regression baselines + assertions | No |
| @setup: / @clean: tag-driven scenario isolation | No |
| Markdown / JSON reports | No |
| Translating a natural-language request → .feature | Yes (Test Writer) |
| Structured root-cause analysis on a failed scenario | Yes (Test Debugger) |
| Auto-proposing @clean: tags + handler stubs from element role: | Yes (Test Writer) |
| Detecting unmapped step phrases + scaffolding custom-step stubs | Yes (Test Writer) |

Bottom line: CLI + framework run standalone. The agents add natural-language authoring and structured failure triage. If you don't have Copilot / Claude CLI / Cursor available, skip the .github/agents/ prompts and write .feature files directly — every step phrase the runner accepts is documented in the uia-assertions and element-locators skill files (also shipped to your project, plain Markdown, readable without an LLM).
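
To make the tag-driven isolation row in the table above more concrete, here is a minimal sketch of how prefixed tags such as @setup: and @clean: could be separated from ordinary tags. This is not the framework's implementation: the function name group_tags and the tag values in the usage example are hypothetical, and the real value format is whatever the shipped skill files define.

```python
def group_tags(tags: list[str]) -> dict[str, list[str]]:
    """Split scenario tags into setup/clean groups plus everything else.

    Assumption: a prefixed tag has the form "<prefix>:<value>", with or
    without a leading "@". The framework's real format may differ.
    """
    grouped: dict[str, list[str]] = {"setup": [], "clean": [], "other": []}
    for tag in tags:
        bare = tag.lstrip("@")
        prefix, sep, value = bare.partition(":")
        if sep and prefix in ("setup", "clean"):
            grouped[prefix].append(value)
        else:
            # Plain tags, or prefixes this sketch does not know about.
            grouped["other"].append(bare)
    return grouped
```

For example, group_tags(["@setup:logged-in", "@clean:settings", "@smoke"]) puts "logged-in" under setup, "settings" under clean, and "smoke" under other.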


Install

# default — every adapter that ships today
pip install vibe-tester

# pick one (smaller install); quote the extras so shells like zsh do not glob the brackets
pip install "vibe-tester[windows-desktop]"

# pick several
pip install "vibe-tester[windows-desktop,web]"

| Extra | Drives | Status |
|---|---|---|
| windows-desktop | WinUI3 / Win32 / WPF / WebView2 / tray / shell menu | In progress |
| web | Browser SUTs | Stub |
| macos | macOS-native SUTs | Stub |

Quickstart

# 1. Create a fresh test project (or scaffold into an existing folder)
mkdir my-app-tests
cd my-app-tests
vibe-tester init

# 2. Capture your first SUT (interactive — your app should be running)
vibe-tester collect --app my-app

# 3. Ask your AI agent (Copilot Chat / Claude CLI / …) to write a test:
#    "Write a smoke test that opens Settings and verifies the title."
#    The Test Writer agent uses elements.yaml + the framework's CLI.

# 4. Run it
vibe-tester run --app my-app

After step 1 your project looks like:

my-app-tests/
├── AGENTS.md                  # AI instructions for this project
├── .github/
│   ├── agents/                # element-collector, test-writer, test-runner, test-debugger
│   └── skills/                # element-locators, uia-assertions, image-testing, failure-diagnosis
└── features/
    ├── environment.py         # framework glue — do not edit
    └── steps/
        └── _framework.py      # framework glue — do not edit

After step 2 a SUT subfolder is added:

features/
└── my-app/
    ├── elements.yaml          # the element vocabulary your tests use
    ├── *.feature              # Gherkin tests (the AI writes these)
    ├── baselines/             # visual regression PNGs (optional)
    └── hooks/
        ├── environment.py     # per-SUT cleanup (optional)
        └── steps.py           # per-SUT custom step defs (optional)
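
The layout above implies a simple rule: a folder under features/ is a SUT if it contains elements.yaml. A minimal sketch of that rule follows, useful for example in a CI script that wants to know which apps have tests. The helper name find_suts is hypothetical and is not part of vibe-tester; the CLI's own `vibe-tester list features` is the supported way to get this information.

```python
from pathlib import Path


def find_suts(features_dir: Path) -> dict[str, list[str]]:
    """Map each SUT folder name to the sorted names of its .feature files.

    A folder counts as a SUT only if it holds elements.yaml, which is
    why the shared steps/ folder is skipped.
    """
    suts: dict[str, list[str]] = {}
    for child in sorted(features_dir.iterdir()):
        if child.is_dir() and (child / "elements.yaml").is_file():
            suts[child.name] = sorted(p.name for p in child.glob("*.feature"))
    return suts
```

Calling find_suts(Path("features")) from the project root returns one entry per SUT folder, with an empty list for a SUT that has no tests yet.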

CLI reference

| Command | What it does |
|---|---|
| vibe-tester init [--target] [--adapter] [--overwrite] [--json] | Scaffold a project from shipped assets |
| vibe-tester list adapters [--json] | Show installed adapters |
| vibe-tester list features [--json] | List .feature files and their @app: tag |
| vibe-tester list elements --app <name> [--details] [--json] | Print the element vocabulary for one SUT |
| vibe-tester collect --app <name> [--kind] | Interactive element capture |
| vibe-tester run [--feature\|--app\|--tag] [--scenario] [--json] | Execute behave + emit Markdown / JSON report |

Where a command lists --json above, that flag switches it to machine-readable output (intended for the AI agent to parse). Default output is human-friendly Rich tables, plus Markdown reports under ./results/.
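
For scripts that drive the CLI, a small sketch of building the `vibe-tester run` argument list from the flags documented above. It rests on two assumptions not confirmed by this README: that --feature, --tag, and --scenario each take one value (as --app does in the Quickstart), and that the `|` in the synopsis means at most one selector per invocation. The function name build_run_command is hypothetical.

```python
from typing import Optional


def build_run_command(
    feature: Optional[str] = None,
    app: Optional[str] = None,
    tag: Optional[str] = None,
    scenario: Optional[str] = None,
    json_output: bool = False,
) -> list[str]:
    """Return an argv list for `vibe-tester run`.

    Assumption: --feature, --app and --tag are mutually exclusive and
    each takes a single value. No selector means "run everything".
    """
    selectors = {"--feature": feature, "--app": app, "--tag": tag}
    chosen = [(flag, value) for flag, value in selectors.items() if value is not None]
    if len(chosen) > 1:
        raise ValueError("pass at most one of --feature, --app, --tag")

    argv = ["vibe-tester", "run"]
    for flag, value in chosen:
        argv += [flag, value]
    if scenario is not None:
        argv += ["--scenario", scenario]
    if json_output:
        argv.append("--json")
    return argv
```

The list can be handed to subprocess.run(argv, capture_output=True, text=True). The shape of the JSON output is not documented here, so inspect it before parsing.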


How the AI assets work

vibe-tester init drops four agents and four skills into .github/ plus an AGENTS.md at the project root. Any AI coding tool that follows the AGENTS.md convention — Copilot, Claude CLI, Cursor, etc. — will pick them up automatically.

Agents (one per task):

| Agent | Use when |
|---|---|
| Element Collector | Adding a new SUT or new pages to an existing one |
| Test Writer | Authoring .feature files from a natural-language ask |
| Test Runner | Executing tests and producing a Markdown report |
| Test Debugger | A test failed and you want a structured RCA |

Skills:

| Skill | Topic |
|---|---|
| element-locators | Locator syntax, dot-notation, element store schema |
| uia-assertions | All assertion types the framework supports |
| image-testing | Visual regression / baseline strategy |
| failure-diagnosis | RCA methodology + known-issues catalog |
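
The element-locators skill covers dot-notation. One plausible reading, sketched below, is that a name like settings.title walks nested keys in the element store. This is only an illustration: the store layout, the automation_id field, and the resolve helper are all hypothetical, and the authoritative schema is the one in the shipped skill file.

```python
from typing import Any


def resolve(store: dict[str, Any], dotted: str) -> Any:
    """Walk a nested mapping one key per dot-separated segment.

    Raises KeyError naming the first segment that does not resolve,
    which points at the exact typo in a long element name.
    """
    node: Any = store
    walked: list[str] = []
    for part in dotted.split("."):
        walked.append(part)
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown element: {'.'.join(walked)}")
        node = node[part]
    return node


# Hypothetical store, written as a dict in place of parsed YAML.
store = {"settings": {"title": {"automation_id": "SettingsTitle"}}}
title = resolve(store, "settings.title")
```

Failing on the first missing segment, instead of returning None, keeps a mistyped element name from surfacing later as an unrelated error.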

Architecture (one paragraph)

A user project has one element store (elements.yaml) per SUT. Its app.kind (e.g. windows-desktop) tells the executor which adapter to use. The CLI dispatches to that adapter for collect / launch / click / screenshot operations; the core layer is adapter-agnostic and never imports an adapter directly. New platforms plug in by adding a sub-package under vibe_tester/adapters/. See doc/design/architecture.md for the full picture.
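
The dispatch described above can be pictured as a registry keyed by app.kind. The sketch below is not the framework's code: Adapter, register, adapter_for, and FakeDesktopAdapter are hypothetical names, and the store is a plain dict standing in for a parsed elements.yaml. It only shows how a core layer can pick an adapter without importing one by name.

```python
from typing import Callable, Protocol


class Adapter(Protocol):
    """Two of the operations named above; method names are hypothetical."""

    def launch(self) -> str: ...
    def click(self, locator: str) -> str: ...


# kind -> zero-argument factory. Adapters add themselves here, so the
# core looks them up by string and never imports an adapter module.
_REGISTRY: dict[str, Callable[[], Adapter]] = {}


def register(kind: str, factory: Callable[[], Adapter]) -> None:
    _REGISTRY[kind] = factory


def adapter_for(store: dict) -> Adapter:
    """Pick the adapter named by the store's app.kind."""
    kind = store["app"]["kind"]
    factory = _REGISTRY.get(kind)
    if factory is None:
        raise LookupError(f"no adapter installed for kind {kind!r}")
    return factory()


class FakeDesktopAdapter:
    """Stand-in adapter used only to exercise the registry."""

    def launch(self) -> str:
        return "launched"

    def click(self, locator: str) -> str:
        return f"clicked {locator}"


register("windows-desktop", FakeDesktopAdapter)
```

With this shape, supporting a new platform means registering one more factory; the lookup code stays unchanged.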


Contributing

This repo is the framework itself. See AGENTS.md for dev-context guidance (rules, layout, common tasks). Bug reports and PRs welcome at https://github.com/Haroldlei/vibe-tester.

License: MIT.
