AI-powered exploratory testing agent that discovers bugs like a real user
Argus
AI-powered exploratory QA agent. Give it a URL, it explores your app like a real user — clicking buttons, filling forms, trying edge cases — and finds bugs that scripted tests miss.
Unlike Playwright or Cypress test suites, Argus needs no test scripts. It discovers bugs you didn't think to test for.
What It Catches
Argus runs 16 types of detection across every page it visits:
| Category | What it finds |
|---|---|
| Runtime Errors | Console exceptions, HTTP 4xx/5xx, crashes |
| Logic Bugs | Fake delete/edit (says "Saved!" but data didn't persist), misleading success toasts |
| Data Issues | Count mismatches, broken dates ("1.52 days ago"), NaN, eternal "Loading..." |
| Dead Links | Crawls all internal links, finds 404s and 5xx |
| Broken Images | Images that failed to load |
| SEO | Missing meta description, OG tags, heading hierarchy issues |
| Accessibility | Missing alt text, unlabeled form inputs, no lang attribute |
| Performance | Slow page loads (>3s), large resources (>500KB), too many requests |
| Security | Mixed content (HTTP resources on HTTPS pages), XSS reflection |
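To make the "Data Issues" row concrete, here is a minimal sketch of how text-based checks of that kind could work. It is not Argus's actual detector code: the function name `find_data_issues` and the regular expressions are assumptions for illustration, and the snapshot of page text is assumed to be taken after the page has settled.

```python
import re

# Hypothetical patterns; the detectors shipped with Argus may differ.
FRACTIONAL_AGO = re.compile(
    r"\b\d+\.\d+\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)
NAN_TOKEN = re.compile(r"\bNaN\b")
LOADING = re.compile(r"\bLoading\.\.\.")


def find_data_issues(page_text: str) -> list[str]:
    """Return human-readable findings for suspicious visible text."""
    issues = []
    # Relative dates such as "1.52 days ago" usually indicate a missing rounding step.
    for match in FRACTIONAL_AGO.finditer(page_text):
        issues.append(f"broken relative date: {match.group(0)!r}")
    # A literal NaN in the UI points at arithmetic on missing data.
    if NAN_TOKEN.search(page_text):
        issues.append("NaN rendered in page text")
    # A loading placeholder that is still visible in a settled page never resolved.
    if LOADING.search(page_text):
        issues.append("'Loading...' still visible")
    return issues
```

Calling `find_data_issues("Updated 1.52 days ago. Total: NaN items")` reports two findings, while text such as "Updated 2 days ago" passes cleanly.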
Quick Start (MCP Server for Claude Code)
This is the recommended way to use Argus. Claude Code acts as the AI brain, so no API key is needed.
pip install argus-testing
playwright install chromium
claude mcp add argus -- argus-mcp
Then in Claude Code:
"Test my app at http://localhost:3000, focus on the checkout flow"
Claude Code will explore your app, try edge cases, verify that actions persist, and generate an HTML bug report.
MCP Tools (14)
| Tool | What it does |
|---|---|
| `start_session(url)` | Launch browser, navigate to URL |
| `get_page_state()` | See interactive elements + page text + counts + toasts + meta tags + a11y issues |
| `click(index)` | Click an element |
| `type_text(index, text)` | Type into an input field |
| `select_option(index, value)` | Select from dropdown |
| `navigate(url)` | Go to a URL |
| `go_back()` | Browser back |
| `scroll_down()` | Scroll the page |
| `screenshot(name)` | Capture the current page |
| `get_errors()` | Run all 12 passive detectors (console, network, text, images, SEO, a11y, mixed content...) |
| `verify_action(type, text, url)` | Verify a delete/edit actually persisted |
| `check_links()` | Crawl all internal links, find dead ones |
| `check_performance()` | Measure load time, find large resources |
| `end_session()` | Close browser, generate HTML report |
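The idea behind `verify_action` can be sketched as a simple presence check on the page text after a reload. This is an illustration under assumptions, not the tool's implementation: the helper name `action_persisted` is hypothetical, `text` is assumed to be the deleted item's label or the newly edited value, and the caller is assumed to have already reloaded the page at `url` and extracted its text.

```python
def action_persisted(action_type: str, text: str, page_text_after_reload: str) -> bool:
    """Decide whether a delete or edit survived a page reload.

    Hypothetical helper: the semantics of `action_type` and `text` are assumed.
    """
    present = text in page_text_after_reload
    if action_type == "delete":
        # A real delete means the item's text is gone after reloading.
        return not present
    if action_type == "edit":
        # A real edit means the new value is still shown after reloading.
        return present
    raise ValueError(f"unsupported action type: {action_type!r}")
```

A "fake delete" is then the case where the UI showed a success toast but `action_persisted("delete", ...)` returns `False` once the page is reloaded.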
Alternative: Standalone CLI
Bring your own LLM API key. Argus has a built-in AI planner that decides what to explore.
pip install argus-testing
playwright install chromium
export DEEPSEEK_API_KEY=sk-... # or OPENAI_API_KEY, ANTHROPIC_API_KEY
argus http://localhost:3000 --model deepseek/deepseek-chat -n 50
argus http://localhost:3000 -f "test login with edge cases" --headed
Supports 100+ models via LiteLLM: OpenAI, Anthropic, DeepSeek, Gemini, Ollama (free/local), etc.
Tested on Real Sites
| Site | Bugs found | Examples |
|---|---|---|
| vanlifeyvr.com | 1 | Missing og:image |
| nalifex.com | 3 | Unlabeled search input, 1.5MB uncompressed image, missing og:image |
| BuggyTasks (test app) | 15+ | Fake delete, fake edit, broken dates, count mismatch, XSS, auth bypass |
Bug Report
Each session generates a self-contained HTML report with:
- Bug cards with severity, type, description, and reproduction steps
- Embedded screenshots (base64 — no external files needed)
- Testing timeline showing every page visited
- Console and network error logs
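Embedding screenshots as base64 is what lets the report stay a single file. A minimal sketch of the technique, assuming PNG screenshots and a hypothetical helper name `embed_screenshot` (the report generator in Argus may be structured differently):

```python
import base64
from html import escape


def embed_screenshot(png_bytes: bytes, alt: str) -> str:
    """Return an <img> tag that carries the image inline as a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # Escape the alt text so quotes in it cannot break the attribute.
    return f'<img alt="{escape(alt, quote=True)}" src="data:image/png;base64,{encoded}">'
```

Because the image bytes travel inside the HTML, the report can be attached to a ticket or emailed without a separate screenshots folder. The trade-off is size: base64 inflates each image by roughly a third.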
How It Works
You give a URL
-> Argus opens a real browser (Playwright)
-> AI explores: clicks, types, navigates, tries edge cases
-> 12 passive detectors analyze every page automatically
-> On-demand: link crawling + performance metrics
-> Generates HTML report with all bugs found
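The link-crawling step starts from the set of internal links on a page. The sketch below shows one way to collect them using only the standard library. The function name `internal_links` is an assumption for illustration, and the follow-up HTTP requests that would flag 404s and 5xx responses are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url: str, html: str) -> list[str]:
    """Return unique same-host http(s) links, resolved against page_url."""
    collector = _LinkCollector()
    collector.feed(html)
    origin = urlparse(page_url).netloc
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative paths and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if (
            parsed.scheme in ("http", "https")
            and parsed.netloc == origin
            and absolute not in links
        ):
            links.append(absolute)
    return links
```

Filtering on scheme and host keeps `mailto:` links and third-party sites out of the crawl, so only pages belonging to the app under test are requested.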
Requirements
- Python 3.10+
- Chromium (auto-installed via `playwright install chromium`)
License
MIT