AI-powered exploratory testing agent that discovers bugs like a real user
Argus
AI-powered exploratory QA agent. Give it a URL and it explores your app like a real user (clicking buttons, filling forms, trying edge cases) to find bugs that scripted tests miss.
With Playwright or Cypress you write test scripts. With Argus you don't: it looks for bugs you didn't think to test for.
What It Catches
Traditional test tools only catch what you explicitly script. Argus catches:
- Console errors & crashes — JS exceptions, unhandled promises
- Broken API calls — HTTP 4xx/5xx responses
- Fake success operations — "Saved!" toast shown but server returned 500
- Silent data loss — delete says "Deleted!" but item still exists on refresh
- Broken date formatting — "1.52 days ago" instead of "2 days ago"
- Count mismatches — dashboard says "7 tasks" but there are actually 8
- Dead links — navigation links pointing to 404 pages
- XSS vulnerabilities — unsanitized user input reflected in HTML
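To make the "fake success" case concrete, here is a minimal sketch of cross-checking success toasts against failed server responses. It is an illustration under stated assumptions, not Argus's actual detector: the `Response` class, the `SUCCESS_WORDS` list, and `find_fake_successes` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one network response seen during an action."""
    url: str
    status: int


# Assumed keywords that mark a toast as claiming success.
SUCCESS_WORDS = ("saved", "deleted", "success", "updated")


def find_fake_successes(toasts, responses):
    """Pair every success-looking toast with every 5xx response.

    A non-empty result means the UI claimed success while the server failed.
    """
    failures = [r for r in responses if r.status >= 500]
    claims = [t for t in toasts if any(w in t.lower() for w in SUCCESS_WORDS)]
    return [(toast, resp) for toast in claims for resp in failures]


findings = find_fake_successes(["Saved!"], [Response("/api/settings", 500)])
for toast, resp in findings:
    print(f"Toast {toast!r} shown, but {resp.url} returned {resp.status}")
```

A real check would also need to decide which responses belong to which user action, for example by only looking at responses recorded since the last click.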
Quick Start (MCP Server for Claude Code)
This is the recommended way to use Argus. Claude Code acts as the AI brain, so no separate API key is needed.
pip install argus-testing
playwright install chromium
claude mcp add argus -- argus-mcp
Then in Claude Code:
"Test my app at http://localhost:3000, focus on the checkout flow"
Claude Code will explore your app using Argus tools, try edge cases, verify that actions actually persist, and generate an HTML bug report.
MCP Tools
| Tool | What it does |
|---|---|
| start_session(url) | Launch browser, navigate to URL |
| get_page_state() | See all interactive elements + page content + counts + toasts |
| click(index) | Click an element |
| type_text(index, text) | Type into an input field |
| navigate(url) | Go to a URL |
| screenshot(name) | Capture the current page |
| get_errors() | Get console/network errors + smart detection (broken dates, count mismatches, misleading toasts) |
| verify_action(type, text, url) | Verify a delete/edit actually persisted by navigating and checking |
| end_session() | Close browser, generate HTML report |
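As a rough illustration of how these tools might be sequenced in one session, the sketch below writes a delete-and-verify flow as plain data and checks it against the tool names and parameters listed in the table. It does not call Argus. The `check_plan` helper, the argument values (element index, task text, URLs, the `"delete"` type string), and the rule that a session begins with `start_session` and finishes with `end_session` are assumptions made for this example.

```python
# Tool names and parameter names as listed in the MCP Tools table.
TOOL_PARAMS = {
    "start_session": ["url"],
    "get_page_state": [],
    "click": ["index"],
    "type_text": ["index", "text"],
    "navigate": ["url"],
    "screenshot": ["name"],
    "get_errors": [],
    "verify_action": ["type", "text", "url"],
    "end_session": [],
}


def check_plan(plan):
    """Return a list of problems found in a sequence of (tool, args) calls."""
    problems = []
    if not plan or plan[0][0] != "start_session":
        problems.append("plan must begin with start_session")
    if not plan or plan[-1][0] != "end_session":
        problems.append("plan must finish with end_session")
    for tool, args in plan:
        if tool not in TOOL_PARAMS:
            problems.append(f"unknown tool: {tool}")
        elif sorted(args) != sorted(TOOL_PARAMS[tool]):
            problems.append(f"{tool} expects parameters {TOOL_PARAMS[tool]}")
    return problems


# Hypothetical flow: delete a task, then confirm the deletion persisted.
plan = [
    ("start_session", {"url": "http://localhost:3000"}),
    ("get_page_state", {}),
    ("click", {"index": 4}),  # assumed index of a "Delete" button
    ("get_errors", {}),
    ("verify_action", {"type": "delete", "text": "Buy milk",
                       "url": "http://localhost:3000/tasks"}),
    ("end_session", {}),
]
print(check_plan(plan))
```

In practice Claude Code chooses the calls itself. The point here is only the ordering: inspect the page, act, check for errors, then verify that the change persisted.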
Alternative: Standalone CLI
Bring your own LLM API key. Argus has a built-in AI planner that decides what to explore.
pip install argus-testing
playwright install chromium
export DEEPSEEK_API_KEY=sk-... # or OPENAI_API_KEY, ANTHROPIC_API_KEY
argus http://localhost:3000 --model deepseek/deepseek-chat -n 50
argus http://localhost:3000 -f "test login with edge cases" --headed
Supports 100+ models via LiteLLM: OpenAI, Anthropic, DeepSeek, Gemini, Ollama (free/local), etc.
Smart Detection
Argus goes beyond console errors. It extracts page content and analyzes it:
| Detection | How it works | Example bug found |
|---|---|---|
| State verification | After delete/edit, navigates back and checks if the change persisted | "Deleted!" shown but item reappears on refresh |
| Toast + error cross-check | Correlates success toasts with HTTP 500 responses | Settings shows "Saved!" but server crashed |
| Text anomaly scanning | Regex patterns on visible page text | "NaN", "1.52 days ago", eternal "Loading..." |
| Count consistency | Compares displayed counts against actual rendered items | Header says "7 tasks" but 8 items visible |
| CSS state analysis | Checks semantic CSS classes for contradictory states | "0 remaining" shown in red (should be success) |
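Two of the rows above, text anomaly scanning and count consistency, are simple enough to sketch with the standard library. The patterns and function names below are illustrative assumptions, not the rules shipped in detector.py.

```python
import re

# Assumed patterns for suspicious visible text; a real detector may use others.
ANOMALY_PATTERNS = {
    "NaN in text": re.compile(r"\bNaN\b"),
    "fractional relative date": re.compile(
        r"\b\d+\.\d+ (?:seconds?|minutes?|hours?|days?|weeks?|months?|years?) ago\b"
    ),
    # One snapshot cannot prove a spinner is stuck; a real check would
    # look again after a delay before reporting this.
    "loading indicator": re.compile(r"\bLoading\.\.\."),
}


def scan_text(text):
    """Return the labels of all anomaly patterns found in visible page text."""
    return [label for label, pattern in ANOMALY_PATTERNS.items()
            if pattern.search(text)]


def count_mismatch(header, rendered_items):
    """Compare the first number in a header like '7 tasks' with the item count.

    Returns (claimed, actual) on a mismatch, otherwise None.
    """
    match = re.search(r"\d+", header)
    if match is None:
        return None
    claimed, actual = int(match.group()), len(rendered_items)
    return None if claimed == actual else (claimed, actual)


print(scan_text("Updated 1.52 days ago"))
print(count_mismatch("7 tasks", ["task"] * 8))
```

Matching on visible text keeps these checks independent of the app's framework, at the cost of false positives when a page legitimately displays strings such as "NaN".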
Bug Report
Each session generates a self-contained HTML report with:
- Bug cards with severity, type, description, and reproduction steps
- Embedded screenshots (base64 — no external files needed)
- Testing timeline showing every page visited
- Console and network error logs
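The "self-contained" property comes from inlining images as base64 data URIs, so the report is a single file. Below is a minimal sketch of that technique using only the standard library. The `render_report` function and the bug dictionary keys are assumptions for illustration and do not reflect the actual reporter.py.

```python
import base64
import html


def render_report(bugs, screenshots):
    """Build one HTML string with bug cards and inlined PNG screenshots.

    bugs: list of dicts with 'severity', 'type', 'description', 'steps'.
    screenshots: mapping of screenshot name to raw PNG bytes.
    """
    parts = ["<!DOCTYPE html><html><body><h1>Bug report</h1>"]
    for bug in bugs:
        parts.append(
            f"<section><h2>[{html.escape(bug['severity'])}] "
            f"{html.escape(bug['type'])}</h2>"
        )
        parts.append(f"<p>{html.escape(bug['description'])}</p><ol>")
        parts.extend(f"<li>{html.escape(step)}</li>" for step in bug["steps"])
        parts.append("</ol></section>")
    for name, png in screenshots.items():
        # Inline the image so the report needs no external files.
        data = base64.b64encode(png).decode("ascii")
        parts.append(
            f'<img alt="{html.escape(name)}" src="data:image/png;base64,{data}">'
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Escaping every user-visible string with `html.escape` matters here, because bug descriptions may contain the very markup (for example an XSS payload) that triggered the finding.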
How It Works
You give a URL
→ Argus opens a real browser (Playwright)
→ AI explores: clicks, types, navigates, tries edge cases
→ Smart detection analyzes page content after every action
→ Generates HTML report with all bugs found
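The flow above can be sketched as a small loop: observe, decide, act, detect. This is a hedged outline rather than the real explorer.py. The four callables, the `"stop"` action kind, and the `max_steps` parameter are hypothetical stand-ins for the browser driver, the AI planner, and the detector.

```python
from typing import Callable


def explore(
    get_state: Callable[[], dict],
    choose_action: Callable[[dict, list], dict],
    perform: Callable[[dict], None],
    detect: Callable[[dict], list],
    max_steps: int = 50,
) -> list:
    """Run an observe/decide/act/detect loop and collect findings."""
    findings = []
    history = []
    for _ in range(max_steps):
        state = get_state()                      # what is on the page now
        action = choose_action(state, history)   # the planner picks a step
        if action.get("kind") == "stop":
            break
        perform(action)                          # click, type, or navigate
        history.append(action)
        findings.extend(detect(get_state()))     # analyze after every action
    return findings
```

Passing the history to the planner lets it avoid repeating actions, and running detection after every step ties each finding to the action that preceded it.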
Architecture
argus/
├── browser.py # Playwright driver + DOM extraction + page content analysis
├── detector.py # Smart bug detection (5 detection methods)
├── mcp_server.py # MCP server (12 tools) for Claude Code
├── explorer.py # CLI exploration loop
├── planner.py # LLM planner for CLI mode (LiteLLM)
├── reporter.py # Self-contained HTML report generator
├── models.py # Data models
├── config.py # YAML config + focus areas
└── cli.py # CLI entry point
Requirements
- Python 3.10+
- Chromium (auto-installed via playwright install chromium)
License
MIT
File details
Details for the file argus_testing-0.1.0.tar.gz.
File metadata
- Download URL: argus_testing-0.1.0.tar.gz
- Upload date:
- Size: 37.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8f87d4f02e56d7acccc37e06edcbe67374f61b9cfc9f57523af0b08decb1870 |
| MD5 | 0b5a8d429a010677c0e33c6242b4095d |
| BLAKE2b-256 | 73e36f878ac0a9811fd92914a89c15bb4cda1c13c75b9af597f38492c02c6197 |
File details
Details for the file argus_testing-0.1.0-py3-none-any.whl.
File metadata
- Download URL: argus_testing-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a1082d74aa478ae1e9c4879fed3d7d9868941a746200f357946744458a7edd3 |
| MD5 | 7432f5e60ad3459d3b0a34c6296357a5 |
| BLAKE2b-256 | a737fee86e217b5edddff77987e902fbcd217e063472fffc942b6404c4f2a0d5 |