AI-powered exploratory testing agent that discovers bugs like a real user

Argus

AI-powered exploratory QA agent. Give it a URL and it explores your app like a real user (clicking buttons, filling forms, trying edge cases) to find bugs that scripted tests miss.

Unlike Playwright or Cypress test suites, Argus needs no test scripts: it discovers bugs you didn't think to test for.

What It Catches

Traditional test tools only catch what you explicitly script. Argus catches:

  • Console errors & crashes — JS exceptions, unhandled promises
  • Broken API calls — HTTP 4xx/5xx responses
  • Fake success operations — "Saved!" toast shown but server returned 500
  • Silent data loss — delete says "Deleted!" but item still exists on refresh
  • Broken date formatting — "1.52 days ago" instead of "2 days ago"
  • Count mismatches — dashboard says "7 tasks" but there are actually 8
  • Dead links — navigation links pointing to 404 pages
  • XSS vulnerabilities — unsanitized user input reflected in HTML
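Several of these bugs (broken dates, "NaN" values) are visible in plain page text, so a regex pass over the rendered text can flag them. The sketch below illustrates that idea only; the pattern names and regexes are hypothetical and are not Argus's actual detection rules.

```python
import re

# Hypothetical patterns for illustration; Argus's real rules are not documented here.
ANOMALY_PATTERNS = {
    # A bare NaN usually means a number was computed from missing data.
    "nan_value": re.compile(r"\bNaN\b"),
    # Relative times should be whole units: "2 days ago", not "1.52 days ago".
    "fractional_relative_time": re.compile(
        r"\b\d+\.\d+\s+(?:seconds?|minutes?|hours?|days?|weeks?|months?|years?)\s+ago\b"
    ),
}


def scan_text_anomalies(page_text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in visible page text."""
    findings = []
    for name, pattern in ANOMALY_PATTERNS.items():
        for match in pattern.finditer(page_text):
            findings.append((name, match.group(0)))
    return findings
```

For example, scanning "Updated 1.52 days ago. Total: NaN items." reports both the fractional date and the NaN, while a correctly formatted "2 days ago" is left alone.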

Quick Start (MCP Server for Claude Code)

This is the recommended way to use Argus. Claude Code acts as the AI that drives the exploration, so Argus itself needs no separate LLM API key.

pip install argus-testing
playwright install chromium
claude mcp add argus -- argus-mcp

Then in Claude Code:

"Test my app at http://localhost:3000, focus on the checkout flow"

Claude Code will explore your app using Argus tools, try edge cases, verify that actions actually persist, and generate an HTML bug report.

MCP Tools

Tool                              What it does
start_session(url)                Launch a browser and navigate to the URL
get_page_state()                  See all interactive elements, page content, counts, and toasts
click(index)                      Click an element
type_text(index, text)            Type into an input field
navigate(url)                     Go to a URL
screenshot(name)                  Capture the current page
get_errors()                      Get console/network errors plus smart detection (broken dates, count mismatches, misleading toasts)
verify_action(type, text, url)    Verify that a delete/edit actually persisted by navigating and checking
end_session()                     Close the browser and generate the HTML report
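The verify_action tool checks persistence by navigating and re-reading the page. The decision step can be sketched as pure logic, assuming the reloaded page text is already available as a string; the function and result type below are hypothetical illustrations, not Argus's implementation.

```python
from dataclasses import dataclass


@dataclass
class VerificationResult:
    persisted: bool
    detail: str


def check_persistence(action_type: str, text: str, reloaded_page_text: str) -> VerificationResult:
    """Decide whether a delete/edit survived a reload (hypothetical logic).

    For a delete, `text` should be gone after reloading.
    For an edit, `text` (the new value) should be present.
    """
    present = text in reloaded_page_text
    if action_type == "delete":
        if present:
            return VerificationResult(False, f"'{text}' still present after reload")
        return VerificationResult(True, f"'{text}' no longer present")
    if action_type == "edit":
        if present:
            return VerificationResult(True, f"'{text}' present after reload")
        return VerificationResult(False, f"'{text}' missing after reload")
    raise ValueError(f"unsupported action type: {action_type!r}")
```

With Playwright's sync API, the reloaded text could come from page.goto(url) followed by page.inner_text("body"); a plain string is used here so the sketch runs without a browser. A substring check is deliberately simple and can give false results when the same text appears elsewhere on the page.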

Alternative: Standalone CLI

Bring your own LLM API key. Argus has a built-in AI planner that decides what to explore.

pip install argus-testing
playwright install chromium

export DEEPSEEK_API_KEY=sk-...   # or OPENAI_API_KEY, ANTHROPIC_API_KEY

argus http://localhost:3000 --model deepseek/deepseek-chat -n 50
argus http://localhost:3000 -f "test login with edge cases" --headed

Supports 100+ models via LiteLLM: OpenAI, Anthropic, DeepSeek, Gemini, Ollama (free/local), etc.
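The built-in planner is only described at a high level here. A common pattern for this kind of planner is to ask the model for its next step as a small JSON object and validate it before touching the browser. The sketch below assumes a hypothetical action format whose names mirror the MCP tools; it is not Argus's real planner protocol.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary, mirroring the MCP tool names.
ALLOWED_ACTIONS = {"click", "type_text", "navigate", "done"}


@dataclass
class PlannedAction:
    action: str
    index: Optional[int] = None
    text: Optional[str] = None
    url: Optional[str] = None


def parse_planner_reply(reply: str) -> PlannedAction:
    """Parse an LLM reply into a validated action (hypothetical format)."""
    # Models often wrap JSON in prose, so take the outermost {...} span.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in planner reply")
    data = json.loads(reply[start : end + 1])
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    planned = PlannedAction(
        action=action, index=data.get("index"), text=data.get("text"), url=data.get("url")
    )
    # Reject actions that are missing the arguments they need.
    if action in ("click", "type_text") and not isinstance(planned.index, int):
        raise ValueError(f"{action} requires an integer element index")
    if action == "type_text" and planned.text is None:
        raise ValueError("type_text requires text")
    if action == "navigate" and not planned.url:
        raise ValueError("navigate requires a url")
    return planned
```

Validating before acting keeps a malformed or hallucinated model reply from turning into a random browser action.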

Smart Detection

Argus goes beyond console errors. It extracts page content and analyzes it:

  • State verification: after a delete or edit, navigates back and checks whether the change persisted. Example bug found: "Deleted!" is shown, but the item reappears on refresh.
  • Toast + error cross-check: correlates success toasts with HTTP 500 responses. Example bug found: Settings shows "Saved!", but the server crashed.
  • Text anomaly scanning: applies regex patterns to visible page text. Example bug found: "NaN", "1.52 days ago", a "Loading..." that never goes away.
  • Count consistency: compares displayed counts against the items actually rendered. Example bug found: the header says "7 tasks" but 8 items are visible.
  • CSS state analysis: checks semantic CSS classes for contradictory states. Example bug found: "0 remaining" is shown in red, although it should be styled as success.
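The toast + error cross-check pairs two signals in time: a success-looking toast and a failed server response. A minimal sketch, assuming toasts and responses have already been captured with timestamps; the keyword list and the 2-second window are hypothetical choices, and this version flags any 5xx status rather than only 500.

```python
from dataclasses import dataclass


@dataclass
class Toast:
    text: str
    timestamp: float  # seconds since session start


@dataclass
class NetworkResponse:
    url: str
    status: int
    timestamp: float  # seconds since session start


# Hypothetical keywords that make a toast look like a success message.
SUCCESS_WORDS = ("saved", "success", "updated", "deleted", "created")


def find_misleading_toasts(
    toasts: list[Toast], responses: list[NetworkResponse], window_s: float = 2.0
) -> list[tuple[Toast, NetworkResponse]]:
    """Pair success toasts with 5xx responses seen within window_s seconds of them."""
    findings = []
    for toast in toasts:
        if not any(word in toast.text.lower() for word in SUCCESS_WORDS):
            continue
        for resp in responses:
            if resp.status >= 500 and abs(resp.timestamp - toast.timestamp) <= window_s:
                findings.append((toast, resp))
    return findings
```

A time window is a simple heuristic: it catches the "Saved!" plus HTTP 500 case, but on a busy page an unrelated failing request in the same window would also be flagged.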

Bug Report

Each session generates a self-contained HTML report with:

  • Bug cards with severity, type, description, and reproduction steps
  • Embedded screenshots (base64 — no external files needed)
  • Testing timeline showing every page visited
  • Console and network error logs
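A report stays self-contained when screenshots are embedded as base64 data URIs instead of linked files. The sketch below renders bug cards that way; the Bug fields follow the list above, but the class, function names, and markup are hypothetical and not Argus's reporter.py. Escaping every user-derived string matters here, since a report about an XSS bug would otherwise execute the payload it describes.

```python
import base64
import html
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bug:
    severity: str
    bug_type: str
    description: str
    steps: list[str] = field(default_factory=list)
    screenshot_png: Optional[bytes] = None


def render_bug_card(bug: Bug) -> str:
    """Render one bug as an HTML card with an inline base64 screenshot."""
    steps = "".join(f"<li>{html.escape(s)}</li>" for s in bug.steps)
    img = ""
    if bug.screenshot_png is not None:
        b64 = base64.b64encode(bug.screenshot_png).decode("ascii")
        img = f'<img alt="screenshot" src="data:image/png;base64,{b64}">'
    sev = html.escape(bug.severity)
    return (
        f'<div class="bug {sev}"><h3>[{sev}] {html.escape(bug.bug_type)}</h3>'
        f"<p>{html.escape(bug.description)}</p><ol>{steps}</ol>{img}</div>"
    )


def render_report(bugs: list[Bug]) -> str:
    """Wrap all bug cards in a single standalone HTML document."""
    cards = "\n".join(render_bug_card(b) for b in bugs)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>Bug report</title>'
        f"</head><body>\n{cards}\n</body></html>"
    )
```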

How It Works

You give a URL
  → Argus opens a real browser (Playwright)
  → AI explores: clicks, types, navigates, tries edge cases
  → Smart detection analyzes page content after every action
  → Generates HTML report with all bugs found
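The flow above is essentially an observe, plan, act, detect loop. A minimal sketch with the browser, planner, and detector passed in as hypothetical interfaces (so it runs without Playwright or an LLM); max_steps is an illustrative cap, and none of these names come from Argus's explorer.py.

```python
from typing import Callable, Optional, Protocol


class Browser(Protocol):
    def page_state(self) -> str: ...
    def perform(self, action: str) -> None: ...


class Planner(Protocol):
    def next_action(self, page_state: str, history: list[str]) -> Optional[str]: ...


def explore(
    browser: Browser,
    planner: Planner,
    detect: Callable[[str], list[str]],
    max_steps: int = 50,
) -> list[str]:
    """Run the exploration loop and return every finding from the detector."""
    bugs: list[str] = []
    history: list[str] = []
    for _ in range(max_steps):
        action = planner.next_action(browser.page_state(), history)
        if action is None:  # the planner decided it is done
            break
        browser.perform(action)
        history.append(action)
        # Analyze the page after every action, as described above.
        bugs.extend(detect(browser.page_state()))
    return bugs
```

The collected findings would then be handed to the report generator.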

Architecture

argus/
├── browser.py       # Playwright driver + DOM extraction + page content analysis
├── detector.py      # Smart bug detection (5 detection methods)
├── mcp_server.py    # MCP server (12 tools) for Claude Code
├── explorer.py      # CLI exploration loop
├── planner.py       # LLM planner for CLI mode (LiteLLM)
├── reporter.py      # Self-contained HTML report generator
├── models.py        # Data models
├── config.py        # YAML config + focus areas
└── cli.py           # CLI entry point

Requirements

  • Python 3.10+
  • Chromium (install it with playwright install chromium)

License

MIT


Download files


Source Distribution

argus_testing-0.1.0.tar.gz (37.2 kB)

Uploaded Source

Built Distribution


argus_testing-0.1.0-py3-none-any.whl (29.6 kB)

Uploaded Python 3

File details

Details for the file argus_testing-0.1.0.tar.gz.

File metadata

  • Download URL: argus_testing-0.1.0.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for argus_testing-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        a8f87d4f02e56d7acccc37e06edcbe67374f61b9cfc9f57523af0b08decb1870
MD5           0b5a8d429a010677c0e33c6242b4095d
BLAKE2b-256   73e36f878ac0a9811fd92914a89c15bb4cda1c13c75b9af597f38492c02c6197


File details

Details for the file argus_testing-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: argus_testing-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 29.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for argus_testing-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        1a1082d74aa478ae1e9c4879fed3d7d9868941a746200f357946744458a7edd3
MD5           7432f5e60ad3459d3b0a34c6296357a5
BLAKE2b-256   a737fee86e217b5edddff77987e902fbcd217e063472fffc942b6404c4f2a0d5
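The published SHA256 digests let you confirm that a downloaded file is intact. A minimal sketch using only the standard library; the helper names are illustrative, not part of Argus.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

For the wheel above, that would be verify_download(Path("argus_testing-0.1.0-py3-none-any.whl"), "1a1082d74aa478ae1e9c4879fed3d7d9868941a746200f357946744458a7edd3"). pip's hash-checking mode (--require-hashes) performs the same check at install time, though it then requires hashes for every dependency.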

