AI-powered exploratory testing agent that discovers bugs like a real user

Argus

AI-powered exploratory QA agent. Give it a URL and it explores your app like a real user (clicking buttons, filling forms, trying edge cases) and finds bugs that scripted tests miss.

Unlike with Playwright or Cypress, you don't write test scripts. Argus looks for bugs you didn't think to test for.

What It Catches

Argus runs 16 types of detection across every page it visits:

Category | What it finds
Runtime Errors | Console exceptions, HTTP 4xx/5xx, crashes
Logic Bugs | Fake delete/edit (says "Saved!" but data didn't persist), misleading success toasts
Data Issues | Count mismatches, broken dates ("1.52 days ago"), NaN, eternal "Loading..."
Dead Links | Crawls all internal links, finds 404s and 5xx
Broken Images | Images that failed to load
SEO | Missing meta description, OG tags, heading hierarchy issues
Accessibility | Missing alt text, unlabeled form inputs, no lang attribute
Performance | Slow page loads (> 3 s), large resources (> 500 KB), too many requests
Security | Mixed content (HTTP resources on HTTPS pages), XSS reflection
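To make the "Data Issues" row concrete, here is a minimal sketch of how text-based checks of this kind could work. It is not Argus's actual detector code: the pattern names, the regular expressions, and the find_data_issues helper are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration; Argus's real detectors may differ.
PATTERNS = {
    # A literal "NaN" leaking into visible text
    "nan_value": re.compile(r"\bNaN\b"),
    # Relative dates with a fractional count, e.g. "1.52 days ago"
    "fractional_relative_date": re.compile(
        r"\b\d+\.\d+\s+(?:seconds?|minutes?|hours?|days?|weeks?|months?|years?)\s+ago\b"
    ),
    # A loading placeholder; a real check would also confirm it never goes away
    "stuck_loading": re.compile(r"\bLoading\.\.\."),
}


def find_data_issues(page_text: str) -> list:
    """Return (issue_name, matched_text) pairs found in visible page text."""
    issues = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(page_text):
            issues.append((name, match.group(0)))
    return issues


sample = "Updated 1.52 days ago. Total: NaN items. Loading..."
for issue in find_data_issues(sample):
    print(issue)
```

A check like this only sees text, so it can flag a suspicious string but cannot tell whether a "Loading..." placeholder is truly permanent without looking at the page again later.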

Quick Start (MCP Server for Claude Code)

This is the recommended way to use Argus. Claude Code acts as the AI brain, so no separate API key is needed.

pip install argus-testing
playwright install chromium
claude mcp add argus -- argus-mcp

Then in Claude Code:

"Test my app at http://localhost:3000, focus on the checkout flow"

Claude Code will explore your app, try edge cases, verify that actions persist, and generate an HTML bug report.

MCP Tools (15)

Tool | What it does
start_session(url) | Launch browser, navigate to URL
get_page_state() | See interactive elements + page text + counts + toasts + meta tags + a11y issues
click(index) | Click an element
type_text(index, text) | Type into an input field
select_option(index, value) | Select from dropdown
navigate(url) | Go to a URL
go_back() | Browser back
scroll_down() | Scroll the page
screenshot(name) | Capture the current page
get_errors() | Run all passive detectors (console, network, text, images, SEO, a11y, mixed content...)
verify_action(type, text, url) | Verify a delete/edit actually persisted
check_links() | Crawl all internal links, find dead ones
check_performance() | Measure load time, find large resources
crawl_site(max_pages) | Auto-crawl entire site: visit all internal pages, run all detectors, one command
end_session() | Close browser, generate HTML report
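The idea behind verify_action(type, text, url) can be sketched as "re-read the page and check that the effect is still there". The code below is an illustration under assumptions, not the tool's implementation: action_persisted, the fetch_text callback, and the "delete"/"edit" type strings are hypothetical names, and a dictionary stands in for a real page reload.

```python
from typing import Callable


def action_persisted(action_type: str, text: str, url: str,
                     fetch_text: Callable[[str], str]) -> bool:
    """Re-read the page at `url` and check the action's effect is still visible.

    A delete persisted if `text` is gone; an edit persisted if `text` is present.
    """
    fresh_text = fetch_text(url)  # stands in for reloading the page in a browser
    if action_type == "delete":
        return text not in fresh_text
    if action_type == "edit":
        return text in fresh_text
    raise ValueError(f"unsupported action type: {action_type!r}")


# Simulated app that showed a success message but never removed the item.
pages = {"http://localhost:3000/tasks": "Buy milk\nWrite report"}
fake_fetch = pages.__getitem__

# The UI claimed "Buy milk" was deleted, yet it is still on the reloaded page.
print(action_persisted("delete", "Buy milk", "http://localhost:3000/tasks", fake_fetch))
```

Checking against a fresh read rather than the current DOM is what separates a real delete from one that only updates the page optimistically.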

Alternative: Standalone CLI

Bring your own LLM API key. Argus has a built-in AI planner that decides what to explore.

pip install argus-testing
playwright install chromium

export DEEPSEEK_API_KEY=sk-...   # or OPENAI_API_KEY, ANTHROPIC_API_KEY

argus http://localhost:3000 --model deepseek/deepseek-chat -n 50
argus http://localhost:3000 -f "test login with edge cases" --headed

Supports 100+ models via LiteLLM: OpenAI, Anthropic, DeepSeek, Gemini, Ollama (free/local), etc.
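Since the key you export depends on the model you pass to --model, a small pre-flight check can catch a missing key before a run starts. This is a sketch only: the mapping covers just the three variables named in the export example above, the helper names are hypothetical, and "ollama/llama3" is used purely as an example of a local model name.

```python
import os
from typing import Mapping, Optional

# Provider prefix -> environment variable, taken from the export example above.
# Other providers are deliberately left out of this sketch.
KEY_ENV_VARS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def required_key_var(model: str) -> Optional[str]:
    """Return the env var a 'provider/model' string needs, if this sketch knows it."""
    provider, sep, _ = model.partition("/")
    if not sep:
        return None  # no provider prefix, so this sketch cannot tell
    return KEY_ENV_VARS.get(provider)


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the env var that is required but unset, else None."""
    var = required_key_var(model)
    if var is not None and not env.get(var):
        return var
    return None


print(missing_key("deepseek/deepseek-chat", env={}))
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.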

Tested on Real Sites

Site | Bugs found | Examples
vanlifeyvr.com | 1 | Missing og:image
nalifex.com | 3 | Unlabeled search input, 1.5MB uncompressed image, missing og:image
BuggyTasks (test app) | 15+ | Fake delete, fake edit, broken dates, count mismatch, XSS, auth bypass

Bug Report

Each session generates a self-contained HTML report with:

  • Bug cards with severity, type, description, and reproduction steps
  • Embedded screenshots (base64, so no external files are needed)
  • Testing timeline showing every page visited
  • Console and network error logs
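A report stays self-contained when each screenshot is inlined as a base64 data URI instead of being linked as a separate file. The sketch below shows that general technique; the embed_screenshot helper and the figure markup are assumptions, not the HTML that Argus actually emits.

```python
import base64
import html


def embed_screenshot(png_bytes: bytes, caption: str) -> str:
    """Return an HTML snippet with the PNG inlined as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    safe_caption = html.escape(caption, quote=True)  # avoid breaking the markup
    return (
        f'<figure><img alt="{safe_caption}" '
        f'src="data:image/png;base64,{encoded}">'
        f"<figcaption>{safe_caption}</figcaption></figure>"
    )


# The first four bytes of the PNG signature are enough to show the encoding.
snippet = embed_screenshot(b"\x89PNG", "Cart page <empty>")
print(snippet)
```

The trade-off is file size: base64 adds roughly a third to each image, which is usually acceptable for a report meant to be shared as a single file.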

How It Works

You give a URL
  -> Argus opens a real browser (Playwright)
  -> AI explores: clicks, types, navigates, tries edge cases
  -> 12 passive detectors analyze every page automatically
  -> On-demand: link crawling + performance metrics
  -> Generates HTML report with all bugs found
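The passive-detector step can be pictured as event handlers attached to the browser page that record problems while the AI explores. The sketch below assumes Playwright's page.on("console", ...) and page.on("response", ...) events; the PassiveCollector class is a hypothetical stand-in, not one of Argus's own detectors, and the demo uses simple stand-in objects so it runs without a browser.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class PassiveCollector:
    """Records console errors and failed HTTP responses as they happen."""
    console_errors: list = field(default_factory=list)
    http_errors: list = field(default_factory=list)

    def on_console(self, msg) -> None:
        # Playwright console messages expose .type and .text
        if msg.type == "error":
            self.console_errors.append(msg.text)

    def on_response(self, response) -> None:
        # Playwright responses expose .status and .url
        if response.status >= 400:
            self.http_errors.append((response.status, response.url))

    def attach(self, page) -> None:
        """Subscribe to a Playwright page's events."""
        page.on("console", self.on_console)
        page.on("response", self.on_response)


# Demo with stand-in objects instead of a real browser session.
collector = PassiveCollector()
collector.on_console(SimpleNamespace(type="error", text="TypeError: x is undefined"))
collector.on_console(SimpleNamespace(type="log", text="ready"))
collector.on_response(SimpleNamespace(status=500, url="http://localhost:3000/api/cart"))
collector.on_response(SimpleNamespace(status=200, url="http://localhost:3000/"))
print(collector.console_errors)
print(collector.http_errors)
```

With a real session, calling collector.attach(page) right after creating the page would let the same handlers run on every navigation without any extra work from the exploring agent.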

Requirements

  • Python 3.10+
  • Chromium (auto-installed via playwright install chromium)

License

MIT
