Skip to main content

Autonomous web testing agent: crawls a live site and generates a runnable Playwright E2E/functional test suite plus a human-readable report.

Project description

Anjalikastra

An open-source CLI that points at a live website URL and produces:

  1. A human-readable report — pass/fail per page and endpoint, coverage %, and a drafted list of likely bugs, queued for human review.
  2. A runnable end-to-end / functional test suite (TypeScript + Playwright Test) — organized test files, config, README, and dependency manifest — that installs and runs with a single command on first try.

The tool discovers what to test by crawling from the URL you give it. It has no access to the target's source code, repo, or CI — everything is inferred from the live site, treating every target as black-box.

These are end-to-end / functional / smoke tests, not unit tests — unit tests require source access, and this tool never has that.

Full documentation: https://khushaljethava.github.io/Anjalikastra/ (built from docs/ — deploys automatically from main).

Install

pip install -e .
playwright install --with-deps chromium   # for the tool's own crawler/endpoint capture

Configure an LLM for full-quality classification and generation: set ANTHROPIC_API_KEY for Claude, or OPENAI_API_KEY/OPENAI_BASE_URL for any OpenAI-compatible provider — OpenAI, Ollama, OpenRouter, Gemini, and more (see "Configuring LLM models" below). Without any key, the tool still runs end-to-end using a heuristic/template-only fallback — smaller and less targeted, but still a working suite.

Usage

anjalikastra https://example.com
anjalikastra <url> [options]

  --output-dir PATH     Where run artifacts are written (default: output/)
  --max-pages N         Cap on pages crawled (default: 40)
  --throttle-ms N       Minimum delay between requests, in ms (default: 500)
  --openapi PATH        Optional OpenAPI spec to enrich endpoint discovery
  --public-only / --allow-auth   v1: only crawl unauthenticated pages (default: on)
  --dry-run             Print the plan and exit without making network requests
  --resume RUN_ID        Resume a previous run (output/<run-id>), skipping discovery if it already finished
  --cheap-model NAME     Model for classification/summaries (default: claude-haiku-4-5-20251001)
  --capable-model NAME   Model for test generation/triage (default: claude-sonnet-5)
  --llm-provider NAME    'anthropic' or 'openai' (any OpenAI-compatible endpoint); auto-detected by default
  --verbose, -v          Verbose logging

Run anjalikastra <url> --dry-run first on a new target — it prints exactly what the tool would do (crawl scope, throttle, model routing) without making a single network request.

Resuming a crashed or interrupted run

Discovery (crawling + endpoint capture) is checkpointed to output/<run-id>/checkpoint.json. If the process crashes or is interrupted after discovery completes, resume with:

anjalikastra <url> --resume <run-id>

This skips re-crawling the site — avoiding hitting the target's infrastructure a second time — and picks up from classification/generation. Classification results are also cached independently by content-hash (see "How it works" below), so even a fresh run against an unchanged site is cheap.

What you get

output/<run-id>/
├── report.md          # human-readable: coverage, failures, drafted bugs, cost
├── report.json         # same data, machine-readable
└── suite/               # the deliverable — yours to keep and maintain
    ├── package.json
    ├── playwright.config.ts
    ├── README.md
    └── tests/
        ├── pages/*.spec.ts
        └── api/*.spec.ts

Run the generated suite:

cd output/<run-id>/suite
npm install && npx playwright install --with-deps chromium
npm test

Coverage honesty

The report always states "tested N of M known pages" and lists what wasn't reached — auth-gated, blocked by robots.txt, sitemap-only, or crawl-truncated — next to why. There is no bare green checkmark implying full coverage.

Scope (v1)

  • Crawls and tests public pages only. Login-gated flows are not tested; they're reported as "not covered," never silently skipped or falsely passed. See anjalikastra/discovery/auth.py for the v2 design.
  • No bot-detection evasion. If a target blocks the crawler, the tool tells you to allowlist its User-Agent on your own site rather than trying to get around it.
  • Nothing is auto-filed or auto-fixed. The tool drafts a bug list; a human decides.

How it works

URL -> Discovery (sitemap + crawl + network capture)
    -> Analysis (page/endpoint classification)
    -> Test generation (assertions -> Playwright files -> review gate)
    -> Execution (run the suite, capture baseline)
    -> Triage (classify failures, draft bug reports)
    -> Reporting (report.md / report.json)

Classification and routine summaries use a cheap/small model; test generation and failure triage use a more capable model. A content-hash cache under output/.cache/ means a second run against an unchanged page skips re-classifying and re-generating it — see the "Cost" section of report.md for the token delta.

Configuring LLM models

The tool splits LLM work across two tiers, each independently configurable:

Tier Used for Default Override
cheap page/endpoint classification, routine summaries claude-haiku-4-5-20251001 --cheap-model flag or ANJALIKASTRA_CHEAP_MODEL env var
capable test generation, failure triage claude-sonnet-5 --capable-model flag or ANJALIKASTRA_CAPABLE_MODEL env var
export ANTHROPIC_API_KEY=sk-ant-...
anjalikastra https://example.com \
  --cheap-model claude-haiku-4-5-20251001 \
  --capable-model claude-opus-4-8

CLI flags take precedence over env vars; env vars take precedence over the defaults. --dry-run shows exactly which models a run would use.

Supported providers

Two backends are supported natively, selected with --llm-provider (or ANJALIKASTRA_LLM_PROVIDER), and auto-detected from your credentials if you don't specify one:

Provider Covers Credentials
anthropic Claude models via the Anthropic API ANTHROPIC_API_KEY
openai any OpenAI-compatible endpoint: OpenAI, Ollama (local models), OpenRouter, Gemini, vLLM, LM Studio, ... OPENAI_API_KEY and/or OPENAI_BASE_URL

Auto-detection: ANTHROPIC_API_KEY set → anthropic; otherwise OPENAI_API_KEY or OPENAI_BASE_URL set → openai; neither → heuristic mode. When using a non-Anthropic provider, pass model names your endpoint actually serves via --cheap-model / --capable-model.

OpenAI:

export OPENAI_API_KEY=sk-...
anjalikastra https://example.com --cheap-model gpt-5-mini --capable-model gpt-5

Ollama (local models, no API key needed):

export OPENAI_BASE_URL=http://localhost:11434/v1
anjalikastra https://example.com --cheap-model llama3.2 --capable-model qwen2.5-coder:32b

OpenRouter (one key, hundreds of models):

export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_API_KEY=sk-or-...
anjalikastra https://example.com \
  --cheap-model google/gemini-2.5-flash --capable-model anthropic/claude-sonnet-4.5

Gemini (via Google's OpenAI-compatible endpoint):

export OPENAI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
export OPENAI_API_KEY=your-gemini-api-key
anjalikastra https://example.com --cheap-model gemini-2.5-flash --capable-model gemini-2.5-pro

Any other server that speaks the OpenAI Chat Completions protocol (vLLM, LM Studio, LiteLLM proxy, Together, Groq, ...) works the same way: set OPENAI_BASE_URL to its address and pass its model names. --dry-run shows the resolved provider, base URL, and models before anything runs.

No key at all? The tool still runs end-to-end in heuristic/template-only mode — classification falls back to URL/DOM heuristics and generation uses the deterministic templates. The suite is smaller and less targeted but still valid and runnable.

Developing this tool

pip install -e ".[dev]"
pytest

anjalikastra/ is the Python orchestrator; it emits TypeScript/Playwright as output artifacts. See anjalikastra/generation/review_gate.py for the minimal-code discipline applied to every generated test file before it ships.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anjalikastra-0.1.0.tar.gz (54.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

anjalikastra-0.1.0-py3-none-any.whl (47.1 kB view details)

Uploaded Python 3

File details

Details for the file anjalikastra-0.1.0.tar.gz.

File metadata

  • Download URL: anjalikastra-0.1.0.tar.gz
  • Upload date:
  • Size: 54.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for anjalikastra-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c024ed34969ab210b012528432337a59f0d38275675e5b4bfb4f2d28c148db26
MD5 cab7c10db2ef0484ff41e9d7fffed545
BLAKE2b-256 ad2d98086ee1b4ae5bd92dcd4525589e85972c298edb123761eea4ad4992c1b3

See more details on using hashes here.

File details

Details for the file anjalikastra-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: anjalikastra-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 47.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for anjalikastra-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2180fa897df830a3f0a5dea4ab510c42c6bdc39d1688963a09a3cf7fbfd68b40
MD5 e7080e86ea315b88338e9dbd2d5ea583
BLAKE2b-256 bda8e3213d4a818a2bc1be5aa8e8b9af5c41ec277f0587e0f8508250e1f734fb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page