A local-first universal web agent that autonomously explores web apps using browser-use, Playwright, and a local Vision-Language Model.

Peruse-AI

A local-first universal web agent that autonomously explores web applications and produces structured reports. It is powered by Playwright and a local Vision-Language Model (e.g. Qwen or Gemma) served through Ollama, LM Studio, or any OpenAI-compatible endpoint. A Jina VLM backend is also available.


Features

  • Autonomous Web Exploration — Give it a URL and a goal; it figures out the rest.
  • Dual-Channel Perception — Combines DOM extraction and visual screenshots for robust element detection.
  • 100% Local — Your data never leaves your machine. Runs on Ollama, LM Studio, or any OpenAI-compatible local endpoint.
  • Custom Personas — Assign the agent a specific role or perspective (e.g. "a senior UX designer", "a QA engineer") to shape how it evaluates pages.
  • Focus Groups — Run multiple personas concurrently against the same URL to gather diverse feedback in a single session.
  • Smart Loop Recovery — Instead of hard-stopping when stuck, the agent receives nudge messages suggesting alternative actions before eventually giving up.
  • Multi-Output Pipeline — Generates three report types from a single session:
    • Data Insights — Summaries of charts, tables, and visible data.
    • UX/UI Review — Contrast, layout, accessibility, and usability critique.
    • Bug Report — Console errors, failed requests, and reproduction steps.

Quickstart

Prerequisites

  1. Python 3.10+
  2. Ollama installed and running
  3. Pull the VLM model:
    ollama pull qwen2.5-vl:7b
    

Install

pip install peruse-ai
playwright install chromium

Run

# Full exploration
peruse run --url "https://example.com/dashboard" \
           --task "Explore the dashboard and summarize all visible data"

# With a persona
peruse run --url "https://example.com/dashboard" \
           --task "Evaluate the dashboard" \
           --persona "a senior UX designer with 15 years of experience"

# With extra instructions
peruse run --url "https://example.com/dashboard" \
           --task "Review the dashboard" \
           --extra-instructions "Pay special attention to color contrast and WCAG compliance"

# Bug scan only
peruse scan --url "https://example.com" \
            --task "Click every link and report errors"

# Focus group — multiple personas in parallel
peruse focus-group --url "https://example.com/dashboard" \
                   --task "Evaluate the dashboard and identify issues" \
                   --personas "a senior UX designer,a data analyst,a QA engineer"

# Check VLM connectivity
peruse check-vlm

Python API

import asyncio
from peruse_ai import PeruseAgent, PeruseConfig

config = PeruseConfig(
    vlm_model="qwen2.5-vl:7b",
    persona="an experienced data analyst",
    extra_instructions="Focus on data accuracy and chart readability",
)
agent = PeruseAgent(
    config=config,
    url="https://example.com/dashboard",
    task="Summarize the visible data and flag any UI issues",
)
result = asyncio.run(agent.run())
print(result.final_summary)

Focus Group API

import asyncio
from peruse_ai import FocusGroup, PeruseConfig

config = PeruseConfig(vlm_model="qwen2.5-vl:7b")
fg = FocusGroup(
    personas=[
        "a senior UX designer",
        "a data analyst",
        "a QA engineer",
    ],
    url="https://example.com/dashboard",
    task="Evaluate the dashboard and identify issues",
    config=config,
)
result = asyncio.run(fg.run())

for persona, agent_result in result.persona_map.items():
    print(f"{persona}: {agent_result.final_summary}")

Each persona runs concurrently with its own browser and VLM instance. Reports are saved to separate sub-directories under the output path (e.g. ./peruse_output/a-senior-ux-designer/).
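
The concurrency and directory layout described above can be sketched with the standard library alone. This is an illustration, not Peruse-AI's implementation: persona_slug, run_persona, and run_focus_group are hypothetical names, the agent session is replaced by a placeholder coroutine, and the slug rule is inferred only from the a-senior-ux-designer example.

```python
import asyncio
import re
from pathlib import Path


def persona_slug(persona: str) -> str:
    """Lowercase the persona and collapse non-alphanumeric runs into hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", persona.lower()).strip("-")


async def run_persona(persona: str, base_dir: Path) -> str:
    """Hypothetical stand-in for one agent session; returns a fake summary."""
    out_dir = base_dir / persona_slug(persona)
    await asyncio.sleep(0)  # placeholder for the real browser and VLM work
    return f"summary written under {out_dir.as_posix()}"


async def run_focus_group(personas: list[str], base_dir: Path) -> dict[str, str]:
    """Run every persona concurrently and map each persona to its result."""
    results = await asyncio.gather(*(run_persona(p, base_dir) for p in personas))
    return dict(zip(personas, results))


if __name__ == "__main__":
    personas = ["a senior UX designer", "a data analyst", "a QA engineer"]
    group = asyncio.run(run_focus_group(personas, Path("peruse_output")))
    for persona, summary in group.items():
        print(f"{persona}: {summary}")
```

asyncio.gather returns results in the order the coroutines were passed, which is why zipping them back onto the persona list is safe here.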


CLI Reference

peruse run

Full exploration session with all reports.

peruse run [OPTIONS]

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --url | | (required) | Starting URL to explore |
| --task | | (required) | High-level goal for the agent |
| --model | | qwen3-vl:6b | VLM model name |
| --backend | | ollama | VLM backend: ollama, lmstudio, openai_compat, jina |
| --base-url | | (auto-detected) | VLM API base URL |
| --output | -o | ./peruse_output | Output directory for reports and screenshots |
| --max-steps | | 50 | Maximum agent loop iterations |
| --headless/--no-headless | | --headless | Run browser in headless mode |
| --reports | | all | Reports to generate: insights, ux, bugs, all (comma-separated) |
| --persona | | (none) | Agent persona prepended to the system prompt |
| --extra-instructions | | (none) | Additional instructions appended to the agent prompt |
| --max-report-screenshots | | 10 | Max unique screenshots for VLM reports (0 = use all) |
| --verbose | -v | off | Enable debug logging |
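
As a rough illustration of how a comma-separated --reports value could be expanded, here is a minimal sketch. The parse_reports helper is hypothetical, and its handling of edge cases (an empty value or "all" mixed with other names means every report, unknown names raise an error) is an assumption rather than documented CLI behavior.

```python
VALID_REPORTS = ("insights", "ux", "bugs")


def parse_reports(value: str) -> list[str]:
    """Expand a comma-separated --reports value into a list of report names."""
    names = [part.strip().lower() for part in value.split(",") if part.strip()]
    if not names or "all" in names:
        return list(VALID_REPORTS)
    unknown = [n for n in names if n not in VALID_REPORTS]
    if unknown:
        raise ValueError(f"unknown report type(s): {', '.join(unknown)}")
    # Keep a fixed order and drop duplicates.
    return [r for r in VALID_REPORTS if r in names]


if __name__ == "__main__":
    print(parse_reports("bugs, ux"))
```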

peruse scan

Lightweight bug scan (bug report only, no VLM-powered analysis).

peruse scan [OPTIONS]

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --url | | (required) | Starting URL to scan |
| --task | | "Navigate all links and report any errors encountered." | Scan goal |
| --model | | qwen3-vl:6b | VLM model name |
| --backend | | ollama | VLM backend |
| --base-url | | (auto-detected) | VLM API base URL |
| --output | -o | ./peruse_output | Output directory |
| --max-steps | | 30 | Maximum steps for scan |
| --persona | | (none) | Agent persona |
| --extra-instructions | | (none) | Additional instructions |
| --verbose | -v | off | Enable debug logging |

peruse focus-group

Run multiple personas concurrently against the same URL. Each persona gets its own browser, VLM, and output sub-directory.

peruse focus-group [OPTIONS]

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --url | | (required) | Starting URL to explore |
| --task | | (required) | High-level goal for all agents |
| --personas | | (required) | Comma-separated personas or path to a text file (one per line) |
| --model | | qwen3-vl:6b | VLM model name |
| --backend | | ollama | VLM backend |
| --base-url | | (auto-detected) | VLM API base URL |
| --output | -o | ./peruse_output | Base output directory (each persona gets a sub-directory) |
| --max-steps | | 50 | Maximum agent iterations per persona |
| --headless/--no-headless | | --headless | Run browsers in headless mode |
| --reports | | all | Reports to generate per persona |
| --extra-instructions | | (none) | Additional instructions for all agents |
| --max-report-screenshots | | 10 | Max unique screenshots for VLM reports (0 = use all) |
| --verbose | -v | off | Enable debug logging |

Personas from a file:

# personas.txt (one per line)
a senior UX designer
a data analyst specializing in dashboards
a QA engineer focused on accessibility

peruse focus-group --url "https://example.com" \
                   --task "Evaluate the application" \
                   --personas personas.txt
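
Because --personas accepts either a comma-separated string or a file path, the value has to be resolved one way or the other. The sketch below shows one plausible approach; parse_personas is a hypothetical helper, and both the "treat it as a file if such a file exists" rule and the skipping of lines starting with # are assumptions, not confirmed behavior.

```python
from pathlib import Path


def parse_personas(value: str) -> list[str]:
    """Return personas from a file path (one per line) or a comma-separated string."""
    path = Path(value)
    if path.is_file():
        lines = path.read_text(encoding="utf-8").splitlines()
        # Skip blank lines and '#' comment lines (assumed convention).
        return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    return [p.strip() for p in value.split(",") if p.strip()]


if __name__ == "__main__":
    print(parse_personas("a senior UX designer,a data analyst,a QA engineer"))
```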

peruse check-vlm

Verify VLM backend connectivity.

peruse check-vlm [OPTIONS]

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --model | | qwen3-vl:6b | VLM model name |
| --backend | | ollama | VLM backend |
| --base-url | | (auto-detected) | VLM API base URL |
| --verbose | -v | off | Enable debug logging |
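
For the Ollama backend, a similar connectivity probe can be done by hand against Ollama's /api/tags endpoint, which lists locally available models. The sketch below is not necessarily what check-vlm does internally; list_ollama_models is a hypothetical helper, and the base URL is the default from the Configuration section.

```python
import json
import urllib.request
from typing import Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 5.0) -> Optional[list[str]]:
    """Return model names from Ollama's /api/tags, or None if the server is unreachable."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    print("Ollama unreachable" if models is None else models)
```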

Configuration

All settings can be passed to the PeruseConfig constructor, set as environment variables (PERUSE_*), or placed in a .env file.

| Setting | Env Var | Default | Description |
| --- | --- | --- | --- |
| vlm_backend | PERUSE_VLM_BACKEND | "ollama" | "ollama", "lmstudio", "openai_compat", or "jina" |
| vlm_model | PERUSE_VLM_MODEL | "qwen3-vl:6b" | Model identifier |
| vlm_base_url | PERUSE_VLM_BASE_URL | "http://localhost:11434" | API endpoint |
| vlm_num_ctx | PERUSE_VLM_NUM_CTX | 32768 | Context window size (tokens) for Ollama |
| vlm_retries | PERUSE_VLM_RETRIES | 2 | Retry attempts on VLM crash |
| vlm_cooldown | PERUSE_VLM_COOLDOWN | 3.0 | Seconds to wait before retry |
| headless | PERUSE_HEADLESS | True | Run browser headless |
| max_steps | PERUSE_MAX_STEPS | 50 | Max agent loop iterations |
| max_dom_elements | PERUSE_MAX_DOM_ELEMENTS | 100 | Max DOM elements per step (0 = unlimited) |
| output_dir | PERUSE_OUTPUT_DIR | "./peruse_output" | Report output directory |
| persona | PERUSE_PERSONA | "" | Agent persona prepended to the system prompt |
| extra_instructions | PERUSE_EXTRA_INSTRUCTIONS | "" | Additional instructions appended to the agent prompt |
| max_nudges | PERUSE_MAX_NUDGES | 3 | Max nudge messages before hard-stopping on loops |
| max_report_screenshots | PERUSE_MAX_REPORT_SCREENSHOTS | 10 | Max unique screenshots for VLM reports (0 = use all) |
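
Every environment variable in the table is the setting name uppercased with a PERUSE_ prefix, so a .env file can be generated mechanically. A small sketch follows, with env_var_name and to_dotenv as hypothetical helpers. How Peruse-AI parses values such as booleans from the environment is not documented here, so the rendered values are only a starting point.

```python
def env_var_name(setting: str) -> str:
    """Map a setting name such as 'vlm_model' to its PERUSE_* environment variable."""
    return f"PERUSE_{setting.upper()}"


def to_dotenv(settings: dict[str, object]) -> str:
    """Render settings as .env lines, one KEY=value pair per line."""
    return "\n".join(f"{env_var_name(k)}={v}" for k, v in settings.items())


if __name__ == "__main__":
    print(to_dotenv({"vlm_model": "qwen3-vl:6b", "max_steps": 50, "vlm_cooldown": 5.0}))
```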

Persona

The persona setting prepends an identity to the agent's system prompt. This shapes how the agent interprets and evaluates what it sees, without affecting its core browsing capabilities.

peruse run --url "https://example.com" \
           --task "Review the application" \
           --persona "an extremely experienced athletic director at a prestigious American sports-focused university"

Or via environment variable:

export PERUSE_PERSONA="a senior accessibility auditor"
peruse run --url "https://example.com" --task "Audit this page"

Extra Instructions

The extra_instructions setting appends domain-specific guidance to the agent prompt without replacing the base prompt's JSON format rules and action definitions.

peruse run --url "https://example.com" \
           --task "Explore the dashboard" \
           --extra-instructions "Focus on data tables. Ignore the navigation sidebar."
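
To make the "prepend persona, append extra instructions" behavior concrete, here is a minimal sketch of how such a prompt might be assembled. build_system_prompt and BASE_PROMPT are hypothetical, the "You are ..." phrasing is an assumption, and the real base prompt (with its JSON format rules and action definitions) is not reproduced.

```python
BASE_PROMPT = "Respond with a JSON action."  # placeholder, not the real base prompt


def build_system_prompt(base: str, persona: str = "", extra_instructions: str = "") -> str:
    """Prepend the persona and append extra instructions, leaving the base prompt untouched."""
    parts = []
    if persona:
        parts.append(f"You are {persona}.")  # assumed phrasing
    parts.append(base)
    if extra_instructions:
        parts.append(extra_instructions)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt(
        BASE_PROMPT,
        persona="a senior accessibility auditor",
        extra_instructions="Focus on data tables. Ignore the navigation sidebar.",
    ))
```

The point of the design is that neither setting replaces the base prompt: with both left empty, the function returns the base prompt unchanged.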

Loop Recovery (Nudges)

When the agent gets stuck repeating the same action, it receives a nudge message suggesting alternative approaches instead of immediately stopping. The max_nudges setting controls how many recovery attempts are allowed before the agent hard-stops (default: 3).

# Allow more recovery attempts (set the environment variable before the command)
PERUSE_MAX_NUDGES=5 peruse run --url "https://example.com" --task "Explore"

Or in Python:

config = PeruseConfig(max_nudges=5)
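
The nudge-then-stop policy can be pictured with a small simulation. This is a sketch under stated assumptions, not the agent's actual loop detector: run_with_nudges is a hypothetical function, and the rule that "stuck" means the same action repeated repeat_threshold times in a row is an assumption.

```python
def run_with_nudges(actions, max_nudges=3, repeat_threshold=3):
    """Replay action strings and record the steps at which nudges and the hard stop occur."""
    events = []
    nudges_sent = 0
    streak = 0
    previous = None
    for step, action in enumerate(actions, start=1):
        streak = streak + 1 if action == previous else 1
        previous = action
        if streak >= repeat_threshold:
            if nudges_sent >= max_nudges:
                events.append((step, "hard-stop"))
                return events
            nudges_sent += 1
            events.append((step, "nudge"))
            streak = 0  # give the agent a fresh window after each nudge
    return events


if __name__ == "__main__":
    print(run_with_nudges(["click #next"] * 12, max_nudges=2))
```

With max_nudges=2 and an agent that never changes its action, the simulation nudges twice and then stops on the next detected loop.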

Intel Arc GPU (Vulkan / IPEX-LLM)

Peruse-AI can run on Intel Arc GPUs via the IPEX-LLM project, but this backend is experimental and unstable. The Ollama model runner frequently crashes with:

model runner has unexpectedly stopped, this may be due to resource limitations
or an internal error (status code: 500)

Known issues:

  • The Vulkan backend crashes when receiving rapid back-to-back VLM calls
  • Shader compilation on first run can cause startup timeouts
  • Large context windows (vlm_num_ctx > 8192) may exhaust GPU memory

Workarounds:

  1. Warm up the model first — Run ollama run gemma3:4b "hello" in your terminal before using peruse run, substituting the model you have configured for Peruse-AI. This pre-compiles shaders and loads the model into VRAM.
  2. Use a smaller context window — Keep vlm_num_ctx at 4096 or 8192.
  3. Increase retries — Set vlm_retries=3 and vlm_cooldown=5.0 to give the GPU time to recover between crashes.
  4. Prefer NVIDIA/CUDA — If available, an NVIDIA GPU with the standard Ollama build is significantly more stable.
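
The retry and cooldown settings in workaround 3 amount to a retry loop with a fixed pause between attempts. A minimal sketch follows, assuming that vlm_retries counts extra attempts after the first failure. call_with_retries is a hypothetical helper, and RuntimeError stands in for whatever error the VLM client actually raises.

```python
import time


def call_with_retries(fn, retries=3, cooldown=5.0, sleep=time.sleep):
    """Call fn(); on failure wait `cooldown` seconds and retry, up to `retries` extra attempts."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(cooldown)  # give the GPU time to recover


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_vlm_call():
        # Fails twice, then succeeds, to imitate a crashing model runner.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("model runner has unexpectedly stopped")
        return "ok"

    print(call_with_retries(flaky_vlm_call, retries=3, cooldown=0.1))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.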

Development

git clone https://github.com/rajas/peruse-ai.git
cd peruse-ai
pip install -e ".[dev]"
playwright install chromium
pytest tests/ -v

To use the Jina VLM cloud backend, set your API key:

# .env file
PERUSE_VLM_API_KEY=jina_xxxxxxxxxxxx

# Then select the Jina backend on the command line
peruse run --url "https://example.com" --task "Explore" --backend jina

License

MIT
