A local-first universal web agent that autonomously explores web apps using browser-use, Playwright, and a local Vision-Language Model.
Peruse-AI
A local-first universal web agent that autonomously explores web applications and produces structured reports, powered by Playwright and a local Vision-Language Model (e.g. Qwen, Gemma via Ollama, LM Studio, or Jina VLM).
Features
- Autonomous Web Exploration — Give it a URL and a goal; it figures out the rest.
- Dual-Channel Perception — Combines DOM extraction and visual screenshots for robust element detection.
- 100% Local — Your data never leaves your machine. Runs on Ollama, LM Studio, or any OpenAI-compatible local endpoint.
- Custom Personas — Assign the agent a specific role or perspective (e.g. "a senior UX designer", "a QA engineer") to shape how it evaluates pages.
- Focus Groups — Run multiple personas concurrently against the same URL to gather diverse feedback in a single session.
- Smart Loop Recovery — Instead of hard-stopping when stuck, the agent receives nudge messages suggesting alternative actions before eventually giving up.
- Multi-Output Pipeline — Generates three report types from a single session:
  - Data Insights: Summaries of charts, tables, and visible data.
  - UX/UI Review: Contrast, layout, accessibility, and usability critique.
  - Bug Report: Console errors, failed requests, and reproduction steps.
Quickstart
Prerequisites
- Python 3.10+
- Ollama installed and running (install guide)
- Pull the VLM model:
ollama pull qwen2.5-vl:7b
Install
pip install peruse-ai
playwright install chromium
Run
# Full exploration
peruse run --url "https://example.com/dashboard" \
  --task "Explore the dashboard and summarize all visible data"

# With a persona
peruse run --url "https://example.com/dashboard" \
  --task "Evaluate the dashboard" \
  --persona "a senior UX designer with 15 years of experience"

# With extra instructions
peruse run --url "https://example.com/dashboard" \
  --task "Review the dashboard" \
  --extra-instructions "Pay special attention to color contrast and WCAG compliance"

# Bug scan only
peruse scan --url "https://example.com" \
  --task "Click every link and report errors"

# Focus group: multiple personas in parallel
peruse focus-group --url "https://example.com/dashboard" \
  --task "Evaluate the dashboard and identify issues" \
  --personas "a senior UX designer,a data analyst,a QA engineer"

# Check VLM connectivity
peruse check-vlm
Python API
import asyncio

from peruse_ai import PeruseAgent, PeruseConfig

config = PeruseConfig(
    vlm_model="qwen2.5-vl:7b",
    persona="an experienced data analyst",
    extra_instructions="Focus on data accuracy and chart readability",
)

agent = PeruseAgent(
    config=config,
    url="https://example.com/dashboard",
    task="Summarize the visible data and flag any UI issues",
)

result = asyncio.run(agent.run())
print(result.final_summary)
Focus Group API
import asyncio

from peruse_ai import FocusGroup, PeruseConfig

config = PeruseConfig(vlm_model="qwen2.5-vl:7b")

fg = FocusGroup(
    personas=[
        "a senior UX designer",
        "a data analyst",
        "a QA engineer",
    ],
    url="https://example.com/dashboard",
    task="Evaluate the dashboard and identify issues",
    config=config,
)

result = asyncio.run(fg.run())
for persona, agent_result in result.persona_map.items():
    print(f"{persona}: {agent_result.final_summary}")
Each persona runs concurrently with its own browser and VLM instance. Reports are saved to separate sub-directories under the output path (e.g. ./peruse_output/a-senior-ux-designer/).
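The example path above suggests that each persona string is turned into a lowercase, hyphen-separated directory name. A minimal sketch of one way to derive such a name follows. The helper `persona_slug` is hypothetical and the exact rules Peruse-AI applies are an assumption; only the `a-senior-ux-designer` example comes from this README.

```python
import re


def persona_slug(persona: str) -> str:
    """Turn a persona string into a filesystem-friendly directory name.

    Hypothetical helper: the real naming rules are not documented here.
    """
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", persona.lower())
    # Drop hyphens left over at either end (from leading/trailing spaces or punctuation).
    return slug.strip("-")


print(persona_slug("a senior UX designer"))  # a-senior-ux-designer
```

Knowing the likely directory name makes it easier to locate one persona's reports after a focus-group run.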
CLI Reference
peruse run
Full exploration session with all reports.
peruse run [OPTIONS]
| Option | Short | Default | Description |
|---|---|---|---|
| `--url` | | (required) | Starting URL to explore |
| `--task` | | (required) | High-level goal for the agent |
| `--model` | | `qwen3-vl:6b` | VLM model name |
| `--backend` | | `ollama` | VLM backend: `ollama`, `lmstudio`, `openai_compat`, `jina` |
| `--base-url` | | (auto-detected) | VLM API base URL |
| `--output` | `-o` | `./peruse_output` | Output directory for reports and screenshots |
| `--max-steps` | | `50` | Maximum agent loop iterations |
| `--headless/--no-headless` | | `--headless` | Run browser in headless mode |
| `--reports` | | `all` | Reports to generate: `insights`, `ux`, `bugs`, `all` (comma-separated) |
| `--persona` | | (none) | Agent persona prepended to the system prompt |
| `--extra-instructions` | | (none) | Additional instructions appended to the agent prompt |
| `--max-report-screenshots` | | `10` | Max unique screenshots for VLM reports (0 = use all) |
| `--verbose` | `-v` | off | Enable debug logging |
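The `--max-report-screenshots` option caps the number of unique screenshots passed to the report stage, with 0 meaning no cap. The sketch below illustrates that behavior under one assumption: "unique" is taken to mean byte-identical images are dropped. How Peruse-AI actually decides uniqueness is not documented here, and `select_report_screenshots` is a hypothetical name.

```python
import hashlib


def select_report_screenshots(screenshots: list[bytes], max_count: int = 10) -> list[bytes]:
    """Drop byte-identical screenshots, then keep at most max_count (0 keeps all)."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for image in screenshots:
        # Assumption: two screenshots are duplicates only if their bytes match exactly.
        digest = hashlib.sha256(image).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(image)
    if max_count == 0:
        return unique  # 0 means "use all unique screenshots"
    return unique[:max_count]
```

A lower cap shortens the prompt sent to the VLM, which may help on machines with limited memory.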
peruse scan
Lightweight bug scan (bug report only, no VLM-powered analysis).
peruse scan [OPTIONS]
| Option | Short | Default | Description |
|---|---|---|---|
| `--url` | | (required) | Starting URL to scan |
| `--task` | | `"Navigate all links and report any errors encountered."` | Scan goal |
| `--model` | | `qwen3-vl:6b` | VLM model name |
| `--backend` | | `ollama` | VLM backend |
| `--base-url` | | (auto-detected) | VLM API base URL |
| `--output` | `-o` | `./peruse_output` | Output directory |
| `--max-steps` | | `30` | Maximum steps for scan |
| `--persona` | | (none) | Agent persona |
| `--extra-instructions` | | (none) | Additional instructions |
| `--verbose` | `-v` | off | Enable debug logging |
peruse focus-group
Run multiple personas concurrently against the same URL. Each persona gets its own browser, VLM, and output sub-directory.
peruse focus-group [OPTIONS]
| Option | Short | Default | Description |
|---|---|---|---|
| `--url` | | (required) | Starting URL to explore |
| `--task` | | (required) | High-level goal for all agents |
| `--personas` | | (required) | Comma-separated personas or path to a text file (one per line) |
| `--model` | | `qwen3-vl:6b` | VLM model name |
| `--backend` | | `ollama` | VLM backend |
| `--base-url` | | (auto-detected) | VLM API base URL |
| `--output` | `-o` | `./peruse_output` | Base output directory (each persona gets a sub-directory) |
| `--max-steps` | | `50` | Maximum agent iterations per persona |
| `--headless/--no-headless` | | `--headless` | Run browsers in headless mode |
| `--reports` | | `all` | Reports to generate per persona |
| `--extra-instructions` | | (none) | Additional instructions for all agents |
| `--max-report-screenshots` | | `10` | Max unique screenshots for VLM reports (0 = use all) |
| `--verbose` | `-v` | off | Enable debug logging |
Personas from a file:
# personas.txt (one per line)
a senior UX designer
a data analyst specializing in dashboards
a QA engineer focused on accessibility
peruse focus-group --url "https://example.com" \
--task "Evaluate the application" \
--personas personas.txt
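Since `--personas` accepts either a comma-separated string or a path to a text file, the value has to be resolved one way or the other. Here is a minimal sketch of that decision, not the tool's actual parser. The function name `parse_personas` is hypothetical, and skipping blank lines and `#` comment lines is an assumption.

```python
from pathlib import Path


def parse_personas(value: str) -> list[str]:
    """Read personas from a file if value names one, else split it on commas."""
    try:
        is_file = Path(value).is_file()
    except OSError:
        # A long comma-separated string can be an invalid path on some systems.
        is_file = False
    if is_file:
        lines = Path(value).read_text(encoding="utf-8").splitlines()
        # Assumption: blank lines and '#' comment lines are ignored.
        return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    return [part.strip() for part in value.split(",") if part.strip()]


print(parse_personas("a senior UX designer,a data analyst,a QA engineer"))
```

One consequence of the comma-separated form is that a persona containing a comma has to go in a file instead.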
peruse check-vlm
Verify VLM backend connectivity.
peruse check-vlm [OPTIONS]
| Option | Short | Default | Description |
|---|---|---|---|
| `--model` | | `qwen3-vl:6b` | VLM model name |
| `--backend` | | `ollama` | VLM backend |
| `--base-url` | | (auto-detected) | VLM API base URL |
| `--verbose` | `-v` | off | Enable debug logging |
Configuration
All settings can be passed to the PeruseConfig constructor, set through environment variables (PERUSE_*), or placed in a .env file.
| Setting | Env Var | Default | Description |
|---|---|---|---|
| `vlm_backend` | `PERUSE_VLM_BACKEND` | `"ollama"` | `"ollama"`, `"lmstudio"`, `"openai_compat"`, or `"jina"` |
| `vlm_model` | `PERUSE_VLM_MODEL` | `"qwen3-vl:6b"` | Model identifier |
| `vlm_base_url` | `PERUSE_VLM_BASE_URL` | `"http://localhost:11434"` | API endpoint |
| `vlm_num_ctx` | `PERUSE_VLM_NUM_CTX` | `32768` | Context window size (tokens) for Ollama |
| `vlm_retries` | `PERUSE_VLM_RETRIES` | `2` | Retry attempts on VLM crash |
| `vlm_cooldown` | `PERUSE_VLM_COOLDOWN` | `3.0` | Seconds to wait before retry |
| `headless` | `PERUSE_HEADLESS` | `True` | Run browser headless |
| `max_steps` | `PERUSE_MAX_STEPS` | `50` | Max agent loop iterations |
| `max_dom_elements` | `PERUSE_MAX_DOM_ELEMENTS` | `100` | Max DOM elements per step (0 = unlimited) |
| `output_dir` | `PERUSE_OUTPUT_DIR` | `"./peruse_output"` | Report output directory |
| `persona` | `PERUSE_PERSONA` | `""` | Agent persona prepended to the system prompt |
| `extra_instructions` | `PERUSE_EXTRA_INSTRUCTIONS` | `""` | Additional instructions appended to the agent prompt |
| `max_nudges` | `PERUSE_MAX_NUDGES` | `3` | Max nudge messages before hard-stopping on loops |
| `max_report_screenshots` | `PERUSE_MAX_REPORT_SCREENSHOTS` | `10` | Max unique screenshots for VLM reports (0 = use all) |
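In the table above, each environment variable is the setting name upper-cased with a `PERUSE_` prefix. The sketch below illustrates only that naming convention and a plausible type conversion; it is not how the library loads its configuration. `settings_from_env`, the subset of defaults shown, and the accepted boolean spellings are all assumptions.

```python
import os

# A few defaults copied from the settings table; the env var for each is
# "PERUSE_" + the upper-cased setting name.
DEFAULTS = {
    "vlm_backend": "ollama",
    "vlm_num_ctx": 32768,
    "vlm_cooldown": 3.0,
    "headless": True,
    "max_steps": 50,
    "max_nudges": 3,
}


def _coerce(raw: str, default):
    """Convert an environment string to the type of the default value."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in {"1", "true", "yes", "on"}  # assumed spellings
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw


def settings_from_env(environ=None) -> dict:
    """Overlay PERUSE_* environment variables on the documented defaults."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for name, default in DEFAULTS.items():
        raw = environ.get("PERUSE_" + name.upper())
        if raw is not None:
            settings[name] = _coerce(raw, default)
    return settings


print(settings_from_env({"PERUSE_MAX_STEPS": "20", "PERUSE_HEADLESS": "false"}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the sketch easy to try without changing the shell environment.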
Persona
The persona setting prepends an identity to the agent's system prompt. This shapes how the agent interprets and evaluates what it sees, without affecting its core browsing capabilities.
peruse run --url "https://example.com" \
--task "Review the application" \
--persona "an extremely experienced AD for a prestigious American sports-focused university"
Or via environment variable:
export PERUSE_PERSONA="a senior accessibility auditor"
peruse run --url "https://example.com" --task "Audit this page"
Extra Instructions
The extra_instructions setting appends domain-specific guidance to the agent prompt without replacing the base prompt's JSON format rules and action definitions.
peruse run --url "https://example.com" \
--task "Explore the dashboard" \
--extra-instructions "Focus on data tables. Ignore the navigation sidebar."
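As described above, the persona is prepended and the extra instructions are appended, while the base prompt stays intact. A minimal sketch of that ordering follows. The function `build_system_prompt` and the exact wording and separators are assumptions; the real prompt template is not shown in this README.

```python
def build_system_prompt(base_prompt: str, persona: str = "", extra_instructions: str = "") -> str:
    """Assemble persona + base prompt + extra instructions, in that order."""
    parts = []
    if persona:
        # Assumed phrasing: the persona becomes an identity statement up front.
        parts.append(f"You are {persona}.")
    # The base prompt (format rules, action definitions) is never replaced.
    parts.append(base_prompt)
    if extra_instructions:
        parts.append(f"Additional instructions: {extra_instructions}")
    return "\n\n".join(parts)


print(build_system_prompt(
    "Respond with one JSON action per step.",  # placeholder base prompt
    persona="a senior accessibility auditor",
    extra_instructions="Focus on data tables. Ignore the navigation sidebar.",
))
```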
Loop Recovery (Nudges)
When the agent gets stuck repeating the same action, it receives a nudge message suggesting alternative approaches instead of immediately stopping. The max_nudges setting controls how many recovery attempts are allowed before the agent hard-stops (default: 3).
# Allow more recovery attempts
PERUSE_MAX_NUDGES=5 peruse run --url "https://example.com" --task "Explore"
Or in Python:
config = PeruseConfig(max_nudges=5)
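The nudge mechanism can be pictured as a small state machine: keep going, nudge, or stop. The sketch below is an illustration under stated assumptions, not the agent's actual loop detector. The class `LoopGuard`, the `repeat_threshold` parameter, and the rule that "stuck" means the same action several times in a row are all hypothetical.

```python
from collections import deque


class LoopGuard:
    """Track recent actions and decide whether to continue, nudge, or stop."""

    def __init__(self, max_nudges: int = 3, repeat_threshold: int = 3):
        self.max_nudges = max_nudges
        # Hypothetical: how many identical consecutive actions count as "stuck".
        self.repeat_threshold = repeat_threshold
        self.recent = deque(maxlen=repeat_threshold)
        self.nudges_sent = 0

    def observe(self, action: str) -> str:
        """Return 'continue', 'nudge', or 'stop' for the latest action."""
        self.recent.append(action)
        stuck = len(self.recent) == self.repeat_threshold and len(set(self.recent)) == 1
        if not stuck:
            return "continue"
        if self.nudges_sent >= self.max_nudges:
            return "stop"  # nudge budget exhausted: hard-stop
        self.nudges_sent += 1
        self.recent.clear()  # give the agent a fresh window after each nudge
        return "nudge"


guard = LoopGuard(max_nudges=1, repeat_threshold=2)
print([guard.observe("click #next") for _ in range(4)])
```

With `max_nudges=1` and a threshold of 2, the repeated action yields one nudge and then a stop the next time the agent gets stuck.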
Intel ARC GPU (Vulkan / IPEX-LLM)
Peruse-AI can run on Intel ARC GPUs via the IPEX-LLM project, but this backend is experimental and unstable. The Ollama model runner frequently crashes with:
model runner has unexpectedly stopped, this may be due to resource limitations
or an internal error (status code: 500)
Known issues:
- The Vulkan backend crashes when receiving rapid back-to-back VLM calls
- Shader compilation on first run can cause startup timeouts
- Large context windows (`vlm_num_ctx` > 8192) may exhaust GPU memory
Workarounds:
- Warm up the model first: Run `ollama run gemma3:4b "hello"` in your terminal before using `peruse run`. This pre-compiles shaders and loads the model into VRAM.
- Use a smaller context window: Keep `vlm_num_ctx` at `4096` or `8192`.
- Increase retries: Set `vlm_retries=3` and `vlm_cooldown=5.0` to give the GPU time to recover between crashes.
- Prefer NVIDIA/CUDA: If available, an NVIDIA GPU with the standard Ollama build is significantly more stable.
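The retry and cooldown settings amount to a retry loop with a fixed pause between attempts. The following sketch shows that pattern in general terms and is not Peruse-AI's internal code. `call_with_retries` is a hypothetical helper, and it assumes `vlm_retries` counts attempts after the first one.

```python
import time


def call_with_retries(call, retries: int = 2, cooldown: float = 3.0, sleep=time.sleep):
    """Run call(); on failure wait `cooldown` seconds and retry up to `retries` times."""
    last_error = None
    for attempt in range(retries + 1):  # one initial attempt plus the retries
        try:
            return call()
        except Exception as exc:  # a real client would catch its specific error type
            last_error = exc
            if attempt < retries:
                sleep(cooldown)  # give the model runner time to recover
    raise last_error


# Usage with a stand-in for a VLM request that fails twice before succeeding.
state = {"calls": 0}


def flaky_vlm_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("model runner has unexpectedly stopped")
    return "ok"


print(call_with_retries(flaky_vlm_call, retries=3, cooldown=0.0))
```

The `sleep` parameter is injectable so the pause can be skipped or recorded when experimenting.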
Development
git clone https://github.com/rajas/peruse-ai.git
cd peruse-ai
pip install -e ".[dev]"
playwright install chromium
pytest tests/ -v
To use the Jina VLM cloud backend, set your API key:
# .env file
PERUSE_VLM_API_KEY=jina_xxxxxxxxxxxx
peruse run --url "https://example.com" --task "Explore" --backend jina
License
MIT