Give an LLM a URL and a goal — it drives a real browser, fills forms, and returns structured data. The browser that scripts itself.
🦾 Browsewright
The browser that scripts itself.
Give an LLM a URL and a goal. It drives a real Chrome, fills out forms, gets past bot walls, and hands you structured data, not raw HTML.
Playwright automates a browser you script. Browsewright is the browser that scripts itself.
You don't write selectors. You don't maintain scrapers that break every time a site ships a redesign. You give it intent — "find the pricing", "enrich this lead", "fill out this form" — and an LLM drives a real browser to get it done.
pip install browsewright
bw "https://stripe.com" "what does this company do and who is it for"
============================================================
RESULT [api] 412 tokens 3.1s
------------------------------------------------------------
Stripe is financial infrastructure for the internet. It provides
payment processing, billing, and treasury APIs for businesses from
startups to enterprises like Amazon and Shopify...
============================================================
🤯 It doesn't just read the web. It does things on the web.
Most "AI scrapers" hand you text. Browsewright acts. Point it at a real government records form with no API, give it a profile, and walk away:
bw-tasks form \
"https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx" \
--profile examples/sample_profile.json
It read the field labels, mapped your profile onto the form with an LLM, picked valid dropdown options, submitted it, and came back with:
Page 1 of 815 results — real names and dates, extracted as JSON.
No selectors. No XPath. No API. The form has none — it's a 20-year-old ASP.NET page that's invisible to every HTTP scraper. Browsewright drives it like a human.
💸 And it's almost free
Benchmark: 50 real, diverse websites in one run. 50 / 50 extracted successfully · $0.047 total · ~1,200 tokens & ~20s median per site. 28% were answered by the free API/archive shortcut with no browser at all. (Reproduce it: python examples/batch_test.py.)
It tries the cheapest path first (open APIs, RSS, public archives) and only spins up Chrome when a page actually needs it. You pay pennies for the pages a shortcut can answer and a real browser only for the ones that need it.
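The cheapest-path-first idea can be sketched as an ordered list of stages where the first one that returns content wins. This is a minimal illustration, not Browsewright's internals: `run_cheapest_first`, `try_open_api`, and `try_browser` are hypothetical names, and the two fetchers are stand-ins that return canned strings.

```python
from typing import Callable, Optional

# Each stage takes a URL and returns extracted text, or None if it cannot help.
Stage = Callable[[str], Optional[str]]


def run_cheapest_first(url: str, stages: list[tuple[str, Stage]]) -> tuple[str, Optional[str]]:
    """Try stages in order (cheapest first) and return (stage_name, content) for the first hit."""
    for name, stage in stages:
        try:
            content = stage(url)
        except Exception:
            # A failing cheap stage should not abort the run; fall through to the next one.
            content = None
        if content:
            return name, content
    return "error", None


# Hypothetical stand-ins for real fetchers (assumptions for illustration only).
def try_open_api(url: str) -> Optional[str]:
    return "feed contents" if url.endswith("/blog") else None


def try_browser(url: str) -> Optional[str]:
    return "rendered page text"


STAGES = [("api", try_open_api), ("browser", try_browser)]
print(run_cheapest_first("https://example.com/blog", STAGES))
print(run_cheapest_first("https://example.com/app", STAGES))
```

Keeping the browser as the last entry in the list is what keeps it off the bill for pages a shortcut can answer.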
How it stacks up
| | Browsewright | Firecrawl | Browser-Use | Tavily |
|---|---|---|---|---|
| Returns structured JSON from intent | ✅ | ✅ | ⚠️ scripted | ✅ |
| Fills & submits real forms | ✅ | ❌ | ✅ | ❌ |
| Drives a real Chrome (human motor layer) | ✅ | ❌ | ✅ | ❌ |
| Gets past Cloudflare/DataDome bot walls | ✅ | ⚠️ | ⚠️ | ❌ |
| Free API/archive shortcut before any browser | ✅ | ❌ | ❌ | ❌ |
| Runs fully local, your own API key | ✅ | ❌ SaaS | ✅ | ❌ SaaS |
| 5 ready-made business tasks built in | ✅ | ❌ | ❌ | ❌ |
| MIT, self-hostable | ✅ | partial | ✅ | ❌ |
Comparisons reflect typical default usage; all four are good tools. Browsewright's bet is intent in → action + structured data out, run locally for pennies.
Install
pip install browsewright # core
pip install "browsewright[mcp]" # + MCP server (Claude Desktop / Code / any client)
Or from source:
git clone https://github.com/krishnashakula/browsewright && cd browsewright
python -m venv .venv && . .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
Add your Anthropic API key:
cp .env.example .env
# edit .env and paste your key from https://console.anthropic.com/settings/keys
The first browser run launches Chrome via nodriver (Chrome must be installed).
bw "not recognized" after install? pip put the scripts in a folder that isn't on your PATH (common on Windows). Use the module form, which always works: python -m browsewright "<url>" "<goal>" · python -m browsewright.tasks_cli enrich "<url>"
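If you call the CLI from your own scripts, the same fallback can be automated. A small sketch, assuming only what is stated above (the `bw` script and the `python -m browsewright` module form); `bw_command` is a hypothetical helper, not part of the package.

```python
import shutil
import sys
from typing import Callable, Optional


def bw_command(url: str, goal: str,
               which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Build an argv list for a Browsewright run, preferring the bw script when it is on PATH."""
    exe = which("bw")
    if exe:
        return [exe, url, goal]
    # Fall back to the module form, run with the interpreter executing this script.
    return [sys.executable, "-m", "browsewright", url, goal]


print(bw_command("https://example.com", "find the pricing"))
```

Pass the resulting list to `subprocess.run` as-is; building argv as a list avoids shell-quoting problems with URLs and goals.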
Use it
CLI
bw "https://news.ycombinator.com" "the top story right now"
bw "https://example.com" "find the pricing" --json
bw "https://example.com" "debug this" --no-headless --verbose
Python
import asyncio
from browsewright import search
res = asyncio.run(search("https://stripe.com", "what does this company do"))
print(res.answer) # synthesized answer
print(res.stage) # "api" | "browser" | "common_crawl" | "blocked" | "error"
print(res.tokens_total, res.elapsed_s)
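Since `stage` tells you which path produced the answer, callers will usually want to branch on it. The sketch below uses a stand-in dataclass with the same attribute names printed above, so it runs without the library; `FakeResult` and `describe` are hypothetical and say nothing about the real result class.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in with the attribute names shown above; not the library's own class."""
    answer: str
    stage: str
    tokens_total: int
    elapsed_s: float


def describe(res) -> str:
    """Summarize a result in one line, treating 'blocked' and 'error' as failures."""
    if res.stage in ("blocked", "error"):
        return f"no answer ({res.stage})"
    return f"[{res.stage}] {res.tokens_total} tokens, {res.elapsed_s:.1f}s: {res.answer}"


print(describe(FakeResult("Stripe is financial infrastructure.", "api", 412, 3.1)))
print(describe(FakeResult("", "blocked", 0, 9.4)))
```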
As an MCP tool (Claude Desktop / Claude Code / any MCP client)
{ "mcpServers": { "browsewright": { "command": "bw-mcp" } } }
Your LLM now has a read_page(url, goal) tool.
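If your client already has an MCP config with other servers, merge the entry instead of overwriting the file. A minimal sketch of that merge on an in-memory dict; `add_browsewright` and the `other-mcp` server are hypothetical, and where the config file lives depends on your client.

```python
import json


def add_browsewright(config: dict) -> dict:
    """Return a copy of an MCP client config with the browsewright server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["browsewright"] = {"command": "bw-mcp"}
    merged["mcpServers"] = servers
    return merged


# Example with a pre-existing (made-up) server entry.
existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
print(json.dumps(add_browsewright(existing), indent=2))
```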
The 5 built-in tasks — bw-tasks
One pipeline — fetch → structured extract (JSON) → diff/aggregate → action — exposed as five business workflows. Each is a CLI subcommand and a library function.
| Task | Command | Output |
|---|---|---|
| 🕵️ Competitor watch | bw-tasks watch <url> | Baseline now, change alerts later |
| 🎯 Lead enrichment | bw-tasks enrich <url> | CRM fields + a personalized cold-email line |
| 📝 Agentic form fill | bw-tasks form <url> --profile p.json | Understands fields, fills, submits, reads results |
| 💰 Price/stock tracking | bw-tasks track <url> | Price & availability change alerts |
| 📣 Brand monitoring | bw-tasks brand <name> <urls…> | Mentions + sentiment digest |
Common flags: --json, --out FILE, --slack <webhook>, --no-headless, --aggressive.
Real enrich output (trimmed):
{
"company_name": "Tavily",
"industry": "AI/SaaS - Developer Tools",
"tech_stack_or_integrations": ["OpenAI", "Anthropic", "Groq", "Databricks"],
"recent_news_or_signals": ["Raised $25M Series A", "Databricks MCP partnership"],
"icp_fit_score_1_to_10": 7,
"personalized_cold_email_first_line": "I noticed Tavily just partnered with Databricks on the MCP Marketplace—looks like you're doubling down on enterprise adoption after your $25M Series A."
}
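Downstream, a record like this is plain JSON, so routing leads takes a few lines. A sketch that uses only field names from the output above; the `is_qualified` helper and its threshold of 6 are assumptions for illustration, and the sample record is abbreviated.

```python
import json

# Abbreviated copy of the record above, as it might arrive from a file or pipe.
SAMPLE = '{"company_name": "Tavily", "icp_fit_score_1_to_10": 7}'


def is_qualified(record: dict, min_score: int = 6) -> bool:
    """True when the record carries a valid 1-10 fit score at or above min_score."""
    score = record.get("icp_fit_score_1_to_10")
    if isinstance(score, bool) or not isinstance(score, int):
        return False  # missing or malformed score: do not qualify
    return 1 <= score <= 10 and score >= min_score


record = json.loads(SAMPLE)
print(record["company_name"], is_qualified(record))
```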
Build your own task with the core primitive
Every task is a thin wrapper over extract_structured(url, schema). Define any
schema, get JSON back:
import asyncio
from browsewright import extract_structured
schema = {"headline": "string",
          "open_roles": [{"title": "string", "team": "string", "location": "string"}]}
data = asyncio.run(extract_structured(
    "https://example.com/careers", schema,
    instruction="Extract the page headline and every open job posting."))
print(data["open_roles"])
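LLM output can drift from the shape you asked for, so it is worth checking the result before using it. The sketch below covers only the schema forms shown above (the `"string"` placeholder, nested objects, and one-element lists meaning "list of"); `conforms` is a hypothetical helper, and whether the library validates for you is not stated here.

```python
def conforms(data, schema) -> bool:
    """Check data against a schema built from "string", dicts, and one-element lists."""
    if schema == "string":
        return isinstance(data, str)
    if isinstance(schema, dict):
        return isinstance(data, dict) and all(
            key in data and conforms(data[key], sub) for key, sub in schema.items()
        )
    if isinstance(schema, list) and len(schema) == 1:
        return isinstance(data, list) and all(conforms(item, schema[0]) for item in data)
    return False  # schema form this sketch does not understand


careers_schema = {"headline": "string",
                  "open_roles": [{"title": "string", "team": "string", "location": "string"}]}
good = {"headline": "Join us",
        "open_roles": [{"title": "SRE", "team": "Infra", "location": "Remote"}]}
bad = {"headline": "Join us", "open_roles": [{"title": "SRE"}]}
print(conforms(good, careers_schema), conforms(bad, careers_schema))
```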
Scheduling
Tasks are single-shot; snapshot/diff state persists between runs, so change
detection works across invocations. Run on cron, n8n/Make/Zapier, or /loop:
# every 6h, alert on competitor pricing changes
0 */6 * * * bw-tasks watch "https://competitor.com/pricing" --slack https://hooks.slack.com/services/XXX
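To see why single-shot runs can still detect change, here is a sketch of snapshot/diff state kept in a JSON file. It illustrates the general pattern only; the file layout and the `diff_against_snapshot` helper are assumptions, not Browsewright's actual storage.

```python
import json
import tempfile
from pathlib import Path


def diff_against_snapshot(path: Path, current: dict) -> dict:
    """Compare current to the snapshot stored at path, then overwrite the snapshot.

    Returns {key: (old, new)} for changed keys; empty on the first run (baseline only).
    """
    previous = json.loads(path.read_text()) if path.exists() else None
    path.write_text(json.dumps(current, sort_keys=True))
    if previous is None:
        return {}
    keys = previous.keys() | current.keys()
    return {k: (previous.get(k), current.get(k))
            for k in keys if previous.get(k) != current.get(k)}


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "pricing.json"
    print(diff_against_snapshot(state, {"pro": "$20"}))  # first run: baseline
    print(diff_against_snapshot(state, {"pro": "$25"}))  # later run: reports the change
```

Because the state lives on disk, each cron invocation can be a fresh process and still compare against the previous run.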
How it works
search(url, goal)
│
├─ Polite gate ........ robots.txt check + per-host rate limit
│
├─ Pre-flight pipeline (cheapest path first)
│ 1. Common Crawl ... public archive (opt-in)
│ 2. Open API ....... RSS / wp-json / *.json (no browser, ~1.5k tokens)
│ 3. Origin IP ...... CDN bypass (skipped in polite mode)
│ 4. Classifier ..... detect Cloudflare/Akamai/DataDome/…
│
└─ Browser session (only if no shortcut hit)
• real headless Chrome via nodriver (native TLS fingerprint)
• human motor layer — Bézier mouse, typing cadence, scroll pacing
• LLM decides actions only at junctions (~1 call/page)
• blind-scene shortcut: extract directly when the DOM scan is blocked
• visual recovery: a vision call clears interstitials/challenges
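The "Bézier mouse" bullet refers to moving the pointer along a curve instead of a straight line. A minimal sketch of sampling a cubic Bézier path between two points; the control points and step count here are made up, and the real motor layer is not shown in this README.

```python
def cubic_bezier(p0, p1, p2, p3, steps: int = 20):
    """Sample steps+1 points on the cubic Bezier curve from p0 to p3 (p1, p2 are control points)."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bernstein weights: u^3, 3u^2 t, 3u t^2, t^3.
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        points.append((x, y))
    return points


# Move from (0, 0) to (300, 200) with two arbitrary control points bending the path.
path = cubic_bezier((0, 0), (80, 150), (220, -40), (300, 200), steps=10)
print(path[0], path[-1], len(path))
```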
Polite by default
Polite mode is the default and what you should ship. It checks robots.txt,
rate-limits per host, and does not bypass CDN bot protection. --aggressive
(polite=False) enables origin-IP discovery and ignores robots — use it only
on targets you own or are authorized to test.
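A polite gate of this kind can be sketched with the standard library alone: `urllib.robotparser` for the robots.txt check and a per-host timestamp for rate limiting. `PoliteGate`, its one-second default, and the sample robots.txt are assumptions for illustration, not the library's implementation.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PoliteGate:
    def __init__(self, min_interval_s: float = 1.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._next_free: dict[str, float] = {}  # host -> time of its last scheduled request

    def allowed_by_robots(self, robots_txt: str, url: str, agent: str = "*") -> bool:
        """Parse an already-fetched robots.txt body and ask whether agent may fetch url."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return parser.can_fetch(agent, url)

    def wait_needed(self, url: str) -> float:
        """Seconds to sleep before requesting url; also records the scheduled request time."""
        host = urlsplit(url).netloc
        now = self.clock()
        last = self._next_free.get(host)
        wait = 0.0 if last is None else max(0.0, self.min_interval_s - (now - last))
        self._next_free[host] = now + wait
        return wait


gate = PoliteGate(min_interval_s=2.0)
robots = "User-agent: *\nDisallow: /private/\n"
print(gate.allowed_by_robots(robots, "https://example.com/pricing"))
print(gate.allowed_by_robots(robots, "https://example.com/private/report"))
print(gate.wait_needed("https://example.com/pricing"))
```

The caller sleeps for the returned number of seconds before making the request; different hosts never block each other.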
⚠️ You are responsible for complying with each site's Terms of Service, applicable law (CFAA and equivalents), and data-protection rules (GDPR/CCPA). Browsewright is for authorized research, your own properties, and sites whose terms permit automated access. The authors accept no liability for misuse.
⭐ Star it / contribute
If Browsewright saved you a scraper, drop a star — it's the whole reason this is open source. Issues and PRs welcome: pre-flight vendors, new tasks, more sites in the benchmark.
MIT licensed. Built on nodriver.