Give an LLM a URL and a goal — it drives a real browser, fills forms, and returns structured data. The browser that scripts itself.

Maintainers

kshakula2023

Project description

🦾 Browsewright

The browser that scripts itself.

Give an LLM a URL and a goal. It drives a real Chrome, fills out forms, gets past bot walls, and hands you structured data, not raw HTML.

Playwright automates a browser you script. Browsewright is the browser that scripts itself.

You don't write selectors. You don't maintain scrapers that break every time a site ships a redesign. You give it intent — "find the pricing", "enrich this lead", "fill out this form" — and an LLM drives a real browser to get it done.

```bash
pip install browsewright
bw "https://stripe.com" "what does this company do and who is it for"
```

```text
============================================================
RESULT  [api]  412 tokens  3.1s
------------------------------------------------------------
Stripe is financial infrastructure for the internet. It provides
payment processing, billing, and treasury APIs for businesses from
startups to enterprises like Amazon and Shopify...
============================================================
```

🤯 It doesn't just read the web. It does things on the web.

Most "AI scrapers" hand you text. Browsewright acts. Point it at a real government records form with no API, give it a profile, and walk away:

```bash
bw-tasks form \
  "https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx" \
  --profile examples/sample_profile.json
```

It read the field labels, mapped your profile onto the form with an LLM, picked valid dropdown options, submitted it, and came back with:

Page 1 of 815 results — real names and dates, extracted as JSON.

No selectors. No XPath. No API, because the site doesn't have one: it's a 20-year-old ASP.NET page that plain HTTP scrapers can't get through. Browsewright drives it like a human.

💸 And it's almost free

Benchmark: 50 real, diverse websites in one run. 50/50 extracted successfully · $0.047 total · ~1,200 tokens & ~20s median per site. 28% were answered by the free API/archive shortcut with no browser at all. (Reproduce it with `python examples/batch_test.py`.)

It tries the cheapest path first (open APIs, RSS, public archives) and only spins up Chrome when a page actually needs it. You pay pennies for the pages a shortcut can answer and run a real browser only for the rest.
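A minimal sketch of the cheapest-first idea, not Browsewright's internals: each tier is a function that returns an answer or `None`, and the first tier that answers wins, so the expensive browser tier runs only when every cheaper one comes up empty. The tier functions are hypothetical stand-ins; the stage labels mirror the `res.stage` values listed under Python below, and reporting `"blocked"` when nothing answers is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A tier takes a URL and returns an answer, or None if it can't handle that page.
Tier = Callable[[str], Optional[str]]


@dataclass
class Result:
    answer: str
    stage: str  # which tier produced the answer


def cheapest_first(url: str, tiers: list[tuple[str, Tier]]) -> Result:
    """Try tiers in order, cheapest first; costlier tiers never run once one answers."""
    for stage, tier in tiers:
        answer = tier(url)
        if answer is not None:
            return Result(answer=answer, stage=stage)
    # Assumption for this sketch: nothing answered, so report the page as blocked.
    return Result(answer="", stage="blocked")


# Hypothetical stand-ins: the archive and open-API tiers come up empty, so the browser runs.
def from_archive(url: str) -> Optional[str]:
    return None


def from_open_api(url: str) -> Optional[str]:
    return None


def from_browser(url: str) -> Optional[str]:
    return f"answer rendered from {url}"


res = cheapest_first("https://example.com", [
    ("common_crawl", from_archive),
    ("api", from_open_api),
    ("browser", from_browser),
])
print(res.stage)  # browser
```

Because the loop returns on the first hit, the cost of a run is the sum of the tiers actually tried, which is why ordering them by price pays off.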

How it stacks up

| Capability | Browsewright | Firecrawl | Browser-Use | Tavily |
|---|---|---|---|---|
| Returns structured JSON from intent | ✅ | ✅ | ⚠️ scripted | ✅ |
| Fills & submits real forms | ✅ | ❌ | ✅ | ❌ |
| Drives a real Chrome (human motor layer) | ✅ | ❌ | ✅ | ❌ |
| Gets past Cloudflare/DataDome bot walls | ✅ | ⚠️ | ⚠️ | ❌ |
| Free API/archive shortcut before any browser | ✅ | ❌ | ❌ | ❌ |
| Runs fully local, your own API key | ✅ | ❌ SaaS | ✅ | ❌ SaaS |
| 5 ready-made business tasks built in | ✅ | ❌ | ❌ | ❌ |
| MIT, self-hostable | ✅ | partial | ✅ | ❌ |

Comparisons reflect typical default usage; all four are good tools. Browsewright's bet is intent in → action + structured data out, run locally for pennies.

Install

```bash
pip install browsewright          # core
pip install "browsewright[mcp]"   # + MCP server (Claude Desktop / Code / any client)
```

Or from source:

```bash
git clone https://github.com/krishnashakula/browsewright && cd browsewright
python -m venv .venv && . .venv/bin/activate    # Windows: .venv\Scripts\activate
pip install -e .
```

Add your Anthropic API key:

```bash
cp .env.example .env
# edit .env and paste your key from https://console.anthropic.com/settings/keys
```

The first browser run launches Chrome via nodriver (Chrome must be installed).

`bw` not recognized after install? pip put the scripts in a folder that isn't on your PATH (common on Windows). Use the module form instead, which works as long as browsewright is installed in the Python you're running: `python -m browsewright "<url>" "<goal>"` · `python -m browsewright.tasks_cli enrich "<url>"`
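For scripts that shell out to Browsewright, a small sketch that prefers the `bw` script when it's on PATH and otherwise falls back to the module form. `bw_command` is a hypothetical helper, and the fallback assumes browsewright is installed into the same interpreter that runs the script.

```python
import shutil
import sys


def bw_command(url: str, goal: str) -> list[str]:
    """argv for a Browsewright run: the `bw` script if it's on PATH, else the module form."""
    exe = shutil.which("bw")
    if exe:
        return [exe, url, goal]
    # Module form, run with this interpreter; assumes browsewright is installed into it.
    return [sys.executable, "-m", "browsewright", url, goal]
```

Pass the result straight to `subprocess.run(..., check=True)`; building an argv list rather than a shell string also avoids quoting problems with goals that contain spaces.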

Use it

CLI

```bash
bw "https://news.ycombinator.com" "the top story right now"
bw "https://example.com" "find the pricing" --json
bw "https://example.com" "debug this" --no-headless --verbose
```

Python

```python
import asyncio
from browsewright import search

res = asyncio.run(search("https://stripe.com", "what does this company do"))
print(res.answer)         # synthesized answer
print(res.stage)          # "api" | "browser" | "common_crawl" | "blocked" | "error"
print(res.tokens_total, res.elapsed_s)
```
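`search` is a coroutine, so checking several sites doesn't have to be sequential. A minimal sketch that runs many `(url, goal)` jobs concurrently while capping how many are in flight with `asyncio.Semaphore`. `run_many` and its default limit of 3 are hypothetical, and whether several Browsewright browser sessions can safely share one process is an assumption worth testing (set `limit=1` if not).

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_many(
    search_fn: Callable[[str, str], Awaitable[Any]],
    jobs: list[tuple[str, str]],
    limit: int = 3,
) -> list[Any]:
    """Await search_fn(url, goal) for every job, at most `limit` at once; results keep job order."""
    sem = asyncio.Semaphore(limit)

    async def one(url: str, goal: str) -> Any:
        async with sem:  # each job may open a browser, so cap how many run together
            return await search_fn(url, goal)

    return list(await asyncio.gather(*(one(url, goal) for url, goal in jobs)))
```

With the library installed, `asyncio.run(run_many(search, jobs))` returns one result per job in the same order as `jobs`, because `asyncio.gather` preserves argument order.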

As an MCP tool (Claude Desktop / Claude Code / any MCP client)

```json
{ "mcpServers": { "browsewright": { "command": "bw-mcp" } } }
```

Your LLM now has a read_page(url, goal) tool.

The 5 built-in tasks — `bw-tasks`

One pipeline — fetch → structured extract (JSON) → diff/aggregate → action — exposed as five business workflows. Each is a CLI subcommand and a library function.

| Task | Command | Output |
|---|---|---|
| 🕵️ Competitor watch | `bw-tasks watch <url>` | Baseline now, change alerts later |
| 🎯 Lead enrichment | `bw-tasks enrich <url>` | CRM fields + a personalized cold-email line |
| 📝 Agentic form fill | `bw-tasks form <url> --profile p.json` | Understands fields, fills, submits, reads results |
| 💰 Price/stock tracking | `bw-tasks track <url>` | Price & availability change alerts |
| 📣 Brand monitoring | `bw-tasks brand <name> <urls…>` | Mentions + sentiment digest |

Common flags: `--json`, `--out FILE`, `--slack <webhook>`, `--no-headless`, `--aggressive`.

Real enrich output (trimmed):

```json
{
  "company_name": "Tavily",
  "industry": "AI/SaaS - Developer Tools",
  "tech_stack_or_integrations": ["OpenAI", "Anthropic", "Groq", "Databricks"],
  "recent_news_or_signals": ["Raised $25M Series A", "Databricks MCP partnership"],
  "icp_fit_score_1_to_10": 7,
  "personalized_cold_email_first_line": "I noticed Tavily just partnered with Databricks on the MCP Marketplace—looks like you're doubling down on enterprise adoption after your $25M Series A."
}
```

Build your own task with the core primitive

Every task is a thin wrapper over extract_structured(url, schema). Define any schema, get JSON back:

```python
import asyncio
from browsewright import extract_structured

schema = {"headline": "string",
          "open_roles": [{"title": "string", "team": "string", "location": "string"}]}
data = asyncio.run(extract_structured(
    "https://example.com/careers", schema,
    instruction="Extract the page headline and every open job posting."))
print(data["open_roles"])
```
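Because the JSON comes from an LLM, it's worth checking it against the schema before it feeds a CRM or an alert. A stdlib-only sketch of such a check for schemas written like the one above. The leaf type names other than `"string"` and the rule that a one-item list means "a list of these" are assumptions read off the example, not documented behavior, and `conforms` is a hypothetical helper.

```python
from typing import Any

# Assumed leaf type names; the example above only uses "string".
LEAF_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def conforms(data: Any, schema: Any, path: str = "$") -> list[str]:
    """List every place where `data` doesn't match `schema`; an empty list means it conforms."""
    if isinstance(schema, str):
        expected = LEAF_TYPES.get(schema)
        if expected is None:
            return [f"{path}: unknown type {schema!r}"]
        # bool is a subclass of int in Python, so reject True/False unless "boolean" was asked for.
        ok = isinstance(data, expected) and (schema == "boolean" or not isinstance(data, bool))
        return [] if ok else [f"{path}: expected {schema}, got {type(data).__name__}"]
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object, got {type(data).__name__}"]
        errors: list[str] = []
        for key, sub in schema.items():
            if key in data:
                errors += conforms(data[key], sub, f"{path}.{key}")
            else:
                errors.append(f"{path}.{key}: missing")
        return errors
    if isinstance(schema, list) and len(schema) == 1:
        # A one-item list is read as "a list whose items all match this template".
        if not isinstance(data, list):
            return [f"{path}: expected array, got {type(data).__name__}"]
        errors = []
        for i, item in enumerate(data):
            errors += conforms(item, schema[0], f"{path}[{i}]")
        return errors
    return [f"{path}: unsupported schema {schema!r}"]


schema = {"headline": "string",
          "open_roles": [{"title": "string", "team": "string", "location": "string"}]}
print(conforms({"headline": "Careers", "open_roles": [{"title": "SRE", "team": 7}]}, schema))
# ['$.open_roles[0].team: expected string, got int', '$.open_roles[0].location: missing']
```

Each message carries a JSON-style path, so a failed run points straight at the field the extraction got wrong.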

Scheduling

Tasks are single-shot; snapshot/diff state persists between runs, so change detection works across invocations. Run on cron, n8n/Make/Zapier, or /loop:

```
# every 6h, alert on competitor pricing changes
0 */6 * * * bw-tasks watch "https://competitor.com/pricing" --slack https://hooks.slack.com/services/XXX
```
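Change detection across single-shot runs only needs each run to see the previous run's output. A minimal sketch of that idea with one JSON snapshot file per target; the file layout and the `diff_snapshot` helper are hypothetical and say nothing about where or how `bw-tasks` actually keeps its state.

```python
import json
from pathlib import Path
from typing import Any


def diff_snapshot(state_file: Path, current: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Compare `current` with the previous run's snapshot, then save `current` for next time.

    Returns {field: (old, new)} for each changed field; {} on the first (baseline) run.
    Values should be plain JSON types so they compare equal after a save/load round trip.
    """
    previous = json.loads(state_file.read_text()) if state_file.exists() else None
    state_file.write_text(json.dumps(current, sort_keys=True))
    if previous is None:
        return {}
    fields = sorted(previous.keys() | current.keys())
    return {
        f: (previous.get(f), current.get(f))
        for f in fields
        if previous.get(f) != current.get(f)
    }
```

An empty result means nothing changed (or this was the baseline run); a non-empty one is what a scheduled job like the cron entry above would turn into an alert.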

How it works

```text
search(url, goal)
   │
   ├─ Polite gate ........ robots.txt check + per-host rate limit
   │
   ├─ Pre-flight pipeline (cheapest path first)
   │     1. Common Crawl ... public archive            (opt-in)
   │     2. Open API ....... RSS / wp-json / *.json     (no browser, ~1.5k tokens)
   │     3. Origin IP ...... CDN bypass                 (skipped in polite mode)
   │     4. Classifier ..... detect Cloudflare/Akamai/DataDome/…
   │
   └─ Browser session (only if no shortcut hit)
         • real headless Chrome via nodriver (native TLS fingerprint)
         • human motor layer — Bézier mouse, typing cadence, scroll pacing
         • LLM decides actions only at junctions (~1 call/page)
         • blind-scene shortcut: extract directly when the DOM scan is blocked
         • visual recovery: a vision call clears interstitials/challenges
```

Polite by default

Polite mode is the default and what you should ship. It checks robots.txt, rate-limits per host, and does not bypass CDN bot protection. `--aggressive` (`polite=False`) enables origin-IP discovery and ignores robots.txt; use it only on targets you own or are authorized to test.

⚠️ You are responsible for complying with each site's Terms of Service, applicable law (CFAA and equivalents), and data-protection rules (GDPR/CCPA). Browsewright is for authorized research, your own properties, and sites whose terms permit automated access. The authors accept no liability for misuse.

⭐ Star it / contribute

If Browsewright saved you a scraper, drop a star — it's the whole reason this is open source. Issues and PRs welcome: pre-flight vendors, new tasks, more sites in the benchmark.

MIT licensed. Built on nodriver and Anthropic Claude.

Release history

This version

0.1.0

Jun 16, 2026

Download files


Source Distribution

browsewright-0.1.0.tar.gz (53.4 kB)

Uploaded Jun 16, 2026 Source

Built Distribution


browsewright-0.1.0-py3-none-any.whl (56.0 kB)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file browsewright-0.1.0.tar.gz.

File metadata

Download URL: browsewright-0.1.0.tar.gz
Upload date: Jun 16, 2026
Size: 53.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for browsewright-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `97d70a7e97be4cf7f72ba3cf9f1d8d758dff0e53b458d3745e84a17a4268a6fc` |
| MD5 | `02f67356748d0a93520b993f0ddaed6e` |
| BLAKE2b-256 | `a9f9daea876849f61288d1709e59cdd0e9f411c02d0bcbbf97a8eb3c11bf7746` |


Provenance

The following attestation bundles were made for browsewright-0.1.0.tar.gz:

Publisher: publish.yml on krishnashakula/browsewright

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: browsewright-0.1.0.tar.gz
- Subject digest: 97d70a7e97be4cf7f72ba3cf9f1d8d758dff0e53b458d3745e84a17a4268a6fc
- Sigstore transparency entry: 1843136410
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: krishnashakula/browsewright@4aa2f9125db0d5ba9f710f827f206e1fc4632f28
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/krishnashakula
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4aa2f9125db0d5ba9f710f827f206e1fc4632f28
- Trigger Event: release

File details

Details for the file browsewright-0.1.0-py3-none-any.whl.

File metadata

Download URL: browsewright-0.1.0-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 56.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for browsewright-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d4a437044d799e19c26a12c3d84369e657795343c027bd11a7efa5a16f1bf608` |
| MD5 | `1d558a3247271207b2f31b90ad300954` |
| BLAKE2b-256 | `4a04b969b5f78595bfedf36c8fa16c6db43232e0efd59914ba5c5fba600009da` |


Provenance

The following attestation bundles were made for browsewright-0.1.0-py3-none-any.whl:

Publisher: publish.yml on krishnashakula/browsewright

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: browsewright-0.1.0-py3-none-any.whl
- Subject digest: d4a437044d799e19c26a12c3d84369e657795343c027bd11a7efa5a16f1bf608
- Sigstore transparency entry: 1843136503
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: krishnashakula/browsewright@4aa2f9125db0d5ba9f710f827f206e1fc4632f28
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/krishnashakula
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4aa2f9125db0d5ba9f710f827f206e1fc4632f28
- Trigger Event: release
