browser-act-lite

Stateless, parallel-safe, anti-detection CLI tool for extracting rendered web page content.

Built on the Camoufox stealth browser. Each invocation launches a fresh browser instance with a unique fingerprint, extracts the fully rendered DOM (including iframes), and outputs clean HTML or Markdown.

Features

  • Anti-detection — Camoufox fingerprint rotation, headless stealth mode
  • Iframe extraction — Recursively captures iframe contents and merges them into the output
  • DOM cleanup — Strips hidden elements, inline styles, scripts, and SVG noise
  • Markdown conversion — DOM → Markdown with absolute URL rewriting and heading-based chunking
  • Proxy support — HTTP/SOCKS proxy with optional authentication
  • Parallel-safe — Stateless design, safe to run multiple instances concurrently
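Because each invocation is stateless, several extractions can be driven side by side from a small script. The following is a minimal sketch, not part of the package: `build_command`, `run_cli`, and `extract_all` are hypothetical helper names, and it assumes the `browser-act-lite` executable is on `PATH`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(url, fmt="markdown"):
    """Build the argv list for one stealth-extract invocation."""
    return ["browser-act-lite", "stealth-extract", url, "-f", fmt]


def run_cli(url):
    """Run one extraction and return its stdout (needs the CLI on PATH)."""
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_all(urls, runner=run_cli, max_workers=4):
    """Run one independent process per URL and map each URL to its output."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(runner, urls)))
```

Threads are enough here because each worker only waits on a child process; the `runner` parameter exists so the fan-out logic can be exercised without launching a browser.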

Requirements

  • Python >= 3.10
  • macOS / Linux / Windows

Installation

pip install -e .

On first run, the stealth browser engine is downloaded automatically.

Usage

Extract as HTML

browser-act-lite stealth-extract https://example.com -f html

Extract as Markdown

browser-act-lite stealth-extract https://example.com -f markdown

Save to file

browser-act-lite stealth-extract https://example.com -f markdown -o

Output is saved to outputs/<hostname>_<timestamp>.md.
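The naming scheme above could be reproduced roughly as follows. This is an illustrative sketch only: the function name `output_path`, the timestamp format, and the `.html` extension for HTML output are assumptions, since the text above only documents the `.md` case.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlsplit


def output_path(url, fmt, now=None, out_dir="outputs"):
    """Return outputs/<hostname>_<timestamp>.<ext> for a given URL."""
    # Assumption: HTML output uses an .html extension.
    ext = {"markdown": "md", "html": "html"}[fmt]
    host = urlsplit(url).hostname or "unknown"
    # Assumption: the real tool's timestamp format may differ.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"{host}_{stamp}.{ext}"
```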

With proxy

browser-act-lite stealth-extract https://example.com -f html -p http://user:pass@host:port
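A proxy URL in this form bundles the server address and the credentials. Playwright-style browser launchers usually take these as separate `server`, `username`, and `password` fields, so a wrapper might split the URL as sketched below. The helper name `parse_proxy` is hypothetical, and whether the tool does this internally is an assumption.

```python
from urllib.parse import unquote, urlsplit


def parse_proxy(proxy_url):
    """Split scheme://user:pass@host:port into server and credential fields."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a proxy URL: {proxy_url!r}")
    # Note: IPv6 literals would need their brackets restored; not handled here.
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    settings = {"server": server}
    if parts.username:
        # Credentials may be percent-encoded inside a URL.
        settings["username"] = unquote(parts.username)
        settings["password"] = unquote(parts.password or "")
    return settings
```

Keeping authentication optional matches the "optional authentication" wording in the feature list: a URL without credentials yields only the `server` key.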

Options

Usage: browser-act-lite stealth-extract [OPTIONS] URL

Options:
  -f, --format [html|markdown]  Output format (required)
  -p, --proxy TEXT              Proxy URL, e.g. http://user:pass@host:port
  -t, --timeout INTEGER         Page load timeout in seconds [default: 30]
  -o, --output                  Save to outputs/ directory instead of stdout
  --help                        Show this message and exit

Project Structure

src/browser_act_lite/
├── cli.py              # Click CLI entry point
├── extractor.py        # Core extraction: launch browser → navigate → extract
├── engine.py           # Stealth browser engine config & monkey-patches
└── pipeline/
    ├── __init__.py     # html_to_markdown / markdown_split
    ├── dom_filter.py   # DOM evaluation & iframe extraction (Playwright)
    ├── converter.py    # Markdownify customisation
    ├── url.py          # URL absolutification
    └── js/
        └── dom_html.js # In-page JS for DOM serialisation
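To illustrate what URL absolutification and heading-based chunking involve, here is a simplified standard-library sketch. It is not the code in `url.py` or `markdown_split`: the function names `absolutize_links` and `split_by_heading` are hypothetical, and it works on already-converted Markdown text, whereas the real pipeline may rewrite URLs during conversion.

```python
import re
from urllib.parse import urljoin

# Matches the "[text](" or "![alt](" prefix and the link target after it.
LINK_RE = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)")
HEADING_RE = re.compile(r"#{1,6} ")


def absolutize_links(markdown, base_url):
    """Rewrite relative link and image targets against base_url."""
    return LINK_RE.sub(
        lambda m: m.group(1) + urljoin(base_url, m.group(2)), markdown
    )


def split_by_heading(markdown):
    """Split Markdown into chunks, each starting at an ATX heading line."""
    # Simplification: headings inside fenced code blocks are not skipped.
    chunks, current = [], []
    for line in markdown.splitlines():
        if HEADING_RE.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

`urljoin` leaves already-absolute targets unchanged, so the rewrite is safe to apply to every link.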

License

MIT
