browser-act-lite

Stateless, parallel-safe, anti-detection CLI tool for extracting rendered web page content.

Built on the Camoufox stealth browser: each invocation launches a fresh browser instance with a unique fingerprint, extracts the fully rendered DOM (including iframes), and outputs clean HTML or Markdown.

Features

  • Anti-detection — Camoufox fingerprint rotation, headless stealth mode
  • Iframe extraction — Recursively captures iframe contents and merges them into the output
  • DOM cleanup — Strips hidden elements, inline styles, scripts, and SVG noise
  • Markdown conversion — DOM → Markdown with absolute URL rewriting and heading-based chunking
  • Proxy support — HTTP/SOCKS proxy with optional authentication
  • Parallel-safe — Stateless design, safe to run multiple instances concurrently
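Because each invocation is stateless, several extractions can be fanned out from a small driver script. The following is a minimal sketch, not part of the package: `extract_markdown` and `extract_many` are hypothetical helper names, and the `run` parameter exists only so the subprocess call can be swapped out when testing. It relies solely on the `stealth-extract` command and `-f` flag documented under Usage.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def extract_markdown(url: str, run=subprocess.run) -> str:
    """Run one stateless extraction and return whatever the CLI prints to stdout."""
    cmd = ["browser-act-lite", "stealth-extract", url, "-f", "markdown"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def extract_many(urls: list[str], max_workers: int = 4, run=subprocess.run) -> dict[str, str]:
    """Extract several URLs concurrently, one CLI process (and browser) per URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own page.
        pages = pool.map(lambda u: extract_markdown(u, run=run), urls)
        return dict(zip(urls, pages))
```

Threads are enough here because the work happens in the child processes; `max_workers` caps how many browser instances run at once.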

Requirements

  • Python 3.12
  • macOS / Linux / Windows

Installation

pip install browser-act-cli-lite

On first run the stealth browser engine will be downloaded automatically.

Usage

Extract as HTML

browser-act-lite stealth-extract https://example.com -f html

Extract as Markdown

browser-act-lite stealth-extract https://example.com -f markdown

Save to file

browser-act-lite stealth-extract https://example.com -f markdown -o

Output is saved to outputs/<hostname>_<timestamp>.md.
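To illustrate the naming pattern, here is a sketch of how a path of that shape could be derived from a URL. It is an assumption-laden illustration rather than the tool's own code: `output_path` is a hypothetical helper, and the timestamp format (`%Y%m%d_%H%M%S`) is a guess, since the README does not specify it.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse


def output_path(url: str, now: datetime, out_dir: str = "outputs") -> Path:
    """Build outputs/<hostname>_<timestamp>.md for a given URL and time."""
    hostname = urlparse(url).hostname or "unknown"
    # Assumed timestamp layout; the real tool may format it differently.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"{hostname}_{stamp}.md"
```

Including a timestamp in the name means repeated or concurrent runs against the same host are unlikely to overwrite each other.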

With proxy

browser-act-lite stealth-extract https://example.com -f html -p http://user:pass@host:port

Options

Usage: browser-act-lite stealth-extract [OPTIONS] URL

Options:
  -f, --format [html|markdown]  Output format (required)
  -p, --proxy TEXT              Proxy URL, e.g. http://user:pass@host:port
  -t, --timeout INTEGER         Page load timeout in seconds (1-300) [default: 30]
  -o, --output                  Save to outputs/ directory instead of stdout
  --help                        Show this message and exit

Project Structure

src/browser_act_lite/
├── cli.py              # Click CLI entry point
├── extractor.py        # Core extraction: launch browser → navigate → extract
├── engine.py           # Stealth browser engine config & monkey-patches
└── pipeline/
    ├── __init__.py     # html_to_markdown / markdown_split
    ├── dom_filter.py   # DOM evaluation & iframe extraction (Playwright)
    ├── converter.py    # Markdownify customisation
    ├── url.py          # URL absolutification
    └── js/
        └── dom_html.js # In-page JS for DOM serialisation
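Two of the pipeline steps named above, URL absolutification and heading-based splitting, can be illustrated on plain Markdown text. This is a simplified sketch of the general idea and not the code in `url.py` or `markdown_split`: the function names and the regular expression are assumptions, and the splitter does not special-case `#` lines inside fenced code.

```python
import re
from urllib.parse import urljoin

# Matches the "[text](" or "![alt](" prefix and the URL that follows it.
LINK_RE = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)")
HEADING_RE = re.compile(r"#{1,6} ")


def absolutify_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link and image targets against base_url."""
    return LINK_RE.sub(lambda m: m.group(1) + urljoin(base_url, m.group(2)), markdown)


def split_by_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADING_RE.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

`urljoin` leaves already-absolute URLs unchanged, so running the rewrite twice is harmless.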

License

MIT

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_x86_64.whl (218.0 kB)
CPython 3.12, manylinux (glibc 2.17+), x86-64

browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_aarch64.whl (203.2 kB)
CPython 3.12, manylinux (glibc 2.17+), ARM64

browser_act_cli_lite-0.2.0-cp312-cp312-macosx_11_0_arm64.whl (168.1 kB)
CPython 3.12, macOS 11.0+, ARM64

File details

Hashes for browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_x86_64.whl

Algorithm    Hash digest
SHA256       2200de41b05dab40c28b91cf89c2d6b84055d2368dc9364d29a912e52f99d8f9
MD5          e8c2ce873397a1219b1cf2ff9ebc6cc6
BLAKE2b-256  bba300db425ebf839afb01ee1d85f60cbe7ee61fff1bbe2ac5c22418ac612c55

File details

Hashes for browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_aarch64.whl

Algorithm    Hash digest
SHA256       4fc4e2926c45273d88023a63c437d42c51e8d9f7177e6a3bc5ddf8a124330848
MD5          7e59f571336ca50b0114958c74412abe
BLAKE2b-256  6af74710a288182e95e8663ad963f21bdc4c2b89e2305740fbb97d4e0070fe0b

File details

Hashes for browser_act_cli_lite-0.2.0-cp312-cp312-macosx_11_0_arm64.whl

Algorithm    Hash digest
SHA256       5dcbc909622afb5c12ea9cbd95b429b7e3a3c87f1f5d5592c4551e632fc73030
MD5          1af52c744986bc8fe434a7832e0dd40d
BLAKE2b-256  2c2dda9421910aff5d1f8b5e09488d77925bc87feb99ae24dc3b35f644f69c5b
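A downloaded wheel can be checked against the SHA256 digests listed above using only the standard library. This is a generic sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the wheel path in the usage note is a placeholder for wherever the file was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("browser_act_cli_lite-0.2.0-cp312-cp312-macosx_11_0_arm64.whl", "5dcbc909622afb5c12ea9cbd95b429b7e3a3c87f1f5d5592c4551e632fc73030")` should return `True` for an intact copy of that wheel.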
