browser-act-lite
Stateless, parallel-safe, anti-detection CLI tool for extracting rendered web page content.
Built on the Camoufox stealth browser. Each invocation launches a fresh browser instance with a unique fingerprint, extracts the fully rendered DOM (including iframes), and outputs clean HTML or Markdown.
Features
- Anti-detection — Camoufox fingerprint rotation, headless stealth mode
- Iframe extraction — Recursively captures iframe contents and merges them into the output
- DOM cleanup — Strips hidden elements, inline styles, scripts, and SVG noise
- Markdown conversion — DOM → Markdown with absolute URL rewriting and heading-based chunking
- Proxy support — HTTP/SOCKS proxy with optional authentication
- Parallel-safe — Stateless design, safe to run multiple instances concurrently
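To make the "absolute URL rewriting and heading-based chunking" feature concrete, here is a minimal standard-library sketch of the idea. It is not the package's implementation: the function names `absolutify_links` and `split_by_headings` are hypothetical, and the regex only handles simple inline Markdown links and images.

```python
import re
from urllib.parse import urljoin

# Matches the start of an inline Markdown link or image, e.g. "[text](" or "![alt](",
# followed by the link target up to the first ")" or whitespace.
_LINK_RE = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)")


def absolutify_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link and image targets against base_url (hypothetical helper)."""
    def repl(match: re.Match) -> str:
        return match.group(1) + urljoin(base_url, match.group(2))
    return _LINK_RE.sub(repl, markdown)


def split_by_headings(markdown: str) -> list[str]:
    """Start a new chunk at every ATX heading line (hypothetical helper).

    Simplification: heading-like lines inside fenced code blocks are not special-cased.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Links that are already absolute pass through `urljoin` unchanged, so the rewrite is safe to apply to every link in the document.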
Requirements
- Python 3.12
- macOS / Linux / Windows
Installation
pip install browser-act-cli-lite
On first run the stealth browser engine will be downloaded automatically.
Usage
Extract as HTML
browser-act-lite stealth-extract https://example.com -f html
Extract as Markdown
browser-act-lite stealth-extract https://example.com -f markdown
Save to file
browser-act-lite stealth-extract https://example.com -f markdown -o
Output is saved to outputs/<hostname>_<timestamp>.md.
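When scripting around `-o`, it can help to locate the file that was just written. The sketch below assumes the hostname part of the filename equals the URL's hostname as parsed by `urllib`, and it picks the newest file by modification time because the timestamp format is not documented here. `latest_output` is a hypothetical helper, not part of the CLI.

```python
from pathlib import Path
from urllib.parse import urlparse


def latest_output(url: str, outputs_dir: str = "outputs", suffix: str = ".md"):
    """Return the newest outputs/<hostname>_*.md file for url, or None if there is none."""
    host = urlparse(url).hostname
    candidates = sorted(
        Path(outputs_dir).glob(f"{host}_*{suffix}"),
        key=lambda path: path.stat().st_mtime,  # newest file sorts last
    )
    return candidates[-1] if candidates else None
```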
With proxy
browser-act-lite stealth-extract https://example.com -f html -p http://user:pass@host:port
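Because each invocation is stateless, several extractions can run side by side. The following is a rough sketch of driving the CLI from Python with a thread pool; it assumes `browser-act-lite` is on `PATH` and that, without `-o`, the result is written to stdout. The `extract` and `extract_many` helpers and the `run` parameter (used to swap in a fake for testing) are illustrative names, not part of the package.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def extract(url, fmt="markdown", run=subprocess.run):
    """Run one extraction as a separate process and return its stdout."""
    cmd = ["browser-act-lite", "stealth-extract", url, "-f", fmt]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def extract_many(urls, max_workers=4, run=subprocess.run):
    """Extract several URLs concurrently; returns a {url: output} mapping."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread just waits on its own child process.
        outputs = list(pool.map(lambda u: extract(u, run=run), urls))
    return dict(zip(urls, outputs))
```

Threads are enough here since the work happens in child processes. With `check=True`, a non-zero exit from any invocation raises `subprocess.CalledProcessError`.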
Options
Usage: browser-act-lite stealth-extract [OPTIONS] URL
Options:
  -f, --format [html|markdown]  Output format (required)
  -p, --proxy TEXT              Proxy URL, e.g. http://user:pass@host:port
  -t, --timeout INTEGER         Page load timeout in seconds (1-300) [default: 30]
  -o, --output                  Save to outputs/ directory instead of stdout
  --help                        Show this message and exit
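If the command line is assembled programmatically, the documented constraints can be checked before a browser is launched. This is a small sketch under that assumption; `build_args` is a hypothetical helper that mirrors the options listed above and is not shipped with the tool.

```python
def build_args(url, fmt, proxy=None, timeout=30, save=False):
    """Build an argv list for `browser-act-lite stealth-extract`, validating options first."""
    if fmt not in ("html", "markdown"):
        raise ValueError(f"format must be 'html' or 'markdown', got {fmt!r}")
    if not 1 <= timeout <= 300:
        raise ValueError(f"timeout must be between 1 and 300 seconds, got {timeout}")

    args = ["browser-act-lite", "stealth-extract", url, "-f", fmt, "-t", str(timeout)]
    if proxy:
        args += ["-p", proxy]  # e.g. http://user:pass@host:port
    if save:
        args.append("-o")  # write to outputs/ instead of stdout
    return args
```

The returned list can be passed directly to `subprocess.run`.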
Project Structure
src/browser_act_lite/
├── cli.py              # Click CLI entry point
├── extractor.py        # Core extraction: launch browser → navigate → extract
├── engine.py           # Stealth browser engine config & monkey-patches
└── pipeline/
    ├── __init__.py     # html_to_markdown / markdown_split
    ├── dom_filter.py   # DOM evaluation & iframe extraction (Playwright)
    ├── converter.py    # Markdownify customisation
    ├── url.py          # URL absolutification
    └── js/
        └── dom_html.js # In-page JS for DOM serialisation
License
MIT
File details
Details for the file browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_x86_64.whl.
File metadata
- Download URL: browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_x86_64.whl
- Upload date:
- Size: 218.0 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2200de41b05dab40c28b91cf89c2d6b84055d2368dc9364d29a912e52f99d8f9 |
| MD5 | e8c2ce873397a1219b1cf2ff9ebc6cc6 |
| BLAKE2b-256 | bba300db425ebf839afb01ee1d85f60cbe7ee61fff1bbe2ac5c22418ac612c55 |
File details
Details for the file browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_aarch64.whl.
File metadata
- Download URL: browser_act_cli_lite-0.2.0-cp312-cp312-manylinux_2_17_aarch64.whl
- Upload date:
- Size: 203.2 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fc4e2926c45273d88023a63c437d42c51e8d9f7177e6a3bc5ddf8a124330848 |
| MD5 | 7e59f571336ca50b0114958c74412abe |
| BLAKE2b-256 | 6af74710a288182e95e8663ad963f21bdc4c2b89e2305740fbb97d4e0070fe0b |
File details
Details for the file browser_act_cli_lite-0.2.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: browser_act_cli_lite-0.2.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 168.1 kB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5dcbc909622afb5c12ea9cbd95b429b7e3a3c87f1f5d5592c4551e632fc73030 |
| MD5 | 1af52c744986bc8fe434a7832e0dd40d |
| BLAKE2b-256 | 2c2dda9421910aff5d1f8b5e09488d77925bc87feb99ae24dc3b35f644f69c5b |