A simple tool to scoop up HTML and screenshots from web pages using a browserless instance secured by Cloudflare Access
Page Scoop
A command-line tool for capturing HTML content and screenshots from web pages using browserless.
Features
- Capture HTML content from any URL
- Take screenshots with customizable options:
  - Multiple formats (PNG, JPEG, WEBP)
  - Adjustable viewport size
  - Full-page capture
  - Image quality control
- Configurable through:
  - Command-line arguments
  - Environment variables
  - Configuration file
Installation
uv tool install page-scoop
Requirements
- Python 3.10 or higher
- A browserless instance (self-hosted or cloud service)
Configuration
You can configure page-scoop using any of these methods:
- Command-line arguments
- Environment variables
- Configuration file
Configuration File
Create a configuration file (~/.config/page-scoop/config.json) with the following structure:
{
  "browserless_url": "your-browserless-url",
  "token": "your-auth-token",
  "cf_client_id": "your-cloudflare-client-id",
  "cf_client_secret": "your-cloudflare-client-secret"
}
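If you prefer to generate this file from a script, the following is a minimal Python sketch, not part of page-scoop itself. The helper name `write_config` is hypothetical; the key names come from the structure above, and the demo writes to a temporary directory so it does not touch a real `~/.config/page-scoop/config.json`.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the documented config structure; values are supplied by the caller.
CONFIG_KEYS = ("browserless_url", "token", "cf_client_id", "cf_client_secret")


def write_config(path: Path, **values: str) -> Path:
    """Write a JSON config file containing only the documented keys (hypothetical helper)."""
    unknown = set(values) - set(CONFIG_KEYS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(values, indent=2) + "\n", encoding="utf-8")
    # The file holds secrets, so restrict it to the current user
    # (this has limited effect on Windows).
    path.chmod(0o600)
    return path


if __name__ == "__main__":
    # Demo: write into a temporary directory instead of the real config location.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "page-scoop" / "config.json"
        write_config(
            target,
            browserless_url="your-browserless-url",
            token="your-auth-token",
        )
        print(target.read_text(encoding="utf-8"))
```

To write the real file, pass `Path.home() / ".config" / "page-scoop" / "config.json"` as the path.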
Usage
Capture HTML
page-scoop html https://example.com
Options:
- --browserless-url: Browserless instance URL
- --token: Auth token for browserless
- --output: Save HTML to file instead of stdout
- --timeout: HTTP request timeout in seconds
- --wait-for: Wait for selector to appear before capture
- --wait-time: Wait time in milliseconds before capture
Take Screenshot
page-scoop screenshot https://example.com --output screenshot.png
Options:
- --browserless-url: Browserless instance URL
- --token: Auth token for browserless
- --output: Path to save screenshot file
- --timeout: HTTP request timeout in seconds
- --width: Viewport width
- --height: Viewport height
- --full-page: Capture full page height
- --format: Screenshot format (png, jpeg, webp)
- --quality: Image quality (for JPEG/WEBP)
- --wait-for: Wait for selector to appear before capture
- --wait-time: Wait time in milliseconds before capture
- --overwrite: Overwrite existing file if it exists
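When driving page-scoop from a script, it can help to assemble the screenshot command as an argument list. The sketch below is illustrative only: `screenshot_command` is a hypothetical helper, and it assumes that `--full-page` and `--overwrite` are boolean switches that take no value, which the option list above does not state explicitly.

```python
import shlex

# Formats listed in the screenshot options.
SCREENSHOT_FORMATS = ("png", "jpeg", "webp")


def screenshot_command(url, output, *, width=None, height=None, full_page=False,
                       fmt=None, quality=None, wait_for=None, overwrite=False):
    """Build the argv for `page-scoop screenshot` (hypothetical helper)."""
    if fmt is not None and fmt not in SCREENSHOT_FORMATS:
        raise ValueError(f"format must be one of {SCREENSHOT_FORMATS}")
    argv = ["page-scoop", "screenshot", url, "--output", output]
    if width is not None:
        argv += ["--width", str(width)]
    if height is not None:
        argv += ["--height", str(height)]
    if full_page:
        # Assumption: --full-page is a switch without a value.
        argv.append("--full-page")
    if fmt is not None:
        argv += ["--format", fmt]
    if quality is not None:
        argv += ["--quality", str(quality)]
    if wait_for is not None:
        argv += ["--wait-for", wait_for]
    if overwrite:
        # Assumption: --overwrite is a switch without a value.
        argv.append("--overwrite")
    return argv


if __name__ == "__main__":
    argv = screenshot_command("https://example.com", "screenshot.png",
                              width=1280, height=720, full_page=True)
    # Print a shell-safe rendering of the command without running it.
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once page-scoop is installed and configured.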
Create/Update Configuration
page-scoop config --browserless-url your-url --token your-token
Options:
- --browserless-url: Browserless instance URL
- --token: Auth token for browserless
- --cf-client-id: Cloudflare Access client ID
- --cf-client-secret: Cloudflare Access client secret
- --update: Update existing config file
Environment Variables
- BROWSERLESS_URL: Browserless instance URL
- BROWSERLESS_TOKEN: Auth token for browserless
- CF_CLIENT_ID: Cloudflare Access client ID
- CF_CLIENT_SECRET: Cloudflare Access client secret
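To illustrate how these variables relate to the config file keys, here is a minimal sketch of one way to merge the two sources in Python. It is not page-scoop's actual implementation: the helper names are hypothetical, and the precedence (environment variables override the config file) is an assumption. The header names `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the ones Cloudflare Access uses for service tokens; whether page-scoop sends the credentials exactly this way is also an assumption.

```python
import json
import os
from pathlib import Path

# Mapping from the documented environment variables to the config file keys.
ENV_TO_KEY = {
    "BROWSERLESS_URL": "browserless_url",
    "BROWSERLESS_TOKEN": "token",
    "CF_CLIENT_ID": "cf_client_id",
    "CF_CLIENT_SECRET": "cf_client_secret",
}


def resolve_settings(config_path: Path, environ=None) -> dict:
    """Merge config file values with environment variables (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    settings = {}
    if config_path.is_file():
        settings.update(json.loads(config_path.read_text(encoding="utf-8")))
    # Assumption: a non-empty environment variable overrides the file value.
    for env_name, key in ENV_TO_KEY.items():
        value = environ.get(env_name)
        if value:
            settings[key] = value
    return settings


def cloudflare_access_headers(settings: dict) -> dict:
    """Return Cloudflare Access service-token headers, or {} if credentials are missing."""
    client_id = settings.get("cf_client_id")
    client_secret = settings.get("cf_client_secret")
    if not (client_id and client_secret):
        return {}
    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }


if __name__ == "__main__":
    config_file = Path.home() / ".config" / "page-scoop" / "config.json"
    resolved = resolve_settings(config_file)
    # Print only the setting names to avoid echoing secrets.
    print(sorted(resolved))
```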