
A simple tool that scoops up HTML and screenshots from web pages using a browserless instance secured by Cloudflare Access

Page Scoop

A command-line tool for capturing HTML content and screenshots from web pages using browserless.

Features

  • Capture HTML content from any URL
  • Take screenshots with customizable options:
    • Multiple formats (PNG, JPEG, WEBP)
    • Adjustable viewport size
    • Full-page capture
    • Image quality control
  • Configurable through:
    • Command-line arguments
    • Environment variables
    • Configuration file

Installation

uv tool install page-scoop

Requirements

  • Python 3.10 or higher
  • A browserless instance (self-hosted or cloud service)

Configuration

You can configure page-scoop using one of these methods:

  1. Command-line arguments
  2. Environment variables
  3. Configuration file

Configuration File

Create a configuration file at ~/.config/page-scoop/config.json with the following structure:

{
    "browserless_url": "your-browserless-url",
    "token": "your-auth-token",
    "cf_client_id": "your-cloudflare-client-id",
    "cf_client_secret": "your-cloudflare-client-secret"
}
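
If you would rather generate this file from a script, the following is a minimal Python sketch. The helper name write_config is hypothetical and not part of page-scoop; it only writes the four keys shown above. Restricting the file permissions is a suggestion, not something page-scoop is known to require.

```python
import json
from pathlib import Path


def write_config(path: Path, browserless_url: str, token: str,
                 cf_client_id: str = "", cf_client_secret: str = "") -> Path:
    """Write a JSON config containing the four keys shown above.

    Hypothetical helper for illustration; not part of page-scoop.
    """
    config = {
        "browserless_url": browserless_url,
        "token": token,
        "cf_client_id": cf_client_id,
        "cf_client_secret": cf_client_secret,
    }
    # Create ~/.config/page-scoop (or any other parent directory) if missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    # The file holds secrets, so limit access to the current user.
    path.chmod(0o600)
    return path


# Example call (commented out so that running this file does not
# overwrite an existing configuration):
# write_config(Path.home() / ".config" / "page-scoop" / "config.json",
#              "your-browserless-url", "your-auth-token")
```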

Usage

Capture HTML

page-scoop html https://example.com

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --output: Save HTML to file instead of stdout
  • --timeout: HTTP request timeout in seconds
  • --wait-for: Wait for selector to appear before capture
  • --wait-time: Wait time in milliseconds before capture
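
When calling page-scoop from another program, it can help to assemble the argument list in one place. The sketch below is a hypothetical Python helper (build_html_command is not part of page-scoop) that uses only the options listed above. It assumes each option takes a single value.

```python
import shlex


def build_html_command(url, output=None, timeout=None,
                       wait_for=None, wait_time=None):
    """Return an argument list for `page-scoop html`.

    Hypothetical helper; it only arranges the documented options.
    """
    cmd = ["page-scoop", "html", url]
    if output is not None:
        cmd += ["--output", str(output)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]      # seconds
    if wait_for is not None:
        cmd += ["--wait-for", wait_for]         # selector to wait for
    if wait_time is not None:
        cmd += ["--wait-time", str(wait_time)]  # milliseconds
    return cmd


if __name__ == "__main__":
    cmd = build_html_command("https://example.com",
                             output="page.html", wait_for="#content")
    # Print a shell-safe form; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting problems with selectors such as #content.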

Take Screenshot

page-scoop screenshot https://example.com --output screenshot.png

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --output: Path to save screenshot file
  • --timeout: HTTP request timeout in seconds
  • --width: Viewport width
  • --height: Viewport height
  • --full-page: Capture full page height
  • --format: Screenshot format (png, jpeg, webp)
  • --quality: Image quality (for JPEG/WEBP)
  • --wait-for: Wait for selector to appear before capture
  • --wait-time: Wait time in milliseconds before capture
  • --overwrite: Overwrite the output file if it already exists
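
The screenshot options can be combined in the same way. The following Python sketch is a hypothetical helper (build_screenshot_command is not part of page-scoop). It assumes that --full-page and --overwrite are boolean flags that take no value, and it rejects --quality with PNG because the option is documented for JPEG/WEBP only. How page-scoop itself reacts to that combination is not stated here.

```python
ALLOWED_FORMATS = ("png", "jpeg", "webp")


def build_screenshot_command(url, output, *, width=None, height=None,
                             full_page=False, image_format=None,
                             quality=None, overwrite=False):
    """Return an argument list for `page-scoop screenshot`.

    Hypothetical helper; it only arranges the documented options.
    """
    cmd = ["page-scoop", "screenshot", url, "--output", str(output)]
    if width is not None:
        cmd += ["--width", str(width)]
    if height is not None:
        cmd += ["--height", str(height)]
    if image_format is not None:
        if image_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {image_format!r}")
        cmd += ["--format", image_format]
    if quality is not None:
        # Quality is documented for JPEG/WEBP only, so require one of them.
        if image_format not in ("jpeg", "webp"):
            raise ValueError("quality applies to jpeg or webp only")
        cmd += ["--quality", str(quality)]
    if full_page:
        cmd.append("--full-page")   # assumed to be a value-less flag
    if overwrite:
        cmd.append("--overwrite")   # assumed to be a value-less flag
    return cmd


if __name__ == "__main__":
    print(build_screenshot_command("https://example.com", "shot.jpeg",
                                   width=1280, height=720,
                                   image_format="jpeg", quality=80,
                                   full_page=True))
```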

Create/Update Configuration

page-scoop config --browserless-url your-url --token your-token

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --cf-client-id: Cloudflare Access client ID
  • --cf-client-secret: Cloudflare Access client secret
  • --update: Update existing config file
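
For background on the two Cloudflare options: Cloudflare Access service tokens are normally presented as a pair of HTTP request headers, CF-Access-Client-Id and CF-Access-Client-Secret. The Python sketch below builds those headers. The function name is hypothetical, and whether page-scoop sends the credentials in exactly this way is an assumption, not something this README states.

```python
def cloudflare_access_headers(client_id, client_secret):
    """Build the service-token headers that Cloudflare Access expects.

    Hypothetical helper; returns an empty dict when either value is
    missing, so it is safe to merge into other request headers.
    """
    if not client_id or not client_secret:
        return {}
    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }


if __name__ == "__main__":
    headers = cloudflare_access_headers("your-cloudflare-client-id",
                                        "your-cloudflare-client-secret")
    # Print header names only, never the secret values.
    print(sorted(headers))
```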

Environment Variables

  • BROWSERLESS_URL: Browserless instance URL
  • BROWSERLESS_TOKEN: Auth token for browserless
  • CF_CLIENT_ID: Cloudflare Access client ID
  • CF_CLIENT_SECRET: Cloudflare Access client secret
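
A script that launches page-scoop can supply these variables to the child process only, without exporting them in the parent shell. The Python sketch below shows one way to do that. The helper name scoop_environment is hypothetical, and the sketch takes no position on how page-scoop prioritizes environment variables relative to command-line arguments or the configuration file.

```python
import os


def scoop_environment(browserless_url, token,
                      cf_client_id=None, cf_client_secret=None, base=None):
    """Return a copy of the environment with the page-scoop variables set.

    Hypothetical helper; `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["BROWSERLESS_URL"] = browserless_url
    env["BROWSERLESS_TOKEN"] = token
    # Set the Cloudflare Access variables only when values are provided.
    if cf_client_id:
        env["CF_CLIENT_ID"] = cf_client_id
    if cf_client_secret:
        env["CF_CLIENT_SECRET"] = cf_client_secret
    return env


# Example use with the documented html command:
# import subprocess
# subprocess.run(["page-scoop", "html", "https://example.com"],
#                env=scoop_environment("your-browserless-url", "your-auth-token"),
#                check=True)
```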
