
A simple tool to scoop up HTML and screenshots from web pages using a browserless instance secured by Cloudflare Zero Trust Access


Page Scoop

A command-line tool for capturing HTML content and screenshots from web pages using browserless.

Features

  • Capture HTML content from any URL
  • Take screenshots with customizable options:
    • Multiple formats (PNG, JPEG, WEBP)
    • Adjustable viewport size
    • Full-page capture
    • Image quality control
  • Configurable through:
    • Command-line arguments
    • Environment variables
    • Configuration file

Installation

uv tool install page-scoop

Requirements

  • Python 3.10 or higher
  • A browserless instance (self-hosted or cloud service)

Configuration

You can configure page-scoop using any of these methods:

  1. Command-line arguments
  2. Environment variables
  3. Configuration file

Configuration File

Create a configuration file (~/.config/page-scoop/config.json) with the following structure:

{
    "browserless_url": "your-browserless-url",
    "token": "your-auth-token",
    "cf_client_id": "your-cloudflare-client-id",
    "cf_client_secret": "your-cloudflare-client-secret"
}
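As a rough illustration of working with this file from Python, the sketch below parses it and reports which of the four keys shown above are absent or empty. The helper names (`load_config`, `missing_keys`) are hypothetical and not part of page-scoop, and treating all four keys as expected is an assumption: the tool may well accept a config without the Cloudflare entries.

```python
import json
from pathlib import Path

# Path and key names are taken from the README; everything else is illustrative.
CONFIG_PATH = Path.home() / ".config" / "page-scoop" / "config.json"
EXPECTED_KEYS = ("browserless_url", "token", "cf_client_id", "cf_client_secret")


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Parse the JSON config file, returning an empty dict if it does not exist."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def missing_keys(config: dict) -> list[str]:
    """Return the expected keys that are absent or empty in a parsed config."""
    return [key for key in EXPECTED_KEYS if not config.get(key)]
```

Calling `missing_keys(load_config())` before running a capture gives a quick hint about what still has to come from the command line or the environment.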

Usage

Capture HTML

page-scoop html https://example.com

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --output: Save HTML to file instead of stdout
  • --timeout: HTTP request timeout in seconds
  • --wait-for: Wait for selector to appear before capture
  • --wait-time: Wait time in milliseconds before capture
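For scripting, the command can be driven from Python with `subprocess`. This is a minimal sketch that uses only the options listed above; the function names are hypothetical, and it assumes `page-scoop` is on `PATH` and that options may follow the URL, as in the screenshot example.

```python
import subprocess


def build_html_command(url, output=None, timeout=None, wait_for=None, wait_time=None):
    """Assemble the argv list for `page-scoop html` from the documented options."""
    cmd = ["page-scoop", "html", url]
    if output is not None:
        cmd += ["--output", str(output)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if wait_for is not None:
        cmd += ["--wait-for", wait_for]
    if wait_time is not None:
        cmd += ["--wait-time", str(wait_time)]
    return cmd


def capture_html(url, **options) -> str:
    """Run the command and return stdout (the HTML when --output is not given)."""
    result = subprocess.run(
        build_html_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything is executed.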

Take Screenshot

page-scoop screenshot https://example.com --output screenshot.png

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --output: Path to save screenshot file
  • --timeout: HTTP request timeout in seconds
  • --width: Viewport width
  • --height: Viewport height
  • --full-page: Capture full page height
  • --format: Screenshot format (png, jpeg, webp)
  • --quality: Image quality (for JPEG/WEBP)
  • --wait-for: Wait for selector to appear before capture
  • --wait-time: Wait time in milliseconds before capture
  • --overwrite: Overwrite existing file if it exists
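The screenshot options can be checked before the command runs. The sketch below builds the argument list and rejects combinations that the option descriptions rule out. It assumes `--full-page` and `--overwrite` are boolean switches that take no value, which the README does not state explicitly; the function name is hypothetical.

```python
ALLOWED_FORMATS = ("png", "jpeg", "webp")  # formats listed in the README


def build_screenshot_command(url, output, *, width=None, height=None,
                             full_page=False, image_format=None,
                             quality=None, overwrite=False):
    """Assemble the argv list for `page-scoop screenshot`, validating options first."""
    if image_format is not None and image_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {image_format!r}")
    if quality is not None and image_format == "png":
        # The README describes --quality as applying to JPEG/WEBP only.
        raise ValueError("--quality does not apply to PNG")

    cmd = ["page-scoop", "screenshot", url, "--output", str(output)]
    if width is not None:
        cmd += ["--width", str(width)]
    if height is not None:
        cmd += ["--height", str(height)]
    if full_page:
        cmd.append("--full-page")  # assumed to be a flag without a value
    if image_format is not None:
        cmd += ["--format", image_format]
    if quality is not None:
        cmd += ["--quality", str(quality)]
    if overwrite:
        cmd.append("--overwrite")  # assumed to be a flag without a value
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.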

Create/Update Configuration

page-scoop config --browserless-url your-url --token your-token

Options:

  • --browserless-url: Browserless instance URL
  • --token: Auth token for browserless
  • --cf-client-id: Cloudflare Access client ID
  • --cf-client-secret: Cloudflare Access client secret
  • --update: Update existing config file
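For context on the two Cloudflare values: Cloudflare Access service tokens are presented as a pair of HTTP request headers, `CF-Access-Client-Id` and `CF-Access-Client-Secret`. The sketch below shows how such headers could be built from the configured values. How page-scoop attaches them internally is not documented here, so treat this as an illustration with a hypothetical helper name, not as the tool's implementation.

```python
def cloudflare_access_headers(client_id, client_secret):
    """Return Cloudflare Access service-token headers, or {} if either value is unset."""
    if not client_id or not client_secret:
        return {}
    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }
```

Returning an empty dict when either value is missing lets the same code path serve browserless instances that are not behind Cloudflare Access.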

Environment Variables

  • BROWSERLESS_URL: Browserless instance URL
  • BROWSERLESS_TOKEN: Auth token for browserless
  • CF_CLIENT_ID: Cloudflare Access client ID
  • CF_CLIENT_SECRET: Cloudflare Access client secret
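With three configuration sources available, a value may be defined in more than one place. The README does not state which source wins, so the sketch below assumes a common convention: command-line arguments first, then environment variables, then the configuration file. The function name and the precedence order are assumptions; only the variable and key names come from this document.

```python
import os

# Maps config-file keys to the environment variable names listed above.
ENV_NAMES = {
    "browserless_url": "BROWSERLESS_URL",
    "token": "BROWSERLESS_TOKEN",
    "cf_client_id": "CF_CLIENT_ID",
    "cf_client_secret": "CF_CLIENT_SECRET",
}


def resolve_settings(cli_args, file_config, environ=None):
    """Merge settings, assuming CLI > environment > config file precedence."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, env_name in ENV_NAMES.items():
        # Take the first non-empty value in the assumed precedence order.
        resolved[key] = (
            cli_args.get(key)
            or environ.get(env_name)
            or file_config.get(key)
        )
    return resolved
```

Passing `environ` explicitly makes the function easy to exercise without touching the real process environment.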
