Capture all resources from a webpage, like the Sources tab in browser DevTools

pagesource

A Python CLI tool that captures every resource a webpage loads (as the Sources tab in browser DevTools does) and saves each one under its original directory structure.

Installation

pip install pagesource

# IMPORTANT: Install the Playwright browser after installing the package
playwright install chromium

Usage

Basic Usage

# Capture all resources from a webpage
pagesource https://example.com

This saves the captured resources to ./pagesource_output/, preserving the URL path structure. External resources are skipped unless --include-external is passed.

Options

# Specify custom output directory
pagesource https://example.com -o ./my-output

# Wait extra time for JavaScript content (useful for SPAs)
pagesource https://example.com --wait 5

# Include external resources (CDN assets, third-party scripts)
pagesource https://example.com --include-external

# Combine options
pagesource https://example.com -o ./output --wait 3 --include-external
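
To drive the tool from a Python script, the documented flags can be assembled into an argument list and handed to subprocess. This is a minimal sketch that assumes the pagesource command is on PATH; the helper names build_command and run_capture are hypothetical and not part of the package.

```python
import subprocess


def build_command(url, output=None, wait=None, include_external=False):
    """Assemble a pagesource argument list from the documented CLI flags."""
    cmd = ["pagesource", url]
    if output is not None:
        cmd += ["-o", str(output)]
    if wait is not None:
        # --wait takes an integer number of seconds
        cmd += ["--wait", str(int(wait))]
    if include_external:
        cmd.append("--include-external")
    return cmd


def run_capture(url, **options):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(url, **options), check=True)
```

Calling run_capture("https://example.com", output="./output", wait=3, include_external=True) would run the same command as the combined example above.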

CLI Reference

pagesource <url> [OPTIONS]

Arguments:
  url                     URL of the webpage to capture resources from

Options:
  -o, --output PATH       Output directory (default: ./pagesource_output)
  -w, --wait INTEGER      Additional seconds to wait after page load
  -e, --include-external  Include external resources (CDN, third-party)
  -v, --version           Show version and exit
  --help                  Show help message

Output Structure

Resources are saved preserving the URL path structure:

pagesource_output/
  example.com/
    index.html
    assets/
      css/
        style.css
      js/
        app.js
    images/
      logo.png

If --include-external is used, external resources are saved in their own host directories:

pagesource_output/
  example.com/
    ...
  cdn.example.com/
    libs/
      library.js
  fonts.googleapis.com/
    css/
      font.css
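
The layout above amounts to a mapping from a resource URL to a host directory plus the URL path. The sketch below illustrates one way to compute it. It is an assumption about the general approach, not the tool's actual code; the function name url_to_relative_path and the "unknown-host" fallback are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_relative_path(url):
    """Map a URL to <host>/<path>, using index.html for directory-style URLs."""
    parts = urlsplit(url)  # the query string is split off and ignored here
    host = parts.hostname or "unknown-host"
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return PurePosixPath(host, path.lstrip("/"))


for u in ("https://example.com/",
          "https://example.com/assets/css/style.css?v=3",
          "https://cdn.example.com/libs/library.js"):
    print(url_to_relative_path(u))
```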

Features

  • Captures all network resources loaded by the page (HTML, CSS, JS, images, fonts, etc.)
  • Preserves original directory structure
  • Handles query strings (strips them from filenames)
  • Infers file extensions from Content-Type when missing
  • Handles duplicate filenames
  • Sanitizes paths for filesystem safety
  • Optional wait time for JavaScript-heavy pages
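
The filename handling listed above (extension inference, path sanitizing, duplicate names) can be sketched with the standard library. The rules below are illustrative assumptions, since the exact behavior of pagesource is not documented here; infer_extension, sanitize_segment, and unique_name are hypothetical names, and the "_1" suffix scheme is only one possible choice.

```python
import mimetypes
import re
from pathlib import PurePosixPath

# Small explicit table for common types; anything else falls back to mimetypes.
_KNOWN_TYPES = {
    "text/html": ".html",
    "text/css": ".css",
    "text/javascript": ".js",
    "application/javascript": ".js",
    "application/json": ".json",
    "image/png": ".png",
}


def infer_extension(name, content_type):
    """Append an extension derived from Content-Type if the name has none."""
    if PurePosixPath(name).suffix:
        return name
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    ext = _KNOWN_TYPES.get(mime) or mimetypes.guess_extension(mime) or ""
    return name + ext


def sanitize_segment(segment):
    """Replace characters that are unsafe in filenames; neutralise dot segments."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", segment)
    return "_" if cleaned in ("", ".", "..") else cleaned


def unique_name(name, taken):
    """Return name, or name with a numeric suffix if it is already in taken."""
    path = PurePosixPath(name)
    candidate, counter = name, 0
    while candidate in taken:
        counter += 1
        candidate = f"{path.stem}_{counter}{path.suffix}"
    taken.add(candidate)
    return candidate
```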

Requirements

  • Python 3.10+
  • Playwright (with Chromium browser)
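
For readers curious how Playwright can observe network traffic, the sketch below collects response bodies with Playwright's sync API. It is not the package's implementation: the function names capture and is_external are hypothetical, and treating any different hostname as external is an assumption based on the output layout shown earlier.

```python
from urllib.parse import urlsplit


def is_external(resource_url, page_url):
    """True when the resource is served from a different host than the page."""
    return urlsplit(resource_url).hostname != urlsplit(page_url).hostname


def capture(page_url, wait_seconds=0, include_external=False):
    """Return {url: body bytes} for responses seen while loading page_url."""
    # Imported lazily so is_external works without Playwright installed.
    from playwright.sync_api import Error as PlaywrightError
    from playwright.sync_api import sync_playwright

    responses = []
    captured = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", responses.append)  # record every response
        page.goto(page_url, wait_until="load")
        if wait_seconds:
            # Extra time for JavaScript-driven requests (milliseconds)
            page.wait_for_timeout(wait_seconds * 1000)
        for response in responses:
            if not include_external and is_external(response.url, page_url):
                continue
            try:
                captured[response.url] = response.body()
            except PlaywrightError:
                # Some responses (for example redirects) expose no body.
                continue
        browser.close()
    return captured
```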

License

MIT
