Capture all resources loaded by a webpage, like the Sources tab in browser DevTools
pagesource
A Python CLI tool that captures all resources loaded by a webpage (like the Sources tab in browser DevTools) and saves them with their original directory structure.
Installation
pip install pagesource
# IMPORTANT: Install the Playwright Chromium browser after installing the package
playwright install chromium
Usage
Basic Usage
# Capture all resources from a webpage
pagesource https://example.com
This saves the captured resources to ./pagesource_output/, preserving the directory structure.
Options
# Specify custom output directory
pagesource https://example.com -o ./my-output
# Wait extra time for JavaScript content (useful for SPAs)
pagesource https://example.com --wait 5
# Include external resources (CDN assets, third-party scripts)
pagesource https://example.com --include-external
# Combine options
pagesource https://example.com -o ./output --wait 3 --include-external
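To drive the same options from a Python script, the command line can be assembled programmatically. This is a minimal sketch, not part of pagesource itself: the helper names `build_command` and `run_capture` are hypothetical, and only the flags documented above are used.

```python
import subprocess
from typing import Optional


def build_command(url: str, output: Optional[str] = None,
                  wait: Optional[int] = None,
                  include_external: bool = False) -> list[str]:
    """Assemble the pagesource argument list (hypothetical helper)."""
    cmd = ["pagesource", url]
    if output is not None:
        cmd += ["-o", output]              # custom output directory
    if wait is not None:
        cmd += ["--wait", str(wait)]       # extra seconds after page load
    if include_external:
        cmd.append("--include-external")   # also save CDN / third-party files
    return cmd


def run_capture(url: str, **options) -> int:
    """Run pagesource as a subprocess and return its exit code."""
    return subprocess.run(build_command(url, **options)).returncode
```

Calling `run_capture("https://example.com", output="./output", wait=3, include_external=True)` would run the combined command shown above, assuming `pagesource` is installed and on the PATH.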
CLI Reference
pagesource <url> [OPTIONS]

Arguments:
  url                     URL of the webpage to capture resources from

Options:
  -o, --output PATH       Output directory (default: ./pagesource_output)
  -w, --wait INTEGER      Additional seconds to wait after page load
  -e, --include-external  Include external resources (CDN, third-party)
  -v, --version           Show version and exit
  --help                  Show help message
Output Structure
Resources are saved preserving the URL path structure:
pagesource_output/
  example.com/
    index.html
    assets/
      css/
        style.css
      js/
        app.js
      images/
        logo.png
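The mapping from a resource URL to a path in this tree can be illustrated with a short sketch. It is not the tool's actual implementation: the function name `url_to_path`, the `index.html` fallback for directory-like URLs, and the rule of dropping `.` and `..` segments are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def url_to_path(url: str, root: str = "pagesource_output") -> PurePosixPath:
    """Map a resource URL to a path under root (hypothetical helper)."""
    parts = urlsplit(url)  # the query string and fragment are split off here
    path = unquote(parts.path)
    # Drop empty, "." and ".." segments so the result cannot escape root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    # Assumption: an empty path or one ending in "/" is stored as index.html.
    if not segments or path.endswith("/"):
        segments.append("index.html")
    return PurePosixPath(root, parts.hostname or "unknown-host", *segments)
```

Under these assumptions, `https://example.com/assets/css/style.css?v=3` maps to `pagesource_output/example.com/assets/css/style.css`, and the bare site root maps to `pagesource_output/example.com/index.html`.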
If --include-external is used, external resources are saved in their own host directories:
pagesource_output/
  example.com/
    ...
  cdn.example.com/
    libs/
      library.js
  fonts.googleapis.com/
    css/
      font.css
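One way to decide which resources count as external is to compare hostnames, sketched below. This assumes "external" means any host other than the page's own, which matches the example above (cdn.example.com is stored separately from example.com) but may not be exactly how pagesource decides. The names `is_external` and `select_resources` are hypothetical.

```python
from urllib.parse import urlsplit


def is_external(resource_url: str, page_url: str) -> bool:
    """True when the resource host differs from the page host (assumed rule)."""
    return urlsplit(resource_url).hostname != urlsplit(page_url).hostname


def select_resources(urls: list[str], page_url: str,
                     include_external: bool = False) -> list[str]:
    """Keep same-host resources, plus external ones when requested."""
    return [u for u in urls
            if include_external or not is_external(u, page_url)]
```

With this rule, each kept resource lands under a top-level directory named after its own host.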
Features
- Captures all network resources loaded by the page (HTML, CSS, JS, images, fonts, etc.)
- Preserves original directory structure
- Strips query strings from filenames
- Infers file extensions from Content-Type when missing
- Handles duplicate filenames
- Sanitizes paths for filesystem safety
- Optional wait time for JavaScript-heavy pages
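Two of the features above, inferring a missing extension from Content-Type and handling duplicate filenames, can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the helper names, the small `EXTENSIONS` table, and the `_1`, `_2` suffix scheme for duplicates are all hypothetical.

```python
import mimetypes
from pathlib import PurePosixPath

# Assumed mapping for common web types; anything else falls back to mimetypes.
EXTENSIONS = {
    "text/html": ".html",
    "text/css": ".css",
    "text/javascript": ".js",
    "application/javascript": ".js",
    "application/json": ".json",
    "image/png": ".png",
}


def ensure_extension(filename: str, content_type: str) -> str:
    """Append an extension derived from Content-Type if the name has none."""
    if PurePosixPath(filename).suffix:
        return filename
    # Strip parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    ext = EXTENSIONS.get(mime) or mimetypes.guess_extension(mime) or ""
    return filename + ext


def unique_name(filename: str, taken: set[str]) -> str:
    """Return a name not yet in taken, adding _1, _2, ... before the suffix."""
    path = PurePosixPath(filename)
    candidate, n = filename, 1
    while candidate in taken:
        candidate = f"{path.stem}_{n}{path.suffix}"
        n += 1
    taken.add(candidate)
    return candidate
```

For example, a resource named `styles` served as `text/css; charset=utf-8` would be stored as `styles.css`, and a second file that resolves to an already used `app.js` would be stored as `app_1.js`.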
Requirements
- Python 3.10+
- Playwright (with Chromium browser)
License
MIT
File details
Details for the file pagesource-0.1.2.tar.gz.
File metadata
- Download URL: pagesource-0.1.2.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d355b323ec959b7bcea65bfc8cca4409442823a81c66c81f0a6d337bb3f9fcdb |
| MD5 | c7b0fbf93692e5226835cb8788de97c8 |
| BLAKE2b-256 | 36b7f433ab420b96a96f18013e6db7580716c69bb6ddd980284898a494f5d41d |
File details
Details for the file pagesource-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pagesource-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31cda7a4e5e645c2bdeb2599d6c9c4d898909a75b7d899544fb6991e41e616a5 |
| MD5 | 9a005ba9423cd217ca9e2cbdbe6483a7 |
| BLAKE2b-256 | dca0e6ee41cae56f6ebfee17e2a17022cf9505dedc0f1bc8d133e1d53b025124 |