An ultra-lightweight web screenshot tool written in Python
WebCap is an extremely lightweight headless browser tool. It doesn't require Selenium, Playwright, Puppeteer, or any other browser automation framework; all it needs is a working Chrome installation. Used by BBOT.
Installation
pipx install webcap
Features
WebCap's most distinctive feature is its ability to capture not only the fully rendered DOM, but also every snippet of parsed JavaScript (whether inline or external) and the full content of every HTTP request and response (including JavaScript API calls). For convenience, it outputs directly to JSON:
Screenshots
Fully-rendered DOM
Javascript Capture
Requests + Responses
OCR
Full feature list
- Blazing fast screenshots
- Fullscreen capture (entire scrollable page)
- JSON output
- Full DOM extraction
- JavaScript extraction (inline + external)
- JavaScript extraction (environment dump)
- Full network logs (incl. request/response bodies)
- Title
- Status code
- Fuzzy (perceptual) hashing
- Technology detection
- OCR text extraction
- Web interface
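To illustrate what fuzzy (perceptual) hashing means in general, here is a minimal sketch of an average hash computed over a tiny grayscale grid. This is only an illustration of the concept and is not necessarily the algorithm WebCap uses; the function names `average_hash` and `hamming_distance` and the sample pixel grids are made up for this example.

```python
def average_hash(pixels):
    """Return a perceptual hash (as an int) for a 2D grid of grayscale values.

    Each bit is 1 when the corresponding pixel is brighter than the mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Illustrative 4x4 "screenshots" (made-up data, not WebCap output).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# Same picture, slightly brighter everywhere.
noisy = [[value + 5 for value in row] for row in original]
# Same picture with light and dark areas swapped.
inverted = [[255 - value for value in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(noisy)))     # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # 16
```

A small Hamming distance suggests two screenshots look alike even if their bytes differ, which is what makes this kind of hash useful for grouping visually similar pages.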
Example Commands
# Capture screenshots of all URLs in urls.txt
webcap -u urls.txt -o ./my_screenshots
# Output to JSON, and include the fully-rendered DOM
webcap -u urls.txt --json --dom | jq
# Capture requests and responses
webcap -u urls.txt --json --requests --responses | jq
# Capture javascript
webcap -u urls.txt --json --javascript | jq
# Extract text from screenshots
webcap -u urls.txt --json --ocr | jq
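If you would rather consume the JSON from Python than pipe it to jq, something like the following sketch may help. It assumes that stdout contains only JSON documents (one or more, back to back) when `--json` is passed; the exact layout and field names are not documented here, so the parser avoids relying on either. The helper names `iter_json_documents` and `run_webcap` are hypothetical, and the sample string is made-up data rather than real WebCap output.

```python
import json
import subprocess


def iter_json_documents(text):
    """Yield each JSON document found in text, whether the documents are
    newline-delimited or pretty-printed back to back."""
    decoder = json.JSONDecoder()
    position = 0
    length = len(text)
    while position < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while position < length and text[position].isspace():
            position += 1
        if position >= length:
            break
        document, position = decoder.raw_decode(text, position)
        yield document


def run_webcap(args):
    """Run the webcap CLI with the given arguments and parse its stdout.

    Assumes webcap is on PATH and prints nothing but JSON to stdout.
    """
    completed = subprocess.run(
        ["webcap", *args], capture_output=True, text=True, check=True
    )
    return list(iter_json_documents(completed.stdout))


# Demonstrate the parser on made-up sample data (not real WebCap output).
sample = '{"url": "http://example.com"}\n{"url": "http://example.org"}\n'
print([doc["url"] for doc in iter_json_documents(sample)])
```

With WebCap installed, `run_webcap(["-u", "urls.txt", "--json", "--dom"])` would then return the parsed results as a list, provided the assumption about stdout holds.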
Use as a Python library
import asyncio

from webcap import Browser


async def main():
    # create a browser instance
    browser = Browser()
    # start the browser
    await browser.start()
    # take a screenshot
    webscreenshot = await browser.screenshot("http://example.com")
    # save the screenshot to a file
    with open("screenshot.png", "wb") as f:
        f.write(webscreenshot.blob)
    # stop the browser
    await browser.stop()


if __name__ == "__main__":
    asyncio.run(main())
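The same calls can be extended to several URLs. The sketch below reuses one browser for all captures and stops it in a `finally` block so Chrome is shut down even if a capture fails. It relies only on the `Browser`, `start`, `screenshot`, `blob`, and `stop` names shown above; `filename_for` and `capture_all` are hypothetical helpers introduced for illustration.

```python
import asyncio
import re


def filename_for(url):
    """Turn a URL into a filesystem-safe PNG filename (hypothetical helper)."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    return f"{stem}.png"


async def capture_all(urls):
    # Imported here so filename_for stays usable without webcap installed.
    from webcap import Browser

    browser = Browser()
    await browser.start()
    try:
        for url in urls:
            webscreenshot = await browser.screenshot(url)
            with open(filename_for(url), "wb") as f:
                f.write(webscreenshot.blob)
    finally:
        # Stop the browser even if one of the captures raises.
        await browser.stop()


# To run (requires webcap and Chrome):
# asyncio.run(capture_all(["http://example.com", "http://example.org"]))
print(filename_for("http://example.com"))
```

Capturing sequentially keeps the example simple; whether concurrent `screenshot` calls on one `Browser` are supported is not stated here, so the sketch does not attempt it.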
CLI Usage (--help)
usage: webcap [-h] [-u URLS [URLS ...]] [-o OUTPUT] [-j] [-r RESOLUTION] [-f] [--no-screenshots] [-t THREADS] [--delay DELAY] [-U USER_AGENT] [-H HEADERS [HEADERS ...]] [-p PROXY] [-b]
              [-d] [-rs] [-rq] [-J] [--ignore-types IGNORE_TYPES [IGNORE_TYPES ...]] [--ocr] [-s] [--debug] [--no-color] [-c CHROME]

options:
  -h, --help            show this help message and exit
  -u URLS [URLS ...], --urls URLS [URLS ...]
                        URL(s) to capture, or file(s) containing URLs
  -o OUTPUT, --output OUTPUT
                        Output directory
  -j, --json            Output JSON

Screenshots:
  -r RESOLUTION, --resolution RESOLUTION
                        Resolution to capture
  -f, --full-page       Capture the full page (larger resolution images)
  --no-screenshots      Don't take screenshots

Performance:
  -t THREADS, --threads THREADS
                        Number of threads to use
  --delay DELAY         Delay before capturing (default: 3.0 seconds)

HTTP:
  -U USER_AGENT, --user-agent USER_AGENT
                        User agent to use
  -H HEADERS [HEADERS ...], --headers HEADERS [HEADERS ...]
                        Additional headers to send in format: 'Header-Name: Header-Value' (multiple supported)
  -p PROXY, --proxy PROXY
                        HTTP proxy to use

JSON Output:
  -b, --base64          Output each screenshot as base64
  -d, --dom             Capture the fully-rendered DOM
  -rs, --responses      Capture the full body of each HTTP response (including API calls etc.)
  -rq, --requests       Capture the full body of each HTTP request (including API calls etc.)
  -J, --javascript      Capture every snippet of Javascript (inline + external)
  --ignore-types IGNORE_TYPES [IGNORE_TYPES ...]
                        Ignore certain types of network requests (default: Image, Media, Font, Stylesheet)
  --ocr                 Extract text from screenshots

Misc:
  -s, --silent          Silent mode
  --debug               Enable debugging
  --no-color            Disable color output
  -c CHROME, --chrome CHROME
                        Path to Chrome executable
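When driving the CLI from a script, it can be convenient to assemble the argument list programmatically. The following is a rough sketch that uses only flags listed in the help text above and formats headers as 'Header-Name: Header-Value'. The function name `build_webcap_command`, the header `X-Example`, and the proxy address are placeholders chosen for this example.

```python
import shlex


def build_webcap_command(urls, headers=None, proxy=None, full_page=False, json_output=False):
    """Assemble a webcap argv list from flags shown in the help text."""
    command = ["webcap", "-u", *urls]
    if json_output:
        command.append("--json")
    if full_page:
        command.append("--full-page")
    if proxy:
        command += ["--proxy", proxy]
    if headers:
        # Each header becomes one 'Header-Name: Header-Value' argument.
        command += ["--headers", *(f"{name}: {value}" for name, value in headers.items())]
    return command


command = build_webcap_command(
    ["urls.txt"],
    headers={"X-Example": "demo"},          # placeholder header
    proxy="http://127.0.0.1:8080",          # placeholder proxy address
    json_output=True,
)
# shlex.join quotes arguments containing spaces, so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list straight to `subprocess.run` avoids shell quoting issues with header values that contain spaces or colons.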
File details
Details for the file webcap-0.1.22.tar.gz.
File metadata
- Download URL: webcap-0.1.22.tar.gz
- Upload date:
- Size: 35.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0406901baea01bad1d018d50b5e1c91953b2fd3f287d2265a319b8b880d9b037 |
| MD5 | 9c9e167217c31c675c4f10c49ae1aa0b |
| BLAKE2b-256 | d6744aa40e980bfb4b13731a8effe66a10aa089b8fba9c52eea134a6f7acaa37 |
File details
Details for the file webcap-0.1.22-py3-none-any.whl.
File metadata
- Download URL: webcap-0.1.22-py3-none-any.whl
- Upload date:
- Size: 38.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4465f1a83204c3540a60c9622935b51c15ea1fd4baba33f9af4a9a399eb686cf |
| MD5 | e5959aa1b12c50409cb0ba41943a0e3a |
| BLAKE2b-256 | 8f5658e33f09d087dc84dd5e979fe5ce3910dc5fa748b9907169e4d9899b8e0e |