An ultra-lightweight web screenshot tool written in Python

WebCap is an extremely lightweight headless browser tool. It doesn't require Selenium, Playwright, Puppeteer, or any other browser automation framework; all it needs is a working Chrome installation. It is used by BBOT.

Installation

pipx install webcap

Features

WebCap's most distinctive feature is its ability to capture not only the fully rendered DOM, but also every snippet of parsed JavaScript (whether inline or external) and the full content of every HTTP request and response (including JavaScript API calls). For convenience, it can output all of this directly as JSON.

Screenshots

Fully-rendered DOM

JavaScript Capture

Requests + Responses

OCR

Full feature list

  • Blazing fast screenshots
  • Fullscreen capture (entire scrollable page)
  • JSON output
  • Full DOM extraction
  • JavaScript extraction (inline + external)
  • JavaScript extraction (environment dump)
  • Full network logs (incl. request/response bodies)
  • Title
  • Status code
  • Fuzzy (perceptual) hashing
  • Technology detection
  • OCR text extraction
  • Web interface
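
The feature list mentions fuzzy hashing of screenshots, but it does not say which algorithm WebCap uses. As a rough illustration of the general idea only, the sketch below computes a simple "average hash" over a small grayscale pixel grid and compares two hashes by Hamming distance. The names `average_hash` and `hamming_distance` are hypothetical helpers written for this example, not part of WebCap.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Return an integer whose bits mark the pixels brighter than the mean.

    `pixels` is a small grayscale grid (for example, a screenshot that has
    already been downscaled), with values in 0..255.
    Illustrative only; this is not necessarily WebCap's algorithm.
    """
    flat: List[int] = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("pixel grid must not be empty")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Append one bit per pixel: 1 if brighter than average, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Two nearly identical "pages" and one with inverted brightness.
page_a = [[10, 200], [220, 30]]
page_b = [[12, 190], [215, 35]]
page_c = [[200, 10], [30, 220]]
```

With hashes like these, visually similar pages produce a small Hamming distance, so a threshold on that distance can group near-duplicate screenshots.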

Example Commands

Scanning

# Capture screenshots of all URLs in urls.txt
webcap scan urls.txt -o ./my_screenshots

# Output to JSON, and include the fully-rendered DOM
webcap scan urls.txt --json --dom | jq

# Capture requests and responses
webcap scan urls.txt --json --requests --responses | jq

# Capture JavaScript
webcap scan urls.txt --json --javascript | jq

# Extract text from screenshots
webcap scan urls.txt --json --ocr | jq
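
The scan commands above can also be driven from a script. The following is a minimal sketch, assuming `webcap` is on the PATH and that its `--json` output on stdout is a sequence of JSON values (which is consistent with piping it to `jq`, but is not confirmed by the source). The helpers `build_scan_command`, `iter_json_values`, and `scan` are hypothetical names for this example; only the flags shown in the commands above are used.

```python
import json
import subprocess
from typing import Any, Iterator, List


def build_scan_command(urls: str, *, dom: bool = False, requests: bool = False,
                       responses: bool = False, javascript: bool = False,
                       ocr: bool = False) -> List[str]:
    """Build a `webcap scan ... --json` argument list from keyword flags."""
    cmd = ["webcap", "scan", urls, "--json"]
    if dom:
        cmd.append("--dom")
    if requests:
        cmd.append("--requests")
    if responses:
        cmd.append("--responses")
    if javascript:
        cmd.append("--javascript")
    if ocr:
        cmd.append("--ocr")
    return cmd


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each JSON value in `text`, whether compact or pretty-printed."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def scan(urls: str, **flags: bool) -> List[Any]:
    """Run the scan and return the parsed JSON values (assumed output format)."""
    result = subprocess.run(build_scan_command(urls, **flags),
                            capture_output=True, text=True, check=True)
    return list(iter_json_values(result.stdout))
```

For instance, `scan("urls.txt", dom=True)` would run `webcap scan urls.txt --json --dom` and return one parsed value per result. The sketch deliberately makes no assumptions about the field names inside each result.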

Server

# Start the server
webcap server

# Browse to http://localhost:8000

WebCap as a Python library

import asyncio

from webcap import Browser


async def main():
    # create a browser instance
    browser = Browser()
    # start the browser
    await browser.start()
    try:
        # take a screenshot
        webscreenshot = await browser.screenshot("http://example.com")
        # save the screenshot to a file
        with open("screenshot.png", "wb") as f:
            f.write(webscreenshot.blob)
    finally:
        # stop the browser, even if the capture fails
        await browser.stop()


if __name__ == "__main__":
    asyncio.run(main())
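
Building on the library example above, the sketch below captures several URLs with a single browser instance. It uses only the calls shown in that example (`Browser()`, `start()`, `screenshot()`, `.blob`, `stop()`). The helper names `screenshot_path` and `capture_all`, and the hash-based file naming, are assumptions made for illustration. URLs are visited one at a time because the source does not say whether `screenshot()` may be called concurrently.

```python
import asyncio
import hashlib
from pathlib import Path
from typing import Iterable, List


def screenshot_path(out_dir: Path, url: str) -> Path:
    """Derive a stable, filesystem-safe file name from a URL (hypothetical scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return out_dir / f"{digest}.png"


async def capture_all(browser, urls: Iterable[str], out_dir: Path) -> List[Path]:
    """Screenshot each URL in turn using an already-started browser."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for url in urls:
        webscreenshot = await browser.screenshot(url)
        path = screenshot_path(out_dir, url)
        path.write_bytes(webscreenshot.blob)
        saved.append(path)
    return saved


async def main() -> None:
    # Imported here so the helpers above can be reused without WebCap installed.
    from webcap import Browser

    browser = Browser()
    await browser.start()
    try:
        await capture_all(browser, ["http://example.com"], Path("screenshots"))
    finally:
        # Always shut the browser down, even if a capture fails.
        await browser.stop()
```

Run it with `asyncio.run(main())`. Because `capture_all` accepts any object with an awaitable `screenshot()` method, it can be exercised with a stub browser in tests.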

CLI Usage (--help)

 Usage: webcap scan [OPTIONS] URLS                                                                                                                                                            
                                                                                                                                                                                              
 Screenshot URLs                                                                                                                                                                              
                                                                                                                                                                                              
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    urls      TEXT  URL(s) to capture, or file(s) containing URLs [default: None] [required]                                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --json    -j                  Output JSON                                                                                                                                                  │
│ --chrome  -c      TEXT        Path to Chrome executable [default: None]                                                                                                                    │
│ --output  -o      OUTPUT_DIR  Output directory [default: /home/bls/Downloads/code/webcap/screenshots]                                                                                      │
│ --help                        Show this message and exit.                                                                                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Screenshots ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resolution      -r      RESOLUTION  Resolution to capture [default: 1440x900]                                                                                                            │
│ --full-page       -f                  Capture the full page (larger resolution images)                                                                                                     │
│ --no-screenshots                      Only visit the sites; don't capture screenshots (useful with -j/--json)                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Performance ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --threads  -t      INTEGER  Number of threads to use [default: 15]                                                                                                                         │
│ --delay            SECONDS  Delay before capturing (default: 3.0 seconds) [default: 3.0]                                                                                                   │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ HTTP ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --user-agent  -U      TEXT  User agent to use [default: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36]                   │
│ --headers     -H      TEXT  Additional headers to send in format: 'Header-Name: Header-Value' (multiple supported)                                                                         │
│ --proxy       -p      TEXT  HTTP proxy to use [default: None]                                                                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ JSON (Only apply when -j/--json is used) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --base64        -b                     Output each screenshot as base64                                                                                                                    │
│ --dom           -d                     Capture the fully-rendered DOM                                                                                                                      │
│ --responses     -rs                    Capture the full body of each HTTP response (including API calls etc.)                                                                              │
│ --requests      -rq                    Capture the full body of each HTTP request (including API calls etc.)                                                                               │
│ --javascript    -J                     Capture every snippet of Javascript (inline + external)                                                                                             │
│ --ignore-types                   TEXT  Capture the full body of each HTTP response (including API calls etc.) [default: Image, Media, Font, Stylesheet]                                    │
│ --ocr                --no-ocr          Extract text from screenshots [default: no-ocr]                                                                                                     │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
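
The help text above documents two value formats: a resolution such as `1440x900` and headers written as `'Header-Name: Header-Value'`. When generating these arguments from a script, it can help to validate them first. The sketch below shows one way to do that; `parse_resolution` and `parse_headers` are hypothetical helpers for this example, and WebCap's own parsing may be stricter or more lenient.

```python
from typing import Dict, Iterable, Tuple


def parse_resolution(value: str) -> Tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string such as '1440x900' into two integers."""
    width, sep, height = value.lower().partition("x")
    width, height = width.strip(), height.strip()
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    return int(width), int(height)


def parse_headers(values: Iterable[str]) -> Dict[str, str]:
    """Turn 'Header-Name: Header-Value' strings into a dictionary."""
    headers: Dict[str, str] = {}
    for raw in values:
        # Split on the first colon only, so values may contain colons.
        name, sep, header_value = raw.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Header-Name: Header-Value', got {raw!r}")
        headers[name.strip()] = header_value.strip()
    return headers
```

Validated values can then be passed back to the CLI unchanged, for example as `-r 1440x900` or as repeated `-H` options.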
