Playwright helper to capture manual browsing sessions and save page HTML snapshots.

Webscrapehelper

A lightweight Playwright helper that opens a headed browser and records the HTML source (plus metadata) for each page you visit until the browser window is closed.

Features

  • Launches Chromium (or another Playwright browser) in headed mode so you can browse manually.
  • Collects page HTML, title, URL, HTTP status, and file size whenever the main frame navigates.
  • Saves snapshots to disk (session_log.jsonl + individual .html files) and keeps an in-memory list for immediate post-run use.
  • Returns a SessionResult with helpers such as .html_list and .html_snapshots.
  • Optional callback hook so your code can react to each captured event in real time.
  • Lets you write HTML files to a custom directory via html_output_dir.
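The on-disk log listed above can be read back with the standard library alone. The following is a minimal sketch, assuming session_log.jsonl holds one JSON object per line (the usual JSON Lines convention). The field names url and status are guesses based on the metadata listed in the features and are not confirmed by this README; load_session_log and failed_pages are hypothetical helpers, not part of webscrapehelper.

```python
import json
from pathlib import Path


def load_session_log(log_path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file."""
    records = []
    with Path(log_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def failed_pages(records):
    """Keep records whose assumed "status" field is an HTTP error (>= 400)."""
    return [
        record
        for record in records
        if isinstance(record.get("status"), int) and record["status"] >= 400
    ]
```

Calling failed_pages(load_session_log("session_data/session_log.jsonl")) after a run would then list the pages that returned an error status, provided the log really uses those field names.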

Requirements

  • Python 3.9+
  • Playwright: install it with pip install playwright, then run playwright install once to fetch the browser binaries.

Local installation

pip install -e .
playwright install

Quick start

import asyncio
from webscrapehelper import SessionRecorder

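async def main():
    # Open a headed browser; snapshots are written to session_data/.
    recorder = SessionRecorder(output_dir="session_data", headless=False)
    # run() returns once the browser window has been closed.
    result = await recorder.run()
    print(f"Captured {len(result.html_list)} snapshots")
    if result.html_list:
        first_html = result.html_list[0]
        print(first_html[:200])  # preview the first 200 characters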

if __name__ == "__main__":
    asyncio.run(main())

Browse as usual; the script exits after you close the Playwright browser. Captured snapshots live in session_data/, and the returned SessionResult keeps the HTML in memory. To persist the HTML elsewhere, pass html_output_dir="C:/some/folder" when creating the recorder.
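Because the in-memory snapshots are plain HTML strings (the quick start slices result.html_list[0] directly), they can be post-processed without touching the disk. The sketch below pulls the title out of each snapshot using only the standard library; TitleParser and extract_titles are illustrative names, not part of webscrapehelper.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> (e.g. inside SVG)

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_titles(html_list):
    """Return one stripped title per HTML string ("" when none is found)."""
    titles = []
    for html in html_list:
        parser = TitleParser()
        parser.feed(html)
        parser.close()  # flush any buffered text
        titles.append(parser.title.strip())
    return titles
```

After a run, extract_titles(result.html_list) gives a quick overview of which pages were captured.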

See examples/record_session.py for a slightly richer example that logs each event.
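That example file is not reproduced here. As a rough illustration of what a per-event logger might look like, the sketch below assumes each captured event arrives as a dict with url, title, and status keys. Both that shape and the way a callback is registered with SessionRecorder are assumptions, so check the package source for the actual hook signature; format_event and log_event are hypothetical names.

```python
def format_event(event):
    """Build a one-line summary from an assumed event dict."""
    status = event.get("status", "?")
    title = event.get("title") or "(untitled)"
    url = event.get("url", "(unknown url)")
    return f"[{status}] {title} <{url}>"


def log_event(event):
    """Hypothetical callback: print a summary of each captured page."""
    print(format_event(event))
```

Keeping the formatting separate from the printing makes the summary easy to reuse, for example when writing to a log file instead of the console.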

Download files

Source Distribution

webscrapehelper-0.1.0.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

webscrapehelper-0.1.0-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file webscrapehelper-0.1.0.tar.gz.

File metadata

  • Download URL: webscrapehelper-0.1.0.tar.gz
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for webscrapehelper-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       befe89e9c237106191c682104b5b69b73d65856ef595a0974a6b7168a1d8efec
MD5          3fa60112795c15c2f903dc6738f29d01
BLAKE2b-256  bed84f0fff585cef0a554840d68db3752caa28816d43057718a41277d2c4ee80

File details

Details for the file webscrapehelper-0.1.0-py3-none-any.whl.

File hashes

Hashes for webscrapehelper-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       8db49bb8b1a5b722cda464c71737870ced9cfa026e03639cb87ee89440bbb59b
MD5          c3fa8df7bb29729a3c66685421325c37
BLAKE2b-256  8652e20a69dcce38590b35064eca75ca0671f7ca33da97d0678afffd18b38a95
