
A simple and light Playwright-based scraper

Project description

pw-simple-scraper

A lightweight, easy-to-use web scraper built with Python and Playwright

License: MIT

View in Korean


Overview

  • pw-simple-scraper extracts the elements you specify from a web page.
  • Give it a URL and a CSS selector, and it returns the text (or links) of every matching element as a list of strings.
  • The return value is a ScrapeResult object; the extracted values are available as .result (List[str]).


Installation

# 1. Install Playwright
pip install playwright

# 2-1. Install Chromium (macOS / Windows)
python -m playwright install chromium

# 2-2. Install Chromium (Linux)
python -m playwright install --with-deps chromium

# 3. Install pw-simple-scraper
pip install pw-simple-scraper
  • Since this scraper is built on top of Playwright, both the Playwright library and the Chromium browser are required.
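To confirm that the Python packages from the steps above are importable before running a scrape, the standard library's importlib.util.find_spec is enough. This is a minimal sketch: the helper name check_modules is hypothetical, it assumes the import name is pw_simple_scraper (as the distribution file names suggest), and it only checks that the packages are installed, not that the Chromium browser binary was downloaded.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(
    names: Iterable[str] = ("playwright", "pw_simple_scraper"),
) -> Dict[str, bool]:
    """Hypothetical helper: map each top-level module name to whether it is installed.

    find_spec() locates a module without importing it and returns None
    when the module cannot be found.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

A missing browser only shows up when Playwright tries to launch it; see the FAQ entry on browser launch failures.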


Usage

from pw_simple_scraper import scrape_context, scrape_href

# Extract text
res = scrape_context("https://example.com", "h3")
print(res.result)   # ['h3-type-content1', 'h3-type-content2', ...]
print(res.count)    # n (number of scraped elements)

# Extract links
links = scrape_href("https://example.com", "a")
print(links.result) # ['https://www.iana.org/domains/example', ...]

# Apply timeout option (default: 30 seconds)
scrape_context("https://example.com", "something", timeout=10) # 10 seconds
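Because .result is a plain List[str], ordinary Python handles any post-processing. The sketch below uses a hypothetical helper, unique_links_on_host, to keep only links on one host and drop duplicates while preserving order. It assumes scrape_href returns absolute URLs, as the example output above suggests; relative hrefs would be filtered out.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def unique_links_on_host(links: Iterable[str], host: str) -> List[str]:
    """Hypothetical helper: keep links whose hostname equals `host`
    (compare in lowercase), dropping duplicates in first-seen order."""
    seen = set()
    kept: List[str] = []
    for link in links:
        # urlsplit().hostname is lowercased, and None for relative or mailto: links
        if urlsplit(link).hostname != host.lower():
            continue
        if link not in seen:
            seen.add(link)
            kept.append(link)
    return kept
```

For example, unique_links_on_host(links.result, "www.iana.org") narrows the scrape_href output above to IANA links.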

Each call returns a ScrapeResult object:

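from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class ScrapeResult:
    url: str                # URL that was scraped
    selector: str           # CSS selector that was used
    result: List[str]       # Extracted values
    count: int              # Number of values
    fetched_at: datetime    # Execution timestamp (UTC)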
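If you want to save results, note that fetched_at is a datetime, which json cannot encode directly. A minimal sketch: it re-declares a local mirror of the fields listed above so it runs on its own (in real code you would pass the object the library returns), and the helper name to_json is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ScrapeResult:
    # Local mirror of the documented fields, so this sketch is self-contained.
    url: str
    selector: str
    result: List[str]
    count: int
    fetched_at: datetime


def to_json(res: ScrapeResult) -> str:
    """Hypothetical helper: serialize a result, writing fetched_at as ISO 8601 text."""
    data = asdict(res)
    data["fetched_at"] = res.fetched_at.isoformat()
    # ensure_ascii=False keeps non-ASCII page text readable in the output
    return json.dumps(data, ensure_ascii=False)


if __name__ == "__main__":
    demo = ScrapeResult(
        url="https://example.com",
        selector="h3",
        result=["h3-type-content1", "h3-type-content2"],
        count=2,
        fetched_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(to_json(demo))
```

Going back the other way, datetime.fromisoformat() parses the stored timestamp.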


FAQ

  • Installed but browser fails to launch

    • Install the browser with python -m playwright install chromium. On Linux, use python -m playwright install --with-deps chromium so the system libraries Chromium needs are installed as well.
  • RuntimeError: All strategies failed

    • This can happen when the selector matches nothing or the page loads slowly. Double-check your selector and try a longer timeout.
  • Scraping inside iframes

    • Planned for future support.
  • XPath support

    • Planned for future support.
  • robots.txt support

    • Will be added as a configurable option in the future.
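The advice to retry with a longer timeout can be automated. A minimal sketch, relying only on what the Usage section states (a timeout keyword argument in seconds) and on failures surfacing as RuntimeError as in the FAQ; with_growing_timeout and the default schedule are hypothetical. Retrying will not help when the selector itself is wrong, so check that first.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def with_growing_timeout(
    fetch: Callable[..., T],
    timeouts: Sequence[float] = (10, 30, 60),
) -> T:
    """Hypothetical helper: call fetch(timeout=t) for each t in order and
    return the first successful result.

    Re-raises the last RuntimeError if every attempt fails.
    """
    last_error: Optional[RuntimeError] = None
    for t in timeouts:
        try:
            return fetch(timeout=t)
        except RuntimeError as err:  # e.g. "All strategies failed"
            last_error = err
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error
```

With the library, something like with_growing_timeout(functools.partial(scrape_context, "https://example.com", "h3")) would try 10, 30, then 60 seconds.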




Download files


Source Distribution

pw_simple_scraper-0.1.1.tar.gz (16.3 kB)

Built Distribution


pw_simple_scraper-0.1.1-py3-none-any.whl (10.9 kB)

File details

Details for the file pw_simple_scraper-0.1.1.tar.gz.

File metadata

  • Download URL: pw_simple_scraper-0.1.1.tar.gz
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pw_simple_scraper-0.1.1.tar.gz
Algorithm Hash digest
SHA256 037620936dd9576eedb72125db6ca5d4a8be6f8232771f09f7e65fcff17a0c51
MD5 b87009ab9d08e2f8ec1f4d5e4c3b9bfb
BLAKE2b-256 c93298d904ef6646a8dfe864906c0b83ad3f951632f58b24d0a50300d60044f9


Provenance

The following attestation bundles were made for pw_simple_scraper-0.1.1.tar.gz:

Publisher: release.yml on elecbrandy/pw-simple-scraper


File details

Details for the file pw_simple_scraper-0.1.1-py3-none-any.whl.


File hashes

Hashes for pw_simple_scraper-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5cd296c8c89603e96188a8efe0336b283de98442f69387811915bf8dfe4030ce
MD5 de17ba0eaa8bf0a1dc39d95fb71b4433
BLAKE2b-256 820b3edbec254a57cd87d77ede3253d3129cbb21e9c40d1d3699f3729965406e


Provenance

The following attestation bundles were made for pw_simple_scraper-0.1.1-py3-none-any.whl:

Publisher: release.yml on elecbrandy/pw-simple-scraper

