Lightweight utilities for web scraping with requests and Selenium.

ScraperETC

ScraperETC is a lightweight Python package that streamlines browser automation and HTTP scraping. It wraps Selenium and requests with clean, Pythonic interfaces that remove the usual boilerplate, especially for waits, drivers, and headers. ScraperETC is designed with anti-bot detection in mind and uses smart defaults to reduce the chance of blocks or bans.

Why Use ScraperETC?

  • Selenium imports are long, clunky, and almost impossible to remember. This package wraps what you need so you don't have to memorize boilerplate.
  • HTTP requests are often blocked by anti-bot filters. ScraperETC provides default headers that reduce detection without extra effort.
  • Verifying file downloads shouldn't require writing custom content checks. This package includes built-in PDF validation tools to save you time.

ScraperETC was built to reduce the friction of browser automation and HTTP scraping, especially when using headless Chrome.
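To make the default-headers idea concrete, here is a minimal sketch of how browser-like defaults might be merged with caller-supplied headers. The names DEFAULT_HEADERS, build_headers, and build_request are hypothetical, and the header values are illustrative assumptions rather than the ones ScraperETC actually ships. The sketch uses only the standard library so it runs without extra installs.

```python
from __future__ import annotations

from urllib.request import Request

# Hypothetical browser-like defaults (assumed values, not ScraperETC's own).
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_headers(extra: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the defaults with caller-supplied headers layered on top."""
    headers = dict(DEFAULT_HEADERS)  # copy so the defaults are never mutated
    headers.update(extra or {})
    return headers


def build_request(url: str, extra: dict[str, str] | None = None) -> Request:
    """Prepare (but do not send) a GET request carrying the merged headers."""
    return Request(url, headers=build_headers(extra), method="GET")
```

With requests, the same dictionary would simply be passed along, for example requests.get(url, headers=build_headers(), timeout=10).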

Features

  • Minimal wrappers for selenium.webdriver.Chrome and undetected_chromedriver to get up and running fast
  • webdriver_wait() handles selector validation and WebDriverWait behind the scenes
  • http_GET() adds default headers that mimic a modern browser to help you evade bot detection
  • Built-in tools for validating PDF downloads and checking response status
  • Optional exception-raising on failure to let you choose between passive and strict workflows
  • Currently supports only the Chrome web browser, which must be installed and available on your system PATH
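For context on what "selector validation and WebDriverWait behind the scenes" could involve, the sketch below shows the kind of plain-Selenium boilerplate such a wrapper might hide. It is not ScraperETC's implementation: resolve_by and wait_for_element are hypothetical names, and the choice of presence_of_element_located as the wait condition is an assumption.

```python
from __future__ import annotations

# Locator strategy strings, matching the constants on Selenium's By class.
BY_STRATEGIES = {
    "ID": "id",
    "NAME": "name",
    "XPATH": "xpath",
    "CSS_SELECTOR": "css selector",
    "CLASS_NAME": "class name",
    "TAG_NAME": "tag name",
    "LINK_TEXT": "link text",
    "PARTIAL_LINK_TEXT": "partial link text",
}


def resolve_by(by: str) -> str:
    """Validate a strategy name such as "XPATH" and return Selenium's locator string."""
    key = by.strip().upper()
    if key not in BY_STRATEGIES:
        valid = ", ".join(sorted(BY_STRATEGIES))
        raise ValueError(f"unknown locator strategy {by!r}; expected one of: {valid}")
    return BY_STRATEGIES[key]


def wait_for_element(driver, by: str, selector: str, timeout: float = 10.0):
    """Block until the element is present in the DOM, then return it."""
    # Imported lazily so resolve_by() stays usable without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (resolve_by(by), selector)
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

Validating the strategy name before starting the wait means a typo fails immediately with a clear message instead of surfacing as a timeout.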

Installation

pip install scraper-etc

Requires Python 3.10 or later.

Example Usage

from scraper_etc import setup_chrome_driver, webdriver_wait, http_GET_valid_pdf

# start a headless Chrome driver (using undetected_chromedriver under the hood)
driver = setup_chrome_driver(headless=True)

# load a page before waiting on its elements (example.com is a placeholder URL)
driver.get("https://example.com")

# wait for a div with a specific ID to appear
elem = webdriver_wait(driver, by="XPATH", selector="//div[@id='main']")

# close the browser once you are done with it
driver.quit()

# validate a remote PDF and save it
res = http_GET_valid_pdf("https://example.com/sample.pdf")
if res:
    with open("sample.pdf", "wb") as f:
        f.write(res.content)
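The checks that http_GET_valid_pdf performs are not described here. As a rough illustration of what PDF validation can look like, the sketch below tests the response body for the "%PDF-" header and an "%%EOF" marker near the end of the file. The function name check_pdf and its raise_exc flag are hypothetical; the flag only mirrors the passive versus strict workflows mentioned in the feature list.

```python
from __future__ import annotations

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header
PDF_EOF = b"%%EOF"    # end-of-file marker, expected near the end of the body


def check_pdf(content: bytes, raise_exc: bool = False) -> bool:
    """Return True if the bytes look like a complete PDF.

    With raise_exc=True, a failed check raises ValueError instead of
    returning False (a hypothetical strict mode).
    """
    ok = content.startswith(PDF_MAGIC) and PDF_EOF in content[-1024:]
    if not ok and raise_exc:
        raise ValueError("response body does not look like a PDF")
    return ok
```

A check like this catches the common case where a server answers with an HTML error or login page instead of the requested file.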

Development

ScraperETC includes a modern CI/CD pipeline:

  • Ruff for linting and auto-formatting
  • mypy for static type checking
  • Bandit for security scanning
  • pytest with unit tests covering all core logic
  • Codecov integration for test coverage
  • GitHub Actions CI to run it all on push
  • Dependabot for automated dependency updates

CI workflows live in .github/workflows.

License

This project is released under CC0 (public domain). You are free to use, modify, and redistribute it without restriction.
