Lightweight utilities for web scraping with requests and Selenium.
ScraperETC
ScraperETC is a lightweight Python package that streamlines browser automation and HTTP scraping. It wraps Selenium and requests with clean, Pythonic interfaces that remove the usual boilerplate, especially for waits, drivers, and headers. ScraperETC is designed with anti-bot detection in mind and uses smart defaults to reduce the chance of blocks or bans.
Why Use ScraperETC?
- Selenium imports are long, clunky, and almost impossible to remember. This package wraps what you need so you don't have to memorize boilerplate.
- HTTP requests are often blocked by anti-bot filters. ScraperETC provides default headers that reduce detection without extra effort.
- Verifying file downloads shouldn't require writing custom content checks. This package includes built-in PDF validation tools to save you time.
ScraperETC was built to reduce the friction of browser automation and HTTP scraping, especially when using headless Chrome.
Features
- Minimal wrappers for selenium.webdriver.Chrome and undetected_chromedriver to get up and running fast
- webdriver_wait() handles selector validation and WebDriverWait behind the scenes
- http_GET() adds default headers that mimic a modern browser to help you evade bot detection
- Built-in tools for validating PDF downloads and checking response status
- Optional exception-raising on failure to let you choose between passive and strict workflows
- Currently supports only the Chrome web browser, which must be installed and available on your system PATH
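To make the "default headers" idea concrete, here is a minimal sketch of how browser-like defaults might be merged with caller overrides. The names DEFAULT_HEADERS and build_headers, and the specific header values, are assumptions made for illustration; they are not taken from ScraperETC's source, and http_GET() may work differently.

```python
from __future__ import annotations

# Hypothetical defaults for illustration only; not ScraperETC's actual header set.
DEFAULT_HEADERS: dict[str, str] = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_headers(overrides: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the defaults with any caller-supplied overrides applied."""
    headers = dict(DEFAULT_HEADERS)  # copy so the module-level defaults stay untouched
    if overrides:
        headers.update(overrides)
    return headers
```

A dictionary built this way could be passed to a plain requests call, for example requests.get(url, headers=build_headers(), timeout=10).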
Installation
pip install scraper-etc
Requires Python 3.10 or later.
Example Usage
from scraper_etc import setup_chrome_driver, webdriver_wait, http_GET_valid_pdf
# start a headless Chrome driver (using undetected_chromedriver under the hood)
driver = setup_chrome_driver(headless=True)
# wait for a div with a specific ID to appear
elem = webdriver_wait(driver, by="XPATH", selector="//div[@id='main']")
# validate a remote PDF and save it
res = http_GET_valid_pdf("https://example.com/sample.pdf")
if res:
    with open("sample.pdf", "wb") as f:
        f.write(res.content)
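For readers curious what a PDF check with a passive or strict mode could look like, the sketch below tests the response body for the PDF magic bytes and optionally raises on failure. It is a heuristic written for illustration: the names is_probable_pdf and check_pdf and the strict parameter are hypothetical, and http_GET_valid_pdf may validate downloads in a different way.

```python
PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this signature


def is_probable_pdf(content: bytes) -> bool:
    """Heuristic: report whether the body starts with the PDF signature."""
    return content.startswith(PDF_MAGIC)


def check_pdf(content: bytes, *, strict: bool = False) -> bool:
    """Passive mode returns False on failure; strict mode raises ValueError."""
    ok = is_probable_pdf(content)
    if strict and not ok:
        raise ValueError("response body does not look like a PDF")
    return ok
```

In passive mode the caller branches on the boolean, much like the `if res:` check above; in strict mode a bad download surfaces immediately as an exception.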
Development
ScraperETC includes a modern CI/CD pipeline:
- Ruff for linting and auto-formatting
- mypy for static type checking
- Bandit for security scanning
- pytest with unit tests covering all core logic
- Codecov integration for test coverage
- GitHub Actions CI to run it all on push
- Dependabot for automated dependency updates
CI workflows live in .github/workflows.
License
This project is released under CC0 (public domain). You are free to use, modify, and redistribute it without restriction.
File details
Details for the file scraper_etc-0.1.2.tar.gz.
File metadata
- Download URL: scraper_etc-0.1.2.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40f4747d756d139c6e6a11ae60d4cbb0ce9aaaaf2a1fbb8335a9ff5edb3e7db3 |
| MD5 | 88fe9b37bfd29f58fff50ffcd2f7ef0c |
| BLAKE2b-256 | d5f5bc743a64cda89ead4583faa407e4a36e119eb8265aa4b996596efc2b2177 |
File details
Details for the file scraper_etc-0.1.2-py3-none-any.whl.
File metadata
- Download URL: scraper_etc-0.1.2-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22b017b54369fb7f4e69acfcecd98963e82ae13b2e30bbd77dbc98c19315a9d1 |
| MD5 | a5ee337e4ed45bb7a5c232dd85670628 |
| BLAKE2b-256 | 53c8d571a765bf558539ce33071464932555b17bdcae4f418acdce8506b12254 |