A web scraper with dynamic rendering support.
🕷️ ScrapeSome
ScrapeSome is a lightweight, flexible web scraping library with both synchronous and asynchronous support. It includes intelligent fallbacks, JavaScript page rendering, response formatting (HTML → Text/JSON/Markdown), and retry mechanisms. Ideal for developers who need robust scraping utilities with minimal setup.
🚀 Features
- 🔁 Sync + Async scraping support
- 🔄 Automatic retries and intelligent fallbacks
- 🧪 Playwright rendering fallback for JS-heavy pages
- 📝 Format responses as raw HTML, plain text, Markdown, or structured JSON
- ⚙️ Configurable: timeouts, redirects, user agents, and logging
- 🧪 Test coverage with pytest and pytest-asyncio
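The retry-and-fallback feature listed above can be pictured with a small sketch. This is not ScrapeSome's internal code: `fetch_with_fallback`, `primary`, and `fallback` are hypothetical names, and the linear backoff is an assumption made for illustration. The idea is to try a cheap fetcher a few times and only then hand the URL to a heavier one, such as a browser-based renderer.

```python
import time
from typing import Callable


def fetch_with_fallback(
    url: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int = 3,
    delay: float = 0.0,
) -> str:
    """Try `primary` up to `retries` times, then hand the URL to `fallback`."""
    for attempt in range(retries):
        try:
            return primary(url)
        except Exception:
            # A real scraper would catch narrower, network-specific errors.
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    try:
        return fallback(url)
    except Exception as exc:
        raise RuntimeError(f"all attempts failed for {url}") from exc
```

Passing the fetchers in as plain callables keeps the sketch independent of any HTTP client or rendering engine.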
📦 Installation
pip install scrapesome
⚡ Quick Start
Synchronous Example
from scrapesome.scraper import scraper
html = scraper("https://example.com")
Asynchronous Example
import asyncio
from scrapesome.scraper import async_scraper
html = asyncio.run(async_scraper("https://example.com"))
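When several pages are needed, the asynchronous API pairs naturally with `asyncio.gather`. The sketch below is an illustration, not part of ScrapeSome: `scrape_many` and `fake_fetch` are hypothetical helpers, and the concurrency limit of 5 is an arbitrary assumption. It accepts any coroutine function that maps a URL to a string, so `async_scraper` could presumably be passed in place of the stand-in.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def scrape_many(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as its inputs.
    pages = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(zip(urls, pages))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real async fetcher, so the sketch runs offline."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"
```

A call such as `asyncio.run(scrape_many(urls, fake_fetch))` returns a dictionary keyed by URL.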
🧰 Advanced Usage
Force Rendering (Playwright)
scraper("https://example.com", force_playwright=True)
Custom User Agents
scraper("https://example.com", user_agents=["MyCustomAgent/1.0"])
Control Redirects
scraper("https://example.com", allow_redirects=False)
🧪 Testing
Run tests with:
pytest --cov=scrapesome tests/
Target coverage: 75–100%
⚙️ Environment Configuration
ScrapeSome reads its settings from environment variables. If a .env file is present, the values defined there are used.
Example .env
LOG_LEVEL=INFO
EXPORT_FORMAT=text
FETCH_PLAYWRIGHT_TIMEOUT=10
FETCH_PAGE_TIMEOUT=10
| Variable | Description |
|---|---|
| FETCH_PLAYWRIGHT_TIMEOUT | Timeout for Playwright-rendered pages (in seconds) |
| FETCH_PAGE_TIMEOUT | Timeout for standard page fetch (in seconds) |
| LOG_LEVEL | Logging verbosity (DEBUG, INFO, WARNING, etc.) |
| EXPORT_FORMAT | Default export format (text, markdown, json, html) |
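To show how the variables in the table might be consumed, here is a minimal sketch using only the standard library. It is not ScrapeSome's `config.py`: the `Settings` class and `load_settings` function are hypothetical, and the defaults are simply copied from the example .env above as an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str
    export_format: str
    playwright_timeout: float  # seconds
    page_timeout: float        # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to assumed defaults."""
    return Settings(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        export_format=env.get("EXPORT_FORMAT", "text").lower(),
        playwright_timeout=float(env.get("FETCH_PLAYWRIGHT_TIMEOUT", "10")),
        page_timeout=float(env.get("FETCH_PAGE_TIMEOUT", "10")),
    )
```

Taking the environment as a parameter makes the function easy to exercise with a plain dictionary instead of the real process environment.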
📄 Output Formats
JSON Example
Get a JSON version of the page:
scraper("https://example.com", format_type="json")
Output
{
"title": "Example Domain",
"description": "This domain is for use in illustrative examples.",
"url": "https://example.com"
}
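The README does not say whether the JSON format comes back as a string or as an already-parsed dictionary, so downstream code may want to accept both. The helper below is a sketch under that assumption; `as_dict` is a hypothetical name and the sample string mirrors the output shown above.

```python
import json
from typing import Any, Dict, Union


def as_dict(result: Union[str, bytes, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the result as a dict, parsing it first if it is JSON text."""
    if isinstance(result, dict):
        return result
    data = json.loads(result)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


sample = (
    '{"title": "Example Domain", '
    '"description": "This domain is for use in illustrative examples.", '
    '"url": "https://example.com"}'
)
page = as_dict(sample)
print(page["title"])  # Example Domain
```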
Markdown
Convert HTML to Markdown with:
scraper("https://example.com", format_type="markdown")
Sample output (truncated, taken from a university landing page rather than example.com)
# Online Global Masters that boost your global career
**ADEN University** offers students access to professionals who operate in the world of business and administration, who share their knowledge and acumen collaboratively with their students in all **academic programs** offered at ADEN.
[About Us](about-aden-university)
Watch testimonial video
##### Watch testimonial video
×
[
](https://res.cloudinary.com/cruminott/video/upload/vc_auto,w_auto,q_auto,f_auto/adenu/aden-university-3.mp4)
## ADEN University offers the following academic programs
[](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")
##### [EXECUTIVE MBA. Master of Business Administration](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")
The ADEN University Executive MBA is designed to strengthen business leaders to manage...
* **37** credits
* **15** months
* **Spanish Only**
[Visit EMBA Course](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")
[](https://adenuniversity.us/academics/global-mba/ "GLOBAL MBA. Master of Business Administration")
##### [GLOBAL MBA. Master of Business Administration](https://adenuniversity.us/academics/global-mba/ "GLOBAL MBA. Master of Business Administration")
The Global MBA is designed to prepare business leaders to manage companies in an...
* **36** credits
* **14** months
* **Spanish and English**
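Markdown output like the sample above is easy to post-process. As one illustration, the sketch below pulls the inline links out of a Markdown string with a regular expression. `extract_links` is a hypothetical helper, not a ScrapeSome function, and the pattern only covers simple inline links of the form `[text](target "optional title")`.

```python
import re
from typing import List, Tuple

# [text](target) or [text](target "title"); targets may not contain spaces.
LINK_RE = re.compile(r'\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def extract_links(markdown: str) -> List[Tuple[str, str]]:
    """Return (text, target) pairs, skipping links whose text is empty."""
    return [
        (text, target)
        for text, target in LINK_RE.findall(markdown)
        if text.strip()
    ]
```

Links with empty text, such as the image wrappers in the sample, are skipped because they carry no label.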
📁 Project Structure
scrapesome/
├── config.py
├── exceptions.py
├── formatter/
│ └── output_formatter.py
├── logging.py
├── scraper/
│ ├── async_scraper.py
│ ├── sync_scraper.py
│ └── rendering.py
🔒 License
MIT License © 2025
🧑‍💻 Author
Crafted with care by Vishnu Vardhan Reddy
Contributions welcome! 🙌
File details
Details for the file scrapesome-0.0.1.tar.gz.
File metadata
- Download URL: scrapesome-0.0.1.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01fdbd5c1e3d8f18b5d71a09b47ff311eceb1538b212003c1dec6b34254889c3 |
| MD5 | 4463f1d4daf1cd039b51eadf70484367 |
| BLAKE2b-256 | 8b6f558dd9e2545fbd70f989f4c77c5a6c7c0ce681066d62c94abd14a14fa04d |
File details
Details for the file scrapesome-0.0.1-py3-none-any.whl.
File metadata
- Download URL: scrapesome-0.0.1-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c64bfc4fc19c07748709b20dceb65655a450d9346813a17c3276b8f56b5e8a8 |
| MD5 | f5d703864dae09282c6e93a11359cfd0 |
| BLAKE2b-256 | a9150e4b5ec3e236ebc705d3fbade641e6f69da3eab91a09ceb80bf8b7c50212 |