A Powerful Web Scraper with dynamic rendering support.

🕷️ ScrapeSome

ScrapeSome is a lightweight, flexible web scraping library with both synchronous and asynchronous support. It includes intelligent fallbacks, JavaScript page rendering, response formatting (HTML → Text/JSON/Markdown), and retry mechanisms. Ideal for developers who need robust scraping utilities with minimal setup.


🚀 Features

  • 🔁 Sync + Async scraping support
  • 🔄 Automatic retries and intelligent fallbacks
  • 🧪 Playwright rendering fallback for JS-heavy pages
  • 📝 Format responses as raw HTML, plain text, Markdown, or structured JSON
  • ⚙️ Configurable: timeouts, redirects, user agents, and logging
  • 🧪 Test coverage with pytest and pytest-asyncio
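
The retry-and-fallback behaviour listed above can be pictured with a small, library-independent sketch. This is not ScrapeSome's internal code: `fetch_with_fallback`, `primary`, and `fallback` are hypothetical names chosen for illustration, and the linear backoff is an assumption rather than documented behaviour.

```python
import time
from typing import Callable


def fetch_with_fallback(
    url: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int = 3,
    delay: float = 0.0,
) -> str:
    """Try `primary` up to `retries` times, then hand the URL to `fallback`."""
    for attempt in range(1, retries + 1):
        try:
            return primary(url)
        except Exception:
            # Wait a little longer after each failed attempt (linear backoff),
            # but do not sleep after the final one.
            if attempt < retries:
                time.sleep(delay * attempt)
    # Every primary attempt failed: use the slower path,
    # for example a browser-rendered fetch.
    return fallback(url)
```

In this picture, `primary` would be a plain HTTP fetch and `fallback` a Playwright-rendered one; both are passed in as callables so the pattern can be tried without any network access.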

📦 Installation

pip install scrapesome

⚡ Quick Start

Synchronous Example

from scrapesome.scraper.sync_scraper import scraper
html = scraper("https://example.com")
html

Asynchronous Example

import asyncio
from scrapesome.scraper.async_scraper import scraper
html = asyncio.run(scraper("https://example.com"))
html

🧰 Advanced Usage

Force Rendering (Playwright)

from scrapesome.scraper.sync_scraper import scraper
content = scraper("https://example.com", force_playwright=True)
content

Custom User Agents

from scrapesome.scraper.sync_scraper import scraper
content = scraper("https://example.com", user_agents=["MyCustomAgent/1.0"])
content

Control Redirects

from scrapesome.scraper.sync_scraper import scraper
content = scraper("https://example.com", allow_redirects=False)
content

The asynchronous scraper can be used in the same way.
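
For several pages at once, an async scraper is usually combined with `asyncio.gather`. The helper below is a sketch only: `scrape_many` is a hypothetical name, and it takes the fetch coroutine as a parameter so that it does not assume anything about ScrapeSome beyond a `scraper(url, **options)` call shape.

```python
import asyncio
from typing import Awaitable, Callable


async def scrape_many(
    urls: list[str],
    fetch: Callable[..., Awaitable[str]],
    **options,
) -> dict[str, str]:
    """Fetch all URLs concurrently and map each URL to its content."""
    # gather() runs the coroutines concurrently and keeps the input order.
    results = await asyncio.gather(*(fetch(url, **options) for url in urls))
    return dict(zip(urls, results))
```

With ScrapeSome, `fetch` would be the `scraper` imported from `scrapesome.scraper.async_scraper`, for example `asyncio.run(scrape_many(urls, scraper, allow_redirects=False))`. That call assumes the async scraper accepts the same keyword arguments as the sync one, which the README implies but does not spell out.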

🧪 Testing

Run tests with:

pytest --cov=scrapesome tests/

Target coverage: 75–100%

⚙️ Environment Configuration

ScrapeSome reads its configuration from environment variables. If a .env file is present, the values are loaded from it.

Example .env

LOG_LEVEL=INFO
EXPORT_FORMAT=text
FETCH_PLAYWRIGHT_TIMEOUT=10
FETCH_PAGE_TIMEOUT=10

Variable                  Description
FETCH_PLAYWRIGHT_TIMEOUT  Timeout for Playwright-rendered pages (in seconds)
FETCH_PAGE_TIMEOUT        Timeout for standard page fetch (in seconds)
LOG_LEVEL                 Logging verbosity (DEBUG, INFO, WARNING, etc.)
EXPORT_FORMAT             Default export format (text, markdown, json, html)
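
To make the variables concrete, here is a minimal sketch of how settings like these could be read and converted to typed values. It is not taken from ScrapeSome's `config.py`: `load_settings` and `DEFAULTS` are hypothetical, and the default values are simply copied from the example .env above rather than from the library.

```python
import os
from typing import Mapping

# Assumed defaults, copied from the example .env (not confirmed library defaults).
DEFAULTS = {
    "LOG_LEVEL": "INFO",
    "EXPORT_FORMAT": "text",
    "FETCH_PLAYWRIGHT_TIMEOUT": "10",
    "FETCH_PAGE_TIMEOUT": "10",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the four variables from `env`, falling back to DEFAULTS."""
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "log_level": raw["LOG_LEVEL"].upper(),
        "export_format": raw["EXPORT_FORMAT"].lower(),
        # Timeouts are given in seconds; float() also accepts values like "2.5".
        "playwright_timeout": float(raw["FETCH_PLAYWRIGHT_TIMEOUT"]),
        "page_timeout": float(raw["FETCH_PAGE_TIMEOUT"]),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test, for example `load_settings({"FETCH_PAGE_TIMEOUT": "2.5"})`.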

📄 Output Formats

JSON Example

Request structured JSON output with:

from scrapesome.scraper.sync_scraper import scraper
content = scraper("https://example.com", format_type="json")
content

Output

{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "url": "https://example.com"
}
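
The README does not say whether the JSON result comes back as a Python dict or as a JSON string, so code that consumes it may want to accept both. The helper below is a hedged sketch; `as_dict` is a hypothetical name, and the sample string only mirrors the fields shown in the output above.

```python
import json
from typing import Any


def as_dict(content: Any) -> dict:
    """Return `content` as a dict, parsing it first if it is JSON text."""
    if isinstance(content, dict):
        return content
    if isinstance(content, (str, bytes)):
        parsed = json.loads(content)
        if isinstance(parsed, dict):
            return parsed
    raise TypeError(f"expected a JSON object, got {type(content).__name__}")


# Sample with the same fields as the output shown above.
sample = '{"title": "Example Domain", "url": "https://example.com"}'
page = as_dict(sample)
print(page["title"])  # Example Domain
```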

Markdown

Convert HTML to Markdown with:

from scrapesome.scraper.sync_scraper import scraper
content = scraper("https://adenuniversity.us", format_type="markdown")
content

Output

# Online Global Masters that boost your global career

**ADEN University** offers students access to professionals who operate in the world of business and administration, who share their knowledge and acumen collaboratively with their students in all **academic programs** offered at ADEN.

[About Us](about-aden-university)


Watch testimonial video 


##### Watch testimonial video

×

[

](https://res.cloudinary.com/cruminott/video/upload/vc_auto,w_auto,q_auto,f_auto/adenu/aden-university-3.mp4)



## ADEN University offers the following academic programs

[![EXECUTIVE MBA. Master of Business Administration](https://adenuniversity.us/files/2016/06/ADENU_miniatura_Emba_900-1-820x400.jpg "EXECUTIVE MBA. Master of Business Administration")](https://adenuniversity.us/academics/executive-mba/  "EXECUTIVE MBA. Master of Business Administration")

##### [EXECUTIVE MBA. Master of Business Administration](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")

The ADEN University Executive MBA is designed to strengthen business leaders to manage...

* **37** credits
* **15** months
* **Spanish Only**

[Visit EMBA Course](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")

[![GLOBAL MBA. Master of Business Administration](https://adenuniversity.us/files/2016/06/ADENU_miniatura_MBAgl1_900-820x400.jpg "GLOBAL MBA. Master of Business Administration")](https://adenuniversity.us/academics/global-mba/  "GLOBAL MBA. Master of Business Administration")

##### [GLOBAL MBA. Master of Business Administration](https://adenuniversity.us/academics/global-mba/ "GLOBAL MBA. Master of Business Administration")

The Global MBA is designed to prepare business leaders to manage companies in an...

* **36** credits
* **14** months
* **Spanish and English**

The asynchronous scraper accepts format_type in the same way.

📁 Project Structure

scrapesome/
├── config.py
├── exceptions.py
├── formatter/
│   └── output_formatter.py
├── logging.py
├── scraper/
│   ├── async_scraper.py
│   ├── sync_scraper.py
│   └── rendering.py

🔒 License

MIT License © 2025

🧑‍💻 Author

Crafted with care by Vishnu Vardhan Reddy. Contributions welcome! 🙌
