A Powerful Web Scraper with dynamic rendering support.

🕷️ ScrapeSome

ScrapeSome is a lightweight, flexible web scraping library with both synchronous and asynchronous support. It includes intelligent fallbacks, JavaScript page rendering, response formatting (HTML → Text/JSON/Markdown), and retry mechanisms. It is aimed at developers who need robust scraping utilities with minimal setup.


🚀 Features

  • 🔁 Sync + Async scraping support
  • 🔄 Automatic retries and intelligent fallbacks
  • 🧪 Playwright rendering fallback for JS-heavy pages
  • 📝 Format responses as raw HTML, plain text, Markdown, or structured JSON
  • ⚙️ Configurable: timeouts, redirects, user agents, and logging
  • 🧪 Test coverage with pytest and pytest-asyncio
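
The retry-then-fallback idea from the feature list can be sketched in a few lines. This is a minimal illustration, not ScrapeSome's actual implementation: `fetch_with_fallback`, `flaky_fetch`, and `rendered_fetch` are hypothetical names, with the two callables standing in for a plain HTTP fetch and a browser-based render.

```python
import time


def fetch_with_fallback(url, primary, fallback, retries=3, delay=0.0):
    """Try `primary` up to `retries` times, then fall back to `fallback`.

    Both callables are hypothetical: they take a URL and return page content.
    """
    for attempt in range(1, retries + 1):
        try:
            return primary(url)
        except Exception:
            # Any failure counts as a failed attempt; wait a little longer each time.
            time.sleep(delay * attempt)
    try:
        return fallback(url)
    except Exception as exc:
        raise RuntimeError(f"all strategies failed for {url}") from exc


calls = []


def flaky_fetch(url):
    # Simulates a fetch that always fails, so the fallback is exercised.
    calls.append(url)
    raise ConnectionError("simulated network failure")


def rendered_fetch(url):
    # Simulates a rendering fallback that succeeds.
    return f"<html>rendered {url}</html>"


result = fetch_with_fallback("https://example.com", flaky_fetch, rendered_fetch)
print(result)
```

Passing the strategies in as callables keeps the retry logic independent of any particular HTTP client or browser driver.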

📦 Installation

pip install scrapesome

⚡ Quick Start

Synchronous Example

# Fetch a page synchronously and print the result
from scrapesome.scraper.sync_scraper import sync_scraper
html = sync_scraper("https://example.com")
print(html)

Asynchronous Example

# Fetch a page asynchronously; asyncio.run drives the coroutine to completion
import asyncio
from scrapesome.scraper.async_scraper import async_scraper
html = asyncio.run(async_scraper("https://example.com"))
print(html)
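
The async interface is most useful when several pages are fetched at once. The sketch below shows one way to do that with `asyncio.gather` and a concurrency limit. `scrape_many` and `fake_scraper` are hypothetical helpers, not part of ScrapeSome; the stand-in scraper lets the example run without network access.

```python
import asyncio


async def scrape_many(urls, scraper, limit=5):
    """Run `scraper(url)` for every URL, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def scrape_one(url):
        async with semaphore:
            try:
                return url, await scraper(url)
            except Exception as exc:
                # Record the error instead of letting one failure abort the batch.
                return url, exc

    pairs = await asyncio.gather(*(scrape_one(url) for url in urls))
    return dict(pairs)


async def fake_scraper(url):
    # Stand-in for an async scraper so the sketch runs offline.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


urls = ["https://example.com", "https://example.org"]
results = asyncio.run(scrape_many(urls, fake_scraper))
for url, content in results.items():
    print(url, "->", content)
```

In real use, `async_scraper` from `scrapesome.scraper.async_scraper` would presumably be passed in place of `fake_scraper`, assuming it takes the URL as its first argument as in the example above.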

🧰 Advanced Usage

Force Rendering (Playwright)

# Skip the plain HTTP fetch and render the page with Playwright
from scrapesome.scraper.sync_scraper import sync_scraper
content = sync_scraper("https://example.com", force_playwright=True)
print(content)

Custom User Agents

# Supply your own list of user agent strings
from scrapesome.scraper.sync_scraper import sync_scraper
content = sync_scraper("https://example.com", user_agents=["MyCustomAgent/1.0"])
print(content)
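
How ScrapeSome chooses among several entries in `user_agents` is not described here. A common approach, shown below purely as an assumption, is to pick one at random per request. `pick_headers` is a hypothetical helper, not a ScrapeSome function.

```python
import random


def pick_headers(user_agents, rng=random):
    """Return request headers using one randomly chosen user agent."""
    if not user_agents:
        raise ValueError("user_agents must contain at least one entry")
    return {"User-Agent": rng.choice(user_agents)}


agents = ["MyCustomAgent/1.0", "MyCustomAgent/2.0"]
# A seeded generator makes the choice repeatable, which helps in tests.
headers = pick_headers(agents, rng=random.Random(0))
print(headers)
```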

Control Redirects

# Do not follow HTTP redirects
from scrapesome.scraper.sync_scraper import sync_scraper
content = sync_scraper("https://example.com", allow_redirects=False)
print(content)

The same options can be used with async_scraper.

🧪 Testing

Run tests with:

pytest --cov=scrapesome tests/

Target coverage: 75–100%

⚙️ Environment Configuration

ScrapeSome reads its configuration from environment variables, loading them from a .env file if one is present.

Example .env

LOG_LEVEL=INFO
OUTPUT_FORMAT=text
FETCH_PLAYWRIGHT_TIMEOUT=10
FETCH_PAGE_TIMEOUT=10
USER_AGENTS=["Mozilla/5.0 (Windows NT 10.0; Win64; x64)......."]

| Key | Description |
| --- | --- |
| FETCH_PLAYWRIGHT_TIMEOUT | Timeout for Playwright-rendered pages (in seconds) |
| FETCH_PAGE_TIMEOUT | Timeout for standard page fetch (in seconds) |
| LOG_LEVEL | Logging verbosity (DEBUG, INFO, WARNING, etc.) |
| OUTPUT_FORMAT | Default output format (text, markdown, json, html) |
| USER_AGENTS | Default user agents ("Mozilla/5.0 (Windows NT 10.0; Win64; x64).......") |
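
To make the shape of these settings concrete, here is a rough sketch of how such variables could be read and typed using only the standard library. It is not ScrapeSome's `config.py`: `load_settings`, the fallback defaults, and the assumption that `USER_AGENTS` is a JSON list are all illustrative guesses.

```python
import json
import os


def load_settings(env=None):
    """Read scraper settings from a mapping (defaults to os.environ).

    The fallback values here are assumptions chosen for illustration.
    """
    env = os.environ if env is None else env
    return {
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "output_format": env.get("OUTPUT_FORMAT", "html"),
        "page_timeout": int(env.get("FETCH_PAGE_TIMEOUT", "10")),
        "playwright_timeout": int(env.get("FETCH_PLAYWRIGHT_TIMEOUT", "10")),
        # Assumes the value is a JSON array of strings, as in the example .env.
        "user_agents": json.loads(env.get("USER_AGENTS", "[]")),
    }


example_env = {
    "LOG_LEVEL": "INFO",
    "OUTPUT_FORMAT": "text",
    "FETCH_PLAYWRIGHT_TIMEOUT": "10",
    "FETCH_PAGE_TIMEOUT": "10",
    "USER_AGENTS": '["MyCustomAgent/1.0"]',
}
settings = load_settings(example_env)
print(settings)
```

Accepting a mapping instead of reading `os.environ` directly makes the parsing easy to test without touching the real environment.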

📄 Output Formats

JSON Example

Get the JSON version of a page:

# Request structured JSON instead of raw HTML
from scrapesome.scraper.sync_scraper import sync_scraper
content = sync_scraper("https://example.com", output_format_type="json")
print(content)

Output

{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "url": "https://example.com"
}
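
Whether the JSON result arrives as a Python dict or as a JSON string is not stated here, so downstream code may want to accept both. The hypothetical helper `as_dict` below normalizes either form; the sample string mirrors the output shown above.

```python
import json


def as_dict(result):
    """Return a dict whether `result` is a JSON string or already a mapping."""
    if isinstance(result, (str, bytes)):
        return json.loads(result)
    return dict(result)


sample = (
    '{"title": "Example Domain", '
    '"description": "This domain is for use in illustrative examples.", '
    '"url": "https://example.com"}'
)
page = as_dict(sample)
print(page["title"], "-", page["url"])
```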

Markdown

Convert HTML to Markdown with:

# Request the page converted to Markdown
from scrapesome.scraper.sync_scraper import sync_scraper
content = sync_scraper("https://adenuniversity.us", output_format_type="markdown")
print(content)

Output

# Online Global Masters that boost your global career

**ADEN University** offers students access to professionals who operate in the world of business and administration, who share their knowledge and acumen collaboratively with their students in all **academic programs** offered at ADEN.

[About Us](about-aden-university)


Watch testimonial video 


##### Watch testimonial video

×

[

](https://res.cloudinary.com/cruminott/video/upload/vc_auto,w_auto,q_auto,f_auto/adenu/aden-university-3.mp4)



## ADEN University offers the following academic programs

[![EXECUTIVE MBA. Master of Business Administration](https://adenuniversity.us/files/2016/06/ADENU_miniatura_Emba_900-1-820x400.jpg "EXECUTIVE MBA. Master of Business Administration")](https://adenuniversity.us/academics/executive-mba/  "EXECUTIVE MBA. Master of Business Administration")

##### [EXECUTIVE MBA. Master of Business Administration](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")

The ADEN University Executive MBA is designed to strengthen business leaders to manage...

* **37** credits
* **15** months
* **Spanish Only**

[Visit EMBA Course](https://adenuniversity.us/academics/executive-mba/ "EXECUTIVE MBA. Master of Business Administration")

[![GLOBAL MBA. Master of Business Administration](https://adenuniversity.us/files/2016/06/ADENU_miniatura_MBAgl1_900-820x400.jpg "GLOBAL MBA. Master of Business Administration")](https://adenuniversity.us/academics/global-mba/  "GLOBAL MBA. Master of Business Administration")

##### [GLOBAL MBA. Master of Business Administration](https://adenuniversity.us/academics/global-mba/ "GLOBAL MBA. Master of Business Administration")

The Global MBA is designed to prepare business leaders to manage companies in an...

* **36** credits
* **14** months
* **Spanish and English**

Similarly, async_scraper can also be used.
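
Markdown output is convenient for post-processing with ordinary text tools. As a small sketch, the hypothetical helper `list_headings` below pulls an outline of ATX headings out of Markdown text using only the standard library; the sample is an excerpt of the output shown above.

```python
import re

# Matches ATX headings such as "# Title" or "## Section" at the start of a line.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def list_headings(markdown):
    """Return (level, text) pairs for each heading in the Markdown text."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(markdown)]


sample = """# Online Global Masters that boost your global career

## ADEN University offers the following academic programs

* **37** credits
"""
headings = list_headings(sample)
for level, text in headings:
    print("  " * (level - 1) + text)
```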

📁 Project Structure

scrapesome/
├── .gitignore
├── pytest.ini
├── .github/
│   └── workflows/
│       └── deploy.yml
├── __init__.py
├── config.py
├── exceptions.py
├── formatter/
│   ├── __init__.py
│   └── output_formatter.py
├── logging.py
├── scraper/
│   ├── __init__.py
│   ├── async_scraper.py
│   ├── sync_scraper.py
│   └── rendering.py
├── docs/
│   ├── index.md
│   ├── getting_started.md
│   ├── usage.md
│   ├── config.md
│   ├── examples.md
│   ├── about.md
│   └── licence.md
├── tests/
│   ├── __init__.py
│   ├── test_sync_scraper.py
│   ├── test_async_scraper.py
│   └── test_config.py
├── setup.py
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md

🔒 License

MIT License © 2025

🤝 Contributions

Contributions are welcome! Whether it's bug reports, feature suggestions, or pull requests, your help is appreciated.

To get started:

git clone https://github.com/scrapesome/scrapesome.git
cd scrapesome
