EasyScrape
Fast, secure web scraping for Python.
from easyscrape import scrape
result = scrape("https://example.com")
print(result.css("h1")) # "Example Domain"
Features
- Simple API: One function to fetch and extract data
- CSS & XPath: Use familiar selectors
- Built-in security: SSRF protection, path traversal prevention
- Automatic retries: Exponential backoff on failures
- Rate limiting: Respect server limits
- Caching: Two-tier memory and disk cache
- Async support: High-performance concurrent scraping
- JavaScript rendering: Optional Playwright integration
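The two-tier cache listed above is not described further on this page. As a rough illustration of the general idea, the sketch below keeps pages in an in-memory dictionary and also writes them to disk, so a new process can still get cache hits. `TwoTierCache` and its methods are hypothetical and not part of EasyScrape's API. Expiry, size limits, and locking are deliberately left out.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


class TwoTierCache:
    """Hypothetical two-tier cache: an in-memory dict backed by JSON files on disk."""

    def __init__(self, cache_dir: Path) -> None:
        self._memory: dict[str, str] = {}
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, key: str) -> Path:
        # Hash the key (e.g. a URL) so it always maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.json"

    def get(self, key: str) -> str | None:
        # Tier 1: in-memory lookup, the fastest path.
        if key in self._memory:
            return self._memory[key]
        # Tier 2: fall back to disk and promote a hit into memory.
        path = self._path_for(key)
        if path.exists():
            value = json.loads(path.read_text(encoding="utf-8"))["body"]
            self._memory[key] = value
            return value
        return None

    def set(self, key: str, value: str) -> None:
        # Write through to both tiers.
        self._memory[key] = value
        self._path_for(key).write_text(json.dumps({"body": value}), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        TwoTierCache(Path(tmp)).set("https://example.com", "<h1>Example Domain</h1>")
        # A new instance starts with an empty memory tier but finds the page on disk.
        print(TwoTierCache(Path(tmp)).get("https://example.com"))
```

Writing through to both tiers keeps them consistent at the cost of a disk write per `set`.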
Installation
pip install easyscrape-py
# Optional: JavaScript rendering
pip install "easyscrape-py[browser]"
playwright install chromium
# Optional: Data export (Excel, Parquet)
pip install "easyscrape-py[export]"
# Everything
pip install "easyscrape-py[all]"
Quick Start
Basic Scraping
from easyscrape import scrape
result = scrape("https://example.com")
# Extract single element
title = result.css("h1")
# Extract all matching elements
links = result.css_list("a", "href")
# Extract structured data
data = result.extract({
    "title": "h1",
    "description": "meta[name=description]::attr(content)",
})
Multiple Items
books = result.extract_all(".product", {
    "title": "h3 a::attr(title)",
    "price": ".price::text",
    "url": "a::attr(href)",
})
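The field specs above add `::text` and `::attr(name)` suffixes to plain CSS selectors (the same convention Scrapy and parsel use). The sketch below shows one way such a spec could be split into a selector and an attribute name. `parse_field` is a hypothetical helper, not EasyScrape's parser, and treating a bare selector such as `"h1"` as "text content" is an assumption inferred from the `"title": "h1"` example.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "::text" or "::attr(name)" pseudo-element at the end of a spec.
_SUFFIX = re.compile(r"::(?:(text)|attr\(([^)]+)\))\s*$")


def parse_field(spec: str) -> Tuple[str, Optional[str]]:
    """Split a field spec into (css_selector, attribute_name).

    attribute_name is None when the element's text content is wanted, either via
    an explicit "::text" suffix or (by assumption) when no suffix is given.
    """
    match = _SUFFIX.search(spec)
    if match is None:
        return spec.strip(), None
    selector = spec[: match.start()].strip()
    if match.group(1):  # "::text"
        return selector, None
    return selector, match.group(2).strip()  # "::attr(name)"


print(parse_field("meta[name=description]::attr(content)"))  # ('meta[name=description]', 'content')
print(parse_field(".price::text"))  # ('.price', None)
```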
Configuration
from easyscrape import scrape, Config

config = Config(
    timeout=60.0,
    max_retries=5,
    rate_limit=1.0,  # 1 request/second
    cache_enabled=True,
)
result = scrape("https://example.com", config=config)
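`rate_limit=1.0` is described as one request per second. A minimal way to enforce a limit like that is to space requests at least `1 / rate` seconds apart, as in the sketch below. `RateLimiter` is a hypothetical class, not EasyScrape's implementation, which may use a different scheme (for example a token bucket or per-domain limits).

```python
import time


class RateLimiter:
    """Hypothetical limiter: spaces calls at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate  # minimum gap between request starts, in seconds
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = time.monotonic()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            time.sleep(delay)
        # Schedule the following slot relative to this request's start time.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


if __name__ == "__main__":
    limiter = RateLimiter(rate=1.0)  # 1 request/second
    for page in range(3):
        limiter.wait()
        print(f"fetching page {page}")
```

`time.monotonic()` is used instead of `time.time()` so that wall-clock adjustments cannot shorten or stretch the gaps.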
Async Scraping
import asyncio
from easyscrape import async_scrape_many

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results = await async_scrape_many(urls)
    return [r.css("h1") for r in results if r.ok]

titles = asyncio.run(main())
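This page does not specify how `async_scrape_many` schedules its requests. A common asyncio pattern for fetching many URLs without opening all connections at once is to cap concurrency with `asyncio.Semaphore`, sketched below. `fake_fetch` is a hypothetical stand-in for a real HTTP call, and `max_concurrency` is an illustrative parameter, not a documented EasyScrape option.

```python
from __future__ import annotations

import asyncio
import random


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request: sleep to simulate latency.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"<h1>{url}</h1>"


async def fetch_all(urls: list[str], max_concurrency: int = 10) -> list[str]:
    """Fetch every URL concurrently, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        # Waits here whenever max_concurrency fetches are already running.
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(20)]
    results = asyncio.run(fetch_all(pages, max_concurrency=5))
    print(len(results))
```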
JavaScript Rendering
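from easyscrape import scrape, Config

# Requires the [browser] extra and a Playwright Chromium install (see Installation)
config = Config(javascript=True)
result = scrape("https://spa-site.com", config=config)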
CLI
# Get all links
easyscrape https://example.com --links
# Extract specific fields
easyscrape https://example.com -e title=h1 -e desc=.description
# Extract multiple items to CSV
easyscrape https://example.com -e name=.name -c .product -o data.csv -f csv
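Each `-e NAME=SELECTOR` flag appears to correspond to one entry of the dictionary passed to `result.extract()`. The sketch below shows how repeated flags of that shape can be collected with `argparse` and turned into such a dictionary. It is not EasyScrape's CLI source: `build_parser` and `fields_to_spec` are hypothetical, and only the positional URL and `-e` are modeled.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser modeling only the URL and the repeatable -e flag.
    parser = argparse.ArgumentParser(prog="easyscrape")
    parser.add_argument("url")
    parser.add_argument(
        "-e",
        dest="fields",
        action="append",
        default=[],
        metavar="NAME=SELECTOR",
        help="field to extract; may be given more than once",
    )
    return parser


def fields_to_spec(pairs: list[str]) -> dict[str, str]:
    """Turn ["title=h1", "desc=.description"] into {"title": "h1", "desc": ".description"}."""
    spec: dict[str, str] = {}
    for pair in pairs:
        # partition() splits on the first "=" only, so selectors may contain "=".
        name, sep, selector = pair.partition("=")
        if not sep or not name or not selector:
            raise ValueError(f"expected NAME=SELECTOR, got {pair!r}")
        spec[name] = selector
    return spec


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["https://example.com", "-e", "title=h1", "-e", "desc=.description"]
    )
    print(args.url, fields_to_spec(args.fields))
```

Splitting on the first `=` only keeps selectors such as `meta[name=description]` intact.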
Error Handling
import time

from easyscrape import scrape
from easyscrape.exceptions import NetworkError, HTTPError, RateLimitHit

url = "https://example.com"
try:
    result = scrape(url)
except RateLimitHit:
    # Back off for a minute, then try once more
    time.sleep(60)
    result = scrape(url)
except HTTPError as e:
    print(f"HTTP {e.status_code}")
except NetworkError as e:
    print(f"Network error: {e}")
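The example above waits a fixed 60 seconds after `RateLimitHit`. The feature list describes automatic retries with exponential backoff; the sketch below shows that general technique as a standalone helper that doubles the delay after each failed attempt and adds a little random jitter. `retry_with_backoff` and its parameters are hypothetical and not part of EasyScrape, whose own retries are configured through `Config(max_retries=...)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    The delay before retry n (0-based) is min(max_delay, base_delay * 2**n), plus up
    to 10% random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Wrapped around the call above, this might look like `retry_with_backoff(lambda: scrape(url), (RateLimitHit, NetworkError))`. Taking `sleep` as a parameter makes the delays easy to check in tests without actually waiting.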
Security
EasyScrape includes built-in protections:
- SSRF protection: Blocks requests to localhost, private IPs, cloud metadata endpoints
- Path traversal prevention: Validates file paths in export functions
- Safe defaults: SSL verification enabled, redirect limits enforced
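The page does not list the exact checks EasyScrape performs. The sketch below illustrates the two general techniques with only the standard library: resolving a URL's host and rejecting any address that is not globally routable (this covers localhost, private ranges, and the link-local cloud metadata address `169.254.169.254`), and resolving an export path to confirm it stays inside a base directory. `is_public_url` and `safe_export_path` are hypothetical helpers, not EasyScrape functions.

```python
import ipaddress
import socket
from pathlib import Path
from urllib.parse import urlparse


def is_public_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves solely to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except (OSError, UnicodeError):
        return False  # unresolvable hosts are rejected
    if not infos:
        return False
    for info in infos:
        # info[4] is the socket address; its first element is the IP string.
        addr = info[4][0].split("%", 1)[0]  # drop any IPv6 zone index
        ip = ipaddress.ip_address(addr)
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped  # check ::ffff:a.b.c.d as its IPv4 address
        # is_global is False for loopback, private, and link-local ranges,
        # which include 127.0.0.1, 10.0.0.0/8, and 169.254.169.254.
        if not ip.is_global or ip.is_multicast:
            return False
    return True


def safe_export_path(base_dir: str, filename: str) -> Path:
    """Resolve filename under base_dir, refusing paths that escape it."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # Rejects "../" sequences, absolute paths, and existing symlinks pointing outside base.
    if base not in target.parents:
        raise ValueError(f"path escapes export directory: {filename!r}")
    return target
```

Checking DNS once and then connecting separately leaves a window for DNS rebinding, so a robust implementation validates the address that is actually used for the connection.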
Documentation
License
MIT
Download files
Source Distribution
easyscrape_py-0.1.0.tar.gz (132.5 kB)
Built Distribution
File details
Details for the file easyscrape_py-0.1.0.tar.gz.
File metadata
- Download URL: easyscrape_py-0.1.0.tar.gz
- Upload date:
- Size: 132.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 816c722c8c283a9f51d68b597bd88fa0280ad2e5de5f76af2aa9169dbe2819f0 |
| MD5 | 085d7289814d8bb41919252a2bdba181 |
| BLAKE2b-256 | 2303a83ccae31bbd26126da273f2ee631c57899f08506bd6cd8356ae31698825 |
File details
Details for the file easyscrape_py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: easyscrape_py-0.1.0-py3-none-any.whl
- Upload date:
- Size: 83.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2017d8b64a4f8e0d5b63db252135071aec037cb4da6e724b6462866c3f98fb54 |
| MD5 | 42abb352102df97a8b06ceb21ddd1fe7 |
| BLAKE2b-256 | e3fd79b182192ee807e31608d752a12ad9575c331e9ba30017547f224970173a |