Fast, secure web scraping for Python
EasyScrape
from easyscrape import scrape
result = scrape("https://example.com")
print(result.css("h1")) # "Example Domain"
Features
- Simple API: One function to fetch and extract data
- CSS & XPath: Use familiar selectors
- Built-in security: SSRF protection, path traversal prevention
- Automatic retries: Exponential backoff on failures
- Rate limiting: Respect server limits
- Caching: Two-tier memory and disk cache
- Async support: High-performance concurrent scraping
- JavaScript rendering: Optional Playwright integration
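A two-tier cache puts a fast in-process dictionary in front of a slower but persistent on-disk store. The class below is a minimal, hypothetical sketch of that layering using only the standard library. The `TwoTierCache` name, the SHA-256 file naming and the JSON file format are illustrative assumptions, not EasyScrape's internals.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


class TwoTierCache:
    """Hypothetical cache: an in-memory dict backed by one JSON file per URL."""

    def __init__(self, directory: str) -> None:
        self.memory: dict[str, str] = {}
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        # Hash the URL so every key maps to a safe, fixed-length file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, url: str) -> str | None:
        # Tier 1: memory hit, no disk access needed.
        if url in self.memory:
            return self.memory[url]
        # Tier 2: disk hit, promoted into memory for the next lookup.
        path = self._path(url)
        if path.exists():
            body = json.loads(path.read_text(encoding="utf-8"))["body"]
            self.memory[url] = body
            return body
        return None  # miss in both tiers

    def set(self, url: str, body: str) -> None:
        # Write through to both tiers so the entry survives a restart.
        self.memory[url] = body
        self._path(url).write_text(json.dumps({"body": body}), encoding="utf-8")
```

The memory tier avoids disk reads for repeat lookups within one process, while the disk tier lets cached pages outlive it.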
Installation
pip install easyscrape-py
# Optional: JavaScript rendering (quote the extras so shells such as zsh don't expand the brackets)
pip install "easyscrape-py[browser]"
playwright install chromium
# Optional: Data export (Excel, Parquet)
pip install "easyscrape-py[export]"
# Everything
pip install "easyscrape-py[all]"
Quick Start
Basic Scraping
from easyscrape import scrape
result = scrape("https://example.com")
# Extract single element
title = result.css("h1")
# Extract an attribute (here, href) from every matching element
links = result.css_list("a", "href")
# Extract structured data
data = result.extract({
    "title": "h1",
    "description": "meta[name=description]::attr(content)",
})
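To make the field map concrete, here is what those two selectors pull out of a page, re-implemented for exactly these two fields with the standard library's `html.parser`. This is a hypothetical sketch (the `TitleAndDescription` parser is an illustrative name), not how EasyScrape evaluates CSS. It assumes, as the "Extract single element" comment suggests, that a plain selector keeps only the first match.

```python
from html.parser import HTMLParser


class TitleAndDescription(HTMLParser):
    """Hypothetical parser mimicking the "h1" and meta-description fields above."""

    def __init__(self) -> None:
        super().__init__()
        self.result = {"title": None, "description": None}
        self._in_h1 = False
        self._h1_text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # "h1": start collecting text, but only for the first <h1>.
        if tag == "h1" and self.result["title"] is None:
            self._in_h1 = True
        # "meta[name=description]::attr(content)": read the content attribute.
        if tag == "meta" and attributes.get("name") == "description":
            if self.result["description"] is None:
                self.result["description"] = attributes.get("content")

    def handle_data(self, data):
        if self._in_h1:
            self._h1_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.result["title"] = "".join(self._h1_text).strip()


page = """<html><head><meta name="description" content="An example page"></head>
<body><h1>Example Domain</h1><h1>Second heading</h1></body></html>"""

parser = TitleAndDescription()
parser.feed(page)
print(parser.result)  # {'title': 'Example Domain', 'description': 'An example page'}
```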
Multiple Items
books = result.extract_all(".product", {
    "title": "h3 a::attr(title)",
    "price": ".price::text",
    "url": "a::attr(href)",
})
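Judging by its signature, `extract_all` applies the field map once per element matching the container selector (`.product`). The standard library's `xml.etree.ElementTree` understands a small XPath subset, which is enough to sketch the same container-then-fields pattern. The sample markup, the `extract_products` helper and the XPath stand-ins for the CSS selectors are assumptions for illustration. It also only works on well-formed markup, which real-world HTML often is not.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup with two products.
PAGE = """<div>
  <article class="product">
    <h3><a href="/books/1" title="Book One">Book One</a></h3>
    <p class="price">12.99</p>
  </article>
  <article class="product">
    <h3><a href="/books/2" title="Book Two">Book Two</a></h3>
    <p class="price">8.50</p>
  </article>
</div>"""


def extract_products(markup: str) -> list[dict[str, str]]:
    root = ET.fromstring(markup)
    items = []
    # Container: elements whose class attribute is exactly "product"
    # (CSS ".product" would also match elements with several classes).
    for product in root.iterfind(".//*[@class='product']"):
        link = product.find(".//h3//a")
        price = product.find(".//*[@class='price']")
        items.append({
            "title": link.get("title"),             # like "h3 a::attr(title)"
            "price": price.text.strip(),            # like ".price::text"
            "url": product.find(".//a").get("href"),  # like "a::attr(href)"
        })
    return items


for item in extract_products(PAGE):
    print(item)
```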
Configuration
from easyscrape import scrape, Config
config = Config(
    timeout=60.0,
    max_retries=5,
    rate_limit=1.0,  # 1 request/second
    cache_enabled=True,
)
result = scrape("https://example.com", config=config)
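`rate_limit=1.0` is documented as one request per second. A common way to enforce such a limit is a minimum-interval limiter that sleeps until `1 / rate` seconds have passed since the previous request. The `MinIntervalLimiter` class below is a hypothetical sketch of that technique, not EasyScrape's implementation. It takes an injectable clock and sleep function so the demo runs without real waiting.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Hypothetical limiter: at most `rate` calls per second, evenly spaced."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        # Sleep only if the previous call was less than `interval` seconds ago.
        now = self.clock()
        if now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Demo with a fake clock so nothing actually sleeps.
fake_time = [0.0]
slept = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_time[0] += seconds


limiter = MinIntervalLimiter(rate=1.0, clock=lambda: fake_time[0], sleep=fake_sleep)
for _ in range(3):
    limiter.wait()  # in real code, a request would follow each wait()
print(slept)  # [1.0, 1.0]
```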
Async Scraping
import asyncio
from easyscrape import async_scrape_many
async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results = await async_scrape_many(urls)
    return [r.css("h1") for r in results if r.ok]
titles = asyncio.run(main())
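Whether and how `async_scrape_many` caps concurrency is not stated here. A common pattern for this kind of fan-out is `asyncio.gather` guarded by an `asyncio.Semaphore`. The sketch below uses a hypothetical `fake_fetch` coroutine in place of a real HTTP request so it runs offline, and records the peak number of in-flight requests to show the cap holding.

```python
from __future__ import annotations

import asyncio


async def fetch_all(urls: list[str], limit: int = 10) -> tuple[list[str], int]:
    """Fetch every URL with at most `limit` requests in flight; return pages and peak."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_fetch(url: str) -> str:
        # Hypothetical stand-in for a real HTTP request.
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` fetches are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated network latency
            in_flight -= 1
            return f"<h1>{url}</h1>"

    # gather runs the coroutines concurrently and returns results in input order.
    pages = await asyncio.gather(*(fake_fetch(u) for u in urls))
    return list(pages), peak


urls = [f"https://example.com/page/{i}" for i in range(100)]
pages, peak = asyncio.run(fetch_all(urls, limit=10))
print(len(pages), peak)
```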
JavaScript Rendering
config = Config(javascript=True)
result = scrape("https://spa-site.com", config=config)
CLI
# Get all links
easyscrape https://example.com --links
# Extract specific fields
easyscrape https://example.com -e title=h1 -e desc=.description
# Extract multiple items to CSV
easyscrape https://example.com -e name=.name -c .product -o data.csv -f csv
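The last command writes one CSV row per `.product` container. If you already have the rows in Python as a list of dictionaries (which is what `extract_all` appears to return in the Multiple Items example), the standard library's `csv.DictWriter` produces a similar file. The `write_rows` helper and the sample rows are hypothetical. The column order and quoting used by EasyScrape's own exporter may differ.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def write_rows(rows: list[dict[str, str]], stream: TextIO) -> None:
    if not rows:
        return  # nothing to write, not even a header
    # Use the keys of the first row as the header, in insertion order.
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget, large", "price": "19.99"},  # the comma forces quoting
]
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, e.g. `open("data.csv", "w", newline="", encoding="utf-8")`.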
Error Handling
import time

from easyscrape import scrape
from easyscrape.exceptions import NetworkError, HTTPError, RateLimitHit

url = "https://example.com"
try:
    result = scrape(url)
except RateLimitHit:
    # Back off for a minute, then try once more
    time.sleep(60)
    result = scrape(url)
except HTTPError as e:
    print(f"HTTP {e.status_code}")
except NetworkError as e:
    print(f"Network error: {e}")
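The handler above waits a fixed 60 seconds and retries once. If you would rather retry several times with growing delays, a small generic wrapper works with any callable. `retry_with_backoff` and its parameters are hypothetical. In real code you might pass `lambda: scrape(url)` and `(RateLimitHit,)`, but the demo uses a stand-in exception so it runs without network access.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on a listed exception wait base_delay * 2**attempt and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise RuntimeError("unreachable")


# Offline demo: a stand-in exception and a function that fails twice, then succeeds.
class FakeRateLimit(Exception):
    pass


calls: list[int] = []
delays: list[float] = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise FakeRateLimit()
    return "ok"


print(retry_with_backoff(flaky, (FakeRateLimit,), sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

Production retry loops often add random jitter to each delay so that many clients do not retry in lockstep.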
Security
EasyScrape includes built-in protections:
- SSRF protection: Blocks requests to localhost, private IP ranges, and cloud metadata endpoints
- Path traversal prevention: Validates file paths in export functions
- Safe defaults: SSL verification enabled, redirect limits enforced
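Similar checks are useful in your own code, for example before passing user-supplied URLs or output paths along. The sketch below is hypothetical and is not EasyScrape's code. `is_public_ip` and `safe_join` are illustrative names. It uses the standard library's `ipaddress` module to reject loopback, private, link-local (which includes the common `169.254.169.254` cloud metadata address) and other non-global addresses. It uses `pathlib` to refuse export paths that resolve outside a base directory. A complete SSRF defense must also resolve hostnames and re-check every redirect target, which this sketch leaves out.

```python
import ipaddress
from pathlib import Path


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable addresses (hypothetical SSRF check)."""
    ip = ipaddress.ip_address(address)
    # is_global is False for loopback, private (RFC 1918), link-local
    # (169.254.0.0/16) and other special-purpose ranges.
    return ip.is_global


def safe_join(base_dir: str, user_path: str) -> Path:
    """Join user_path onto base_dir, refusing anything that escapes base_dir."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    # Compare resolved paths so "../" segments and absolute paths cannot escape.
    if target != base and base not in target.parents:
        raise ValueError(f"path escapes {base}: {user_path}")
    return target
```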
Documentation
License
MIT
Download files
Source Distribution
easyscrape_py-0.1.1.tar.gz (144.4 kB)
Built Distribution
File details
Details for the file easyscrape_py-0.1.1.tar.gz.
File metadata
- Download URL: easyscrape_py-0.1.1.tar.gz
- Upload date:
- Size: 144.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fce1562b4a8d587bc3666d4b83c98dfa06f529e1789092b04e092a66f38fbbf4 |
| MD5 | 9f8b7d62181e002356ad1f9ca391244c |
| BLAKE2b-256 | 9eaa68e476eb22d16363a6c480b8e1966473990e876b0291556d9737a641f2bf |
File details
Details for the file easyscrape_py-0.1.1-py3-none-any.whl.
File metadata
- Download URL: easyscrape_py-0.1.1-py3-none-any.whl
- Upload date:
- Size: 83.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a25db0443ac9cd281c1d852770aa807d841969083b0c29f199bc8930c135fa3 |
| MD5 | 0534a22f3345d1bc79b602f53620453d |
| BLAKE2b-256 | 8f59dd9752fbf757408907826190afe8a8b967ade844050ebe711641dc7989d2 |
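The SHA256 digests listed in the file details let you check a downloaded file before installing it. A minimal way to do that with the standard library is to hash the file in chunks with `hashlib` and compare the result with the published value. `sha256_of` and `verify` are hypothetical helper names. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# Published SHA256 for easyscrape_py-0.1.1.tar.gz, copied from its file details.
EXPECTED_SDIST_SHA256 = "fce1562b4a8d587bc3666d4b83c98dfa06f529e1789092b04e092a66f38fbbf4"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalise the expected value too.
    return sha256_of(path) == expected.lower()
```

Usage: `verify(Path("easyscrape_py-0.1.1.tar.gz"), EXPECTED_SDIST_SHA256)` returns `True` only if the downloaded archive matches the published digest.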