EasyScrape

PyPI | Python 3.9+ | License: MIT

Fast, secure web scraping for Python.

from easyscrape import scrape

result = scrape("https://example.com")
print(result.css("h1"))  # "Example Domain"

Features

  • Simple API: One function to fetch and extract data
  • CSS & XPath: Use familiar selectors
  • Built-in security: SSRF protection, path traversal prevention
  • Automatic retries: Exponential backoff on failures
  • Rate limiting: Respect server limits
  • Caching: Two-tier memory and disk cache
  • Async support: High-performance concurrent scraping
  • JavaScript rendering: Optional Playwright integration
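To make the "two-tier memory and disk cache" idea concrete, here is a minimal sketch using only the standard library. It is not EasyScrape's implementation: the `TwoTierCache` class, its method names, and the JSON-on-disk layout are all assumptions chosen for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class TwoTierCache:
    """Hypothetical memory-then-disk cache (illustration only)."""

    def __init__(self, directory):
        self.memory = {}
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so any URL maps to a safe, fixed-length file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, key):
        if key in self.memory:  # tier 1: in-process dictionary
            return self.memory[key]
        path = self._path(key)
        if path.exists():  # tier 2: file on disk
            value = json.loads(path.read_text(encoding="utf-8"))
            self.memory[key] = value  # promote to memory for next time
            return value
        return None

    def set(self, key, value):
        # Write through to both tiers.
        self.memory[key] = value
        self._path(key).write_text(json.dumps(value), encoding="utf-8")


cache_dir = tempfile.mkdtemp()
cache = TwoTierCache(cache_dir)
cache.set("https://example.com", "<h1>Example Domain</h1>")

# A new instance starts with empty memory but finds the entry on disk.
fresh = TwoTierCache(cache_dir)
print(fresh.get("https://example.com"))
```

The memory tier keeps repeated lookups cheap within one process, while the disk tier lets entries survive restarts. A production cache would also need expiry and size limits, which this sketch omits.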

Installation

pip install easyscrape-py

# Optional: JavaScript rendering
pip install easyscrape-py[browser]
playwright install chromium

# Optional: Data export (Excel, Parquet)
pip install easyscrape-py[export]

# Everything
pip install easyscrape-py[all]

Quick Start

Basic Scraping

from easyscrape import scrape

result = scrape("https://example.com")

# Extract single element
title = result.css("h1")

# Extract all matching elements
links = result.css_list("a", "href")

# Extract structured data
data = result.extract({
    "title": "h1",
    "description": "meta[name=description]::attr(content)",
})

Multiple Items

books = result.extract_all(".product", {
    "title": "h3 a::attr(title)",
    "price": ".price::text",
    "url": "a::attr(href)",
})
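The field specs above combine a CSS selector with a `::text` or `::attr(name)` suffix. The sketch below shows one way such a spec could be split into its parts. The `split_selector` helper is hypothetical, and treating a bare selector as "element text" is an assumption based on the `result.css("h1")` example, not documented EasyScrape behavior.

```python
import re

# Matches a trailing "::text" or "::attr(name)" suffix.
_SUFFIX = re.compile(r"::(text|attr\(([^)]+)\))$")


def split_selector(spec):
    """Split 'css::text' or 'css::attr(name)' into (css, kind, attribute)."""
    match = _SUFFIX.search(spec)
    if not match:
        # Assumed default: a bare selector means the element's text.
        return spec, "text", None
    css = spec[: match.start()]
    if match.group(1) == "text":
        return css, "text", None
    return css, "attr", match.group(2)


for spec in ("h3 a::attr(title)", ".price::text", "a::attr(href)"):
    print(split_selector(spec))
```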

Configuration

from easyscrape import scrape, Config

config = Config(
    timeout=60.0,
    max_retries=5,
    rate_limit=1.0,  # 1 request/second
    cache_enabled=True,
)

result = scrape("https://example.com", config=config)
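To illustrate what a setting like `rate_limit=1.0` (one request per second) implies, here is a small standard-library sketch of a minimum-interval limiter. The `MinIntervalLimiter` class is hypothetical and is not how EasyScrape is known to implement rate limiting. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: allow at most `rate` calls per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate  # minimum seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay


limiter = MinIntervalLimiter(rate=1.0)
limiter.wait()  # the first call passes immediately
# A second limiter.wait() right away would sleep for about one second.
```

Calling `wait()` before each request spaces requests at least `1 / rate` seconds apart, which is the simplest way to stay under a per-second limit.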

Async Scraping

import asyncio
from easyscrape import async_scrape_many

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results = await async_scrape_many(urls)
    return [r.css("h1") for r in results if r.ok]

titles = asyncio.run(main())
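When fetching many URLs concurrently, it is common to cap how many requests are in flight at once. The sketch below shows that pattern with `asyncio.Semaphore`. The `gather_bounded` helper and the `fake_fetch` stand-in are hypothetical; whether `async_scrape_many` applies a similar cap internally is not stated here.

```python
import asyncio


async def gather_bounded(items, worker, limit=10):
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        async with semaphore:
            return await worker(item)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run(item) for item in items))


async def fake_fetch(url):
    # Stand-in for a real network call.
    await asyncio.sleep(0.01)
    return f"fetched {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    print(asyncio.run(gather_bounded(urls, fake_fetch, limit=2)))
```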

JavaScript Rendering

from easyscrape import scrape, Config

# Requires the optional browser extra (see Installation)
config = Config(javascript=True)
result = scrape("https://spa-site.com", config=config)

CLI

# Get all links
easyscrape https://example.com --links

# Extract specific fields
easyscrape https://example.com -e title=h1 -e desc=.description

# Extract multiple items to CSV
easyscrape https://example.com -e name=.name -c .product -o data.csv -f csv

Error Handling

import time

from easyscrape import scrape
from easyscrape.exceptions import NetworkError, HTTPError, RateLimitHit

url = "https://example.com"

try:
    result = scrape(url)
except RateLimitHit:
    # Rate limited: wait a minute, then retry once
    time.sleep(60)
    result = scrape(url)
except HTTPError as e:
    print(f"HTTP {e.status_code}")
except NetworkError as e:
    print(f"Network error: {e}")
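A fixed sleep followed by a single retry can be generalized to exponential backoff, where each wait is longer than the last. The sketch below uses only the standard library. `backoff_delays` and `call_with_retries` are hypothetical helpers, not part of EasyScrape, and the base, factor, and cap values are illustrative assumptions.

```python
import time


def backoff_delays(max_retries, base=1.0, factor=2.0, cap=60.0):
    """Return the wait before each retry: base, base*factor, ..., capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def call_with_retries(func, max_retries=3, retry_on=(Exception,),
                      sleep=time.sleep, **delay_opts):
    """Call func(); on a listed exception, wait and retry up to max_retries times."""
    for delay in backoff_delays(max_retries, **delay_opts):
        try:
            return func()
        except retry_on:
            sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return func()


print(backoff_delays(5))
```

With the library installed, a call could look like `call_with_retries(lambda: scrape(url), retry_on=(NetworkError,))`. Production code often adds random jitter to the delays so that many clients do not retry at the same moment.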

Security

EasyScrape includes built-in protections:

  • SSRF protection: Blocks requests to localhost, private IPs, cloud metadata endpoints
  • Path traversal prevention: Validates file paths in export functions
  • Safe defaults: SSL verification enabled, redirect limits enforced
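The first two protections can be sketched with the standard library. This is an illustration of the general technique, not EasyScrape's actual checks: `is_blocked_address`, `is_url_allowed`, and `is_within` are hypothetical names, and a real SSRF guard must also resolve hostnames and test every resolved address.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlsplit


def is_blocked_address(host):
    """True for loopback, private, link-local, or other non-public IP literals."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch only blocks the name "localhost";
        # it does not perform DNS resolution.
        return host.lower() in {"localhost", "localhost."}
    return not ip.is_global


def is_url_allowed(url):
    """Allow only http(s) URLs whose host is not a blocked address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    return not is_blocked_address(parts.hostname)


def is_within(base_dir, candidate):
    """True if candidate, resolved against base_dir, stays inside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / candidate).resolve()
    return target == base or base in target.parents


for url in ("https://example.com", "http://127.0.0.1/", "http://169.254.169.254/"):
    print(url, is_url_allowed(url))
print(is_within("exports", "../secret.txt"))
```

The address `169.254.169.254` is the link-local metadata endpoint used by several cloud providers, which is why link-local ranges are blocked alongside loopback and private ranges.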

Documentation

License

MIT
