EasyScrape

Requires Python 3.9+. License: MIT.

Fast, secure web scraping for Python.

from easyscrape import scrape

result = scrape("https://example.com")
print(result.css("h1"))  # "Example Domain"

Features

  • Simple API: One function to fetch and extract data
  • CSS & XPath: Use familiar selectors
  • Built-in security: SSRF protection, path traversal prevention
  • Automatic retries: Exponential backoff on failures
  • Rate limiting: Cap requests per second to respect server limits
  • Caching: Two-tier memory and disk cache
  • Async support: High-performance concurrent scraping
  • JavaScript rendering: Optional Playwright integration
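
To make the "two-tier memory and disk cache" idea concrete, here is a minimal standard-library sketch. It is not EasyScrape's implementation: the class name TwoTierCache, its methods, and the choice of SHA-256 file names are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class TwoTierCache:
    """Illustrative memory-then-disk cache (hypothetical, not EasyScrape's code)."""

    def __init__(self, directory: str) -> None:
        self._memory: dict[str, bytes] = {}
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so any URL maps to a safe, fixed-length file name.
        return self._dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        if key in self._memory:        # tier 1: memory
            return self._memory[key]
        path = self._path(key)
        if path.exists():              # tier 2: disk
            value = path.read_bytes()
            self._memory[key] = value  # promote to the memory tier
            return value
        return None

    def set(self, key: str, value: bytes) -> None:
        # Write through to both tiers.
        self._memory[key] = value
        self._path(key).write_bytes(value)


with tempfile.TemporaryDirectory() as tmp:
    cache = TwoTierCache(tmp)
    cache.set("https://example.com", b"<h1>Example Domain</h1>")
    # A fresh instance has an empty memory tier, so this read comes from disk.
    reopened = TwoTierCache(tmp)
    body = reopened.get("https://example.com")
```

The memory tier answers repeated lookups within one process; the disk tier lets a later run reuse responses fetched earlier.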

Installation

pip install easyscrape-py

# Optional: JavaScript rendering
pip install easyscrape-py[browser]
playwright install chromium

# Optional: Data export (Excel, Parquet)
pip install easyscrape-py[export]

# Everything
pip install easyscrape-py[all]

Quick Start

Basic Scraping

from easyscrape import scrape

result = scrape("https://example.com")

# Extract single element
title = result.css("h1")

# Extract all matching elements
links = result.css_list("a", "href")

# Extract structured data
data = result.extract({
    "title": "h1",
    "description": "meta[name=description]::attr(content)",
})

Multiple Items

books = result.extract_all(".product", {
    "title": "h3 a::attr(title)",
    "price": ".price::text",
    "url": "a::attr(href)",
})
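
The field strings above combine a CSS selector with a "::attr(name)" or "::text" suffix. As a rough sketch of how such a string could be split, assuming this suffix convention and nothing about EasyScrape's internals, consider the hypothetical helper parse_field below. Treating a bare selector as a request for text is also an assumption.

```python
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    selector: str              # plain CSS selector
    kind: str                  # "text" or "attr"
    attribute: Optional[str]   # attribute name when kind == "attr"


def parse_field(spec: str) -> FieldSpec:
    """Split 'css::attr(name)' or 'css::text' into its parts (hypothetical helper)."""
    selector, sep, suffix = spec.rpartition("::")
    if not sep:
        # No suffix at all: assume the element's text is wanted.
        return FieldSpec(spec, "text", None)
    if suffix == "text":
        return FieldSpec(selector, "text", None)
    if suffix.startswith("attr(") and suffix.endswith(")"):
        return FieldSpec(selector, "attr", suffix[len("attr("):-1])
    raise ValueError(f"unsupported suffix in {spec!r}")


title = parse_field("h3 a::attr(title)")
price = parse_field(".price::text")
heading = parse_field("h1")
```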

Configuration

from easyscrape import scrape, Config

config = Config(
    timeout=60.0,
    max_retries=5,
    rate_limit=1.0,  # 1 request/second
    cache_enabled=True,
)

result = scrape("https://example.com", config=config)
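
For intuition about what max_retries and rate_limit control, the sketch below computes an exponential backoff schedule and the spacing implied by a requests-per-second limit. The function names, the 0.5 second base delay, and the 30 second cap are assumptions for illustration; the library's actual defaults are not documented here.

```python
def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Delay before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def min_interval(rate_limit: float) -> float:
    """Seconds to wait between requests for a rate given in requests/second."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return 1.0 / rate_limit


delays = backoff_delays(max_retries=5)   # one delay per retry attempt
interval = min_interval(1.0)             # spacing for 1 request/second
```

Doubling the delay after each failure gives a struggling server progressively more room to recover, while the cap keeps the worst-case wait bounded.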

Async Scraping

import asyncio
from easyscrape import async_scrape_many

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results = await async_scrape_many(urls)
    return [r.css("h1") for r in results if r.ok]

titles = asyncio.run(main())
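
Concurrent scraping is usually bounded so that a long URL list does not open hundreds of connections at once. The standard-library sketch below shows one common pattern, a semaphore around each fetch. It does not describe how async_scrape_many works internally; fetch_all, fake_fetch, and max_concurrency are hypothetical names, and fake_fetch stands in for a real network call.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=10):
    """Run fetch(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:
                return exc  # keep going; the caller inspects failures

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_fetch(url):
    """Stand-in for a network request; fails for one URL on purpose."""
    await asyncio.sleep(0)
    if url.endswith("/3"):
        raise ConnectionError(url)
    return f"<h1>{url}</h1>"


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(fetch_all(urls, fake_fetch, max_concurrency=2))
pages = [r for r in results if not isinstance(r, Exception)]
```

Returning the exception instead of raising it plays the same role as the r.ok check above: one failed page does not discard the rest of the batch.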

JavaScript Rendering

from easyscrape import scrape, Config

config = Config(javascript=True)
result = scrape("https://spa-site.com", config=config)

CLI

# Get all links
easyscrape https://example.com --links

# Extract specific fields
easyscrape https://example.com -e title=h1 -e desc=.description

# Extract multiple items to CSV
easyscrape https://example.com -e name=.name -c .product -o data.csv -f csv

Error Handling

import time

from easyscrape import scrape
from easyscrape.exceptions import NetworkError, HTTPError, RateLimitHit

url = "https://example.com"

try:
    result = scrape(url)
except RateLimitHit:
    time.sleep(60)  # wait, then retry once
    result = scrape(url)
except HTTPError as e:
    print(f"HTTP {e.status_code}")
except NetworkError as e:
    print(f"Network error: {e}")
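
The handler above retries only once, and the second call is not guarded, so a repeated rate-limit error would propagate. A bounded retry loop is one way to generalize it. The sketch below is self-contained and uses a stand-in exception class; call_with_retries, RateLimited, and the injectable sleep parameter are hypothetical names, not part of EasyScrape.

```python
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error such as RateLimitHit."""


def call_with_retries(func, retry_on, attempts=3, delay=60.0, sleep=time.sleep):
    """Call func(); on a `retry_on` error wait `delay` seconds and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)


calls = {"n": 0}
waits = []


def flaky():
    """Fails twice with RateLimited, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"


# Passing waits.append as `sleep` records the delays instead of actually waiting.
outcome = call_with_retries(flaky, RateLimited, sleep=waits.append)
```

With the real library, the call would presumably look like call_with_retries(lambda: scrape(url), RateLimitHit), leaving HTTPError and NetworkError to the surrounding except clauses.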

Security

EasyScrape includes built-in protections:

  • SSRF protection: Blocks requests to localhost, private IPs, and cloud metadata endpoints
  • Path traversal prevention: Validates file paths in export functions
  • Safe defaults: SSL verification enabled, redirect limits enforced
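
To illustrate the first two protections, here is a simplified standard-library sketch of the kinds of checks involved. It is not EasyScrape's code: is_blocked_host and resolve_inside are hypothetical helpers, and a real SSRF guard would also resolve DNS names and check every resulting address, which this sketch deliberately skips.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlsplit


def is_blocked_host(url: str) -> bool:
    """True if the URL's host is localhost or a non-public IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: refuse rather than guess
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; real code must resolve and check it too
    return (address.is_private or address.is_loopback
            or address.is_link_local or address.is_reserved)


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if target != base and base not in target.parents:
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return target


metadata_blocked = is_blocked_host("http://169.254.169.254/latest/meta-data/")
public_blocked = is_blocked_host("https://example.com/")
```

The link-local address 169.254.169.254 is the metadata endpoint used by several cloud providers, which is why link-local ranges are rejected alongside private and loopback ones.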

License

MIT
