High-performance HTML parsing library for Python

scrape-rs (Python)

Python bindings for scrape-rs, a high-performance HTML parsing library.

Installation

pip install scrape-rs

Alternative package managers:

# uv (recommended; installs are typically 10-100x faster than with pip)
uv pip install scrape-rs

# Poetry
poetry add scrape-rs

# Pipenv
pipenv install scrape-rs

[!IMPORTANT] Requires Python 3.10 or later.
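
For scripts that may run under several interpreters, the version requirement can be checked at startup before importing the package. This is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of scrape-rs.

```python
import sys

# Minimum interpreter version stated in the note above.
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version=None):
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= MINIMUM_PYTHON


# Typical use at the top of a script:
#     if not meets_minimum():
#         raise SystemExit("scrape-rs requires Python 3.10 or later")
```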

Quick start

from scrape_rs import Soup

html = "<html><body><div class='content'>Hello, World!</div></body></html>"
soup = Soup(html)

div = soup.find("div")
print(div.text)
# Hello, World!

Usage

Find elements

from scrape_rs import Soup

html = "<div class='content'><p>First</p><p>Second</p></div>"
soup = Soup(html)

# Find first element by tag
div = soup.find("div")

# Find all elements
divs = soup.find_all("div")

# CSS selectors
for el in soup.select("div.content > p"):
    print(el.text)

Element properties

element = soup.find("a")

# Get text content
text = element.text

# Get inner HTML (named inner_html to avoid shadowing the html source string)
inner_html = element.inner_html

# Get attribute
href = element.get("href")

Batch processing

from scrape_rs import Soup

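# Process multiple documents in parallel
documents = [
    "<html><head><title>First</title></head><body></body></html>",
    "<html><head><title>Second</title></head><body></body></html>",
    "<html><head><title>Third</title></head><body></body></html>",
]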
soups = Soup.parse_batch(documents)

for soup in soups:
    print(soup.find("title").text)

[!TIP] Use Soup.parse_batch() when processing multiple documents. It parses them in parallel and uses all CPU cores automatically.
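
When the input is a large or lazily produced stream of documents, it may help to feed the batch parser fixed-size chunks rather than one huge list. The sketch below is not part of scrape-rs: `parse_in_chunks` and `chunk_size` are hypothetical names, and the code assumes the callable passed in (intended to be `Soup.parse_batch`) accepts a list of HTML strings and returns results in input order.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def parse_in_chunks(
    documents: Iterable[str],
    parse_batch: Callable[[Sequence[str]], Iterable[T]],
    chunk_size: int = 256,
) -> Iterator[T]:
    """Yield parsed documents, passing at most `chunk_size` inputs per batch call."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(documents)
    while True:
        # Pull the next chunk without materializing the whole input.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        # Assumption: parse_batch takes a list of HTML strings and
        # returns one result per input, in the same order.
        yield from parse_batch(chunk)
```

With the package installed, the intended call would look like `for soup in parse_in_chunks(html_stream, Soup.parse_batch): ...`. Chunking bounds peak memory at the cost of one batch call per chunk.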

Type hints

This package includes type stubs for full IDE support:

from scrape_rs import Soup, Tag

def extract_links(soup: Soup) -> list[str]:
    return [a.get("href") for a in soup.select("a[href]")]
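
A variant of the function above can resolve relative links against a base URL. This sketch relies only on the `select()` and `get()` methods shown in this README; the `Protocol` classes are hypothetical stand-ins so the example runs without the package, and the `Optional[str]` return type of `get()` is an assumption (the real signatures come from the package's type stubs).

```python
from typing import Iterable, Optional, Protocol
from urllib.parse import urljoin


class HasAttrs(Protocol):
    """Anything with a get(name) method, such as an element (assumed shape)."""

    def get(self, name: str) -> Optional[str]: ...


class Selectable(Protocol):
    """Anything with a select(selector) method, such as a parsed document."""

    def select(self, selector: str) -> Iterable[HasAttrs]: ...


def extract_absolute_links(soup: Selectable, base_url: str) -> list[str]:
    """Return every non-empty href in the document, resolved against base_url."""
    links: list[str] = []
    for a in soup.select("a[href]"):
        href = a.get("href")
        if href:  # skip missing or empty attribute values
            links.append(urljoin(base_url, href))
    return links
```

Because the function is typed against protocols rather than concrete classes, it can be unit-tested with small fake objects and should accept a `Soup` instance unchanged if the assumptions above hold.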

Performance

Compared to BeautifulSoup on the same HTML documents:

| Operation | Speedup |
|---|---|
| Parse (1 KB) | 9.7x faster |
| Parse (219 KB) | 9.2x faster |
| Parse (5.9 MB) | 10.6x faster |
| find(".class") | 132x faster |
| select(".class") | 40x faster |

[!TIP] Run python benches/compare_python.py from the project root to benchmark on your hardware.
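
For a quick check outside the project's benchmark script, a speedup figure like those in the table can be estimated with a small timing harness. This is a rough sketch, not the contents of `benches/compare_python.py`; the helper names `best_time` and `speedup` are hypothetical.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # Keep the minimum: it is the run least disturbed by other processes.
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

With both libraries installed, one might compare `best_time(lambda: BeautifulSoup(html, "html.parser"))` against `best_time(lambda: Soup(html))` and pass the two results to `speedup`. Results vary with document size and hardware.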

Related packages

Part of the scrape-rs project:

  • scrape-core — Rust core library
  • scrape-rs (npm) — Node.js bindings
  • @scrape-rs/wasm — Browser/WASM bindings

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.

Download files

Source Distribution

fast_scrape-0.1.0.tar.gz (73.5 kB)

Uploaded Source

Built Distribution

fast_scrape-0.1.0-cp314-cp314-macosx_11_0_arm64.whl (556.0 kB)

Uploaded CPython 3.14, macOS 11.0+ ARM64

File details

Details for the file fast_scrape-0.1.0.tar.gz.

File metadata

  • Download URL: fast_scrape-0.1.0.tar.gz
  • Upload date:
  • Size: 73.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for fast_scrape-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f6f61b700b400f38c5e720745450eb529f5e6ccae8e919a82b7b73b4360e5773
MD5 b7d9afbed7b46add9e0394e5ee053d2a
BLAKE2b-256 475a7c7b42c8b79060b175bd282e1c71a979b1263328404f70004a6fca5c0f5c

File details

Details for the file fast_scrape-0.1.0-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for fast_scrape-0.1.0-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fa081df9b38f9bfb287811ad73cd88e444c8e1d6f5b4ac6dcc30b65853bfd21e
MD5 eab3b1fa03ee53f0a45123a81281c024
BLAKE2b-256 023bd45861d6c5fde9387e310e1a9ef3efe363ba18df72b39840b36fdff459a8
