High-performance HTML parsing library for Python
scrape-rs (Python)
Python bindings for scrape-rs, a high-performance HTML parsing library.
Installation
pip install scrape-rs
Alternative package managers:
# uv (recommended - 10-100x faster)
uv pip install scrape-rs
# Poetry
poetry add scrape-rs
# Pipenv
pipenv install scrape-rs
[!IMPORTANT] Requires Python 3.10 or later.
Quick start
from scrape_rs import Soup
html = "<html><body><div class='content'>Hello, World!</div></body></html>"
soup = Soup(html)
div = soup.find("div")
print(div.text)
# Hello, World!
Usage
Find elements
from scrape_rs import Soup
soup = Soup(html)
# Find first element by tag
div = soup.find("div")
# Find all elements
divs = soup.find_all("div")
# CSS selectors
for el in soup.select("div.content > p"):
    print(el.text)
Element properties
element = soup.find("a")
# Get text content
text = element.text
# Get inner HTML
html = element.inner_html
# Get attribute
href = element.get("href")
Batch processing
from scrape_rs import Soup
# Process multiple documents in parallel
documents = [html1, html2, html3]
soups = Soup.parse_batch(documents)
for soup in soups:
    print(soup.find("title").text)
[!TIP] Use parse_batch() for processing multiple documents. It uses all CPU cores automatically.
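The loop above assumes every document contains a title element. The following is a minimal sketch of a more defensive version. It assumes (this README does not confirm it) that find() returns None when nothing matches, as BeautifulSoup does. The helper name collect_titles is hypothetical, and it relies only on the find() method and text property shown above.

```python
from typing import Any, Iterable, Optional


def collect_titles(
    soups: Iterable[Any], default: Optional[str] = None
) -> list[Optional[str]]:
    """Return the <title> text of each parsed document, or `default` if absent.

    Assumption: soup.find("title") returns None when no element matches.
    """
    titles: list[Optional[str]] = []
    for soup in soups:
        title = soup.find("title")
        # Fall back to the default instead of raising AttributeError on None.
        titles.append(title.text.strip() if title is not None else default)
    return titles
```

With the batch from the example above, this could be called as collect_titles(Soup.parse_batch(documents), default="").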
Type hints
This package includes type stubs for full IDE support:
from scrape_rs import Soup, Tag
def extract_links(soup: Soup) -> list[str]:
    return [a.get("href") for a in soup.select("a[href]")]
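extract_links returns href values exactly as they appear in the markup, so relative links stay relative. As a sketch of one possible follow-up step, the hypothetical helper below resolves them against a base URL using only the standard library. It is not part of scrape-rs.

```python
from typing import Iterable, Optional
from urllib.parse import urljoin


def absolutize(hrefs: Iterable[Optional[str]], base_url: str) -> list[str]:
    """Resolve each href against base_url, skipping empty or missing values."""
    return [urljoin(base_url, href) for href in hrefs if href]
```

For example, absolutize(extract_links(soup), "https://example.com/dir/page.html") would turn "b.html" into "https://example.com/dir/b.html" and leave absolute URLs unchanged.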
Performance
Compared to BeautifulSoup on the same HTML documents:
| Operation | Speedup |
|---|---|
| Parse (1 KB) | 9.7x faster |
| Parse (219 KB) | 9.2x faster |
| Parse (5.9 MB) | 10.6x faster |
| find(".class") | 132x faster |
| select(".class") | 40x faster |
[!TIP] Run python benches/compare_python.py from the project root to benchmark on your hardware.
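For a quick check on your own documents without the project checkout, a small timing harness along these lines may be enough. This is a sketch built on the standard-library timeit module, not the project's benchmark script. The function names best_time and speedup are hypothetical, and the parsers are passed in as plain callables.

```python
import timeit
from typing import Callable


def best_time(
    func: Callable[[str], object], html: str, repeat: int = 5, number: int = 10
) -> float:
    """Best per-call wall time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(html), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings) / number


def speedup(
    baseline: Callable[[str], object],
    candidate: Callable[[str], object],
    html: str,
) -> float:
    """How many times faster `candidate` is than `baseline` on `html`."""
    return best_time(baseline, html) / best_time(candidate, html)
```

Assuming BeautifulSoup is installed, a parse comparison could look like speedup(lambda h: BeautifulSoup(h, "html.parser"), Soup, html). Results depend on the document and the hardware, so they will not necessarily match the table above.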
Related packages
Part of the scrape-rs project:
- scrape-core: Rust core library
- scrape-rs (npm): Node.js bindings
- @scrape-rs/wasm: Browser/WASM bindings
License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.
File details
Details for the file fast_scrape-0.1.0.tar.gz.
File metadata
- Download URL: fast_scrape-0.1.0.tar.gz
- Upload date:
- Size: 73.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6f61b700b400f38c5e720745450eb529f5e6ccae8e919a82b7b73b4360e5773 |
| MD5 | b7d9afbed7b46add9e0394e5ee053d2a |
| BLAKE2b-256 | 475a7c7b42c8b79060b175bd282e1c71a979b1263328404f70004a6fca5c0f5c |
File details
Details for the file fast_scrape-0.1.0-cp314-cp314-macosx_11_0_arm64.whl.
File metadata
- Download URL: fast_scrape-0.1.0-cp314-cp314-macosx_11_0_arm64.whl
- Upload date:
- Size: 556.0 kB
- Tags: CPython 3.14, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa081df9b38f9bfb287811ad73cd88e444c8e1d6f5b4ac6dcc30b65853bfd21e |
| MD5 | eab3b1fa03ee53f0a45123a81281c024 |
| BLAKE2b-256 | 023bd45861d6c5fde9387e310e1a9ef3efe363ba18df72b39840b36fdff459a8 |