Scrapery: A fast, lightweight library to scrape HTML, XML, and JSON using XPath, CSS selectors, and intuitive DOM navigation.
Project description
🕷️ scrapery
A blazing fast, lightweight, and modern parsing library for HTML, XML, and JSON, designed for web scraping and data extraction.
It supports both XPath and CSS selectors, along with seamless DOM navigation, making parsing and extracting data straightforward and intuitive.
✨ Features
- ⚡ Blazing Fast Performance – Optimized for high-speed HTML, XML, and JSON parsing
- 🎯 Dual Selector Support – Use XPath or CSS selectors for flexible extraction
- 🛡 Comprehensive Error Handling – Detailed exceptions for different error scenarios
- 🔄 Async Support – Built-in async utilities for high-concurrency scraping
- 🧩 Robust Parsing – Encoding detection and content normalization for reliable results
- 🧑‍💻 Function-Based API – Clean and intuitive interface for ease of use
- 📦 Multi-Format Support – Parse HTML, XML, and JSON in a single library
⚡ Performance Comparison
The following benchmarks were run on sample HTML and JSON data to compare scrapery with other popular Python libraries.
| Library | HTML Parse Time | JSON Parse Time |
|---|---|---|
| scrapery | 12 ms | 8 ms |
| Other library | 120 ms | N/A |
⚠️ Actual performance may vary depending on your environment; these results are illustrative only. None of the libraries compared are endorsed by or affiliated with scrapery.
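Since results vary by environment, it is easy to measure parse times yourself. The sketch below uses only the standard library's `timeit` and `json` modules as a baseline harness; the same pattern can wrap any parser call you want to compare.

```python
import json
import timeit

payload = '{"users": [{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]}'

# Time 10,000 parses of a small JSON document and report the per-parse cost.
total = timeit.timeit(lambda: json.loads(payload), number=10_000)
per_parse_ms = total / 10_000 * 1000
print(f"stdlib json.loads: {per_parse_ms:.4f} ms per parse")
```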
📦 Installation
pip install scrapery
# -------------------------------
# HTML Example
# -------------------------------
import scrapery as scrape
html_content = """
<html>
<body>
<h1>Welcome</h1>
<p>Hello<br>World</p>
<a href="/about">About Us</a>
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>John</td><td>30</td></tr>
<tr><td>Jane</td><td>25</td></tr>
</table>
</body>
</html>
"""
# Parse HTML content
doc = scrape.parse_html(html_content)
# Extract text
# CSS selector: First <h1>
print(scrape.get_selector_content(doc, selector="h1"))
# ➜ Welcome
# XPath: First <h1>
print(scrape.get_selector_content(doc, selector="//h1"))
# ➜ Welcome
# CSS selector: <a href> attribute
print(scrape.get_selector_content(doc, selector="a", attr="href"))
# ➜ /about
# XPath: <a> element href
print(scrape.get_selector_content(doc, selector="//a", attr="href"))
# ➜ /about
# CSS: First <td> in table (John)
print(scrape.get_selector_content(doc, selector="td"))
# ➜ John
# XPath: Second <td> (//td[2] = 30)
print(scrape.get_selector_content(doc, selector="//td[2]"))
# ➜ 30
# XPath: Jane's age (//tr[3]/td[2])
print(scrape.get_selector_content(doc, selector="//tr[3]/td[2]"))
# ➜ 25
# No CSS selector or XPath: full document text
print(scrape.get_selector_content(doc))
# ➜ Welcome HelloWorld About Us Name Age John 30 Jane 25
# Root element attribute (lang is not set here, so None is returned)
print(scrape.get_selector_content(doc, attr="lang"))
# ➜ None
#-------------------------
# DOM navigation
#-------------------------
# Example 1: parent, children, siblings
p_elem = scrape.select_one(doc, "p")
print("Parent tag of <p>:", scrape.parent(p_elem).tag)
print("Children of <p>:", [c.tag for c in scrape.children(p_elem)])
print("Siblings of <p>:", [s.tag for s in scrape.siblings(p_elem)])
# Example 2: next_sibling, prev_sibling
print("Next sibling of <p>:", scrape.next_sibling(p_elem).tag)
print("Previous sibling of <p>:", scrape.prev_sibling(p_elem).tag)
# Example 3: ancestors and descendants
ancs = scrape.ancestors(p_elem)
print("Ancestor tags of <p>:", [a.tag for a in ancs])
desc = scrape.descendants(scrape.select_one(doc, "table"))
print("Descendant tags of <table>:", [d.tag for d in desc])
# Example 4: class utilities
div_html = '<div class="card primary"></div>'
div_elem = scrape.parse_html(div_html)
print("Has class 'card'? ->", scrape.has_class(div_elem, "card"))
print("Classes:", scrape.get_classes(div_elem))
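The class utilities above follow the HTML convention that the `class` attribute is a whitespace-separated token list, so membership checks reduce to a split. A plain-Python illustration of that rule (not scrapery's internals, which may differ):

```python
# HTML's class attribute is a whitespace-separated token list,
# so a class check is simply membership in the split tokens.
class_attr = "card primary"
print("card" in class_attr.split())  # → True
print(class_attr.split())            # → ['card', 'primary']
```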
# -------------------------------
# Resolve relative URLs
# -------------------------------
html = """
<html>
<body>
<a href="/about">About</a>
<img src="/images/logo.png">
</body>
</html>
"""
doc = scrape.parse_html(html)
base = "https://example.com"
# Resolve the <a> href against the base URL
print(scrape.get_absolute_url(doc, "a", base_url=base))
# → 'https://example.com/about'
# Resolve the <img> src against the base URL
print(scrape.get_absolute_url(doc, "img", base_url=base, attr="src"))
# → 'https://example.com/images/logo.png'
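Relative-URL resolution follows the standard rules implemented by the standard library's `urllib.parse.urljoin`; the two calls above can be reproduced directly with it:

```python
from urllib.parse import urljoin

base = "https://example.com"
# Join relative paths against the base, per RFC 3986 resolution rules.
print(urljoin(base, "/about"))            # → https://example.com/about
print(urljoin(base, "/images/logo.png"))  # → https://example.com/images/logo.png
```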
# Extract tables (re-parse the first HTML document, which contains a <table>)
doc = scrape.parse_html(html_content)
tables = scrape.get_table_content(doc, as_dicts=True)
print("Tables:", tables)
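The `as_dicts=True` form presumably pairs header cells with each row's cells; the transformation itself is easy to sketch in plain Python (row data taken from the sample table above; scrapery's exact output shape may differ):

```python
# Header row followed by data rows, as found in the sample table.
header = ["Name", "Age"]
rows = [["John", "30"], ["Jane", "25"]]

# Zip each data row against the header to build one dict per row.
records = [dict(zip(header, row)) for row in rows]
print(records)
# → [{'Name': 'John', 'Age': '30'}, {'Name': 'Jane', 'Age': '25'}]
```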
# -------------------------------
# XML Example
# -------------------------------
xml_content = """
<users>
<user id="1"><name>John</name></user>
<user id="2"><name>Jane</name></user>
</users>
"""
xml_doc = scrape.parse_xml(xml_content)
users = scrape.find_xml_all(xml_doc, "//user")
for u in users:
    print(u.attrib, u.xpath("./name/text()")[0])
# Convert XML to dict
xml_dict = scrape.xml_to_dict(xml_doc)
print(xml_dict)
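For comparison, the XML-to-dict idea can be sketched with the standard library's `xml.etree.ElementTree`. This is a simplified recursion of my own, not scrapery's implementation, so the output shape may differ:

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Recursively convert an Element into a plain dict (simplified sketch)."""
    node = dict(elem.attrib)                 # start with the element's attributes
    children = list(elem)
    if children:
        for child in children:               # group repeated child tags into lists
            node.setdefault(child.tag, []).append(element_to_dict(child))
    elif elem.text and elem.text.strip():
        node["text"] = elem.text.strip()     # leaf element: keep its text
    return node

root = ET.fromstring("""
<users>
  <user id="1"><name>John</name></user>
  <user id="2"><name>Jane</name></user>
</users>
""")
print(element_to_dict(root))
```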
# -------------------------------
# JSON Example
# -------------------------------
json_content = '{"users":[{"name":"John","age":30},{"name":"Jane","age":25}]}'
data = scrape.parse_json(json_content)
# Access using path
john_age = scrape.json_get_value(data, "users.0.age")
print("John's age:", john_age)
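A dotted path such as `"users.0.age"` can be resolved by walking the parsed structure one segment at a time, treating numeric segments as list indices. A minimal sketch of that idea in plain Python (a hypothetical `get_path` helper, not scrapery's implementation):

```python
import json

def get_path(data, path):
    """Walk a dotted path, using numeric segments as list indices."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]   # list: segment is an index
        else:
            current = current[segment]        # dict: segment is a key
    return current

data = json.loads('{"users":[{"name":"John","age":30},{"name":"Jane","age":25}]}')
print(get_path(data, "users.0.age"))
# → 30
```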
# Extract all names
names = scrape.json_extract_values(data, "name")
print("Names:", names)
# Flatten JSON
flat = scrape.json_flatten(data)
print("Flattened JSON:", flat)
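Flattening typically maps each leaf value to its dotted path. A compact stdlib sketch of the technique (scrapery's exact key format may differ):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted_path: leaf_value}."""
    flat = {}
    # Dicts iterate over key/value pairs; lists over index/value pairs.
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, path))  # recurse into containers
        else:
            flat[path] = value                 # record the leaf
    return flat

data = {"users": [{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]}
print(flatten(data))
# → {'users.0.name': 'John', 'users.0.age': 30, 'users.1.name': 'Jane', 'users.1.age': 25}
```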
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file scrapery-0.0.2.tar.gz.
File metadata
- Download URL: scrapery-0.0.2.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc1dd4f188ff0a12f80c7f0cb2baa378ad1a7dbf259f061060739c9f76a12fb2 |
| MD5 | e46430d49b24ea5c440a4d84957722f0 |
| BLAKE2b-256 | 5bbcc0e2192ac1b90ff6b6b13cee75afb0fc055ef0d34b1c50e3a95890d01632 |
File details
Details for the file scrapery-0.0.2-py3-none-any.whl.
File metadata
- Download URL: scrapery-0.0.2-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06df029000e9c52a50c45cc48597050e2fa918222eb108d8610cb73e5cc04f0f |
| MD5 | a5f018603fe557c67880fd1af26fddaf |
| BLAKE2b-256 | 185419c8dc870199e6296888ed9f52903c0badf428b19e95ea40e9cf7da45f7a |