
Fast, ergonomic HTML parsing for Python with a Rust core and Pydantic extraction


xhtml



You built a fast data pipeline. Then you added an HTML parser.

AI pipelines today scrape thousands — sometimes millions — of pages to feed context into agents, build knowledge bases, run competitive intelligence, and power real-time decision making. The HTTP layer? Async, concurrent, non-blocking. Your infrastructure? Horizontally scaled.

Then your agent hands the raw HTML to a pure-Python parser, and the whole pipeline grinds to a halt.

Processing 1,000 pages (100 KB each) with a standard Python parser takes ~37 seconds. With xhtml, it takes ~1.1 seconds. That is not a micro-optimisation — it is the difference between a pipeline that responds in near-real-time and one that your users are waiting on.

1,000 pages × 100 KB each
─────────────────────────────────────────────────────────
standard Python parser   ████████████████████████████████  37 s
xhtml                    █  1.1 s                           (~34× faster)

This is not a toy benchmark on contrived data. At scale, your parser is the bottleneck — and now it does not have to be.

from xhtml import Xhtml  # one-line drop-in replacement

soup = Xhtml(html, "html.parser")

titles  = soup.find_all("h2", class_="post-title")
link    = soup.select_one("nav a.active")["href"]
summary = soup.find("p", class_="intro").get_text(strip=True)

What is xhtml?

xhtml is a Python library for parsing and querying HTML/XML, built for developers who cannot afford the performance tax of pure-Python parsing engines. It exposes the same clean, ergonomic API you already know — while a Rust engine handles every byte underneath.

Already using BeautifulSoup or another Python parser? xhtml is a single-import swap — see Migration.


Why xhtml?

A Python API — with no Python in the hot path

Three classic bottlenecks of pure-Python HTML parsing:

  1. Tokeniser — walks the document in Python, character-by-character.
  2. Python object tree — every tag becomes a Python object with GC overhead. A 500 KB page creates ~2,000 objects, fragments the heap, and stresses the garbage collector.
  3. Python query engine — find_all("div", class_="foo") iterates every node in Python, comparing strings one by one.

xhtml moves all three stages into Rust:
    Your Python code
          │
          ▼
  xhtml Python API   ← clean, expressive
          │  PyO3 bindings
          ▼
   Rust engine (_core)
    ├─ html5ever         ← streaming spec-compliant HTML5 parser
    ├─ arena tree        ← memory-contiguous, zero GC pressure
    ├─ DFS query engine  ← fast string ops, no Python overhead
    └─ CSS selector eng  ← battle-tested scraper crate

Python objects you get back are lightweight wrappers — just a node ID + a shared reference. No data is ever copied from the Rust tree.
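
A rough way to see this from Python: sys.getsizeof reports only the shallow size of the wrapper object, which stays tiny and constant no matter how large the parsed document is.

import sys
from xhtml import Xhtml

small_tag = Xhtml("<p>hi</p>", "html.parser").find("p")
large_tag = Xhtml("<p>hi</p>" * 10_000, "html.parser").find("p")

# Both wrappers are the same small size; the trees themselves live in Rust memory.
print(sys.getsizeof(small_tag), sys.getsizeof(large_tag))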

Pydantic-native structured extraction

Turn HTML directly into typed, validated data models — without a single loop. Define what you want; xhtml delivers:

from xhtml.extract import HtmlModel, Field
from typing import List

class Article(HtmlModel):
    title:   str       = Field(selector="h1")
    url:     str       = Field(selector="a.read-more", attr="href", default="#")
    summary: str       = Field(selector="p.intro",     default="")
    tags:    List[str] = Field(selector=".tag",         multiple=True, default_factory=list)

article  = Article.from_html(html)
articles = Article.from_html_list(page_html, item_selector="article.post")

Benchmarks

Operations measured on realistic article HTML, 50 iterations, Linux x86_64, Python 3.12, Intel Core i7:

Operation                    Size          pure-Python parser   xhtml     Speedup
Xhtml(html)                  20 KB         7.1 ms               0.21 ms   ~34×
Xhtml(html)                  100 KB        37 ms                1.1 ms    ~33×
Xhtml(html)                  500 KB        188 ms               5.4 ms    ~35×
find_all("a")                100 KB        38 ms                1.3 ms    ~29×
find_all(class_="title")     100 KB        39 ms                1.2 ms    ~33×
select("article h2.title")   100 KB        42 ms                1.2 ms    ~36×
get_text() full page         100 KB        37 ms                1.1 ms    ~34×
Process 1,000 pages          100 KB each   ~37 s                ~1.1 s    ~34×

Run your own: python tests/benchmark.py
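
For a quick spot check without the suite, a minimal sketch using only the stdlib (page.html stands in for your own sample file):

import timeit
from xhtml import Xhtml

html = open("page.html", encoding="utf-8").read()
per_parse = timeit.timeit(lambda: Xhtml(html, "html.parser"), number=50) / 50
print(f"{per_parse * 1000:.2f} ms per parse")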

Comparison with popular alternatives

Library                   Speed (vs. baseline)   Expressive API      Structured extraction   Migration effort
Pure-Python html.parser   1× (baseline)          ✅ reference API    ❌                      n/a
xhtml                     ~34×                   ✅ same interface   ✅ Pydantic-native      minimal
lxml                      ~5×                    ⚠️ ElementTree      ❌                      high
selectolax                ~12×                   ⚠️ Limited          ❌                      high
parsel                    ~7×                    ⚠️ XPath-centric    ❌                      high
html5-parser              ~8×                    ❌ Parse only       ❌                      n/a

Installation

pip install xhtml

Pre-compiled wheels ship for:

  • Linux x86_64 / aarch64 (manylinux)
  • macOS x86_64 / arm64 (M1 / M2 / M3)
  • Windows x86_64

No Rust toolchain required. No system dependencies.
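
A quick smoke test after installing, using only the documented API:

from xhtml import Xhtml

print(Xhtml("<p>ok</p>", "html.parser").get_text())  # ok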


Built for the AI era

Modern AI applications do not scrape one page — they scrape millions. Whether you are building a RAG pipeline, a web-crawling agent, competitive intelligence tooling, or a data extraction service, the HTML parsing layer is the silent tax on every operation.

At 34× the throughput of a standard Python parser, xhtml turns that tax into a rounding error.

Common patterns

Async agent pipeline — feed an LLM from thousands of URLs

import asyncio, httpx
from xhtml.extract import HtmlModel, Field
from typing import List

class PageContent(HtmlModel):
    title:    str       = Field(selector="h1")
    body:     str       = Field(selector="article, main, .content", default="")
    links:    List[str] = Field(selector="a", attr="href", multiple=True, default_factory=list)

async def fetch_and_parse(url: str, client: httpx.AsyncClient) -> PageContent:
    resp = await client.get(url, timeout=10)
    resp.raise_for_status()  # fail fast instead of parsing an error page
    return PageContent.from_html(resp.text)

async def scrape_all(urls: list[str]) -> list[PageContent]:
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*[fetch_and_parse(u, client) for u in urls])
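
A minimal driver for the pipeline above (the URL list is illustrative):

if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    pages = asyncio.run(scrape_all(urls))
    print(f"parsed {len(pages)} pages")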

Bulk pipeline — max CPU throughput with threads

from xhtml.extract import HtmlModel, Field
import concurrent.futures

class Product(HtmlModel):
    name:  str   = Field(selector="h1.product-name")
    price: float = Field(selector=".price", transform=lambda s: float(s.lstrip("$")))
    sku:   str   = Field(selector="[data-sku]", attr="data-sku", default="")

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    products = list(pool.map(lambda h: Product.from_html(h), raw_html_pages))
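
If thread throughput plateaus in your environment (how much threads help depends on time spent inside the extension), a process pool is a near drop-in alternative sketch. Note the module-level function: process pools cannot pickle lambdas.

import concurrent.futures

def parse_one(h: str) -> Product:
    return Product.from_html(h)

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as pool:
        products = list(pool.map(parse_one, raw_html_pages, chunksize=64))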

Competitive intelligence — structured extraction at scale

from xhtml import Xhtml

def extract_pricing(html: str) -> dict:
    soup  = Xhtml(html, "html.parser")
    plans = {}
    for card in soup.select(".pricing-card"):
        name  = card.select_one(".plan-name")
        price = card.select_one(".price")
        if name is not None and price is not None:   # skip malformed cards
            plans[name.get_text(strip=True)] = price.get_text(strip=True)
    return plans

Quick start

from xhtml import Xhtml

html = """
<html>
  <head><title>My Site</title></head>
  <body>
    <h1 class="title hero">Welcome</h1>
    <ul id="nav">
      <li><a href="/home">Home</a></li>
      <li><a href="/about" class="active">About</a></li>
    </ul>
    <p class="intro">A short intro paragraph.</p>
  </body>
</html>
"""

soup = Xhtml(html, "html.parser")

# Find by tag & class
h1 = soup.find("h1", class_="hero")
print(h1.get_text())                    # Welcome
print(h1["class"])                      # ['title', 'hero']

# CSS selectors
active = soup.select_one("ul#nav a.active")
print(active["href"])                   # /about
print([a["href"] for a in soup.select("ul a")])  # ['/home', '/about']

# Tree navigation
print(h1.parent.name)                   # body
print(list(h1.strings))                 # ['Welcome']

# Intro text
print(soup.find("p", class_="intro").get_text(strip=True))

Structured extraction with Pydantic

xhtml.extract lets you declare typed data models and fill them from HTML in a single call — no loops, no scattered .get_text(), no manual attribute access.

Basic model

from xhtml.extract import HtmlModel, Field

class Product(HtmlModel):
    name:  str   = Field(selector="h1.product-name")
    price: float = Field(
        selector=".price",
        transform=lambda s: float(s.replace("$", "").replace(",", "")),
    )
    image: str   = Field(selector="img.hero", attr="src", default="")
    in_stock: bool = Field(
        selector=".stock-badge",
        transform=lambda s: "in stock" in s.lower(),
        default=False,
    )

product = Product.from_html(html)
print(product.name)       # "Rust in Action"
print(product.price)      # 29.99
print(product.in_stock)   # True

Extracting repeated items

from typing import List

class SearchResult(HtmlModel):
    title: str       = Field(selector="h3")
    url:   str       = Field(selector="a",      attr="href", default="")
    blurb: str       = Field(selector="p.desc", default="")

# One model per matching element
results = SearchResult.from_html_list(page_html, item_selector=".result-card")
for r in results:
    print(r.title, r.url)

From an already-parsed tag

soup = Xhtml(page_html, "html.parser")
for card in soup.select(".result-card"):
    result = SearchResult.from_tag(card)
    print(result.title)

Field options

Parameter         Type                          Description
selector          str                           CSS selector to locate the element
attr              str | None                    Attribute to read ("href", "src", …). None = inner text
multiple          bool                          Return a List of all matches instead of the first
strip             bool                          Strip surrounding whitespace from text (default True)
transform         Callable[[str], Any] | None   Post-process each raw string value
default           Any                           Value used when no element is found
default_factory   Callable                      Factory for mutable defaults (e.g. list)
description       str                           Forwarded to the Pydantic schema
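
A short sketch combining several of these options (the selector is illustrative):

from typing import List
from xhtml.extract import HtmlModel, Field

class PriceList(HtmlModel):
    prices: List[float] = Field(
        selector=".price",
        multiple=True,                             # every match, not just the first
        transform=lambda s: float(s.lstrip("$")),  # applied to each raw string
        default_factory=list,                      # safe default when nothing matches
    )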

Full API reference

Parsing

from xhtml import Xhtml

# All standard parser names are accepted (xhtml uses the same Rust engine regardless)
soup = Xhtml(html_string, "html.parser")  # recommended
soup = Xhtml(html_string, "lxml")          # same engine, alias for compat
soup = Xhtml(html_string, "html5lib")      # same engine, alias for compat

# Bytes input (encoding auto-detected)
soup = Xhtml(html_bytes, "html.parser")

Searching

import re  # needed for the regex example below

# By tag name
soup.find("div")
soup.find_all("a")

# By class
soup.find("p", class_="intro")
soup.find_all(class_="card")

# By id
soup.find(id="main")

# By attribute
soup.find("a", href="/about")
soup.find_all("input", type="text")
soup.find("a", href=True)           # any element that has href
soup.find_all("a", href=re.compile(r"https?://"))  # regex

# Multiple tag names
soup.find_all(["h1", "h2", "h3"])

# CSS selectors
soup.select("div.container > p.intro a")
soup.select_one("#main .title")

# Lambda / callable
soup.find_all(lambda tag: tag.name == "a" and tag.has_attr("data-id"))

# Limit results
soup.find_all("a", limit=5)

Extracting content

tag.get_text()                     # all text, concatenated
tag.get_text(" | ", strip=True)    # separator + strip whitespace
tag.text                           # alias for get_text()
tag.string                         # text if single text child, else None
tag.strings                        # iterator over all text nodes
tag.stripped_strings               # stripped, non-empty strings

Attribute access

tag["href"]                        # raises KeyError if missing
tag.get("href")                    # returns None if missing
tag.get("href", "#")               # custom default value
tag.has_attr("class")              # bool
tag.attrs                          # full dict (class is a list)
tag["class"]                       # list: ["foo", "bar"]

Tree navigation

tag.parent                         # immediate parent Tag
tag.parents                        # generator up to root
tag.children                       # direct children (generator)
tag.contents                       # direct children (list)
tag.descendants                    # all descendants (generator)
tag.next_sibling                   # next sibling node
tag.previous_sibling               # previous sibling node
tag.next_siblings                  # generator of next siblings
tag.previous_siblings              # generator of previous siblings

tag.find_parent("div")
tag.find_parents("div", limit=2)
tag.find_next_sibling("p")
tag.find_next_siblings("p")

Rendering

str(tag)                           # outer HTML
tag.encode("utf-8")                # outer HTML as bytes
tag.decode_contents()              # inner HTML (children only)
tag.prettify()                     # indented HTML

Migration

If you already use beautifulsoup4, switching to xhtml takes one import line:

# Before
from bs4 import BeautifulSoup

# After — only this line changes
from xhtml import Xhtml

The parsing, searching, and navigation API is designed to behave identically. Run your existing test suite — it should pass without changes.
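
If you would rather not touch call sites at all, aliasing at import time also works, assuming your code stays within the supported API:

from xhtml import Xhtml as BeautifulSoup  # the rest of the module is unchanged

soup = BeautifulSoup(html, "html.parser")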

Currently unsupported (v0.x — planned for v0.2)

Feature                                                        Workaround
In-place tree modification (tag.decompose(), insert(), etc.)   Parse the result, then transform in Python (see the sketch below)
SoupStrainer                                                   Use find_all with limit=
prettify() with precise indent rules                           Use str(tag) + a dedicated formatter
Callable formatter in encode()                                 Post-process in Python
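
As a crude illustration of the tree-modification workaround: query with xhtml, then edit the source string in Python. This is fragile whenever the serialized form differs from the source bytes, so treat it as a pattern sketch, not a recipe (.ad-banner is a hypothetical selector):

soup = Xhtml(html, "html.parser")
banner = soup.select_one(".ad-banner")
cleaned = html.replace(str(banner), "") if banner else html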

Development setup

Prerequisites

  • Rust ≥ 1.75 — install rustup
  • Python ≥ 3.8
  • pip install maturin pydantic

Build & install for development

git clone https://github.com/LimaBD/xhtml
cd xhtml
bash scripts/dev_install.sh

Or manually:

pip install maturin
maturin develop --release

Run tests

bash scripts/run_tests.sh

# Or directly
pytest tests/

Run benchmarks

bash scripts/run_benchmarks.sh

# Custom iteration count
bash scripts/run_benchmarks.sh 500

Project structure

xhtml/
├── Cargo.toml                ← Rust package definition
├── pyproject.toml            ← Python package (maturin build system)
├── native/
│   ├── lib.rs                ← PyO3 module: RustDocument, RustNode, RustQuery
│   └── query.rs              ← DFS search engine + CSS match logic
├── src/
│   └── xhtml/
│       ├── __init__.py       ← Public API surface
│       ├── element.py        ← Tag, NavigableString, Xhtml wrappers
│       ├── extract.py        ← Pydantic-based structured extraction
│       └── _compat.py        ← Compatibility aliases
├── tests/
│   ├── conftest.py           ← Shared fixtures & HTML samples
│   ├── test_compat.py        ← Parser API tests (dual-mode: xhtml + bs4)
│   ├── test_advanced.py      ← Edge cases, regex, lambdas, iterators
│   ├── test_extract.py       ← Pydantic extraction tests
│   └── benchmark.py          ← Performance benchmark suite
├── scripts/
│   ├── dev_install.sh        ← One-command dev setup
│   ├── build.sh              ← Build release wheel
│   ├── run_tests.sh          ← Run full test suite
│   ├── run_benchmarks.sh     ← Run benchmarks
│   └── publish.sh            ← Publish to PyPI / TestPyPI
└── .github/workflows/
    ├── ci.yml                ← Tests on every push/PR
    └── publish.yml           ← Build + publish wheels on tag

Architecture deep-dive

How the Rust engine works

Input HTML string
        │
        ▼
html5ever (Rust) ─── streaming, spec-compliant HTML5 parser ───▶ ego-tree
        │
        ▼
Arc/Rc<Html>  ──  single allocation, all nodes in contiguous memory
        │
 ┌──────┴──────┐
 │  RustNode   │  ── NodeId (8 bytes) + Rc pointer ── Python object cost: ~40 bytes
 └─────────────┘
        │ PyO3
        ▼
     Tag  ──  Python wrapper ── delegates ALL work to Rust via FFI

Memory model

An Xhtml object holds one Rc<Html> — the entire tree lives once in Rust memory. Every Tag you get back is a tiny Python object (a NodeId + Rc clone). Dereferencing a node is an O(1) memory lookup.

Compare this to a pure-Python parser: a typical page creates ~2,000 full Python objects, each with name, attrs, contents, parent, next_sibling, prev_sibling — all Python attributes, all GC-tracked.
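
A rough way to observe the difference (gc.get_objects counts live GC-tracked Python objects, so this is illustrative rather than precise):

import gc
from xhtml import Xhtml

big_html = "<div><p>hi</p></div>" * 5_000   # stand-in for a large page

gc.collect()
before = len(gc.get_objects())
soup = Xhtml(big_html, "html.parser")       # Rust-backed tree
print(len(gc.get_objects()) - before)       # small: no per-tag Python objects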

Query engine

find_all("div", class_="foo") compiles to:

stack-based DFS over ego-tree nodes
  → match: name == "div" AND "foo" ∈ class_set
  → collect NodeIds → wrap in Tag objects

All string comparisons happen in Rust, using LLVM-optimised byte comparison. Python is only invoked to wrap the final results.
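
In Python terms, the native search is roughly equivalent to the sketch below (the Node type is a stand-in for ego-tree nodes, which the real engine walks by NodeId):

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Node:                                    # illustrative stand-in
    id: int
    name: str
    classes: Set[str]
    children: List["Node"] = field(default_factory=list)

def find_all_div_foo(root: Node) -> List[int]:
    stack, hits = [root], []
    while stack:
        node = stack.pop()
        if node.name == "div" and "foo" in node.classes:
            hits.append(node.id)
        stack.extend(reversed(node.children))  # reversed so pop() preserves document order
    return hits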


Contributing

Contributions are welcome! Please:

  1. Fork the repo and create a branch.
  2. Make your changes.
  3. Run pytest tests/ — all tests must pass.
  4. Run cargo clippy — no warnings.
  5. Open a PR.

Reporting issues

Please include:

  • The HTML you're parsing (or a minimal repro)
  • The output you expected vs. what you got
  • Your Python/OS version

License

MIT — see LICENSE.


Documentation

Guide                   Description
Quick Start             Get up and running in five minutes
API Reference           Complete reference for every method
Structured Extraction   Pydantic models, Field options, and patterns
Migration Guide         Drop-in replacement from BeautifulSoup / lxml

Acknowledgements

xhtml is built on these excellent projects:

  • PyO3 — Rust ↔ Python bindings
  • scraper — HTML parsing + CSS selectors
  • html5ever — Spec-compliant HTML5 parser from the Servo project
  • ego-tree — Arena-allocated tree
  • maturin — Build Rust extensions for Python
  • Pydantic — Structured data validation
