An HTML5 parser.
Dompa
A zero-dependency HTML5 document parser. It takes an HTML string, parses it into a node tree, and provides an API for querying and manipulating that tree.
Install
pip install dompa
Requires Python 3.10 or higher.
Usage
The most basic usage looks like this:
from dompa import Dompa
dom = Dompa("<div>Hello, World</div>")
# Get the tree of nodes
nodes = dom.nodes()
# Get the HTML string
html = dom.html()
DOM manipulation
You can query the node tree to retrieve nodes or to modify them.
query
You can find nodes with the query method. It takes a Callable that receives a Node and returns a bool: True to
include the node in the results, False to skip it, like so:
from dompa import Dompa
dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")
list_items = dom.query(lambda n: n.name == "li")
All nodes returned with query are deep copies, so mutating them has no effect on Dompa's state.
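The copy semantics can be illustrated without Dompa itself. The sketch below is a hypothetical stand-in, not Dompa's implementation: SketchNode and query_copies are names invented for this example, and the order in which nodes are visited is an assumption. It only shows why mutating a deep-copied result leaves the original tree alone.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not Dompa's Node class."""
    name: str
    children: list["SketchNode"] = field(default_factory=list)


def query_copies(
    nodes: list[SketchNode],
    predicate: Callable[[SketchNode], bool],
) -> list[SketchNode]:
    """Return deep copies of every node, at any depth, that matches predicate."""
    found: list[SketchNode] = []
    for node in nodes:
        if predicate(node):
            found.append(copy.deepcopy(node))
        # Keep searching below this node whether or not it matched.
        found.extend(query_copies(node.children, predicate))
    return found


tree = [SketchNode("h1"), SketchNode("ul", [SketchNode("li"), SketchNode("li")])]
items = query_copies(tree, lambda n: n.name == "li")

# Mutating a result changes only the copy, never the original tree.
items[0].name = "changed"
original_names = [child.name for child in tree[1].children]
```

After the mutation, original_names still holds the two untouched li names, which is the behaviour the paragraph above describes for query.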
traverse
The traverse method is similar to the query method, but instead of deep copies it hands your callback direct
references to the nodes, which makes it the right tool for updating the node tree inside Dompa. It takes a Callable
that receives a Node and must return the updated node, like so:
from typing import Optional
from dompa import Dompa
from dompa.nodes import Node, TextNode
dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")
def update_title(node: Node) -> Optional[Node]:
    if node.name == "h1":
        node.children = [TextNode(value="New Title")]
    return node
dom.traverse(update_title)
To remove a node, return None instead of the node.
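The update-or-remove contract can be modelled in a few lines. This is a hypothetical sketch rather than Dompa's code: SketchNode and traverse_in_place are invented names, and visiting children after their parent is an assumption about ordering that the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not Dompa's Node class."""
    name: str
    children: list["SketchNode"] = field(default_factory=list)


Callback = Callable[[SketchNode], Optional[SketchNode]]


def traverse_in_place(nodes: list[SketchNode], callback: Callback) -> list[SketchNode]:
    """Apply callback to each node; keep what it returns, drop the node on None."""
    kept: list[SketchNode] = []
    for node in nodes:
        result = callback(node)  # receives the node itself, not a copy
        if result is None:
            continue  # returning None removes the node and its subtree
        # Assumption: children are visited after their (possibly updated) parent.
        result.children = traverse_in_place(result.children, callback)
        kept.append(result)
    return kept


def drop_h1(node: SketchNode) -> Optional[SketchNode]:
    """Remove every h1, leave all other nodes untouched."""
    return None if node.name == "h1" else node


tree = [SketchNode("h1"), SketchNode("ul", [SketchNode("li"), SketchNode("li")])]
tree = traverse_in_place(tree, drop_h1)
remaining = [node.name for node in tree]
```

Here the h1 is dropped because the callback returned None for it, while the ul and its two li children survive unchanged.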
Types of nodes
There are three types of nodes that you can use in Dompa to manipulate the node tree.
Node
The most common node is just Node. Use it for any element that may contain children.
from dompa.nodes import Node
Node(name="name-goes-here", attributes={}, children=[])
Would render:
<name-goes-here></name-goes-here>
VoidNode
A void node (a void element in the HTML standard) has no closing tag, so it cannot have any children.
from dompa.nodes import VoidNode
VoidNode(name="name-goes-here", attributes={})
Would render:
<name-goes-here>
Use this for elements such as img, input and br. You can also create custom elements, because Dompa does not
restrict you to known names.
TextNode
A text node is just for rendering text. It has no tag of its own and cannot have attributes or children.
from dompa.nodes import TextNode
TextNode(value="Hello, World!")
Would render:
Hello, World!
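To see how the three node types differ when serialized, here is a small self-contained model. It is a sketch under stated assumptions, not Dompa's renderer: the SketchElement, SketchVoid and SketchText classes and the render function are invented for illustration, and the way attributes are written out (name="value" pairs with escaped values) is an assumption, since the examples above only show empty attributes.

```python
import html
from dataclasses import dataclass, field
from typing import Union


@dataclass
class SketchText:
    """Stand-in for a text node: a value only, no tag."""
    value: str


@dataclass
class SketchVoid:
    """Stand-in for a void node: a tag with attributes but no children."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class SketchElement:
    """Stand-in for a regular node: a tag, attributes and children."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)
    children: list["SketchAny"] = field(default_factory=list)


SketchAny = Union[SketchText, SketchVoid, SketchElement]


def render_attributes(attributes: dict[str, str]) -> str:
    # Assumption: attributes serialize as name="value" pairs with escaped values.
    return "".join(f' {key}="{html.escape(value)}"' for key, value in attributes.items())


def render(node: SketchAny) -> str:
    if isinstance(node, SketchText):
        return node.value  # text has no tag of its own
    opening = f"<{node.name}{render_attributes(node.attributes)}>"
    if isinstance(node, SketchVoid):
        return opening  # no children and no closing tag
    inner = "".join(render(child) for child in node.children)
    return f"{opening}{inner}</{node.name}>"


page = SketchElement("p", {"class": "intro"}, [SketchText("Hello, World!"), SketchVoid("br")])
rendered = render(page)
```

The only structural difference between the element and void branches is the closing tag, which mirrors the rendered output shown for Node and VoidNode above.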