Skip to main content

A HTML5 parser.

Project description

Dompa

Coverage

A work-in-progress HTML5 document parser. It takes an input of an HTML string, parses it into a node tree, and provides an API for querying and manipulating the node tree.

Install

pip install dompa

Requires Python 3.10 or higher.

Usage

The most basic usage looks like this:

from dompa import Dompa

dom = Dompa("<div>Hello, World</div>")

# Get the tree of nodes
nodes = dom.nodes()

# Get the HTML string
html = dom.html()

DOM manipulation

You can run queries on the node tree to get or manipulate node(s).

query

You can find nodes with the query method which takes a Callable that gets Node passed to it and that has to return a boolean true or false, like so:

from dompa import Dompa

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")
list_items = dom.query(lambda n: n.name == "li")

All nodes returned with query are deep copies, so mutating them has no effect on Dompa's state.

traverse

The traverse method is very similar to the query method, but instead of returning deep copies of data it returns a direct reference to data instead, meaning it is ideal for updating the node tree inside of Dompa. It takes a Callable that gets a Node passed to it, and has to return the updated node, like so:

from typing import Optional
from dompa import Dompa
from dompa.nodes import Node, TextNode

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")


def update_title(node: Node) -> Optional[Node]:
    if node.name == "h1":
        node.children = [TextNode(value="New Title")]

    return node


dom.traverse(update_title)

If you wish to remove a node then return None instead of the node.

Types of nodes

There are three types of nodes that you can use in Dompa to manipulate the node tree.

Node

The most common node is just Node. You should use this if you want the node to potentially have any children inside of it.

from dompa.nodes import Node

Node(name="name-goes-here", attributes={}, children=[])

Would render:

<name-goes-here></name-goes-here>

VoidNode

A void node (or Void Element according to the HTML standard) is self-closing, meaning you would not have any children in it.

from dompa.nodes import VoidNode

VoidNode(name="name-goes-here", attributes={})

Would render:

<name-goes-here>

You would use this to create things like img, input, br and so forth, but of course you can also create custom elements. Dompa does not enforce the use of any known names.

TextNode

A text node is just for rendering text. It has no tag of its own, it cannot have any attributes and no children.

from dompa.nodes import TextNode

TextNode(value="Hello, World!")

Would render:

Hello, World!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dompa-0.5.2.tar.gz (8.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dompa-0.5.2-py3-none-any.whl (8.8 kB view details)

Uploaded Python 3

File details

Details for the file dompa-0.5.2.tar.gz.

File metadata

  • Download URL: dompa-0.5.2.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for dompa-0.5.2.tar.gz
Algorithm Hash digest
SHA256 8a2f4dba0c26c2290d97d9c1c58f5aa1183d6dab9f13291743a44035a88f6147
MD5 4e60e8a2310fffcb44203765a5a08e7d
BLAKE2b-256 5e310ade19e108d75b247adb0e4e239bc67918eaf2101d8ca45d9c6acb31a059

See more details on using hashes here.

File details

Details for the file dompa-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: dompa-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for dompa-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6989ef8552aa09d8b87330f82e0121b695f17f77f3bcc6ae90ce045eef452e4d
MD5 16529350d9f36f047d3e9eb0f255969c
BLAKE2b-256 bbdea3368ed99a63d9f8528e83ff49200e19c5a3188f3c9c8c87368eb00b0d77

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page