Skip to main content

A zero-dependency, extendable HTML5 parser.

Project description

Dompa

Coverage

A zero-dependency HTML5 document parser. It takes an input of an HTML string, parses it into a node tree, and provides an API for querying and manipulating said node tree.

Install

pip install dompa

Requires Python 3.10 or higher.

Usage

The most basic usage looks like this:

from dompa import Dompa
from dompa.actions import ToHtml

dom = Dompa("<div>Hello, World</div>")

# Get the tree of nodes
nodes = dom.get_nodes()

# Turn the node tree into HTML
html = dom.action(ToHtml)

DOM manipulation

You can run queries on the node tree to get or manipulate node(s).

query

You can find nodes with the query method which takes a Callable that gets Node passed to it and that has to return a boolean true or false, like so:

from dompa import Dompa

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")
list_items = dom.query(lambda n: n.name == "li")

All nodes returned with query are deep copies, so mutating them has no effect on Dompa's state.

traverse

The traverse method is very similar to the query method, but instead of returning deep copies of data it returns a direct reference to data instead, meaning it is ideal for updating the node tree inside of Dompa. It takes a Callable that gets a Node passed to it, and has to return the updated node, like so:

from typing import Optional
from dompa import Dompa
from dompa.nodes import Node, TextNode

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")


def update_title(node: Node) -> Optional[Node]:
    if node.name == "h1":
        node.children = [TextNode(value="New Title")]

    return node


dom.traverse(update_title)

If you wish to remove a node then return None instead of the node. If you wish to replace a single node with multiple nodes, use FragmentNode.

Types of nodes

There are three types of nodes that you can use in Dompa to manipulate the node tree.

Node

The most common node is just Node. You should use this if you want the node to potentially have any children inside of it.

from dompa.nodes import Node

Node(name="name-goes-here", attributes={}, children=[])

Would render:

<name-goes-here></name-goes-here>

VoidNode

A void node (or Void Element according to the HTML standard) is self-closing, meaning you would not have any children in it.

from dompa.nodes import VoidNode

VoidNode(name="name-goes-here", attributes={})

Would render:

<name-goes-here>

You would use this to create things like img, input, br and so forth, but of course you can also create custom elements. Dompa does not enforce the use of any known names.

TextNode

A text node is just for rendering text. It has no tag of its own, it cannot have any attributes and no children.

from dompa.nodes import TextNode

TextNode(value="Hello, World!")

Would render:

Hello, World!

FragmentNode

A fragment node is a node whose children will replace itself. It is sort of a transient node in a sense that it doesn't really exist. You can use it to replace a single node with multiple nodes on the same level inside of the traverse method.

from dompa.nodes import TextNode, FragmentNode, Node

FragmentNode(children=[
    Node(name="h2", children=[TextNode(value="Hello, World!")]),
    Node(name="p", children=[TextNode(value="Some content ...")])
])

Would render:

<h2>Hello, World!</h2>
<p>Some content ...</p>

Serializers

Both Dompa and its nodes have support for serializers - a way to transform data to whatever you want.

Dompa Serializers

You can create a Dompa serializer by extending the abstract class dompa.Serializer with your serializer class, like for example:

from dompa import Serializer
from dompa.nodes import Node


class MySerializer(Serializer):
    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    def serialize(self):
        pass

Basically, a Dompa serializer gets the node tree, and has a serialize method that does transforms it into something.

ToHtml

Dompa comes with a built-in serializer to transform the node tree into a HTML string.

Example usage:

from dompa import Dompa
from dompa.serializers import ToHtml

template = Dompa("<h1>Hello World</h1>")
html = template.serialize(ToHtml)

Node Serializers

Node serializers are basically identical to Dompa serializers, except that they are in a different namespace and, when Dompa serializers work with a node tree, Node serializers work with a singular node (and its children, if it has any).

You can create a Node serializer by extending the abstract class dompa.nodes.Serializer with your serializer class, like for example:

from dompa.nodes import Node, Serializer


class MySerializer(Serializer):
    def __init__(self, node: Node):
        self.node = node

    def serialize(self):
        pass

A Node serializer is very much like a Dompa serializer. Unlike the Dompa serializer, which gets a node tree to work with, a Node serializer gets a singular node.

ToHtml

Dompa comes with a built-in serializer to transform a node into a HTML string.

Example use:

from dompa import Dompa
from dompa.nodes.serializers import ToHtml

template = Dompa("<h1>Hello World</h1>")
h1_node = template.query(lambda x: x.name == "h1")[0]
html = h1_node.serialize(ToHtml)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dompa-0.9.0.tar.gz (11.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dompa-0.9.0-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file dompa-0.9.0.tar.gz.

File metadata

  • Download URL: dompa-0.9.0.tar.gz
  • Upload date:
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for dompa-0.9.0.tar.gz
Algorithm Hash digest
SHA256 95c2bf63e360c65e402bc8dfe340f0a26b0dd099011a89f70a1c3af387e5223c
MD5 0e1a75226b578e4628814b2f98e6b3d5
BLAKE2b-256 b438ad2854ba2d54913daf35c93ac27e84852f2e2dbbb1ed2c2d23e4602465db

See more details on using hashes here.

File details

Details for the file dompa-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: dompa-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for dompa-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d33fe9021a1f9f833a01f756255790cf5bbb75bc9874e868b41fb589fdd21ae5
MD5 8725801e18cddec3982a45cd412fc728
BLAKE2b-256 8b71dd778341afa58e3a57b82f7de35564d833e34b1b130c9cffcc408de6d78b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page