
An HTML5 parser.


Dompa


A work-in-progress HTML5 document parser. It takes an HTML string as input, parses it into a node tree, and provides an API for querying and manipulating that tree.

Install

pip install dompa

Requires Python 3.10 or higher.

Usage

The most basic usage looks like this:

from dompa import Dompa

dom = Dompa("<div>Hello, World</div>")

# Get the tree of nodes
nodes = dom.nodes()

# Get the HTML string
html = dom.html()
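To picture what "parsing an HTML string into a node tree" involves, here is a minimal sketch built on the standard library's `html.parser`. It is not Dompa's implementation: the `TreeBuilder` class and the dict-based node shape (`name`, `children`, `value`) are assumptions made for illustration only, and the sketch ignores void elements such as `<br>` and malformed markup.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Hypothetical illustration: builds a nested dict tree from HTML."""

    def __init__(self):
        super().__init__()
        # A synthetic root keeps top-level nodes in one list.
        self.root = {"name": "#root", "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Attach the new element to the currently open element, then open it.
        node = {"name": tag, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the current element only if the tag matches it.
        if len(self._stack) > 1 and self._stack[-1]["name"] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text becomes a leaf node under the currently open element.
        self._stack[-1]["children"].append({"name": "#text", "value": data})


builder = TreeBuilder()
builder.feed("<div>Hello, World</div>")
builder.close()
tree = builder.root["children"]
```

Here `tree` holds a single `div` node whose only child is a text node. Dompa's own node classes and parsing rules may differ.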

DOM manipulation

You can run queries on the node tree to get or manipulate node(s).

query

You can find nodes with the query method. It takes a Callable that receives each Node and must return True or False, like so:

from dompa import Dompa

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")
list_items = dom.query(lambda n: n.name == "li")

All nodes returned with query are deep copies, so mutating them has no effect on Dompa's state.
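The deep-copy behaviour can be sketched without Dompa at all. The example below uses a hypothetical `SketchNode` dataclass and a hypothetical `query_copies` helper (neither is part of Dompa) to show why mutating a copied result leaves the original tree unchanged.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not Dompa's Node class."""

    name: str
    children: list["SketchNode"] = field(default_factory=list)


def query_copies(
    nodes: list[SketchNode], predicate: Callable[[SketchNode], bool]
) -> list[SketchNode]:
    """Return deep copies of every node, at any depth, matching predicate."""
    found = []
    for node in nodes:
        if predicate(node):
            found.append(copy.deepcopy(node))
        found.extend(query_copies(node.children, predicate))
    return found


tree = [SketchNode("h1"), SketchNode("ul", [SketchNode("li"), SketchNode("li")])]
items = query_copies(tree, lambda n: n.name == "li")

# Mutating a returned copy does not touch the original tree.
items[0].name = "changed"
original_names = [child.name for child in tree[1].children]
```

After the mutation, `original_names` still lists the two `li` nodes, which is the same guarantee described for `query` above.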

traverse

The traverse method is similar to query, but instead of deep copies it works on direct references to the nodes, which makes it the right tool for updating the node tree inside Dompa. It takes a Callable that receives a Node and must return the updated node, like so:

from typing import Optional
from dompa import Dompa
from dompa.nodes import Node, TextNode

dom = Dompa("<h1>Site Title</h1><ul><li>...</li><li>...</li></ul>")


def update_title(node: Node) -> Optional[Node]:
    if node.name == "h1":
        node.children = [TextNode(value="New Title")]

    return node


dom.traverse(update_title)

To remove a node, return None instead of the node.
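The return-None convention can be illustrated with a small stand-alone sketch. `SketchNode` and `traverse_in_place` below are hypothetical names, not Dompa's code, and the visiting order (parent before children) is an assumption. With Dompa itself, a callback like `drop_list_items` would be passed to `dom.traverse`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not Dompa's Node class."""

    name: str
    children: list["SketchNode"] = field(default_factory=list)


def traverse_in_place(
    nodes: list[SketchNode],
    callback: Callable[[SketchNode], Optional[SketchNode]],
) -> list[SketchNode]:
    """Apply callback to each node; a None result removes the node."""
    kept = []
    for node in nodes:
        updated = callback(node)
        if updated is None:
            continue  # dropping a node also drops its whole subtree
        updated.children = traverse_in_place(updated.children, callback)
        kept.append(updated)
    return kept


def drop_list_items(node: SketchNode) -> Optional[SketchNode]:
    # Returning None signals that the node should be removed.
    return None if node.name == "li" else node


tree = [SketchNode("h1"), SketchNode("ul", [SketchNode("li"), SketchNode("li")])]
tree = traverse_in_place(tree, drop_list_items)
```

After the call, the `ul` node remains in the tree but has no children left.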
