markuptree

A modern, zero-dependency drop-in replacement for html5lib.

Installation

pip install markuptree

Quickstart

import markuptree

# Parse an HTML document
doc = markuptree.parse("<html><body><p>Hello <b>world</b></p></body></html>")

# Parse a fragment
fragments = markuptree.parseFragment("<p>Hello</p><p>World</p>")

# Serialize a tree back to HTML
html = markuptree.serialize(doc, tree="etree")

# Use the HTMLParser class directly
parser = markuptree.HTMLParser(tree="etree")
doc = parser.parse("<p>Hello</p>")
print(parser.errors)           # list of parse errors
print(parser.documentEncoding) # detected encoding

Tree Builders

markuptree provides two built-in backends for constructing parse trees:

| Backend | Description |
|---------|-------------|
| "etree" | Uses xml.etree.ElementTree (default) |
| "dom" | Uses xml.dom.minidom |

# Get a tree builder class
TB = markuptree.getTreeBuilder("etree")
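
The two backends map onto two different standard-library tree models, so code that consumes a parse result has to match the backend that produced it. The sketch below uses only the standard library and does not call markuptree. It builds the same small tree in each model and reads its text back. Whether markuptree namespaces or wraps these trees is not stated here, so treat this as an illustration of the two APIs only; `dom_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# "etree" style: text is stored in .text / .tail strings on elements.
p = ET.Element("p")
p.text = "Hello "
b = ET.SubElement(p, "b")
b.text = "world"
etree_text = "".join(p.itertext())

# "dom" style: text lives in separate Text nodes under each element.
dom = minidom.Document()
p_node = dom.createElement("p")
p_node.appendChild(dom.createTextNode("Hello "))
b_node = dom.createElement("b")
b_node.appendChild(dom.createTextNode("world"))
p_node.appendChild(b_node)
dom.appendChild(p_node)


def dom_text(node):
    """Concatenate the data of all Text nodes below `node`, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(dom_text(child) for child in node.childNodes)


print(etree_text)
print(dom_text(p_node))
```

Both models hold the same content; the difference is in how text is reached, which is what usually has to change when switching backends.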

Tree Walkers

Walk a parsed tree and yield serializer tokens:

Walker = markuptree.getTreeWalker("etree")
walker = Walker(doc)
for token in walker:
    print(token)
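
To make the idea of a token stream concrete, here is a minimal standard-library sketch of a walker over an ElementTree element. It is not markuptree's walker. The token dictionaries (keys "type", "name", "data") follow the shape html5lib uses, and it is an assumption that markuptree emits the same shape. `walk` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def walk(element):
    """Yield start-tag, character, and end-tag tokens in document order."""
    yield {"type": "StartTag", "name": element.tag, "data": dict(element.attrib)}
    if element.text:
        yield {"type": "Characters", "data": element.text}
    for child in element:
        yield from walk(child)
        # In ElementTree, text that follows a child is stored on child.tail.
        if child.tail:
            yield {"type": "Characters", "data": child.tail}
    yield {"type": "EndTag", "name": element.tag}


root = ET.fromstring('<p class="x">Hello <b>world</b>!</p>')
tokens = list(walk(root))
for token in tokens:
    print(token)
```

A real walker also has to handle comments, doctypes, and void elements; this sketch covers only elements and text.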

Serializer

from markuptree.serializer import HTMLSerializer

Walker = markuptree.getTreeWalker("etree")
s = HTMLSerializer(
    omit_optional_tags=True,
    quote_attr_values="always",
    minimize_boolean_attributes=True,
    alphabetical_attributes=True,
)
html = s.render(Walker(doc))

Serializer Options

| Option | Default | Description |
|--------|---------|-------------|
| quote_attr_values | "legacy" | "legacy" (quote when needed) or "always" |
| quote_char | '"' | Quote character for attributes |
| use_best_quote_char | True | Pick ' or " to minimize escaping |
| omit_optional_tags | True | Omit optional start/end tags |
| minimize_boolean_attributes | True | disabled instead of disabled="disabled" |
| use_trailing_solidus | False | <br /> instead of <br> |
| space_before_trailing_solidus | True | Space before /> |
| escape_lt_in_attrs | False | Escape < in attribute values |
| resolve_entities | True | Resolve character entities |
| alphabetical_attributes | False | Sort attributes alphabetically |
| inject_meta_charset | True | Inject <meta charset> |
| strip_whitespace | False | Collapse whitespace |
| sanitize | False | Strip unsafe elements/attributes |
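
How quote_char and use_best_quote_char interact can be illustrated with a small stand-alone function. This is a sketch of the idea in the table (pick the quote that needs less escaping), not markuptree's implementation. `quote_attr`, its tie-breaking rule, and the specific entities it emits are assumptions made for the example.

```python
def quote_attr(value, quote_char='"', use_best_quote_char=True):
    """Return `value` wrapped in quotes, escaping only the chosen quote character."""
    if use_best_quote_char:
        # Prefer whichever quote occurs less often in the value;
        # keep the configured quote_char on a tie.
        singles, doubles = value.count("'"), value.count('"')
        if doubles > singles:
            quote_char = "'"
        elif singles > doubles:
            quote_char = '"'
    escaped = value.replace("&", "&amp;")
    if quote_char == '"':
        escaped = escaped.replace('"', "&quot;")
    else:
        escaped = escaped.replace("'", "&#39;")
    return quote_char + escaped + quote_char


print(quote_attr('say "hi"'))
print(quote_attr("it's"))
print(quote_attr('say "hi"', use_best_quote_char=False))
```

With the heuristic on, the first two values need no escaping at all. With it off, the configured quote character is kept and the embedded double quotes are escaped.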

Filters

Token stream filters that sit between a tree walker and the serializer:

from markuptree.treewalkers.etree import TreeWalker
from markuptree.filters.sanitizer import Filter as SanitizerFilter
from markuptree.serializer import HTMLSerializer

walker = TreeWalker(doc)
safe = SanitizerFilter(walker)
html = HTMLSerializer().render(safe)

| Filter | Description |
|--------|-------------|
| filters.base | Passthrough base class |
| filters.alphabeticalattributes | Sort attributes A-Z |
| filters.inject_meta_charset | Inject/replace <meta charset> |
| filters.whitespace | Collapse whitespace (preserves <pre>) |
| filters.optionaltags | Omit optional end tags |
| filters.sanitizer | Strip unsafe elements, attributes, URI schemes |
| filters.lint | Emit warnings for void/non-void tag misuse |
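
Since a filter consumes one token stream and yields another, a custom filter can be sketched as a small iterable wrapper. The example below is stand-alone: it does not subclass filters.base, whose interface is not shown here. The token dictionaries use html5lib's shape, which is assumed rather than confirmed for markuptree. `DropComments` is a hypothetical name.

```python
class DropComments:
    """Wrap a token stream and skip every Comment token."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        for token in self.source:
            if token["type"] != "Comment":
                yield token


# A hand-written token stream standing in for a tree walker's output.
stream = [
    {"type": "StartTag", "name": "p", "data": {}},
    {"type": "Comment", "data": " internal note "},
    {"type": "Characters", "data": "Hello"},
    {"type": "EndTag", "name": "p"},
]
filtered = list(DropComments(stream))
print([t["type"] for t in filtered])
```

Filters written this way compose by nesting, in the same way SanitizerFilter wraps the walker in the example above.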

Migration from html5lib

# Change this:
import html5lib

# To this:
import markuptree as html5lib

Or use the compatibility shim:

from markuptree._compat import *
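
During a gradual migration, the same code may need to run in environments where only one of the two packages is installed. One possible pattern is a fallback import. `import_first` below is a hypothetical helper, not part of markuptree, and the demonstration uses standard-library module names so that it runs without either package.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# In a real project this would be import_first("markuptree", "html5lib").
# Standard-library names are used here so the sketch runs anywhere.
module = import_first("no_such_module_for_demo", "json")
print(module.__name__)
```

This only helps if the calling code sticks to the API surface the two packages share.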

License

MIT
