A configurable XML and HTML formatter.

Markuplift

A configurable XML and HTML formatter for Python

Markuplift provides flexible, configurable formatting of XML and HTML documents. Unlike basic pretty-printers, it lets you control how markup is formatted through user-defined predicates for block versus inline elements, per-element whitespace handling, and custom text content formatters.

Key Features

Configurable element classification - Define block/inline elements using XPath expressions or Python predicates
Flexible whitespace control - Normalize, preserve, or strip whitespace on a per-element basis
External formatter integration - Pipe element text content through external tools (e.g., js-beautify, prettier)
Comprehensive format options - Control indentation, attribute wrapping, self-closing tags, and more
CLI and Python API - Use from command line or integrate into your Python applications

Quick Start

Installation

Install from PyPI using pip:

pip install markuplift

Or using uv (recommended for modern Python development):

uv add markuplift

For development installation with all dependencies:

git clone https://github.com/rob-smallshire/markuplift.git
cd markuplift
uv sync --all-extras

CLI Usage

# Basic formatting
markuplift format input.xml

# Format with custom block elements
markuplift format input.html --block "//div | //section | //article"

# Use external JavaScript formatter for script tags
markuplift format input.html --text-formatter "//script[@type='text/javascript']" "js-beautify"

# Format from stdin, writing the result to a file
cat messy.xml | markuplift format --output formatted.xml
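
For scripted use, for example in a CI job, the same commands can be assembled from Python. The following is a minimal sketch that assumes only the `format` subcommand and the `--block` and `--output` options shown above; `build_format_command` is a hypothetical helper, not part of Markuplift.

```python
import shlex
from typing import List, Optional


def build_format_command(
    input_path: Optional[str] = None,
    block_xpath: Optional[str] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `markuplift format` (hypothetical helper)."""
    command = ["markuplift", "format"]
    if input_path is not None:
        # Omit the path to read from stdin, as in the last CLI example.
        command.append(input_path)
    if block_xpath is not None:
        command += ["--block", block_xpath]
    if output_path is not None:
        command += ["--output", output_path]
    return command


command = build_format_command(
    "input.html", block_xpath="//div | //section | //article"
)
# Print a shell-safe rendering of the argv list for logging.
print(shlex.join(command))
```

Where Markuplift is installed, the list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with XPath expressions.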

Python API

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements, tag_in

# Create formatter with whitespace handling
formatter = Formatter(
    block_when=html_block_elements(),
    inline_when=html_inline_elements(),
    preserve_whitespace_when=tag_in("pre", "code"),
    indent_size=2
)

# Format complex HTML with code examples (preserves whitespace in <code>)
messy_html = (
    '<div><h3>Documentation</h3><p>Here are some    spaced    examples:</p><ul><li>'
    'Installation: <code>   pip install markuplift   </code></li><li>Basic <em>conf'
    'iguration</em> and setup</li><li>Code example:<pre>    def format_xml():\n    '
    '    return "beautiful"\n    </pre></li></ul></div>'
)
formatted = formatter.format_str(messy_html)
print(formatted)

Output:

<div>
  <h3>Documentation</h3>
  <p>Here are some    spaced    examples:</p>
  <ul>
    <li>Installation: <code>   pip install markuplift   </code></li>
    <li>Basic <em>configuration</em> and setup</li>
    <li>Code example:
      <pre>    def format_xml():
        return "beautiful"
    </pre>
    </li>
  </ul>
</div>

Real-World Example

Here's Markuplift formatting a complex article structure with mixed content:

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements, tag_in, any_of

# Real-world messy HTML with code blocks and excessive whitespace
messy_html = (
    '<article><h1>Using   Markuplift</h1><section><h2>Code    Formatting</h2><p>He'
    're\'s how to    use   our   API   with   proper   spacing:</p><pre><code>from'
    ' markuplift import Formatter\nformatter = Formatter(\n    preserve_whitespace'
    '=True\n)</code></pre><p>The   <em>preserve_whitespace</em>   feature   keeps '
    '  code   formatting   intact   while   <strong>normalizing</strong>   text   '
    'content!</p></section></article>'
)

formatter = Formatter(
    block_when=html_block_elements(),
    inline_when=html_inline_elements(),
    preserve_whitespace_when=tag_in("pre", "code"),
    normalize_whitespace_when=any_of(tag_in("p", "li"), html_inline_elements()),
    indent_size=2
)

formatted = formatter.format_str(messy_html)
print(formatted)

Output:

<article>
  <h1>Using   Markuplift</h1>
  <section>
    <h2>Code    Formatting</h2>
    <p>Here's how to use our API with proper spacing:</p>
    <pre><code>from markuplift import Formatter formatter = Formatter( preserve_whitespace=True )</code></pre>
    <p>The <em>preserve_whitespace</em> feature keeps code formatting intact while <strong>normalizing</strong> text content!</p>
  </section>
</article>
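
The effect of normalization on the paragraph text above can be illustrated in plain Python. This is a sketch of the concept only, not Markuplift's implementation; it assumes that normalizing means collapsing each run of whitespace to a single space, as the `<p>` output suggests, and `collapse_whitespace` is a hypothetical name.

```python
import re

# One or more whitespace characters: spaces, tabs, or newlines.
_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return _WHITESPACE_RUN.sub(" ", text)


messy = "Here's how to    use   our   API   with   proper   spacing:"
print(collapse_whitespace(messy))
```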

Advanced Example

Technical documentation with comprehensive whitespace control:

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements, tag_in, any_of

# Technical documentation with code, forms, and mixed content
messy_html = (
    '<div><h2>API   Documentation</h2><p>Use this    form   to   test   the   API:'
    '</p><form><fieldset><legend>Configuration</legend><div><label>Code Sample: <t'
    'extarea name="code">    def example():\n        return "test"\n        # pres'
    'erve formatting</textarea></label></div><div><p>Inline   code   like   <code>'
    '   format()   </code>   works   perfectly!</p></div></fieldset></form><h3>Exp'
    'ected   Output:</h3><pre>{\n  "status": "formatted",\n  "whitespace": "preser'
    'ved"\n}</pre></div>'
)

formatter = Formatter(
    block_when=html_block_elements(),
    inline_when=html_inline_elements(),
    preserve_whitespace_when=tag_in("pre", "code", "textarea"),
    normalize_whitespace_when=any_of(
        tag_in("p", "li", "h1", "h2", "h3"), html_inline_elements()
    ),
    indent_size=2
)

formatted = formatter.format_str(messy_html)
print(formatted)

Output:

<div>
  <h2>API Documentation</h2>
  <p>Use this form to test the API:</p>
  <form>
    <fieldset>
      <legend>Configuration</legend>
      <div>
        <label>Code Sample: <textarea name="code">    def example():
        return "test"
        # preserve formatting</textarea></label>
      </div>
      <div>
        <p>Inline code like <code> format() </code> works perfectly!</p>
      </div>
    </fieldset>
  </form>
  <h3>Expected Output:</h3>
  <pre>{
  "status": "formatted",
  "whitespace": "preserved"
}</pre>
</div>

Custom Element Predicate Factories

Markuplift defines two core types for creating custom formatting rules:

ElementPredicate: Callable[[etree._Element], bool] - A function that tests if an element matches criteria
ElementPredicateFactory: Callable[[etree._Element], ElementPredicate] - A function that creates optimized, document-specific predicates. The element here is the root of the document.

A predicate factory is typically written as three nested functions, so that expensive queries run once per document rather than once per element:

Outer function: Accepts configuration parameters and performs validation
Middle function: Accepts the document root and performs expensive preparation (queries, traversals)
Inner function: Accepts individual elements and performs fast lookups against pre-computed results

Example: Custom CSS Class Predicate

Here's how to create a custom predicate for elements with a specific CSS class:

from lxml import etree

from markuplift.predicates import PredicateError
from markuplift.types import ElementPredicateFactory, ElementPredicate

def has_css_class(class_name: str) -> ElementPredicateFactory:
    """Factory for predicate matching elements with a specific CSS class."""
    # Level 1: Configuration and validation
    # Strip first so that surrounding whitespace is not reported as an embedded space
    clean_class = class_name.strip() if class_name else ""
    if not clean_class:
        raise PredicateError("CSS class name cannot be empty")
    if len(clean_class.split()) > 1:
        raise PredicateError("CSS class name cannot contain whitespace")

    def create_document_predicate(root: etree._Element) -> ElementPredicate:
        # Level 2: Document-specific preparation - find all matching elements once
        matching_elements = set()
        for element in root.iter():
            class_attr = element.get('class', '')
            if class_attr and clean_class in class_attr.split():
                matching_elements.add(element)

        def element_predicate(element: etree._Element) -> bool:
            # Level 3: Fast membership test
            return element in matching_elements
        return element_predicate
    return create_document_predicate

This pattern pays off for complex predicates: the middle level performs expensive operations such as XPath queries, regex compilation, or tree traversals once per document, and the inner function then does only fast lookups against the pre-computed results.
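
The once-per-document idea can be sketched with a path query. To stay self-contained, this sketch uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) instead of lxml, defines its own type aliases, and raises `ValueError` rather than `PredicateError`. The name `matches_path` is hypothetical and not part of Markuplift.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Local stand-ins for the types described above, based on ElementTree.
ElementPredicate = Callable[[ET.Element], bool]
ElementPredicateFactory = Callable[[ET.Element], ElementPredicate]


def matches_path(path: str) -> ElementPredicateFactory:
    """Hypothetical factory matching elements selected by an ElementTree path."""
    # Level 1: validate the configuration once, when the factory is created.
    if not path or not path.strip():
        raise ValueError("path cannot be empty")

    def create_document_predicate(root: ET.Element) -> ElementPredicate:
        # Level 2: run the query once per document and keep the results.
        # ET.Element hashes by identity, so a set gives fast lookups.
        matching = set(root.findall(path))

        def element_predicate(element: ET.Element) -> bool:
            # Level 3: constant-time membership test per element.
            return element in matching

        return element_predicate

    return create_document_predicate


root = ET.fromstring("<ul><li class='note'>a</li><li>b</li></ul>")
is_note = matches_path(".//li[@class='note']")(root)
print([is_note(child) for child in root])  # → [True, False]
```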

Using Custom Predicates

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements, any_of

# Use custom predicate with built-in ones
formatter = Formatter(
    block_when=html_block_elements(),
    inline_when=html_inline_elements(),
    preserve_whitespace_when=has_css_class("code-block"),
    normalize_whitespace_when=any_of(has_css_class("prose"), html_inline_elements()),
    indent_size=2
)

# Format HTML with CSS classes
html = '<div class="container"><p class="prose">Text content</p><pre class="code-block">preserved code</pre></div>'
formatted = formatter.format_str(html)
print(formatted)

Output:

<div class="container">
  <p class="prose">Text content</p>
  <pre class="code-block">preserved code</pre>
</div>
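
The example above combines a custom factory with a built-in one through `any_of`. How such a combinator could work under the three-level protocol is sketched below. This is an illustrative re-implementation, not Markuplift's own code: `any_of_sketch` and `tag_in_sketch` are hypothetical names, and the sketch uses the standard library's `xml.etree.ElementTree` rather than lxml.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Local stand-ins for the types described earlier, based on ElementTree.
ElementPredicate = Callable[[ET.Element], bool]
ElementPredicateFactory = Callable[[ET.Element], ElementPredicate]


def tag_in_sketch(*tags: str) -> ElementPredicateFactory:
    """Hypothetical factory matching elements whose tag is one of `tags`."""
    wanted = frozenset(tags)

    def create_document_predicate(root: ET.Element) -> ElementPredicate:
        # No per-document preparation is needed for a simple tag test.
        def element_predicate(element: ET.Element) -> bool:
            return element.tag in wanted

        return element_predicate

    return create_document_predicate


def any_of_sketch(*factories: ElementPredicateFactory) -> ElementPredicateFactory:
    """Hypothetical combinator: match when any of the given factories matches."""

    def create_document_predicate(root: ET.Element) -> ElementPredicate:
        # Bind every factory to this document once, not once per element.
        predicates = [factory(root) for factory in factories]

        def element_predicate(element: ET.Element) -> bool:
            return any(predicate(element) for predicate in predicates)

        return element_predicate

    return create_document_predicate


root = ET.fromstring("<div><p>a</p><em>b</em><pre>c</pre></div>")
prose_or_inline = any_of_sketch(tag_in_sketch("p"), tag_in_sketch("em"))(root)
results = [prose_or_inline(child) for child in root]
print(results)  # → [True, True, False]
```

Because the combinator is itself a factory, it can be nested or passed wherever a single factory is expected.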

Use Cases

Markuplift is well suited to:

Web development - Format HTML templates and components with consistent styling
Data processing - Clean up XML data feeds and configuration files
Documentation - Standardize markup in documentation systems
Code generation - Format dynamically generated XML/HTML with precise control
CI/CD pipelines - Ensure consistent markup formatting across your codebase
Diffing and version control - Improve readability of markup changes in version control systems

License

Markuplift is released under the MIT License.

Contributing

Contributions are welcome! Please see our Contributing Guide for details on:

Setting up the development environment
Running tests and linting
Submitting pull requests
Reporting issues

Release history

6.1.4 - Oct 2, 2025
6.1.3 - Oct 2, 2025
6.1.2 - Oct 2, 2025
6.1.1 - Oct 2, 2025
6.1.0 - Oct 2, 2025
6.0.2 - Oct 1, 2025
6.0.1 - Oct 1, 2025
6.0.0 - Oct 1, 2025
5.1.1 - Oct 1, 2025
5.1.0 - Oct 1, 2025
5.0.0 - Oct 1, 2025
4.4.0 - Sep 30, 2025
4.3.0 - Sep 30, 2025
4.2.0 - Sep 30, 2025
4.1.1 - Sep 30, 2025
4.1.0 - Sep 30, 2025
4.0.0 - Sep 29, 2025
3.1.0 - Sep 26, 2025
3.0.2 - Sep 25, 2025 (this version)
3.0.1 - Sep 25, 2025
3.0.0 - Sep 25, 2025
2.1.1 - Sep 25, 2025
2.1.0 - Sep 25, 2025
2.0.1 - Sep 24, 2025
2.0.0 - Sep 24, 2025
1.0.0 - Sep 12, 2025
0.1.0 - Sep 4, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

markuplift-3.0.2-py3-none-any.whl (26.3 kB)

Uploaded Sep 25, 2025 Python 3

File details

Details for the file markuplift-3.0.2-py3-none-any.whl.

File metadata

Download URL: markuplift-3.0.2-py3-none-any.whl
Upload date: Sep 25, 2025
Size: 26.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.22

File hashes

Hashes for markuplift-3.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`84a06a97e422a1c08b6c0e0d569a93825dd912622788405890766ea85a76ad6a`
MD5	`7d3f2a62d6d130053141c8ec15bcf5f0`
BLAKE2b-256	`2bfb9cb5bdf77408f713489173cd9e6f7333fbfdd2318c82c13cb45c436a8c1c`
