Markuplift

A configurable XML and HTML formatter for Python

Markuplift provides flexible, configurable formatting of XML and HTML documents. Unlike basic pretty-printers, Markuplift lets you control how your markup is formatted through user-defined predicates that classify elements as block or inline, per-element whitespace handling, and custom text content formatters.

Key Features

  • Configurable element classification - Define block/inline elements using XPath expressions or Python predicates
  • Flexible whitespace control - Normalize, preserve, or strip whitespace on a per-element basis
  • External formatter integration - Pipe element text content through external tools (e.g., js-beautify, prettier)
  • Comprehensive format options - Control indentation, attribute wrapping, self-closing tags, and more
  • CLI and Python API - Use from command line or integrate into your Python applications
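
To make the idea of classifying elements with Python predicates concrete, here is a minimal sketch of the general pattern using only the standard library. It is not Markuplift's API: the names `tag_in`, `is_block` and `is_inline` are hypothetical, and the signature Markuplift expects for its predicate factories is not shown in this README.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A predicate takes an element and answers a yes/no question about it.
ElementPredicate = Callable[[ET.Element], bool]


def tag_in(*tags: str) -> ElementPredicate:
    """Return a predicate that is true for elements whose tag is in `tags`.

    Hypothetical helper for illustration; not part of Markuplift.
    """
    tag_set = frozenset(tags)

    def predicate(element: ET.Element) -> bool:
        return element.tag in tag_set

    return predicate


# Illustrative classifications; a real configuration would list more tags.
is_block = tag_in("div", "section", "article", "p", "ul", "li")
is_inline = tag_in("em", "strong", "code", "a", "span")

root = ET.fromstring(
    "<div><p>Some <em>inline</em> text</p><ul><li>item</li></ul></div>"
)

# Walk the tree in document order and record how each tag is classified.
classified = {
    el.tag: "block" if is_block(el) else "inline" if is_inline(el) else "unclassified"
    for el in root.iter()
}
print(classified)
```

A formatter built on predicates like these can decide, element by element, whether to start a new indented line (block) or keep the element in the flow of the surrounding text (inline).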

Quick Start

Installation

Install from PyPI using pip:

pip install markuplift

Or using uv (recommended for modern Python development):

uv add markuplift

For development installation with all dependencies:

git clone https://github.com/rob-smallshire/markuplift.git
cd markuplift
uv sync --all-extras

CLI Usage

# Basic formatting
markuplift format input.xml

# Format with custom block elements
markuplift format input.html --block "//div | //section | //article"

# Use external JavaScript formatter for script tags
markuplift format input.html --text-formatter "//script[@type='text/javascript']" "js-beautify"

# Format from stdin, writing the result to a file
cat messy.xml | markuplift format --output formatted.xml
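
The `--text-formatter` option above hands element text to an external program. As a rough illustration of what piping text through an external tool involves, the sketch below uses `subprocess.run` from the standard library. It is not Markuplift's implementation: `pipe_through` is a hypothetical helper, and the stand-in command is a Python one-liner so that the example runs without js-beautify installed.

```python
import subprocess
import sys


def pipe_through(command: list[str], text: str) -> str:
    """Send `text` to the command's stdin and return what it writes to stdout.

    Hypothetical helper for illustration; not part of Markuplift.
    """
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout


# Stand-in "formatter": a Python one-liner that upper-cases its input.
# In practice the command would be a real tool, invoked so that it reads
# from stdin and writes to stdout.
upper = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
print(pipe_through(upper, "var x = 1;"))
```

Using `check=True` means a failing external tool surfaces as an exception rather than silently producing empty output.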

Python API

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements

# Create formatter with HTML-aware defaults
formatter = Formatter(
    block_predicate_factory=html_block_elements(),
    inline_predicate_factory=html_inline_elements(),
    indent_size=2
)

# Format complex nested HTML (minified input)
messy_html = (
    '<ul><li>Getting Started<ul><li>Installation via <code>pip install markuplift</code>'
    '</li><li>Basic <em>configuration</em> and setup</li></ul></li><li>Advanced Features'
    '<ul><li>Custom <strong>predicates</strong> and XPath</li><li>External formatter <co'
    'de>integration</code></li></ul></li></ul>'
)
formatted = formatter.format_str(messy_html)
print(formatted)

Output:

<ul>
  <li>Getting Started
    <ul>
      <li>Installation via <code>pip install markuplift</code></li>
      <li>Basic <em>configuration</em> and setup</li>
    </ul>
  </li>
  <li>Advanced Features
    <ul>
      <li>Custom <strong>predicates</strong> and XPath</li>
      <li>External formatter <code>integration</code></li>
    </ul>
  </li>
</ul>
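
Reformatting should change layout, not content. One coarse way to check this for well-formed markup is to compare the sequence of tags and the text with all whitespace removed, before and after formatting. The sketch below uses only the standard library; `content_signature` is a hypothetical helper, and the two strings are a fragment of the example above. It ignores attributes and cannot detect changes to significant whitespace, so treat it as a sanity check only.

```python
import xml.etree.ElementTree as ET


def content_signature(markup: str) -> tuple[list[str], str]:
    """Return (tags in document order, text with all whitespace removed).

    Hypothetical helper for illustration; requires well-formed XML input.
    """
    root = ET.fromstring(markup)
    tags = [el.tag for el in root.iter()]
    text = "".join("".join(root.itertext()).split())
    return tags, text


before = "<ul><li>Basic <em>configuration</em> and setup</li></ul>"
after = "<ul>\n  <li>Basic <em>configuration</em> and setup</li>\n</ul>"

# Same tags in the same order, same text once whitespace is ignored.
print(content_signature(before) == content_signature(after))
```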

Real-World Example

Here's Markuplift formatting a complex article structure with mixed content:

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements

# Real-world messy HTML (imagine this came from a CMS or generator)
messy_html = (
    '<article><h1>Using Markuplift</h1><section><h2>Introduction</h2><p>Markuplift '
    'is a <em>powerful</em> formatter for <strong>XML and HTML</strong>.</p><p>Key '
    'features include:</p><ul><li>Configurable <code>block</code> and <code>inline<'
    '/code> elements</li><li>XPath-based element selection</li><li>Custom text form'
    'atters for <pre><code>code blocks</code></pre></li></ul></section></article>'
)

formatter = Formatter(
    block_predicate_factory=html_block_elements(),
    inline_predicate_factory=html_inline_elements(),
    indent_size=2
)

formatted = formatter.format_str(messy_html)
print(formatted)

Output:

<article>
  <h1>Using Markuplift</h1>
  <section>
    <h2>Introduction</h2>
    <p>Markuplift is a <em>powerful</em> formatter for <strong>XML and HTML</strong>.</p>
    <p>Key features include:</p>
    <ul>
      <li>Configurable <code>block</code> and <code>inline</code> elements</li>
      <li>XPath-based element selection</li>
      <li>Custom text formatters for
        <pre><code>code blocks</code></pre>
      </li>
    </ul>
  </section>
</article>

Advanced Example

A complex HTML form, formatted with the same HTML-aware defaults:

from markuplift import Formatter
from markuplift.predicates import html_block_elements, html_inline_elements

# HTML form structure (typical of form-builder output)
messy_form = (
    '<form><fieldset><legend>User Information</legend><div><label>Name: <input type="text" '
    'name="name" required="required"/></label></div><div><label>Email: <input type="email" '
    'name="email"/></label></div><div><label><input type="checkbox" name="subscribe"/> Subs'
    'cribe to <em>newsletter</em></label></div></fieldset><button type="submit">Submit <str'
    'ong>Form</strong></button></form>'
)

formatter = Formatter(
    block_predicate_factory=html_block_elements(),
    inline_predicate_factory=html_inline_elements(),
    indent_size=2
)

formatted = formatter.format_str(messy_form)
print(formatted)

Output:

<form>
  <fieldset>
    <legend>User Information</legend>
    <div>
      <label>Name: <input type="text" name="name" required="required" /></label>
    </div>
    <div>
      <label>Email: <input type="email" name="email" /></label>
    </div>
    <div>
      <label><input type="checkbox" name="subscribe" /> Subscribe to <em>newsletter</em></label>
    </div>
  </fieldset>
  <button type="submit">Submit <strong>Form</strong></button>
</form>

Documentation

Use Cases

Markuplift is well suited to:

  • Web development - Format HTML templates and components with consistent styling
  • Data processing - Clean up XML data feeds and configuration files
  • Documentation - Standardize markup in documentation systems
  • Code generation - Format dynamically generated XML/HTML with precise control
  • CI/CD pipelines - Ensure consistent markup formatting across your codebase
  • Diffing and version control - Improve readability of markup changes in version control systems
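
For the CI/CD use case, one possible approach is a check that fails when any file differs from its formatted form. The sketch below is deliberately independent of Markuplift: `find_unformatted` is a hypothetical helper that accepts any string-to-string formatting function, so `formatter.format_str` from the Python API section could be passed in.

```python
from pathlib import Path
from typing import Callable, Iterable


def find_unformatted(
    paths: Iterable[Path], format_str: Callable[[str], str]
) -> list[Path]:
    """Return the paths whose content differs from its formatted form.

    Hypothetical helper for illustration; `format_str` is any function
    mapping document text to formatted document text.
    """
    unformatted = []
    for path in paths:
        original = path.read_text(encoding="utf-8")
        if format_str(original) != original:
            unformatted.append(path)
    return unformatted
```

In a pipeline, a script could call this over the tracked markup files and exit with a non-zero status when the returned list is not empty.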

License

Markuplift is released under the MIT License.

Contributing

Contributions are welcome! Please see our Contributing Guide for details on:

  • Setting up the development environment
  • Running tests and linting
  • Submitting pull requests
  • Reporting issues
