Pretty-print XML and HTML files in a light, YAML-like, readable format
Unxml
Simplify and "flatten" XML files into a YAML-like readable format.
This is a Rust clone of the original unxml F# tool.
See it in action: a gallery of
real-world XML documents, schemas, stylesheets, and Schematron rules rendered
with unxml, with original-vs-rendered size comparisons.
Installation
Using uv (Easiest)
Install the published wheel from PyPI as a standalone tool:
uv tool install unxml-rs
This puts the unxml command on your PATH. To try it without installing anything:
uvx --from unxml-rs unxml <xml_file>
Pre-built Binaries (Recommended)
Download the latest release for your platform from the GitHub Releases page:
- Linux (x86_64): unxml-linux-x86_64.tar.gz
- Windows (x86_64): unxml-windows-x86_64.zip
- macOS (Intel): unxml-macos-x86_64.tar.gz
- macOS (Apple Silicon): unxml-macos-arm64.tar.gz
Extract the archive and place the unxml binary in your PATH.
From Source
git clone https://github.com/vivainio/unxml-rs
cd unxml-rs
cargo install --path .
Using Cargo
cargo install unxml
Usage
unxml <xml_file>
By default files render as plain XML. Pass --auto to pick the processing mode
from each file's extension:
| Extension | Mode applied |
|---|---|
| .xsl, .xslt | --xslt |
| .sch | --schematron |
| .xsd | --xsd |
An explicit mode flag (--xslt, --schematron, --xsd, --special) always
overrides autodetection.
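The selection rule amounts to a lookup with one precedence step. The Python sketch below models only the documented behaviour and is not unxml's Rust code. The helper name `pick_mode` and the use of `None` to mean "plain XML" are assumptions.

```python
from pathlib import Path
from typing import Optional

# Extension -> mode flag applied under --auto (mirrors the table above).
AUTO_MODES = {
    ".xsl": "--xslt",
    ".xslt": "--xslt",
    ".sch": "--schematron",
    ".xsd": "--xsd",
}


def pick_mode(path: str, explicit: Optional[str] = None, auto: bool = False) -> Optional[str]:
    """Return the mode flag to apply, or None for the default plain-XML render.

    Hypothetical helper: it models the documented precedence only.
    """
    if explicit is not None:
        # An explicit --xslt/--schematron/--xsd/--special always wins.
        return explicit
    if auto:
        # Unrecognised extensions fall back to plain XML.
        return AUTO_MODES.get(Path(path).suffix)
    return None
```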
Each mode rewrites its vocabulary into a terser pseudocode. The full set of transformations, with side-by-side samples, is documented per format:
- XSLT transformations: xsl:* stylesheets
- XSD transformations: xs:* / xsd:* schemas
- Schematron transformations: .sch rule schemas
Syntax-highlighted output (--bat)
unxml --bat some.xsd # implies --auto (detects --xsd), pipes through `bat -l unxml`
--bat renders the output through bat using
the bundled unxml grammar (see editor/) for paged, colourised display. If
bat is not installed it falls back to plain stdout.
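The fallback comes down to checking PATH before piping. Below is a minimal Python sketch of that behaviour. `bat_command`, `emit` and the injectable `which` parameter are hypothetical names, chosen so the check can be exercised without bat installed. The sketch also assumes the bundled unxml syntax has already been registered with bat.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional

Which = Callable[[str], Optional[str]]


def bat_command(which: Which = shutil.which) -> Optional[List[str]]:
    """Argv for `bat -l unxml`, or None when bat is not on PATH."""
    exe = which("bat")
    return [exe, "-l", "unxml"] if exe else None


def emit(text: str, which: Which = shutil.which) -> None:
    """Page `text` through bat if it is available, else write it to stdout."""
    cmd = bat_command(which)
    if cmd is None:
        sys.stdout.write(text)  # fallback: plain stdout
    else:
        # Assumes the unxml grammar (see editor/) is installed for bat.
        subprocess.run(cmd, input=text, text=True, check=False)
```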
Claude Code skill (--install-skills)
unxml --install-skills # writes ~/.claude/skills/unxml/SKILL.md
Installs a Claude Code skill for unxml. It
doesn't auto-activate; invoke it with /unxml.
Hiding noisy namespace prefixes (--hide-ns)
Vocabularies like UBL bury the signal under repeated prefixes (cbc:, cac:).
--hide-ns drops the named prefixes from element and attribute names — and
their xmlns: declarations — so the output reads as bare local names:
unxml --hide-ns cbc,cac invoice.xml # repeatable and comma-separated
Signal-carrying prefixes you don't list (e.g. ext:, bim:) are kept, so an
extension subtree still stands out.
The special value --hide-ns ALL hides every prefix, reducing all element
and attribute names to their bare local form. Useful when you don't know the
prefixes up front — e.g. fingerprinting or clustering documents of unknown
vocabularies with --paths:
unxml --paths --hide-ns ALL unknown.xml # prefix-free structural signature
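The name rewriting behind --hide-ns can be sketched as two small functions. One merges the repeatable, comma-separated option values; the other strips a prefix when it is listed, or when ALL is in effect. This is an illustrative Python model and the function names are hypothetical. It covers element and attribute names only, so dropping the matching xmlns: declarations is not shown.

```python
from typing import Iterable, Optional, Set


def parse_hide_ns(values: Iterable[str]) -> Optional[Set[str]]:
    """Merge repeated, comma-separated --hide-ns values.

    Returns the set of prefixes to hide, or None for the special value ALL.
    """
    hidden: Set[str] = set()
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part == "ALL":
                return None
            if part:
                hidden.add(part)
    return hidden


def display_name(qname: str, hidden: Optional[Set[str]]) -> str:
    """Drop the prefix of `qname` if it is hidden (or if ALL is in effect)."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return qname  # already unprefixed
    if hidden is None or prefix in hidden:
        return local
    return qname  # a prefix that was not listed keeps its signal
```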
Under --auto/--bat, unxml also sniffs the document type and hides a
sensible set automatically. Currently it recognises UBL instance documents
(an unprefixed root such as <Invoice> in a UBL namespace) and hides whichever
prefixes are bound to the Common Basic/Aggregate Components namespaces. A
stylesheet or schema that merely references UBL (e.g. an xsl:stylesheet
translating to UBL) is left untouched, since there the prefixes are real syntax.
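The sniffing heuristic can be sketched with the standard library's `iterparse`, which reports namespace declarations as `start-ns` events before the element that declares them. The following assumptions are illustrative and not taken from unxml's source. A "UBL namespace" is any URI starting with the `urn:oasis:names:specification:ubl:schema:xsd:` prefix seen in the UBL example below. Only declarations on the root element are considered.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Set

UBL = "urn:oasis:names:specification:ubl:schema:xsd:"
CBC = UBL + "CommonBasicComponents-2"
CAC = UBL + "CommonAggregateComponents-2"


def ubl_prefixes_to_hide(xml_text: str) -> Set[str]:
    """Prefixes to hide if the document looks like a UBL instance, else an empty set."""
    decls: Dict[str, str] = {}
    root_uri: Optional[str] = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # declarations arrive before the root's "start"
            decls[prefix] = uri
        else:
            tag = item.tag
            root_uri = tag[1:tag.index("}")] if tag.startswith("{") else ""
            break  # only the root element matters for this check
    # "Unprefixed root": the root's namespace is the default namespace ("").
    unprefixed_ubl_root = (
        root_uri is not None
        and root_uri.startswith(UBL)
        and decls.get("") == root_uri
    )
    if not unprefixed_ubl_root:
        return set()  # e.g. an xsl:stylesheet that merely references UBL
    return {p for p, uri in decls.items() if p and uri in (CBC, CAC)}
```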
Canonicalising for diffs (--canonical)
Two documents can mean the same thing yet differ byte-for-byte over things that
carry no meaning: namespace prefixes are arbitrary local aliases for a URI,
and sibling order is often incidental. --canonical removes both so the
rendered output of equivalent documents diffs cleanly:
- Prefixes are rebound to stable names. Recognised vocabularies keep their
conventional prefix (xsl, xs, cac, ram, …); everything else becomes ns1, ns2, … in sorted-URI order. A default namespace (xmlns="…") is rewritten to the same explicit prefix, so <a:Foo> and <Foo xmlns="…"> for one URI collapse to the identical name. All xmlns:* declarations are re-emitted, sorted, on the root.
- Sibling elements are sorted by a recursive signature, so order-only differences vanish. Mixed content (prose) keeps document order.
diff <(unxml --canonical a.xml) <(unxml --canonical b.xml)
Two documents differing only in prefix spelling, default-vs-explicit namespace, and sibling order produce byte-identical output:
<a:Order xmlns:a="urn:shop:order" xmlns:c="urn:shop:cust">
  <a:Line sku="X1"><a:Qty>2</a:Qty></a:Line>
  <c:Customer id="42">Acme</c:Customer>
</a:Order>
renders with unxml --canonical as:
ns2:Order(xmlns:ns1="urn:shop:cust", xmlns:ns2="urn:shop:order")
  ns1:Customer(id="42") = Acme
  ns2:Line(sku="X1")
    ns2:Qty = 2
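The prefix rebinding can be modelled in a short Python sketch that reproduces the ns1/ns2 assignment in the example above. ElementTree already resolves every name to `{uri}local`, so a:Foo and a default-namespaced Foo arrive as the same key before any renaming happens. The `CONVENTIONAL` table is a hypothetical, illustrative subset of unxml's recognised vocabularies. The sketch also assumes that conventional URIs do not consume an nsN number.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Illustrative subset only; unxml's real list of recognised vocabularies is longer.
CONVENTIONAL: Dict[str, str] = {
    "http://www.w3.org/1999/XSL/Transform": "xsl",
    "http://www.w3.org/2001/XMLSchema": "xs",
    "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2": "cac",
}


def canonical_prefixes(uris: Iterable[str]) -> Dict[str, str]:
    """Map each namespace URI to a stable prefix: conventional, else ns1, ns2, ..."""
    mapping: Dict[str, str] = {}
    n = 0
    for uri in sorted(set(uris)):
        if uri in CONVENTIONAL:
            mapping[uri] = CONVENTIONAL[uri]
        else:
            n += 1  # assumption: conventional URIs do not use up a number
            mapping[uri] = f"ns{n}"
    return mapping


def canonical_root_name(xml_text: str, mapping: Dict[str, str]) -> str:
    """Rename the root element; ElementTree has already dropped the original prefix."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"{mapping[uri]}:{local}"
    return tag
```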
Sibling sorting applies only to plain XML. Element order is significant in
stylesheets and schemas (xsl:* control flow, xs:sequence, Schematron rule
order), so in a dialect/--special mode (--xslt, --xsd, --wsdl,
--schematron) --canonical normalises prefixes only and preserves document
order.
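Sibling sorting by a recursive signature can be sketched on top of ElementTree. The exact signature unxml computes is not documented here. This one is an assumption: the tag, the sorted attributes, the stripped text, then the sorted child signatures. The `plain_xml` flag is likewise a hypothetical stand-in for the dialect-mode check. Mixed-content elements keep their children in document order.

```python
import xml.etree.ElementTree as ET


def is_mixed(elem: ET.Element) -> bool:
    """Prose: non-whitespace text interleaved with child elements."""
    if len(elem) == 0:
        return False
    if (elem.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in elem)


def signature(elem: ET.Element) -> str:
    """Recursive key: tag, sorted attributes, text, and child signatures."""
    attrs = ",".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    kids = [signature(child) for child in elem]
    if not is_mixed(elem):
        kids.sort()  # order-insensitive below this node
    return f"{elem.tag}({attrs})[{text}]{{{';'.join(kids)}}}"


def canonical_sort(elem: ET.Element, plain_xml: bool = True) -> ET.Element:
    """Sort siblings in place; dialect modes (plain_xml=False) keep document order."""
    if not plain_xml:
        return elem
    for child in elem:
        canonical_sort(child)
    if not is_mixed(elem):
        elem[:] = sorted(elem, key=signature)
    return elem
```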
Listing document paths (--paths)
--paths dumps a compact structural summary instead of the full document: the
set of distinct element paths as an indented tree, each node shown once
(repeated siblings collapse) and annotated with the union of attribute names
ever seen at that path. A leading // legend explains the namespace prefixes
(recognised vocabularies on their conventional prefix are omitted as
self-explanatory):
unxml --paths invoice.xml
order(xmlns="urn:shop:order")
  customer(id)
  line(discount, sku)
    qty(unit)
Prefixed namespaces (xmlns:ext) go into a leading // legend; the default
namespace (xmlns) is shown inline on the element that sets it, since several
nested redefinitions would collide under one (default) legend key.
It answers "what shapes exist in this document" and is handy for understanding
or comparing document shapes. It composes with --select (subtree under a
match), --hide-ns (shorter segments), and --canonical (the legend resolves
the generated ns1/ns2 names).
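The collapse-and-union idea behind --paths can be sketched in Python with ElementTree. All of the following simplifications are assumptions. Names are reduced to local names, so there is no `//` legend and no inline xmlns. Children appear in first-seen order. Attribute names are sorted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


class PathNode:
    """One distinct element path; repeated siblings share a node."""

    def __init__(self) -> None:
        self.attrs: Set[str] = set()
        self.children: Dict[str, "PathNode"] = {}  # first-seen order


def local(name: str) -> str:
    """Strip ElementTree's '{uri}' so segments read as local names."""
    return name.split("}", 1)[1] if name.startswith("{") else name


def merge(elem: ET.Element, node: PathNode) -> None:
    node.attrs.update(local(a) for a in elem.attrib)  # union of attribute names
    for child in elem:
        merge(child, node.children.setdefault(local(child.tag), PathNode()))


def render(name: str, node: PathNode, level: int, out: List[str]) -> None:
    attrs = f"({', '.join(sorted(node.attrs))})" if node.attrs else ""
    out.append("  " * level + name + attrs)
    for child_name, child in node.children.items():
        render(child_name, child, level + 1, out)


def paths(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    tree = PathNode()
    merge(root, tree)
    out: List[str] = []
    render(local(root.tag), tree, 0, out)
    return "\n".join(out)
```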
Two further knobs turn --paths into a fuzzy fingerprint for clustering files by
structure. Both coarsen the signature, so documents of the same format collapse
together despite incidental differences:
- --depth N limits the tree to N nesting levels (root = level 1), dropping deeper subtrees. Lower N → coarser.
- --no-attrs drops ordinary attribute names from each node, keeping only namespaces. Incidental per-document attributes (schemaLocation, version, timestamps) stop fragmenting otherwise-identical formats.
Combined with --hide-ns ALL, --paths --depth 1 --no-attrs reduces each file
to a single root-element + namespace line — a format census signature: run it
over a directory and sort | uniq -c to see how many distinct formats are
present and how many files use each. Raise --depth to cluster by finer
structural variants instead.
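The census can be modelled without shelling out, which makes it easier to see what the two knobs do. This Python sketch is not unxml itself, and its function names and exact signature text are assumptions. It reduces every name to its local part, roughly what --hide-ns ALL does. It records a namespace only where it changes, as an inline xmlns="uri" entry. It then uses `collections.Counter` in place of `sort | uniq -c`.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


class Node:
    def __init__(self) -> None:
        self.attrs: Set[str] = set()
        self.kids: Dict[str, "Node"] = {}


def split(tag: str) -> Tuple[str, str]:
    """'{uri}local' -> (uri, local); un-namespaced names get uri ''."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def merge(elem: ET.Element, node: Node, parent_uri: str,
          level: int, depth: int, no_attrs: bool) -> None:
    uri, _ = split(elem.tag)
    if uri != parent_uri:
        node.attrs.add(f'xmlns="{uri}"')  # namespaces survive --no-attrs
    if not no_attrs:
        node.attrs.update(split(a)[1] for a in elem.attrib)
    if level >= depth:
        return  # root is level 1; deeper subtrees are dropped
    for child in elem:
        kid = node.kids.setdefault(split(child.tag)[1], Node())
        merge(child, kid, uri, level + 1, depth, no_attrs)


def render(name: str, node: Node, indent: int, out: List[str]) -> None:
    attrs = f"({', '.join(sorted(node.attrs))})" if node.attrs else ""
    out.append("  " * indent + name + attrs)
    for kid_name, kid in node.kids.items():
        render(kid_name, kid, indent + 1, out)


def signature(xml_text: str, depth: int = 1, no_attrs: bool = True) -> str:
    root = ET.fromstring(xml_text)
    node = Node()
    merge(root, node, "", 1, depth, no_attrs)
    out: List[str] = []
    render(split(root.tag)[1], node, 0, out)
    return "\n".join(out)


def census(documents: Iterable[str]) -> Counter:
    """Count documents per signature, like `sort | uniq -c`."""
    return Counter(signature(doc) for doc in documents)
```

With the defaults (depth 1, no attributes), two invoices that differ only in a version attribute and in their children land in the same bucket.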
Introduction
This command line application was developed for comparing XML files (e.g. database/application state dumps). It takes an XML file and converts it to a YAML-like syntax that is easier to read and compare.
Example
Take an excerpt of the standard UBL 2.1 invoice example:
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:ID>TOSL108</cbc:ID>
  <cbc:IssueDate>2009-12-15</cbc:IssueDate>
  <cbc:InvoiceTypeCode listID="UN/ECE 1001 Subset" listAgencyID="6">380</cbc:InvoiceTypeCode>
  <cbc:DocumentCurrencyCode listID="ISO 4217 Alpha" listAgencyID="6">EUR</cbc:DocumentCurrencyCode>
  <cac:AccountingSupplierParty>
    <cac:Party>
      <cac:PartyName>
        <cbc:Name>Salescompany ltd.</cbc:Name>
      </cac:PartyName>
      <cac:PostalAddress>
        <cbc:StreetName>Main street</cbc:StreetName>
        <cbc:CityName>Big city</cbc:CityName>
        <cbc:PostalZone>54321</cbc:PostalZone>
      </cac:PostalAddress>
    </cac:Party>
  </cac:AccountingSupplierParty>
</Invoice>
unxml invoice.xml flattens it into:
Invoice(
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2")
  cbc:UBLVersionID = 2.1
  cbc:ID = TOSL108
  cbc:IssueDate = 2009-12-15
  cbc:InvoiceTypeCode(listAgencyID="6", listID="UN/ECE 1001 Subset") = 380
  cbc:DocumentCurrencyCode(listAgencyID="6", listID="ISO 4217 Alpha") = EUR
  cac:AccountingSupplierParty
    cac:Party
      cac:PartyName
        cbc:Name = Salescompany ltd.
      cac:PostalAddress
        cbc:StreetName = Main street
        cbc:CityName = Big city
        cbc:PostalZone = 54321
With --auto, unxml sniffs the UBL instance and hides the noisy cbc:/cac:
prefixes (along with their xmlns: declarations), leaving just the signal:
Invoice(xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2")
  UBLVersionID = 2.1
  ID = TOSL108
  IssueDate = 2009-12-15
  InvoiceTypeCode(listAgencyID="6", listID="UN/ECE 1001 Subset") = 380
  DocumentCurrencyCode(listAgencyID="6", listID="ISO 4217 Alpha") = EUR
  AccountingSupplierParty
    Party
      PartyName
        Name = Salescompany ltd.
      PostalAddress
        StreetName = Main street
        CityName = Big city
        PostalZone = 54321
Mode example: XSLT
Beyond flattening, each mode rewrites its vocabulary into terser pseudocode. A small XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <table border="1">
      <xsl:for-each select="catalog/cd">
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="artist"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
renders with unxml --xslt as:
xsl:stylesheet(version="1.0", xmlns:xsl="http://www.w3.org/1999/XSL/Transform")
  match /:
    table(border="1")
      foreach catalog/cd:
        tr
          td
            <- title
          td
            <- artist
match, foreach and <- (for xsl:value-of) read like the control flow the
stylesheet actually expresses. See XSLT transformations for the
full vocabulary, and XSD / Schematron for
the other modes.
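To make the idea concrete, here is a hypothetical Python sketch. It applies only the three rewrites named above to an ElementTree and falls back to the generic name(attr="value") form for everything else. ElementTree does not keep the xmlns:xsl declaration, so the root line lacks it. Text content is ignored. unxml's actual rule set is larger than this.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical subset of the --xslt rewrites: just match, foreach and <-.
REWRITES: Dict[str, Callable[[ET.Element], str]] = {
    XSL + "template": lambda e: f"match {e.get('match')}:",
    XSL + "for-each": lambda e: f"foreach {e.get('select')}:",
    XSL + "value-of": lambda e: f"<- {e.get('select')}",
}


def qname(tag: str) -> str:
    """Show XSL elements with their conventional prefix, others unchanged."""
    return "xsl:" + tag[len(XSL):] if tag.startswith(XSL) else tag


def render(elem: ET.Element, level: int = 0, out: Optional[List[str]] = None) -> List[str]:
    out = [] if out is None else out
    rewrite = REWRITES.get(elem.tag)
    if rewrite is not None:
        line = rewrite(elem)  # pseudocode form for known XSL instructions
    else:
        attrs = ", ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
        line = qname(elem.tag) + (f"({attrs})" if attrs else "")
    out.append("  " * level + line)
    for child in elem:
        render(child, level + 1, out)
    return out
```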
Key Features
- Attributes in Parentheses: Element attributes are displayed Pug-style as element(attr="value")
- Text Content with Equals: Element text content is shown as ElementName = text content
- Hierarchical Indentation: Nested elements are properly indented
- Clean Format: Easy to read and compare, great for diffing
- Inline mixed content: Prose interleaved with short inline elements stays on one readable line
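The first three rules can be sketched in a few lines of Python over ElementTree. This is not unxml's implementation. It assumes two-space indentation, attributes sorted by name (as the UBL example output suggests), un-namespaced input only, and no mixed-content handling.

```python
import xml.etree.ElementTree as ET
from typing import List


def flatten(xml_text: str) -> str:
    """Render un-namespaced XML as `name(attr="v") = text`, indented by depth."""
    out: List[str] = []

    def walk(elem: ET.Element, level: int) -> None:
        attrs = ", ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
        line = "  " * level + elem.tag + (f"({attrs})" if attrs else "")
        text = (elem.text or "").strip()
        if text:
            line += f" = {text}"  # text content after an equals sign
        out.append(line)
        for child in elem:
            walk(child, level + 1)

    walk(ET.fromstring(xml_text), 0)
    return "\n".join(out)
```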
Mixed content (prose with inline spans)
Document-style XML interleaves text with small inline elements — a paragraph
containing a <command> or a <link>. Flattening every run onto its own line
makes such prose hard to read, so unxml keeps it inline as one line of
verbatim XML:
<para>The <command>widget</command> daemon keeps its
<link href="recovery.html">recoverable</link> state in one database.</para>
renders as:
para = The <command>widget</command> daemon keeps its <link href="recovery.html">recoverable</link> state in one database.
An element flows inline when its whole subtree is inline-safe — text
interleaved with elements that are themselves inline-safe. A leaf with
significant (multi-line) text, such as <programlisting> or <screen>, is not
inline-safe, so its parent stays in the flattened block form and the listing
keeps its line breaks. Nested inline markup (e.g. <emphasis> wrapping a
<command>) collapses all the way up. This applies to the generic XML render;
the --xslt/--xsd/--wsdl/--schematron modes use their own formatting.
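The inline-safe rule is recursive, so it maps naturally onto a pair of predicates. This Python version over ElementTree is an approximation built on two assumptions. "Significant" text is taken to mean a leaf whose stripped text contains a newline. The inline line is the element's inner XML with whitespace runs collapsed to single spaces.

```python
import xml.etree.ElementTree as ET


def inline_safe(elem: ET.Element) -> bool:
    """A leaf is inline-safe unless its text spans lines; a parent needs safe children."""
    if len(elem) == 0:
        return "\n" not in (elem.text or "").strip()
    return all(inline_safe(child) for child in elem)


def has_prose(elem: ET.Element) -> bool:
    """Non-whitespace text directly inside `elem` (before or between children)."""
    if (elem.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in elem)


def flows_inline(elem: ET.Element) -> bool:
    return len(elem) > 0 and has_prose(elem) and inline_safe(elem)


def render_inline(elem: ET.Element) -> str:
    # ET.tostring() serialises each child together with its tail text.
    inner = (elem.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in elem)
    return f"{elem.tag} = {' '.join(inner.split())}"
```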
Technical Details
- Built with Rust for performance and safety
- Uses quick-xml for fast XML parsing
- Uses clap for command-line argument parsing
- Proper error handling with anyhow
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Creating Releases
The version lives in the git tag, not in Cargo.toml (which stays at the
0.0.0-dev placeholder; the release workflow injects the real version with
cargo set-version). Do not bump Cargo.toml or create tags by hand.
To cut a release, let gh create the tag:
gh release create vX.Y.Z --title "Release vX.Y.Z" --notes "…"
The pushed tag triggers the GitHub Actions workflow, which builds binaries and the PyPI wheel for all platforms and attaches them to the release.
The CI workflow runs on every push to ensure code quality with formatting checks, linting, and tests.
Download files
Built Distributions
File details
Details for the file unxml_rs-1.5.0-py3-none-win_amd64.whl.
File metadata
- Download URL: unxml_rs-1.5.0-py3-none-win_amd64.whl
- Size: 859.5 kB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 054eaf8442765e7c8a398c6f550647589d9f88dbe61d3e1e38c31bc628c03209 |
| MD5 | b59fb50e8ed6c01181b0dd53fddd2593 |
| BLAKE2b-256 | dccd61a93a8103f0fea5ecad488399f4119853d34f109ab3b6d2d094c27766b2 |
Provenance
The following attestation bundles were made for unxml_rs-1.5.0-py3-none-win_amd64.whl:
Publisher: release.yml on vivainio/unxml-rs
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unxml_rs-1.5.0-py3-none-win_amd64.whl
- Subject digest: 054eaf8442765e7c8a398c6f550647589d9f88dbe61d3e1e38c31bc628c03209
- Sigstore transparency entry: 1916817602
- Permalink: vivainio/unxml-rs@241a825068ba5a2c0114923fa2a7375ae5db5964
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/vivainio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@241a825068ba5a2c0114923fa2a7375ae5db5964
- Trigger Event: push
File details
Details for the file unxml_rs-1.5.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: unxml_rs-1.5.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 1.1 MB
- Tags: Python 3, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5651222b17b31ff66ae64a3f0412fb8692891f37e33bf2872555b71b03a041b |
| MD5 | a0d18428e20029a9db6d5073a3f2be15 |
| BLAKE2b-256 | a8b13a6e7e410600f7208e9ca51a71643396e0e92e84b153c3424253d52cafc0 |
Provenance
The following attestation bundles were made for unxml_rs-1.5.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on vivainio/unxml-rs
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unxml_rs-1.5.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: d5651222b17b31ff66ae64a3f0412fb8692891f37e33bf2872555b71b03a041b
- Sigstore transparency entry: 1916817454
- Permalink: vivainio/unxml-rs@241a825068ba5a2c0114923fa2a7375ae5db5964
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/vivainio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@241a825068ba5a2c0114923fa2a7375ae5db5964
- Trigger Event: push
File details
Details for the file unxml_rs-1.5.0-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: unxml_rs-1.5.0-py3-none-macosx_11_0_arm64.whl
- Size: 953.9 kB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 411075a146219ca5f208e93e839e730d8255ad376c5b08c619a31a38fd264dc6 |
| MD5 | 73f2ac348acab362e52efca3d09e58ae |
| BLAKE2b-256 | ce4c8783b9f970ffcd691727217673f1213fbda1c033ef20abc0ad85afec8416 |
Provenance
The following attestation bundles were made for unxml_rs-1.5.0-py3-none-macosx_11_0_arm64.whl:
Publisher: release.yml on vivainio/unxml-rs
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unxml_rs-1.5.0-py3-none-macosx_11_0_arm64.whl
- Subject digest: 411075a146219ca5f208e93e839e730d8255ad376c5b08c619a31a38fd264dc6
- Sigstore transparency entry: 1916817676
- Permalink: vivainio/unxml-rs@241a825068ba5a2c0114923fa2a7375ae5db5964
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/vivainio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@241a825068ba5a2c0114923fa2a7375ae5db5964
- Trigger Event: push
File details
Details for the file unxml_rs-1.5.0-py3-none-macosx_10_12_x86_64.whl.
File metadata
- Download URL: unxml_rs-1.5.0-py3-none-macosx_10_12_x86_64.whl
- Size: 976.8 kB
- Tags: Python 3, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d776b0bd2fff848dee505e7a2b160637fc381a86c0407a46a7220ff227c4eb2 |
| MD5 | cc640e6db7cb0615791b7984ec1f2da0 |
| BLAKE2b-256 | 6ee8465b8753ad315846a7fa92bc492ae8f95ba9a8edb538f4a0f1291bd988ed |
Provenance
The following attestation bundles were made for unxml_rs-1.5.0-py3-none-macosx_10_12_x86_64.whl:
Publisher: release.yml on vivainio/unxml-rs
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: unxml_rs-1.5.0-py3-none-macosx_10_12_x86_64.whl
- Subject digest: 2d776b0bd2fff848dee505e7a2b160637fc381a86c0407a46a7220ff227c4eb2
- Sigstore transparency entry: 1916817222
- Permalink: vivainio/unxml-rs@241a825068ba5a2c0114923fa2a7375ae5db5964
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/vivainio
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@241a825068ba5a2c0114923fa2a7375ae5db5964
- Trigger Event: push