N-way structural & semantic XML diff that generates human-readable Markdown reports, driven by per-dialect recipes (Control-M, sitemaps, and more).
xmldiffreport
📖 Documentation: https://bilouro.github.io/xmldiffreport/ · English
📖 Documentation: https://bilouro.github.io/xmldiffreport/pt/ · Português
N-way structural & semantic XML diff that produces human-readable Markdown reports — driven by per-dialect recipes.
xmldiffreport compares two or more XML files at once — BMC Control-M
exports, Maven POMs, JUnit/xUnit reports, sitemaps, or any dialect you
teach it with a small recipe — and tells you what actually changed, element by
element and attribute by attribute, not a noisy line-by-line text diff. It aligns elements by a natural key (not by
position), ignores volatile attributes, and renders a clean Markdown
report with a summary table plus per-element detail.
It was born from a real problem — spotting differences between BMC Control-M
job patches flowing through test → uat → bench → prod — and generalized into a
recipe-driven engine that works on any XML dialect (Control-M exports,
sitemaps, POMs, manifests, …).
Status: early (0.1.0), but already useful. Feedback and recipes welcome.
Why not a normal diff / xmldiff?
A plain diff (or git diff) on XML lies, for three reasons:
- Volatile attributes: VERSION, CREATION_TIME, JOBISN… change on every export with no functional meaning.
- Reordering: children are often unordered; a reorder is not a change.
- Attribute order inside a tag is irrelevant.
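The three points above can be illustrated with a small sketch. It is not the tool's implementation: the `canonical` helper and the `VOLATILE` set are hypothetical names introduced here, and the XML snippets are made up. It only shows that two documents a line diff would flag as different compare equal once volatile attributes, sibling order, and attribute order are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical ignore list for this sketch, using the attribute names mentioned above.
VOLATILE = {"VERSION", "CREATION_TIME", "JOBISN"}

def canonical(elem: ET.Element) -> tuple:
    """Order-insensitive fingerprint of an element, skipping volatile attributes."""
    # Sorting attribute pairs makes attribute order irrelevant.
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items() if k not in VOLATILE))
    # Sorting the children's fingerprints makes sibling order irrelevant.
    children = tuple(sorted(canonical(child) for child in elem))
    text = (elem.text or "").strip()
    return (elem.tag, attrs, text, children)

old = ET.fromstring(
    '<FOLDER><JOB JOBNAME="A" VERSION="1" CMDLINE="run"/>'
    '<JOB JOBNAME="B" CMDLINE="stop"/></FOLDER>'
)
new = ET.fromstring(
    '<FOLDER><JOB CMDLINE="stop" JOBNAME="B"/>'
    '<JOB CMDLINE="run" VERSION="2" JOBNAME="A"/></FOLDER>'
)

same = canonical(old) == canonical(new)
print(same)  # True: only VERSION, sibling order and attribute order differ
```

A fingerprint like this can only answer "same or not"; it cannot say which element changed, which is why key-based matching is needed for a useful report.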
Text/edit-script diffs (like the excellent xmldiff)
solve part of this but are 2-way, algorithm-matched (you can't say "match
<JOB> by JOBNAME"), and output an edit script rather than a review-friendly report.
| | xmldiffreport | xmldiff | DiffDog / Oxygen | DeltaXML |
|---|---|---|---|---|
| Match by declared natural key | ✅ | ❌ | ⚠️ limited | ✅ |
| N-way (3+ files at once) | ✅ | ❌ | ❌ | ❌ |
| Markdown report out of the box | ✅ | ❌ (edit script) | ⚠️ GUI | ❌ (delta XML) |
| Open source | ✅ | ✅ | ❌ | ❌ |
When to use which — choose xmldiffreport for N-way, key-aligned,
report-first comparison (e.g. "the same folder in uat, bench and prod"); reach
for xmldiff to produce a patch/edit script, DiffDog/Oxygen for interactive
2-way merging, DeltaXML for heuristic matching of keyless documents, and
git diff for raw line changes on already-normalized XML. Full breakdown:
How it compares.
Install
pip install xmldiffreport
Requires Python 3.11+ (uses the standard-library tomllib). No third-party dependencies.
Quickstart
Compare two XML files — that's the core idea:
xmldiffreport old.xml new.xml -o report.md
report.md lists every element that changed, one column per file. No options
needed — it uses the generic recipe by default. Pass as many files as you
like; the report just grows a column each:
xmldiffreport v1.xml v2.xml v3.xml -o report.md
Prefer an HTML page? Add -f html (or name the output *.html):
xmldiffreport old.xml new.xml -f html -o report.html
Exit code is 1 when a difference is found (handy for CI), 0 otherwise.
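For CI scripts written in Python, the exit code can be checked roughly as follows. This is a sketch under assumptions: `interpret` and `run_diff` are hypothetical helper names, and treating any code other than 0 or 1 as an error is a guess that the text above does not confirm. The command line itself is the one from the Quickstart.

```python
import subprocess

def interpret(returncode: int) -> str:
    """Map the documented exit codes to a label.

    0 and 1 come from the README; treating anything else as an
    error is an assumption of this sketch.
    """
    if returncode == 0:
        return "identical"
    if returncode == 1:
        return "different"
    return "error"

def run_diff(files: list[str], report: str = "report.md") -> str:
    """Run the CLI as shown in the Quickstart and classify the outcome."""
    proc = subprocess.run(["xmldiffreport", *files, "-o", report], check=False)
    return interpret(proc.returncode)

# Example (requires xmldiffreport to be installed and the files to exist):
# outcome = run_diff(["old.xml", "new.xml"])
```

`check=False` is deliberate: a return code of 1 means "differences found", not a crash, so it should not raise.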
No files handy? git clone the repo and try the bundled, synthetic examples/:
xmldiffreport examples/sitemap/old/sitemap.xml examples/sitemap/new/sitemap.xml --recipe sitemap
Sharper results: recipes
The default compares any XML, but a recipe teaches the tool how to identify
elements in a specific dialect — matching "the same" element by a key (not by
position) and ignoring volatile attributes. Built-ins: controlm, maven-pom, junit, sitemap,
generic; or write your own.
xmldiffreport old.xml new.xml --recipe sitemap -o report.md
→ Writing recipes · generate one from your XML with an LLM.
Comparing many files (or whole directories)
Point it at directories too — they're scanned recursively for *.xml, and
every file found becomes a source:
xmldiffreport ./dump-a ./dump-b --recipe controlm -o report.md
Mental model: every file is a source (labelled by its path); a unit is the
recipe's unit element (e.g. a Control-M SMART_FOLDER); the engine compares
each unit across every source that contains it (2+). A unit that appears in
only one file is ignored. The tool has no notion of "environments" — if it
matters which file is production, name it so.
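The mental model can be sketched in a few lines of Python. This is an illustration, not the engine's code: `collect_sources` and `comparable_units` are hypothetical names, and identifying a SMART_FOLDER by a `FOLDER_NAME` attribute is an assumption made for the demo.

```python
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

def collect_sources(inputs: list[Path]) -> list[Path]:
    """Expand directories recursively into *.xml files; keep plain files as-is."""
    sources: list[Path] = []
    for item in inputs:
        sources.extend(sorted(item.rglob("*.xml")) if item.is_dir() else [item])
    return sources

def comparable_units(sources: list[Path], unit_tag: str, name_attr: str) -> dict[str, list[Path]]:
    """Group units by name across sources and keep those present in 2+ sources."""
    seen: dict[str, list[Path]] = defaultdict(list)
    for src in sources:
        # A set, so a unit repeated inside one file still counts as one source.
        names = {u.get(name_attr, "") for u in ET.parse(src).getroot().iter(unit_tag)}
        for name in names:
            seen[name].append(src)
    return {name: paths for name, paths in seen.items() if len(paths) >= 2}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for env, folders in {"uat": ["F1", "F2"], "prod": ["F1"]}.items():
        (root / env).mkdir()
        body = "".join(f'<SMART_FOLDER FOLDER_NAME="{f}"/>' for f in folders)
        (root / env / "patch.xml").write_text(f"<DEFTABLE>{body}</DEFTABLE>")
    units = comparable_units(
        collect_sources([root / "uat", root / "prod"]), "SMART_FOLDER", "FOLDER_NAME"
    )
    shared = sorted(units)
    print(shared)  # ['F1']: F2 lives in one source only, so it is dropped
```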
→ Full, worked guide with directory trees and a complete example: Inputs & file layout.
What the report looks like
For each unit (e.g. a Control-M SMART_FOLDER) present in 2+ sources with
differences (names below are from the synthetic examples/):
GLX_INGEST_DAILY (SMART_FOLDER)
Sources: bench/patch-a.xml, uat/patch-b.xml, prod/hotfix-c.xml
**~ JOB GLX_INGEST_LOAD**
| Element · attribute | bench/patch-a.xml | uat/patch-b.xml | prod/hotfix-c.xml |
|---|---|---|---|
| CMDLINE | … --force | … --retry | …%%P_DATE |
| MAXRERUN | 0 | 5 | 3 |
| INCOND GLX_INGEST_STAGE-…_OK · AND_OR | A | O | A |
| OUTCOND GLX_INGEST_LOAD-…_OK · SIGN | - | + | + |
| ON NOTOK\|RERUN | − | present | present |
Notice: it's N-way (one column per file), it shows attribute-level
changes of the same element (the SIGN flip, the AND_OR change), it
collapses identical jobs into a count, and the volatile VERSION/CREATION_TIME
noise is gone.
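The "one column per file" layout can be approximated with a short function. This is a sketch of the idea only, not the tool's renderer: `render_table` is a hypothetical name, the short source labels are invented, and `NODEID` is a made-up attribute included to show that identical values produce no row.

```python
from typing import Optional

def render_table(sources: list[str], attrs: dict[str, list[Optional[str]]]) -> str:
    """Render one Markdown row per attribute whose values are not identical everywhere."""
    lines = [
        "| Element · attribute | " + " | ".join(sources) + " |",
        "|---" * (len(sources) + 1) + "|",
    ]
    for name, per_source in attrs.items():
        if len(set(per_source)) == 1:
            continue  # identical in every source: not worth a row
        # None marks "absent in this source"; pipes are escaped for Markdown.
        cells = ["−" if v is None else v.replace("|", "\\|") for v in per_source]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

table = render_table(
    ["bench", "uat", "prod"],
    {
        "MAXRERUN": ["0", "5", "3"],
        "NODEID": ["n1", "n1", "n1"],   # same everywhere, so it is skipped
        "SIGN": [None, "+", "+"],       # missing in the first source
    },
)
print(table)
```

Adding a fourth source only means a longer `sources` list and one more value per attribute; the row logic does not change.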
Recipes
A recipe is a small TOML file that teaches the generic engine about one XML dialect: the natural key per element and which attributes to ignore.
name = "controlm"
[defaults]
unit = "SMART_FOLDER" # the unit of comparison
ignore_attrs = ["VERSION", "JOBISN", "CREATION_TIME", "LAST_UPLOAD", "..."]
[elements.JOB]
key = ["@JOBNAME"]
[elements.OUTCOND]
key = ["@NAME"] # SIGN / ODATE are compared as attributes
[elements.ON] # no clear key → synthesize from CODE + DO actions
key = ["@CODE", "*kinds"]
inline = true # treat children as pseudo-attributes
Key mini-language
A key is a list of tokens, joined by |:
| Token | Meaning |
|---|---|
| @ATTR | value of attribute ATTR |
| #text | the element's own text |
| *tag | the element's tag name (use for singletons compared by their text) |
| child:TAG@ATTR | attribute of a child element |
| child:TAG#text | text of a child element (e.g. sitemap <loc>) |
| *kinds | summary of child kinds / DOACTION actions (for keyless elements like <ON>) |
If no key is given, the engine falls back to @NAME, then #text, then a
composite of all attributes.
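To make the mini-language concrete, here is a minimal sketch of how such tokens could be resolved with `xml.etree.ElementTree`. It is not the engine's code: `token_value` and `natural_key` are hypothetical names, the `*kinds` token is left out, and the `k=v` format used for the all-attributes fallback is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def token_value(elem: ET.Element, token: str) -> str:
    """Resolve one key token against an element (*kinds is not covered here)."""
    if token.startswith("@"):
        return elem.get(token[1:], "")
    if token == "#text":
        return (elem.text or "").strip()
    if token == "*tag":
        return elem.tag
    if token.startswith("child:"):
        spec = token[len("child:"):]
        if spec.endswith("#text"):
            return (elem.findtext(spec[: -len("#text")]) or "").strip()
        tag, _, attr = spec.partition("@")
        child = elem.find(tag)
        return child.get(attr, "") if child is not None else ""
    raise ValueError(f"unsupported token: {token}")

def natural_key(elem: ET.Element, key: Optional[list[str]] = None) -> str:
    """Join token values with '|'; without a key, fall back as described above."""
    if key:
        return "|".join(token_value(elem, t) for t in key)
    if "NAME" in elem.attrib:
        return elem.attrib["NAME"]
    text = (elem.text or "").strip()
    if text:
        return text
    # Composite of all attributes; the k=v rendering is this sketch's choice.
    return "|".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))

url = ET.fromstring("<url><loc> https://example.org/a </loc><lastmod>2024-01-01</lastmod></url>")
job = ET.fromstring('<JOB JOBNAME="LOAD" APPLICATION="GLX"/>')

print(natural_key(url, ["child:loc#text"]))            # https://example.org/a
print(natural_key(job, ["@APPLICATION", "@JOBNAME"]))  # GLX|LOAD
print(natural_key(job))                                # APPLICATION=GLX|JOBNAME=LOAD
```

Two elements with the same key in different files are then treated as "the same" element, regardless of where they sit among their siblings.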
Built-in recipes
- controlm: BMC Control-M exports (DEFTABLE → SMART_FOLDER → JOB → INCOND/OUTCOND/QUANTITATIVE/CONTROL/ON).
- maven-pom: Maven pom.xml. Dependency & plugin drift, keyed by coordinates (groupId:artifactId). Reports version/scope changes and added/removed entries across <dependencies>, <dependencyManagement> and <build>.
- junit: JUnit/xUnit reports (Surefire, Gradle, pytest, …), keyed by classname + name. Surfaces pass↔fail↔skip transitions and added/removed tests, ignoring time/timestamp/hostname.
- sitemap: sitemap.xml (identity by <loc> text; compares <lastmod>/<priority>/<changefreq>).
- generic: no dialect knowledge (default).
Drop a .toml anywhere and pass its path to --recipe to add your own dialect.
Generate & validate a recipe
Don't want to write one by hand? Let an LLM draft it from a sample of your XML:
xmldiffreport-recipe scaffold sample.xml > prompt.txt # paste prompt.txt into any LLM
xmldiffreport-recipe validate my-dialect.toml # check the result (ships a JSON Schema)
xmldiffreport-recipe show controlm # print a built-in recipe to learn from
See Generate a recipe with an LLM.
Project layout — tool vs. your usage
src/xmldiffreport/ the installable TOOL (engine, recipes, CLI) — generic, reusable
examples/ synthetic datasets + generator (no real data)
usage/ a config-driven HARNESS to run the tool on YOUR files
tests/ pytest suite
The tool in src/ knows nothing about your folders. The usage/ folder
is the thin layer you adapt: a config.toml listing the inputs (files/dirs), a
report_dir, and a collect.py that runs the diff and writes the report.
cp usage/config.example.toml usage/config.toml # then edit the paths
python usage/collect.py # writes usage/reports/<timestamp>.md
Your config.toml, reports, and any XML under usage/ are git-ignored — real
data and paths never get committed.
Library use
from xmldiffreport import diff
result = diff(["old.xml", "new.xml"], recipe="sitemap") # a file, files, or dir(s)
print(result.render()) # Markdown — or result.render("html")
for unit in result.units:  # what differs
    print(unit.ident, unit.sources)
if result:  # truthy when anything differs (handy for exit codes)
    ...
Performance
Each file is parsed once into an in-memory tree (xml.etree.ElementTree); the
diff cost is roughly linear in the number of nodes. For typical Control-M exports
(a few MB) it's instant, and it's fine up to the order of tens of MB. It is
not designed for gigabyte-scale files — we deliberately favour simple,
maintainable code over incremental/streaming parsing.
Development
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
ruff check . && ruff format --check .
mypy src
pytest
See CONTRIBUTING.md. Examples and tests use synthetic data only — never real exports.
Roadmap
- Report top-level units that exist in only one source (added/removed units).
- JSON report format (Markdown and HTML already ship; formats are pluggable).
- Similarity-based matching fallback for keyless elements.
- More built-in recipes (Android manifest, RSS/Atom, .NET
web.config, …).
License
MIT © Victor H. Bilouro — see LICENSE.