library to compare HTML while ignoring non-functional differences

htmlcompare

A Python library to check whether two HTML documents are "equal". The functionality is currently very limited, but the idea is that the library automatically ignores differences that are not relevant for HTML semantics (e.g. <img style=""> is the same as <img>, and style="color: black; font-weight: bold" is equal to style="font-weight:bold;color:black;").

Usage

import htmlcompare

diff = htmlcompare.compare_html('<div>', '<p>')
is_same = bool(diff)

To ease testing, the library provides some helpers:

from htmlcompare import assert_different_html, assert_same_html

assert_different_html('<br>', '<p>')
assert_same_html('<div />', '<div></div>')

Implemented Features

  • ignores whitespace between HTML tags
  • <div /> is treated like <div></div>
  • ordering of HTML attributes does not matter: <div class="…" style="…" /> is treated as equal to <div style="…" class="…" />
  • regular HTML comments are ignored by default
  • ordering of CSS classes inside class attribute does not matter: <div class="foo bar" /> is the same as <div class="bar foo" />.
  • a style or class attribute with empty content (e.g. style="") is considered the same as an absent style/class attribute.
  • inline style declarations and <style> tags are parsed with an actual CSS parser: ordering, whitespace and trailing semicolons do not matter
  • 0px is considered equal to 0 in inline CSS.
  • conditional comments (<!--[if !mso]>...) are considered when checking for equality. Regular comments will be ignored by default.
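
To make the rules above more concrete, here is a minimal, stdlib-only sketch of the kind of normalization they imply. It is not htmlcompare's implementation: the library uses an actual CSS parser, while this sketch naively splits declarations on ";" and ":" and drops every comment, conditional or not. All names (normalize_style, normalize_attrs, TokenCollector, tokens) are made up for illustration.

```python
from html.parser import HTMLParser


def normalize_style(value):
    """Return inline CSS as a frozenset of (property, value) pairs."""
    declarations = set()
    for chunk in value.split(';'):
        if ':' not in chunk:
            continue  # skips empty chunks, e.g. after a trailing semicolon
        prop, _, val = chunk.partition(':')
        val = ' '.join(val.split())  # collapse whitespace inside the value
        if val == '0px':
            val = '0'  # naive: only handles a value that is exactly "0px"
        declarations.add((prop.strip().lower(), val))
    return frozenset(declarations)


def normalize_attrs(attrs):
    """Turn a list of (name, value) pairs into an order-independent dict."""
    normalized = {}
    for name, value in attrs:
        value = value or ''
        if name == 'style':
            value = normalize_style(value)
        elif name == 'class':
            value = frozenset(value.split())
        if name in ('style', 'class') and not value:
            continue  # an empty style/class counts as absent
        normalized[name] = value
    return normalized


class TokenCollector(HTMLParser):
    """Collect a normalized token stream (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(('start', tag, normalize_attrs(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(('end', tag))

    # HTMLParser's default handle_startendtag calls the two handlers above,
    # so <div /> yields the same tokens as <div></div>.

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.tokens.append(('text', data))

    # handle_comment is not overridden, so all comments are dropped.


def tokens(html):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens
```

With this sketch, two documents count as equal when their token lists are equal, for example tokens('<div class="foo bar" style=""></div>') == tokens('<div class="bar foo" />'). Comparing flat token streams is enough for a demonstration; a real comparison would rather work on a parsed tree.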

Limitations / Plans

No validation of conditional comments. Not sure which library I can use here but at some point I'll likely need this as well.

JavaScript: for obvious reasons it will be impossible to implement perfect JS comparison, but it might be possible to run some kind of "beautifier" to take care of insignificant stylistic changes. However, I don't need this feature, so it is unlikely to get implemented (unless contributed by someone else).

Custom hooks could help adapt the comparison to your specific needs. However, I don't know which API would be best, so this will wait until there are real-world use cases.

Better API: The current API is very minimal and implements just what I needed right now. I hope to improve the API once I use this project in more complex scenarios.

Other projects

xmldiff is a well-established project to compare two XML documents. However, its code does not seem to contain knowledge about specific HTML semantics (e.g. CSS, empty attributes, insignificant attribute order).
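
As a rough illustration of what a comparison without HTML knowledge sees, the sketch below compares two elements with Python's xml.etree.ElementTree. It does not use xmldiff and says nothing about xmldiff's actual behaviour; xml_equal is a hypothetical helper written only for this example. The style and class values are treated as opaque strings, so two elements that mean the same in HTML are reported as different.

```python
import xml.etree.ElementTree as ET


def xml_equal(a, b):
    """Compare two elements by tag, attributes, text and children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib  # attribute values are plain strings here
        and (a.text or '').strip() == (b.text or '').strip()
        and len(a) == len(b)
        and all(xml_equal(x, y) for x, y in zip(a, b))
    )


first = ET.fromstring(
    '<div class="foo bar" style="color: black; font-weight: bold"/>'
)
second = ET.fromstring(
    '<div style="font-weight:bold;color:black;" class="bar foo"/>'
)

# False: "foo bar" != "bar foo" and the two style strings differ textually,
# although both elements are equivalent as HTML.
same_as_xml = xml_equal(first, second)
```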

Misc

The code is licensed under the MIT license. It requires Python 3.9+.
