Compare xml files with svg output.
XmlXdiff
XmlXdiff was inspired by X-Diff.
Since version 0.3.2, the distance-cost algorithm has been replaced by parent-identification. This might be the wrong decision, but for huge xml documents (see test 9) the results improved in both performance and quality.
This is not a bulletproof library (yet). It's more of a playground for getting in touch with comparing tree structures and presenting the result in a charming way.
dependencies
- PySide2
- svgwrite
- lxml
installation
pip install XmlXdiff
first step
file example
from diffx import main
_xml1 = './simple/xml1.xml'
_xml2 = './simple/xml2.xml'
main.compare_xml(_xml1, _xml2)
main.save('./simple/diffx_file.svg')
status quo
implementation
Each xml element is identified by its xpath and a hash calculated from selected relevant information. The comparison starts with the identification of large xml blocks (changed/moved). Parent elements are identified by tag, text-pre, text-post, attribute-names and attribute-values. Parent xml blocks can contain further parent xml blocks.
<tag attribute-name="attribute-value" ...>
text-pre
<... children ...>
text-post
</tag>
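The parent identification described above can be sketched as a hash over tag, text-pre, text-post, attribute names and attribute values, ignoring the children. This is only a minimal illustration, not the code used by diffx: the function name `element_hash` is hypothetical, the standard library's `xml.etree.ElementTree` stands in for lxml, and it assumes that text-post is the text following the last child (stored as that child's `.tail`).

```python
import hashlib
import xml.etree.ElementTree as ET


def element_hash(elem):
    """Hash an element without its children context (hypothetical helper)."""
    text_pre = (elem.text or "").strip()
    # Assumption: text-post is the text after the last child, which
    # ElementTree stores in that child's .tail attribute.
    children = list(elem)
    text_post = (children[-1].tail or "").strip() if children else ""
    parts = [elem.tag, text_pre, text_post]
    # Sort attribute names so that attribute order does not change the hash.
    for name in sorted(elem.attrib):
        parts.append(name)
        parts.append(elem.attrib[name])
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


a = ET.fromstring('<item id="1" kind="x">before<child/>after</item>')
b = ET.fromstring('<item kind="x" id="1">before<child>changed</child>after</item>')
c = ET.fromstring('<item id="2" kind="x">before<child/>after</item>')

same_parent = element_hash(a) == element_hash(b)       # children are ignored
different_parent = element_hash(a) != element_hash(c)  # attribute value differs
```

Because the children do not enter the hash, a parent block can still be recognised when only its content has changed.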
1. mark all xml elements as changed
2. iterate over parent blocks, starting with those that have the most children and moving on to those with fewer children
3. mark unchanged xml elements of the current parent
4. mark moved xml elements of the current parent
5. mark xml elements identified by tag name and attribute names of the current parent
6. mark xml elements identified by attribute values and element text of the current parent
7. mark xml elements identified by tag name of the current parent
8. mark xml elements whose xpath does not exist in the other xml tree as added/deleted of the current parent
9. repeat from step 3 until all xml elements are identified
All xml elements that are still marked as changed have to be investigated.
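The "mark everything as changed, then refine" idea can be illustrated with a strongly simplified sketch. It is not the diffx implementation: the names `index_tree` and `classify` are hypothetical, `xml.etree.ElementTree` is used instead of lxml, the ordering by parent blocks and the intermediate matching by tag name or attribute names are left out, and only the first document is classified.

```python
import xml.etree.ElementTree as ET


def index_tree(root):
    """Map a positional xpath to a (tag, attributes, text) signature."""
    index = {}

    def walk(elem, path):
        attributes = tuple(sorted(elem.attrib.items()))
        index[path] = (elem.tag, attributes, (elem.text or "").strip())
        counters = {}
        for child in elem:
            counters[child.tag] = counters.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counters[child.tag]}]")

    walk(root, f"/{root.tag}")
    return index


def classify(left_root, right_root):
    left, right = index_tree(left_root), index_tree(right_root)
    # Everything starts out as "changed".
    state = {path: "changed" for path in left}
    right_free = dict(right)  # elements of the other tree not matched yet

    # Unchanged: same xpath and same signature in both trees.
    for path, signature in left.items():
        if right_free.get(path) == signature:
            state[path] = "unchanged"
            del right_free[path]

    # Moved: same signature, but found under a different xpath.
    for path, signature in left.items():
        if state[path] != "changed":
            continue
        match = next((p for p, s in right_free.items() if s == signature), None)
        if match is not None:
            state[path] = "moved"
            del right_free[match]

    # Deleted: the xpath does not exist in the other tree at all.
    for path in left:
        if state[path] == "changed" and path not in right:
            state[path] = "deleted"
    return state


left_doc = ET.fromstring(
    '<root><a id="1">x</a><group><b>y</b></group><d>old</d><e>t1</e></root>')
right_doc = ET.fromstring(
    '<root><a id="1">x</a><group/><b>y</b><e>t2</e></root>')
result = classify(left_doc, right_doc)
```

For these two inputs, `b` is reported as moved, `d` as deleted, and `e` stays marked as changed because only its text differs.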
performance
test1: delta_t=0.0699s xml_elements=63
test2: delta_t=0.0104s xml_elements=5
test3: delta_t=0.0154s xml_elements=10
test4: delta_t=0.0240s xml_elements=32
test5: delta_t=0.0258s xml_elements=34
test6: delta_t=0.0290s xml_elements=34
test7: delta_t=0.0124s xml_elements=8
test8: delta_t=0.1027s xml_elements=67
test9: delta_t=4.2290s xml_elements=6144
test11: delta_t=0.0298s xml_elements=34
test12: delta_t=0.0288s xml_elements=45
test13: delta_t=0.0442s xml_elements=75
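To compare runs of different size, the figures above can be converted into elements per second. The snippet only does arithmetic on the listed values; the names `results` and `throughput` are illustrative.

```python
# (delta_t in seconds, xml_elements) copied from the list above
results = {
    "test1": (0.0699, 63),
    "test2": (0.0104, 5),
    "test3": (0.0154, 10),
    "test4": (0.0240, 32),
    "test5": (0.0258, 34),
    "test6": (0.0290, 34),
    "test7": (0.0124, 8),
    "test8": (0.1027, 67),
    "test9": (4.2290, 6144),
    "test11": (0.0298, 34),
    "test12": (0.0288, 45),
    "test13": (0.0442, 75),
}

# Elements processed per second for each test
throughput = {name: count / delta_t for name, (delta_t, count) in results.items()}

for name, rate in throughput.items():
    print(f"{name}: {rate:.0f} elements/s")
```

The small tests are dominated by fixed overhead, so their rates say little about how the algorithm scales; test9 is the only large document in the list.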
coverage
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
lib\diffx\__init__.py                       21      4    81%
lib\diffx\base.py                          107      2    98%
lib\diffx\differ.py                        170     19    89%
lib\diffx\hash.py                           71      0   100%
lib\diffx\svg\__init__.py                    0      0   100%
lib\diffx\svg\coloured_text.py              21      0   100%
lib\diffx\svg\coloured_without_text.py      12      5    58%
lib\diffx\svg\compact.py                   340     34    90%
lib\diffx\svg\render_text.py                76      2    97%
lib\diffx\xpath.py                          54      3    94%
------------------------------------------------------------
TOTAL                                      872     69    92%
open issues
- performance analysis and improvements (different hash algorithms, ...)
- if there are some users, improve interface
- investigation of merge interfaces
release notes
v1.0.0:
- XmlXdiff renamed to diffx
- ui improved; diffx.main added as entry point
- code refactored - pythonic, pep8
- text block introduced
- performance improved
v0.3.3:
- source code clean up
- diff text without spaces
- static code quality tools introduced
v0.3.2:
- implemented parent-identification without children context
- split segments replaced by parent-identification (no dependency on the number or content of children)
- color scheme changed
- coverage improved
v0.2.2:
- search areas are split into segments between unchanged xml nodes
- added/deleted/verified to be added
- overlapping search areas possible now (merge proposals)