compare xml or json files with svg/html report
This library was designed to compare two *.xml or *.json files. The main focus was to produce a intuitive interpretable *.svg output file. In future there will be a text export available, too.
diffx was inspired by X-Diff.
Since version 0.3.2 the distance cost's algorithm is replaced by parent-identification. This might by a wrong decision but the result's for huge xml documents (see. test 9) improved in performance and quality.
This is not a bullet prove library (till now). It s more a playground to get in touch with comparing tree structures and presenting the resulting in a charming way.
python pip diffx
xml file example
from diffx import main _xml1 = './simple/xml1.xml' _xml2 = './simple/xml2.xml' main.compare_xml(_xml1, _xml2) main.save('./simple/diffx_file.svg')
xml string example
from diffx import main _xml1 = """<root><deleted>with content</deleted><unchanged/><changed name="test1" /></root>""" _xml2 = """<root><unchanged/><changed name="test2" /><added/></root>""" main.compare_xml(_xml1, _xml2) main.save('./simple/diffx_string.svg')
json file example
from diffx import main json1 = './simple/xml1.json' json2 = './simple/xml2.json' main.compare_json(json1, json2) main.save('./simple/diffx_file.svg')
Each xml element is identified by it's xpath and a hash calculated by selecting relevant information. Start with the identification of huge xml blocks (changed/moved). Identification of parent elements by tag, text-pre, text-post, attribute-names and attribute-values. Parent xml blocks can contain further parent xml blocks.
<tag attribute-name:"attribute-value" ...> text-pre <... children ...> text-post </tag>
- mark all xml elements as changed
- iterate over parent blocks, starting with maximum children to parent blocks with less children
- mark unchanged xml elements of current parent
- mark moved xml elements of current parent
- mark xml elements identified by tag name and attribute names of the current parent
- mark xml elements identified by attributes values and element text of the current parent
- mark xml elements identified by tag name of the current parent
- mark xml elements with xpath that do not exist in the other xml tree as added/deleted of the current parent
- Repeat 3. till all xml elements are identified
All xml elements that are still marked as changed have to be investigated
test1: delta_t=0.0699s xml_elements=63 test2: delta_t=0.0104s xml_elements=5 test3: delta_t=0.0154s xml_elements=10 test4: delta_t=0.0240s xml_elements=32 test5: delta_t=0.0258s xml_elements=34 test6: delta_t=0.0290s xml_elements=34 test7: delta_t=0.0124s xml_elements=8 test8: delta_t=0.1027s xml_elements=67 test9: delta_t=4.2290s xml_elements=6144 test11: delta_t=0.0298s xml_elements=34 test12: delta_t=0.0288s xml_elements=45 test13: delta_t=0.0442s xml_elements=75
Name Stmts Miss Cover ---------------------------------------------------- lib\diffx\__init__.py 34 5 85% lib\diffx\base.py 107 2 98% lib\diffx\differ.py 170 14 92% lib\diffx\hash.py 71 0 100% lib\diffx\svg\__init__.py 0 0 100% lib\diffx\svg\coloured_text.py 21 0 100% lib\diffx\svg\compact.py 340 34 90% lib\diffx\svg\render_text.py 76 2 97% lib\diffx\xpath.py 54 3 94% ---------------------------------------------------- TOTAL 873 60 93%
- performance analysis and improvements (different hash algorithms, ...)
- if there are some users, improve interface
- investigation of merge interfaces
- README.md typo corrected
- README.md updated
- main.compare_json implemented
- README.md updated
- XmlXdiff moved diffx
- ui improved diffx.main added as entry point
- code refactored - pythonic, pep8
- text block introduced
- performance improved
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.