Fuses two XML markups together
xmlfuse
Given two XML documents having the same text, fuses the markup together to create the output XML document.
Installation
pip install xmlfuse
Building and testing:
If you prefer to build from sources, follow these steps:
make venv
make
API
import lxml.etree as et
from xmlfuse.fuse import fuse
xml1 = et.fromstring('<span>Hello, <i>world!</i></span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')
xml = fuse(xml1, xml2)
assert et.tostring(xml) == b'<span><b>Hello</b>, <i>world!</i></span>'
Input documents must have exactly the same text
An error is raised if the text differs. Whitespace matters!
Example:
xml1 = et.fromstring('<span>Hello</span>')
xml2 = et.fromstring('<span>Good bye</span>')
xml = fuse(xml1, xml2)
# expect RuntimeError raised
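To check up front whether two documents qualify, you can compare their concatenated text before fusing. The sketch below is not part of xmlfuse: the helper names document_text and same_text are hypothetical, and it uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies (lxml elements offer the same itertext() method).

```python
import xml.etree.ElementTree as ET


def document_text(root):
    """Concatenate all text content of an element, in document order."""
    return "".join(root.itertext())


def same_text(xml1, xml2):
    """Return True if both documents carry exactly the same text."""
    return document_text(xml1) == document_text(xml2)


a = ET.fromstring('<span>Hello, <i>world!</i></span>')
b = ET.fromstring('<span><b>Hello</b>, world!</span>')
c = ET.fromstring('<span>Hello,  world!</span>')  # two spaces: different text

print(same_text(a, b))
print(same_text(a, c))
```

Because whitespace counts, the extra space in the third document is enough to make the comparison fail.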
Conflicting markup
Sometimes two markups cannot be merged because their tags intersect. In that case there are two options:
a. Raise an exception and let the caller handle the problem
b. Resolve the conflict by segmenting one of the markups
We treat the first document as master and the second as slave. Master markup is never segmented. If there is a
conflict between master and slave markups (and the auto_segment flag is True), fuse() segments the slave markup to make the result consistent.
Example:
xml1 = et.fromstring('<span>Hel<i>lo, world!</i></span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')
xml = fuse(xml1, xml2)
assert et.tostring(xml) == b'<span><b>Hel</b><i><b>lo</b>, world!</i></span>'
Set the auto_segment flag to False to prevent segmentation. An error is raised instead if a conflict is detected.
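One way to picture a conflict is to map every tag to the character range it covers: two tags conflict when their ranges partially overlap, and segmenting means cutting the slave range at the master boundary. The sketch below illustrates that idea only; it is not how xmlfuse is implemented, and tag_spans, crosses and segment are hypothetical helpers built on the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def tag_spans(root):
    """Return (start, end, tag) character spans for every descendant of root."""
    spans = []

    def walk(el, pos):
        start = pos
        pos += len(el.text or "")
        for child in el:
            pos = walk(child, pos)
            pos += len(child.tail or "")
        if el is not root:
            spans.append((start, pos, el.tag))
        return pos

    walk(root, 0)
    return spans


def crosses(a, b):
    """True if the two spans partially overlap (neither contains the other)."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def segment(slave, master_spans):
    """Cut a slave span at the boundaries of master spans that cross it."""
    start, end, tag = slave
    cuts = sorted({p for m in master_spans if crosses(slave, m)
                   for p in m[:2] if start < p < end})
    points = [start, *cuts, end]
    return [(lo, hi, tag) for lo, hi in zip(points, points[1:])]


master_spans = tag_spans(ET.fromstring('<span>Hel<i>lo, world!</i></span>'))
slave_spans = tag_spans(ET.fromstring('<span><b>Hello</b>, world!</span>'))
pieces = segment(slave_spans[0], master_spans)
print(master_spans, slave_spans, pieces)
```

Here the slave b tag covers characters 0 to 5 and the master i tag covers 3 to 13, so they cross. Cutting the slave span at offset 3 gives two pieces, one over "Hel" and one over "lo", while the master span is left whole.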
Ambiguities
When master and slave markups wrap the same text, there is a nesting ambiguity: which tag should be inner?
We resolve this by consistently trying to put slave markup inside the master. This behavior can be changed
by setting the prefer_slave_inner flag to False.
Example:
xml1 = et.fromstring('<span><i>Hello</i>, world!</span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')
xml = fuse(xml1, xml2, prefer_slave_inner=True)
assert et.tostring(xml) == b'<span><i><b>Hello</b></i>, world!</span>'
xml = fuse(xml1, xml2, prefer_slave_inner=False)
assert et.tostring(xml) == b'<span><b><i>Hello</i></b>, world!</span>'
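The nesting rule itself can be shown in isolation. The following sketch is only an illustration of the rule described above, not xmlfuse code: nest is a hypothetical helper, its prefer_slave_inner parameter merely mirrors the name of the library flag, and it uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def nest(text, master_tag, slave_tag, prefer_slave_inner=True):
    """Wrap text in both tags, choosing which tag ends up inside."""
    if prefer_slave_inner:
        outer_tag, inner_tag = master_tag, slave_tag
    else:
        outer_tag, inner_tag = slave_tag, master_tag
    outer = ET.Element(outer_tag)
    inner = ET.SubElement(outer, inner_tag)
    inner.text = text
    return outer


slave_inside = ET.tostring(nest("Hello", "i", "b"))
slave_outside = ET.tostring(nest("Hello", "i", "b", prefer_slave_inner=False))
print(slave_inside, slave_outside)
```

Both results carry the same text and the same two tags; only the order of nesting differs.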
Slave top-level tag is dropped
Note that the top-level tag from the slave document is not merged; it is simply dropped. If you want it to be merged into the output,
set strip_slave_top_tag=False.
fuse() signature
fuse(xml1, xml2, *, prefer_slave_inner=True, auto_segment=True, strip_slave_top_tag=True)
Where:
- xml1 is the master XML document (LXML Element object, see http://lxml.de)
- xml2 is the slave XML document
- prefer_slave_inner controls ambiguity resolution
- auto_segment allows slave markup segmentation in case of conflicting markup
- strip_slave_top_tag allows fuse to ignore the top-level tag from the slave XML
Returns the fused XML document.