
Fuses two XML markups together


xmlfuse


Given two XML documents that contain the same text, xmlfuse fuses their markup together into a single output XML document.

Installation

pip install xmlfuse

Building and testing:

If you prefer to build from source, follow these steps:

make venv
make

API

import lxml.etree as et
from xmlfuse.fuse import fuse

xml1 = et.fromstring('<span>Hello, <i>world!</i></span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')

xml = fuse(xml1, xml2)
assert et.tostring(xml) == b'<span><b>Hello</b>, <i>world!</i></span>'

Input documents must have exactly the same text

An error is raised if the texts differ. Whitespace matters!

Example:

xml1 = et.fromstring('<span>Hello</span>')
xml2 = et.fromstring('<span>Good bye</span>')

try:
    fuse(xml1, xml2)
except RuntimeError:
    pass  # expected: the texts differ, so fuse() raises RuntimeError
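Since fuse() rejects inputs whose text differs by even one character, it can help to locate the first mismatch yourself before calling it. The sketch below is not part of xmlfuse: element_text and first_text_mismatch are hypothetical helpers, and they use the standard library's xml.etree.ElementTree so the snippet runs without lxml (lxml elements offer the same itertext() method).

```python
import xml.etree.ElementTree as ET


def element_text(elem):
    """Concatenate every text node under elem (tails of descendants included)."""
    return "".join(elem.itertext())


def first_text_mismatch(xml1, xml2):
    """Hypothetical pre-check: index of the first differing character, or None if equal."""
    text1, text2 = element_text(xml1), element_text(xml2)
    for i, (c1, c2) in enumerate(zip(text1, text2)):
        if c1 != c2:
            return i
    if len(text1) != len(text2):
        # One text is a prefix of the other: they diverge where the shorter one ends.
        return min(len(text1), len(text2))
    return None


xml1 = ET.fromstring('<span>Hello, <i>world!</i></span>')
xml2 = ET.fromstring('<span><b>Hello</b>, world!</span>')
xml3 = ET.fromstring('<span>Hello,  world!</span>')  # note the doubled space

print(first_text_mismatch(xml1, xml2))  # None: same text, different markup
print(first_text_mismatch(xml1, xml3))  # 7: whitespace counts
```

Reporting an index rather than a boolean makes whitespace-only differences, which are easy to miss by eye, straightforward to track down.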

Conflicting markup

Sometimes two markups cannot be merged because their tags intersect (overlap without one containing the other). In that case there are two options:

a. Raise an exception and let the caller handle the problem.
b. Resolve the conflict by segmenting one of the markups.

We treat the first document as the master and the second as the slave. Master markup is never segmented. If master and slave markups conflict (and the auto_segment flag is True), fuse() segments the slave markup to make the result consistent.

Example:

xml1 = et.fromstring('<span>Hel<i>lo, world!</i></span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')

xml = fuse(xml1, xml2)
assert et.tostring(xml) == b'<span><b>Hel</b><i><b>lo</b>, world!</i></span>'
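In the example above, the master's <i> wraps "lo, world!" while the slave's <b> wraps "Hello": the two tags share "lo", but neither contains the other. One way to detect such conflicts, sketched below under the assumption that each element can be reduced to the character range of text it wraps, is to compare those ranges. This is an illustration, not xmlfuse's implementation; text_spans and crosses are hypothetical names, and the standard library's ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET


def text_spans(root):
    """Return (start, end, tag) character ranges for every element below root.

    The top-level element is skipped: it wraps the whole text, so it never crosses anything.
    """
    spans = []
    pos = 0

    def walk(elem):
        nonlocal pos
        start = pos
        pos += len(elem.text or "")
        for child in elem:
            walk(child)
            pos += len(child.tail or "")  # text after the child still belongs to elem
        if elem is not root:
            spans.append((start, pos, elem.tag))

    walk(root)
    return spans


def crosses(a, b):
    """True if ranges a and b overlap but neither contains the other."""
    (s1, e1), (s2, e2) = a[:2], b[:2]
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


master = ET.fromstring('<span>Hel<i>lo, world!</i></span>')
slave = ET.fromstring('<span><b>Hello</b>, world!</span>')
conflicts = [(m, s) for m in text_spans(master) for s in text_spans(slave) if crosses(m, s)]
print(conflicts)  # [((3, 13, 'i'), (0, 5, 'b'))]
```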

Set the auto_segment flag to False to prevent segmentation; fuse() then raises an error when it detects a conflict.
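Segmentation can be pictured on character ranges written as (start, end, tag) tuples: a slave range that crosses a master range is cut at the master boundary lying inside it, so each piece nests cleanly and the master range stays whole. The helper below is a hypothetical sketch, not xmlfuse's code. It assumes the master ranges come from a single well-formed document (so they never cross each other), and its ValueError only stands in for whatever error fuse() actually raises when auto_segment is False.

```python
def segment_slave_span(slave, master_spans, auto_segment=True):
    """Cut a slave (start, end, tag) range so that no piece crosses a master range.

    Hypothetical helper; ValueError stands in for the error fuse() raises
    when auto_segment=False.
    """
    start, end, tag = slave
    cuts = set()
    for m_start, m_end, _ in master_spans:
        if m_start < start < m_end < end or start < m_start < end < m_end:
            # Only the master boundary lying strictly inside the slave needs a cut.
            cuts.update(b for b in (m_start, m_end) if start < b < end)
    if not cuts:
        return [slave]  # already nests cleanly; nothing to do
    if not auto_segment:
        raise ValueError(f"slave <{tag}> crosses master markup")
    points = [start, *sorted(cuts), end]
    return [(a, b, tag) for a, b in zip(points, points[1:])]


# <b>Hello</b> (0..5) crosses the master's <i>lo, world!</i> (3..13):
print(segment_slave_span((0, 5, "b"), [(3, 13, "i")]))  # [(0, 3, 'b'), (3, 5, 'b')]
```

The two pieces correspond to <b>Hel</b> outside the master's <i> and <b>lo</b> inside it; only the slave tag is split.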

Ambiguities

When master and slave markups wrap exactly the same text, the nesting is ambiguous: which tag should be the inner one?

We resolve this by consistently trying to put the slave markup inside the master markup. To change this behavior, set prefer_slave_inner to False.

Example:

xml1 = et.fromstring('<span><i>Hello</i>, world!</span>')
xml2 = et.fromstring('<span><b>Hello</b>, world!</span>')

xml = fuse(xml1, xml2, prefer_slave_inner=True)
assert et.tostring(xml) == b'<span><i><b>Hello</b></i>, world!</span>'

xml = fuse(xml1, xml2, prefer_slave_inner=False)
assert et.tostring(xml) == b'<span><b><i>Hello</i></b>, world!</span>'
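Once conflicts are resolved, the remaining ranges nest, and an ambiguity arises only when a master range and a slave range are identical. A span-based renderer can settle this through its sort order: open tags by start position, then longer ranges first, and break exact ties with a flag. The sketch below names that flag prefer_slave_inner after the fuse() parameter; with True the master tag opens first, so the slave tag ends up inner, as described above. It is a hypothetical illustration (render and the (start, end, tag, is_master) tuples are assumptions), not xmlfuse's implementation, and it expects non-crossing ranges.

```python
from xml.sax.saxutils import escape


def render(text, spans, prefer_slave_inner=True):
    """Render non-crossing (start, end, tag, is_master) ranges as nested markup over text."""

    def open_order(span):
        start, end, _tag, is_master = span
        # For identical ranges, the tag that should be outer opens first.
        outer_first = 0 if is_master == prefer_slave_inner else 1
        return (start, -end, outer_first)

    out, stack, pos = [], [], 0

    def close_top():
        nonlocal pos
        end, tag = stack.pop()
        out.append(escape(text[pos:end]) + f"</{tag}>")
        pos = end

    for start, end, tag, _is_master in sorted(spans, key=open_order):
        # Close every open tag whose range ends before this one starts.
        while stack and stack[-1][0] <= start:
            close_top()
        out.append(escape(text[pos:start]) + f"<{tag}>")
        pos = start
        stack.append((end, tag))
    while stack:
        close_top()
    out.append(escape(text[pos:]))
    return "".join(out)


text = "Hello, world!"
spans = [(0, 5, "i", True), (0, 5, "b", False)]  # master <i> and slave <b> over "Hello"
print(render(text, spans))                            # <i><b>Hello</b></i>, world!
print(render(text, spans, prefer_slave_inner=False))  # <b><i>Hello</i></b>, world!
```

Sorting by descending end position keeps outer ranges ahead of the ranges they contain; the flag only matters when start and end are both equal.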

Slave top-level tag is dropped

Note that the slave's top-level tag is not merged; it is simply dropped. To merge it into the output, set strip_slave_top_tag=False.

fuse() signature

fuse(xml1, xml2, *, prefer_slave_inner=True, auto_segment=True, strip_slave_top_tag=True)

Where:

  • xml1 is the master XML document (an lxml Element object, see http://lxml.de)
  • xml2 is the slave XML document
  • prefer_slave_inner controls ambiguity resolution
  • auto_segment allows segmentation of the slave markup in case of conflicting markup
  • strip_slave_top_tag lets fuse() ignore the top-level tag of the slave XML

Returns the fused XML document.


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


xmlfuse-0.0.4-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file xmlfuse-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: xmlfuse-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.6.9

File hashes

Hashes for xmlfuse-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 554542f604b251b35b759b86ededb4a1139b81d51c0f3096de8bece218edaefb
MD5 9805a8b64f027e2cd0868f174226cd2a
BLAKE2b-256 b2df88d884463df9d2e4af0feb35519f37c70d3c76c85cecdae1d0b46a8fcbfc

