Convert between XML trees and span representation

Spans and Trees

A small Python library for converting between XML trees and a span-based structure. This can be useful for extracting sections of text from XML documents and for giving some of the tags special handling.

The two main functions are treeToSpans and spansToTree, which convert between an ElementTree element and a text string with a list of spans. Examples are shown below.

treeToSpans

First create a little example XML tree to convert.

import xml.etree.ElementTree as ET

xmlstring = "<doc><title>Important document</title><contents>Empty</contents></doc>"
root = ET.fromstring(xmlstring)  # fromstring already returns the root Element

Then use the treeToSpans function to convert the XML document into the text content with spans.

from spans_and_trees import treeToSpans

text, spans = treeToSpans(root)

print(text)  # Important documentEmpty
print(spans) # [(0, 18, 'title', {}), (18, 5, 'contents', {})]
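To make the output above more concrete, the conversion can be pictured as a depth-first walk that concatenates the text and records a (start, length, tag, attributes) tuple for every element below the root. The following is only a rough sketch under that assumption and is not the library's actual code; `tree_to_spans_sketch` is a hypothetical name, and the handling of nested elements and tail text is a guess.

```python
import xml.etree.ElementTree as ET


def tree_to_spans_sketch(root):
    """Hypothetical illustration only; not the library's implementation."""
    pieces = []  # text fragments in document order
    spans = []   # (start, length, tag, attributes) tuples
    offset = 0   # number of characters emitted so far

    def emit(fragment):
        nonlocal offset
        if fragment:
            pieces.append(fragment)
            offset += len(fragment)

    def visit(elem, record):
        start = offset
        index = len(spans)
        if record:
            spans.append(None)  # reserve a slot so spans stay in document order
        emit(elem.text)
        for child in elem:
            visit(child, True)
            emit(child.tail)  # text that follows the child's closing tag
        if record:
            spans[index] = (start, offset - start, elem.tag, dict(elem.attrib))

    visit(root, False)  # assumption: the root element itself gets no span
    return "".join(pieces), spans


example = "<doc><title>Important document</title><contents>Empty</contents></doc>"
sketch_text, sketch_spans = tree_to_spans_sketch(ET.fromstring(example))
print(sketch_text)   # Important documentEmpty
print(sketch_spans)  # [(0, 18, 'title', {}), (18, 5, 'contents', {})]
```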

Each span is a tuple of length 4. Its elements are:

  1. The start location of the span
  2. The length of the span
  3. The tag of the span
  4. A dictionary of the attributes of the span.
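Because a span stores a start offset and a length, the text it covers can be recovered by slicing. A short sketch using the values from the example above (the variable names are only illustrative):

```python
text = "Important documentEmpty"
spans = [(0, 18, "title", {}), (18, 5, "contents", {})]

covered = []
for start, length, tag, attributes in spans:
    # The span covers text[start : start + length]
    covered.append((tag, text[start:start + length]))

print(covered)  # [('title', 'Important document'), ('contents', 'Empty')]
```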

spansToTree

Now we create a dummy document with a block of text and a span at a particular offset.

from spans_and_trees import spansToTree

text = 'The quick brown fox jumped over the lazy dog'
# The span starts at offset 10, has a length of 5, uses the tag 'colour' and has a dummy attribute.
spans = [(10, 5, 'colour', {'dummy_attrib': '5'})]

root = spansToTree(text, spans)

print(type(root)) # <class 'xml.etree.ElementTree.Element'>

We can check the XML tree that has been created:

xmlstr = ET.tostring(root)

print(xmlstr) # b'<anon>The quick <colour dummy_attrib="5">brown</colour> fox jumped over the lazy dog</anon>'
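For intuition, the reverse direction can be sketched with plain ElementTree calls for the simple case of flat, non-overlapping spans. This is an assumption-laden illustration, not the library's code: `spans_to_tree_sketch` is a hypothetical name, the `anon` root tag is taken from the output above, and nested or overlapping spans are not handled.

```python
import xml.etree.ElementTree as ET


def spans_to_tree_sketch(text, spans, root_tag="anon"):
    """Hypothetical illustration for flat, non-overlapping spans only."""
    root = ET.Element(root_tag)
    cursor = 0   # position in text already placed in the tree
    last = None  # most recently created child element
    for start, length, tag, attrib in sorted(spans, key=lambda s: s[0]):
        gap = text[cursor:start]  # plain text before this span
        if last is None:
            root.text = gap
        else:
            last.tail = gap
        last = ET.SubElement(root, tag, dict(attrib))
        last.text = text[start:start + length]
        cursor = start + length
    rest = text[cursor:]  # plain text after the final span
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root


sketch_root = spans_to_tree_sketch(
    'The quick brown fox jumped over the lazy dog',
    [(10, 5, 'colour', {'dummy_attrib': '5'})],
)
print(ET.tostring(sketch_root))
# b'<anon>The quick <colour dummy_attrib="5">brown</colour> fox jumped over the lazy dog</anon>'
```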
