Part of the Amara3 project, which offers a variety of data processing tools. This module adds MicroXML support and adaptation to classic XML.

Amara 3 XML

Python 3 tools for processing MicroXML, a simplification of XML. Amara 3 XML implements the MicroXML data model and lets you parse both traditional XML and MicroXML into it.

The microx command line tool is especially useful for quick query and processing of XML.

Install

Requires Python 3.4+. Just run:

pip install amara3.xml

Use

Though Amara 3 is focused on MicroXML rather than full XML, the reality is that most of the XML-like data you’ll be dealing with is full XML 1.0. This package provides capabilities to parse legacy XML and reduce it to MicroXML. In many cases the biggest implication is that namespace information is stripped. You can get pretty far by ignoring this, as long as you know what you’re doing.
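To make "namespace information is stripped" concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree, not Amara 3 itself. The helper names local_name and strip_namespaces are made up for this illustration, and namespaces are only one of the differences between XML 1.0 and MicroXML.

```python
import xml.etree.ElementTree as ET

MONTY_XML = """<monty xmlns="urn:spam:ignored">
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""


def local_name(qualified):
    """Drop ElementTree's '{namespace-uri}' prefix, if there is one."""
    return qualified.rsplit("}", 1)[-1]


def strip_namespaces(elem):
    """Rewrite element and attribute names in place, keeping only local names.

    Hypothetical helper for illustration; this is not how Amara 3 does it.
    """
    for node in elem.iter():
        node.tag = local_name(node.tag)
        renamed = {local_name(k): v for k, v in node.attrib.items()}
        node.attrib.clear()
        node.attrib.update(renamed)
    return elem


root = strip_namespaces(ET.fromstring(MONTY_XML))
names = [root.tag] + [child.tag for child in root]
print(names)  # ['monty', 'python', 'python']
```

Note what is lost: two names that differ only by namespace become indistinguishable once the namespace is gone. That is the case where you need to know what you're doing. The Amara 3 version, which performs the reduction for you, follows.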

from amara3.uxml import xml

MONTY_XML = """<monty xmlns="urn:spam:ignored">
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

builder = xml.treebuilder()
root = builder.parse(MONTY_XML)
print(root.xml_name) #"monty"
children = iter(root.xml_children) #One iterator, so each next() advances
child = next(children)
print(child) #First text node: "\n  "
child = next(children)
print(child.xml_value) #"What do you mean "bleh""
print(child.xml_attributes["spam"]) #"eggs"

There are some utilities to make this a bit easier as well.

from amara3.uxml import xml
from amara3.uxml.treeutil import *

MONTY_XML = """<monty xmlns="urn:spam:ignored">
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

builder = xml.treebuilder()
root = builder.parse(MONTY_XML)
py1 = next(select_name(root, "python"))
print(py1.xml_value) #"What do you mean "bleh""
py2 = next(select_attribute(root, "ministry", "abuse"))
print(py2.xml_value) #"But I was looking for argument"

Experimental MicroXML parser

For this parser the input truly must be MicroXML. Basics:

>>> from amara3.uxml.parser import parse
>>> events = parse('<hello><bold>world</bold></hello>')
>>> for ev in events: print(ev)
...
(<event.start_element: 1>, 'hello', {}, [])
(<event.start_element: 1>, 'bold', {}, ['hello'])
(<event.characters: 3>, 'world')
(<event.end_element: 2>, 'bold', ['hello'])
(<event.end_element: 2>, 'hello', [])
>>>
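One way to consume such an event stream is to track the open elements on a stack. The sketch below is self-contained, so it does not import Amara 3: the event enum is a stand-in whose member values are copied from the output above, and EVENTS is that output transcribed by hand. The function name text_by_path is hypothetical.

```python
from enum import Enum


class event(Enum):
    # Stand-in for the parser's event type; values copied from the output above
    start_element = 1
    end_element = 2
    characters = 3


# Events transcribed from the session above
EVENTS = [
    (event.start_element, 'hello', {}, []),
    (event.start_element, 'bold', {}, ['hello']),
    (event.characters, 'world'),
    (event.end_element, 'bold', ['hello']),
    (event.end_element, 'hello', []),
]


def text_by_path(events):
    """Map each element path (e.g. 'hello/bold') to the text directly inside it."""
    stack, found = [], {}
    for ev in events:
        kind = ev[0]
        if kind is event.start_element:
            stack.append(ev[1])   # ev[1] is the element name
        elif kind is event.end_element:
            stack.pop()
        elif kind is event.characters:
            path = "/".join(stack)
            found[path] = found.get(path, "") + ev[1]
    return found


result = text_by_path(EVENTS)
print(result)  # {'hello/bold': 'world'}
```

To run this against real parser output you would presumably compare against the parser's own event type rather than this stand-in.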

Or…And now for something completely different!…Incremental parsing.

>>> from amara3.uxml.parser import parsefrags
>>> events = parsefrags(['<hello', '><bold>world</bold></hello>'])
>>> for ev in events: print(ev)
...
(<event.start_element: 1>, 'hello', {}, [])
(<event.start_element: 1>, 'bold', {}, ['hello'])
(<event.characters: 3>, 'world')
(<event.end_element: 2>, 'bold

Implementation notes

Switched to a hand-crafted parser because:

  1. Worried about memory consumption of the needed PLY lexer
  2. Lack of incremental feed parse for PLY
  3. Inspiration from James Clark's JS parser https://github.com/jclark/microxml-js/blob/master/microxml.js
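To illustrate what an incremental feed means in a hand-written parser, here is a deliberately tiny tokenizer sketch. It is not Amara 3's parser code; tokenize_fragments is a made-up name, and it ignores attributes containing ">", entities, and comments. The point is only that unconsumed input is buffered between fragments, so a tag split across two fragments still comes out whole.

```python
def tokenize_fragments(fragments):
    """Yield ('tag', s) and ('text', s) tokens from an iterable of string fragments.

    Simplified on purpose: no comments, entities, or '>' inside attribute values.
    """
    buf = ""
    for frag in fragments:
        buf += frag
        while buf:
            if buf[0] == "<":
                end = buf.find(">")
                if end == -1:
                    break  # tag is incomplete; wait for the next fragment
                yield ("tag", buf[:end + 1])
                buf = buf[end + 1:]
            else:
                start = buf.find("<")
                if start == -1:
                    break  # text may continue in the next fragment
                yield ("text", buf[:start])
                buf = buf[start:]
    if buf:
        if buf[0] == "<":
            raise ValueError("input ended inside a tag: %r" % buf)
        yield ("text", buf)


tokens = list(tokenize_fragments(['<hello', '><bold>world</bold></hello>']))
print(tokens)
```

Only the unconsumed tail of the input is held in memory at any time, and the token sequence does not depend on where the fragment boundaries fall.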

Author: Uche Ogbuji uche@ogbuji.net
