
Easy to use parser for simple XML

Project description

Helper module that parses a simple XML buffer and stores it as a (mostly)
read-only dictionary-like object (MyXml). This dictionary can hold other
dictionaries, node lists, or leaf nodes. Nodes are accessed as attributes.

>>> xml = parse("<Foo><Bar>Val</Bar></Foo>")
>>> xml.Foo.Bar == "Val"
True
>>> xml.Foo.Bar
<Bar>Val</Bar>

I don't like to use the built-in Python DOM parsers for simple XML data, but
this module is good only for simple XML! Namespaces, CDATA, and other fancy
features are not supported.

There are three factory functions: "parse", "parse_file", and "parse_object".

- parse takes an XML string and builds a MyXml object from it.

- parse_file takes a file name, reads the file, and does the same.

Both functions take an optional list of tag names to ignore at the beginning of
the XML data.

- parse_object takes a complex Python object (of dictionaries, sequences, and
scalars) and creates a MyXml object from it.

It is possible, but not convenient, to construct XML trees using this module.

Usage Examples:

>>> xml = parse('''
... <?xml bla bla bla>
... <!-- Comment -->
... <Main>
... <Text>One Two &amp; Three</Text>
... <List>
... <!-- This is a list of items -->
... <Item aaa="bbb" ></Item>
... <Item ccc = "ab&#43;c" />
... <Item>Bla Bla Bla</Item>
... </List>
... <BoolNum num="3.5" bool="Yes">No</BoolNum>
... <Double><Double>Value</Double></Double>
... </Main>
... ''')

- An XML node is an attribute of the MyXml object

>>> xml.Main.Text
<Text>One Two &amp; Three</Text>

- And also

>>> xml.Main.Text == "One Two & Three"
True

>>> xml.Main.Text.value == "One Two &amp; Three"
True
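
The two comparisons above (decoded text on one side, raw text in "value" on the
other) can be pictured with a small stand-alone sketch. This is not MyXml's
code: EntityText is a hypothetical name, the sketch targets Python 3, and it
leans on the standard library's html.unescape, which decodes more entities than
plain XML defines.

```python
import html


class EntityText(str):
    """Hypothetical string type: stores raw XML text, compares decoded text."""

    @property
    def value(self):
        # The raw text exactly as it appeared in the XML source.
        return str(self)

    def __eq__(self, other):
        # Compare using the entity-decoded form of the raw text.
        return html.unescape(str(self)) == other

    def __ne__(self, other):
        return not self.__eq__(other)

    # Defining __eq__ disables inherited hashing, so restore it explicitly.
    __hash__ = str.__hash__


text = EntityText("One Two &amp; Three")
print(text == "One Two & Three")
print(text.value == "One Two &amp; Three")
```

Keeping the raw text as the stored string means nothing is lost when the node
is printed back out; only comparisons pay for the decoding.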

There is also a way to access a node with an "nd_" prefix (so we can access
nodes named after Python reserved words). This also returns EMPY_NODE if the
node doesn't exist.

>>> xml.nd_Main.nd_Text
<Text>One Two &amp; Three</Text>
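
As a rough illustration of the "nd_" lookup, here is a minimal sketch built on a
plain nested dict. It is an assumption about how such a lookup could work, not
the module's implementation: Node and EMPTY are hypothetical names, and the code
is written for Python 3.

```python
class Node(dict):
    """Hypothetical dict subclass that exposes its keys as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("nd_"):
            # Prefixed form: works for reserved words and never raises.
            return self.get(name[3:], EMPTY)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


EMPTY = Node()  # stand-in for an "empty node" sentinel

tree = Node(Main=Node(Text="One Two & Three", **{"class": "reserved"}))

print(tree.Main.Text)             # plain attribute access
print(tree.Main.nd_class)         # "class" is a Python keyword
print(tree.nd_Missing is EMPTY)   # a missing node falls back to the sentinel
```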

- A node can be treated as a list with one item

>>> xml.Main.Double.Double[0] is xml.Main.Double.Double
True

- Node lists are regular lists

>>> len(xml.Main.List.Item)
3
>>> unicode(xml.Main.List.Item[2])
u'Bla Bla Bla'

- A MyXml object is a dictionary

>>> xml["Main"]["Text"] == xml.Main["Text"]
True
>>> xml.Main.get("Text") == xml["Main"].Text
True

- There is also a very simple XPath-like method

>>> xml.xpath("Main/List/Item")[2]
<Item>Bla Bla Bla</Item>
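
For comparison, the same slash-separated lookup can be written with the standard
library's xml.etree.ElementTree (Python 3 here). This is a different parser, not
MyXml, and its paths are relative to the root element, so "Main" is left out of
the path.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Main><List>"
    "<Item aaa='bbb'></Item>"
    "<Item ccc='ab&#43;c' />"
    "<Item>Bla Bla Bla</Item>"
    "</List></Main>"
)

# findall() accepts a limited XPath subset; the path starts below <Main>.
items = root.findall("List/Item")
print(len(items))
print(items[2].text)
```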

- Attributes can be accessed with an "at_" prefix

>>> xml.Main.List.Item[1].at_ccc

- Access the attributes dictionary with "at_dict"

>>> xml.Main.List.Item[0].at_dict["aaa"]
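
The "at_" prefix and "at_dict" can be imitated with a thin wrapper around a
standard-library ElementTree element. This is only a sketch of the idea in
Python 3: AttrView is a hypothetical class, and MyXml itself may resolve the
prefix quite differently.

```python
import xml.etree.ElementTree as ET


class AttrView:
    """Hypothetical wrapper mapping "at_<name>" to an element's XML attribute."""

    def __init__(self, element):
        self._element = element

    @property
    def at_dict(self):
        # ElementTree already keeps the attributes in a dict.
        return self._element.attrib

    def __getattr__(self, name):
        # Only called when normal lookup fails, so at_dict is not affected.
        if name.startswith("at_"):
            try:
                return self._element.attrib[name[3:]]
            except KeyError:
                pass
        raise AttributeError(name)


item = AttrView(ET.fromstring('<Item ccc="ab&#43;c" aaa="bbb" />'))
print(item.at_ccc)
print(item.at_dict["aaa"])
```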

- Every value can be read as a number or a boolean

>>> xml.Main.BoolNum.boolean
False

- Attributes can also be read as booleans or numbers

>>> xml.Main.BoolNum.at_num.number * 2
7.0
>>> xml.xpath("Main/BoolNum").at_bool.boolean
True

- But if the value is not a number or boolean (yes, no, true, false, 1, 0), the
  return value is None

>>> xml.Main.List.Item[0].at_aaa.number

- "get" and "xpath" return an empty node by default, so we can still use the
  number/boolean attributes.

>>> bool(xml.get("foo").boolean)
False

>>> xml.xpath("Main/foo").number is None
True
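
The number/boolean rules above boil down to two small conversions. The sketch
below (Python 3) is an assumption about how they could be written, not the
module's code: to_number and to_boolean are hypothetical helpers, and treating
the boolean words case-insensitively is a guess based on the "Yes"/"No" values
in the example XML.

```python
def to_number(text):
    """Return an int or float for numeric text, otherwise None."""
    for convert in (int, float):
        try:
            return convert(text)
        except (TypeError, ValueError):
            continue
    return None


def to_boolean(text):
    """Return True/False for the accepted words, otherwise None."""
    words = {"yes": True, "true": True, "1": True,
             "no": False, "false": False, "0": False}
    if not isinstance(text, str):
        return None
    return words.get(text.strip().lower())


print(to_number("3.5") * 2)
print(to_boolean("Yes"), to_boolean("No"))
print(to_number("bbb"), to_boolean(""))
```

Returning None rather than raising is what lets an empty node be queried safely:
bool(None) is simply False.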

- Printing MyXml objects keeps the original order and adds indentation.
  The indentation is not thread-safe, though.

>>> print xml.Main.List
<Item aaa="bbb" />
<Item ccc="ab&#43;c" />
<Item>Bla Bla Bla</Item>

- Constructing MyXml object from a python complex object:

>>> xml = parse_object({
... "foo1": "bar",
... "foo2": ["bar1", "bar2", "bar3"],
... "foo3": {"bar": "foo"},
... "foo4": 5
... }, "Main") # "Main" is the name of the top most node

>>> xml.xpath("Main/foo4").number

- The nodes that hold the items of a sequence are named after the type of the
  sequence (list, tuple, set, generator).

>>> xml.xpath("Main/foo2/list")[1] == "bar2"
True
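
To make the naming rule concrete, here is a minimal sketch of the same kind of
conversion using the standard library's ElementTree (Python 3). It is not
parse_object: build is a hypothetical helper, generators are left out for
brevity, and dictionary keys are assumed to be valid tag names.

```python
import xml.etree.ElementTree as ET


def build(value, name):
    """Turn nested dicts, sequences and scalars into an Element tree."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(build(child, key))
    elif isinstance(value, (list, tuple, set)):
        # Each item becomes a child named after the sequence type.
        for item in value:
            element.append(build(item, type(value).__name__))
    else:
        element.text = str(value)
    return element


root = build({
    "foo1": "bar",
    "foo2": ["bar1", "bar2", "bar3"],
    "foo3": {"bar": "foo"},
    "foo4": 5,
}, "Main")

print(root.findall("foo2/list")[1].text)
print(ET.tostring(root, encoding="unicode"))
```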

- Finally, though it is not very useful, you can modify a MyXml object

>>> add_returns_self = xml.add(MyNode("bar5", "foo5")) # MyNode(value, name)
>>> xml.foo5.at_dict["attr"] = "attr value"
>>> xml.xpath("Main/foo5").at_attr == "attr value"
True

One can also use the other built-in dictionary and list methods, but this is not

>>> xml # Here the order is not preserved because of the python dictionary
<foo5 attr="attr value">bar5</foo5>

Please note that this module is not efficient in parsing large XML buffers. It
uses string slicing heavily.

Erez Bibi

Please send comments and questions to
erezbibi AT users DOT sourceforge DOT net

Download files


Filename (size)                     File type  Python version  Upload date
my_xml-0.1.2-py2.6.egg (18.6 kB)    Egg        2.6             Jan 17, 2011
(12.7 kB)                           Source     None            Jan 17, 2011
