
Easy to use parser for simple XML

Project description

A helper module that parses a simple XML buffer and stores it as a (mostly)
read-only, dictionary-like object (MyXml). This dictionary can hold other
dictionaries, node lists, or leaf nodes. Nodes are accessed as attributes.

>>> xml = parse("<Foo><Bar>Val</Bar></Foo>")
>>> xml.Foo.Bar == "Val"
True
>>> xml.Foo.Bar
<Bar>Val</Bar>
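
The attribute-style access described above can be illustrated with a dict subclass whose __getattr__ falls back to key lookup. This is a minimal Python 3 sketch of the idea only, not the module's implementation; the class name AttrDict is hypothetical.

```python
class AttrDict(dict):
    """Hypothetical dict subclass: d.key is shorthand for d["key"]."""

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so real dict methods (get, keys, ...) still take precedence.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Nested dicts mirror nested elements: <Foo><Bar>Val</Bar></Foo>
xml = AttrDict(Foo=AttrDict(Bar="Val"))
print(xml.Foo.Bar == "Val")              # True
print(xml["Foo"]["Bar"] == xml.Foo.Bar)  # True
```

One consequence of this design is that a child tag named like a dict method (for example "get" or "keys") is shadowed by the method and must be reached with item syntax.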

I don't like using the built-in Python DOM parsers for simple XML data, but
this module is suitable only for simple XML! Namespaces, CDATA, and other
advanced features are not supported.

There are three factory functions: "parse", "parse_file", and "parse_object".

- parse takes an XML string and builds a MyXml object from it.

- parse_file takes a file name, reads the file, and does the same.

Both functions also accept an optional list of tag names to ignore at the
beginning of the XML data.

- parse_object takes a complex Python object (nested dictionaries, sequences,
and scalars) and creates a MyXml object from it.

It is possible, but not convenient, to construct XML trees using this module.

Usage Examples:

>>> xml = parse('''
... <?xml bla bla bla>
... <!-- Comment -->
... <Main>
... <Text>One Two &amp; Three</Text>
... <List>
... <!-- This is a list of items -->
... <Item aaa="bbb" ></Item>
... <Item ccc = "ab&#43;c" />
... <Item>Bla Bla Bla</Item>
... </List>
... <BoolNum num="3.5" bool="Yes">No</BoolNum>
... <Double><Double>Value</Double></Double>
... </Main>
... ''')

- An XML node is an attribute of the MyXml object

>>> xml.Main.Text
<Text>One Two &amp; Three</Text>

- A node also compares equal to its unescaped text, while its "value"
attribute holds the raw text

>>> xml.Main.Text == "One Two & Three"
True

>>> xml.Main.Text.value == "One Two &amp; Three"
True
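
One way a leaf can compare equal to its unescaped text while still exposing the raw markup through .value is a str subclass built with the standard library's xml.sax.saxutils.unescape. This is a Python 3 sketch under that assumption; the class name TextNode is hypothetical, and the module may do this differently. Note that unescape handles only &amp;, &lt; and &gt; unless extra entities are passed in.

```python
from xml.sax.saxutils import unescape


class TextNode(str):
    """Hypothetical leaf node: the string content is the unescaped text,
    while .value keeps the raw text exactly as it appeared in the XML."""

    def __new__(cls, raw):
        # The str content (used by ==, print, etc.) is the unescaped text.
        node = super().__new__(cls, unescape(raw))
        node.value = raw  # the raw, still-escaped text
        return node


text = TextNode("One Two &amp; Three")
print(text == "One Two & Three")            # True
print(text.value == "One Two &amp; Three")  # True
```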

Nodes can also be accessed with an "nd_" prefix, so that tag names which are
Python reserved words can be reached. This form returns EMPY_NODE instead of
failing if the node doesn't exist.

>>> xml.nd_Main.nd_Text
<Text>One Two &amp; Three</Text>
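
A Python 3 sketch of how an "nd_" prefix could work, assuming it simply strips the prefix and falls back to a shared empty node. The names Node and EMPTY are hypothetical; EMPTY merely stands in for the module's EMPY_NODE.

```python
class Node(dict):
    """Hypothetical node: attribute access with an optional "nd_" prefix."""

    PREFIX = "nd_"

    def __getattr__(self, name):
        if name.startswith(self.PREFIX):
            # Prefixed access strips the prefix and never raises:
            # a missing child yields the shared empty node instead.
            return self.get(name[len(self.PREFIX):], EMPTY)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


EMPTY = Node()  # shared placeholder for missing nodes (do not mutate it)

# A tag named "class" cannot be written as doc.class (SyntaxError),
# but the prefixed form works.
doc = Node({"class": Node({"Text": "hello"})})
print(doc.nd_class.nd_Text)     # hello
print(doc.nd_missing is EMPTY)  # True
```

Because EMPTY is itself a Node, prefixed lookups can be chained through missing levels without raising.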

- A single node can be treated as a list with one item

>>> xml.Main.Double.Double[0] is xml.Main.Double.Double
True
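
Treating a single node as a one-item list lets code index or loop over children without checking whether a tag occurred once or several times. A minimal Python 3 sketch of that behavior, with the hypothetical class name SingleNode:

```python
class SingleNode:
    """Hypothetical single element that also behaves like a one-item list."""

    def __init__(self, name, text=""):
        self.name = name
        self.text = text

    def __len__(self):
        return 1

    def __getitem__(self, index):
        # Index 0 (or -1) is the node itself; anything else is out of range.
        if index in (0, -1):
            return self
        raise IndexError(index)

    def __iter__(self):
        yield self


inner = SingleNode("Double", "Value")
print(inner[0] is inner)        # True
print([n.text for n in inner])  # ['Value']
```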

- Node lists are regular lists

>>> len(xml.Main.List.Item)
3
>>> unicode(xml.Main.List.Item[2])
u'Bla Bla Bla'

- A MyXml object is a dictionary

>>> xml["Main"]["Text"] == xml.Main["Text"]
True
>>> xml.Main.get("Text") == xml["Main"].Text
True

- There is also a very simple XPath-like method

>>> xml.xpath("Main/List/Item")[2]
<Item>Bla Bla Bla</Item>
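
A very simple XPath-like lookup can be sketched as a walk over nested dicts, one "/"-separated step at a time. This Python 3 sketch is only an assumption about the general approach: the function name and its default parameter are hypothetical, and where the module returns an empty node for a missing path, this sketch returns the caller's default.

```python
def xpath(tree, path, default=None):
    """Hypothetical helper: walk nested dicts along a "/"-separated path.

    Each step looks up a child by tag name. As soon as a step is missing
    (or the current node has no children), `default` is returned.
    """
    node = tree
    for step in path.split("/"):
        if not isinstance(node, dict) or step not in node:
            return default
        node = node[step]
    return node


doc = {"Main": {"List": {"Item": ["", "", "Bla Bla Bla"]},
                "Text": "One Two & Three"}}
print(xpath(doc, "Main/List/Item")[2])  # Bla Bla Bla
print(xpath(doc, "Main/foo"))           # None
```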

- Attributes can be accessed with an "at_" prefix

>>> xml.Main.List.Item[1].at_ccc
u'ab&#43;c'

- Access the attributes dictionary with "at_dict"

>>> xml.Main.List.Item[0].at_dict["aaa"]
u'bbb'
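
The "at_" prefix and "at_dict" can be sketched the same way: attributes live in a plain dictionary, and any "at_<name>" lookup is forwarded to it. This Python 3 sketch uses the hypothetical class name Tag and, like the example above, returns attribute values unchanged (character references such as &#43; are not decoded).

```python
class Tag:
    """Hypothetical element exposing its XML attributes via "at_" names."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.at_dict = dict(attrs or {})  # the raw attribute mapping

    def __getattr__(self, name):
        # Reached only when normal lookup fails. Reading self.__dict__
        # directly avoids infinite recursion if at_dict is not set yet.
        attrs = self.__dict__.get("at_dict", {})
        if name.startswith("at_") and name[3:] in attrs:
            return attrs[name[3:]]
        raise AttributeError(name)


item = Tag("Item", {"ccc": "ab&#43;c"})
print(item.at_ccc)          # ab&#43;c
print(item.at_dict["ccc"])  # ab&#43;c
```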

- Every value can be interpreted as a number or a boolean

>>> xml.Main.BoolNum.boolean
False

- Attributes can also be interpreted as booleans or numbers

>>> xml.Main.BoolNum.at_num.number * 2
7.0
>>> xml.xpath("Main/BoolNum").at_bool.boolean
True

- If the value is not a number or a boolean (yes, no, true, false, 1, 0), the
return value is None

>>> xml.Main.List.Item[0].at_aaa.number
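
The number and boolean conversions described above can be sketched as two small functions that return None when the text doesn't fit. This is a Python 3 sketch, not the module's code; the function names are hypothetical. Trying int before float matches the examples here ("5" gives 5, "3.5" gives 3.5), and the case-insensitive matching is an assumption based on "Yes" being read as True.

```python
TRUE_WORDS = {"yes", "true", "1"}
FALSE_WORDS = {"no", "false", "0"}


def as_number(text):
    """Return an int or float parsed from text, or None if it isn't numeric."""
    for convert in (int, float):
        try:
            return convert(text)
        except (TypeError, ValueError):
            pass
    return None


def as_boolean(text):
    """Map yes/no, true/false, 1/0 (case-insensitive) to a bool, else None."""
    word = str(text).strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None


print(as_number("3.5") * 2)  # 7.0
print(as_number("5"))        # 5
print(as_boolean("Yes"))     # True
print(as_boolean("No"))      # False
print(as_number("bbb"))      # None
```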

- "get" and "xpath" return an empty node by default, so the number/boolean
attributes can still be used.

>>> bool(xml.get("foo").boolean)
False

>>> xml.xpath("Main/foo").number is None
True
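
Returning an empty node rather than None is an instance of the null object pattern: the placeholder supports the same helpers as a real node, so chained calls never raise. A Python 3 sketch under that assumption, with the hypothetical names EmptyNode and EMPTY:

```python
class EmptyNode(str):
    """Hypothetical null object returned for missing nodes.

    It behaves like an empty string and offers the same helper
    properties as a real node, so chained calls never raise.
    """

    @property
    def number(self):
        return None

    @property
    def boolean(self):
        return None

    def get(self, name, default=None):
        # A missing node has no children either.
        return self if default is None else default


EMPTY = EmptyNode()
print(EMPTY.number is None)   # True
print(bool(EMPTY.boolean))    # False
print(EMPTY.get("x").number)  # None
```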

- Printing a MyXml object preserves the original order and adds indentation.
- Note that the indentation is not thread-safe.

>>> print xml.Main.List
<List>
<Item aaa="bbb" />
<Item ccc="ab&#43;c" />
<Item>Bla Bla Bla</Item>
</List>
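
The thread-safety note suggests that the current indentation depth is kept in shared state; that is an assumption, not something the module documents. A Python 3 sketch of an indenting printer that instead passes the depth down as an argument, so concurrent calls cannot disturb each other. The function name to_xml is hypothetical, and lists are written as repeated sibling elements, as in the node lists above.

```python
from xml.sax.saxutils import escape


def to_xml(name, value, depth=0, indent="  "):
    """Hypothetical serializer for nested dicts, lists and scalars.

    The depth travels as a parameter rather than living in shared
    state, so each call computes its own indentation.
    """
    pad = indent * depth
    if isinstance(value, dict):
        lines = [f"{pad}<{name}>"]
        for key, child in value.items():  # dicts keep insertion order (3.7+)
            lines.append(to_xml(key, child, depth + 1, indent))
        lines.append(f"{pad}</{name}>")
        return "\n".join(lines)
    if isinstance(value, (list, tuple)):
        # A sequence becomes repeated sibling elements with the same name.
        return "\n".join(to_xml(name, item, depth, indent) for item in value)
    return f"{pad}<{name}>{escape(str(value))}</{name}>"


print(to_xml("List", {"Item": ["a", "b"]}))
# <List>
#   <Item>a</Item>
#   <Item>b</Item>
# </List>
```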

- Constructing a MyXml object from a complex Python object:

>>> xml = parse_object({
... "foo1": "bar",
... "foo2": ["bar1", "bar2", "bar3"],
... "foo3": {"bar": "foo"},
... "foo4": 5
... }, "Main") # "Main" is the name of the topmost node

>>> xml.xpath("Main/foo4").number
5

- Nodes that hold sequence items are named after the sequence's type (list,
tuple, set, generator).

>>> xml.xpath("Main/foo2/list")[1] == "bar2"
True
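
The conversion rules above (dicts become child elements, sequence items become children named after the sequence type, scalars become text) can be sketched with the standard library's xml.etree.ElementTree. This Python 3 sketch only mirrors the documented naming convention; the function name object_to_element is hypothetical and it is not how parse_object is implemented. Since the returned element is itself "Main", paths passed to findall start below it.

```python
import types
import xml.etree.ElementTree as ET


def object_to_element(obj, name):
    """Hypothetical converter from nested Python data to an Element.

    dict     -> one child element per key
    sequence -> one child per item, named after the sequence type
                ("list", "tuple", "set" or "generator")
    scalar   -> text content
    """
    elem = ET.Element(name)
    if isinstance(obj, dict):
        for key, value in obj.items():
            elem.append(object_to_element(value, key))
    elif isinstance(obj, (list, tuple, set, types.GeneratorType)):
        item_name = type(obj).__name__
        for item in obj:
            elem.append(object_to_element(item, item_name))
    else:
        elem.text = str(obj)
    return elem


root = object_to_element({"foo2": ["bar1", "bar2", "bar3"], "foo4": 5}, "Main")
print(root.findall("foo2/list")[1].text)  # bar2
print(ET.tostring(root, encoding="unicode"))
```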

- Finally, though it is not very useful, you can modify a MyXml object

>>> add_returns_self = xml.add(MyNode("bar5", "foo5")) # MyNode(value, name)
>>> xml.foo5.at_dict["attr"] = "attr value"
>>> xml.xpath("Main/foo5").at_attr == "attr value"
True

One can also use the other built-in dictionary and list methods, but this is
not recommended.

>>> xml # The order is not preserved here, because Python dictionaries are unordered
<Main>
<foo4>5</foo4>
<foo1>bar</foo1>
<foo2>
<list>bar1</list>
<list>bar2</list>
<list>bar3</list>
</foo2>
<foo3>
<bar>foo</bar>
</foo3>
<foo5 attr="attr value">bar5</foo5>
</Main>

Please note that this module is not efficient at parsing large XML buffers,
because it relies heavily on string slicing.

Erez Bibi

Please send comments and questions to
erezbibi AT users DOT sourceforge DOT net

Download files

Source Distribution

my_xml-0.1.2.zip (12.7 kB)

Built Distribution

my_xml-0.1.2-py2.6.egg (18.6 kB)

File details

Details for the file my_xml-0.1.2.zip.

File metadata

  • Download URL: my_xml-0.1.2.zip
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for my_xml-0.1.2.zip
Algorithm    Hash digest
SHA256       929faa7798335a72daf0338678573ab2aaacc7fdf00d922e32b3562eb2bb68b9
MD5          6457ee5170b3b01d084f05aaa904ac6e
BLAKE2b-256  ed0559e46b3729162f0c67ec5cb25c535e434cfa119f4c11ebcd9437e93d6b17

Details for the file my_xml-0.1.2-py2.6.egg.

File metadata

  • Download URL: my_xml-0.1.2-py2.6.egg
  • Size: 18.6 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for my_xml-0.1.2-py2.6.egg
Algorithm    Hash digest
SHA256       b4a39222f978d7edfd56f8afcd271c45fb4868c4c99e4fc90ed24beef0f10e0a
MD5          31be80e597b936509ea8ec818e5978a2
BLAKE2b-256  da4aaf8f314dce12643ebf2951fbf50245cc2590bc23be09d1035b6c417f9fad
