Makes working with XML feel like you are working with JSON
xmltodict
xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
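Once parsed, the result can be navigated like any nested dict. The sketch below reuses the document from the "spec" above; the variable names are illustrative only. It shows the three conventions visible in that output: attributes are stored under keys prefixed with `@`, text next to attributes is stored under `#text`, and repeated elements become a list.

```python
import xmltodict

doc = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")

root = doc["mydocument"]

attribute = root["@has"]         # attributes use the "@" prefix
repeated = root["and"]["many"]   # repeated elements are collected in a list
text = root["plus"]["#text"]     # text beside attributes lives under "#text"

print(attribute, repeated, text)
```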
Namespace support
By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:
>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True
It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:
>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
Streaming mode
xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:
>>> from gzip import GzipFile
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
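To make the callback arguments more concrete, here is a minimal sketch on a small in-memory document. The sample XML and the name `collect_artist` are made up for illustration. The first argument is the path from the root to the current item, a list of `(tag, attributes)` pairs; the second is the item itself. Attributes of the element at `item_depth` arrive through the path rather than inside the item.

```python
import xmltodict

# Made-up sample data standing in for a large dump.
xml = """
<artists>
  <artist id="1"><name>A Perfect Circle</name></artist>
  <artist id="2"><name>King Crimson</name></artist>
</artists>
"""

seen = []

def collect_artist(path, artist):
    # path[-1] describes the element at item_depth: (tag, attributes or None)
    tag, attrs = path[-1]
    seen.append((tag, attrs["id"], artist["name"]))
    return True  # keep parsing; a falsy value stops the parse early

xmltodict.parse(xml, item_depth=2, item_callback=collect_artist)

print(seen)
```

Because each item is handed to the callback as soon as its closing tag is read, the full document never has to be held in memory.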
It can also be used from the command line to pipe objects to a script like this:
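import sys, marshal
while True:
    try:
        # marshal reads bytes, so use the binary stream behind sys.stdin
        _, article = marshal.load(sys.stdin.buffer)
    except EOFError:
        break  # end of input
    print(article['title'])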
$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...
Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:
$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz
And you reuse the dicts with every script that needs them:
$ gunzip -c enwiki.dicts.gz | script1.py
$ gunzip -c enwiki.dicts.gz | script2.py
...
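A script can also read the cached file directly instead of going through a pipe. The sketch below uses only the standard library and assumes the cache is a gzip-compressed sequence of marshalled `(path, item)` tuples, which is what the reader script above implies. The helper `iter_cached_items` and the sample records are hypothetical, and an in-memory buffer stands in for `enwiki.dicts.gz`.

```python
import gzip
import io
import marshal

def iter_cached_items(stream):
    """Yield marshalled objects from a binary stream until it is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return

# Made-up records in the assumed (path, item) shape.
records = [
    ([("mediawiki", None), ("page", None)], {"title": "AccessibleComputing"}),
    ([("mediawiki", None), ("page", None)], {"title": "Anarchism"}),
]

# Write the cache once (a BytesIO stands in for a real file on disk).
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    for record in records:
        marshal.dump(record, gz)

# Read it back as often as needed, one item at a time.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
    titles = [item["title"] for _, item in iter_cached_items(gz)]

print(titles)
```

Note that the marshal format is specific to the Python version that wrote it, so a cache like this is best treated as a local, disposable artifact.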
Roundtripping
You can also convert in the other direction, using the unparse() function:
>>> mydict = {
...     'response': {
...         'status': 'good',
...         'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>
Text values for nodes can be specified with the cdata_key key in the Python dict, while node attributes can be specified by prefixing the key name with attr_prefix. The default value for attr_prefix is @ and the default value for cdata_key is #text.
>>> import xmltodict
>>>
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text color="red" stroke="2">This is a test</text>
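Both markers can be changed, as long as parsing and unparsing agree on them. The sketch below picks `_` and `value` purely for illustration; it assumes that `parse()` accepts the same `attr_prefix` and `cdata_key` keyword arguments as `unparse()`.

```python
import xmltodict

# Hypothetical markers chosen only for illustration.
custom = {"text": {"_color": "red", "value": "This is a test"}}

xml = xmltodict.unparse(custom, attr_prefix="_", cdata_key="value")

# Parsing with the same markers gives back the original dict ...
same_markers = xmltodict.parse(xml, attr_prefix="_", cdata_key="value")

# ... while the defaults report the same data under "@" and "#text".
default_markers = xmltodict.parse(xml)

print(same_markers)
print(default_markers)
```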
Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does not have a parent key, for example if a list exists inside another list, it has no tag to use and its items are converted to a string, as shown in the example below. To give tags to nested lists, use the expand_iter keyword argument to provide a tag, as demonstrated below. Note that using expand_iter will break roundtripping.
>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<line>
	<points>[1, 5]</points>
	<points>[2, 6]</points>
</line>
>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
	<points>
		<coord>1</coord>
		<coord>5</coord>
	</points>
	<points>
		<coord>2</coord>
		<coord>6</coord>
	</points>
</line>
Ok, how do I get it?
Using PyPI
You just need to
$ pip install xmltodict
RPM-based distro (Fedora, RHEL, …)
There is an official Fedora package for xmltodict.
$ sudo yum install python-xmltodict
Arch Linux
There is an official Arch Linux package for xmltodict.
$ sudo pacman -S python-xmltodict
Debian-based distro (Debian, Ubuntu, …)
There is an official Debian package for xmltodict.
$ sudo apt install python-xmltodict
FreeBSD
There is an official FreeBSD port for xmltodict.
$ pkg install py36-xmltodict
openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)
There is an official openSUSE package for xmltodict.
# Python2
$ zypper in python2-xmltodict

# Python3
$ zypper in python3-xmltodict
Hashes for xmltodict-0.13.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | aa89e8fd76320154a40d19a0df04a4695fb9dc5ba977cbb68ab3e4eb225e7852
MD5 | 1e71055cc8b757877fe2469906d1cf45
BLAKE2-256 | 94dbfd0326e331726f07ff7f40675cd86aa804bfd2e5016c727fa761c934990e