Rust-accelerated XML↔dict library — drop-in replacement for xmltodict
xmltodict-fast
xmltodict, now with a Rust acceleration layer. Drop-in replacement: same API, same behaviour, and roughly 2–3× faster parse() and 6–8× faster unparse() on typical documents (see the benchmarks below).
>>> import xmltodict, json
>>> print(json.dumps(xmltodict.parse("""
... <mydocument has="an attribute">
... <and>
... <many>elements</many>
... <many>more elements</many>
... </and>
... <plus a="complex">
... element as well
... </plus>
... </mydocument>
... """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
What changed from the original
The original xmltodict is a well-loved, zero-dependency library that converts XML to Python dicts and back. This fork keeps every public API unchanged and adds a Rust extension module (PyO3 + quick-xml) that replaces the hot paths:
| Path | Original | This fork |
|---|---|---|
| parse() | Python + expat | Rust (quick-xml) |
| unparse() | Python + XMLGenerator | Rust |
| parse(item_depth=N, item_callback=...) | Python (streaming) | Rust |
If the Rust extension cannot be loaded (e.g., unsupported platform, PyPy), the library transparently falls back to the original Python implementation — no code changes needed.
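The fallback described above follows a common optional-extension pattern. A minimal sketch, assuming a hypothetical extension module name (`_hypothetical_rust_ext`; the real module name is not stated here) and using the standard library's `xml.sax` purely as a stand-in for the pure-Python implementation:

```python
import importlib


def load_backend(preferred, fallback):
    """Import `preferred` if possible, otherwise `fallback`.

    Returns (module, accelerated), where `accelerated` is True only
    when the preferred (compiled) module could be imported.
    """
    try:
        return importlib.import_module(preferred), True
    except ImportError:
        # Covers a missing wheel as well as a failed native import.
        return importlib.import_module(fallback), False


# Both names are illustrative assumptions, not the library's real layout.
backend, accelerated = load_backend("_hypothetical_rust_ext", "xml.sax")
print(backend.__name__, accelerated)
```

Because the decision is made once at import time, calling code does not need to know which backend it received.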
Benchmarks — Rust vs pure Python
Measured on Apple Silicon (M-series); best of 20 runs per fixture, fresh subprocess per measurement. Python 3.13.
parse() throughput
| Fixture | Pure Python | Rust | Speedup |
|---|---|---|---|
| small.xml (~1 KB) | 22 MB/s | 57 MB/s | 2.6× |
| medium.xml (~600 KB) | 42 MB/s | 96 MB/s | 2.3× |
| large.xml (~7 MB) | 27 MB/s | 78 MB/s | 2.8× |
| wide.xml (~800 KB, flat) | 26 MB/s | 67 MB/s | 2.5× |
| namespaced.xml (~300 KB) | 39 MB/s | 89 MB/s | 2.3× |
| deep.xml (~500 KB, 500 levels) | 495 MB/s | 194 MB/s | 0.4× |
Note on deep nesting: The Rust path is slower on pathologically deep XML (500+ nesting levels) due to per-element PyO3 overhead. Python's expat handles this case in C with lower per-element cost. Most real-world XML has moderate nesting depth where Rust wins comfortably.
unparse() throughput
| Fixture | Pure Python | Rust | Speedup |
|---|---|---|---|
| medium.xml (~600 KB) | 34 MB/s | 277 MB/s | 8.2× |
| large.xml (~7 MB) | 22 MB/s | 185 MB/s | 8.4× |
| wide.xml (~800 KB) | 18 MB/s | 110 MB/s | 6.1× |
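To check numbers like these on your own data, a best-of-N timing loop is usually enough. The sketch below is an assumption about how such a measurement could be written, not the project's benchmark harness: it does not start a fresh subprocess per measurement, and `minidom.parseString` is only a stand-in for whichever parse function you want to time.

```python
import time
from xml.dom import minidom


def best_throughput_mb_s(func, payload, runs=20):
    """Return the best observed throughput of func(payload) in MB/s."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    # Throughput is input size divided by the fastest run.
    return len(payload) / best / 1e6


# Synthetic flat document; replace with the bytes of a real fixture.
sample = b"<root>" + b"<item>value</item>" * 1000 + b"</root>"
print(f"{best_throughput_mb_s(minidom.parseString, sample, runs=5):.1f} MB/s")
```

Taking the minimum rather than the mean reduces the influence of scheduler noise and warm-up effects.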
When to use the Rust backend
The Rust extension is used automatically when available. It provides the best speedup for:
- unparse(): always faster (6–8× speedup on all inputs)
- parse() with typical XML: 2.3–2.8× faster for documents with moderate nesting
- Streaming with file objects: file-like inputs are read into memory and routed through Rust for ~2× faster throughput
The Rust path falls back to Python automatically for:
- Deeply nested XML (500+ levels) where expat's C-level SAX is faster
- Features not yet implemented in Rust: process_namespaces, process_comments, postprocessor, callable force_list/force_cdata, non-default dict_constructor
- Generator inputs (processed incrementally by Python's SAX parser)
- PyPy or platforms without a pre-built wheel
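The option-based fallbacks in the list above can be expressed as a small predicate, which may help when predicting which path a given call will take. This is a sketch written from that list only; `needs_python_fallback` is a hypothetical helper, not the library's actual dispatch code, and it ignores the input-type and nesting-depth cases.

```python
def needs_python_fallback(**kwargs):
    """Return True if the parse() options fall outside the Rust-covered set."""
    if kwargs.get("process_namespaces") or kwargs.get("process_comments"):
        return True
    if kwargs.get("postprocessor") is not None:
        return True
    # Tuples and booleans are fine; only callables force the fallback.
    if callable(kwargs.get("force_list")) or callable(kwargs.get("force_cdata")):
        return True
    # Any mapping type other than the built-in dict counts as non-default.
    return kwargs.get("dict_constructor", dict) is not dict


print(needs_python_fallback(force_list=("item",)))
print(needs_python_fallback(process_namespaces=True))
```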
To force the pure-Python path:
import xmltodict
xmltodict._RUST_AVAILABLE = False # must be set before any parse/unparse call
Installation
pip install xmltodict-fast
The package ships pre-built wheels for Linux, macOS, and Windows (x86-64 and arm64). If no wheel matches your platform, pip falls back to building from source (requires a Rust toolchain: rustup).
Quick start
import xmltodict
# XML → dict
result = xmltodict.parse("<root><item id='1'>hello</item></root>")
# {'root': {'item': {'@id': '1', '#text': 'hello'}}}
# dict → XML
xml = xmltodict.unparse(result, pretty=True)
Streaming large files
Use item_depth and item_callback to process large XML files without building the full document tree in memory. Each item is emitted to the callback and discarded.
def handle_article(path, item):
    print(item["title"])
    return True  # return falsy to stop early
with open("enwiki-pages-articles.xml", "rb") as f:
    xmltodict.parse(f, item_depth=2, item_callback=handle_article)
item_callback receives:
- path: list of (element_name, attributes_or_None) tuples from the root down to (but not including) the current item.
- item: the fully parsed dict for the current element.
Return False (or any falsy value) to stop parsing early. ParsingInterrupted is raised to signal the stop — catch it if needed:
from xmltodict import ParsingInterrupted
try:
    xmltodict.parse(data, item_depth=2, item_callback=my_callback)
except ParsingInterrupted:
    pass  # stopped by callback returning False
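A callback often needs to keep state between items, for example to stop after the first few matches. One way to do that is a small callable object; `TitleCollector` and the simulated data below are illustrative assumptions, and the callback is invoked by hand here so the example runs without an XML file.

```python
class TitleCollector:
    """Collects item titles and asks the parser to stop after `limit` items."""

    def __init__(self, limit):
        self.limit = limit
        self.titles = []

    def __call__(self, path, item):
        # `item` is the parsed dict for the element at item_depth.
        if isinstance(item, dict) and "title" in item:
            self.titles.append(item["title"])
        # A falsy return value requests an early stop.
        return len(self.titles) < self.limit


collector = TitleCollector(limit=2)
# Simulated callback invocations; the path and items are made-up data.
path = [("mediawiki", None)]
results = [collector(path, {"title": t}) for t in ("A", "B")]
print(collector.titles, results)
```

Passed as `item_callback=collector`, the second call returns a falsy value, so per the description above the parse would end with `ParsingInterrupted` while the collected titles stay available on `collector.titles`.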
Namespace support
By default, namespace declarations are treated as regular attributes. Pass process_namespaces=True to expand them:
xml = """
<root xmlns="http://defaultns.com/"
xmlns:a="http://a.com/">
<x>1</x>
<a:y>2</a:y>
</root>
"""
xmltodict.parse(xml, process_namespaces=True)
# {'http://defaultns.com/:root': {'http://defaultns.com/:x': '1',
# 'http://a.com/:y': '2'}}
Collapse or skip namespaces with the namespaces dict:
xmltodict.parse(xml, process_namespaces=True, namespaces={
"http://defaultns.com/": None, # skip — strip the namespace
"http://a.com/": "ns_a", # shorten to prefix
})
# {'root': {'x': '1', 'ns_a:y': '2'}}
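The key names in the two outputs above follow a simple rule: unmapped URIs are kept in full, a mapping to None strips the namespace, and any other mapping replaces the URI with the short prefix. A sketch of that rule as a standalone function (`qualified_key` is a hypothetical name used for illustration, not part of the xmltodict API):

```python
def qualified_key(uri, local, namespaces=None, separator=":"):
    """Build the dict key for an element given its namespace URI and local name."""
    if not uri:
        return local
    # Unmapped URIs are kept as-is.
    short = (namespaces or {}).get(uri, uri)
    if not short:
        # Mapped to None (or empty): drop the namespace entirely.
        return local
    return f"{short}{separator}{local}"


ns = {"http://defaultns.com/": None, "http://a.com/": "ns_a"}
print(qualified_key("http://defaultns.com/", "root"))
print(qualified_key("http://defaultns.com/", "x", ns))
print(qualified_key("http://a.com/", "y", ns))
```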
Roundtripping
mydict = {
"response": {
"status": "good",
"last_updated": "2024-01-01T00:00:00Z",
}
}
print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
<status>good</status>
<last_updated>2024-01-01T00:00:00Z</last_updated>
</response>
Attributes and CDATA use configurable prefixes (attr_prefix='@', cdata_key='#text' by default):
xmltodict.unparse({"text": {"@color": "red", "#text": "hello"}}, pretty=True)
# <text color="red">hello</text>
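The prefix convention means every key of an element's dict falls into one of three groups: attributes, text, or child elements. The sketch below shows that classification; `split_element` is a hypothetical helper for illustration and is not how the library is implemented.

```python
def split_element(value, attr_prefix="@", cdata_key="#text"):
    """Split an element dict into (attributes, text, children)."""
    attrs, children, text = {}, {}, None
    for key, val in value.items():
        if key == cdata_key:
            text = val
        elif key.startswith(attr_prefix):
            # Strip the prefix to recover the XML attribute name.
            attrs[key[len(attr_prefix):]] = val
        else:
            children[key] = val
    return attrs, text, children


print(split_element({"@color": "red", "#text": "hello"}))
```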
API Reference
xmltodict.parse(xml_input, **kwargs)
| Parameter | Default | Description |
|---|---|---|
| xml_input | (required) | String, bytes, file-like object, or generator of strings |
| encoding | None | Input encoding (auto-detected if None) |
| process_namespaces | False | Expand XML namespace URIs |
| namespace_separator | ':' | Separator between namespace URI and local name |
| disable_entities | True | Block entity expansion (security default; do not disable) |
| process_comments | False | Include XML comments in output |
| xml_attribs | True | Include element attributes |
| attr_prefix | '@' | Prefix for attribute keys |
| cdata_key | '#text' | Key for element text content |
| force_cdata | False | Force text-as-CDATA for all, selected, or matched elements |
| cdata_separator | '' | Join string for adjacent text chunks |
| postprocessor | None | fn(path, key, value) → (key, value) applied to every item; returning None drops the item |
| dict_constructor | dict | Dict class to use (e.g. OrderedDict) |
| strip_whitespace | True | Trim whitespace in text nodes |
| namespaces | None | Namespace URI → prefix mapping (requires process_namespaces=True) |
| force_list | None | Force list wrapping for all, selected, or matched elements |
| item_depth | 0 | Element depth at which to call item_callback (0 = disabled) |
| item_callback | lambda *a: True | Called with (path, item) for each element at item_depth |
| comment_key | '#comment' | Key used for comments when process_comments=True |
xmltodict.unparse(input_dict, **kwargs)
| Parameter | Default | Description |
|---|---|---|
| input_dict | (required) | Dict to convert |
| output | None | File-like object; returns string if None |
| encoding | 'utf-8' | Output encoding |
| full_document | True | Prepend <?xml ...?> declaration |
| short_empty_elements | False | Use <tag/> for empty elements |
| attr_prefix | '@' | Attribute key prefix |
| cdata_key | '#text' | Text content key |
| pretty | False | Indent output |
| indent | '\t' | Indent string (or integer number of spaces) |
| newl | '\n' | Newline string |
| expand_iter | None | Tag for items in nested lists (breaks roundtripping) |
Examples
force_cdata — selective CDATA wrapping
xml = "<a><b>data1</b><c>data2</c></a>"
# Only wrap specific elements
xmltodict.parse(xml, force_cdata=("b",))
# {'a': {'b': {'#text': 'data1'}, 'c': 'data2'}}
# All elements
xmltodict.parse(xml, force_cdata=True)
# {'a': {'b': {'#text': 'data1'}, 'c': {'#text': 'data2'}}}
# Callable
xmltodict.parse(xml, force_cdata=lambda path, key, val: key == "b")
# {'a': {'b': {'#text': 'data1'}, 'c': 'data2'}}
force_list — consistent list output
Useful when an element may appear once or multiple times and you always want a list:
xml = "<a><item>one</item></a>"
xmltodict.parse(xml, force_list=("item",))
# {'a': {'item': ['one']}} ← always a list, even for a single element
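Without force_list, the same key holds a string for one element and a list for several, so consuming code has to normalize the value itself. The hypothetical helper below (`as_list` is not part of xmltodict) shows the kind of workaround that force_list makes unnecessary; the two dicts are the shapes parse() returns by default for one and for two item elements.

```python
def as_list(value):
    """Wrap a scalar in a list; pass lists through; map None to an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


single = {"a": {"item": "one"}}          # one <item> element
many = {"a": {"item": ["one", "two"]}}   # two <item> elements

for doc in (single, many):
    for item in as_list(doc["a"]["item"]):
        print(item)
```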
postprocessor — transform values on the fly
def int_postprocessor(path, key, value):
    try:
        return key, int(value)
    except (ValueError, TypeError):
        return key, value
xmltodict.parse("<root><count>42</count></root>", postprocessor=int_postprocessor)
# {'root': {'count': 42}}
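As noted in the API reference, a postprocessor that returns None drops the item. The variant below combines that with the integer conversion; `drop_empty_and_cast` is an illustrative name, and the function is called directly here so the behaviour can be checked without parsing anything.

```python
def drop_empty_and_cast(path, key, value):
    """Drop items with no content and convert integer-like strings to int."""
    if value is None:
        # Returning None (instead of a tuple) tells the parser to drop the item.
        return None
    try:
        return key, int(value)
    except (ValueError, TypeError):
        return key, value


print(drop_empty_and_cast([], "count", "42"))
print(drop_empty_and_cast([], "name", "widget"))
print(drop_empty_and_cast([], "empty", None))
```

It is passed to parse() the same way as int_postprocessor above, via the postprocessor argument.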
Nested lists with expand_iter
mydict = {"line": {"points": [[1, 5], [2, 6]]}}
print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
<points>
<coord>1</coord>
<coord>5</coord>
</points>
<points>
<coord>2</coord>
<coord>6</coord>
</points>
</line>
Security
- disable_entities=True (default) blocks XML entity expansion (billion-laughs / XML-bomb attacks). Do not disable this.
- _validate_name guards element and attribute names during unparse() to prevent tag-injection attacks.
- _validate_comment rejects -- inside XML comments (illegal per spec).
A CVE (CVE-2025-9375) was filed against xmltodict but is disputed. The root issue is in Python's xml.sax.saxutils.XMLGenerator, which does not validate element names. The same behaviour exists throughout the standard library. The disclosure timeline (10 days from first contact to publication) did not allow a maintainer response.
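To make the tag-injection concern concrete: if a dict key such as `a><b` were written out verbatim as an element name, it would change the structure of the generated XML. The check below is a simplified sketch of the kind of guard involved; `is_safe_xml_name` and its character list are assumptions for illustration, not the library's actual `_validate_name`, and it does not implement the full XML Name production.

```python
# Characters that would let a name break out of a tag or attribute.
_FORBIDDEN = set("<>/\"'= \t\r\n")


def is_safe_xml_name(name):
    """Return True if `name` cannot alter markup structure when emitted."""
    if not isinstance(name, str) or not name:
        return False
    if name[0] in "?!":
        # Would start a processing instruction, comment, or declaration.
        return False
    return not any(ch in _FORBIDDEN for ch in name)


for candidate in ("item", "ns:item", "a><b", "a b", ""):
    print(repr(candidate), is_safe_xml_name(candidate))
```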
Compatibility notes
- Python 3.9+
- Falls back to pure Python automatically when the Rust extension is unavailable (PyPy, unsupported architectures, source installs without Rust)
- Full backwards compatibility with the original xmltodict API
- xmltodict.py, the original single-file implementation, is preserved as the fallback
License
MIT. Copyright (C) 2012 Martin Blech and individual contributors. Rust acceleration layer Copyright (C) 2025 Andrei Voicu Tomuț.