Rust-accelerated XML↔dict library — drop-in replacement for xmltodict

xmltodict-fast

xmltodict, now with a Rust acceleration layer. Drop-in replacement: same API, same behaviour, and roughly 2–8× faster in the benchmarks below (very deeply nested input is the exception).

>>> import xmltodict, json
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}

What changed from the original

The original xmltodict is a well-loved, zero-dependency library that converts XML to Python dicts and back. This fork keeps every public API unchanged and adds a Rust extension module (PyO3 + quick-xml) that replaces the hot paths:

| Path | Original | This fork |
|---|---|---|
| parse() | Python + expat | Rust (quick-xml) |
| unparse() | Python + XMLGenerator | Rust |
| parse(item_depth=N, item_callback=...) | Python (streaming) | Rust |

If the Rust extension cannot be loaded (e.g., unsupported platform, PyPy), the library transparently falls back to the original Python implementation — no code changes needed.
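
The fallback described above follows the usual optional-accelerator import pattern. The following is a minimal sketch of that pattern, not the library's actual loader: `_hypothetical_rust_ext` is a made-up module name chosen so that the import fails, and the standard-library `xml.sax` stands in for the pure-Python implementation.

```python
import importlib


def load_backend(candidates):
    """Return (name, module) for the first importable module in `candidates`."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # extension missing on this platform: try the next one
    raise ImportError(f"none of {candidates!r} could be imported")


# "_hypothetical_rust_ext" does not exist, so the loader falls through
# to "xml.sax", which plays the role of the pure-Python fallback here.
name, backend = load_backend(["_hypothetical_rust_ext", "xml.sax"])
print(name)
```

Callers only ever see the selected backend, which is why no code changes are needed when the extension is missing.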


Benchmarks — Rust vs pure Python

Measured on Apple Silicon (M-series); best of 20 runs per fixture, fresh subprocess per measurement. Python 3.13.
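
For readers who want to reproduce numbers of this kind, here is a rough sketch of a best-of-N throughput measurement. It is not the project's benchmark harness: it runs in-process rather than in a fresh subprocess, it assumes 1 MB = 10^6 bytes (the tables do not say which convention they use), and the standard-library `xml.etree.ElementTree.fromstring` stands in for the function under test.

```python
import time
import xml.etree.ElementTree as ET


def best_throughput_mb_s(func, payload: bytes, runs: int = 20) -> float:
    """Call func(payload) `runs` times and return MB/s for the fastest run."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(payload) / max(best, 1e-9) / 1e6


# Synthetic payload of roughly 50 KB; swap in a real fixture to compare parsers.
payload = b"<root>" + b"<item id='1'>hello</item>" * 2000 + b"</root>"
rate = best_throughput_mb_s(ET.fromstring, payload)
print(f"{rate:.1f} MB/s")
```

Taking the best run rather than the mean reduces noise from other processes, at the cost of reporting an optimistic figure.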

parse() throughput

| Fixture | Pure Python | Rust | Speedup |
|---|---|---|---|
| small.xml (~1 KB) | 22 MB/s | 57 MB/s | 2.6× |
| medium.xml (~600 KB) | 42 MB/s | 96 MB/s | 2.3× |
| large.xml (~7 MB) | 27 MB/s | 78 MB/s | 2.8× |
| wide.xml (~800 KB, flat) | 26 MB/s | 67 MB/s | 2.5× |
| namespaced.xml (~300 KB) | 39 MB/s | 89 MB/s | 2.3× |
| deep.xml (~500 KB, 500 levels) | 495 MB/s | 194 MB/s | 0.4× |

Note on deep nesting: The Rust path is slower on pathologically deep XML (500+ nesting levels) due to per-element PyO3 overhead. Python's expat handles this case in C with lower per-element cost. Most real-world XML has moderate nesting depth where Rust wins comfortably.

unparse() throughput

| Fixture | Pure Python | Rust | Speedup |
|---|---|---|---|
| medium.xml (~600 KB) | 34 MB/s | 277 MB/s | 8.2× |
| large.xml (~7 MB) | 22 MB/s | 185 MB/s | 8.4× |
| wide.xml (~800 KB) | 18 MB/s | 110 MB/s | 6.1× |

When to use the Rust backend

The Rust extension is used automatically when available. It provides the best speedup for:

  • unparse(): faster on every benchmarked fixture (6.1–8.4×)
  • parse() with typical XML: 2.3–2.8× faster for documents with moderate nesting
  • Streaming with file objects: file-like inputs are read into memory and routed through Rust for ~2× higher throughput

The Rust path falls back to Python automatically for:

  • Deeply nested XML (500+ levels) where expat's C-level SAX is faster
  • Features not yet implemented in Rust: process_namespaces, process_comments, postprocessor, callable force_list/force_cdata, non-default dict_constructor
  • Generator inputs (processed incrementally by Python's SAX parser)
  • PyPy or platforms without a pre-built wheel
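
The library's actual dispatch logic is not shown in this document. As a way to reason about which calls can be accelerated, the rules in the list above could be expressed as a predicate like the one below. The function name `would_use_rust` and its `rust_available` parameter are hypothetical, and the deep-nesting rule is left out because it depends on inspecting the document itself.

```python
import types

# Boolean options the list above names as not yet implemented in Rust.
UNSUPPORTED_FLAGS = ("process_namespaces", "process_comments")


def would_use_rust(xml_input, rust_available=True, **kwargs):
    """Hypothetical predicate: True if a parse() call could take the Rust path."""
    if not rust_available:
        return False  # e.g. PyPy or no pre-built wheel
    if isinstance(xml_input, types.GeneratorType):
        return False  # generator inputs are parsed incrementally in Python
    if any(kwargs.get(flag) for flag in UNSUPPORTED_FLAGS):
        return False
    if kwargs.get("postprocessor") is not None:
        return False
    if callable(kwargs.get("force_list")) or callable(kwargs.get("force_cdata")):
        return False  # only the callable forms are listed as unsupported
    if kwargs.get("dict_constructor", dict) is not dict:
        return False
    return True


print(would_use_rust("<a/>"))
print(would_use_rust("<a/>", process_namespaces=True))
print(would_use_rust(chunk for chunk in ["<a>", "</a>"]))
```

Under these assumptions the three calls print True, False and False respectively.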

To force the pure-Python path:

import xmltodict
xmltodict._RUST_AVAILABLE = False   # must be set before any parse/unparse call

Installation

pip install xmltodict-fast

The package ships pre-built wheels for Linux, macOS, and Windows (x86-64 and arm64). If no wheel matches your platform, pip falls back to building from source (requires a Rust toolchain: rustup).


Quick start

import xmltodict

# XML → dict
result = xmltodict.parse("<root><item id='1'>hello</item></root>")
# {'root': {'item': {'@id': '1', '#text': 'hello'}}}

# dict → XML
xml = xmltodict.unparse(result, pretty=True)

Streaming large files

Use item_depth and item_callback to process large XML files without building the full document tree in memory. Each item is emitted to the callback and discarded.

import xmltodict

def handle_article(path, item):
    print(item["title"])
    return True  # return a falsy value to stop early

with open("enwiki-pages-articles.xml", "rb") as f:
    xmltodict.parse(f, item_depth=2, item_callback=handle_article)

item_callback receives:

  • path — list of (element_name, attributes_or_None) tuples from the root down to (but not including) the current item.
  • item — the fully parsed dict for the current element.

Return False (or any falsy value) to stop parsing early. ParsingInterrupted is raised to signal the stop — catch it if needed:

from xmltodict import ParsingInterrupted

try:
    xmltodict.parse(data, item_depth=2, item_callback=my_callback)
except ParsingInterrupted:
    pass  # stopped by callback returning False
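
A common use of early stopping is to collect only the first few items. The sketch below builds such a callback and exercises it with hand-written arguments so it runs without any XML input; `make_collector` and the sample path and items are illustrative names, not part of the library.

```python
def make_collector(limit):
    """Return (collected, callback); callback signals a stop after `limit` items."""
    collected = []

    def callback(path, item):
        collected.append(item)
        # A falsy return value asks the parser to stop.
        return len(collected) < limit

    return collected, callback


collected, callback = make_collector(limit=2)

# Simulate what the parser would pass in: a path of (name, attrs) tuples
# and the parsed dict for each item.
sample_path = [("feed", None)]
for title in ("first", "second", "third"):
    if not callback(sample_path, {"title": title}):
        break

print(collected)
```

In real use, pass `callback` as `item_callback` and wrap the `parse()` call in the `try`/`except ParsingInterrupted` shown above; `collected` then holds the first `limit` items.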

Namespace support

By default, namespace declarations are treated as regular attributes. Pass process_namespaces=True to expand them:

xml = """
<root xmlns="http://defaultns.com/"
      xmlns:a="http://a.com/">
  <x>1</x>
  <a:y>2</a:y>
</root>
"""

xmltodict.parse(xml, process_namespaces=True)
# {'http://defaultns.com/:root': {'http://defaultns.com/:x': '1',
#                                  'http://a.com/:y': '2'}}

Collapse or skip namespaces with the namespaces dict:

xmltodict.parse(xml, process_namespaces=True, namespaces={
    "http://defaultns.com/": None,   # skip — strip the namespace
    "http://a.com/": "ns_a",         # shorten to prefix
})
# {'root': {'x': '1', 'ns_a:y': '2'}}
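
The key-naming rule implied by the two examples above can be written out as a small function. This is a sketch of the rule as described here, not the library's code; `qualified_key` is a hypothetical helper name.

```python
def qualified_key(uri, local_name, namespaces=None, separator=":"):
    """Build the dict key for an element in namespace `uri`."""
    if not uri:
        return local_name  # element is not in any namespace
    namespaces = namespaces or {}
    if uri in namespaces:
        prefix = namespaces[uri]
        if prefix is None:
            return local_name  # mapped to None: strip the namespace
        return f"{prefix}{separator}{local_name}"  # mapped: use the short prefix
    return f"{uri}{separator}{local_name}"  # unmapped: keep the full URI


mapping = {"http://defaultns.com/": None, "http://a.com/": "ns_a"}
print(qualified_key("http://a.com/", "y"))
print(qualified_key("http://a.com/", "y", mapping))
print(qualified_key("http://defaultns.com/", "x", mapping))
```

The three calls reproduce the keys `http://a.com/:y`, `ns_a:y` and `x` from the examples above.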

Roundtripping

mydict = {
    "response": {
        "status": "good",
        "last_updated": "2024-01-01T00:00:00Z",
    }
}
print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2024-01-01T00:00:00Z</last_updated>
</response>

Attributes and CDATA use configurable prefixes (attr_prefix='@', cdata_key='#text' by default):

xmltodict.unparse({"text": {"@color": "red", "#text": "hello"}}, pretty=True, full_document=False)
# <text color="red">hello</text>

API Reference

xmltodict.parse(xml_input, **kwargs)

| Parameter | Default | Description |
|---|---|---|
| xml_input | (required) | String, bytes, file-like object, or generator of strings |
| encoding | None | Input encoding (auto-detected if None) |
| process_namespaces | False | Expand XML namespace URIs |
| namespace_separator | ':' | Separator between namespace URI and local name |
| disable_entities | True | Block entity expansion (security default; do not disable) |
| process_comments | False | Include XML comments in output |
| xml_attribs | True | Include element attributes |
| attr_prefix | '@' | Prefix for attribute keys |
| cdata_key | '#text' | Key for element text content |
| force_cdata | False | Force text-as-CDATA for all, selected, or matched elements |
| cdata_separator | '' | Join string for adjacent text chunks |
| postprocessor | None | fn(path, key, value) → (key, value) applied to every item; returning None drops the item |
| dict_constructor | dict | Dict class to use (e.g. OrderedDict) |
| strip_whitespace | True | Trim whitespace in text nodes |
| namespaces | None | Namespace URI → prefix mapping (requires process_namespaces=True) |
| force_list | None | Force list wrapping for all, selected, or matched elements |
| item_depth | 0 | Element depth at which to call item_callback (0 = disabled) |
| item_callback | lambda *a: True | Called with (path, item) for each element at item_depth |
| comment_key | '#comment' | Key used for comments when process_comments=True |

xmltodict.unparse(input_dict, **kwargs)

| Parameter | Default | Description |
|---|---|---|
| input_dict | (required) | Dict to convert |
| output | None | File-like object; returns string if None |
| encoding | 'utf-8' | Output encoding |
| full_document | True | Prepend <?xml ...?> declaration |
| short_empty_elements | False | Use <tag/> for empty elements |
| attr_prefix | '@' | Attribute key prefix |
| cdata_key | '#text' | Text content key |
| pretty | False | Indent output |
| indent | '\t' | Indent string (or integer number of spaces) |
| newl | '\n' | Newline string |
| expand_iter | None | Tag for items in nested lists (breaks roundtripping) |

Examples

force_cdata — selective CDATA wrapping

xml = "<a><b>data1</b><c>data2</c></a>"

# Only wrap specific elements
xmltodict.parse(xml, force_cdata=("b",))
# {'a': {'b': {'#text': 'data1'}, 'c': 'data2'}}

# All elements
xmltodict.parse(xml, force_cdata=True)
# {'a': {'b': {'#text': 'data1'}, 'c': {'#text': 'data2'}}}

# Callable
xmltodict.parse(xml, force_cdata=lambda path, key, val: key == "b")
# {'a': {'b': {'#text': 'data1'}, 'c': 'data2'}}

force_list — consistent list output

Useful when an element may appear once or multiple times and you always want a list:

xml = "<a><item>one</item></a>"
xmltodict.parse(xml, force_list=("item",))
# {'a': {'item': ['one']}}   ← always a list, even for a single element
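
force_list also accepts a callable for cases a fixed tuple of names cannot express, such as forcing a list only under a particular parent. The sketch below assumes the callable receives (path, key, value) like the force_cdata callable shown earlier, and that path is a list of (name, attrs) tuples for the ancestors, as with item_callback. The function is exercised directly so it runs without parsing anything.

```python
def item_under_a(path, key, value):
    """Force a list only for <item> elements whose direct parent is <a>."""
    parent_name = path[-1][0] if path else None
    return key == "item" and parent_name == "a"


# Direct calls with hand-written arguments (assumed shapes, see above).
print(item_under_a([("a", None)], "item", "one"))
print(item_under_a([("b", None)], "item", "one"))
print(item_under_a([("a", None)], "other", "one"))
```

To use it, pass `force_list=item_under_a` to `xmltodict.parse()`. Callable force_list is among the features not yet implemented in Rust, so such calls run on the Python path.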

postprocessor — transform values on the fly

def int_postprocessor(path, key, value):
    try:
        return key, int(value)
    except (ValueError, TypeError):
        return key, value

xmltodict.parse("<root><count>42</count></root>", postprocessor=int_postprocessor)
# {'root': {'count': 42}}
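
Per the API reference, a postprocessor that returns None drops the item. A minimal sketch of a filter built on that rule follows; `drop_empty` is an illustrative name, and it assumes empty elements arrive with a value of None.

```python
def drop_empty(path, key, value):
    """Drop items whose value is None; pass everything else through unchanged."""
    if value is None:
        return None  # returning None tells the parser to omit this item
    return key, value


# Direct calls with hand-written arguments, so no parsing is needed.
print(drop_empty([("root", None)], "note", None))
print(drop_empty([("root", None)], "count", "42"))
```

To use it, pass `postprocessor=drop_empty` to `xmltodict.parse()`. Like any postprocessor, this routes the call through the Python path.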

Nested lists with expand_iter

mydict = {"line": {"points": [[1, 5], [2, 6]]}}
print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
	<points>
		<coord>1</coord>
		<coord>5</coord>
	</points>
	<points>
		<coord>2</coord>
		<coord>6</coord>
	</points>
</line>

Security

  • disable_entities=True (default) blocks XML entity expansion (billion-laughs / XML-bomb attacks). Do not disable this.
  • _validate_name guards element and attribute names during unparse() to prevent tag-injection attacks.
  • _validate_comment rejects -- inside XML comments (illegal per spec).

A CVE (CVE-2025-9375) was filed against xmltodict but is disputed. The root issue is in Python's xml.sax.saxutils.XMLGenerator, which does not validate element names. The same behaviour exists throughout the standard library. The disclosure timeline (10 days from first contact to publication) did not allow a maintainer response.


Compatibility notes

  • Python 3.9+
  • Falls back to pure Python automatically when the Rust extension is unavailable (PyPy, unsupported architectures, source installs without Rust)
  • Full backwards compatibility with the original xmltodict API
  • xmltodict.py — the original single-file implementation — is preserved as the fallback

License

MIT. Copyright (C) 2012 Martin Blech and individual contributors. Rust acceleration layer Copyright (C) 2025 Andrei Voicu Tomuț.
