XML parser with streaming iterator interface
Xml Iterator
An XML parser for Python with a streaming iterator interface and protection against infinite depth attacks.
Features
- Streaming XML parsing - processes XML without loading entire document into memory
- Infinite depth protection - iterator-based approach allows user-controlled limits
- xmltodict compatibility - the xml_to_dict() function produces identical results to the xmltodict library
- High performance - Rust implementation up to 1.2x faster than xmltodict on full parses, 734x faster for early termination
- Unicode support - handles UTF-8 encoding correctly
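As a rough illustration of what "user-controlled limits" can look like, the sketch below wraps any stream of `(count, event, value)` tuples (the shape shown in the example output) and stops once nesting goes too deep. The names `limit_depth` and `DepthLimitExceeded` are hypothetical and not part of the package; a hand-written list stands in for `iter_xml('file.xml')`.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str, str]  # (count, event, value), as in the example output


class DepthLimitExceeded(ValueError):
    """Hypothetical error raised when nesting exceeds the caller's limit."""


def limit_depth(events: Iterable[Event], max_depth: int) -> Iterator[Event]:
    """Yield events unchanged, raising as soon as nesting exceeds max_depth."""
    depth = 0
    for count, event, value in events:
        if event == "start":
            depth += 1
            if depth > max_depth:
                raise DepthLimitExceeded(
                    f"depth {depth} exceeds limit {max_depth} at event {count}"
                )
        elif event == "end":
            depth -= 1
        yield count, event, value


# Stand-in for iter_xml('file.xml'): three nested elements.
sample = [
    (0, "start", "a"),
    (1, "start", "b"),
    (2, "start", "c"),
    (3, "text", "x"),
    (4, "end", "c"),
    (5, "end", "b"),
    (6, "end", "a"),
]

shallow_ok = list(limit_depth(sample, max_depth=3))
```

Because the wrapper is itself a generator, the check runs as events arrive, so a pathologically deep document is rejected without being read to the end.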
Performance
Benchmarks comparing xml_to_dict() against xmltodict.parse():
| Elements | File Size | xml_iterator | xmltodict | Speedup |
|---|---|---|---|---|
| 500 | 0.2 MB | 0.020s | 0.024s | 1.2x |
| 2,000 | 0.7 MB | 0.095s | 0.099s | 1.1x |
| 5,000 | 1.8 MB | 0.231s | 0.251s | 1.1x |
Streaming advantage: 734x faster when processing only first 1,000 events from large files.
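The early-termination advantage comes from laziness: a streaming iterator only does work for the events that are actually consumed. The sketch below shows the idea with a plain Python generator; `fake_events` is a hypothetical stand-in for `iter_xml`, and the counter exists only to show how many events were produced.

```python
from itertools import islice

produced = 0  # how many events the generator has actually built


def fake_events(total):
    """Hypothetical stand-in for iter_xml: lazily yields (count, event, value)."""
    global produced
    for i in range(total):
        produced += 1
        yield (i, "text", f"item {i}")


# Ask for only the first 1,000 events of a one-million-event stream.
first = list(islice(fake_events(1_000_000), 1000))
```

After this runs, `produced` is 1000, not 1,000,000: the remaining events were never generated. A parser that builds the whole document up front cannot skip that work.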
Run benchmarks yourself:
- make benchmark - Synthetic data comparison vs xmltodict
- make benchmark-real - Real-world ESMA FIRDS XML file (downloads ~100MB)
Usage
from xml_iterator.xml_iterator import iter_xml
from xml_iterator.core import xml_to_dict
# Streaming iteration
for count, event, value in iter_xml('file.xml'):
    print(f"{event}: {value}")
    if count > 1000:  # User-controlled limits
        break
# Convert to dictionary (xmltodict compatible)
data = xml_to_dict('file.xml', max_depth=100, max_events=10000)
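To make the relationship between the two calls concrete, here is a simplified sketch of how a stream of `(count, event, value)` tuples could be folded into an xmltodict-style dictionary. It is not the package's implementation: `events_to_dict` is a hypothetical name, and attributes, mixed content and the `max_depth` / `max_events` limits are left out. It follows three xmltodict conventions: repeated sibling tags become a list, a text-only element becomes a string, and an empty element becomes `None`.

```python
from typing import Any, Dict, Iterable, List, Tuple

Event = Tuple[int, str, str]  # (count, event, value)


def events_to_dict(events: Iterable[Event]) -> Dict[str, Any]:
    """Fold start/text/end events into nested dicts (simplified sketch)."""
    root: Dict[str, Any] = {}
    # Each frame is (tag, children, text pieces); the first frame is a sentinel.
    stack: List[Tuple[str, Dict[str, Any], List[str]]] = [("", root, [])]
    for _count, event, value in events:
        if event == "start":
            stack.append((value, {}, []))
        elif event == "text":
            stack[-1][2].append(value)
        elif event == "end":
            tag, children, text = stack.pop()
            # An element with children is a dict; a leaf is its text or None.
            node: Any = children if children else ("".join(text) or None)
            parent = stack[-1][1]
            if tag not in parent:
                parent[tag] = node
            elif isinstance(parent[tag], list):
                parent[tag].append(node)
            else:
                # Second occurrence of a tag: promote the value to a list.
                parent[tag] = [parent[tag], node]
    return root


# Hand-written stand-in for iter_xml('file.xml').
pairs = [
    ("start", "menu"),
    ("start", "food"), ("start", "name"), ("text", "Waffles"), ("end", "name"),
    ("start", "price"), ("text", "$5.95"), ("end", "price"), ("end", "food"),
    ("start", "food"), ("start", "name"), ("text", "Toast"), ("end", "name"),
    ("end", "food"),
    ("end", "menu"),
]
sample = [(i, event, value) for i, (event, value) in enumerate(pairs)]

result = events_to_dict(sample)
```

Here `result` is `{'menu': {'food': [{'name': 'Waffles', 'price': '$5.95'}, {'name': 'Toast'}]}}`.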
Testing
Run the test suite with pytest:
# Install test dependencies
pip install -e ".[test]"
# Run all tests
pytest
# Run specific test types
pytest tests/test_basic.py # Core functionality
pytest tests/test_xmltodict.py # xmltodict compatibility
pytest tests/test_performance.py # Performance regression tests
# Run benchmarks (separate from tests)
make benchmark # Synthetic data vs xmltodict
make benchmark-real # Real-world ESMA FIRDS XML
The test suite includes:
- ✅ Basic functionality tests - streaming, encoding, deep nesting
- ✅ xmltodict compatibility tests - 100% exact result compatibility
- ✅ Performance regression tests - ensure no slowdowns
Example Output
In [1]: from xml_iterator.xml_iterator import get_edge_counts, iter_xml
In [2]: get_edge_counts('simple.xml')
xml_iterator::reading "simple.xml"
Out[2]:
{('breakfast_menu', 'food', 'price'): 5,
('breakfast_menu', 'food', 'description'): 5,
('breakfast_menu', 'food'): 5,
('breakfast_menu', 'food', 'calories'): 5,
('breakfast_menu',): 1,
('breakfast_menu', 'food', 'name'): 5}
In [3]: for x in iter_xml('simple.xml'):
   ...:     print(x)
...:
xml_iterator::reading "simple.xml"
(0, 'start', 'breakfast_menu')
(1, 'start', 'food')
(2, 'start', 'name')
(3, 'text', 'Belgian Waffles')
(4, 'end', 'name')
(5, 'start', 'price')
(6, 'text', '$5.95')
(7, 'end', 'price')
(8, 'start', 'description')
(9, 'text', 'Two of our famous Belgian Waffles with plenty of real maple syrup')
(10, 'end', 'description')
(11, 'start', 'calories')
(12, 'text', '650')
(13, 'end', 'calories')
(14, 'end', 'food')
(15, 'start', 'food')
(16, 'start', 'name')
(17, 'text', 'Strawberry Belgian Waffles')
(18, 'end', 'name')
(19, 'start', 'price')
(20, 'text', '$7.95')
(21, 'end', 'price')
(22, 'start', 'description')
(23, 'text', 'Light Belgian waffles covered with strawberries and whipped cream')
(24, 'end', 'description')
(25, 'start', 'calories')
(26, 'text', '900')
(27, 'end', 'calories')
(28, 'end', 'food')
(29, 'start', 'food')
(30, 'start', 'name')
(31, 'text', 'Berry-Berry Belgian Waffles')
(32, 'end', 'name')
(33, 'start', 'price')
(34, 'text', '$8.95')
(35, 'end', 'price')
(36, 'start', 'description')
(37, 'text', 'Light Belgian waffles covered with an assortment of fresh berries and whipped cream')
(38, 'end', 'description')
(39, 'start', 'calories')
(40, 'text', '900')
(41, 'end', 'calories')
(42, 'end', 'food')
(43, 'start', 'food')
(44, 'start', 'name')
(45, 'text', 'French Toast')
(46, 'end', 'name')
(47, 'start', 'price')
(48, 'text', '$4.50')
(49, 'end', 'price')
(50, 'start', 'description')
(51, 'text', 'Thick slices made from our homemade sourdough bread')
(52, 'end', 'description')
(53, 'start', 'calories')
(54, 'text', '600')
(55, 'end', 'calories')
(56, 'end', 'food')
(57, 'start', 'food')
(58, 'start', 'name')
(59, 'text', 'Homestyle Breakfast')
(60, 'end', 'name')
(61, 'start', 'price')
(62, 'text', '$6.95')
(63, 'end', 'price')
(64, 'start', 'description')
(65, 'text', 'Two eggs, bacon or sausage, toast, and our ever-popular hash browns')
(66, 'end', 'description')
(67, 'start', 'calories')
(68, 'text', '950')
(69, 'end', 'calories')
(70, 'end', 'food')
(71, 'end', 'breakfast_menu')
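The two outputs above are closely related: each key returned by `get_edge_counts` is the path of element names open at a `start` event, and each value is how often that path occurs. The sketch below reproduces that relationship in plain Python over a shortened event list. `count_edges` is a hypothetical helper, not the package's own (Rust) implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, str]  # (count, event, value)


def count_edges(events: Iterable[Event]) -> Dict[Tuple[str, ...], int]:
    """Count how often each root-to-element path is opened."""
    path: List[str] = []
    counts: Counter = Counter()
    for _count, event, value in events:
        if event == "start":
            path.append(value)
            counts[tuple(path)] += 1
        elif event == "end":
            path.pop()
    return dict(counts)


# Shortened version of the stream above: two food items, name only.
sample = [
    (0, "start", "breakfast_menu"),
    (1, "start", "food"),
    (2, "start", "name"),
    (3, "text", "Belgian Waffles"),
    (4, "end", "name"),
    (5, "end", "food"),
    (6, "start", "food"),
    (7, "start", "name"),
    (8, "text", "French Toast"),
    (9, "end", "name"),
    (10, "end", "food"),
    (11, "end", "breakfast_menu"),
]

edges = count_edges(sample)
```

For this shortened input, `edges` maps `('breakfast_menu',)` to 1 and both `('breakfast_menu', 'food')` and `('breakfast_menu', 'food', 'name')` to 2.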