Python SAX parser to extract XML
SaxTract
Free software: MIT license
Documentation: https://saxtract.readthedocs.io
Features
Uses a SAX parser to keep a fixed memory footprint while parsing an XML file, ‘extracting’ the requested tags, and pushing them to an output stream.
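The idea can be illustrated with a minimal sketch built on the standard library's xml.sax module. The names TagExtractor and extract are hypothetical and chosen for this example only; they are not SaxTract's actual API, which is described in the documentation linked above.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class TagExtractor(xml.sax.ContentHandler):
    """Hypothetical handler: copies every element named `tag` to `out`."""

    def __init__(self, tag, out):
        super().__init__()
        self.tag = tag
        self.out = out
        self.depth = 0  # greater than 0 while inside a matching element

    def startElement(self, name, attrs):
        # Start copying at a matching tag, and keep copying its children.
        if name == self.tag or self.depth:
            self.depth += 1
            attr_text = "".join(
                f" {key}={quoteattr(value)}" for key, value in attrs.items()
            )
            self.out.write(f"<{name}{attr_text}>")

    def characters(self, content):
        # Text is written as soon as it arrives, so nothing accumulates.
        if self.depth:
            self.out.write(escape(content))

    def endElement(self, name):
        if self.depth:
            self.out.write(f"</{name}>")
            self.depth -= 1
            if self.depth == 0:
                self.out.write("\n")  # one extracted element per line


def extract(source, tag, out):
    """Stream `source` (a filename or file object) and write matches to `out`."""
    xml.sax.parse(source, TagExtractor(tag, out))
```

For example, extract("test.xml", "authors", sys.stdout) would print each authors element on its own line. Because the handler only tracks a depth counter and writes events as they arrive, memory use does not grow with the size of the input file.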
In performance tests on a copy of the DBLP dataset trimmed down to 10k records, SaxTract ran in about half the time and with about half the memory footprint of a DOM parser:
python tests/perf_tests.py --filename test.xml --tag authors --runs 5
SaxTrack run took ~0.05381571219999999s
DOM Parser run took ~0.09159613900000001s
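A comparison of this kind could be set up roughly as follows. This is an illustrative sketch, not the contents of tests/perf_tests.py: the helper names (TagCounter, count_with_sax, count_with_dom, measure) are assumptions, and it compares the standard library's xml.sax against xml.dom.minidom rather than SaxTract itself.

```python
import time
import tracemalloc
import xml.sax
from xml.dom import minidom


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that only counts occurrences of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_with_sax(data, tag):
    # Streaming parse: no document tree is built.
    handler = TagCounter(tag)
    xml.sax.parseString(data, handler)
    return handler.count


def count_with_dom(data, tag):
    # DOM parse: the whole document is loaded into memory first.
    return len(minidom.parseString(data).getElementsByTagName(tag))


def measure(func, data, tag, runs=5):
    """Return (average seconds per run, peak traced bytes, last result)."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(runs):
        result = func(data, tag)
    elapsed = (time.perf_counter() - start) / runs
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak, result
```

Calling measure(count_with_sax, data, "authors") and measure(count_with_dom, data, "authors") on the same bytes gives figures that can be compared side by side. Tracing allocations adds overhead, so the timings are only meaningful relative to each other.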
To-dos
Allow XSD/DTD input for validation
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
The main parser code was copied from Tutorialspoint.
History
0.1.0 (2021-02-26)
First release on PyPI.