Python SAX parser to extract XML
SaxTract
Free software: MIT license
Documentation: https://saxtract.readthedocs.io
Features
Uses a SAX parser to keep a fixed memory footprint while it parses an XML file, extracts the requested tags, and pushes them to an output stream.
In performance tests on the DBLP dataset trimmed down to 10k records, SaxTract ran in a bit more than half the time of a DOM parser, with about half the memory footprint:
python tests/perf_tests.py --filename test.xml --tag authors --runs 5
SaxTrack run took ~0.05381571219999999s
DOM Parser run took ~0.09159613900000001s
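The extraction approach described under Features can be sketched with the standard library's `xml.sax` module. This is a minimal illustration, not SaxTract's actual code or API: the `TagExtractor` class and the `extract` function are hypothetical names, and the sample document is made up. The handler only keeps a depth counter in memory, so usage stays flat no matter how large the input is.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class TagExtractor(xml.sax.ContentHandler):
    """Hypothetical handler: copies every matching element to an output stream."""

    def __init__(self, tag, out):
        super().__init__()
        self.tag = tag
        self.out = out
        self.depth = 0  # greater than 0 while inside a matching element

    def startElement(self, name, attrs):
        # Start copying at a matching tag, and keep copying its descendants.
        if name == self.tag or self.depth:
            self.depth += 1
            attr_text = "".join(
                f" {key}={quoteattr(value)}" for key, value in attrs.items()
            )
            self.out.write(f"<{name}{attr_text}>")

    def characters(self, content):
        # Text may arrive in several chunks; each one is escaped and written.
        if self.depth:
            self.out.write(escape(content))

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            self.out.write(f"</{name}>")
            if self.depth == 0:
                self.out.write("\n")  # one extracted element per line


def extract(source, tag, out):
    """Stream `source` (a filename or binary file object) and write matches to `out`."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(TagExtractor(tag, out))
    parser.parse(source)


if __name__ == "__main__":
    sample = (
        b"<dblp><article><authors>Ada</authors><title>T</title></article>"
        b"<article><authors>Alan &amp; Co</authors></article></dblp>"
    )
    buffer = io.StringIO()
    extract(io.BytesIO(sample), "authors", buffer)
    print(buffer.getvalue(), end="")
```

A DOM parser would build the whole tree before any tag could be read, which is where the extra time and memory in the comparison above come from.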
To-do
Allow XSD/DTD input for validation.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
The main parser code was copied from Tutorialspoint.
History
0.1.0 (2021-02-26)
First release on PyPI.