A Python module to read and parse ALTO files.
simple-alto-parser
This is a simple parser for ALTO XML files. It is designed to perform two separate tasks:
- Extract the text from the ALTO XML file with the AltoTextParser class.
- Extract structured information from the text with different parsing methods.
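To make the first task concrete, here is a minimal sketch of what extracting text from ALTO XML involves, written with the Python standard library only. It is not how simple-alto-parser is implemented. The sample document and the helper names `local_name` and `extract_lines` are made up for illustration, and the sketch assumes the common ALTO layout in which `TextLine` elements contain `String` elements with a `CONTENT` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced ALTO document used only for this sketch.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1">
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine ID="line_2">
            <String CONTENT="Second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Return the text of every TextLine as one space-joined string."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Collect the CONTENT attribute of each direct String child.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in extract_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the ALTO schema version declared in the file.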
Usage
from simple_alto_parser import AltoTextParser

# Register an ALTO XML file, then parse its text.
alto_parser = AltoTextParser()
alto_parser.add_file('path/to/alto.xml')
alto_parser.parse_text()

# Walk the result: parsed files -> text regions -> text lines.
result = alto_parser.get_alto_files()
regions = result[0].get_text_regions()
lines = regions[0].get_text_lines()
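The second task, extracting structured information from the text, is not covered by the usage above. As a rough illustration only, and not the package's own parsing API, the sketch below applies a regular expression to plain line strings. The function name `parse_date`, the date pattern, and the sample lines are all hypothetical.

```python
import re

# Hypothetical pattern: dates written as day.month.year, e.g. "3.11.1923".
DATE_PATTERN = re.compile(
    r"(?P<day>\d{1,2})\.(?P<month>\d{1,2})\.(?P<year>\d{4})"
)


def parse_date(line):
    """Return the first date found in a line as a dict, or None if absent."""
    match = DATE_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Sample strings standing in for text lines read from an ALTO file.
    sample_lines = [
        "Meeting held on 3.11.1923 in Basel",
        "No date on this line",
    ]
    for text in sample_lines:
        print(text, "->", parse_date(text))
```

Keeping this step separate from text extraction means the same patterns can be run on text from any source, which mirrors the two-task split described above.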