A Python module to read and parse ALTO files
simple-alto-parser
This is a simple parser for ALTO XML files. It is designed to do two tasks separately:
- Extract the text from the ALTO XML file with the AltoTextParser class.
- Extract structured information from the text with different parsing methods.
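The split between the two tasks can be pictured with a small sketch that uses only the Python standard library. It is not the package's implementation and does not use its API. The sample document, the helper names (`local_name`, `extract_lines`, `find_dates`) and the date pattern are all assumptions made for illustration. The sketch relies only on the general ALTO layout, in which `TextLine` elements contain `String` elements that carry their text in a `CONTENT` attribute.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened sample; real ALTO files carry more metadata.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1">
            <String CONTENT="Meeting"/><SP/><String CONTENT="of"/><SP/><String CONTENT="12.03.1921"/>
          </TextLine>
          <TextLine ID="line_2">
            <String CONTENT="Present:"/><SP/><String CONTENT="Mr."/><SP/><String CONTENT="Smith"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the XML namespace so the sketch does not depend on one ALTO version."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(alto_xml):
    """Task 1: return the text of every TextLine, one string per line."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            # Join the CONTENT of each String child with single spaces.
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def find_dates(lines):
    """Task 2: pull DD.MM.YYYY dates out of the already extracted text."""
    pattern = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")
    return [match.group(0) for line in lines for match in pattern.finditer(line)]


lines = extract_lines(SAMPLE_ALTO)
dates = find_dates(lines)
print(lines)
print(dates)
```

Keeping the second step as a function over plain strings means the structured-information rules can be changed or tested without re-reading the XML.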
Usage
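from simple_alto_parser import AltoTextParser

# Create the parser and register the ALTO XML file to read.
alto_parser = AltoTextParser()
alto_parser.add_file('path/to/alto.xml')

# Extract the text from the added file(s).
alto_parser.parse_text()

# Take the first parsed file, its first text region, and that region's lines.
result = alto_parser.get_alto_files()
regions = result[0].get_text_regions()
lines = regions[0].get_text_lines()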
Source Distribution
simple-alto-parser-0.0.4.tar.gz
(21.4 kB)
Built Distribution
Hashes for simple_alto_parser-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b7a75ad7f33e2b47d3d09afd731c696122f2fa912e277a281858294aba0e0631
MD5 | e28a9f322a525b904f81036f54aff47b
BLAKE2b-256 | d034b4eb7d54c03bb4bfb1342c56230dcc5b5e69086fd0bdae8ab78c47cb2b1d