Toolset to perform various operations on PAGE XML datasets
Project description
PAGETools - WIP
Small collection of PAGE XML related Python scripts.
Installing
Installation using pip
The suggested method is to install pagetools
into a virtual environment using pip:
python -m venv VENV_NAME
source VENV_NAME/bin/activate
pip install pagetools
To install the package from its source, clone this repository and run
pip install pagetools
Install from source
python setup.py install
Usage
Line extraction
Usage: pagetools-extract-lines [OPTIONS] [XMLS]...
Options:
-ie, --image-extension TEXT Extension of image files (must be in the
same directory as XML files to be
considered).
-o, --output TEXT Path where generated files will get stored.
-e, --enumerate-output Enumerates output file names instead of
using original names.
-z, --zip-output Add output to zip archive.
-bg, --background-color INTEGER...
RGB color code used to fill up background.
Used when padding and / or deskewing.
--background-mode [median|mean|dominant]
Color calc mode to fill up background
(overwrites -bg / --background-color).
-p, --padding INTEGER... Padding in pixels around the line image
cutout (top, bottom, left, right).
-ad, --auto-deskew Autodeskew extracted line images
(Experimental!).
-d, --deskew FLOAT Angle for manuel clockwise rotation of the
line images.
-gt, --gt-index INTEGER Index of the TextEquiv elements containing
ground truth.
-pred, --pred-index INTEGER Index of the TextEquiv elements containing
predicted text.
--help Show this message and exit.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
PAGETools-0.1.tar.gz
(8.7 kB
view hashes)
Built Distribution
PAGETools-0.1-py3-none-any.whl
(12.5 kB
view hashes)