Toolset to perform various operations on PAGE XML datasets
PAGETools - WIP
A small collection of Python scripts for working with PAGE XML files.
Installing
Installation using pip
The suggested method is to install pagetools into a virtual environment using pip:
python -m venv VENV_NAME
source VENV_NAME/bin/activate
pip install pagetools
Install from source
To install the package from its source, clone this repository and run one of the following commands from the repository root:
pip install .
python setup.py install
Usage
Line extraction
Usage: pagetools-extract-lines [OPTIONS] [XMLS]...

Options:
  -ie, --image-extension TEXT     Extension of image files (must be in the
                                  same directory as XML files to be
                                  considered).
  -o, --output TEXT               Path where generated files will get stored.
  -e, --enumerate-output          Enumerates output file names instead of
                                  using original names.
  -z, --zip-output                Add output to zip archive.
  -bg, --background-color INTEGER...
                                  RGB color code used to fill up background.
                                  Used when padding and / or deskewing.
  --background-mode [median|mean|dominant]
                                  Color calc mode to fill up background
                                  (overwrites -bg / --background-color).
  -p, --padding INTEGER...        Padding in pixels around the line image
                                  cutout (top, bottom, left, right).
  -ad, --auto-deskew              Autodeskew extracted line images
                                  (Experimental!).
  -d, --deskew FLOAT              Angle for manual clockwise rotation of the
                                  line images.
  -gt, --gt-index INTEGER         Index of the TextEquiv elements containing
                                  ground truth.
  -pred, --pred-index INTEGER     Index of the TextEquiv elements containing
                                  predicted text.
  --help                          Show this message and exit.
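To make the --padding and --gt-index options more concrete, here is a minimal sketch of how a line's cutout box and text could be read from a PAGE XML file. It uses only the Python standard library (Python 3.8+ for the `{*}` namespace wildcard) and is not the code pagetools itself runs. The function name `line_records`, the sample document, and the exact box arithmetic are assumptions made for illustration. The sketch also assumes integer pixel coordinates in `Coords/@points` and that each `TextEquiv` carries an `index` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PAGE XML document (not taken from pagetools).
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="0001.png" imageWidth="1000" imageHeight="800">
    <TextRegion id="r0">
      <TextLine id="l0">
        <Coords points="100,50 400,50 400,90 100,90"/>
        <TextEquiv index="0"><Unicode>ground truth</Unicode></TextEquiv>
        <TextEquiv index="1"><Unicode>grourd trutb</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def line_records(xml_text, gt_index=0, padding=(0, 0, 0, 0)):
    """Yield (line id, padded bounding box, text) for each TextLine.

    padding follows the order of the CLI help: top, bottom, left, right.
    The box is (x_min, y_min, x_max, y_max) and may extend past the
    image; filling that area is what a background color would be for.
    """
    top, bottom, left, right = padding
    root = ET.fromstring(xml_text)
    # "{*}" matches any namespace, so the PAGE schema version does not matter.
    for line in root.findall(".//{*}TextLine"):
        coords = line.find("{*}Coords")
        if coords is None:
            continue
        # "x1,y1 x2,y2 ..." -> [(x1, y1), (x2, y2), ...]
        points = [
            tuple(int(value) for value in pair.split(","))
            for pair in coords.get("points", "").split()
        ]
        if not points:
            continue
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        box = (min(xs) - left, min(ys) - top, max(xs) + right, max(ys) + bottom)
        # Pick the TextEquiv whose index attribute matches gt_index.
        text = None
        for equiv in line.findall("{*}TextEquiv"):
            if equiv.get("index") == str(gt_index):
                text = equiv.findtext("{*}Unicode")
                break
        yield line.get("id"), box, text


if __name__ == "__main__":
    for record in line_records(SAMPLE_PAGE_XML, gt_index=0, padding=(5, 5, 10, 10)):
        print(record)
```

Passing a different `gt_index` selects another `TextEquiv` of the same line, which mirrors the distinction the CLI draws between -gt / --gt-index and -pred / --pred-index.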