
Toolset to perform various operations on PAGE XML datasets


PAGETools - WIP

A small collection of Python scripts for working with PAGE XML files.

Installing

Installation using pip

The suggested method is to install pagetools into a virtual environment using pip:

python -m venv VENV_NAME
source VENV_NAME/bin/activate
pip install pagetools
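To confirm that the installation worked, a short check like the following can be run inside the activated environment. It is a sketch, not part of PAGETools. It assumes the installed distribution is named PAGETools, as the release file names suggest, and that the pagetools-extract-lines command documented under Usage is installed as a console script. The helper names are hypothetical.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "PAGETools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cli_available(command: str = "pagetools-extract-lines") -> bool:
    """True if the command can be found on PATH (assumed console script name)."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    print("PAGETools version:", installed_version())
    print("pagetools-extract-lines on PATH:", cli_available())
```

If the version prints as None, the package was most likely installed into a different environment than the one currently active.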

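Install from source

To install the package from its source, clone this repository and run the following from the repository root:

pip install .

Alternatively, the setup script can be run directly, although setup.py install is deprecated in recent setuptools releases and pip install . is the preferred route:

python setup.py install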

Usage

Line extraction

Usage: pagetools-extract-lines [OPTIONS] [XMLS]...

Options:
  -ie, --image-extension TEXT     Extension of image files (must be in the
                                  same directory as XML files to be
                                  considered).

  -o, --output TEXT               Path where generated files will be stored.
  -e, --enumerate-output          Enumerates output file names instead of
                                  using original names.

  -z, --zip-output                Add output to zip archive.
  -bg, --background-color INTEGER...
                                  RGB color used to fill the background.
                                  Used when padding and/or deskewing.

  --background-mode [median|mean|dominant]
                                  Method for computing the background fill
                                  color (overrides -bg / --background-color).

  -p, --padding INTEGER...        Padding in pixels around the line image
                                  cutout (top, bottom, left, right).

  -ad, --auto-deskew              Automatically deskew extracted line
                                  images (experimental).

  -d, --deskew FLOAT              Angle for manual clockwise rotation of the
                                  line images.

  -gt, --gt-index INTEGER         Index of the TextEquiv elements containing
                                  ground truth.

  -pred, --pred-index INTEGER     Index of the TextEquiv elements containing
                                  predicted text.

  --help                          Show this message and exit.
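The help text does not spell out how a line is cut out, but PAGE XML stores each TextLine with a Coords polygon and one or more TextEquiv elements. The sketch below shows, with the standard library only, how line polygons and their ground-truth and predicted text could be collected. It is not PAGETools' own code: it assumes --gt-index and --pred-index refer to the TextEquiv index attribute and that a line is cut out along its polygon's axis-aligned bounding box. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Bounding box of a line cutout: (left, top, right, bottom) in page pixels.
BBox = Tuple[int, int, int, int]


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE points attribute such as '10,20 90,20 90,40 10,40'."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        pairs.append((int(x), int(y)))
    return pairs


def bounding_box(points: List[Tuple[int, int]]) -> BBox:
    """Smallest axis-aligned rectangle containing the polygon (inclusive edges)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def text_equiv(line: ET.Element, index: Optional[int]) -> Optional[str]:
    """Return the Unicode text of the TextEquiv whose index attribute matches."""
    if index is None:
        return None
    # findall on a bare tag only looks at direct children, so Word-level
    # TextEquiv elements nested deeper in the line are ignored.
    for equiv in line.findall("{*}TextEquiv"):
        if equiv.get("index") == str(index):
            unicode_el = equiv.find("{*}Unicode")
            return unicode_el.text if unicode_el is not None else None
    return None


def extract_lines(xml_text, gt_index: int, pred_index: Optional[int] = None) -> List[Dict]:
    """Collect id, cutout box, ground truth and prediction for every TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    # '{*}' matches any namespace, so this works across PAGE schema versions.
    for line in root.iterfind(".//{*}TextLine"):
        coords = line.find("{*}Coords")
        if coords is None or not coords.get("points"):
            continue  # no polygon, nothing to cut out
        lines.append({
            "id": line.get("id"),
            "bbox": bounding_box(parse_points(coords.get("points"))),
            "gt": text_equiv(line, gt_index),
            "pred": text_equiv(line, pred_index),
        })
    return lines
```

Each bbox is in (left, top, right, bottom) order, which matches the box argument of Pillow's Image.crop; since crop treats the right and bottom edges as exclusive, add 1 to both to keep the last pixel column and row.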
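The --background-mode choices and the --padding option can be illustrated in the same way. The sketch below assumes the fill color is computed over all pixels of the line cutout, that median and mean are taken per RGB channel, and that dominant means the most frequent exact color. It also assumes padding adds a border of that color in the order top, bottom, left, right, as listed in the -p help. These interpretations and the list-of-rows pixel representation are assumptions for illustration only; PAGETools may compute them differently.

```python
from collections import Counter
from statistics import mean, median
from typing import List, Tuple

RGB = Tuple[int, int, int]
Pixels = List[List[RGB]]  # rows of RGB tuples (hypothetical image representation)


def background_color(pixels: Pixels, mode: str = "median") -> RGB:
    """Estimate a fill color from the cutout's own pixels."""
    flat = [px for row in pixels for px in row]
    if mode == "dominant":
        # Most frequent exact RGB value.
        return Counter(flat).most_common(1)[0][0]
    if mode not in ("median", "mean"):
        raise ValueError(f"unknown mode: {mode}")
    reduce = median if mode == "median" else mean
    # Reduce each channel independently, then round back to integers.
    r, g, b = (int(round(reduce(channel))) for channel in zip(*flat))
    return (r, g, b)


def pad(pixels: Pixels, padding: Tuple[int, int, int, int], color: RGB) -> Pixels:
    """Add a border of color; padding is (top, bottom, left, right) as in -p."""
    top, bottom, left, right = padding
    width = len(pixels[0]) + left + right
    body = [[color] * left + list(row) + [color] * right for row in pixels]
    return ([[color] * width for _ in range(top)]
            + body
            + [[color] * width for _ in range(bottom)])
```

With -bg, the three integers given on the command line would simply take the place of the computed color.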


Download files


Source Distribution

PAGETools-0.1.tar.gz (8.7 kB)

Built Distribution


PAGETools-0.1-py3-none-any.whl (12.5 kB)

File details

Details for the file PAGETools-0.1.tar.gz.

File metadata

  • Download URL: PAGETools-0.1.tar.gz
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for PAGETools-0.1.tar.gz
Algorithm Hash digest
SHA256 669bd652a11363c5b22396cd725a781e8c07ead70d1e8190a3f914c971e45f9f
MD5 06701d6c7b829a9186dc5d32840401a9
BLAKE2b-256 8a668af471c78b67b736dd999965477c1b2989ea82a29d4b6f11f0af18851a72


File details

Details for the file PAGETools-0.1-py3-none-any.whl.

File metadata

  • Download URL: PAGETools-0.1-py3-none-any.whl
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for PAGETools-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4f7494a05881a3265f0fd9a30f5b837a0d71236c0f59001b1b7cf9802477eb45
MD5 d73a347dfc91b475105fc791f1fced9f
BLAKE2b-256 c21d7e5d094400be1d3c0fa6a083b544f6478a9f7bc9ad9d2526a344272b5a80
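The SHA256 digests above can be checked locally before installing a downloaded file. The following is a minimal sketch using only hashlib; the EXPECTED_SHA256 mapping copies the SHA256 values from the hash tables above, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "PAGETools-0.1.tar.gz": "669bd652a11363c5b22396cd725a781e8c07ead70d1e8190a3f914c971e45f9f",
    "PAGETools-0.1-py3-none-any.whl": "4f7494a05881a3265f0fd9a30f5b837a0d71236c0f59001b1b7cf9802477eb45",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Check whichever release files are present in the current directory.
    for name, expected in EXPECTED_SHA256.items():
        if Path(name).exists():
            print(name, "OK" if verify(name, expected) else "MISMATCH")
```

pip can also enforce these digests itself in hash-checking mode: list both digests on a requirements line such as pagetools==0.1 --hash=sha256:<digest> and install with pip install --require-hashes -r requirements.txt.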

