HTRVX, HTR Validation with XSD

HTRVX : HTR Validation for eXtra-quality controlled documents

HTRVX - pronounced Ashterux - allows for quality control of XML using XSD schema validation, Segmonto validation and other verifications.

How to install

Simply run pip install htrvx

How to run

The basic way to run the script is htrvx PATHTOFILES --format FORMAT, e.g. htrvx ./tests/test_data/page/*.xml --format page

Every verification is opt-in: it only runs if you explicitly pass its flag.

  • --segmonto checks for Segmonto compliance
  • --xsd checks that the data comply with the XML Schemas
  • --check-empty checks whether regions have no lines or lines have no text
    • --check-empty can be refined with --raise-empty, which throws an error if empty elements are found; without it, they are simply reported.
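
To make the empty-element check more concrete, here is a minimal sketch of the kind of test --check-empty describes, written against ALTO files with the standard library only. It is not HTRVX's own implementation: the function names, the message wording and the sample document are assumptions made for illustration, and it only relies on the ALTO element names TextBlock, TextLine and String (with its CONTENT attribute).

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_empty_elements(alto_xml: str) -> List[str]:
    """Return one message per empty TextBlock or TextLine (hypothetical helper)."""
    problems = []
    root = ET.fromstring(alto_xml)
    for element in root.iter():
        name = local_name(element.tag)
        children = [local_name(child.tag) for child in element]
        if name == "TextBlock" and "TextLine" not in children:
            # A region without any line
            problems.append(f"Empty region: {element.get('ID')}")
        elif name == "TextLine":
            # A line whose String children carry no text at all
            text = "".join(
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            )
            if not text.strip():
                problems.append(f"Empty line: {element.get('ID')}")
    return problems


# Made-up sample document, for illustration only
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="block_1">
      <TextLine ID="line_1"><String CONTENT="Some text"/></TextLine>
      <TextLine ID="line_2"><String CONTENT=""/></TextLine>
    </TextBlock>
    <TextBlock ID="block_2"/>
  </PrintSpace></Page></Layout>
</alto>"""

for message in find_empty_elements(SAMPLE):
    print(message)
```

In this sketch, reporting corresponds to printing the messages; the behaviour described for --raise-empty would correspond to raising an exception when the returned list is not empty.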

Other parameters mainly have to do with verbosity: --verbose displays details about errors, --group groups errors (instead of showing one line per error, groups by error types).
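
The difference between one line per error and grouped output can be pictured with the short sketch below. It only illustrates the idea behind --group: the error records, the file names and the output format are invented, and HTRVX's actual report may look different.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (file, error type) pairs, for illustration only
ERRORS = [
    ("page_001.xml", "empty line"),
    ("page_001.xml", "empty line"),
    ("page_001.xml", "invalid zone type"),
    ("page_002.xml", "empty line"),
]


def report_ungrouped(errors: List[Tuple[str, str]]) -> List[str]:
    """One output line per error."""
    return [f"{filename}: {kind}" for filename, kind in errors]


def report_grouped(errors: List[Tuple[str, str]]) -> List[str]:
    """One output line per file and error type, with a count."""
    counts = Counter(errors)  # keeps first-seen order of the pairs
    return [
        f"{filename}: {kind} (x{count})"
        for (filename, kind), count in counts.items()
    ]


print("\n".join(report_ungrouped(ERRORS)))
print("---")
print("\n".join(report_grouped(ERRORS)))
```

With many identical errors in a large corpus, the grouped form stays short, which is the reduction in verbosity that grouping is meant to bring.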

Parameters                  Default  Function
-v, --verbose               False    Prints more information
-f, --format [alto,page]    alto     Format of files
-s, --segmonto              False    Apply Segmonto Zoning verification
-e, --check-empty           False    Check for empty lines or empty zones
-r, --raise-empty           False    Fail (instead of only warning) if empty lines or empty zones are found
-x, --xsd                   False    Apply XSD Schema verification
-g, --group                 False    Group error types (reduce verbosity)
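
If you call HTRVX from a Python script rather than from a shell, the options in the table can be assembled into a command line as sketched below. The helper build_htrvx_command and the example file name are hypothetical and not part of HTRVX; only the flags listed above are used.

```python
import shlex
from typing import List, Sequence


def build_htrvx_command(
    files: Sequence[str],
    fmt: str = "alto",
    segmonto: bool = False,
    xsd: bool = False,
    check_empty: bool = False,
    raise_empty: bool = False,
    verbose: bool = False,
    group: bool = False,
) -> List[str]:
    """Build an argument list for the htrvx command (hypothetical helper)."""
    if fmt not in ("alto", "page"):
        raise ValueError(f"Unsupported format: {fmt!r}")
    command = ["htrvx", "--format", fmt]
    # Every verification is opt-in, so a flag is added only when requested
    flags = [
        ("--verbose", verbose),
        ("--group", group),
        ("--segmonto", segmonto),
        ("--xsd", xsd),
        ("--check-empty", check_empty),
        ("--raise-empty", raise_empty),
    ]
    command.extend(flag for flag, enabled in flags if enabled)
    command.extend(files)
    return command


# The file name below is made up for the example
command = build_htrvx_command(
    ["./tests/test_data/page/example.xml"], fmt="page", xsd=True, check_empty=True
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command) once htrvx is installed; passing a list instead of a single string avoids shell quoting issues with unusual file names.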

Github Action code

If you want to add this to your GitHub repository as a continuous integration workflow, add a file htrvx.yml in the .github/workflows directory of your repository.

# This workflow installs HTRVX with a single version of Python and runs it on the XML files of the repository
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: HTRVX

on: [push, pull_request] # You can edit this of course !

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install htrvx
    - name: Run HTRVX
      run: |
        htrvx --verbose --group --format alto --segmonto --xsd --check-empty --raise-empty UNIX/Path/to/**/your/*.xml
