Toolkit for extracting logical chunks from complex document pages.

semANT-TextBite

Tool for extracting logical chunks from complex document pages.

Install from PyPI:

pip install textbite

Run TextBite by providing a folder of page images and a folder of the corresponding PAGE XML files (for example, as produced by pero-ocr). The --model argument is optional:

textbite --xml-input page_xmls/ --images jpegs/ --xml-output textbite-out/ [--model best-weights.pt]
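Before running the command above, it can help to confirm that the two input folders actually line up. The following is a minimal sketch, not part of TextBite itself: it assumes that an image and its PAGE XML share the same file name stem (for example `page_001.jpg` and `page_001.xml`), which this description does not state. The helper name `find_unpaired` and the list of image extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired(xml_dir, image_dir):
    """Return (images without an XML, XMLs without an image).

    Files are matched by stem, which is an assumption about the
    naming scheme rather than documented TextBite behaviour.
    """
    xml_stems = {p.stem for p in Path(xml_dir).glob("*.xml")}
    image_stems = {
        p.stem
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    missing_xml = sorted(image_stems - xml_stems)
    missing_image = sorted(xml_stems - image_stems)
    return missing_xml, missing_image
```

Calling `find_unpaired("page_xmls/", "jpegs/")` returns two lists of stems; if both are empty, every image has a matching XML under the assumed naming scheme.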

By default, TextBite downloads a detection model from the internet. If you are working in an offline environment, download the model yourself beforehand and pass its path with the --model argument.
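For scripted or offline runs, the command line can be assembled from Python. This is a sketch under stated assumptions: it uses only the flags shown in the usage line above (`--xml-input`, `--images`, `--xml-output`, `--model`), and the function names `build_textbite_command` and `run_textbite` are hypothetical wrappers, not part of the package.

```python
import subprocess
from pathlib import Path


def build_textbite_command(xml_dir, image_dir, out_dir, model=None):
    """Build the argument list for the textbite CLI.

    When `model` is given, it must point to an existing local weights
    file, so that an offline run fails early instead of attempting a
    download.
    """
    cmd = [
        "textbite",
        "--xml-input", str(xml_dir),
        "--images", str(image_dir),
        "--xml-output", str(out_dir),
    ]
    if model is not None:
        model_path = Path(model)
        if not model_path.is_file():
            raise FileNotFoundError(f"Model weights not found: {model_path}")
        cmd += ["--model", str(model_path)]
    return cmd


def run_textbite(xml_dir, image_dir, out_dir, model=None):
    """Run textbite and raise CalledProcessError on a non-zero exit."""
    cmd = build_textbite_command(xml_dir, image_dir, out_dir, model)
    subprocess.run(cmd, check=True)
```

For example, `run_textbite("page_xmls/", "jpegs/", "textbite-out/", model="best-weights.pt")` mirrors the usage line with a locally stored model.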

Download files

Source Distribution

textbite-0.1.0.tar.gz (11.7 kB)

Uploaded Source

Built Distribution

textbite-0.1.0-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file textbite-0.1.0.tar.gz.

File metadata

  • Download URL: textbite-0.1.0.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for textbite-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cc9e4ccf5ccd2b48413ef303aabf4e5622cbe85d95d97b60fe34b2146ac0303e
MD5 9371eab0313c8c780381aebd04da3432
BLAKE2b-256 f01f9b98b81f73fb2f2e9ebade2f7e0809617f4d6257a2c11e3805bf7339200a
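A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `matches_sha256` are illustrative, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches_sha256("textbite-0.1.0.tar.gz", "cc9e4ccf5ccd2b48413ef303aabf4e5622cbe85d95d97b60fe34b2146ac0303e")` should return `True` for an intact copy of the source distribution.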

File details

Details for the file textbite-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: textbite-0.1.0-py3-none-any.whl
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for textbite-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c3a6a4ca80cc406cddcc82e1fa1bd17f0236f6b9257927c63e35baddc07005d
MD5 104be4a99366c70c58f649fb284b1ca8
BLAKE2b-256 afc5a339ac6414082b2a45fdcdbad3ab745d23a0b3674ec67cb6a5a95e2e1427
