A package for representing documents as a tree of documents, pages, paragraphs, lines, words, and characters

Docstruct

Overview

Docstruct parses the output of optical character recognition (OCR) engines, such as Tesseract (via its hOCR output) or AWS Textract, into a tree structure. Each node in the tree represents a document, page, paragraph, line, word, or character and carries its bounding box, so the tree preserves the visual layout of the document. The package also supports paragraph detection and text splitting that preserves logical units.
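To make the idea of a layout tree concrete, here is a minimal sketch of how such a structure could look. It is an illustration only and does not use or reproduce the Docstruct API: the names `BoundingBox`, `Node`, `SEPARATORS`, `get_text`, and `get_bbox` are hypothetical and were chosen for this example. Each node stores a kind, an optional bounding box, and its children; a parent's text and bounding box are derived from its children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import reduce


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in page coordinates (hypothetical helper, not Docstruct's)."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other: BoundingBox) -> BoundingBox:
        # Smallest box that contains both boxes.
        return BoundingBox(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


# Assumed separators used when joining the text of a node's children.
SEPARATORS = {"document": "\n\n", "page": "\n\n", "paragraph": "\n", "line": " ", "word": ""}


@dataclass
class Node:
    """One node of the tree: document, page, paragraph, line, word, or character."""
    kind: str
    bbox: BoundingBox | None = None
    text: str = ""
    children: list[Node] = field(default_factory=list)

    def get_text(self) -> str:
        # Leaves carry their own text; inner nodes join their children's text.
        if not self.children:
            return self.text
        return SEPARATORS.get(self.kind, "").join(c.get_text() for c in self.children)

    def get_bbox(self) -> BoundingBox | None:
        # Use the stored box if present, otherwise the union of the children's boxes.
        if self.bbox is not None:
            return self.bbox
        boxes = [b for b in (c.get_bbox() for c in self.children) if b is not None]
        return reduce(BoundingBox.union, boxes) if boxes else None


hello = Node("word", BoundingBox(0, 0, 50, 10), "Hello")
world = Node("word", BoundingBox(55, 0, 100, 12), "world")
line = Node("line", children=[hello, world])
page = Node("page", children=[Node("paragraph", children=[line])])

print(line.get_text())
print(page.get_bbox())
```

Deriving a parent's box from its children keeps the tree consistent when OCR output only provides word-level coordinates. For the real classes and parsers, refer to the Docstruct documentation.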

Documentation

For more information, read the docs at [Docstruct](https://smrt-co.github.io/docstruct/).

Installation

pip install docstruct

Contributions

Contributions to the Docstruct package are always welcome. If you have a bug fix or a new feature, feel free to create a pull request on the GitHub repository.

License

The Docstruct package is licensed under the MIT License.
