
A package for representing documents as a tree of document, page, paragraph, line, word, and character nodes


Docstruct

Overview

Docstruct parses the output of optical character recognition (OCR) engines, such as Tesseract (via its hOCR output) or AWS Textract, into a tree structure. Each node represents a document, page, paragraph, line, word, or character and carries its bounding box, so the tree can be used to represent the document visually. The package also supports paragraph detection and text splitting that preserves logical units.
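To make the tree idea concrete, here is a minimal sketch of such a structure in plain Python. It is not Docstruct's API: `BoundingBox`, `Node`, `iter_kind`, `split_by_paragraph`, and `build_example` are hypothetical names invented for this illustration, and the coordinates are made up. The sketch shows how a parent's bounding box can be derived from its children and how text can be split without cutting through a paragraph.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class BoundingBox:
    """Axis-aligned box (left, top, right, bottom). Hypothetical, not Docstruct's class."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other: "BoundingBox") -> "BoundingBox":
        # Smallest box that contains both boxes.
        return BoundingBox(min(self.left, other.left), min(self.top, other.top),
                           max(self.right, other.right), max(self.bottom, other.bottom))


# Separator used when joining the text of a node's children (an assumption of this sketch).
SEPARATORS = {"document": "\n\n", "page": "\n\n", "paragraph": "\n", "line": " "}


@dataclass
class Node:
    """Hypothetical tree node: document, page, paragraph, line, word, or character."""
    kind: str
    bbox: Optional[BoundingBox] = None
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def get_text(self) -> str:
        if not self.children:
            return self.text
        return SEPARATORS.get(self.kind, "").join(c.get_text() for c in self.children)

    def compute_bbox(self) -> Optional[BoundingBox]:
        # An inner node's box is the union of its children's boxes.
        boxes = [b for b in (c.compute_bbox() for c in self.children) if b is not None]
        if boxes:
            merged = boxes[0]
            for box in boxes[1:]:
                merged = merged.union(box)
            self.bbox = merged
        return self.bbox


def iter_kind(node: Node, kind: str) -> Iterator[Node]:
    """Yield every node of the given kind in document order."""
    if node.kind == kind:
        yield node
    for child in node.children:
        yield from iter_kind(child, kind)


def split_by_paragraph(root: Node, max_chars: int) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in iter_kind(root, "paragraph"):
        text = paragraph.get_text()
        candidate = text if not current else current + "\n\n" + text
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_example() -> Node:
    """Build a tiny two-paragraph document with made-up word boxes."""
    def word(text: str, left: float, top: float, right: float, bottom: float) -> Node:
        return Node("word", BoundingBox(left, top, right, bottom), text)

    line1 = Node("line", children=[word("Hello", 0, 0, 40, 10), word("world", 45, 0, 90, 10)])
    line2 = Node("line", children=[word("again", 0, 12, 40, 22)])
    line3 = Node("line", children=[word("Second", 0, 30, 50, 40), word("paragraph", 55, 30, 130, 40)])
    para1 = Node("paragraph", children=[line1, line2])
    para2 = Node("paragraph", children=[line3])
    return Node("document", children=[Node("page", children=[para1, para2])])


if __name__ == "__main__":
    doc = build_example()
    print(doc.compute_bbox())
    print(split_by_paragraph(doc, max_chars=20))
```

Splitting at paragraph boundaries is only one possible reading of "logical units"; the package's own paragraph detection and splitting rules are described in its documentation.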

Documentation

For more information, read the docs: [Docstruct](https://smrt-co.github.io/docstruct/)

Installation

Install the package from PyPI with pip:

pip install docstruct

Contributions

Contributions to the Docstruct package are always welcome. If you have a bug fix or a new feature, feel free to create a pull request on the GitHub repository.

License

The Docstruct package is licensed under the MIT License.


