The main standards for the Latis Document AI project

DocumentAI-std

DocumentAI-std is a Python library designed to facilitate and standardize document analysis and processing tasks. It offers functionality for handling document elements, performing optical character recognition (OCR), and managing document datasets.

Installation

To install DocumentAI-std from PyPI, run:

pip install DocumentAI-std
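
After installing, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the Python standard library. The helper name installed_version is hypothetical and not part of DocumentAI-std, and the sketch assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="DocumentAI-std"):
    """Return the installed version string, or None if the distribution is absent.

    installed_version is an illustrative helper, not part of DocumentAI-std.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "DocumentAI-std is not installed")
```

A return value of None suggests the install step did not complete in the active environment, for example because a different virtual environment is in use.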

Example of Usage

Here's an example demonstrating how to use the Wildreceipt dataset:

from DocumentAI_std.datasets import Wildreceipt

# Define train and test sets
train_set = Wildreceipt(
    train=True,
    img_folder="/path/to/train/images/",
    label_path="/path/to/train/annotations.txt",
)
test_set = Wildreceipt(
    train=False,
    img_folder="/path/to/test/images/",
    label_path="/path/to/test/annotations.txt",
)

# Assert the number of data samples in train and test sets
assert len(train_set.data) == 1267
assert len(test_set.data) == 472

In the above example:

  • We import the Wildreceipt dataset from the DocumentAI_std library.
  • We create train and test dataset instances, specifying the paths to image folders and annotation files.
  • We assert that the number of data samples in the train and test sets matches the expected counts.
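
Because the dataset instances are created from filesystem paths, a mistyped path is an easy mistake to make. The sketch below checks the paths before any dataset is built. check_dataset_paths is a hypothetical helper, not part of DocumentAI_std, and it assumes only that img_folder should be an existing directory and label_path an existing file.

```python
from pathlib import Path


def check_dataset_paths(img_folder, label_path):
    """Return a list of problems found with the given paths (empty if none).

    Illustrative helper only; it does not call into DocumentAI_std.
    """
    problems = []
    if not Path(img_folder).is_dir():
        problems.append(f"image folder not found: {img_folder}")
    if not Path(label_path).is_file():
        problems.append(f"annotation file not found: {label_path}")
    return problems
```

An empty list means both paths exist. Otherwise, the returned messages can be reported before Wildreceipt(...) is constructed with the same arguments.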

Download files

Source Distribution

DocumentAI_std-0.2.7.dev1.tar.gz (15.7 kB)

Built Distribution

DocumentAI_std-0.2.7.dev1-py3-none-any.whl (23.0 kB, Python 3)
