Skip to main content

Convert PDF to structured data

Project description

This is Lumina Invoice Reader Project

Data_Processor Class Documentation

The Data_Processor class is designed to process various types of data, including images, text, and tables. It sends these data types to a GPT model for classification and translates the results from Vietnamese to English.

Initialization

The class is initialized with a dictionary of credentials that should include the URL and headers for the GPT model.

processor = Data_Processor(credentials)

Methods

send_request_to_gpt

This method sends a request to the GPT model and returns the extracted content from the response.

processor.send_request_to_gpt(system_prompt, user_prompt, temperature, max_tokens, top_p, version)

classify_image

This method classifies the provided image using the GPT model and returns the result as a dictionary. The image should be provided as a base64 encoded string.

processor.classify_image(image_base64, image_extraction_prompt, translation_prompt)

classify_text

This method classifies the provided text using the GPT model and returns the result as a dictionary.

processor.classify_text(classify_prompt, translation_prompt, text)

classify_and_translate_table

This method classifies the provided table data and translates the result. The table data should be provided as a string.

processor.classify_and_translate_table(classification_prompt, translation_prompt, user_prompt, result_key)

classify_and_translate_multiple_tables

This method sends multiple tabular data to GPT for classification and returns the result as JSON. The tabular data should be stored in .txt files in the specified directory.

processor.classify_and_translate_multiple_tables(classification_prompt, translation_prompt, directory_path, file_name, num_workers)

Error Handling

All methods in the Data_Processor class are designed to handle exceptions and will raise an error if something goes wrong during the classification or translation process.

Other note

(invoice_reader_ct) => this is for building the library (invoice_reader_test) => this is for test the library

Further developemnt

How to hide these warnings from camelot: C:\Users\ftt.cuong.ton\anaconda3\envs\invoice_reader_test\lib\site-packages\camelot\parsers\lattice.py:416: UserWarning: No tables found on page-2 C:\Users\ftt.cuong.ton\anaconda3\envs\invoice_reader_test\lib\site-packages\camelot\parsers\lattice.py:411: UserWarning: page-1 is image-based, camelot only works on text-based pages.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lumina_invoice_reader-0.0.2.tar.gz (11.0 kB view details)

Uploaded Source

Built Distribution

lumina_invoice_reader-0.0.2-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file lumina_invoice_reader-0.0.2.tar.gz.

File metadata

  • Download URL: lumina_invoice_reader-0.0.2.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.19

File hashes

Hashes for lumina_invoice_reader-0.0.2.tar.gz
Algorithm Hash digest
SHA256 58394250399e2c4dc32c555349a31e28c0c602c258feda0a699dfa50ea80c836
MD5 aa4630ddb5f7c4274c62e2822d1545de
BLAKE2b-256 cebbeddb2ffbe15d1a9f85195c169d57a46fb476fd74bdbd491c667814e57a79

See more details on using hashes here.

File details

Details for the file lumina_invoice_reader-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for lumina_invoice_reader-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c67f02f52fc1f59ed09170bc7bc31d7e9d3ce28d66275282a2ad59e88c06742c
MD5 bad85660ccea56a7abf3f50643865b6e
BLAKE2b-256 72cd546bc3dfeb15f5f5618f759ac439925933185a32beb204c0cc73b45a56db

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page