Convert PDF to structured data
Project description
This is Lumina Invoice Reader Project
Data_Processor Class Documentation
The Data_Processor class is designed to process various types of data, including images, text, and tables. It sends these data types to a GPT model for classification and translates the results from Vietnamese to English.
Initialization
The class is initialized with a dictionary of credentials that should include the URL and headers for the GPT model.
processor = Data_Processor(credentials)
Methods
send_request_to_gpt
This method sends a request to the GPT model and returns the extracted content from the response.
processor.send_request_to_gpt(system_prompt, user_prompt, temperature, max_tokens, top_p, version)
classify_image
This method classifies the provided image using the GPT model and returns the result as a dictionary. The image should be provided as a base64 encoded string.
processor.classify_image(image_base64, image_extraction_prompt, translation_prompt)
classify_text
This method classifies the provided text using the GPT model and returns the result as a dictionary.
processor.classify_text(classify_prompt, translation_prompt, text)
classify_and_translate_table
This method classifies the provided table data and translates the result. The table data should be provided as a string.
processor.classify_and_translate_table(classification_prompt, translation_prompt, user_prompt, result_key)
classify_and_translate_multiple_tables
This method sends multiple tabular data to GPT for classification and returns the result as JSON. The tabular data should be stored in .txt files in the specified directory.
processor.classify_and_translate_multiple_tables(classification_prompt, translation_prompt, directory_path, file_name, num_workers)
Error Handling
All methods in the Data_Processor class are designed to handle exceptions and will raise an error if something goes wrong during the classification or translation process.
Other note
(invoice_reader_ct) => this is for building the library (invoice_reader_test) => this is for test the library
Further developemnt
How to hide these warnings from camelot: C:\Users\ftt.cuong.ton\anaconda3\envs\invoice_reader_test\lib\site-packages\camelot\parsers\lattice.py:416: UserWarning: No tables found on page-2 C:\Users\ftt.cuong.ton\anaconda3\envs\invoice_reader_test\lib\site-packages\camelot\parsers\lattice.py:411: UserWarning: page-1 is image-based, camelot only works on text-based pages.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file lumina_invoice_reader-0.0.2.tar.gz
.
File metadata
- Download URL: lumina_invoice_reader-0.0.2.tar.gz
- Upload date:
- Size: 11.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 58394250399e2c4dc32c555349a31e28c0c602c258feda0a699dfa50ea80c836 |
|
MD5 | aa4630ddb5f7c4274c62e2822d1545de |
|
BLAKE2b-256 | cebbeddb2ffbe15d1a9f85195c169d57a46fb476fd74bdbd491c667814e57a79 |
File details
Details for the file lumina_invoice_reader-0.0.2-py3-none-any.whl
.
File metadata
- Download URL: lumina_invoice_reader-0.0.2-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c67f02f52fc1f59ed09170bc7bc31d7e9d3ce28d66275282a2ad59e88c06742c |
|
MD5 | bad85660ccea56a7abf3f50643865b6e |
|
BLAKE2b-256 | 72cd546bc3dfeb15f5f5618f759ac439925933185a32beb204c0cc73b45a56db |