A package for extracting handwritten data from PDF documents, and returning an Excel workbook output.

Project description

feather-extract

feather-extract is a Python package designed to extract handwritten data from PDF documents, format the extracted data, and save it as an Excel workbook. This package is particularly useful for businesses in the bar and restaurant industry that need to manage inventory.

Installation

To install feather-extract, run the following command:

pip install feather-extract

Extracting Text from Documents

feather-extract is trained on a standardized form. To download this form, call the get_form() function.

Calling get_form(filled_out=True) instead downloads an already filled-out form for testing and practice.

from feather_extract import get_form

form = get_form()

To extract text from a document, use the extract_text_from_document function:

from feather_extract import extract_text_from_document

extracted_text = extract_text_from_document('path/to/document.pdf')

This function uses the Azure Form Recognizer service to extract text from the document. You'll need to provide your Azure Form Recognizer API key and endpoint when prompted.

Formatting Extracted Text

The format_extracted_text function takes the extracted text and formats it into a list of rows, each containing an item, quantity, and bar designation:

from feather_extract import format_extracted_text

formatted_data = format_extracted_text(extracted_text)
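Because each row is a plain list of item, quantity, and bar designation, the result can be processed with ordinary Python. The sketch below totals the quantities per bar designation. The sample rows and the total_quantity_by_bar helper are hypothetical and not part of feather-extract, and the sketch assumes every quantity can be converted with float().

```python
from collections import defaultdict

# Hypothetical sample rows in the [item, quantity, bar_designation] layout;
# real rows would come from format_extracted_text(extracted_text).
sample_rows = [
    ["Vodka", 3, "Main Bar"],
    ["Gin", 2, "Main Bar"],
    ["Vodka", 1.5, "Patio Bar"],
]


def total_quantity_by_bar(rows):
    """Sum the quantity column for each bar designation."""
    totals = defaultdict(float)
    for item, quantity, bar_designation in rows:
        # Assumption: quantity may arrive as text, so coerce it to a number.
        totals[bar_designation] += float(quantity)
    return dict(totals)


totals = total_quantity_by_bar(sample_rows)
```

If handwritten quantities are sometimes misread as non-numeric text, float() raises ValueError, which is a convenient place to flag rows for manual review.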

The formatted_data variable will contain a list of lists, where each inner list represents a row with the following format: [item, quantity, bar_designation].

Saving Data to Excel

To save the formatted data to an Excel workbook, use the save_to_excel function:

from feather_extract import save_to_excel

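# Example column headers (an assumption); they should match the [item, quantity, bar_designation] row layout
headers = ['Item', 'Quantity', 'Bar Designation']

save_to_excel(headers, formatted_data, 'output.xlsx')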

This function creates a new Excel workbook, writes the headers and formatted data rows to the active worksheet, and saves the workbook to the specified file name ('output.xlsx' in this example).
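Since the headers and the data rows are written to the same worksheet, it can help to confirm that every row has one value per header before saving. The following is a minimal sketch of such a check. The header names, the sample rows, and the find_mismatched_rows helper are all hypothetical and not part of feather-extract.

```python
# Hypothetical headers matching the [item, quantity, bar_designation] layout.
headers = ["Item", "Quantity", "Bar Designation"]

# Hypothetical rows; the second one is missing its bar designation.
rows = [
    ["Vodka", 3, "Main Bar"],
    ["Gin", 2],
]


def find_mismatched_rows(headers, rows):
    """Return (index, row) pairs for rows whose length differs from the header count."""
    return [
        (index, row)
        for index, row in enumerate(rows)
        if len(row) != len(headers)
    ]


mismatched = find_mismatched_rows(headers, rows)

if not mismatched:
    # Only save when every row lines up with the headers, for example:
    # save_to_excel(headers, rows, 'output.xlsx')
    pass
```

An empty result means every row lines up with the headers and the data is ready to pass to save_to_excel.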

Dependencies

feather-extract relies on the following dependencies:

azure-ai-formrecognizer - For extracting text from PDF documents using the Azure Form Recognizer service.

openpyxl - For creating and writing data to Excel workbooks.

These dependencies will be automatically installed when you install feather-extract using pip.

Contributing

Contributions to feather-extract are welcome!

If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository.

To contribute code changes, follow these steps:

  • Fork the repository

  • Create a new branch for your changes

  • Make your changes and commit them with descriptive commit messages

  • Push your changes to your forked repository

  • Open a pull request against the main repository

License

feather-extract is released under the MIT License.

Download files

Source Distribution

feather_extract-0.1.16.tar.gz (491.3 kB view details)

Uploaded Source

Built Distribution

feather_extract-0.1.16-py3-none-any.whl (489.1 kB view details)

Uploaded Python 3

File details

Details for the file feather_extract-0.1.16.tar.gz.

File metadata

  • Download URL: feather_extract-0.1.16.tar.gz
  • Upload date:
  • Size: 491.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.11

File hashes

Hashes for feather_extract-0.1.16.tar.gz
Algorithm Hash digest
SHA256 aa2a8224f9bd98a1c9391a5bc05b9782c7fb0d263141dbefe4cf1a0a5441b7a7
MD5 912a521ecc6a2af6e32916336e8ccb95
BLAKE2b-256 b949f6dc4515e43d520d5ecb62602e7ba674eaf2f404d08ce60e01f8fd4afaab

File details

Details for the file feather_extract-0.1.16-py3-none-any.whl.

File hashes

Hashes for feather_extract-0.1.16-py3-none-any.whl
Algorithm Hash digest
SHA256 b87131c26aa61fb42990ce356a315e1b99e006a83152c5c5346378717afa559d
MD5 e0e9c1822a50ac4dfc67b52d6045d3b8
BLAKE2b-256 f436608b531c2aa5ff0f732f10a8a4d5a0ccdb234486ae703231c2e4e0591d93
