Deep learning for document processing

Doc Transformers

Document processing using transformers. This project is still in development and currently supports only the extraction of form data, i.e. key-value pairs.

pip install -q doc-transformers

Pre-requisites

Please install the following separately:

pip install pip --upgrade
pip install -q git+https://github.com/huggingface/transformers.git

pip install pyyaml==5.1

# workaround: install old version of pytorch since detectron2 hasn't released packages for pytorch 1.9 (issue: https://github.com/facebookresearch/detectron2/issues/3158)
pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install detectron2 that matches pytorch 1.8
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
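Because the steps above pin specific versions, it can help to confirm what actually ended up in the environment before running the parser. The following is a minimal sketch using only the standard library; the `PINNED` table and the `find_mismatches` helper are illustrative names, not part of doc-transformers, and the pins are simply copied from the commands above.

```python
from importlib import metadata

# Pins copied from the install commands above (assumed to be the intended versions).
PINNED = {
    "pyyaml": "5.1",
    "torch": "1.8.0+cu101",
    "torchvision": "0.9.0+cu101",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.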

Implementation

# importing the parser also loads the pretrained dataset
from doc_transformers import parser

# loads the image and labels; input_path_image is the path to your input image
image = parser.load_image(input_path_image)
labels = parser.load_tags()

# loads the model
feature_extractor, processor, model = parser.load_models()

# gets the bounding boxes, predictions, extracted words, and the processed image
kp = parser.process_image(image, feature_extractor, processor, model, labels)
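The Results section shows the output after it has been saved to CSV, but the saving step itself is not shown. Below is a minimal sketch of that step, assuming the predictions can be reduced to (label, text) pairs. The structure of the value returned by `parser.process_image` is not documented here, so the `rows` list and the `write_rows` helper are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical rows shaped like the Results table: (label, text) pairs.
# The real return value of parser.process_image may be structured differently.
rows = [
    ("title", "ANYWHERE"),
    ("key", "DATE:"),
    ("value", "02/02/2014"),
]


def write_rows(rows, handle):
    """Write a LABEL/TEXT header followed by one CSV line per (label, text) pair."""
    writer = csv.writer(handle)
    writer.writerow(["LABEL", "TEXT"])
    writer.writerows(rows)


# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("result.csv", "w", newline="")`; the `newline=""` argument is what the `csv` module recommends to avoid blank lines on some platforms.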

Results

Input & Output

Table

  • After saving to CSV, the result looks like the following:
LABEL  TEXT
title  CREDIT CARD VOUCHER ANY RESTAURANT
title  ANYWHERE
key    DATE:
value  02/02/2014
key    TIME:
value  11:11
key    CARD
key    TYPE:
value  MC
key    ACCT:
value  XXXX XXXX XXXX
value  1111
key    TRANS
key    KEY:
value  HYU8789798234
key    AUTH
key    CODE:
value  12345
key    EXP
key    DATE:
value  XX/XX
key    CHECK:
value  1111
key    TABLE:
value  11/11
key    SERVER:
value  34
value  MONIKA
key    Subtotal:
value  $1969
value  .69
key    Gratuity: Total:
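The table lists one labelled token group per row, so a key such as "CARD TYPE:" is split across two `key` rows and its value follows on a `value` row. One possible way to turn those rows into key-value pairs is sketched below. This is a simple heuristic written for illustration, not something doc-transformers provides: `pair_keys_and_values` is a hypothetical helper that joins consecutive `key` rows, attaches the `value` rows that follow them, and ignores other labels such as `title`.

```python
def pair_keys_and_values(rows):
    """Group consecutive key rows with the value rows that follow them."""
    pairs = []
    key_parts, value_parts = [], []
    for label, text in rows:
        if label == "key":
            # A key arriving after collected values starts a new pair.
            if value_parts:
                pairs.append((" ".join(key_parts), " ".join(value_parts)))
                key_parts, value_parts = [], []
            key_parts.append(text)
        elif label == "value":
            value_parts.append(text)
        # Any other label (for example "title") is skipped.
    if key_parts or value_parts:
        # Flush the last pair; a trailing key without a value gets an empty string.
        pairs.append((" ".join(key_parts), " ".join(value_parts)))
    return pairs


# A few rows taken from the table above.
sample = [
    ("title", "ANYWHERE"),
    ("key", "CARD"),
    ("key", "TYPE:"),
    ("value", "MC"),
    ("key", "ACCT:"),
    ("value", "XXXX XXXX XXXX"),
    ("value", "1111"),
    ("key", "Gratuity: Total:"),
]

for key, value in pair_keys_and_values(sample):
    print(f"{key!r} -> {value!r}")
```

Keys with no detected value, such as the last row of the table, come out paired with an empty string, which makes missed extractions easy to spot.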

Code credits

@HuggingFace

  • Please note that this project is still in development and will be improved in the near future.
