Skip to main content

Library provides a useful operations over PDF/Image

Project description

pdfutil [Under Development]

Library provides a lot of operations over PDF/Image.

Input and Output

The Libarary expose each function with a standard set of argument which are fixed for eevry function

import pdfutil
coordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])
Name Description
pdf_location input location of PDF, image can also be passed libaray will autodetect the image
save_result Default False, If True will save the result pdf/img in location specified by result_location
show_result Default False, This is used for debugging only when True will popup a matplotlib plot highlighting the regions which are detected with corresponding labels
result_location Default current directory, location where ouptut needs to be saved, ignored if save_result is set as False
args custom set of args in form of dictionaty specific to each function
coordinates Output returned by the function call, this will contain json output in following format
[
  {
    "type": "text",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
        ["pageno_2", "startx_2", "starty_2", "width_2", "height_2"]
      ]
    }
  },
  {
    "type": "table",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
      ]
    }
  }
]

operations

Detecting Tables

import pdfutil
coordinates = pdfutil.detect_tables(pdf_location)

Detecting Text Regions [Paragrahs / Unstructured Content]

import pdfutil
coordinates = pdfutil.detect_text(pdf_location)

Detecting Non-Text Regions [Images / Logos]

import pdfutil
coordinates = pdfutil.detect_non_text(pdf_location)

Detecting Language

import pdfutil
coordinates = pdfutil.detect_non_language(pdf_location)

Detecting Key Value Pairs

import pdfutil
coordinates = pdfutil.detect_key_value_pairs(pdf_location)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdfutil-0.0.1.tar.gz (2.0 kB view hashes)

Uploaded source

Built Distribution

pdfutil-0.0.1-py3-none-any.whl (1.9 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page