
A library that provides useful operations over PDFs and images

Project description

pdfutil [Under Development]

The library provides a range of detection operations over PDFs and images.

Input and Output

The library exposes every function with a standard set of arguments, which is the same for every function:

import pdfutil
coordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])
Name             Description
pdf_location     Input location of the PDF. An image can also be passed; the library autodetects it.
save_result      Default False. If True, the result PDF/image is saved in the location specified by result_location.
show_result      Default False. For debugging only: when True, a matplotlib plot pops up highlighting the detected regions with their corresponding labels.
result_location  Default current directory. Location where the output is saved; ignored if save_result is False.
args             Custom set of arguments, as a dictionary, specific to each function.
coordinates      Output returned by the function call. It contains JSON output in the following format:
[
  {
    "type": "text",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
        ["pageno_2", "startx_2", "starty_2", "width_2", "height_2"]
      ]
    }
  },
  {
    "type": "table",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
      ]
    }
  }
]
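
As a minimal sketch of how this output might be consumed, the snippet below groups the coord rows by region type. It runs on a hand-written sample shaped like the format above; the numeric values are made up, and treating each row as five numbers is an assumption, since the format only shows placeholder names. If a function returns an already-parsed list rather than a JSON string, the json.loads step can be skipped. The helper name group_by_type is hypothetical and not part of pdfutil.

```python
import json
from collections import defaultdict

# Sample payload shaped like the documented output; the numbers are invented.
sample = """
[
  {"type": "text",  "output": {"coord": [[1, 50, 40, 500, 120], [2, 50, 60, 500, 300]]}},
  {"type": "table", "output": {"coord": [[1, 50, 200, 500, 250]]}}
]
"""

def group_by_type(coordinates):
    """Map each region type to the list of its coord rows (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in coordinates:
        grouped[entry["type"]].extend(entry["output"]["coord"])
    return dict(grouped)

regions = group_by_type(json.loads(sample))
for kind, coords in regions.items():
    # Each row is [pageno, startx, starty, width, height]
    for page, x, y, w, h in coords:
        print(f"{kind}: page {page}, origin ({x}, {y}), size {w}x{h}")
```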

Operations

Detecting Tables

import pdfutil
coordinates = pdfutil.detect_tables(pdf_location)
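
The returned rows can be turned into corner-based boxes for cropping or drawing. The sketch below assumes numeric values and that startx/starty mark the corner from which width and height extend; neither the units nor the origin are stated here, so treat both as assumptions. The helpers to_boxes and table_boxes are hypothetical, and the coordinates value is a stand-in for a real detect_tables result.

```python
def to_boxes(coords):
    """Convert [page, x, y, width, height] rows to (page, x0, y0, x1, y1) tuples."""
    return [(page, x, y, x + w, y + h) for page, x, y, w, h in coords]

def table_boxes(coordinates):
    """Collect corner-based boxes for every entry of type 'table'."""
    boxes = []
    for entry in coordinates:
        if entry["type"] == "table":
            boxes.extend(to_boxes(entry["output"]["coord"]))
    return boxes

# Stand-in for the value returned by pdfutil.detect_tables(pdf_location)
coordinates = [{"type": "table", "output": {"coord": [[1, 50, 200, 500, 250]]}}]
print(table_boxes(coordinates))  # [(1, 50, 200, 550, 450)]
```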

Detecting Text Regions [Paragraphs / Unstructured Content]

import pdfutil
coordinates = pdfutil.detect_text(pdf_location)

Detecting Non-Text Regions [Images / Logos]

import pdfutil
coordinates = pdfutil.detect_non_text(pdf_location)

Detecting Language

import pdfutil
coordinates = pdfutil.detect_non_language(pdf_location)

Detecting Key Value Pairs

import pdfutil
coordinates = pdfutil.detect_key_value_pairs(pdf_location)
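
Because every function takes the same arguments, several detectors can be run in one loop. The following is a rough sketch, not part of the library: run_detectors is a hypothetical helper, and a stand-in namespace replaces pdfutil so the example runs without the package installed. Since the package is under development, detectors that are missing are skipped rather than treated as errors.

```python
from types import SimpleNamespace

# Region detectors named in this README
REGION_DETECTORS = (
    "detect_tables",
    "detect_text",
    "detect_non_text",
    "detect_key_value_pairs",
)

def run_detectors(module, pdf_location, names=REGION_DETECTORS, **common):
    """Call each available detector with the same standard arguments."""
    results = {}
    for name in names:
        detector = getattr(module, name, None)
        if callable(detector):  # skip detectors the module does not provide
            results[name] = detector(pdf_location, **common)
    return results

# Stand-in for the pdfutil module (assumption, for illustration only)
fake = SimpleNamespace(
    detect_tables=lambda loc, **kw: [{"type": "table", "output": {"coord": []}}],
    detect_text=lambda loc, **kw: [{"type": "text", "output": {"coord": []}}],
)

results = run_detectors(fake, "sample.pdf", save_result=False)
print(sorted(results))  # ['detect_tables', 'detect_text']
```

With the real package installed, the first argument would be the imported pdfutil module instead of the stand-in.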

Files for pdfutil, version 0.0.1

Filename                          Size    File type  Python version
pdfutil-0.0.1-py3-none-any.whl    1.9 kB  Wheel      py3
pdfutil-0.0.1.tar.gz              2.0 kB  Source     None
