A library providing useful operations on PDFs and images

pdfutil [Under Development]

The library provides detection operations over PDFs and images: tables, text regions, non-text regions, language, and key-value pairs.

Input and Output

The library exposes each function with a standard set of arguments, which is the same for every function.

import pdfutil
coordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])
Name             Description
pdf_location     Input location of the PDF. An image can also be passed; the library will autodetect it.
save_result      Default False. If True, saves the result PDF/image to the location specified by result_location.
show_result      Default False. For debugging only: when True, pops up a matplotlib plot highlighting the detected regions with their labels.
result_location  Default current directory. Location where the output is saved; ignored if save_result is False.
args             Custom set of arguments, as a dictionary, specific to each function.
coordinates      Output returned by the function call, containing JSON output in the following format:
[
  {
    "type": "text",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
        ["pageno_2", "startx_2", "starty_2", "width_2", "height_2"]
      ]
    }
  },
  {
    "type": "table",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"]
      ]
    }
  }
]
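
As a rough sketch of how this output might be consumed, the snippet below groups the regions by "type" and converts each [pageno, startx, starty, width, height] entry into corner coordinates. The sample values are invented for illustration, the helper name regions_by_type is hypothetical (not part of pdfutil), and the code assumes the entries are numbers or numeric strings; the units and origin are whatever the library actually uses.

```python
from collections import defaultdict

# Hand-written sample in the documented shape; the numbers are invented.
sample_output = [
    {"type": "text", "output": {"coord": [[1, 50, 80, 400, 120], [2, 50, 60, 400, 300]]}},
    {"type": "table", "output": {"coord": [[1, 50, 220, 400, 150]]}},
]


def regions_by_type(results):
    """Group regions by type as (page, x0, y0, x1, y1) tuples."""
    grouped = defaultdict(list)
    for entry in results:
        for pageno, startx, starty, width, height in entry["output"]["coord"]:
            # float() accepts numbers and numeric strings alike (assumption:
            # the library returns one of the two).
            x0, y0 = float(startx), float(starty)
            grouped[entry["type"]].append(
                (int(pageno), x0, y0, x0 + float(width), y0 + float(height))
            )
    return dict(grouped)


regions = regions_by_type(sample_output)
for region_type, boxes in regions.items():
    print(region_type, boxes)
```

Keeping the conversion in one helper means that if the real coordinate convention turns out to differ, only that function needs to change.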

Operations

Detecting Tables

import pdfutil
coordinates = pdfutil.detect_tables(pdf_location)

Detecting Text Regions [Paragraphs / Unstructured Content]

import pdfutil
coordinates = pdfutil.detect_text(pdf_location)

Detecting Non-Text Regions [Images / Logos]

import pdfutil
coordinates = pdfutil.detect_non_text(pdf_location)

Detecting Language

import pdfutil
coordinates = pdfutil.detect_non_language(pdf_location)

Detecting Key Value Pairs

import pdfutil
coordinates = pdfutil.detect_key_value_pairs(pdf_location)
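
Because every operation above takes the same arguments, several detectors could be run through one small loop. The following is a minimal sketch under that assumption: run_detectors and fake_detect_tables are hypothetical names introduced here, and the stand-in detector only mimics the documented return shape so the example runs without pdfutil installed.

```python
def run_detectors(detectors, pdf_location, save_result=False, result_location="."):
    """Call each detector with the shared arguments and collect the results."""
    results = {}
    for name, detect in detectors.items():
        results[name] = detect(
            pdf_location,
            save_result=save_result,
            result_location=result_location,
        )
    return results


# Stand-in for a real detector (hypothetical); it accepts the documented
# argument set and returns data in the documented output format.
def fake_detect_tables(pdf_location, save_result=False, show_result=False,
                       result_location=".", args=None):
    return [{"type": "table", "output": {"coord": [[1, 50, 220, 400, 150]]}}]


all_results = run_detectors({"tables": fake_detect_tables}, "invoice.pdf")
print(all_results["tables"][0]["type"])
```

With pdfutil installed, the dictionary would presumably hold the real functions instead, for example {"tables": pdfutil.detect_tables, "text": pdfutil.detect_text}.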

Download files

Files for pdfutil, version 0.0.1

Filename                        Size    File type  Python version
pdfutil-0.0.1-py3-none-any.whl  1.9 kB  Wheel      py3
pdfutil-0.0.1.tar.gz            2.0 kB  Source     None
