Library providing useful operations over PDFs and images

Project description

pdfutil [Under Development]

Library provides a lot of operations over PDF/Image.

Input and Output

The library exposes each function with a standard set of arguments, which is the same for every function.

import pdfutil
coordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])
Name Description
pdf_location Input location of the PDF. An image can also be passed; the library detects the input type automatically.
save_result Default False. If True, saves the result PDF/image to the location specified by result_location.
show_result Default False. For debugging only: when True, pops up a matplotlib plot highlighting the detected regions with their corresponding labels.
result_location Default current directory ('.'). Location where the output is saved; ignored if save_result is False.
args Custom set of arguments, as a dictionary, specific to each function.
coordinates Output returned by the function call, containing JSON in the following format:
[
  {
    "type": "text",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
        ["pageno_2", "startx_2", "starty_2", "width_2", "height_2"]
      ]
    }
  },
  {
    "type": "table",
    "output": {
      "coord": [
        ["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
      ]
    }
  }
]
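
The format above nests each region under its type, so downstream code usually wants a flat list first. The following is a minimal sketch of that step, not part of pdfutil: `Region`, `flatten_regions`, and `group_by_page` are hypothetical helper names, the sample data is hand-written in the documented shape, and the code assumes that the five values in each entry are numbers or numeric strings (the format above only shows placeholders).

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """One detected region (hypothetical helper type, not part of pdfutil)."""
    kind: str      # value of the "type" field, e.g. "text" or "table"
    page: int
    x: float
    y: float
    width: float
    height: float

    @property
    def box(self):
        """Corner form (x0, y0, x1, y1) of the region."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def flatten_regions(coordinates):
    """Turn the documented list-of-dicts output into flat Region records."""
    regions = []
    for entry in coordinates:
        for page, x, y, w, h in entry["output"]["coord"]:
            # Assumption: values are numeric or numeric strings.
            regions.append(
                Region(entry["type"], int(page), float(x), float(y), float(w), float(h))
            )
    return regions


def group_by_page(regions):
    """Map each page number to the regions found on that page."""
    pages = defaultdict(list)
    for region in regions:
        pages[region.page].append(region)
    return dict(pages)


# Hand-written sample in the documented shape; not real library output.
sample = [
    {"type": "text", "output": {"coord": [[1, 50, 40, 500, 120], [2, 50, 60, 500, 300]]}},
    {"type": "table", "output": {"coord": [[1, 50, 200, 500, 250]]}},
]

regions = flatten_regions(sample)
by_page = group_by_page(regions)
```

Keeping the region type on every record means text, table, and other regions can share one list and still be filtered by `kind` later.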

Operations

Detecting Tables

import pdfutil
coordinates = pdfutil.detect_tables(pdf_location)

Detecting Text Regions [Paragraphs / Unstructured Content]

import pdfutil
coordinates = pdfutil.detect_text(pdf_location)

Detecting Non-Text Regions [Images / Logos]

import pdfutil
coordinates = pdfutil.detect_non_text(pdf_location)

Detecting Language

import pdfutil
coordinates = pdfutil.detect_non_language(pdf_location)

Detecting Key Value Pairs

import pdfutil
coordinates = pdfutil.detect_key_value_pairs(pdf_location)
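
Because every operation takes the same arguments, several of them can be run over one file with a single loop. The sketch below illustrates that idea under stated assumptions: `run_detectors` and `DETECTORS` are hypothetical names, the function names in `DETECTORS` are copied from the sections above, and the module is passed in as a parameter so that functions missing from a given release (the package is marked as under development) are skipped rather than raising. The demonstration uses a stand-in object, not pdfutil itself.

```python
from types import SimpleNamespace

# Function names as they appear in the sections above.
DETECTORS = (
    "detect_tables",
    "detect_text",
    "detect_non_text",
    "detect_non_language",
    "detect_key_value_pairs",
)


def run_detectors(module, pdf_location, names=DETECTORS, **common):
    """Call each available detect_* function with the same arguments.

    `common` carries the shared keyword arguments (save_result,
    show_result, result_location, args). Names that the module does
    not provide are skipped.
    """
    results = {}
    for name in names:
        func = getattr(module, name, None)
        if not callable(func):
            continue  # not available in this module; skip it
        results[name] = func(pdf_location, **common)
    return results


# Stand-in for the real module so the sketch runs on its own;
# it returns empty results in the documented output shape.
fake_module = SimpleNamespace(
    detect_tables=lambda pdf_location, **kw: [{"type": "table", "output": {"coord": []}}],
    detect_text=lambda pdf_location, **kw: [{"type": "text", "output": {"coord": []}}],
)

results = run_detectors(fake_module, "sample.pdf", save_result=False)
```

With the package installed, the same helper would presumably be called as `run_detectors(pdfutil, pdf_location, save_result=True, result_location='out')`, where `'out'` is a placeholder directory.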


Download files

Source Distribution

pdfutil-0.0.1.tar.gz (2.0 kB)

Uploaded Source

Built Distribution

pdfutil-0.0.1-py3-none-any.whl (1.9 kB)

Uploaded Python 3
