A library that provides useful operations on PDFs and images
pdfutil [Under Development]
The library detects tables, text regions, non-text regions, language, and key-value pairs in PDF and image files.
Input and Output
The library exposes every function with the same standard set of arguments:
import pdfutil
coordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])
Name | Description |
---|---|
pdf_location | Location of the input PDF. An image can also be passed; the library autodetects images. |
save_result | Default False. If True, saves the resulting PDF/image in the location specified by result_location. |
show_result | Default False. For debugging only: when True, pops up a matplotlib plot highlighting the detected regions with their corresponding labels. |
result_location | Default is the current directory. Location where the output is saved; ignored if save_result is False. |
args | Custom set of arguments, as a dictionary, specific to each function. |
coordinates | Output returned by the function call, as JSON in the following format: |
[
{
"type": "text",
"output": {
"coord": [
["pageno_1", "startx_1", "starty_1", "width_1", "height_1"],
["pageno_2", "startx_2", "starty_2", "width_2", "height_2"]
]
}
},
{
"type": "table",
"output": {
"coord": [
["pageno_1", "startx_1", "starty_1", "width_1", "height_1"]
]
}
}
]
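The output format above can be consumed with a few lines of Python. This is only a sketch: it assumes the result has already been parsed into Python lists and dictionaries (if the library returns a JSON string, run it through `json.loads` first) and that the coordinate fields are numeric. The sample values and the helper name `regions_by_type` are made up for illustration and are not part of pdfutil.

```python
from collections import defaultdict

# Sample data in the documented shape; the numbers are illustrative only.
coordinates = [
    {"type": "text", "output": {"coord": [[1, 50, 80, 400, 120], [2, 50, 60, 400, 200]]}},
    {"type": "table", "output": {"coord": [[1, 50, 300, 400, 150]]}},
]


def regions_by_type(result):
    """Group the coordinate rows by the "type" field of each entry (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in result:
        grouped[entry["type"]].extend(entry["output"]["coord"])
    return dict(grouped)


regions = regions_by_type(coordinates)
for kind, rows in regions.items():
    # Each row is [pageno, startx, starty, width, height].
    for pageno, startx, starty, width, height in rows:
        print(f"{kind}: page {pageno}, origin=({startx}, {starty}), size={width}x{height}")
```

Grouping by type first keeps later processing simple when one call returns several kinds of region.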
Operations
Detecting Tables
import pdfutil
coordinates = pdfutil.detect_tables(pdf_location)
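Many cropping and drawing tools expect corner coordinates rather than an origin plus a size. The sketch below converts a `[pageno, startx, starty, width, height]` row into corner form. It assumes the values are numeric; the README does not state the units or where the origin lies, so neither is assumed here. `Box` and `to_box` are hypothetical names, not part of pdfutil, and the sample row is invented.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A region expressed by its two opposite corners (hypothetical type)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def to_box(row):
    """Convert [pageno, startx, starty, width, height] into corner coordinates."""
    pageno, startx, starty, width, height = row
    return Box(pageno, startx, starty, startx + width, starty + height)


# Illustrative table row in the documented layout.
table_rows = [[1, 50, 300, 400, 150]]
boxes = [to_box(row) for row in table_rows]
print(boxes[0])
```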
Detecting Text Regions [Paragraphs / Unstructured Content]
import pdfutil
coordinates = pdfutil.detect_text(pdf_location)
Detecting Non-Text Regions [Images / Logos]
import pdfutil
coordinates = pdfutil.detect_non_text(pdf_location)
Detecting Language
import pdfutil
coordinates = pdfutil.detect_language(pdf_location)
Detecting Key Value Pairs
import pdfutil
coordinates = pdfutil.detect_key_value_pairs(pdf_location)
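Because every function takes the same arguments, several detectors can be run over one file in a loop. The sketch below shows one way to do that. It is an assumption-laden illustration: `run_detectors` is a hypothetical helper, and a stand-in object replaces the real `pdfutil` module so that the example runs on its own. The stand-in's return values, including the `"non_text"` type label, are invented.

```python
from types import SimpleNamespace

# Function names taken from the sections above.
DETECTORS = ("detect_tables", "detect_text", "detect_non_text")


def run_detectors(module, pdf_location, names=DETECTORS, **common):
    """Call each named detector on the same file with the same keyword arguments."""
    results = {}
    for name in names:
        detect = getattr(module, name)
        results[name] = detect(pdf_location, **common)
    return results


def _fake_detector(kind):
    """Build a stand-in with the documented signature (not the real implementation)."""
    def detect(pdf_location, save_result=False, show_result=False,
               result_location=".", args=None):
        return [{"type": kind, "output": {"coord": []}}]
    return detect


# Stand-in for the pdfutil module; pass the real module instead when it is installed.
fake_pdfutil = SimpleNamespace(
    detect_tables=_fake_detector("table"),
    detect_text=_fake_detector("text"),
    detect_non_text=_fake_detector("non_text"),  # type label is an assumption
)

results = run_detectors(fake_pdfutil, "sample.pdf", save_result=False)
for name, output in results.items():
    print(name, output)
```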