
PDF Table to JSON Converter


pdf-table-extract

Extract table data from PDF files to JSON.

  • The table is located with OpenCV and its contents are read with a text reader, so the table must be enclosed by a border.

  • If the table has no border, adjust the PDF to add one before converting.

  • Currently, only basic tables are supported, i.e. tables with a single horizontal header row such as:

    Header 1   Header 2   Header 3
    cell 1     cell 2     cell 3
    cell 1     cell 2     cell 3
    cell 1     cell 2     cell 3
  • The PDF must contain text that a text reader can extract. To check, drag across the table in a PDF viewer and confirm that the text gets selected.
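Why the border matters can be illustrated with a rough, stdlib-only sketch. This is not the library's OpenCV code. It assumes the page has already been rendered, binarized and cropped to the table as a 0/1 grid (1 = dark pixel). Rows and columns that are almost entirely dark are treated as ruling lines, and the gaps between consecutive lines become cell boxes whose text a reader could then extract. The grid format, the `min_fill` threshold and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: a binarized page cropped to the table, 1 = dark pixel, 0 = background.
Grid = List[List[int]]
# Cell interior as (top, left, bottom, right); bottom/right are exclusive.
Box = Tuple[int, int, int, int]


def ruling_rows(grid: Grid, min_fill: float = 0.9) -> List[int]:
    """Indices of rows that are (almost) fully dark, i.e. horizontal borders."""
    width = len(grid[0])
    return [y for y, row in enumerate(grid) if sum(row) >= min_fill * width]


def ruling_cols(grid: Grid, min_fill: float = 0.9) -> List[int]:
    """Indices of columns that are (almost) fully dark, i.e. vertical borders."""
    height, width = len(grid), len(grid[0])
    return [x for x in range(width)
            if sum(grid[y][x] for y in range(height)) >= min_fill * height]


def group(indices: List[int]) -> List[Tuple[int, int]]:
    """Merge runs of adjacent indices (thick lines) into (start, end) spans."""
    spans: List[Tuple[int, int]] = []
    for i in indices:
        if spans and i == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], i)
        else:
            spans.append((i, i))
    return spans


def cell_boxes(grid: Grid) -> List[List[Box]]:
    """Return cell boxes row by row: the gaps between consecutive border lines."""
    rows = group(ruling_rows(grid))
    cols = group(ruling_cols(grid))
    boxes: List[List[Box]] = []
    for (_, top_end), (bottom_start, _) in zip(rows, rows[1:]):
        line = []
        for (_, left_end), (right_start, _) in zip(cols, cols[1:]):
            line.append((top_end + 1, left_end + 1, bottom_start, right_start))
        boxes.append(line)
    return boxes
```

Without a border there are no fully dark rows or columns, so `cell_boxes` finds nothing. That is the same reason the converter needs a bordered table.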

Installation

  • Requires Python >= 3.8
  • Install with pip:
pip install pdf-table2json

Example

import

import pdf_table2json.converter as converter

# Path to a PDF whose tables are enclosed by borders
path = "PATH/PDF_NAME.pdf"
# Extract the table data from the PDF as JSON
result = converter.main(path)
print(result)
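This page does not describe the exact structure of `result`. The following hypothetical illustration shows why a horizontal header row is required. The header supplies the keys, and each following row becomes one JSON object. The function name `rows_to_records` and this output layout are assumptions, not the package's API.

```python
import json
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Map a table with one horizontal header row to a list of JSON-ready dicts.

    Hypothetical illustration: pdf_table2json's real output format may differ.
    """
    if not rows:
        return []
    header, *body = rows
    records = []
    for row in body:
        # Every data row must have one cell per header column.
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}")
        records.append(dict(zip(header, row)))
    return records


table = [
    ["Header 1", "Header 2", "Header 3"],
    ["cell 1", "cell 2", "cell 3"],
    ["cell 1", "cell 2", "cell 3"],
]
print(json.dumps(rows_to_records(table), indent=2))
```

A table with vertical headers (keys in the first column) would need a transpose first. That is presumably why only the basic layout is supported.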

CLI

python converter.py -i "pdf_path/pdf_name.pdf" [-j] [-o]
  • -i, --input (required): input PDF file path
  • -j, --json_file (optional flag): create a JSON data file
  • -o, --image_file (optional flag): save an image data file
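The options above read like standard-library `argparse` declarations. The minimal sketch below shows how they could be declared and parsed under that assumption; this page does not show the source of `converter.py`. `build_parser` and `parse` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented options, not the project's source.
    parser = argparse.ArgumentParser(description="Convert bordered PDF tables to JSON")
    parser.add_argument("-i", "--input", required=True,
                        help="[Required] Input PDF file path")
    parser.add_argument("-j", "--json_file", action="store_true",
                        help="[Optional] Create JSON Data file")
    parser.add_argument("-o", "--image_file", action="store_true",
                        help="[Optional] Save Image Data file")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]); exits with an error if -i is missing."""
    return build_parser().parse_args(argv)
```

For example, `parse(["-i", "doc.pdf", "-j"])` yields `input="doc.pdf"`, `json_file=True`, `image_file=False`. With `store_true`, each flag defaults to `False` and is simply present or absent.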

Colab

[Open In Colab]

License

  • GPL-3.0 license

Contact



Download files


Source Distribution

pdf_table2json-0.0.11.tar.gz (17.9 kB)

Uploaded Source

Built Distribution

pdf_table2json-0.0.11-py3-none-any.whl (24.8 kB)

Uploaded Python 3

File details

Details for the file pdf_table2json-0.0.11.tar.gz.

File metadata

  • Download URL: pdf_table2json-0.0.11.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for pdf_table2json-0.0.11.tar.gz
Algorithm Hash digest
SHA256 4144e0b538452ee181d95a720571ab360eac218668f33c68b88cca41153a83a6
MD5 fdeb31632fce8210cc00118a35f8552d
BLAKE2b-256 81f58b03c36189a54eb20b1c0df9dcdb34227b0f72791b07f77f11931a69e272
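A downloaded file can be checked against the published SHA256 digest with the standard-library `hashlib`. This is a minimal sketch; the helper names are hypothetical, and pip's hash-checking mode (`--require-hashes`) is an alternative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# SHA256 published above for the source distribution.
EXPECTED_SDIST_SHA256 = "4144e0b538452ee181d95a720571ab360eac218668f33c68b88cca41153a83a6"
```

Usage: `verify("pdf_table2json-0.0.11.tar.gz", EXPECTED_SDIST_SHA256)` returns `True` only if the downloaded archive is byte-for-byte the published one.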


File details

Details for the file pdf_table2json-0.0.11-py3-none-any.whl.

File metadata

File hashes

Hashes for pdf_table2json-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 f98cbb8aad00d87b7086e7b7a05633b8bbfb1ca49ef02457c7b797526ab9e50d
MD5 bef4d5cd57fa1da1927747e56749f18f
BLAKE2b-256 0c83662f5c4289663e53b3c7559fa4ee20b1c3d0c0575a7b95bc609f30ffcd39

