
PDF Table to JSON Converter

Project description

pdf-table-extract

Extract table data from PDF files to JSON

  • Tables are located with OpenCV and their contents are read with a text reader, so each table must be enclosed by a border.

  • If your table has no border, add one before converting.

  • Currently only basic tables are supported, meaning tables with a single horizontal header row:

    Header 1 Header 2 Header 3
    cel1 cel2 cel3
    cel1 cel2 cel3
    cel1 cel2 cel3
  • The PDF must be readable by a text reader. To check, drag the cursor across the PDF and see whether the text gets selected.
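
To make the "horizontal header" requirement concrete, here is a minimal sketch of how such a table maps naturally to JSON records. It is only an illustration: the helper name rows_to_records is hypothetical, and the exact JSON layout that pdf-table2json emits is not described here and may differ.

```python
import json


def rows_to_records(rows):
    """Turn [header_row, data_row, ...] into a list of dicts keyed by header.

    Hypothetical helper for illustration; not part of pdf-table2json.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# The sample table from the list above, written as rows of cell text.
sample_rows = [
    ["Header 1", "Header 2", "Header 3"],
    ["cel1", "cel2", "cel3"],
    ["cel1", "cel2", "cel3"],
    ["cel1", "cel2", "cel3"],
]

records = rows_to_records(sample_rows)
print(json.dumps(records, indent=2))
```

A table whose headers run down the first column instead of across the first row would not fit this mapping, which is one way to picture why only horizontal headers are supported.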

Installation

  • Requires Python >= 3.8
  • Install with pip
pip install pdf-table2json

Example

import

import pdf_table2json.converter as converter

path = "PATH/PDF_NAME.pdf"
result = converter.main(path)
print(result)
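
The call above can be wrapped with a few defensive checks. The sketch below is an assumption-laden example: load_tables and as_python are hypothetical helper names, and the return type of converter.main is not stated in this README, so the code accepts either a JSON string or an already-parsed Python object.

```python
import json
from pathlib import Path


def as_python(result):
    """Assumption: the converter may return a JSON string; parse it if so."""
    if isinstance(result, (str, bytes)):
        return json.loads(result)
    return result


def load_tables(pdf_path):
    """Run the converter on pdf_path and return its result as Python data."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    # Imported lazily so a missing install only fails when a PDF is processed.
    import pdf_table2json.converter as converter

    return as_python(converter.main(str(path)))
```

Checking that the file exists first gives a clearer error than whatever the converter would raise on a bad path.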

CLI

python converter.py -i "pdf_path/pdf_name.pdf" [-j] [-o]
  • -i, --input: [Required] Input PDF file path
  • -j, --json_file: [Optional] Create JSON Data file
  • -o, --image_file: [Optional] Save Image Data file
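
The three options listed above correspond to a standard argparse setup. The following is a reconstruction from that option list, not the package's actual source; the function name build_parser and the description string are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the three documented options (reconstructed sketch)."""
    parser = argparse.ArgumentParser(
        description="Extract table data from PDF files to JSON"
    )
    parser.add_argument("-i", "--input", required=True,
                        help="[Required] Input PDF file path")
    parser.add_argument("-j", "--json_file", action="store_true",
                        help="[Optional] Create JSON Data file")
    parser.add_argument("-o", "--image_file", action="store_true",
                        help="[Optional] Save Image Data file")
    return parser


# Parse an example command line: input path plus the -j flag only.
args = build_parser().parse_args(["-i", "pdf_path/pdf_name.pdf", "-j"])
print(args.input, args.json_file, args.image_file)
# prints: pdf_path/pdf_name.pdf True False
```

Both -j and -o are plain on/off flags, so they take no value and default to False when omitted.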

Colab

[Open In Colab]

License

  • GPL-3.0 license

Contact

Download files


Source Distribution

pdf_table2json-0.0.12.tar.gz (17.8 kB view details)

Uploaded Source

Built Distribution

pdf_table2json-0.0.12-py3-none-any.whl (24.8 kB view details)

Uploaded Python 3

File details

Details for the file pdf_table2json-0.0.12.tar.gz.

File metadata

  • Download URL: pdf_table2json-0.0.12.tar.gz
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for pdf_table2json-0.0.12.tar.gz
Algorithm Hash digest
SHA256 4b0a903cad080d0c77ce2e5fff16a456105efab591916c09fa779b71d1b6fe20
MD5 cd4209f034f3e39c91a31788c7f0b8cf
BLAKE2b-256 3aae8a7954dccfe90575b37c884672d30f71ac31e712c11884ee77fcaea41bce


File details

Details for the file pdf_table2json-0.0.12-py3-none-any.whl.

File hashes

Hashes for pdf_table2json-0.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 0f53cb026951919973363146f3c3e39ab48c84804d290b5f68600d3fa5e819ac
MD5 ce1095a6afc03d63992bc5eeca90a255
BLAKE2b-256 906808dcee41afc4fdb24476c348b307386c719f4c6b7d70e80edb6cda7f26ca

