PDF Table to JSON Converter

Project description

pdf-table-extract

Extract table data from PDF files to JSON

  • Locates the table with OpenCV and reads its contents with a text reader. (The table must be enclosed by a border.)

  • (If the table has no border, add one through adjustment.)

  • Currently, only basic tables are supported. (Only tables with a horizontal header row are supported.)

    Header 1 Header 2 Header 3
    cel1 cel2 cel3
    cel1 cel2 cel3
    cel1 cel2 cel3
  • The PDF must be readable by a text reader. Drag over the text in the PDF to check whether it can be selected.
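
To illustrate what "horizontal headers" means for the JSON output, here is a minimal sketch that pairs a header row with each body row. It is not taken from the library: `rows_to_records` is a hypothetical helper, and the list-of-objects layout is an assumption about a reasonable JSON shape, not the documented output format of pdf-table2json.

```python
import json


def rows_to_records(header, rows):
    """Pair each body row with the header row to build one dict per row.

    Hypothetical helper for illustration; not part of pdf_table2json.
    """
    records = []
    for row in rows:
        # A basic table has exactly one cell per header column.
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}")
        records.append(dict(zip(header, row)))
    return records


# The sample table from the list above: one header row, three body rows.
header = ["Header 1", "Header 2", "Header 3"]
rows = [["cel1", "cel2", "cel3"] for _ in range(3)]

table_json = json.dumps(rows_to_records(header, rows), indent=2)
print(table_json)
```

A table whose headers run down the first column instead of across the first row would not fit this pairing, which is one way to see why only the basic layout is supported.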

Installation

  • Requires Python >= 3.8
  • Install with pip
pip install pdf-table2json

Example

import

import pdf_table2json.converter as converter

path = "PATH/PDF_NAME.pdf"
result = converter.main(path)
print(result)
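
If you want to keep the result rather than print it, a small helper along these lines may be useful. This is a sketch under stated assumptions: `save_result` is a hypothetical function, not part of pdf_table2json, and it assumes the value returned by `converter.main` is either a JSON string or a JSON-serializable Python object. The sample value below is a stand-in, not real converter output.

```python
import json
import tempfile
from pathlib import Path


def save_result(result, out_path):
    """Write a conversion result to out_path as UTF-8 JSON text.

    Hypothetical helper. Assumes `result` is a JSON string or a
    JSON-serializable object; a non-JSON string raises json.JSONDecodeError.
    """
    parsed = json.loads(result) if isinstance(result, str) else result
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    out.write_text(json.dumps(parsed, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


# Stand-in for `converter.main(path)`; replace with the real call.
sample = '[{"Header 1": "cel1", "Header 2": "cel2"}]'

with tempfile.TemporaryDirectory() as tmp:
    written = save_result(sample, Path(tmp) / "out" / "tables.json")
    print(written.read_text(encoding="utf-8"))
```

Parsing a string result before writing it catches malformed output early, at the cost of re-serializing the data.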

CLI

python a.py -i "pdf_path/pdf_name.pdf" -o "output_path/" -j "" -p ""

License

  • GPL-3.0 license
