PDF Table to JSON Converter

pdf-table-extract

Extract table data from PDF files to JSON.

  • Tables are located with OpenCV and their contents are read with a text reader, so each table must be enclosed by a border.

  • If your table has no border, add one by adjusting the document before converting.

  • Currently, only basic tables are supported, meaning tables with a horizontal header row:

    Header 1 Header 2 Header 3
    cel1 cel2 cel3
    cel1 cel2 cel3
    cel1 cel2 cel3
  • The PDF must be readable by a text reader. To check, drag over the text in the PDF and see whether it gets selected.
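
To make the "horizontal header" restriction concrete, here is a minimal sketch of how a header row and its data rows map naturally onto JSON records. It is not the library's code, and the exact JSON shape produced by pdf-table2json is not documented here, so the list-of-objects layout and the helper name `rows_to_records` are assumptions made for illustration.

```python
import json


def rows_to_records(rows):
    """Map a header row plus data rows to a list of dicts (one per data row).

    Hypothetical helper: the output layout is an assumption, not the
    documented format of pdf-table2json.
    """
    if not rows:
        return []
    header, *body = rows
    # Pair each cell with the header of its column.
    return [dict(zip(header, row)) for row in body]


# The basic table from the example above, as a list of rows.
table = [
    ["Header 1", "Header 2", "Header 3"],
    ["cel1", "cel2", "cel3"],
    ["cel1", "cel2", "cel3"],
    ["cel1", "cel2", "cel3"],
]

records = rows_to_records(table)
print(json.dumps(records, indent=2))
```

A table whose headers run down the first column, or one with merged header cells, does not fit this one-row-of-keys mapping, which is likely why only the basic layout is supported for now.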

Installation

  • Requires Python >= 3.8
  • Install with pip:
pip install pdf-table2json

Example

Import

import pdf_table2json.converter as converter

path = "PATH/PDF_NAME.pdf"
result = converter.main(path)
print(result)
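
If you want to keep the result instead of printing it, a small helper like the one below may be useful. It is a sketch under assumptions: the return type of `converter.main` is not stated here, so the hypothetical `save_result` accepts either a JSON string or an already parsed Python object. The demo uses a stand-in value rather than a real conversion.

```python
import json
import tempfile
from pathlib import Path


def save_result(result, out_path):
    """Write `result` to `out_path` as UTF-8 JSON and return the path.

    Assumption: `result` is either a JSON string or a JSON-serializable
    object. A string that is not valid JSON raises json.JSONDecodeError.
    """
    data = json.loads(result) if isinstance(result, str) else result
    out = Path(out_path)
    out.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


# Demo with a stand-in value; in real use, pass the value returned by
# converter.main(path) instead.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_result('[{"Header 1": "cel1"}]', Path(tmp) / "tables.json")
    loaded = json.loads(saved.read_text(encoding="utf-8"))

print(loaded)
```

Setting `ensure_ascii=False` keeps non-ASCII cell text readable in the saved file.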

CLI

python a.py -i "pdf_path/pdf_name.pdf" -o "output_path/" -j "" -p ""

License

  • GPL-3.0 license
