PDF Table to JSON Converter

pdf-table-extract

Extract table data from PDF files to JSON

  • Locates the table with OpenCV and reads its contents with a text reader (the table must be enclosed by a border)

  • (If the table has no border, add one before converting)

  • Currently, only basic tables are supported (tables with horizontal headers only)

    Header 1 Header 2 Header 3
    cel1 cel2 cel3
    cel1 cel2 cel3
    cel1 cel2 cel3
  • The PDF must be readable by a text reader. Drag over the text in a PDF viewer to check that it can be selected
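
To make the "horizontal headers" constraint concrete, here is a minimal sketch of how a table like the one above could map to JSON records. The exact JSON layout that pdf-table2json emits is not stated here, so the list-of-objects shape and the helper name `rows_to_records` are assumptions for illustration only.

```python
import json


def rows_to_records(rows):
    """Map a table with one horizontal header row to a list of dicts.

    Hypothetical helper for illustration; not part of pdf-table2json.
    """
    header, *body = rows
    # Pair each cell with the header of its column.
    return [dict(zip(header, row)) for row in body]


table = [
    ["Header 1", "Header 2", "Header 3"],
    ["cel1", "cel2", "cel3"],
    ["cel1", "cel2", "cel3"],
]

print(json.dumps(rows_to_records(table), indent=2))
```

A table whose headers run down the first column instead of across the first row would not fit this mapping, which is why only horizontal headers are supported.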

Installation

  • Requires Python >= 3.8
  • Install with pip
pip install pdf-table2json

Example

import

import pdf_table2json.converter as converter

path = "PATH/PDF_NAME.pdf"
result = converter.main(path)
print(result)
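
Since the converter takes a file path, it can help to validate the path before calling it. The wrapper below is a minimal sketch: `convert_pdf` and its `convert` parameter are hypothetical names introduced here, and the only package call it relies on is `converter.main(path)` as shown above. What `converter.main` does with a missing or non-PDF file is not documented here, so these checks are a precaution rather than a description of the package's behavior.

```python
from pathlib import Path


def convert_pdf(path, convert=None):
    """Check the input path, then hand it to the converter.

    `convert` is a hypothetical hook (useful for testing); when omitted,
    pdf_table2json.converter.main is used, as in the example above.
    """
    pdf_path = Path(path)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got: {pdf_path.name}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"no such file: {pdf_path}")
    if convert is None:
        # Imported lazily so the path checks run even if the package is missing.
        import pdf_table2json.converter as converter
        convert = converter.main
    return convert(str(pdf_path))
```

With the package installed, `convert_pdf("PATH/PDF_NAME.pdf")` would behave like the example above, but fail early with a clear error when the path is wrong.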

CLI

python converter.py -i "pdf_path/pdf_name.pdf" [-j] [-o]
  • -i, --input: [Required] Input PDF file path
  • -j, --json_file: [Optional] Create JSON data file
  • -o, --image_file: [Optional] Save image data file
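
The three options above can be expressed as an argparse parser. This is a reconstruction from the option list, not the package's actual source; the function name `build_parser` and the description string are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the three documented options (reconstruction)."""
    parser = argparse.ArgumentParser(description="Convert PDF tables to JSON")
    parser.add_argument("-i", "--input", required=True,
                        help="[Required] Input PDF file path")
    parser.add_argument("-j", "--json_file", action="store_true",
                        help="[Optional] Create JSON Data file")
    parser.add_argument("-o", "--image_file", action="store_true",
                        help="[Optional] Save Image Data file")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-i", "pdf_path/pdf_name.pdf", "-j"])
print(args.input, args.json_file, args.image_file)
# prints: pdf_path/pdf_name.pdf True False
```

Because -j and -o use `store_true`, they take no value: passing the flag turns the output on, and omitting it leaves it off.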

License

  • GPL-3.0 license

Contact

Download files

Source distribution: pdf_table2json-0.0.11.tar.gz (17.9 kB)

Built distribution: pdf_table2json-0.0.11-py3-none-any.whl (24.8 kB, Python 3)
