
End-to-End table structure detector

Project description

Tabledetector


Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation supports bordered, semibordered and unbordered table structures.
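For bordered tables, one common way to recover the grid is to look for rows and columns that are almost entirely ink. The sketch below is a simplified illustration of that idea, not Tabledetector's own implementation: it assumes an already binarized, deskewed page held in a NumPy array (1 = ink, 0 = background), and the helper names `find_rule_lines` and `grid_cells` are hypothetical. OpenCV-based pipelines typically isolate the lines with morphological opening first, which copes better with text that touches the rules.

```python
import numpy as np


def _merge_runs(indices):
    """Collapse runs of consecutive indices (thick rule lines) into their midpoints."""
    if indices.size == 0:
        return []
    groups = np.split(indices, np.flatnonzero(np.diff(indices) > 1) + 1)
    return [int(g.mean()) for g in groups]


def find_rule_lines(binary, min_fill=0.8):
    """Return (row_positions, col_positions) of likely horizontal/vertical rules.

    A row or column counts as a rule when at least `min_fill` of its pixels are ink.
    """
    rows = np.flatnonzero(binary.mean(axis=1) >= min_fill)
    cols = np.flatnonzero(binary.mean(axis=0) >= min_fill)
    return _merge_runs(rows), _merge_runs(cols)


def grid_cells(binary, min_fill=0.8):
    """Return cell boxes (top, left, bottom, right) between consecutive rules."""
    rows, cols = find_rule_lines(binary, min_fill)
    return [
        (top, left, bottom, right)
        for top, bottom in zip(rows, rows[1:])
        for left, right in zip(cols, cols[1:])
    ]
```

On a 100x120 test image ruled at rows 0, 50, 99 and columns 0, 60, 119, `grid_cells` returns four cells, the first being `(0, 0, 50, 60)`.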

Features

  • PDF/Image Input: Accepts PDF and image files as input for table detection.
  • Alignment Check: Verifies the alignment of the input and re-aligns it if required.
  • Table Detection: Identifies bordered, semibordered and unbordered tables in the PDF or image file.
  • Table Extraction: Extracts the tabular data as a pandas DataFrame.
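An alignment check of this kind can be illustrated with the projection-profile method: rotate the page through a range of small angles and keep the angle at which ink collapses into the fewest, densest rows. This is a hedged sketch using SciPy (already a dependency); whether Tabledetector uses this exact technique is an assumption, and `estimate_skew` and `deskew` are hypothetical names. The input is assumed to be a binarized page as a NumPy array (1 = ink).

```python
import numpy as np
from scipy import ndimage


def estimate_skew(binary, max_angle=5.0, step=0.5):
    """Return the rotation in degrees that makes ink rows most horizontal.

    For each candidate angle the ink mask is rotated and scored by the variance
    of its row sums: sharply concentrated rows give a high variance.
    """
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        rotated = ndimage.rotate(binary.astype(float), angle, reshape=False, order=0)
        score = rotated.sum(axis=1).var()
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


def deskew(binary, **kwargs):
    """Rotate the page by the estimated correction; return (image, angle)."""
    angle = estimate_skew(binary, **kwargs)
    fixed = ndimage.rotate(binary.astype(float), angle, reshape=False, order=0)
    return fixed, angle
```

The angle returned is the correction to apply, so a page tilted by +3 degrees should come back near -3. A smaller `step` improves precision at the cost of more rotations.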

Libraries Used

  • Python 3.x
  • OpenCV
  • NumPy
  • pdf2image
  • Pillow
  • scipy
  • jinja2
  • easyocr
  • pandas

Create and Activate Environment

conda create -n <env_name> python=3.7
conda activate <env_name>

Install the package using pip

pip install tabledetector

Clone the repository for the latest development release

git clone https://github.com/rajban94/TableDetector.git

Dependency

To use this library on Windows, make sure Poppler is installed and its bin directory is added to the PATH environment variable; pdf2image relies on Poppler to convert PDF pages into images.
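pdf2image renders PDF pages by calling Poppler's `pdfinfo` and `pdftoppm` executables, so a quick way to confirm the setup is to check that both can be found on PATH. A minimal sketch; `poppler_on_path` is a hypothetical helper, not part of Tabledetector.

```python
import shutil


def poppler_on_path():
    """Return True if the Poppler tools that pdf2image calls are found on PATH."""
    return all(shutil.which(tool) is not None for tool in ("pdfinfo", "pdftoppm"))


if __name__ == "__main__":
    if poppler_on_path():
        print("Poppler found.")
    else:
        print("Poppler not found: add its bin directory to PATH and restart the shell.")
```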

Usage

For bordered table detection:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="bordered")

For semibordered table detection:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="semibordered")

For unbordered table detection:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="unbordered")

If no method is specified, all three methods (bordered, semibordered and unbordered) are tried and the result is returned accordingly.
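The description says results come back as pandas DataFrames, but the exact shape of `result` (a single DataFrame, a list of them, or a per-page mapping) is not stated, so the helper below assumes any of the three. It writes each table to its own CSV file; `save_tables` and the `table_<n>.csv` naming are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_tables(result, out_dir="tables"):
    """Write each detected table to CSV and return the list of file paths.

    Assumes `result` is a DataFrame, a dict of DataFrames, or an iterable of them.
    """
    if isinstance(result, pd.DataFrame):
        tables = [result]
    elif isinstance(result, dict):
        tables = list(result.values())
    else:
        tables = list(result)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, table in enumerate(tables):
        path = out / f"table_{i}.csv"
        table.to_csv(path, index=False)
        paths.append(path)
    return paths


# Example (requires tabledetector and a real file):
# import tabledetector as td
# result = td.detect(pdf_path="pdf_path")
# print(save_tables(result))
```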


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

tabledetector-1.0.1-py3-none-any.whl (19.2 kB), for Python 3
