End-to-End table structure detector
Tabledetector
Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation handles bordered, semibordered, and unbordered table structures.
Features
- PDF/Image Input: Accepts PDF or image files as input for table detection.
- Alignment Check: Checks the alignment of the input and corrects it if required.
- Table Detection: Identifies bordered, semibordered, and unbordered tables in the PDF or image file.
- Table Extraction: Extracts the tabular data as a pandas DataFrame.
Libraries Used
- Python 3.x
- OpenCV
- NumPy
- pdf2image
- Pillow
- scipy
- jinja2
- easyocr
- pandas
Create and Activate Environment
conda create -n <env_name> python=3.7
conda activate <env_name>
Installation of package using pip
pip install tabledetector
Clone the repository for the latest development release
git clone https://github.com/rajban94/TableDetector.git
Dependency
To use this library on Windows, make sure that Poppler is installed and that its path is added to the PATH environment variable.
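Before running detection, it can help to confirm that Poppler is actually reachable. The sketch below assumes that pdf2image finds Poppler through its command-line tools `pdftoppm` and `pdfinfo` on the PATH. The helper name `find_poppler_tools` is illustrative only and is not part of tabledetector.

```python
import shutil

# Command-line tools that pdf2image is assumed to call (shipped with Poppler).
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If either tool is reported as missing, the Poppler directory has most likely not been added to the PATH of the current shell or environment.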
Usage
For bordered table detection:
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="bordered")
For semibordered table detection:
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="semibordered")
For unbordered table detection:
import tabledetector as td
result = td.detect(pdf_path="pdf_path", method="unbordered")
If no method is specified, the detector checks all the methods and returns the result accordingly.
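When the table style is not known in advance, a caller can also try each method explicitly and keep the per-method results for comparison. The sketch below is caller-side only and does not describe how tabledetector handles the default case internally. `run_all_methods` is a hypothetical helper, and the detection function is passed in as an argument so that the example assumes nothing about `td.detect` beyond the `pdf_path` and `method` keywords shown above.

```python
METHODS = ("bordered", "semibordered", "unbordered")


def run_all_methods(detect_fn, pdf_path, methods=METHODS):
    """Call detect_fn once per method and collect results and errors separately."""
    results, errors = {}, {}
    for method in methods:
        try:
            results[method] = detect_fn(pdf_path=pdf_path, method=method)
        except Exception as exc:
            # Keep going so that one failing method does not hide the others.
            errors[method] = exc
    return results, errors


if __name__ == "__main__":
    # Stand-in for td.detect so the sketch runs without the package installed.
    def fake_detect(pdf_path, method):
        if method == "unbordered":
            raise ValueError("no table found")
        return f"{method} result for {pdf_path}"

    results, errors = run_all_methods(fake_detect, "sample.pdf")
    print(sorted(results))
    print(sorted(errors))
```

With the real package installed, the equivalent call would be `run_all_methods(td.detect, "pdf_path")`.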
Built Distribution
Hashes for tabledetector-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4ce6a5336725d9449af9d7680e65252a45ea1e2c705022ec6959d8667df80d88
MD5 | 17a8f71b5e2861e908bb3636af9fb716
BLAKE2b-256 | db52d861ad6a82ce2bb067556f4f1ffb4287d9c1a669f54c1540bfad2f773414