Streamlined Scanned Image Table Extraction and Excel Conversion

Project description

TabulaScan

TabulaScan: Streamlined Scanned Image Table Extraction and Excel Conversion

Project Overview:

TabulaScan automates table detection, recognition, and extraction from scanned images and exports the recognized tables as Excel files. It turns paper-based tables into structured, editable spreadsheets that can feed directly into your existing data management processes.
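
The overview describes a detect-then-recognize pipeline. The sketch below illustrates only the hand-off between those two stages: taking table bounding boxes from a detector, discarding low-confidence ones, and padding and clamping the rest to the image bounds before they are cropped for OCR. The (x1, y1, x2, y2, confidence) box format, the 0.25 threshold, the 10-pixel padding, and the helper name prepare_crops are all assumptions for illustration, not TabulaScan's actual code.

```python
from typing import List, Tuple

# Hypothetical detector output format: (x1, y1, x2, y2, confidence) in pixels.
Box = Tuple[float, float, float, float, float]


def prepare_crops(
    boxes: List[Box],
    image_width: int,
    image_height: int,
    min_confidence: float = 0.25,
    padding: int = 10,
) -> List[Tuple[int, int, int, int]]:
    """Filter detections and return padded, clamped integer crop rectangles."""
    crops = []
    for x1, y1, x2, y2, conf in boxes:
        if conf < min_confidence:
            continue  # drop weak detections
        # Pad so table borders are not cut off, then clamp to the image.
        left = max(0, int(x1) - padding)
        top = max(0, int(y1) - padding)
        right = min(image_width, int(x2) + padding)
        bottom = min(image_height, int(y2) + padding)
        if right > left and bottom > top:
            crops.append((left, top, right, bottom))
    # Sort top-to-bottom, then left-to-right, so tables keep reading order.
    crops.sort(key=lambda c: (c[1], c[0]))
    return crops


if __name__ == "__main__":
    detections = [
        (50.0, 400.0, 550.0, 700.0, 0.91),
        (40.0, 60.0, 580.0, 300.0, 0.88),
        (10.0, 10.0, 30.0, 30.0, 0.05),
    ]
    print(prepare_crops(detections, image_width=600, image_height=800))
    # → [(30, 50, 590, 310), (40, 390, 560, 710)]
```

If the page is loaded with Pillow, each returned rectangle can be passed directly to Image.crop, which takes a (left, upper, right, lower) box. The padding is a judgment call: a few extra pixels help keep outer borders and edge characters inside the crop.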

Key Features:

Precise Table Identification: Our algorithm locates tables within scanned images, including pages with complex layouts and diverse fonts.

Robust Image Quality Handling: Designed to cope with scans of varying image quality, so results stay reliable across different documents.

Data Extraction: Beyond detecting tables, the algorithm extracts the data they contain, taking you from a scanned page all the way to data ready for analysis.

Output to Excel: Converts recognized tables into Excel files, preserving their data structure and format.
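
Preserving table structure means mapping each recognized word back to a row and a column. One common approach, shown here as a sketch and not necessarily how TabulaScan does it, clusters words into rows by their vertical centers and assigns them to columns by fixed horizontal boundaries. It assumes OCR results have already been reduced to hypothetical (text, x_center, y_center) tuples; PaddleOCR reports each detection as a box outline plus recognized text, so the centers would need to be computed first. The row tolerance, column edges, and function names are illustrative. The output is CSV via the standard-library csv module, which Excel opens directly.

```python
import csv
import io
from typing import List, Sequence, Tuple

# Hypothetical OCR word format: (text, x_center, y_center) in pixels.
Word = Tuple[str, float, float]


def group_rows(words: Sequence[Word], row_tolerance: float = 10.0) -> List[List[Word]]:
    """Group words whose y center lies within row_tolerance of the row's first word."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        if rows and abs(word[2] - rows[-1][0][2]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def assign_columns(row: List[Word], column_edges: Sequence[float]) -> List[str]:
    """Place each word in the column whose [edge_i, edge_i+1) range holds its x center."""
    cells = [""] * (len(column_edges) - 1)
    for text, x, _ in sorted(row, key=lambda w: w[1]):
        for i in range(len(cells)):
            if column_edges[i] <= x < column_edges[i + 1]:
                # Several words in one cell are joined left to right.
                cells[i] = f"{cells[i]} {text}".strip()
                break
    return cells


def words_to_csv(words: Sequence[Word], column_edges: Sequence[float]) -> str:
    """Rebuild a table grid from OCR words and serialize it as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in group_rows(words):
        writer.writerow(assign_columns(row, column_edges))
    return buffer.getvalue()


if __name__ == "__main__":
    ocr_words = [
        ("Item", 40, 12), ("Qty", 160, 11), ("Price", 260, 13),
        ("Red", 30, 41), ("pen", 60, 42), ("3", 160, 40), ("1.50", 260, 41),
    ]
    print(words_to_csv(ocr_words, column_edges=[0, 120, 210, 320]), end="")
    # Item,Qty,Price
    # Red pen,3,1.50
```

Fixed column edges keep the sketch short; a fuller pipeline might derive them from ruling lines or from gaps between x positions. To produce .xlsx instead of CSV, the same rows could be loaded into a pandas DataFrame and written with DataFrame.to_excel, which needs an Excel writer engine such as openpyxl.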

Getting Started:

These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites:

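To run this project, you'll need:

- paddlepaddle and paddleocr
- ultralyticsplus (version 0.0.23)
- ultralytics (version 8.0.21)
- opencv-python (imported as cv2)
- pandas
- csv (part of the Python standard library)
- tensorflow
- Pillow (imported as PIL)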

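Install the required libraries (the leading ! runs a shell command from a Jupyter or Colab notebook; omit it in a terminal). These commands cover the OCR and detection packages; install pandas, OpenCV, TensorFlow, and Pillow separately if your environment does not already provide them: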

!pip install paddlepaddle
!pip install paddleocr
!pip install pytesseract transformers ultralyticsplus==0.0.23 ultralytics==8.0.21

If you are using Google Colab, optionally install libssl1.1 as well:

!wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
!sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
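
Before running the script, it can help to confirm that every prerequisite is importable. The sketch below uses importlib.util.find_spec, which checks whether a top-level module can be located without importing it. Several packages are imported under a different name than the one used with pip: paddlepaddle as paddle, opencv-python as cv2, and Pillow as PIL. The module list mirrors the prerequisites above; the helper name check_prerequisites is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Import names (not pip names) for the packages listed under Prerequisites.
REQUIRED_MODULES = (
    "paddle",           # pip install paddlepaddle
    "paddleocr",
    "ultralyticsplus",
    "ultralytics",
    "cv2",              # pip install opencv-python
    "pandas",
    "csv",              # standard library
    "tensorflow",
    "PIL",              # pip install Pillow
)


def check_prerequisites(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to True if it can be found, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_prerequisites()
    for name, found in status.items():
        print(f"{name:16} {'ok' if found else 'MISSING'}")
    missing = [name for name, found in status.items() if not found]
    if missing:
        print("Missing modules:", ", ".join(missing))
```

find_spec only confirms that a module can be located; it does not check pinned versions such as ultralyticsplus 0.0.23. For that, importlib.metadata.version("ultralyticsplus") returns the installed version string.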

Run the algorithm:

python TabulaScan.py

Download files

Source Distribution

TabulaScan-0.1.0.tar.gz (16.6 kB)

Uploaded Source

Built Distribution

TabulaScan-0.1.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file TabulaScan-0.1.0.tar.gz.

File metadata

  • Download URL: TabulaScan-0.1.0.tar.gz
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for TabulaScan-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c51b1cc418e828e7fb7ee989fba3b2755cad4342db7c689af4a153d8ae5c7dda
MD5 67f0d81e6d76e1497bccec7e38fbaf09
BLAKE2b-256 fbb97e4b0ee8a97edef7a06520d1e3a991bb29ac477032eb17731aa34f1cd170

File details

Details for the file TabulaScan-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: TabulaScan-0.1.0-py3-none-any.whl
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.2

File hashes

Hashes for TabulaScan-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eebfd9449880660324c87a36dcffd7182239974030d27d2df468d8d88b283486
MD5 87751f7741e1ebb7315fcbc5bb783c8a
BLAKE2b-256 bb6a7a1143399b80566d5551c677e7f097a06e7995921493df22b746cdcefdd9
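
The published digests can be used to check that a downloaded file was not corrupted or altered in transit. A minimal sketch using the standard-library hashlib module follows; the SHA256 values are copied from the tables above, while sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for TabulaScan 0.1.0.
PUBLISHED_SHA256 = {
    "TabulaScan-0.1.0.tar.gz":
        "c51b1cc418e828e7fb7ee989fba3b2755cad4342db7c689af4a153d8ae5c7dda",
    "TabulaScan-0.1.0-py3-none-any.whl":
        "eebfd9449880660324c87a36dcffd7182239974030d27d2df468d8d88b283486",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_hashes.py <downloaded files...>
    for name in sys.argv[1:]:
        file_path = Path(name)
        expected = PUBLISHED_SHA256.get(file_path.name)
        if expected is None:
            print(f"{file_path.name}: no published digest known")
        else:
            result = "OK" if verify(file_path, expected) else "MISMATCH"
            print(f"{file_path.name}: {result}")
```

Saved as a file (verify_hashes.py is a hypothetical name) and run next to the downloads, it prints OK for each file whose digest matches. pip's hash-checking mode, a requirements file with --hash=sha256:... entries enforced by --require-hashes, performs the same check at install time.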
