TabulaScan: Streamlined Scanned Image Table Extraction and Excel Conversion
Project Overview:
TabulaScan automates table detection, recognition, and extraction from scanned images and writes the results to Excel files. It turns paper-based tables into structured, editable spreadsheets that can feed directly into your data management processes.
Key Features:
- Precise Table Identification: Locates tables within scanned images, including pages with complex layouts and diverse fonts.
- Robust Image Quality Handling: Handles scans of varying image quality, so results stay reliable across different documents.
- Data Extraction: Beyond detecting tables, extracts the data inside them so it can be used for analysis.
- Output to Excel: Converts recognized tables into Excel files, preserving data structure and format.
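To make the extraction and export features more concrete, here is a minimal sketch of how recognized text boxes could be arranged into a table grid and saved. It is not TabulaScan's actual implementation: the `(x_center, y_center, text)` cell format, the `cells_to_grid` helper, the `row_tolerance` parameter, and the output file name are all assumptions made for illustration.

```python
import csv
from typing import List, Tuple

# Hypothetical OCR result: (x_center, y_center, text) for each recognized cell.
Cell = Tuple[float, float, str]


def cells_to_grid(cells: List[Cell], row_tolerance: float = 10.0) -> List[List[str]]:
    """Group cells into rows by y-coordinate, then order each row left to right."""
    rows: List[List[Cell]] = []
    for cell in sorted(cells, key=lambda c: c[1]):
        # A cell joins the current row if it is vertically close to that row's first cell.
        if rows and abs(cell[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    return [[text for _, _, text in sorted(row, key=lambda c: c[0])] for row in rows]


# Made-up sample data standing in for real OCR output.
sample = [
    (200.0, 12.0, "Qty"), (50.0, 10.0, "Item"),
    (52.0, 41.0, "Bolt"), (198.0, 40.0, "25"),
]
grid = cells_to_grid(sample)

# Write the grid with the standard-library csv module (hypothetical file name).
with open("table_sketch.csv", "w", newline="") as handle:
    csv.writer(handle).writerows(grid)
```

With pandas and an Excel writer such as openpyxl installed, the same grid could be saved as a workbook with `pandas.DataFrame(grid).to_excel("table_sketch.xlsx", index=False, header=False)`.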
Getting Started:
These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites:
To run this project, you'll need:
- paddleocr
- ultralyticsplus (version 0.0.23)
- ultralytics (version 8.0.21)
- opencv-python (imported as cv2)
- pandas
- csv (part of the Python standard library)
- tensorflow
- Pillow (imported as PIL)
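Before running the project, it can help to confirm that each prerequisite is importable. The following is a small sketch using only the standard library. The `missing_modules` helper is a hypothetical name, and the list uses import names (for example `cv2` and `PIL`), which differ from some pip package names.

```python
import importlib.util

# Import names as they appear in Python code, not pip package names.
REQUIRED_MODULES = [
    "paddleocr", "ultralyticsplus", "ultralytics",
    "cv2", "pandas", "csv", "tensorflow", "PIL",
]


def missing_modules(names):
    """Return the entries of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```

This only checks that the modules can be located. It does not verify the pinned versions of ultralyticsplus and ultralytics.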
Install the required libraries (the leading ! is notebook syntax; omit it in a regular shell):
!pip install paddlepaddle
!pip install paddleocr
!pip install pytesseract transformers ultralyticsplus==0.0.23 ultralytics==8.0.21
Optional: if you are using Google Colab, also install libssl1.1:
!wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
!sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
Run the algorithm:
python TabulaScan.py
File details
Details for the file TabulaScan-0.1.0.tar.gz.
File metadata
- Download URL: TabulaScan-0.1.0.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c51b1cc418e828e7fb7ee989fba3b2755cad4342db7c689af4a153d8ae5c7dda
MD5 | 67f0d81e6d76e1497bccec7e38fbaf09
BLAKE2b-256 | fbb97e4b0ee8a97edef7a06520d1e3a991bb29ac477032eb17731aa34f1cd170
File details
Details for the file TabulaScan-0.1.0-py3-none-any.whl.
File metadata
- Download URL: TabulaScan-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | eebfd9449880660324c87a36dcffd7182239974030d27d2df468d8d88b283486
MD5 | 87751f7741e1ebb7315fcbc5bb783c8a
BLAKE2b-256 | bb6a7a1143399b80566d5551c677e7f097a06e7995921493df22b746cdcefdd9
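A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses Python's standard hashlib module. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local path assumes the source distribution was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed local download location; the check is skipped if the file is absent.
archive = Path("TabulaScan-0.1.0.tar.gz")
published = "c51b1cc418e828e7fb7ee989fba3b2755cad4342db7c689af4a153d8ae5c7dda"
if archive.exists():
    print("Digest matches:", matches_published_hash(archive, published))
```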