Extract table data from images and scanned PDFs. Easily convert an image to Excel or a PDF to a table.

Project description

Overview

ExtractTable - API to extract tabular data from images and scanned PDFs

The motivation is to make it easy for developers to extract tabular data from images or scanned PDF files without worrying about the table area, column coordinates, rotation, and so on.

Prerequisite

API Key: All requests to ExtractTable are authorized with an API Key, and free credits are available to get started. The same API Key can also be used for conversions in the browser with Web Pro.

Installation

pip install -U ExtractTable

Basic Usage

OK, enough selling. Let the ease of coding do the talking and the output encourage you to buy credits; start a timer and count the lines of code.

from ExtractTable import ExtractTable

YOUR_API_KEY = "YOUR_API_KEY"        # Replace with your VALID API Key
Location_of_Image_with_Tables = "path/to/image.png"        # Placeholder path to an image containing tables
Location_of_PDF_with_Tables = "path/to/file.pdf"        # Placeholder path to a PDF containing tables

et_sess = ExtractTable(api_key=YOUR_API_KEY)
print(et_sess.check_usage())        # Checks the API Key validity and shows the associated plan usage
table_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format="df")

# To process a PDF, pass the pages param ("1", "1,3-4", "all") to process_file
table_data = et_sess.process_file(filepath=Location_of_PDF_with_Tables, output_format="df", pages="all")
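The pages values shown above ("1", "1,3-4", "all") follow a familiar page-range notation. The sketch below shows one way such a spec could be expanded into page numbers. It is not the library's own parser: `parse_pages` and `total_pages` are hypothetical names used only for illustration, and how the service actually interprets the value may differ.

```python
from typing import List, Set


def parse_pages(spec: str, total_pages: int) -> List[int]:
    """Expand a page spec such as "1", "1,3-4" or "all" into sorted page numbers.

    Hypothetical helper for illustration; not part of ExtractTable.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, total_pages + 1))

    pages: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range {part!r} for a {total_pages}-page file")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_pages("1", 5))      # [1]
print(parse_pages("1,3-4", 5))  # [1, 3, 4]
print(parse_pages("all", 3))    # [1, 2, 3]
```

Validating the spec locally like this can catch a typo before a request is sent, though whether that saves credits depends on how the service bills failed requests.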

Detailed Library Usage

The tutorial takes you through:

1. Installation
2. Import and check version
3. Create Session & Validate API Key
    3.1 Create Session with your API Key
    3.2 Validate the Key and check the plan usage
    3.3 Check Usage Details
4. Trigger the extraction process
    4.1 Accepted Input Types
    4.2 Process an IMAGE Input
    4.3 Process a PDF Input
    4.4 Output options
    4.5 Explore session objects
5. Explore the Output
    5.1 Output Structure
    5.2 Output Details
6. Make Corrections
    6.1 Split Merged Rows
    6.2 Split Merged Columns
    6.3 Fix Decimal Format
    6.4 Fix Date Format
7. Helpful Code Snippets
    7.1 Get text data
    7.2 Table output to Excel

Whoa, as simple as that?!

Certainly. Did you know that current ExtractTable users use it for:

Bank Statement
Medical Records
Invoice Details
Tax forms
Tender Notices

It's up to you now to explore the ways.

Explore

Check the complete server response of the latest job with et_sess.ServerResponse.json():

{
    "JobStatus": <string>,                              # Status of the triggered Process  @ JOB-LEVEL
    "Pages": <integer>,                                 # Number of pages processed in this request @ PAGE-LEVEL
    "Tables": [<list of key-value objects of table>     # List of all tables found @ TABLE-LEVEL
        {
            "Page": <integer>,                              ## Page number in which this table is found
            "CharacterConfidence": <float>,                 ## Accuracy of Characters recognized from the input-page
            "LayoutConfidence": <float>,                    ## Accuracy of table layout's design decision
            "TableJson": <dict>,                            ## Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}
            "TableCoordinates": <dict>,                     ## Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}}
            "TableConfidence": <dict>                       ## Cell level accuracy of detected characters - {row#: {col#: <float>}}
        },
    {...}                                               ## ... more "Tables" objects
    ],
    "Lines": [<list of key-value objects>               # Pagewise Line details @ PAGE-LEVEL
        {
            "Page": <integer>,                          # Page number in which the lines are found
            "CharacterConfidence": <float>,             # Average Accuracy of all Characters recognized from the input-page
            "LinesArray": [
                <list of key-value objects of line>     # Ordered list of lines in this page @ LINE-LEVEL
                {
                    "Line": <str>,                          ## Detected text of the complete line
                    "WordsArray": [
                        <list of key-value objects>         ## Word level details in this line @ WORD-LEVEL
                        {
                            "Conf": <float>,                    ### Accuracy of recognized characters of the word
                            "Word": <str>,                      ### Detected text of the word
                            "Loc": [x1, y1, x2, y2]             ### Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions
                        },
                    {...}                                   ### More "WordsArray" objects
                    ]
                },
            {...}                                       ## More "LinesArray" objects
            ]
        },
    {...}                                               # More Pagewise "Lines" details
    ]
}
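To make the structure above concrete, here is a minimal sketch that walks a response dictionary shaped like the documented one. The `sample_response` data is made up for illustration and is not real API output. The helper names (`table_rows`, `tables_to_csv`, `page_text`) are hypothetical, and the 0-to-1 confidence scale is an assumption. It uses only the standard library.

```python
import csv
import io
from typing import Any, Dict, List

# Made-up sample shaped like the documented response; NOT real API output.
sample_response: Dict[str, Any] = {
    "JobStatus": "Success",
    "Pages": 1,
    "Tables": [
        {
            "Page": 1,
            "CharacterConfidence": 0.98,   # assumed 0-1 scale
            "LayoutConfidence": 0.95,      # assumed 0-1 scale
            "TableJson": {
                "0": {"0": "Item", "1": "Price"},
                "1": {"0": "Pen", "1": "1.50"},
            },
            "TableCoordinates": {},
            "TableConfidence": {},
        }
    ],
    "Lines": [
        {
            "Page": 1,
            "CharacterConfidence": 0.97,
            "LinesArray": [
                {"Line": "Item Price", "WordsArray": []},
                {"Line": "Pen 1.50", "WordsArray": []},
            ],
        }
    ],
}


def table_rows(table_json: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Turn {row#: {col#: text}} into a list of rows, ordered by numeric index."""
    rows = []
    for row_key in sorted(table_json, key=int):
        cells = table_json[row_key]
        rows.append([cells[col_key] for col_key in sorted(cells, key=int)])
    return rows


def tables_to_csv(response: Dict[str, Any], min_layout_confidence: float = 0.0) -> List[str]:
    """Render each table whose LayoutConfidence meets the threshold as CSV text."""
    rendered = []
    for table in response.get("Tables", []):
        if table["LayoutConfidence"] < min_layout_confidence:
            continue
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(table_rows(table["TableJson"]))
        rendered.append(buffer.getvalue())
    return rendered


def page_text(response: Dict[str, Any]) -> Dict[int, str]:
    """Join the detected lines of every page into one text block per page number."""
    return {
        page["Page"]: "\n".join(line["Line"] for line in page["LinesArray"])
        for page in response.get("Lines", [])
    }


print(table_rows(sample_response["Tables"][0]["TableJson"]))
print(tables_to_csv(sample_response))
print(page_text(sample_response))
```

With a live session, the same helpers could be pointed at the dictionary returned by et_sess.ServerResponse.json(). Sorting keys with `int` keeps row 10 after row 9 when the indices arrive as JSON strings.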

Bug Reports

Bug reports and fixes are most welcome and are greatly appreciated with API credits. For support, reach us at pydevs@extracttable.com.

License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

Release history

Version	Release date	Notes
2.4.0	Jul 18, 2022	This version
2.3.1	May 6, 2022
2.2.0	Apr 20, 2021
2.1.2	Nov 6, 2020
2.1.1	Sep 26, 2020
2.1.0	Aug 27, 2020
2.0.2	Jul 4, 2020
2.0.1	Jul 3, 2020
2.0.1b0	May 26, 2020	Pre-release, yanked (reason: Upgraded to 2.1.0)
2.0.0	Apr 30, 2020
2.0.0b1	Apr 19, 2020	Pre-release
2.0.0b0	Apr 18, 2020	Pre-release
1.2.1.2	Dec 1, 2019
1.2.1.1	Nov 23, 2019
1.2.0	Oct 20, 2019
1.0.1	Oct 7, 2019

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

ExtractTable-2.4.0-py3-none-any.whl (19.4 kB)

Uploaded Jul 18, 2022 Python 3

File details

Details for the file ExtractTable-2.4.0-py3-none-any.whl.

File metadata

Download URL: ExtractTable-2.4.0-py3-none-any.whl
Upload date: Jul 18, 2022
Size: 19.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.9

File hashes

Hashes for ExtractTable-2.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`17defe519d98af42358641a3f34b6a63b14a2619b7a4ecbacb6c75ec7d60b456`
MD5	`f905f9de2d6c2b595bd9be87e599966b`
BLAKE2b-256	`b66b60994ebbcd72e6776151d64e74eb246d2a51c3fbf274a2f8cefc9264445b`
