Table Transformer

Table Transformer Library

Original repository: https://github.com/microsoft/table-transformer

Introduction

This is the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. This library packages TATR's table structure recognition to detect tables and extract their information into common formats such as CSV or an HTML table, with text recognition provided by EasyOCR.
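As a rough illustration of the CSV and HTML targets mentioned above, the sketch below serializes already-recognized cell text. It assumes the recognized table is available as a list of rows, each a list of strings; that shape, the helper names `rows_to_csv` and `rows_to_html`, and the sample data are hypothetical and not part of this library's API.

```python
import csv
import html
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of cell strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_html(rows):
    """Render rows as a minimal HTML table, treating the first row as the header."""
    lines = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"
        # Escape cell text so characters like "<" do not break the markup.
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical recognized cells; real output depends on the image and the OCR.
example_rows = [["Name", "Qty"], ["Widget, large", "3"], ["Bolt <M4>", "12"]]
print(rows_to_csv(example_rows))
print(rows_to_html(example_rows))
```

Using the standard `csv` and `html` modules keeps quoting and escaping correct for cells that contain commas or angle brackets, which OCR output often does.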

Installation

pip install table-transformer
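To confirm that the install succeeded, one option is to query the installed distribution's version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the `pip` command above.

```python
from importlib import metadata


def installed_version(distribution="table-transformer"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```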

Usage

The full model usage can be found here:

from PIL import Image
import easyocr

# TableExtractionPipeline is the only name this example needs; the other
# helpers are imported as in the original example but are not used below.
from table_transformer import (TableExtractionPipeline, get_cell_coordinates_by_row,
                               apply_ocr, get_detection_class_thresholds,
                               get_structure_class_thresholds)

# Forward slashes avoid invalid escape sequences such as "\s" and "\d"
# and work on both Windows and POSIX systems.
det_config = "table_transformer/src/detection_config.json"
str_config = "table_transformer/src/structure_config.json"

# Load the detection and structure recognition models on the CPU.
pipe = TableExtractionPipeline(det_device="cpu", str_device="cpu",
                 det_model_path="./pubtables1m_detection_detr_r18.pth",
                 str_model_path="./TATR-v1.1-Pub-msft.pth",
                 det_config_path=det_config, str_config_path=str_config)

img = Image.open("table.jpg")

# Create the reader only once; this loads the OCR model into memory.
reader = easyocr.Reader(['en', 'vi'])

# Run the extraction pipeline on the image.
pipe(reader, img)
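The pipeline above depends on four files: two model checkpoints and two JSON configs. The sketch below builds those paths with `pathlib`, so they are portable across operating systems, and reports any that are missing before the models are loaded. The file names come from the usage example; the directory layout and the helper name `missing_files` are assumptions for illustration.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# File names are taken from the usage example; the directories are assumptions.
weights_dir = Path(".")
config_dir = Path("table_transformer") / "src"
required = [
    weights_dir / "pubtables1m_detection_detr_r18.pth",
    weights_dir / "TATR-v1.1-Pub-msft.pth",
    config_dir / "detection_config.json",
    config_dir / "structure_config.json",
]

missing = missing_files(required)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model and config files found.")
```

Checking up front gives a clearer message than a failure partway through model loading, and `str(path)` can be passed wherever the pipeline expects a path string.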
     

Evaluation

For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and reported strong results. On other datasets such as PubTabNet, the scores are also good.

You can check the scores and run the evaluation on your own dataset at this link.

Version history

  • v1.0.5: Added Table Detection, ending up with a full Table Extraction Pipeline.
  • v1.0.3: Removed unnecessary code and added new functionality.
  • v1.0.2: Initial version.
