Table Transformer Library

Original repository: https://github.com/microsoft/table-transformer

Introduction

This is the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. This library uses it for table structure recognition, detecting tables and extracting their information into popular formats such as CSV or HTML tables, with text recognition provided by EasyOCR.

Installation

pip install table-transformer

Usage

The full model usage can be found here:

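from table_transformer import TableExtractionPipeline

# Raw strings keep the Windows-style backslashes literal;
# in a normal string, "\t" would be read as a tab character.
pipe = TableExtractionPipeline(
    det_device="cpu",
    str_device="cpu",
    det_model_path=r".\path\to\pubtables1m_detection_detr_r18.pth",
    str_model_path=r".\path\to\TATR-v1.1-Pub-msft.pth",
)

img = r"\path\to\image.jpg"

# The pipeline returns detected table objects, cell coordinates, and cell text.
table_objects, table_cells_coordinates, table_cells_text = pipe(img)

print(table_cells_text[0])  # Should be a DataFrame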
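
Since the introduction mentions CSV and HTML output, here is a minimal sketch of how the extracted tables might be saved. It assumes each element of table_cells_text is a pandas DataFrame, as the comment above suggests. The helper export_tables, the output directory name, and the file naming scheme are hypothetical and are not part of the table-transformer package.

```python
from pathlib import Path

import pandas as pd


def export_tables(tables, out_dir, stem="table"):
    """Write each DataFrame in `tables` to CSV and HTML; return the written paths.

    Hypothetical helper for illustration only; not part of table-transformer.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, df in enumerate(tables):
        csv_path = out_dir / f"{stem}_{i}.csv"
        html_path = out_dir / f"{stem}_{i}.html"
        df.to_csv(csv_path, index=False)    # plain comma-separated values
        df.to_html(html_path, index=False)  # a single <table> element
        written.extend([csv_path, html_path])
    return written


if __name__ == "__main__":
    # Stand-in for table_cells_text, which is assumed to be a list of DataFrames.
    demo_tables = [pd.DataFrame({"Item": ["A", "B"], "Qty": [1, 2]})]
    for path in export_tables(demo_tables, "exported_tables"):
        print(path)
```

With the real pipeline, the call would presumably be export_tables(table_cells_text, "exported_tables"), provided the returned objects are DataFrames.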

Evaluation

For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and reported strong results. On other datasets such as PubTabNet, the scores are also good.

You can check the scores and run the evaluation on your own dataset at this link.

Version history

  • v1.0.6: Added Table Detection, completing the full Table Extraction Pipeline; bug fix.
  • v1.0.3: Removed unnecessary code and added new functionalities.
  • v1.0.2: Initial version.
