Table Transformer

Table Transformer Library

Original repository: https://github.com/microsoft/table-transformer

Introduction

This package provides the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. It performs table structure recognition to detect tables and extract their contents into popular formats such as CSV or HTML, with text recognition handled by EasyOCR.
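To make the CSV and HTML outputs concrete, here is a minimal sketch of how a recognized table, once reduced to a grid of cell strings, could be serialized to both formats with the Python standard library. The `table_to_csv` and `table_to_html` helpers and the list-of-rows input are hypothetical illustrations, not this package's API, and the first row is assumed to be the column header.

```python
import csv
import html
import io


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize a grid of cell strings (one inner list per row) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)  # fields containing commas or quotes are quoted
    return buffer.getvalue()


def table_to_html(rows: list[list[str]]) -> str:
    """Serialize the same grid to an HTML table, treating row 0 as the header."""
    if not rows:
        return "<table></table>"
    header, *body = rows
    parts = ["<table>"]
    # Escape cell text so characters like <, > and & do not break the markup
    parts.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


if __name__ == "__main__":
    grid = [["Name", "Score"], ["A & B", "0.9"], ["C", "0.7"]]
    print(table_to_csv(grid))
    print(table_to_html(grid))
```

A real table can also contain spanning cells, which a flat grid like this cannot express; HTML would need `rowspan`/`colspan` attributes for those.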

Installation

pip install table-transformer

Usage

A complete usage example:

from table_transformer import (TableExtractionPipeline,
                               get_structure_class_thresholds,
                               visualize_cells)
from PIL import Image
from easyocr import Reader

# Get the structure class thresholds
structure_class_threshold = get_structure_class_thresholds(
    table=0.5,
    table_column=0.5,
    table_row=0.5,
    table_column_header=0.5,
    table_projected_row_header=0.5,
    table_spanning_cell=0.5,
    no_object=10
    )

# Initialize EasyOCR (English, CPU only)
reader = Reader(['en'], gpu=False)

# Initialize the pipeline with the thresholds, the ONNX model file and the OCR reader
table_transformer = TableExtractionPipeline(
    structure_class_thresholds=structure_class_threshold,
    model_path="TATR-v1.1-All-msft.onnx",
    ocr_model=reader)

# Open the table image as a PIL.Image (converted to RGB in case it has an alpha channel)
image_path = "image.png"
img = Image.open(image_path).convert("RGB")
# Run the pipeline
result = table_transformer(
    page_image=img,
    out_cells=True,
    out_csv=False,
    out_html=True)

# Inspect the results (index [0] selects the first entry)
# print(result["objects"][0])
# print(result["cells"][0])
print(result["html"][0])
# print(result["csv"][0])  # requires out_csv=True

# Visualize the detected cells; show_plot=False returns the result without displaying it
visualize = visualize_cells(img=img, cells=result["cells"][0], show_plot=False)
# print(type(visualize))  # PIL.Image.Image
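The values passed to get_structure_class_thresholds set, per structure class, the minimum confidence a detection needs in order to be kept. The sketch below illustrates that filtering idea in isolation; the detection record layout (`label`, `score`, `bbox`), the label strings, and the `filter_by_class_threshold` helper are assumptions for illustration, not this package's internals. The example also shows why a threshold of 10 for `no_object` effectively discards that class: confidence scores never exceed 1.

```python
from typing import TypedDict


class Detection(TypedDict):
    """Hypothetical detection record: class label, confidence and bounding box."""
    label: str
    score: float
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def filter_by_class_threshold(
    detections: list[Detection], thresholds: dict[str, float]
) -> list[Detection]:
    """Keep detections whose score meets the threshold for their class.

    Labels missing from `thresholds` are dropped; that is a design choice
    of this sketch, not documented package behaviour.
    """
    kept = []
    for det in detections:
        threshold = thresholds.get(det["label"])
        if threshold is not None and det["score"] >= threshold:
            kept.append(det)
    return kept


# Hypothetical label names and thresholds mirroring the usage example
thresholds = {
    "table row": 0.5,
    "table column": 0.5,
    "no object": 10,  # scores lie in [0, 1], so this class is never kept
}
detections: list[Detection] = [
    {"label": "table row", "score": 0.92, "bbox": (0, 0, 100, 20)},
    {"label": "table row", "score": 0.31, "bbox": (0, 20, 100, 40)},
    {"label": "table column", "score": 0.77, "bbox": (0, 0, 50, 40)},
    {"label": "no object", "score": 0.99, "bbox": (10, 10, 20, 20)},
]
print([d["score"] for d in filter_by_class_threshold(detections, thresholds)])
# -> [0.92, 0.77]
```

Raising a class threshold trades recall for precision for that class only, which is why the thresholds are set per class rather than globally.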

Evaluation

For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and report strong results; on other datasets such as PubTabNet the scores are also good.

The scores, along with instructions for running the evaluation on your own dataset, are available at this link.

Version history

  • v1.0.3: Removed unnecessary code and added new functionality.
  • v1.0.2: Initial version.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

table_transformer-1.0.3-py3-none-any.whl (27.9 kB, Python 3)
