Table Transformer
Table Transformer Library
Original repository: https://github.com/microsoft/table-transformer
Introduction
This is the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. The package covers table structure recognition: it detects the structure of a table and extracts the table information into common formats such as CSV or an HTML table. Text recognition is handled by EasyOCR.
Installation
pip install table-transformer
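To confirm that the install succeeded, one option is to query the installed distribution from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `table-transformer` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("table-transformer")
    if found is None:
        print("table-transformer is not installed in this environment")
    else:
        print(f"table-transformer {found} is installed")
```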
Usage
The full model usage can be found here:
from table_transformer import (
    TableExtractionPipeline,
    get_structure_class_thresholds,
    visualize_cells,
)
from PIL import Image
from easyocr import Reader

# Get the structure class thresholds
structure_class_threshold = get_structure_class_thresholds(
    table=0.5,
    table_column=0.5,
    table_row=0.5,
    table_column_header=0.5,
    table_projected_row_header=0.5,
    table_spanning_cell=0.5,
    no_object=10,
)

# Initialize EasyOCR
reader = Reader(["en"], gpu=False)

# Initialize the pipeline with the thresholds
table_transformer = TableExtractionPipeline(
    structure_class_thresholds=structure_class_threshold,
    model_path="TATR-v1.1-All-msft.onnx",
    ocr_model=reader,
)

# Open the table image in PIL.Image format
image_path = "image.png"
img = Image.open(image_path)

# Run the pipeline
result = table_transformer(
    page_image=img,
    out_cells=True,
    out_csv=False,
    out_html=True,
)

# Print the result
# print(result["objects"][0])
# print(result["cells"][0])
# print(result["html"][0])
# print(result["csv"][0])

# Visualize the result as a Matplotlib plot and return the figure
visualize = visualize_cells(img=img, cells=result["cells"][0], show_plot=False)
# print(type(visualize))  # PIL.Image.Image

print("Test successful!")
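The per-class thresholds passed to the pipeline are presumably used to discard low-confidence detections. The sketch below illustrates that idea only; it is not the library's implementation. The function `filter_by_class_threshold`, the label strings, and the `label`/`score` keys are assumptions made for illustration. It also shows why a `no_object` threshold of 10 acts as "never keep": confidence scores do not exceed 1.

```python
from typing import Dict, List, Union

Detection = Dict[str, Union[str, float]]


def filter_by_class_threshold(
    detections: List[Detection], thresholds: Dict[str, float]
) -> List[Detection]:
    """Keep detections whose score reaches the threshold for their label.

    Labels without a configured threshold are dropped (an assumption of
    this sketch, not documented library behavior).
    """
    kept = []
    for det in detections:
        threshold = thresholds.get(det["label"], float("inf"))
        if det["score"] >= threshold:
            kept.append(det)
    return kept


# Hypothetical thresholds and detections, for illustration only.
example_thresholds = {"table row": 0.5, "table column": 0.5, "no object": 10}
example_detections = [
    {"label": "table row", "score": 0.92},
    {"label": "table column", "score": 0.31},
    {"label": "no object", "score": 0.99},
]

kept_detections = filter_by_class_threshold(example_detections, example_thresholds)
for det in kept_detections:
    print(det["label"], det["score"])
```

Raising a threshold makes the corresponding class stricter, so fewer rows, columns, or spanning cells survive; lowering it keeps more candidates at the cost of more false positives.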
Evaluation
For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and reported strong results. Scores on other datasets, such as PubTabNet, are also good.
You can check the scores and run the evaluation on your own dataset at this link.
Version history
- v1.0.3: Removed unnecessary code and added new functionalities.
- v1.0.2: Initial version.