Table Transformer
Table Transformer Library
Original repository: https://github.com/microsoft/table-transformer
Introduction
This library packages the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. It provides table structure recognition for detecting tables and extracting table information into common formats such as CSV or HTML tables, plus text recognition using EasyOCR.
Installation
pip install table-transformer
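To confirm that the install succeeded, one option is to query the installed distribution's version through the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper written for this example, not part of `table_transformer`, and it assumes the distribution name matches the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "table-transformer" is the distribution name from the pip command above.
    found = installed_version("table-transformer")
    if found is None:
        print("table-transformer is not installed")
    else:
        print(f"table-transformer {found} is installed")
```

The printed version can then be compared against the version history further down.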
Usage
The full model usage can be found here:
import csv
import easyocr
import numpy as np
from PIL import Image
from tqdm.auto import tqdm
from table_transformer import (
    TableExtractionPipeline, apply_ocr, get_cell_coordinates_by_row,
    get_detection_class_thresholds, get_structure_class_thresholds,
)

# Raw strings keep the Windows backslashes from being read as escape sequences.
det_config = r"table_transformer\src\detection_config.json"
str_config = r"table_transformer\src\structure_config.json"

# Build the table detection and structure recognition pipeline on CPU.
pipe = TableExtractionPipeline(
    det_device="cpu",
    str_device="cpu",
    det_model_path=r".\pubtables1m_detection_detr_r18.pth",
    str_model_path=r".\TATR-v1.1-Pub-msft.pth",
    det_config_path=det_config,
    str_config_path=str_config,
)

img = Image.open("table.jpg")
reader = easyocr.Reader(['en', 'vi'])  # this needs to run only once to load the model into memory
pipe(reader, img)
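The introduction mentions CSV and HTML as output formats. The return value of `pipe(reader, img)` is not documented here, so the sketch below assumes the recognized cell text has already been arranged as a list of rows, each a list of strings. `rows_to_csv` and `rows_to_html` are hypothetical helpers for illustration and are not part of `table_transformer`.

```python
import csv
import html
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of cell strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_html(rows):
    """Render rows as an HTML table, treating the first row as the header."""
    lines = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"
        # Escape cell text so characters such as "<" do not break the markup.
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data standing in for OCR output (an assumption).
    sample = [["Item", "Price"], ["Tea, green", "3.50"]]
    print(rows_to_csv(sample))
    print(rows_to_html(sample))
```

Using the `csv` module rather than joining strings by hand means cells that contain commas or quotes are quoted correctly.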
Evaluation
For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and reported strong results. On other datasets, such as PubTabNet, the scores are also fairly good.
You can check the scores and run the evaluation on your own dataset via this link.
Version history
- v1.0.5: Added Table Detection, ending up with a full Table Extraction Pipeline.
- v1.0.3: Removed unnecessary code and added new functionalities.
- v1.0.2: Initial version.
File details
Details for the file table_transformer-1.0.5-py3-none-any.whl.
File metadata
- Download URL: table_transformer-1.0.5-py3-none-any.whl
- Upload date:
- Size: 582.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0f95e9c5464c237ef91ec5171dc458a71e2e41ffa22f7c4803fa3e9bf60c1c4f
MD5 | 1ff079b21bf8921796efaea7a216bce6
BLAKE2b-256 | 85583c915a91559c2fb7f72f5d849e069f158f5af1052293ca448950687e5532
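A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the wheel is assumed to sit in the current working directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # File name and digest are taken from the file details above;
    # the local path is an assumption.
    wheel = "table_transformer-1.0.5-py3-none-any.whl"
    published = "0f95e9c5464c237ef91ec5171dc458a71e2e41ffa22f7c4803fa3e9bf60c1c4f"
    if os.path.exists(wheel):
        print("hash OK" if matches_published_hash(wheel, published) else "hash MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```

Reading in chunks keeps memory use constant regardless of file size.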