Table Transformer
Table Transformer Library
Original repository: https://github.com/microsoft/table-transformer
Introduction
This is the Table Transformer (TATR) model developed by Brandon Smock et al. of Microsoft AI. This package uses TATR's table structure recognition to detect tables and extract their contents into common formats such as CSV or HTML tables, with text recognition provided by EasyOCR.
Installation
pip install table-transformer
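To confirm that the install succeeded, one option is to query the installed version from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("table-transformer"))
```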
Usage
A basic example of the full extraction pipeline:
from table_transformer import TableExtractionPipeline

# Use forward slashes in paths: in a normal Python string, "\t" in ".\path\to" is read as a tab.
pipe = TableExtractionPipeline(
    det_device="cpu",
    str_device="cpu",
    det_model_path="./path/to/pubtables1m_detection_detr_r18.pth",
    str_model_path="./path/to/TATR-v1.1-Pub-msft.pth",
)
img = "/path/to/image.jpg"
table_objects, table_cells_coordinates, table_cells_text = pipe(img)
print(table_cells_text[0])  # Should be a DataFrame
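Since the extracted tables are DataFrames, they can be written to CSV or HTML with pandas. The sketch below assumes `table_cells_text` is a list of pandas DataFrames, as the comment above suggests; `export_tables` and the `table_{i}` file names are hypothetical, and a small hand-built DataFrame stands in for real pipeline output.

```python
from pathlib import Path

import pandas as pd


def export_tables(tables, out_dir):
    """Write each DataFrame in `tables` to a CSV and an HTML file in `out_dir`.

    Returns the list of written file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, df in enumerate(tables):
        csv_path = out_dir / f"table_{i}.csv"
        html_path = out_dir / f"table_{i}.html"
        df.to_csv(csv_path, index=False)
        df.to_html(html_path, index=False)
        written.extend([csv_path, html_path])
    return written


# Stand-in for `table_cells_text`; in real use this would come from pipe(img).
sample_tables = [pd.DataFrame({"Item": ["apple", "pear"], "Qty": ["3", "5"]})]
```

Calling `export_tables(table_cells_text, "out")` would then produce one CSV and one HTML file per detected table.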
Evaluation
For structure recognition, the original authors evaluated the v1.0 model on PubTables-1M and report strong results. Scores on other datasets, such as PubTabNet, are also good.
You can find the scores and run the evaluation on your own dataset at this link.
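For a quick sanity check on your own data before running the full evaluation, a simple cell-level comparison can help. The sketch below is not the metric used by the original authors; it is a rough exact-match score, and `cell_accuracy` is a hypothetical helper that assumes both the extracted table and your ground truth are pandas DataFrames.

```python
import pandas as pd


def cell_accuracy(predicted, expected):
    """Fraction of ground-truth cells whose text matches the prediction exactly.

    Cells are compared by position after stripping whitespace. Cells missing
    from the prediction count as errors; column names and any extra predicted
    cells are ignored.
    """
    rows, cols = expected.shape
    if rows * cols == 0:
        return 0.0
    correct = 0
    for r in range(rows):
        for c in range(cols):
            if r < predicted.shape[0] and c < predicted.shape[1]:
                got = str(predicted.iat[r, c]).strip()
                want = str(expected.iat[r, c]).strip()
                correct += got == want
    return correct / (rows * cols)


expected = pd.DataFrame([["a", "1"], ["b", "2"]])
predicted = pd.DataFrame([["a", "1"], ["b", "7"]])
print(cell_accuracy(predicted, expected))  # 0.75 (3 of 4 cells match)
```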
Version history
- v1.0.6: Added table detection, completing the full table extraction pipeline; bug fixes.
- v1.0.3: Removed unnecessary code and added new functionalities.
- v1.0.2: Initial version.
File details
Details for the file table_transformer-1.0.6-py3-none-any.whl.
File metadata
- Download URL: table_transformer-1.0.6-py3-none-any.whl
- Upload date:
- Size: 158.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6ab5cc39ab38dbdde7ece3681acc495ba2820f10bd1119cbcdd121c3280b4ecb
MD5 | ee42c03836481baa6897f32f071f803c
BLAKE2b-256 | 814700410f269198c276ef936330702699d66e5e0a751f1c28f290f8b02b4bc6
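A downloaded wheel can be checked against the SHA256 digest listed above. This is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the path passed in is assumed to point at your local copy of the wheel.

```python
import hashlib

# SHA256 digest listed for table_transformer-1.0.6-py3-none-any.whl.
EXPECTED_SHA256 = "6ab5cc39ab38dbdde7ece3681acc495ba2820f10bd1119cbcdd121c3280b4ecb"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("table_transformer-1.0.6-py3-none-any.whl")` should return `True` for an intact download.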