TorchicTab-Heuristic: Semantic Table Annotation with Wikidata
TorchicTab Heuristic
TorchicTab is a semantic table annotation system that automatically understands the content of a table and assigns semantic tags to its elements with high accuracy. It was originally developed for the SemTab challenge. You can find more about the full system in our dedicated article and paper.
This repository contains TorchicTab-Heuristic, the TorchicTab subsystem that annotates tables using the Wikidata knowledge graph as its reference knowledge base. TorchicTab-Heuristic produces annotations for the following semantic annotation tasks:
- The Cell Entity Annotation (CEA) task associates a table cell with an entity.
- The Column Type Annotation (CTA) task assigns a semantic type to a column.
- The Column Property Annotation (CPA) task discovers a semantic relation contained in the RDF graph that best represents the relation between two columns.
- The Topic Detection (TD) task identifies the topic of a table that lacks a subject column and assigns a class.
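To make the tasks above concrete, here is a minimal sketch of what the annotations for a two-column toy table might look like. The class names, field names, and the toy table are hypothetical and chosen for illustration only; they are not TorchicTab-Heuristic's actual output format. The identifiers are real Wikidata IDs. TD is left out because this toy table has a subject column.

```python
from dataclasses import dataclass


# Hypothetical containers, not the package's real output types.
@dataclass(frozen=True)
class CellAnnotation:  # CEA: one cell -> one Wikidata entity
    row: int
    column: int
    entity: str


@dataclass(frozen=True)
class ColumnTypeAnnotation:  # CTA: one column -> one Wikidata class
    column: int
    type_: str


@dataclass(frozen=True)
class ColumnPropertyAnnotation:  # CPA: column pair -> one Wikidata property
    subject_column: int
    object_column: int
    property_: str


# Toy input table; row 0 is the header.
table = [
    ["City", "Country"],
    ["Paris", "France"],
    ["Berlin", "Germany"],
]

cea = [
    CellAnnotation(1, 0, "Q90"),   # Paris
    CellAnnotation(1, 1, "Q142"),  # France
    CellAnnotation(2, 0, "Q64"),   # Berlin
    CellAnnotation(2, 1, "Q183"),  # Germany
]
cta = [
    ColumnTypeAnnotation(0, "Q515"),   # city
    ColumnTypeAnnotation(1, "Q6256"),  # country
]
cpa = [
    ColumnPropertyAnnotation(0, 1, "P17"),  # country
]

for annotation in cea:
    cell_text = table[annotation.row][annotation.column]
    print(f"{cell_text} -> {annotation.entity}")
```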
Installation
TorchicTab-Heuristic requires Python 3.9, 3.10, or 3.11. If you run into dependency conflicts, create a new virtual environment. For example, if you use conda, run:
conda create -n torchictab_env python=3.11
conda activate torchictab_env
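If you are unsure whether the active interpreter falls in the supported range, a quick check along these lines can help before installing. This is only a sketch; the helper name `is_supported` is hypothetical and not part of the package.

```python
import sys

# Supported (major, minor) pairs, taken from the requirement stated above.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the supported ones."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if is_supported():
    print("Python version is supported.")
else:
    print("Unsupported Python version; consider a fresh virtual environment.")
```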
Simple installation:
pip install torchic_tab_heuristic
Optional:
TorchicTab also supports creating an Elasticsearch index that contains all Wikidata entity-label pairs. This enables enhanced lookup techniques that leverage Elasticsearch features such as fuzzy querying. To use TorchicTab-Heuristic with Elasticsearch:
- Download a Wikidata RDF dump from Zenodo.
- Install Elasticsearch. Recommended version: Elasticsearch 8.
- Edit the config.py file to configure the index name and the RDF dump address.
- Run the Elasticsearch server:
cd elasticsearch-X.X.X
./bin/elasticsearch
- Create the Elasticsearch index:
python elasticsearch/create_index.py
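As an illustration of the fuzzy querying mentioned above, the sketch below builds an Elasticsearch `match` query body with `fuzziness` set to `AUTO`, so that a misspelled cell value can still match an entity label. The field name `label`, the helper name, and the result size are assumptions; the index mapping used by TorchicTab-Heuristic is not described here, and its actual lookup queries may differ.

```python
import json


def build_fuzzy_label_query(cell_text, field="label", size=5):
    """Build a search body that tolerates small typos in cell_text.

    `field` is an assumed name for the indexed label field.
    """
    return {
        "size": size,  # number of candidate hits to return
        "query": {
            "match": {
                field: {
                    "query": cell_text,
                    # AUTO picks the allowed edit distance from term length.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


query_body = build_fuzzy_label_query("Pariss")
print(json.dumps(query_body, indent=2))
```

A body like this could be sent to the search endpoint of the index you configured in config.py, for example through the official Python client or a plain HTTP request.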
Usage
Example usage of TorchicTab-Heuristic with Wikidata:
Without Elasticsearch
python examples/sta_demo.py -i "examples/tables/cities.csv"
With Elasticsearch
python examples/sta_demo.py -i "examples/tables/cities.csv" -e
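To annotate several tables in one go, the two commands above can be assembled programmatically. The following is a sketch that only builds the argument lists from the `-i` and `-e` flags shown above; the helper name `build_demo_command` is hypothetical, and the actual `subprocess.run` call is left commented out because it assumes you are inside a checkout that contains `examples/sta_demo.py`.

```python
import sys


def build_demo_command(table_path, use_elasticsearch=False):
    """Return the argument list for one run of the demo script."""
    command = [sys.executable, "examples/sta_demo.py", "-i", table_path]
    if use_elasticsearch:
        command.append("-e")  # enable Elasticsearch-backed lookup
    return command


table_paths = ["examples/tables/cities.csv"]
commands = [build_demo_command(path, use_elasticsearch=False) for path in table_paths]

for command in commands:
    print(" ".join(command))
    # import subprocess
    # subprocess.run(command, check=True)  # uncomment inside a repository checkout
```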
Cite
Thank you for reading! To cite our resource:
@InProceedings{dasoulas2023torchictab,
author = {Dasoulas, Ioannis and Yang, Duo and Duan, Xuemin and Dimou, Anastasia},
journal = {CEUR Workshop Proceedings},
publisher = {CEUR Workshop Proceedings},
title = {TorchicTab: Semantic Table Annotation with Wikidata and Language Models},
year = {2023},
month = nov,
}
File details
Details for the file torchic_tab_heuristic-0.1.2.tar.gz.
File metadata
- Download URL: torchic_tab_heuristic-0.1.2.tar.gz
- Upload date:
- Size: 31.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c0fdb86115996336f80b875fb639dff722f82eed796152b5faddf5706b2fe3f |
| MD5 | e0536cc45cae3593af5fe53be79b051c |
| BLAKE2b-256 | ac56bfc0ed5b655b49a01c9bba00480da1c55cfa43b4730a82303b2982c3b7e1 |
File details
Details for the file torchic_tab_heuristic-0.1.2-py3-none-any.whl.
File metadata
- Download URL: torchic_tab_heuristic-0.1.2-py3-none-any.whl
- Upload date:
- Size: 37.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 43d14b6ab0204d485455752839120e9a2985d055b5f577943e259fe426711a47 |
| MD5 | 53f4d776f41f9ec52b67edaab4707d86 |
| BLAKE2b-256 | 8c78c95d6a2cfb3c27efbd1dfd188ef29684b811d8710ea270212506d9be59d9 |