
TorchicTab-Heuristic: Semantic Table Annotation with Wikidata


TorchicTab Heuristic


TorchicTab is a semantic table annotation system that automatically interprets the content of a table and assigns semantic tags to its elements with high accuracy. It was originally developed for the SemTab challenge. More details on the full system are available in our dedicated article and paper.

This repository contains TorchicTab-Heuristic, the TorchicTab subsystem that annotates tables using the Wikidata knowledge graph as the reference knowledge base. TorchicTab-Heuristic produces annotations for the following semantic annotation tasks:

  • The Cell Entity Annotation (CEA) task associates a table cell with an entity.
  • The Column Type Annotation (CTA) task assigns a semantic type to a column.
  • The Column Property Annotation (CPA) task discovers a semantic relation contained in the RDF graph that best represents the relation between two columns.
  • The Topic Detection (TD) task identifies the topic of a table that lacks a subject column and assigns a class.
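To make the four tasks concrete, here is a minimal sketch of what each annotation could look like for a two-column table of cities and countries. The dataclass names and fields are hypothetical and are not TorchicTab-Heuristic's actual output format; only the Wikidata identifiers (Q90 Paris, Q142 France, Q515 city, Q6256 country, P17 country) are real.

```python
from dataclasses import dataclass

# Standard Wikidata URI prefixes for entities and direct properties.
ENTITY = "http://www.wikidata.org/entity/"
DIRECT_PROPERTY = "http://www.wikidata.org/prop/direct/"


@dataclass(frozen=True)
class CellEntityAnnotation:
    """CEA: links one table cell to a Wikidata entity (hypothetical shape)."""
    row: int
    column: int
    entity: str


@dataclass(frozen=True)
class ColumnTypeAnnotation:
    """CTA: assigns a semantic type (a Wikidata class) to a column."""
    column: int
    semantic_type: str


@dataclass(frozen=True)
class ColumnPropertyAnnotation:
    """CPA: names the relation that holds between two columns."""
    subject_column: int
    object_column: int
    relation: str


@dataclass(frozen=True)
class TopicAnnotation:
    """TD: assigns a class to a whole table that lacks a subject column."""
    topic_class: str


# Illustrative table: header (City, Country), first data row (Paris, France).
cea = [
    CellEntityAnnotation(row=1, column=0, entity=ENTITY + "Q90"),   # Paris
    CellEntityAnnotation(row=1, column=1, entity=ENTITY + "Q142"),  # France
]
cta = [
    ColumnTypeAnnotation(column=0, semantic_type=ENTITY + "Q515"),   # city
    ColumnTypeAnnotation(column=1, semantic_type=ENTITY + "Q6256"),  # country
]
cpa = [
    # Column 0 (city) is related to column 1 (country) via P17 "country".
    ColumnPropertyAnnotation(0, 1, DIRECT_PROPERTY + "P17"),
]
```

CEA works at cell level, CTA and CPA at column level, and TD at table level, which is why the sketch uses four separate record types rather than one.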

[Figure: TorchicTab-Heuristic overview]

Installation

TorchicTab-Heuristic requires Python 3.9, 3.10, or 3.11. If you run into dependency conflicts, create a new virtual environment. For example, if you use conda, run:

conda create -n torchictab_env python=3.11
conda activate torchictab_env
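As a quick sanity check, the snippet below sketches how to confirm that the active interpreter falls within the supported range before installing. The helper name `is_supported` is illustrative and not part of the package.

```python
import sys

# Python versions listed as supported in this README.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    status = "supported" if is_supported() else "NOT supported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```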

Simple installation:

pip install torchic_tab_heuristic

Optional:

TorchicTab also supports building an Elasticsearch index that contains all Wikidata entity-label pairs. This enables enhanced lookup techniques that rely on Elasticsearch features such as fuzzy querying. To use TorchicTab-Heuristic with Elasticsearch:

  • Download a Wikidata RDF dump from Zenodo

  • Install Elasticsearch. Recommended version: Elasticsearch 8

  • Edit the config.py file to set the index name and the RDF dump address.

  • Run the Elasticsearch server:

    cd elasticsearch-X.X.X
    ./bin/elasticsearch
    
  • Create the Elasticsearch index:

    python elasticsearch/create_index.py
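To illustrate the kind of fuzzy lookup such an index makes possible, the sketch below builds a standard Elasticsearch `match` query with `fuzziness` set to `AUTO`, which tolerates small typos in a cell mention. The field name `label` and the helper `build_fuzzy_label_query` are assumptions for illustration; the field names actually written by create_index.py may differ.

```python
import json


def build_fuzzy_label_query(mention: str, size: int = 10, field: str = "label") -> dict:
    """Build a search body that fuzzily matches `mention` against a label field.

    `field` is a hypothetical field name; adjust it to the real index mapping.
    """
    return {
        "size": size,  # maximum number of candidate entities to return
        "query": {
            "match": {
                field: {
                    "query": mention,
                    # AUTO scales the allowed edit distance with term length.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


if __name__ == "__main__":
    # A misspelled mention that an exact lookup would miss.
    print(json.dumps(build_fuzzy_label_query("Pariss"), indent=2))
```

The resulting body can be sent to the index's `_search` endpoint, and the returned hits would serve as candidate entities for a cell.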
    

Usage

Example usage of TorchicTab-Heuristic with Wikidata:

Without Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv"

With Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv" -e
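The two invocations above differ only in the `-e` flag, so a small wrapper can pick the right one automatically. The sketch below assumes Elasticsearch listens on its default HTTP port 9200 on localhost and that the script is run from the repository root; the helper names are illustrative and not part of the package.

```python
import socket
import subprocess
import sys


def elasticsearch_reachable(host: str = "localhost", port: int = 9200,
                            timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the assumed Elasticsearch port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_demo_command(table_path: str, use_elasticsearch: bool) -> list:
    """Assemble the demo command line, adding -e only when requested."""
    command = [sys.executable, "examples/sta_demo.py", "-i", table_path]
    if use_elasticsearch:
        command.append("-e")
    return command


if __name__ == "__main__":
    command = build_demo_command("examples/tables/cities.csv",
                                 use_elasticsearch=elasticsearch_reachable())
    # check=True raises CalledProcessError if the demo exits with an error.
    subprocess.run(command, check=True)
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the index has been created.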

Cite

Thank you for reading! To cite our resource:

@InProceedings{dasoulas2023torchictab,
    author    = {Dasoulas, Ioannis and Yang, Duo and Duan, Xuemin and Dimou, Anastasia},
    journal = {CEUR Workshop Proceedings},
    publisher = {CEUR Workshop Proceedings},
    title = {TorchicTab: Semantic Table Annotation with Wikidata and Language Models},
    year = {2023-11-02},
    }

