TorchicTab-Heuristic: Semantic Table Annotation with Wikidata

TorchicTab Heuristic

TorchicTab is a semantic table annotation system that automatically understands the content of a table and assigns semantic tags to its elements with high accuracy. It was originally developed for the SemTab challenge. You can find more about the full system in our dedicated article and paper.

This repository contains TorchicTab-Heuristic, the TorchicTab subsystem that annotates tables using the Wikidata knowledge graph as its reference knowledge base. TorchicTab-Heuristic produces annotations for the following semantic annotation tasks:

  • The Cell Entity Annotation (CEA) task associates a table cell with an entity.
  • The Column Type Annotation (CTA) task assigns a semantic type to a column.
  • The Column Property Annotation (CPA) task discovers a semantic relation contained in the RDF graph that best represents the relation between two columns.
  • The Topic Detection (TD) task identifies the topic of a table that lacks a subject column and assigns a class.
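
To make the four task outputs concrete, here is a minimal sketch of how annotations for a tiny two-column table (a city and its country) could be stored. The `TableAnnotations` class and its field names are hypothetical and chosen only for illustration; they are not the package's API. The Wikidata identifiers are real (Q90 Paris, Q142 France, Q515 city, Q6256 country, P17 country).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TableAnnotations:
    """Hypothetical container for the four tasks (not the package's API)."""

    # CEA: (row index, column index) -> Wikidata entity ID
    cea: Dict[Tuple[int, int], str] = field(default_factory=dict)
    # CTA: column index -> Wikidata class ID
    cta: Dict[int, str] = field(default_factory=dict)
    # CPA: (subject column, object column) -> Wikidata property ID
    cpa: Dict[Tuple[int, int], str] = field(default_factory=dict)
    # TD: Wikidata class ID for the table topic; only used when the
    # table has no subject column
    td: Optional[str] = None


# Toy table: row 0 is the header, row 1 holds ("Paris", "France").
annotations = TableAnnotations()
annotations.cea[(1, 0)] = "Q90"    # cell "Paris"  -> entity Paris
annotations.cea[(1, 1)] = "Q142"   # cell "France" -> entity France
annotations.cta[0] = "Q515"        # column 0 -> class "city"
annotations.cta[1] = "Q6256"       # column 1 -> class "country"
annotations.cpa[(0, 1)] = "P17"    # column 0 -> column 1 via "country"

print(annotations)
```

In this toy table the first column acts as the subject column, so the TD field stays empty.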

[Figure: TorchicTab-Heuristic overview]

Installation

TorchicTab-Heuristic requires Python 3.9, 3.10, or 3.11.

Simple installation:

pip install -e .
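
Since only Python 3.9 to 3.11 are supported, a quick interpreter check can catch a version mismatch early. The snippet below is a minimal sketch; the helper name `is_supported` is hypothetical and not part of the package.

```python
import sys

# Minor versions listed as supported in this README.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is a supported version."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if is_supported():
    print("Python version is supported.")
else:
    print(
        "Unsupported Python version: "
        f"{sys.version_info[0]}.{sys.version_info[1]}"
    )
```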

Optional:

TorchicTab can also build an Elasticsearch index that contains all Wikidata entity-label pairs. This enables enhanced lookup techniques that rely on Elasticsearch features such as fuzzy querying. To use TorchicTab-Heuristic with Elasticsearch:

  • Download a Wikidata RDF dump from Zenodo

  • Install Elasticsearch (https://www.elastic.co/downloads/elasticsearch). Recommended version: Elasticsearch 8

  • Edit the config.py file to set the index name and the RDF dump address.

  • Start the Elasticsearch server:

    cd elasticsearch-X.X.X
    ./bin/elasticsearch
    
  • Create the Elasticsearch index:

    python elasticsearch/create_index.py
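
To illustrate what fuzzy querying means for label lookup, the sketch below builds an Elasticsearch Query DSL body for a fuzzy `match` query, so that a misspelled cell value can still retrieve candidate labels. The helper name `build_fuzzy_label_query` and the field name `label` are assumptions made for this example; the fields actually written by create_index.py may differ, and this is not necessarily the query TorchicTab-Heuristic issues.

```python
import json


def build_fuzzy_label_query(
    cell_text: str, label_field: str = "label", size: int = 10
) -> dict:
    """Build a search body for a fuzzy match on a label field.

    `label_field` is an assumed field name, not taken from the project.
    """
    return {
        "size": size,  # maximum number of candidate hits to return
        "query": {
            "match": {
                label_field: {
                    "query": cell_text,
                    # "AUTO" lets Elasticsearch pick the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


# A misspelled cell value that a fuzzy query could still match.
query = build_fuzzy_label_query("Pariss")
print(json.dumps(query, indent=2))
```

The resulting dictionary can be sent as the request body of a search against the index configured in config.py.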
    

Usage

Example usage of TorchicTab-Heuristic with Wikidata:

Without Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv"

With Elasticsearch

python examples/sta_demo.py -i "examples/tables/cities.csv" -e
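
To annotate several tables in one go, the demo commands above can be assembled programmatically. This is a minimal sketch that uses only the `-i` and `-e` flags shown above; the helper names `build_demo_command` and `commands_for_folder` are hypothetical, and the sketch assumes it is run from the repository root.

```python
import sys
from pathlib import Path
from typing import List


def build_demo_command(
    table_path: str, use_elasticsearch: bool = False
) -> List[str]:
    """Build the argument list for one run of examples/sta_demo.py."""
    command = [sys.executable, "examples/sta_demo.py", "-i", table_path]
    if use_elasticsearch:
        command.append("-e")  # same flag as in the usage example above
    return command


def commands_for_folder(
    folder: str, use_elasticsearch: bool = False
) -> List[List[str]]:
    """Build one command per CSV file found directly inside `folder`."""
    return [
        build_demo_command(str(path), use_elasticsearch)
        for path in sorted(Path(folder).glob("*.csv"))
    ]


print(build_demo_command("examples/tables/cities.csv", use_elasticsearch=True))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to execute the demo for that table.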

Cite

Thank you for reading! To cite our resource:

@InProceedings{dasoulas2023torchictab,
    author    = {Dasoulas, Ioannis and Yang, Duo and Duan, Xuemin and Dimou, Anastasia},
    journal = {CEUR Workshop Proceedings},
    publisher = {CEUR Workshop Proceedings},
    title = {TorchicTab: Semantic Table Annotation with Wikidata and Language Models},
    year = {2023-11-02},
    }
