TTQAKit: A toolkit for Text-Table Hybrid Question Answering.

🌐Website | 🎥Video | 📦PyPI | 🤗Huggingface Datasets

TableQAKit: A Toolkit for Table Question Answering

🔥 Updates

[2023-8-7]: We released our code, datasets and PyPI Package. Check it out!

✨ Features

TableQAKit is a unified platform for TableQA (especially in the LLM era). Its main features include:

Extensible design: You can use the interfaces defined by the toolkit, extend methods and models, and implement your own new models based on your own data.
Equipped with LLM: TableQAKit supports LLM-based methods, including LLM-prompting methods and LLM-finetuning methods.
Comprehensive datasets: We design a unified data interface to process data and store them in Huggingface datasets.
Powerful methods: Using our toolkit, you can reproduce most of the SOTA methods for TableQA tasks.
Efficient LLM benchmark: TableQAEval is a benchmark for evaluating the performance of LLMs on TableQA. It evaluates an LLM's ability to model long tables (context) and its comprehension capabilities (numerical reasoning, multi-hop reasoning).
Comprehensive survey: We are about to release a systematic TableQA survey; this project is preliminary work for it.

⚙️ Install

pip install tableqakit
or
git clone git@github.com:lfy79001/TableQAKit.git
cd TableQAKit
pip install -r requirements.txt

pip install ttqakit

📁 Folder

The TableQAKit repository is structured as follows:

├── icl/ # LLM-prompting toolkit
├── llama/ # LLM-finetuning toolkit
├── mmqa_utils/ # EncyclopediaQA toolkit
│   ├── classifier_module/ # The package for classifier
│   ├── retriever_module/ # The package for encyclopedia retrieval
├── structuredqa/ # Read model TaLMs
│   ├── builder/
│   ├── utils/
├── retriever/ # TableQA's general retriever (SpreadSheet examples)
├── multihop/ # Readers for encyclopediaQA
│   ├── Retrieval/
│   └── Read/
├── numerical/ # Readers for some TableQA datasets
├── TableQAEval/ # The proposed new LLM-Long-Table Benchmark
│   ├── Baselines/ # Add your LLMs
│   │   ├── turbo16k-table.py
│   │   ├── llama2-chat-table.py
│   │   └── ...
│   ├── Evaluation/ # metrics
│   └── TableQAEval.json  
├── outputs/ # the results of some models
├── loaders/ 
│   ├── WikiSQL.py
│   └── ...
├── structs/ 
│   ├── data.py
├── static/ 
├── LICENSE
└── README.md

🗃️ Dataset

According to our taxonomy, we classify TableQA tasks into three categories, as shown in the following figure:

🔧 Get started

Retrieval Modules

QuickStart

Using the MultiHiertt dataset as a demonstration:

from TableQAKit.retriever import MultiHierttTrainer


trainer = MultiHierttTrainer()

# train stage:
trainer.train()

# infer stage:
trainer.infer()

Train

python main.py \
--train_mode row \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--dataloader_pin_memory False \
--output_dir ./ckpt \
--train_path ./data/train.json \
--val_path ./data/val.json \
--save_steps 1000 \
--logging_steps 20 \
--learning_rate 0.00001 \
--top_n_for_eval 10 \
--encoder_path ./PLM/bert-base-uncased/

Inference

python infer.py \
--output_dir ./ckpt \
--encoder_path ./ckpt/encoder/deberta-large \
--dataloader_pin_memory False \
--ckpt_for_test ./ckpt/retriever/deberta/epoch1_step30000.pt \
--test_path ./data/MultiHiertt/test.json \
--test_out_path ./prediction.json \
--top_n_for_test 10
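
The `--top_n_for_eval` and `--top_n_for_test` flags suggest that the retriever scores each candidate row and keeps the highest-scoring ones. Below is a minimal sketch of that selection step and of a recall-at-n check, assuming per-row scores and 0/1 labels in the shape used by the retriever data. The helper names `select_top_n` and `recall_at_n` are hypothetical and are not part of TableQAKit.

```python
from typing import List


def select_top_n(scores: List[float], n: int) -> List[int]:
    """Return the indices of the n highest-scoring rows, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def recall_at_n(scores: List[float], labels: List[int], n: int) -> float:
    """Fraction of gold rows (label == 1) that appear among the top n."""
    gold = {i for i, label in enumerate(labels) if label == 1}
    if not gold:
        return 1.0  # nothing to retrieve, so nothing was missed
    hits = gold.intersection(select_top_n(scores, n))
    return len(hits) / len(gold)


# Toy scores for four candidate rows; rows 1 and 2 are the gold evidence.
scores = [0.1, 0.9, 0.3, 0.7]
labels = [0, 1, 1, 0]
print(select_top_n(scores, 2))         # [1, 3]
print(recall_at_n(scores, labels, 2))  # 0.5
```

A larger n trades precision for recall: with n = 3 in the toy example, both gold rows are retrieved.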

Create Trainer for New Dataset

import json
from typing import Dict, List

from TableQAKit.retriever import RetrieverTrainer as RT


class NewTrainer(RT):
    def read_data(self, data_path: str) -> List[Dict]:
        """
        Load the raw dataset from a JSON file.

        :param data_path: The path of data
        :return: List of raw data
        [
            data_1,
            data_2,
            ...
        ]
        """
        with open(data_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        return data

    def data_proc(self, instance) -> Dict:
        """
        Convert one raw instance into the retriever's format.

        :return:
        {
            "id": str,
            "question": str,
            "rows": list[str],
            "labels": list[int]
        }
        """
        # Copy the list so that appending table rows does not mutate the raw instance.
        rows = list(instance["paragraphs"])
        labels = [0] * len(rows)
        # Mark the paragraphs that are cited as text evidence.
        for text_evidence in instance["qa"]["text_evidence"]:
            labels[text_evidence] = 1
        # Append each table description as a row, labeled by the table evidence.
        for k, v in instance["table_description"].items():
            rows.append(v)
            labels.append(1 if k in instance["qa"]["table_evidence"] else 0)
        return {
            "id": instance["uid"],
            "question": instance["qa"]["question"],
            "rows": rows,
            "labels": labels
        }
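
To make the expected input and output of `data_proc` concrete, here is a standalone sketch that applies the same labeling logic to a toy instance. The instance is invented for illustration and only mirrors the field names used above (`uid`, `paragraphs`, `table_description`, `qa`); the table keys and texts are made up, and real records may contain more fields. The function name `to_retriever_example` is hypothetical.

```python
from typing import Dict


def to_retriever_example(instance: Dict) -> Dict:
    """Standalone copy of the data_proc logic, for illustration only."""
    rows = list(instance["paragraphs"])
    labels = [0] * len(rows)
    # Paragraph indices listed as text evidence become positive rows.
    for idx in instance["qa"]["text_evidence"]:
        labels[idx] = 1
    # Each table description becomes an extra row, positive if its key is cited.
    for key, description in instance["table_description"].items():
        rows.append(description)
        labels.append(1 if key in instance["qa"]["table_evidence"] else 0)
    return {
        "id": instance["uid"],
        "question": instance["qa"]["question"],
        "rows": rows,
        "labels": labels,
    }


# Invented toy record: two paragraphs and two table cells.
toy = {
    "uid": "toy-0",
    "paragraphs": ["Revenue grew in 2020.", "The company was founded in 1990."],
    "table_description": {
        "0-1-1": "Revenue in 2020 was 5 million.",
        "0-2-1": "Revenue in 2019 was 4 million.",
    },
    "qa": {
        "question": "What was the revenue in 2020?",
        "text_evidence": [0],
        "table_evidence": ["0-1-1"],
    },
}

example = to_retriever_example(toy)
print(len(example["rows"]))  # 4
print(example["labels"])     # [1, 0, 1, 0]
```

Paragraphs come first and table descriptions follow, so `rows` and `labels` always have the same length and share the same indices.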

LLM-Prompting Methods

Check here for more details.

LLM-Finetuning Methods

Check here for more details.

Reading Modules

TaLM Reasoner

Check here for more details.

Multimodal Reasoner

Check here for more details.

TableQAEval

TableQAEval is a benchmark for evaluating the performance of LLMs on TableQA. It evaluates an LLM's ability to model long tables (context) and its comprehension capabilities (numerical reasoning, multi-hop reasoning).

Leaderboard

Model	Parameters	Numerical Reasoning	Multi-hop Reasoning	Structured Reasoning	Total
Turbo-16k-0613	-	20.3	52.8	54.3	43.5
LLaMA2-7b-chat	7B	2.0	14.2	13.4	12.6
ChatGLM2-6b-8k	6B	1.4	10.1	11.5	10.2
LLaMA2-7b-4k	7B	0.8	9.2	5.4	6.6
longchat-7b-16k	7B	0.3	7.1	5.1	5.2
LLaMA-7b-2k	7B	0.5	7.3	4.1	4.5
MPT-7b-65k	7B	0.3	3.2	2.0	2.3
LongLLaMA-3b	3B	0.0	4.3	1.7	2.0

More details are shown in TableQAEval.
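
As a rough illustration of how per-category scores like those in the leaderboard could be computed, the sketch below scores predictions by normalized exact match and reports a percentage per category plus an overall total. This is an assumption made for illustration only: the record fields (`category`, `prediction`, `answer`) and function names are hypothetical, and the benchmark's actual metrics are the ones in `TableQAEval/Evaluation/`.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().strip().split())


def score_by_category(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy (in percent) per category and over all records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        hit = int(normalize(record["prediction"]) == normalize(record["answer"]))
        # Count each record once for its category and once for the overall total.
        for key in (record["category"], "total"):
            correct[key] += hit
            total[key] += 1
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Invented toy predictions, not taken from the benchmark.
records = [
    {"category": "numerical", "prediction": "42", "answer": "42"},
    {"category": "numerical", "prediction": "7", "answer": "8"},
    {"category": "multi-hop", "prediction": " Paris ", "answer": "paris"},
    {"category": "structured", "prediction": "3", "answer": "3"},
]
scores = score_by_category(records)
print(scores["numerical"])  # 50.0
print(scores["total"])      # 75.0
```

Because the total is computed over all records rather than as a mean of the category scores, categories with more questions weigh more heavily in it.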

✅ TODO

We will continue to optimize the toolkit.

Acknowledgements

Primary contributors: Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yifan Wei, Shizhu He and Kang Liu.

Many thanks to Yilun Zhao (Yale University) and Yongwei Zhou (HIT) for their assistance.

Release history

1.0.0 (this version): Aug 9, 2023
0.1.0: Aug 8, 2023
0.0.2: Aug 2, 2023

Download files

Source Distribution

ttqakit-1.0.0.tar.gz (231.6 kB view details)

Uploaded Aug 9, 2023 Source

Built Distribution

ttqakit-1.0.0-py3-none-any.whl (294.1 kB view details)

Uploaded Aug 9, 2023 Python 3

File details

Details for the file ttqakit-1.0.0.tar.gz.

File metadata

Download URL: ttqakit-1.0.0.tar.gz
Upload date: Aug 9, 2023
Size: 231.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.11

File hashes

Hashes for ttqakit-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`db5c488b179be220f73b2a60726d54b8d9cc288530dcd0c8f9ba3d3090ab5758`
MD5	`df2e1d053f4ee99d441b6f88ac0f4cf6`
BLAKE2b-256	`7fcc0d95410ce6c9ab22d8cc4ea2bf01e8aaa4b4ad8618eb329ee1c8df246eb2`

File details

Details for the file ttqakit-1.0.0-py3-none-any.whl.

File metadata

Download URL: ttqakit-1.0.0-py3-none-any.whl
Upload date: Aug 9, 2023
Size: 294.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.11

File hashes

Hashes for ttqakit-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a07504bfc0c8820ffae0f6574852b7f7568a2953e7da92c795d241fdea9aef79`
MD5	`0340482e44cbdbed2b8d672b798cdc49`
BLAKE2b-256	`904ba0092baf2b5e0766361fe3c0ecf22ceb0e5250f328e6c4aff79f4d154d7d`
