package for QA-based Semantics - representing textual information via question-answer pairs
QASem - Question-Answer based Semantics
This repository includes software for parsing natural language sentences with various layers of QA-based semantic annotations. We currently support three layers of semantic annotations: QASRL, QANom, and QADiscourse. See an overview of our approach in our paper on QASem Parsing.
QASRL (Question Answer driven Semantic Role Labeling) is a lightweight semantic framework for annotating "who did what to whom, how, when and where". For every verb in the sentence, it provides a set of question-answer pairs, where the answer marks a participant of the event denoted by the verb, while the question captures its semantic role (that is, the role the participant plays in the event).
"QANom" stands for "QASRL for Nominalizations", which is an adaptation of QASRL to (deverbal) nominalizations. See the QANom paper for details about the task.
You can find more information on QASRL's official website, including links to all the papers and datasets and a data browsing utility. We also wrapped the datasets into Huggingface Datasets (QASRL; QANom), which are easier to plug-and-play with (check out our HF profile for other related datasets, such as QAMR, QADiscourse, and QA-Align).
QADiscourse annotates intra-sentential discourse relations with question-answer pairs. It focuses on discourse relations that carry information, rather than specifying structural or pragmatic properties of the realized sentences. Each question starts with one of 17 crafted question prefixes, roughly mapped into PDTB relation senses.
Note: Additional layers of QA-based semantic annotations, for adjectives and noun modifiers, are ongoing work, and we plan to add them in the future.
Demo
Check out the live QASem demo on Huggingface.
Installation
Pre-requisite: Python 3.7
Installation is available via pip:
pip install qasem
Installation from source
Clone the repo and install using setup.py:
git clone https://github.com/kleinay/QASem.git
cd QASem
pip install -e .
Alternatively, if you want to install the dependencies explicitly (the version specifier is quoted so that the shell does not treat ">" as a redirection):
pip install transformers==4.15.0 "spacy>=2.3.7" qanom
pip install git+https://github.com/rubenwol/RoleQGeneration.git
In addition, you need to download a spaCy model, which is a prerequisite for tokenization and POS tagging:
python -m spacy download en_core_web_sm
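Before running the pipeline, it can help to confirm that the pieces installed above are importable. The following is a minimal sketch, not part of the qasem package: the helper name missing_prerequisites is hypothetical, and it relies only on the fact that qasem, spacy and the en_core_web_sm model are installed as regular Python packages.

```python
import importlib.util
import sys


def missing_prerequisites(modules=("qasem", "spacy", "en_core_web_sm")):
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks that the packages are
    importable, not that their versions match the ones listed above.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    missing = missing_prerequisites()
    print("Missing prerequisites:", ", ".join(missing) if missing else "none")
```

If en_core_web_sm is reported as missing, rerun the spaCy download command above.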
Usage
The QASemEndToEndPipeline class parses sentences, on demand, with any of the QASem semantic annotation layers, currently 'qasrl', 'qanom' and 'qadiscourse'.
Features
Annotation layers:
By default, the pipeline parses all layers.
To specify a subset of desired layers, e.g. QASRL and QADiscourse alone, use annotation_layers=('qasrl', 'qadiscourse') in initialization.
QA-SRL contextualization:
For the sake of generality, QA-SRL and QANom generate "abstractive" questions that replace arguments with placeholders, e.g. "Why was someone interested in something?". However, in some use cases you might want a more natural question with contextualized arguments, e.g. "Why was the doctor interested in Luke 's treatment?". Using the model from Pyatkin et al., 2021, one can additionally get contextualized questions for QA-SRL and QANom by setting QASemEndToEndPipeline(contextualize=True) (see example below).
Nominal predicate detection:
nominalization_detection_threshold is the threshold for the nominalization detection model. It can be set globally in initialization and per __call__.
A higher threshold (e.g. 0.8) means capturing fewer nominal predicates, each with higher confidence of being, in context, a verb-derived event marker. The default threshold is 0.7.
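The effect of the threshold can be sketched as a plain filter over candidate nouns scored by the detector. This is an illustration only, not the library's implementation: the helper keep_confident_nominals is hypothetical, the score for "park" is invented, and whether the real comparison is inclusive (>=) is an assumption. The score for "treatment" is the one shown in the example output of this README.

```python
def keep_confident_nominals(candidates, threshold=0.7):
    """Keep candidates whose detector probability reaches the threshold.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    return [
        c for c in candidates
        if c["predicate_detector_probability"] >= threshold
    ]


candidates = [
    # Score taken from the example output in this README.
    {"predicate": "treatment", "predicate_detector_probability": 0.8152085542678833},
    # Invented score for a noun that is not a verb-derived event marker.
    {"predicate": "park", "predicate_detector_probability": 0.05},
]

for threshold in (0.7, 0.8, 0.9):
    kept = [c["predicate"] for c in keep_confident_nominals(candidates, threshold)]
    print(threshold, kept)
```

With these scores, "treatment" survives the thresholds 0.7 and 0.8 but is dropped at 0.9, which is the trade-off between coverage and confidence described above.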
OpenIE converter:
Set output_openie=True (in __call__) to get a reduction of the output QAs into the tuple format of Open Information Extraction. This option uses the qasem.openie_converter.OpenIEConverter class to linearize the arguments along with the predicate by their order of occurrence in the source sentence.
The pipeline's output is then in the form {"qasem": <regular QA outputs>, "openie": <OpenIE tuple outputs>}.
By default, only verbal QA-SRL QAs are converted, but one can specify layers_included=["qasrl", "qanom"] when initializing OpenIEConverter to also include nominalizations' QAs.
You can set arguments for OpenIEConverter in the QASemEndToEndPipeline constructor using the openie_converter_kwargs argument, e.g. QASemEndToEndPipeline(openie_converter_kwargs={"layers_included": ["qasrl", "qanom"]}).
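The idea of linearizing a predicate and its arguments by order of occurrence can be sketched as follows. This is a simplified illustration and not the OpenIEConverter implementation: the function linearize is hypothetical, it locates each span with a plain substring search (so repeated spans would be ambiguous), and the actual format of the "openie" output may differ.

```python
def linearize(sentence, predicate, answers):
    """Order the predicate and the answer spans by first occurrence in `sentence`.

    Hypothetical helper for illustration; spans are located by substring search.
    """
    positioned = []
    for span in [predicate, *answers]:
        start = sentence.find(span)
        if start == -1:
            raise ValueError(f"span not found in sentence: {span!r}")
        positioned.append((start, span))
    # Sorting by start offset yields the left-to-right order of the spans.
    return tuple(span for _, span in sorted(positioned))


sentence = "Tom brings the dog to the park."
# Answers in arbitrary order, as they might come from separate QAs.
answers = ["to the park", "Tom", "the dog"]
print(linearize(sentence, "brings", answers))
# -> ('Tom', 'brings', 'the dog', 'to the park')
```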
Example
from qasem.end_to_end_pipeline import QASemEndToEndPipeline
# Parse all three layers, with a threshold slightly stricter than the default and with contextualized questions.
pipe = QASemEndToEndPipeline(annotation_layers=('qasrl', 'qanom', 'qadiscourse'), nominalization_detection_threshold=0.75, contextualize=True)
sentences = ["The doctor was interested in Luke 's treatment as he was still not feeling well .", "Tom brings the dog to the park."]
# The pipeline returns one output dict per input sentence.
outputs = pipe(sentences)
print(outputs)
Outputs
[{'qanom': [
{'QAs': [{
'question': 'who was treated ?',
'answers': ['Luke'],
'contextual_question': 'Who was treated?'}],
'predicate_idx': 7,
'predicate': 'treatment',
'predicate_detector_probability': 0.8152085542678833,
'verb_form': 'treat'}
],
'qasrl': [
...
],
'qadiscourse': [{
'question': "What is the cause of the doctor being interested in Luke 's treatment?",
'answer': 'he was still not feeling well'}
]},
{'qanom': [],
'qasrl': [{'QAs': [
{'question': 'who brings something ?',
'answers': ['Tom'],
'contextual_question': 'Who brings the dog?'},
{'question': ' what does someone bring ?',
'answers': ['the dog'],
'contextual_question': 'What does Tom bring?'},
{'question': ' where does someone bring something ?',
'answers': ['to the park'],
'contextual_question': 'Where does Tom bring the dog?'}],
'predicate_idx': 1,
'predicate': 'brings',
'verb_form': 'bring'}],
'qadiscourse': []}
]
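To work with this output programmatically, one option is to flatten the nested structure into rows, one per question. The sketch below assumes only the keys visible in the example output above ('QAs', 'question', 'answers', 'contextual_question', 'predicate', and 'answer' for QADiscourse). The helper iter_qa_pairs is hypothetical and not part of the qasem package.

```python
def iter_qa_pairs(sentence_output, prefer_contextual=True):
    """Yield (layer, predicate, question, answers) rows for one sentence's output.

    Hypothetical helper; relies only on the keys shown in the example output.
    """
    for layer in ("qasrl", "qanom"):
        for pred in sentence_output.get(layer, []):
            for qa in pred["QAs"]:
                question = qa["question"]
                if prefer_contextual:
                    # Fall back to the abstractive question if no contextualized one exists.
                    question = qa.get("contextual_question", question)
                yield layer, pred["predicate"], question.strip(), qa["answers"]
    # QADiscourse QAs are not attached to a predicate and hold a single answer.
    for qa in sentence_output.get("qadiscourse", []):
        yield "qadiscourse", None, qa["question"].strip(), [qa["answer"]]


# The output for the second example sentence, copied from above.
sample = {
    "qanom": [],
    "qasrl": [{
        "QAs": [
            {"question": "who brings something ?", "answers": ["Tom"],
             "contextual_question": "Who brings the dog?"},
            {"question": " what does someone bring ?", "answers": ["the dog"],
             "contextual_question": "What does Tom bring?"},
            {"question": " where does someone bring something ?", "answers": ["to the park"],
             "contextual_question": "Where does Tom bring the dog?"},
        ],
        "predicate_idx": 1,
        "predicate": "brings",
        "verb_form": "bring",
    }],
    "qadiscourse": [],
}

for row in iter_qa_pairs(sample):
    print(row)
```

Passing prefer_contextual=False yields the abstractive questions instead, with surrounding whitespace stripped.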
Repository for Model Training & Experiments
The underlying QA-SRL and QANom models were trained and evaluated using the code in the qasrl-seq2seq repository.
The code for training and evaluating the QADiscourse model will be uploaded soon.
Cite
@article{klein2022qasem,
title={QASem Parsing: Text-to-text Modeling of QA-based Semantics},
author={Klein, Ayal and Hirsch, Eran and Eliav, Ron and Pyatkin, Valentina and Caciularu, Avi and Dagan, Ido},
journal={arXiv preprint arXiv:2205.11413},
year={2022}
}
Source Distribution
File details
Details for the file qasem-0.1.10.tar.gz.
File metadata
- Download URL: qasem-0.1.10.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e68b8f7b1f3e53a7f3c32ca39b3c8e29d68a4e1249e7165aecf3976b77a24a07
MD5 | d55914f1692fbd25c3c5ca411e1f9465
BLAKE2b-256 | ee6d9a452d79024183332f2b56b8ddb97057d3c8613ae7e2a7fffb056b76aa5d