package for QA-based Semantics - representing textual information via question-answer pairs

Project description

QASem - Question-Answer based Semantics

This repository includes software for parsing natural language sentences with various layers of QA-based semantic annotations. We currently support three layers of semantic annotations: QASRL, QANom, and QADiscourse. For an overview of our approach, see our paper on QASem Parsing.

QASRL (Question Answer driven Semantic Role Labeling) is a lightweight semantic framework for annotating "who did what to whom, how, when and where". For every verb in the sentence, it provides a set of question-answer pairs, where the answer marks a participant of the event denoted by the verb, while the question captures its semantic role (that is, the role the participant plays in the event).

"QANom" stands for "QASRL for Nominalizations", an adaptation of QASRL to (deverbal) nominalizations. See the QANom paper for details about the task.

You can find more information on QASRL's official website, including links to all the papers and datasets and a data browsing utility. We also wrapped the datasets into Huggingface Datasets (QASRL; QANom), which are easier to plug-and-play with (check out our HF profile for other related datasets, such as QAMR, QADiscourse, and QA-Align).

QADiscourse annotates intra-sentential discourse relations with question-answer pairs. It focuses on discourse relations that carry information, rather than on structural or pragmatic properties of the realized sentences. Each question starts with one of 17 crafted question prefixes, roughly mapped to PDTB relation senses.

Note: In the future, we will also add layers of QA-based semantic annotations for adjectives and noun modifiers, which are currently work in progress.

Demo

Check out the live QASem demo on Huggingface.

Installation

Pre-requisite: Python 3.7

We will soon release a first version to PyPI. In the meantime, the simplest way to get it working is to clone the repo and install using setup.py:

git clone https://github.com/kleinay/QASem.git
cd QASem
pip install -e .

Alternatively, if you want to install the dependencies explicitly (the version specifiers are quoted so that the shell does not treat ">" as a redirect):

pip install "transformers==4.15.0" "spacy>=2.3.7" qanom
pip install git+https://github.com/rubenwol/RoleQGeneration.git

In addition, you need to download a spaCy model, which is required for tokenization and POS tagging:

python -m spacy download en_core_web_sm
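
To confirm that the steps above took effect, a small check like the following can help. It is only a sketch using the standard library: the module names are the ones installed above, and missing_modules is a hypothetical helper, not part of QASem.

```python
import importlib.util

# Top-level modules installed by the commands above. The spaCy model is
# importable as a regular package once `spacy download` has run.
REQUIRED = ["transformers", "spacy", "qanom", "en_core_web_sm"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All set." if not missing else f"Missing: {', '.join(missing)}")
```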

Usage

The QASemEndToEndPipeline class parses sentences, on demand, with any of the QASem semantic annotation layers, currently 'qasrl', 'qanom' and 'qadiscourse'.

Features

Annotation layers: By default, the pipeline parses all layers. To specify a subset of layers, e.g. QASRL and QADiscourse alone, pass annotation_layers=('qasrl', 'qadiscourse') at initialization.

QA-SRL contextualization: For the sake of generality, QA-SRL and QANom generate "abstractive" questions that replace arguments with placeholders, e.g. "Why was someone interested in something?". In some use cases, however, you might want a more natural question with contextualized arguments, e.g. "Why was the doctor interested in Luke 's treatment?". Using the model from Pyatkin et al., 2021, one can additionally get contextualized questions for QA-SRL and QANom by setting QASemEndToEndPipeline(contextualize=True) (see example below).

Nominal predicate detection: nominalization_detection_threshold, which can be set globally at initialization and per __call__, is the threshold for the nominalization detection model. A higher threshold (e.g. 0.8) captures fewer nominal predicates, each with higher confidence of being, in context, a verb-derived event marker. The default threshold is 0.7.
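
The effect of the threshold can be illustrated on already-parsed output. The sketch below filters QANom entries by their predicate_detector_probability field (the field name is taken from the example output on this page). filter_nominal_predicates is a hypothetical helper, and the >= comparison is an assumption: the pipeline applies its threshold internally and may compare differently.

```python
def filter_nominal_predicates(qanom_entries, threshold=0.7):
    """Keep only nominal predicates whose detector probability reaches `threshold`.

    Assumption: each entry is a dict with a 'predicate_detector_probability'
    key, as in the pipeline output shown on this page.
    """
    return [
        entry
        for entry in qanom_entries
        if entry.get("predicate_detector_probability", 0.0) >= threshold
    ]


entries = [{"predicate": "treatment",
            "predicate_detector_probability": 0.8152085542678833}]
print(len(filter_nominal_predicates(entries, threshold=0.8)))   # 1
print(len(filter_nominal_predicates(entries, threshold=0.85)))  # 0
```

Post-hoc filtering of this kind can only tighten the threshold the pipeline already ran with; predicates rejected at parse time are not in the output to begin with.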

OpenIE converter: Set output_openie=True (in __call__) to get a reduction of the output QAs into the tuple format of Open Information Extraction. This option uses the qasem.openie_converter.OpenIEConverter class to linearize the arguments along with the predicate by the order of occurrence in the source sentence. The pipeline's output is then of the form {"qasem": <regular QA outputs>, "openie": <OpenIE tuple outputs>}.

By default, only verbal QA-SRL QAs are converted, but one can also specify layers_included=["qasrl", "qanom"] when initializing OpenIEConverter to include the nominalizations' QAs as well. You can set arguments for OpenIEConverter in the QASemEndToEndPipeline constructor using the openie_converter_kwargs argument, e.g. QASemEndToEndPipeline(openie_converter_kwargs={"layers_included": ["qasrl", "qanom"]}).
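
To make "linearize by order of occurrence" concrete, here is a rough sketch of the idea. It is not the OpenIEConverter implementation: find_span and linearize are hypothetical helpers, the sentence is assumed to be whitespace-tokenized, and the converter's actual tuple format may differ.

```python
def find_span(tokens, span):
    """Return the index of the first occurrence of `span` in `tokens`, or None."""
    if not span:
        return None
    n = len(span)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == span:
            return i
    return None


def linearize(tokens, predicate_idx, answers):
    """Order the predicate and its answer spans by position in the sentence."""
    items = [(predicate_idx, tokens[predicate_idx])]
    for answer in answers:
        start = find_span(tokens, answer.split())
        if start is not None:  # skip answers that cannot be located
            items.append((start, answer))
    items.sort(key=lambda item: item[0])
    return tuple(text for _, text in items)


tokens = "Tom brings the dog to the park .".split()
print(linearize(tokens, 1, ["to the park", "Tom", "the dog"]))
# ('Tom', 'brings', 'the dog', 'to the park')
```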

Example

from qasem.end_to_end_pipeline import QASemEndToEndPipeline

# Parse all three layers and also generate contextualized questions.
pipe = QASemEndToEndPipeline(annotation_layers=('qasrl', 'qanom', 'qadiscourse'), nominalization_detection_threshold=0.75, contextualize=True)
sentences = ["The doctor was interested in Luke 's treatment as he was still not feeling well .", "Tom brings the dog to the park."]
outputs = pipe(sentences)  # a list with one output dict per sentence

print(outputs)

Outputs

[{'qanom': [
  {'QAs': [{
     'question': 'who was treated ?',
     'answers': ['Luke'],
     'contextual_question': 'Who was treated?'}],
   'predicate_idx': 7,
   'predicate': 'treatment',
   'predicate_detector_probability': 0.8152085542678833,
   'verb_form': 'treat'}
 ],
 'qasrl': [
   ...
 ],
 'qadiscourse': [{
   'question': "What is the cause of the doctor being interested in Luke 's treatment?",
   'answer': 'he was still not feeling well'}
 ]
},

{'qanom': [],
 'qasrl': [{'QAs': [
    {'question': 'who brings something ?',
     'answers': ['Tom'],
     'contextual_question': 'Who brings the dog?'},
    {'question': ' what does someone bring ?',
     'answers': ['the dog'],
     'contextual_question': 'What does Tom bring?'},
    {'question': ' where does someone bring something ?',
     'answers': ['to the park'],
     'contextual_question': 'Where does Tom bring the dog?'}],
   'predicate_idx': 1,
   'predicate': 'brings',
   'verb_form': 'bring'}
 ],
 'qadiscourse': []
}]
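
For downstream use, the nested output can be flattened into plain question-answer records. The following is a minimal sketch that assumes the output shape shown above (QASRL and QANom entries carry a 'QAs' list, QADiscourse entries carry a single 'question' and 'answer'). iter_qa_pairs is a hypothetical helper, not part of the package.

```python
def iter_qa_pairs(sentence_output):
    """Yield (layer, predicate, question, answers) tuples for one sentence."""
    for layer in ("qasrl", "qanom"):
        for pred in sentence_output.get(layer, []):
            for qa in pred["QAs"]:
                # Strip the stray whitespace some generated questions carry.
                yield layer, pred["predicate"], qa["question"].strip(), qa["answers"]
    for qa in sentence_output.get("qadiscourse", []):
        # Discourse QAs are not tied to a single predicate.
        yield "qadiscourse", None, qa["question"], [qa["answer"]]


example = {
    "qanom": [],
    "qasrl": [{
        "QAs": [{"question": " what does someone bring ?",
                 "answers": ["the dog"],
                 "contextual_question": "What does Tom bring?"}],
        "predicate_idx": 1,
        "predicate": "brings",
        "verb_form": "bring",
    }],
    "qadiscourse": [],
}

for layer, predicate, question, answers in iter_qa_pairs(example):
    print(layer, predicate, question, answers)
```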

Repository for Model Training & Experiments

The underlying QA-SRL and QANom models were trained and evaluated using the code in the qasrl-seq2seq repository.

The code for training and evaluating the QADiscourse model will be uploaded soon.

Download files


Source Distribution

qasem-0.1.0.tar.gz (13.1 kB view details)

Uploaded Source

File details

Details for the file qasem-0.1.0.tar.gz.

File metadata

  • Download URL: qasem-0.1.0.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.5

File hashes

Hashes for qasem-0.1.0.tar.gz
Algorithm Hash digest
SHA256 284883c7583968adaa15f16642f7eb60cdf20ec995e7136aff1eb05e4a0f180d
MD5 1dfd3f30f1a25de4c9107904c9f64bb4
BLAKE2b-256 9fcc0cd00ecf3a8d7a3072041a6adea9de4cf4ea3c148779a3dc3581f4151f18
