package for Question-Answer driven Semantic Role Labeling for Nominalizations (QANom)

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

QANom - Annotating Nominal Predicates with QA-SRL

QANom is a research project aiming for a natural representation of the predicate-argument relations of nominalizations. It extends the Question-Answer driven Semantic Role Labeling (QASRL) framework (see website), which tackled verbal predicates, to the more challenging space of deverbal nominalizations.

This repository is the reference point for the data and software described in the paper QANom: Question-Answer driven SRL for Nominalizations (COLING 2020). To find information for replicating the work described by the QANom paper (crowdsourcing a QANom dataset, identifying nominalization candidates, training and evaluating the baseline models), please refer to the paper_reference_readme.md.

The repository also contains software for using QANom downstream. This mainly includes pipelines for easy use of the nominalization detection model and of the QANom parsers. This README will guide you through using this software.

Pre-requisite

Python 3.7

Installation

From PyPI: pip install qanom

If you want to install from source, clone this repository and then install requirements:

git clone https://github.com/kleinay/QANom.git
cd QANom
pip install -r requirements.txt

End-to-End Pipeline

If you wish to parse sentences with QANom, the best place to start is the QANomEndToEndPipeline class from the qanom.qanom_end_to_end_pipeline module.

The pipeline first runs the Nominalization Detector to identify the nominal predicates in the sentence (see demo). It then sends each nominal predicate to the QANom-Seq2Seq model (see demo), which parses it with Question-Answer driven Semantic Role Labeling (QASRL).

Usage Example

from qanom.qanom_end_to_end_pipeline import QANomEndToEndPipeline
pipe = QANomEndToEndPipeline(detection_threshold=0.75)
sentence = "The construction of the officer 's building finished right after the beginning of the destruction of the previous construction ."
print(pipe([sentence]))

Output:

[[{'QAs': [{'question': 'what was constructed ?',
     'answers': ["the officer 's"]}],
   'predicate_idx': 1,
   'predicate': 'construction',
   'predicate_detector_probability': 0.7623529434204102,
   'verb_form': 'construct'},
  {'QAs': [{'question': 'what began ?',
     'answers': ['the destruction of the']}],
   'predicate_idx': 11,
   'predicate': 'beginning',
   'predicate_detector_probability': 0.8923847675323486,
   'verb_form': 'begin'},
  {'QAs': [{'question': 'what was destructed ?', 
     'answers': ['the previous']}],
   'predicate_idx': 14,
   'predicate': 'destruction',
   'predicate_detector_probability': 0.849774956703186,
   'verb_form': 'destruct'}]]
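
The nested output above (one list per sentence, one dict per detected predicate) is often easier to consume as flat rows. The following is a minimal sketch and not part of the qanom package: `flatten_qas` is a hypothetical helper name, and it assumes only the dictionary keys shown in the output above.

```python
from typing import Dict, List, Tuple

# A trimmed copy of the example output above (first two predicates only).
example_output: List[List[Dict]] = [[
    {"QAs": [{"question": "what was constructed ?",
              "answers": ["the officer 's"]}],
     "predicate_idx": 1,
     "predicate": "construction",
     "predicate_detector_probability": 0.7623529434204102,
     "verb_form": "construct"},
    {"QAs": [{"question": "what began ?",
              "answers": ["the destruction of the"]}],
     "predicate_idx": 11,
     "predicate": "beginning",
     "predicate_detector_probability": 0.8923847675323486,
     "verb_form": "begin"},
]]


def flatten_qas(pipeline_output: List[List[Dict]]) -> List[Tuple[int, int, str, str, str]]:
    """Hypothetical helper: one row per (sentence, predicate, question, answer)."""
    rows = []
    for sent_idx, predicates in enumerate(pipeline_output):
        for pred in predicates:
            for qa in pred["QAs"]:
                for answer in qa["answers"]:
                    rows.append((sent_idx, pred["predicate_idx"],
                                 pred["predicate"], qa["question"], answer))
    return rows


rows = flatten_qas(example_output)
for row in rows:
    print(row)
```

Rows in this shape can be written to CSV or loaded into a table without further unpacking.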

Nominalization Detection Model

This model identifies "predicative nominalizations", that is, nominalizations that carry an eventive (or "verbal") meaning in context. It is a bert-base-cased pretrained model, fine-tuned for token classification on the "nominalization detection" task as defined and annotated by the QANom project.

The model is trained as a binary classifier over candidate nominalizations. The candidates are extracted using a POS tagger (keeping common nouns) together with lexical resources (e.g. WordNet and CatVar), keeping only nouns that have at least one derivationally related verb. In the QANom annotation project, these candidates are given to annotators, who decide whether they carry a "verbal" meaning in the context of the sentence. The current model reproduces this binary classification.
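
The two-step candidate filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: the POS tags are hand-assigned, `TOY_NOUN_TO_VERB` is a hypothetical stand-in for a lexical resource such as WordNet or CatVar, and `extract_candidates` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a lexical resource (e.g. WordNet or CatVar):
# maps a noun to one derivationally related verb.
TOY_NOUN_TO_VERB: Dict[str, str] = {
    "construction": "construct",
    "officer": "officer",
    "building": "build",
    "beginning": "begin",
    "destruction": "destruct",
}

# Penn Treebank tags for common nouns (singular and plural).
COMMON_NOUN_TAGS = {"NN", "NNS"}


def extract_candidates(tagged_tokens: List[Tuple[str, str]]) -> List[Dict]:
    """Keep common nouns that have a related verb in the toy lexicon."""
    candidates = []
    for idx, (token, pos) in enumerate(tagged_tokens):
        if pos not in COMMON_NOUN_TAGS:
            continue  # step 1: POS filter
        verb = TOY_NOUN_TO_VERB.get(token.lower())
        if verb is None:
            continue  # step 2: lexical-resource filter
        candidates.append(
            {"predicate_idx": idx, "predicate": token, "verb_form": verb})
    return candidates


# Hand-assigned (assumed) POS tags for the example sentence used in this README.
tagged = [
    ("The", "DT"), ("construction", "NN"), ("of", "IN"), ("the", "DT"),
    ("officer", "NN"), ("'s", "POS"), ("building", "NN"), ("finished", "VBD"),
    ("right", "RB"), ("after", "IN"), ("the", "DT"), ("beginning", "NN"),
    ("of", "IN"), ("the", "DT"), ("destruction", "NN"), ("of", "IN"),
    ("the", "DT"), ("previous", "JJ"), ("construction", "NN"), (".", "."),
]

candidates = extract_candidates(tagged)
print([c["predicate_idx"] for c in candidates])
```

Each surviving candidate would then be passed to the binary classifier, which decides whether it is used with a verbal meaning in this sentence.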

Under the hood, the NominalizationDetector class encapsulates the full nominalization detection pipeline (i.e. candidate extraction + predicate classification). It leverages the qanom.candidate_extraction.candidate_extraction module, and additionally downloads and wraps the nominalization-candidate-classifier model, hosted on the Hugging Face Model Hub.

Usage Example

from qanom.nominalization_detector import NominalizationDetector
detector = NominalizationDetector()

raw_sentences = ["The construction of the officer 's building finished right after the beginning of the destruction of the previous construction ."]

print(detector(raw_sentences, return_all_candidates=True))
print(detector(raw_sentences, threshold=0.75, return_probability=False))

Outputs:

[[{'predicate_idx': 1,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7626778483390808,
   'verb_form': 'construct'},
  {'predicate_idx': 4,
   'predicate': 'officer',
   'predicate_detector_prediction': False,
   'predicate_detector_probability': 0.19832570850849152,
   'verb_form': 'officer'},
  {'predicate_idx': 6,
   'predicate': 'building',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.5794129371643066,
   'verb_form': 'build'},
  {'predicate_idx': 11,
   'predicate': 'beginning',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8937646150588989,
   'verb_form': 'begin'},
  {'predicate_idx': 14,
   'predicate': 'destruction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8501205444335938,
   'verb_form': 'destruct'},
  {'predicate_idx': 18,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7022264003753662,
   'verb_form': 'construct'}]]

[[{'predicate_idx': 1, 'predicate': 'construction', 'verb_form': 'construct'},
  {'predicate_idx': 11, 'predicate': 'beginning', 'verb_form': 'begin'},
  {'predicate_idx': 14, 'predicate': 'destruction', 'verb_form': 'destruct'}]]
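
The relation between the two outputs above can be reproduced by hand: keeping the candidates whose probability reaches the threshold and dropping the probability fields yields the second list. The snippet below is a sketch of that relation only. `filter_by_threshold` is a hypothetical helper, and the use of `>=` (rather than `>`) is an assumption about how the threshold is applied.

```python
from typing import Dict, List

# Probabilities copied from the first output above.
all_candidates: List[Dict] = [
    {"predicate_idx": 1, "predicate": "construction",
     "predicate_detector_probability": 0.7626778483390808, "verb_form": "construct"},
    {"predicate_idx": 4, "predicate": "officer",
     "predicate_detector_probability": 0.19832570850849152, "verb_form": "officer"},
    {"predicate_idx": 6, "predicate": "building",
     "predicate_detector_probability": 0.5794129371643066, "verb_form": "build"},
    {"predicate_idx": 11, "predicate": "beginning",
     "predicate_detector_probability": 0.8937646150588989, "verb_form": "begin"},
    {"predicate_idx": 14, "predicate": "destruction",
     "predicate_detector_probability": 0.8501205444335938, "verb_form": "destruct"},
    {"predicate_idx": 18, "predicate": "construction",
     "predicate_detector_probability": 0.7022264003753662, "verb_form": "construct"},
]


def filter_by_threshold(candidates: List[Dict], threshold: float) -> List[Dict]:
    """Hypothetical helper: keep candidates at or above the threshold,
    returning only the index, surface form and verb form."""
    kept = []
    for cand in candidates:
        if cand["predicate_detector_probability"] >= threshold:
            kept.append({key: cand[key]
                         for key in ("predicate_idx", "predicate", "verb_form")})
    return kept


selected = filter_by_threshold(all_candidates, threshold=0.75)
print(selected)
```

With a threshold of 0.75, the second "construction" (index 18, probability about 0.70) and "building" (about 0.58) are dropped, matching the second output above.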

SpaCy Custom Component 'nominalization_detector'

If you are using spaCy, you can easily plug our nominalization detection algorithm into the spaCy pipeline as a custom component. Import the qanom.spacy_component_nominalization_detector module to have our "nominalization_detector" component registered with spaCy.

For example:

import spacy
from qanom.spacy_component_nominalization_detector import *  # registers the "nominalization_detector" component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("nominalization_detector", after="tagger",
             config={"threshold": 0.7, "device": -1})  # you may specify config settings or keep these defaults
# Now your `nlp` pipeline also identifies verbal nominalizations:
doc = nlp("The medical student asked about the progress in Luke's treatment.")
print(doc._.nominalizations)  # a Doc extension attribute with the list of tokens identified as verbal nominalizations
print([(nn.text, nn._.verb_form, nn._.is_nominalization_confidence) for nn in doc._.nominalizations])  # Token extension attributes

[progress, treatment]
[('progress', 'progress', 0.8063599467277527),
 ('treatment', 'treat', 0.8211929798126221)]

QANom Sequence-to-Sequence Models

We have fine-tuned T5, a pretrained sequence-to-sequence language model, on the task of parsing QANom QAs. Given a sentence and a highlighted nominal predicate, the models produce an output sequence consisting of the QANom-formatted question-answer pairs for this predicate.

We currently have two models:

qanom-seq2seq-model-baseline (HF repo) - trained only on the QANom dataset. Performance: 57.6 Unlabeled Arg F1, 34.9 Labeled Arg F1.
qanom-seq2seq-model-joint (HF repo) - trained jointly on the QANom and verbal QASRL datasets. Performance: 60.1 Unlabeled Arg F1, 40.6 Labeled Arg F1.

We provide the QASRL_Pipeline class (in the qanom.qasrl_seq2seq_pipeline module), which is a Hugging Face Pipeline for applying the models out of the box to new texts:

from qanom.qasrl_seq2seq_pipeline import QASRL_Pipeline
pipe = QASRL_Pipeline("kleinay/qanom-seq2seq-model-baseline")
pipe("The student was interested in Luke 's <predicate> research about see animals .", verb_form="research", predicate_type="nominal")

Which will output:

[{'generated_text': 'who _ _ researched something _ _ ?<extra_id_7> Luke', 
  'QAs': [{'question': 'who researched something ?', 'answers': ['Luke']}]}]
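
To make the relation between 'generated_text' and 'QAs' in the output above concrete, here is a rough sketch that recovers the single question-answer pair from the raw string. It is not the pipeline's own post-processing: `parse_single_qa` is a hypothetical helper, and it assumes only what the example shows, namely that `_` marks an empty question slot and that `<extra_id_7>` separates the question from its answer. How multiple pairs or multiple answers are delimited is not shown here, so the sketch does not handle them.

```python
from typing import Dict, List, Union

# Separator observed between question and answer in the example output above.
QA_SEPARATOR = "<extra_id_7>"


def parse_single_qa(generated_text: str) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: split one generated sequence into question and answer."""
    question_part, _, answer_part = generated_text.partition(QA_SEPARATOR)
    # Drop the "_" placeholders that stand for empty question slots.
    question = " ".join(tok for tok in question_part.split() if tok != "_")
    answer = answer_part.strip()
    return {"question": question, "answers": [answer] if answer else []}


qa = parse_single_qa("who _ _ researched something _ _ ?<extra_id_7> Luke")
print(qa)
```

In practice the 'QAs' field returned by the pipeline already contains the parsed pairs, so a helper like this is only useful for inspecting the raw generation.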

You can learn more about using transformers.pipelines in the official docs.

Notice that you need to specify which word in the sentence is the predicate that the questions will ask about. By default, you should precede the predicate with the <predicate> symbol, but you can also specify your own predicate marker:

pipe("The student was interested in Luke 's <PRED> research about see animals .", verb_form="research", predicate_type="nominal", predicate_marker="<PRED>")
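
If you already know the token index of the predicate (for example, the predicate_idx reported by the nominalization detector), the marked input string can be built programmatically. The helper below is a small sketch under that assumption: `mark_predicate` is a hypothetical name, not a function provided by the package, and it assumes whitespace-tokenized input.

```python
from typing import List


def mark_predicate(tokens: List[str], predicate_idx: int,
                   marker: str = "<predicate>") -> str:
    """Hypothetical helper: insert `marker` right before the predicate token."""
    if not 0 <= predicate_idx < len(tokens):
        raise IndexError(f"predicate_idx {predicate_idx} is out of range")
    marked = tokens[:predicate_idx] + [marker] + tokens[predicate_idx:]
    return " ".join(marked)


tokens = "The student was interested in Luke 's research about see animals .".split()

# "research" is token number 7 (0-based) in this sentence.
marked = mark_predicate(tokens, 7)
custom = mark_predicate(tokens, 7, marker="<PRED>")
print(marked)
print(custom)
```

The resulting strings have the same form as the inputs passed to the pipeline in the examples above.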

In addition, you can pass extra kwargs to control the model's decoding algorithm:

pipe("The student was interested in Luke 's <predicate> research about see animals .", verb_form="research", predicate_type="nominal", num_beams=3)

Cite

@inproceedings{klein2020qanom,
 title={QANom: Question-Answer driven SRL for Nominalizations},
 author={Klein, Ayal and Mamou, Jonathan and Pyatkin, Valentina and Stepanov, Daniela and He, Hangfeng and Roth, Dan and Zettlemoyer, Luke and Dagan, Ido},
 booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
 pages={3069--3083},
 year={2020}
}

Release history

This version

0.0.33

Jul 18, 2023

0.0.32

May 8, 2023

0.0.31

Jan 31, 2023

0.0.30

Jan 4, 2023

0.0.29

Dec 5, 2022

0.0.28

Nov 21, 2022

0.0.27

Nov 21, 2022

0.0.26

Jul 18, 2022

0.0.25

May 23, 2022

0.0.24

May 10, 2022

0.0.23

Apr 13, 2022

0.0.22

Apr 4, 2022

0.0.21

Feb 15, 2022

0.0.20

Feb 14, 2022

0.0.12

Jan 11, 2022

0.0.11

Jan 11, 2022

0.0.10

Jan 10, 2022

0.0.9

Jan 9, 2022

0.0.6

Dec 2, 2021

0.0.5

Dec 2, 2021

0.0.4

Dec 2, 2021

0.0.3

Sep 30, 2021

0.0.2

Sep 5, 2021

0.0.1 yanked

Sep 5, 2021

Reason this release was yanked:

have

Download files

Source Distribution

qanom-0.0.33.tar.gz (1.1 MB)

Uploaded Jul 18, 2023 Source

File details

Details for the file qanom-0.0.33.tar.gz.

File metadata

Download URL: qanom-0.0.33.tar.gz
Upload date: Jul 18, 2023
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.5

File hashes

Hashes for qanom-0.0.33.tar.gz
Algorithm	Hash digest
SHA256	`7952516ed25937ae33f02f82c717c183c6ffbe6535724ac09e57b3f6f7c93b83`
MD5	`3d1cdd31ed029be12472dcba68fe9447`
BLAKE2b-256	`00951e3bc441a156fab50cce36222b52f884f892666c0b06b38efc8ccbdcb837`
