Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.

QaNER

You can adopt this pipeline for arbitrary BIO-markup data.
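
To make "BIO-markup" concrete, here is a minimal sketch of how BIO tags group tokens into labeled entities. The function name `bio_to_spans` and the in-memory token/tag lists are illustrative assumptions; the on-disk layout of the `.bio` files and the package's own parsing code are not shown in this README.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs.

    Hypothetical helper for illustration; not part of the qaner API.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open entity.
            current_tokens.append(token)
        else:
            # An "O" tag, or an I- tag that does not continue the open
            # entity, closes it; stray I- tags are ignored in this sketch.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


tokens = "EU rejects German call to boycott British lamb .".split()
tags = ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"]
print(bio_to_spans(tokens, tags))
```

Any dataset whose labels follow this B-/I-/O scheme can be expressed the same way, whatever the label set is.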

Installation

pip install qaner

CoNLL-2003

Pipeline results on CoNLL-2003 dataset:

How to use

Training

Script for training a QaNER model:

qaner-train \
--bert_model_name 'bert-base-uncased' \
--path_to_prompt_mapper 'data/conll2003/prompt_mapper.json' \
--path_to_train_data 'data/conll2003/train.bio' \
--path_to_test_data 'data/conll2003/test.bio' \
--path_to_save_model 'dayyass/qaner-conll-bert-base-uncased' \
--n_epochs 2 \
--batch_size 128 \
--learning_rate 1e-5 \
--seed 42 \
--log_dir 'runs/qaner'

Required arguments:

  • --bert_model_name - base BERT model for QaNER fine-tuning
  • --path_to_prompt_mapper - path to prompt mapper json file
  • --path_to_train_data - path to train data (BIO-markup)
  • --path_to_test_data - path to test data (BIO-markup)
  • --path_to_save_model - path to save trained QaNER model
  • --n_epochs - number of epochs to fine-tune
  • --batch_size - batch size
  • --learning_rate - learning rate

Optional arguments:

  • --seed - random seed for reproducibility (default: 42)
  • --log_dir - TensorBoard log directory (default: 'runs/qaner')

Inference

Script for running inference with a trained QaNER model:

qaner-inference \
--context 'EU rejects German call to boycott British lamb .' \
--question 'What is the organization?' \
--path_to_prompt_mapper 'data/conll2003/prompt_mapper.json' \
--path_to_trained_model 'dayyass/qaner-conll-bert-base-uncased' \
--n_best_size 1 \
--max_answer_length 100 \
--seed 42

Result:

question: What is the organization?

context: EU rejects German call to boycott British lamb .

answer: [Span(token='EU', label='ORG', start_context_char_pos=0, end_context_char_pos=2)]
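
The character positions in the result index directly into the context string. The sketch below uses a stand-in `Span` dataclass that only mirrors the printed fields (it is an assumption, not the package's class) and treats the end position as exclusive, which is consistent with 'EU' spanning 0 to 2.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in mirroring the fields printed above; not imported from qaner."""

    token: str
    label: str
    start_context_char_pos: int
    end_context_char_pos: int


context = "EU rejects German call to boycott British lamb ."
span = Span(token="EU", label="ORG", start_context_char_pos=0, end_context_char_pos=2)

# Slice the context with the reported positions (end treated as exclusive).
extracted = context[span.start_context_char_pos:span.end_context_char_pos]
print(extracted, span.label)
```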

Required arguments:

  • --context - sentence to extract entities from
  • --question - question prompt with entity name to extract (examples below)
  • --path_to_prompt_mapper - path to prompt mapper json file
  • --path_to_trained_model - path to trained QaNER model
  • --n_best_size - number of best QA answers to consider

Optional arguments:

  • --max_answer_length - maximum entity length; longer candidate answers are discarded (default: 100)
  • --seed - random seed for reproducibility (default: 42)
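
For intuition about `--n_best_size` and `--max_answer_length`, the following is a sketch of the post-processing commonly used in extractive QA: take the top-scoring start and end positions, pair them, drop invalid or overly long spans, and keep the best few. It assumes per-token start/end scores and a length measured in tokens; the function name `best_spans` is hypothetical, and the package's actual selection logic may differ.

```python
from typing import List, Tuple


def best_spans(
    start_scores: List[float],
    end_scores: List[float],
    n_best_size: int,
    max_answer_length: int,
) -> List[Tuple[int, int, float]]:
    """Return up to n_best_size (start, end, score) candidates, best first.

    Illustrative sketch only; lengths are counted in tokens (an assumption).
    """
    # Indices of the n_best_size highest start and end scores.
    top_starts = sorted(
        range(len(start_scores)), key=lambda i: start_scores[i], reverse=True
    )[:n_best_size]
    top_ends = sorted(
        range(len(end_scores)), key=lambda i: end_scores[i], reverse=True
    )[:n_best_size]

    candidates = []
    for start in top_starts:
        for end in top_ends:
            if end < start:
                continue  # the span would end before it starts
            if end - start + 1 > max_answer_length:
                continue  # the span is longer than allowed
            candidates.append((start, end, start_scores[start] + end_scores[end]))

    # Rank the surviving spans by combined score.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:n_best_size]


start_scores = [5.0, 0.1, 2.0, 0.2]
end_scores = [4.0, 0.3, 0.1, 3.0]
print(best_spans(start_scores, end_scores, n_best_size=2, max_answer_length=2))
```

With a larger `n_best_size`, more start/end pairs are considered, so a sentence containing several entities of the same type can yield more than one answer.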

Possible inference questions for CoNLL-2003:

  • What is the location? (LOC)
  • What is the person? (PER)
  • What is the organization? (ORG)
  • What is the miscellaneous entity? (MISC)
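
All four questions follow the same "What is the ...?" template, so they can be generated from a label-to-phrase mapping. The dictionary below is an illustrative assumption built from the list above; the actual schema of `prompt_mapper.json` is not shown in this README.

```python
import json

# Label-to-phrase mapping inferred from the questions listed above
# (illustrative; not necessarily the contents of prompt_mapper.json).
label_to_phrase = {
    "LOC": "location",
    "PER": "person",
    "ORG": "organization",
    "MISC": "miscellaneous entity",
}


def build_question(phrase: str) -> str:
    """Fill the question template with an entity description."""
    return f"What is the {phrase}?"


questions = {label: build_question(phrase) for label, phrase in label_to_phrase.items()}
print(json.dumps(questions, indent=2))
```

To extract every entity type from a sentence, one would run inference once per question with the same context.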

Requirements

Python >= 3.7

Citation

@misc{liu2022qaner,
    title         = {QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition},
    author        = {Andy T. Liu and Wei Xiao and Henghui Zhu and Dejiao Zhang and Shang-Wen Li and Andrew Arnold},
    year          = {2022},
    eprint        = {2203.01543},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}
