
A package for NER evaluation

Project description

eval4ner: An All-Round Evaluation for Named Entity Recognition



This is a Python toolkit implementing the MUC-5 evaluation metrics for Named Entity Recognition (NER) results.

TL;DR

It considers not only strict matching, i.e., extracted entities must be correct with respect to both boundary and type, but also partial matching, summarized in the following four modes:

  • Strict: exact match (both entity boundary and type are correct)
  • Exact boundary matching: the predicted entity boundary is correct, regardless of entity type
  • Partial boundary matching: the entity boundaries overlap, regardless of entity type
  • Type matching: the entity type is correct and some overlap between the system-tagged entity and the gold annotation is required

Refer to the blog post Evaluation Metrics of Named Entity Recognition for an explanation of the MUC metrics.
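To make the four modes concrete, here is a minimal, self-contained sketch of how one gold/prediction pair could be labelled under each mode. This is an illustration, not eval4ner's actual implementation: the muc_labels helper is a name introduced here, and "overlap" is approximated by substring containment, which is enough for the scenarios in the table below.

import pprint

def muc_labels(gold, pred):
    """Label one (gold, prediction) pair under the four MUC modes.
    gold and pred are (entity_type, surface_string) tuples."""
    gold_type, gold_span = gold
    pred_type, pred_span = pred
    same_type = gold_type == pred_type
    same_span = gold_span == pred_span
    # crude overlap test: one surface string contains the other
    overlap = gold_span in pred_span or pred_span in gold_span
    return {
        # strict: both type and boundary must match exactly
        'strict':  'COR' if same_type and same_span else 'INC',
        # exact: boundary must match exactly, type is ignored
        'exact':   'COR' if same_span else 'INC',
        # partial: overlapping boundary counts as PAR, type is ignored
        'partial': 'COR' if same_span else ('PAR' if overlap else 'INC'),
        # type: type must match and boundaries must at least overlap
        'type':    'COR' if same_type and overlap else 'INC',
    }

# Scenario V from the table below: correct type, overlapping boundary
pprint.pprint(muc_labels(('MUSIC_NAME', '告白气球'), ('MUSIC_NAME', '一首告白气球')))
# {'exact': 'INC', 'partial': 'PAR', 'strict': 'INC', 'type': 'COR'}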

Preliminaries for NER Evaluation

In research and production, the following scenarios frequently occur when comparing NER system predictions against the gold standard:

| Scenario | Gold Entity Type | Gold Entity Boundary (Surface String) | Predicted Entity Type | Predicted Entity Boundary (Surface String) | Type | Partial | Exact | Strict |
|----------|------------------|---------------------------------------|-----------------------|--------------------------------------------|------|---------|-------|--------|
| I   | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球     | COR | COR | COR | COR |
| II  |            |          | MUSIC_NAME | 年轮         | SPU | SPU | SPU | SPU |
| III | MUSIC_NAME | 告白气球 |            |              | MIS | MIS | MIS | MIS |
| IV  | MUSIC_NAME | 告白气球 | SINGER     | 告白气球     | INC | COR | COR | INC |
| V   | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC |
| VI  | MUSIC_NAME | 告白气球 | SINGER     | 一首告白气球 | INC | PAR | INC | INC |

Thus, MUC-5 takes all of these scenarios into account for a comprehensive evaluation.

Then we can compute:

Number of gold-standard entities:

    POSSIBLE = COR + INC + PAR + MIS

Number of predicted entities:

    ACTUAL = COR + INC + PAR + SPU

Precision and recall for the exact-match and partial-match modes are then defined as follows.

Exact match (i.e. Strict, Exact):

    Precision = COR / ACTUAL
    Recall = COR / POSSIBLE

Partial match (i.e. Partial, Type):

    Precision = (COR + 0.5 × PAR) / ACTUAL
    Recall = (COR + 0.5 × PAR) / POSSIBLE

F-measure:

    F1 = 2 × Precision × Recall / (Precision + Recall)

Applying these formulas to the six scenarios above gives:

| Measure   | Type | Partial | Exact | Strict |
|-----------|------|---------|-------|--------|
| Correct   | 2    | 2       | 2     | 1      |
| Incorrect | 2    | 0       | 2     | 3      |
| Partial   | 0    | 2       | 0     | 0      |
| Missed    | 1    | 1       | 1     | 1      |
| Spurious  | 1    | 1       | 1     | 1      |
| Precision | 0.4  | 0.6     | 0.4   | 0.2    |
| Recall    | 0.4  | 0.6     | 0.4   | 0.2    |
| F1 score  | 0.4  | 0.6     | 0.4   | 0.2    |
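As a sanity check, a short sketch that plugs the counts from the table above into the formulas from the previous section; the prf helper is a name introduced here, not part of eval4ner:

def prf(cor, inc, par, mis, spu, partial_weight=0.0):
    """Precision/recall/F1 from MUC counts.
    partial_weight is 0 for the exact-match modes (Strict, Exact)
    and 0.5 for the partial-match modes (Partial, Type)."""
    possible = cor + inc + par + mis  # number of gold-standard entities
    actual = cor + inc + par + spu    # number of predicted entities
    score = cor + partial_weight * par
    precision = score / actual
    recall = score / possible
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return round(precision, 4), round(recall, 4), round(f1, 4)

print(prf(1, 3, 0, 1, 1))                      # strict  -> (0.2, 0.2, 0.2)
print(prf(2, 2, 0, 1, 1))                      # exact   -> (0.4, 0.4, 0.4)
print(prf(2, 0, 2, 1, 1, partial_weight=0.5))  # partial -> (0.6, 0.6, 0.6)
print(prf(2, 2, 0, 1, 1, partial_weight=0.5))  # type    -> (0.4, 0.4, 0.4)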

User Guide

Installation

pip install [-U] eval4ner

Usage

1. Evaluate single prediction

import eval4ner.muc as muc
import pprint
# gold annotations and model prediction as (entity type, surface string) pairs
ground_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
prediction = [('PER', 'John Jones and Peter Peters came to York')]
text = 'John Jones and Peter Peters came to York'
one_result = muc.evaluate_one(prediction, ground_truth, text)
pprint.pprint(one_result)

Output:

{'exact': {'actual': 1,
           'correct': 0,
           'f1_score': 0,
           'incorrect': 1,
           'missed': 2,
           'partial': 0,
           'possible': 3,
           'precision': 0.0,
           'recall': 0.0,
           'spurius': 0},
 'partial': {'actual': 1,
             'correct': 0,
             'f1_score': 0.25,
             'incorrect': 0,
             'missed': 2,
             'partial': 1,
             'possible': 3,
             'precision': 0.5,
             'recall': 0.16666666666666666,
             'spurius': 0},
 'strict': {'actual': 1,
            'correct': 0,
            'f1_score': 0,
            'incorrect': 1,
            'missed': 2,
            'partial': 0,
            'possible': 3,
            'precision': 0.0,
            'recall': 0.0,
            'spurius': 0},
 'type': {'actual': 1,
          'correct': 1,
          'f1_score': 0.5,
          'incorrect': 0,
          'missed': 2,
          'partial': 0,
          'possible': 3,
          'precision': 1.0,
          'recall': 0.3333333333333333,
          'spurius': 0}}
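To see where these numbers come from, take the partial mode: the single predicted span covers the gold entities but is aligned with only one of them, and not with an exact boundary, so partial = 1, missed = 2, actual = 1, and possible = 3. Precision = (0 + 0.5 × 1) / 1 = 0.5, recall = 0.5 / 3 ≈ 0.1667, and F1 = 2 × 0.5 × 0.1667 / (0.5 + 0.1667) = 0.25, matching the output above.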

2. Evaluate all predictions

import eval4ner.muc as muc
# ground truth
ground_truths = [
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# NER model prediction
predictions = [
    [('PER', 'John Jones and Peter Peters came to York')],
    [('LOC', 'John Jones'), ('PER', 'Peters'), ('LOC', 'York')],
    [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# input texts
texts = [
    'John Jones and Peter Peters came to York',
    'John Jones and Peter Peters came to York',
    'John Jones and Peter Peters came to York'
]
muc.evaluate_all(predictions, ground_truths, texts, verbose=True)

Output:

 NER evaluation scores:
  strict mode, Precision=0.4444, Recall=0.4444, F1:0.4444
   exact mode, Precision=0.5556, Recall=0.5556, F1:0.5556
 partial mode, Precision=0.7778, Recall=0.6667, F1:0.6944
    type mode, Precision=0.8889, Recall=0.6667, F1:0.7222
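These aggregate scores are consistent with macro-averaging the per-sentence scores rather than pooling the raw counts: in strict mode the three sentences score precision 0, 1/3, and 1 respectively, and their mean (0 + 1/3 + 1) / 3 ≈ 0.4444 matches the reported value.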

This repository is maintained long term. Contributions and pull requests are welcome.

Citation

For attribution in academic contexts, please cite this work as:

@misc{eval4ner,
  title={Evaluation Metrics of Named Entity Recognition},
  author={Chai, Yekun},
  year={2018},
  howpublished={\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}

@misc{chai2018-ner-eval,
  author = {Chai, Yekun},
  title = {eval4ner: An All-Round Evaluation for Named Entity Recognition},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cyk1337/eval4ner}}
}

References

  1. Evaluation of the SemEval-2013 Task 9.1: Recognition and Classification of pharmacological substances
  2. MUC-5 Evaluation Metrics
