A package for NER evaluation
Project description
eval4ner: An All-Round Evaluation for Named Entity Recognition
This is a Python toolkit that implements the MUC-5 evaluation metrics for scoring Named Entity Recognition (NER) results.
TL;DR
It considers not only strict matching, where an extracted entity must be correct with respect to both boundary and type, but also partial matching, summarized in the following four modes:
- Strict: exact match (both entity boundary and type are correct)
- Exact boundary matching: the predicted entity boundary is correct, regardless of entity type
- Partial boundary matching: the entity boundaries overlap, regardless of entity type
- Type matching: the entity type is correct and some overlap between the system-tagged entity and the gold annotation is required
Refer to the blog Evaluation Metrics of Named Entity Recognition for an explanation of the MUC metrics.
Preliminaries for NER Evaluation
In research and production, the following NER scenarios occur frequently:
Scenario | Gold entity type | Gold entity boundary (surface string) | Predicted entity type | Predicted entity boundary (surface string) | Type | Partial | Exact | Strict
---|---|---|---|---|---|---|---|---
I | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球 | COR | COR | COR | COR
II | | | MUSIC_NAME | 年轮 | SPU | SPU | SPU | SPU
III | MUSIC_NAME | 告白气球 | | | MIS | MIS | MIS | MIS
IV | MUSIC_NAME | 告白气球 | SINGER | 告白气球 | INC | COR | COR | INC
V | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC
VI | MUSIC_NAME | 告白气球 | SINGER | 一首告白气球 | INC | PAR | INC | INC
Thus, MUC-5 takes all of these scenarios into account for a well-rounded evaluation.
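Concretely, a single aligned (gold, prediction) pair can be bucketed under each mode with a few comparisons. The following is a minimal illustrative sketch, not eval4ner's internal implementation: `classify` is a hypothetical helper, and the substring check is a crude stand-in for a proper token-overlap test.

```python
def classify(gold, pred):
    """Bucket one aligned (gold, prediction) pair under each MUC mode.

    gold and pred are (entity_type, surface_string) tuples; the alignment
    between gold and predicted entities is assumed to be given. Unmatched
    gold entities count as MIS and unmatched predictions as SPU, which is
    not handled in this sketch.
    """
    g_type, g_str = gold
    p_type, p_str = pred
    exact_boundary = g_str == p_str
    overlap = (g_str in p_str) or (p_str in g_str)  # crude overlap test
    return {
        'strict': 'COR' if exact_boundary and g_type == p_type else 'INC',
        'exact': 'COR' if exact_boundary else 'INC',
        'partial': 'COR' if exact_boundary else ('PAR' if overlap else 'INC'),
        'type': 'COR' if g_type == p_type and overlap else 'INC',
    }

# Scenario V from the table above: correct type, boundaries only overlap
print(classify(('MUSIC_NAME', '告白气球'), ('MUSIC_NAME', '一首告白气球')))
# {'strict': 'INC', 'exact': 'INC', 'partial': 'PAR', 'type': 'COR'}
```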
Then we can compute:

Number of gold-standard entities (possible):

$$\text{POSSIBLE} = \text{COR} + \text{INC} + \text{PAR} + \text{MIS} = \text{TP} + \text{FN}$$

Number of predicted entities (actual):

$$\text{ACTUAL} = \text{COR} + \text{INC} + \text{PAR} + \text{SPU} = \text{TP} + \text{FP}$$

Precision and recall for exact match (i.e. Strict, Exact):

$$\text{Precision} = \frac{\text{COR}}{\text{ACTUAL}}, \qquad \text{Recall} = \frac{\text{COR}}{\text{POSSIBLE}}$$

Precision and recall for partial match (i.e. Partial, Type):

$$\text{Precision} = \frac{\text{COR} + 0.5 \times \text{PAR}}{\text{ACTUAL}}, \qquad \text{Recall} = \frac{\text{COR} + 0.5 \times \text{PAR}}{\text{POSSIBLE}}$$

F-measure:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Therefore, for the six scenarios above we obtain:

Measure | Type | Partial | Exact | Strict
---|---|---|---|---
Correct | 2 | 2 | 2 | 1
Incorrect | 2 | 0 | 2 | 3
Partial | 0 | 2 | 0 | 0
Missed | 1 | 1 | 1 | 1
Spurious | 1 | 1 | 1 | 1
Precision | 0.4 | 0.6 | 0.4 | 0.2
Recall | 0.4 | 0.6 | 0.4 | 0.2
F1 score | 0.4 | 0.6 | 0.4 | 0.2
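These numbers can be reproduced directly from the counts. Below is a minimal sketch applying the formulas above; `muc_scores` is a hypothetical helper, not part of eval4ner's API.

```python
def muc_scores(cor, inc, par, mis, spu, partial_mode=False):
    """Precision, recall and F1 from MUC counts.

    partial_mode=True applies the 0.5 credit for partial matches,
    as used by the Partial and Type columns above.
    """
    possible = cor + inc + par + mis   # number of gold entities
    actual = cor + inc + par + spu     # number of predicted entities
    numerator = cor + 0.5 * par if partial_mode else cor
    precision = numerator / actual if actual else 0.0
    recall = numerator / possible if possible else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "Partial" column of the table above: COR=2, INC=0, PAR=2, MIS=1, SPU=1
print(muc_scores(2, 0, 2, 1, 1, partial_mode=True))  # ~ (0.6, 0.6, 0.6)
# "Strict" column: COR=1, INC=3, PAR=0, MIS=1, SPU=1
print(muc_scores(1, 3, 0, 1, 1))                      # ~ (0.2, 0.2, 0.2)
```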
User Guide
Installation
pip install [-U] eval4ner
Usage
1. Evaluate single prediction
import eval4ner.muc as muc
import pprint
ground_truth = [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
prediction = [('PER', 'John Jones and Peter Peters came to York')]
text = 'John Jones and Peter Peters came to York'
one_result = muc.evaluate_one(prediction, ground_truth, text)
pprint.pprint(one_result)
Output:
{'exact': {'actual': 1,
'correct': 0,
'f1_score': 0,
'incorrect': 1,
'missed': 2,
'partial': 0,
'possible': 3,
'precision': 0.0,
'recall': 0.0,
'spurius': 0},
'partial': {'actual': 1,
'correct': 0,
'f1_score': 0.25,
'incorrect': 0,
'missed': 2,
'partial': 1,
'possible': 3,
'precision': 0.5,
'recall': 0.16666666666666666,
'spurius': 0},
'strict': {'actual': 1,
'correct': 0,
'f1_score': 0,
'incorrect': 1,
'missed': 2,
'partial': 0,
'possible': 3,
'precision': 0.0,
'recall': 0.0,
'spurius': 0},
'type': {'actual': 1,
'correct': 1,
'f1_score': 0.5,
'incorrect': 0,
'missed': 2,
'partial': 0,
'possible': 3,
'precision': 1.0,
'recall': 0.3333333333333333,
'spurius': 0}}
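Since `evaluate_one` returns a plain nested dict mirroring the printed structure above (including the library's `spurius` key spelling), individual scores can be read out directly, for example:

```python
# Pull precision / recall / F1 per mode out of the returned dict
for mode in ('strict', 'exact', 'partial', 'type'):
    scores = one_result[mode]
    print(f"{mode:>8}: P={scores['precision']:.4f} "
          f"R={scores['recall']:.4f} F1={scores['f1_score']:.4f}")
```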
2. Evaluate all predictions
import eval4ner.muc as muc
# ground truth
ground_truths = [
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# NER model prediction
predictions = [
[('PER', 'John Jones and Peter Peters came to York')],
[('LOC', 'John Jones'), ('PER', 'Peters'), ('LOC', 'York')],
[('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
]
# input texts
texts = [
'John Jones and Peter Peters came to York',
'John Jones and Peter Peters came to York',
'John Jones and Peter Peters came to York'
]
muc.evaluate_all(predictions, ground_truths, texts, verbose=True)
Output:
NER evaluation scores:
strict mode, Precision=0.4444, Recall=0.4444, F1:0.4444
exact mode, Precision=0.5556, Recall=0.5556, F1:0.5556
partial mode, Precision=0.7778, Recall=0.6667, F1:0.6944
type mode, Precision=0.8889, Recall=0.6667, F1:0.7222
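eval4ner expects each entity as an `(entity type, surface string)` tuple, as in the ground truths and predictions above. If your model emits token-level BIO tags instead, a small conversion step is needed first; the sketch below uses a hypothetical `bio_to_entities` helper (not part of eval4ner) to show one way to do it.

```python
def bio_to_entities(tokens, tags):
    """Collapse BIO tags into (entity_type, surface_string) tuples in the
    format eval4ner expects. Hypothetical helper, not part of the library."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            if current_tokens:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith('I-') and current_tokens:
            current_tokens.append(token)
        else:  # 'O', or a stray 'I-' without an opening 'B-'
            if current_tokens:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, ' '.join(current_tokens)))
    return entities

tokens = 'John Jones and Peter Peters came to York'.split()
tags = ['B-PER', 'I-PER', 'O', 'B-PER', 'I-PER', 'O', 'O', 'B-LOC']
print(bio_to_entities(tokens, tags))
# [('PER', 'John Jones'), ('PER', 'Peter Peters'), ('LOC', 'York')]
```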
This repository is maintained long-term. Contributions and pull requests are welcome.
Citation
For attribution in academic contexts, please cite this work as:
@misc{eval4ner,
title={Evaluation Metrics of Named Entity Recognition},
author={Chai, Yekun},
year={2018},
howpublished={\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}
@misc{chai2018-ner-eval,
author = {Chai, Yekun},
title = {eval4ner: An All-Round Evaluation for Named Entity Recognition},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/cyk1337/eval4ner}}
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file eval4ner-0.1.0.tar.gz.
File metadata
- Download URL: eval4ner-0.1.0.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 16a51d9c3b62bfc564eda9925354e9efc049f3dcba64dba9ef0b8c2dae91c319
MD5 | ed2fab6cdd1582620bdd66cf10c1b7f6
BLAKE2b-256 | 2abb974cb6bd6443a4918877bdbfc46945402a86b431a29363b0048901cf4bae
File details
Details for the file eval4ner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: eval4ner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | eee2ac9cdcfa02bac4a3a9dc49669583f12dd0327bf6b63a975fb00cdf63b5ae
MD5 | ad73523dac6ed4515126e86524ad58fe
BLAKE2b-256 | 2df1c4f811419e8287e04bc2ababb2ca865980ec8077ba920ce584ef4b373e4b