NER evaluation done right

These details have not been verified by PyPI

Project links

Homepage

Project description

nervaluate

nervaluate is a python module for evaluating Named Entity Recognition (NER) models as defined in the SemEval 2013 - 9.1 task.

The evaluation metrics output by nervaluate go beyond a simple token/tag based schema, and consider diferent scenarios based on wether all the tokens that belong to a named entity were classified or not, and also wether the correct entity type was assigned.

This problem is described in detail in the original blog post by David Batista, and extends the code in the original repository which accompanied the blog post.

The code draws heavily on:

Segura-bedmar, I., & Mart, P. (2013). 2013 SemEval-2013 Task 9 Extraction of Drug-Drug Interactions from. Semeval, 2(DDIExtraction), 341–350. link
https://www.cs.york.ac.uk/semeval-2013/task9/data/uploads/semeval_2013-task-9_1-evaluation-metrics.pdf

Notes:

In scenarios IV and VI the entity type of the true and pred does not match, in both cases we only scored against the true entity, not the predicted one. You can argue that the predicted entity could also be scored as spurious, but according to the definition of spurius:

Spurius (SPU) : system produces a response which doesn’t exist in the golden annotation;

In this case there exists an annotation, but with a different entity type, so we assume it's only incorrect.

Installation

To install the package:

pip install git+https://github.com/ivyleavedtoadflax/nervaluate

To create a virtual environment for development:

make virtualenv

# Then to activate the virtualenv:

source /build/virtualenv/bin/activate

Alternatively you can use your own virtualenv manager and simply make reqs to install requirements.

To run tests:

# Will run tox

make test

Example:

The package currently works with two formats:

prodi.gy style lists of spans.
Nested lists containing NER labels.

Prodigy spans

true = [
    [{"label": "PER", "start": 2, "end": 4}],
    [{"label": "LOC", "start": 1, "end": 2},
     {"label": "LOC", "start": 3, "end": 4}]
]

pred = [
    [{"label": "PER", "start": 2, "end": 4}],
    [{"label": "LOC", "start": 1, "end": 2},
     {"label": "LOC", "start": 3, "end": 4}]
]

from nervaluate import Evaluator

evaluator = Evaluator(true, pred, tags=['LOC', 'PER'])

# Returns overall metrics and metrics for each tag

results, results_per_tag = evaluator.evaluate()

print(results)

{
    'ent_type':{
        'correct':3,
        'incorrect':0,
        'partial':0,
        'missed':0,
        'spurious':0,
        'possible':3,
        'actual':3,
        'precision':1.0,
        'recall':1.0
    },
    'partial':{
        'correct':3,
        'incorrect':0,
        'partial':0,
        'missed':0,
        'spurious':0,
        'possible':3,
        'actual':3,
        'precision':1.0,
        'recall':1.0
    },
    'strict':{
        'correct':3,
        'incorrect':0,
        'partial':0,
        'missed':0,
        'spurious':0,
        'possible':3,
        'actual':3,
        'precision':1.0,
        'recall':1.0
    },
    'exact':{
        'correct':3,
        'incorrect':0,
        'partial':0,
        'missed':0,
        'spurious':0,
        'possible':3,
        'actual':3,
        'precision':1.0,
        'recall':1.0
    }
}

print(results_by_tag)

{
    'LOC':{
        'ent_type':{
            'correct':2,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':2,
            'actual':2,
            'precision':1.0,
            'recall':1.0
        },
        'partial':{
            'correct':2,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':2,
            'actual':2,
            'precision':1.0,
            'recall':1.0
        },
        'strict':{
            'correct':2,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':2,
            'actual':2,
            'precision':1.0,
            'recall':1.0
        },
        'exact':{
            'correct':2,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':2,
            'actual':2,
            'precision':1.0,
            'recall':1.0
        }
    },
    'PER':{
        'ent_type':{
            'correct':1,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':1,
            'actual':1,
            'precision':1.0,
            'recall':1.0
        },
        'partial':{
            'correct':1,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':1,
            'actual':1,
            'precision':1.0,
            'recall':1.0
        },
        'strict':{
            'correct':1,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':1,
            'actual':1,
            'precision':1.0,
            'recall':1.0
        },
        'exact':{
            'correct':1,
            'incorrect':0,
            'partial':0,
            'missed':0,
            'spurious':0,
            'possible':1,
            'actual':1,
            'precision':1.0,
            'recall':1.0
        }
    }
}

Nested lists

true = [
    ['O', 'O', 'B-PER', 'I-PER', 'O'],
    ['O', 'B-LOC', 'I-LOC', 'B-LOC', 'I-LOC', 'O'],
]

pred = [
    ['O', 'O', 'B-PER', 'I-PER', 'O'],
    ['O', 'B-LOC', 'I-LOC', 'B-LOC', 'I-LOC', 'O'],
]

evaluator = Evaluator(true, pred, tags=['LOC', 'PER'], loader="list")

results, results_by_tag = evaluator.evaluate()

A working example of using nested lists is provided in the following notebook:

examples/example-full-named-entity-evaluation.ipynb

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

1.2.1

Mar 12, 2026

1.2.0

Mar 9, 2026

1.1.0

Sep 6, 2025

1.0.0

Aug 18, 2025

0.3.1

Jun 5, 2025

0.2.0

Jun 4, 2024

0.1.8

Oct 16, 2020

0.1.7

Dec 7, 2019

0.1.6

Dec 7, 2019

This version

0.1.5

Dec 6, 2019

0.1.4

Dec 6, 2019

0.1.3

Dec 6, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nervaluate-0.1.5-py3-none-any.whl (21.2 kB view details)

Uploaded Dec 6, 2019 Python 3

File details

Details for the file nervaluate-0.1.5-py3-none-any.whl.

File metadata

Download URL: nervaluate-0.1.5-py3-none-any.whl
Upload date: Dec 6, 2019
Size: 21.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for nervaluate-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2a0f4075769472d9c0603277e28214cde6918df35aba47997fd6296b4568ad88`
MD5	`b9ac68a9e1c079799b4a99aeb6cb5d30`
BLAKE2b-256	`2530e329227b16914405deab65dee84491c49e9e767f7c948f2a1cc8f9cf5f90`

See more details on using hashes here.

nervaluate 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

nervaluate

Notes:

Installation

Example:

Prodigy spans

Nested lists

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes