BIO and BEISO evaluation library
bioeval
CoNLL-2000 style evaluation of data using the BIO and BEISO representations for multi-token entities (i.e. chunks).
Install
Install the package from PyPI:
pip install bioeval
Change Log
- PyPI release and automated CI releases
- bioeval now supports pandas DataFrame objects through bioeval.evaluate_df.
Usage
The library offers two ways to evaluate span annotations: a native input format and a pandas DataFrame format.
Native input format
The native input format is a set of tuples, where each tuple holds the group of tokens in one span. Each token is itself a tuple and must be unique. You can guarantee uniqueness by adding a unique identifier to each token, as in the example below.
from bioeval import evaluate
# gold chunks
chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP'),
        (7, 'square', 'N', 'I-NP')
    ),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}
# candidate chunks
guess_chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP')
    ),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}
# evaluation (chunk_col=3 points at the chunk tag, the fourth field of each token tuple)
f1, pr, re = evaluate(gold_sequence=chunk, guess_sequence=guess_chunk, chunk_col=3)
print(f1)
# 71.43
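If your data is a flat list of tagged tokens rather than pre-grouped chunks, a small helper can build the set-of-tuples structure described above. The following is a minimal sketch, not part of bioeval: `to_chunks` and `tag_index` are hypothetical names, and it assumes standard BIO grouping, where an I- tag extends the preceding chunk of the same type and every other tag (including O) starts a new chunk.

```python
def to_chunks(tokens, tag_index=3):
    """Group BIO-tagged token tuples into a set of chunk tuples (hypothetical helper)."""
    chunks, current = [], []
    for token in tokens:
        tag = token[tag_index]
        # Tag of the last token in the open chunk, or 'O' if no chunk is open.
        prev = current[-1][tag_index] if current else 'O'
        same_type = tag[2:] == prev[2:]
        if tag.startswith('I-') and prev != 'O' and same_type:
            # An I- tag continues the open chunk of the same type.
            current.append(token)
        else:
            # Any other tag closes the open chunk and starts a new one.
            if current:
                chunks.append(tuple(current))
            current = [token]
    if current:
        chunks.append(tuple(current))
    return set(chunks)


tokens = [
    (5, 'The', 'D', 'B-NP'),
    (6, 'red', 'J', 'I-NP'),
    (7, 'square', 'N', 'I-NP'),
    (8, 'is', 'V', 'B-MV'),
    (11, '.', '.', 'O'),
]
chunks = to_chunks(tokens)
print(len(chunks))
# 3
```

The resulting set has the same shape as the chunk and guess_chunk sets in the example, so it should be usable as the gold_sequence or guess_sequence argument.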
DataFrame format
The library supports DataFrame input through the evaluate_df function, which takes the additional chunkcol and guesscol parameters to specify the columns holding the gold and candidate tags.
import pandas as pd
from bioeval import evaluate_df
# input data: a list of records parsed into a DataFrame object
df = pd.DataFrame(
    [
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'I-foo'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-bar', 'guesstag': 'B-bar'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'I-foo'},
        {'chunktag': 'B-bar', 'guesstag': 'B-bar'},
        {'chunktag': 'I-bar', 'guesstag': 'I-bar'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'B-bar', 'guesstag': 'I-foo'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'B-foo'}
    ]
)
f1, pr, re = evaluate_df(df=df, chunkcol='chunktag', guesscol='guesstag')
print(f1)
# 62.5
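For intuition about where this score comes from, the standalone sketch below computes chunk-level precision, recall, and F1 from two BIO tag lists by comparing exact (start, end, label) spans. It is not bioeval's implementation: `bio_spans` and `precision_recall_f1` are hypothetical helpers, and treating a stray I- tag as the start of a new chunk is an assumption. Applied to the two tag columns of the example above, it finds 8 gold chunks, 8 guessed chunks, and 5 exact matches.

```python
def bio_spans(tags):
    """Return the set of (start, end, label) spans encoded by a BIO tag list."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition('-')
        continues = prefix == 'I' and name == label
        if not continues:
            # Close the open span, if any, at the previous position.
            if label is not None:
                spans.add((start, i - 1, label))
            # B-x opens a span; a stray I-x is treated as an opener too (assumption).
            start, label = (i, name) if prefix in ('B', 'I') else (None, None)
    if label is not None:
        spans.add((start, len(tags) - 1, label))
    return spans


def precision_recall_f1(gold_tags, guess_tags):
    """Score guessed spans against gold spans; returns (f1, precision, recall) in percent."""
    gold, guess = bio_spans(gold_tags), bio_spans(guess_tags)
    correct = len(gold & guess)  # spans with identical boundaries and label
    precision = 100.0 * correct / len(guess) if guess else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall


# The chunktag and guesstag columns from the DataFrame example, in row order.
gold_tags = ("B-foo I-foo O B-bar B-foo O B-foo I-foo "
             "B-bar I-bar O B-foo B-bar B-foo I-foo").split()
guess_tags = ("B-foo I-foo O B-bar B-foo O B-foo I-foo "
              "B-bar I-bar O B-foo I-foo B-foo B-foo").split()

f1, pr, re = precision_recall_f1(gold_tags, guess_tags)
print(f1)
# 62.5
```

Because only exact span matches count, the guess that merges the last two gold chunks into one foo span, and the guess that splits the final foo span in two, each cost both precision and recall.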