BIO and BEISO evaluation library
bioeval
CoNLL-2000 style evaluation of data using the BIO and BEISO representations for multi-token entities (i.e. chunks).
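To make the two representations concrete, here is a minimal sketch that rewrites a BIO tag sequence in BEISO form. It assumes the usual reading of the letters (B = begin, E = end, I = inside, S = single-token chunk, O = outside) and well-formed BIO input; `bio_to_beiso` is a hypothetical helper for illustration and is not part of bioeval.

```python
def bio_to_beiso(tags):
    """Rewrite a well-formed BIO tag sequence using B/E/I/S/O prefixes."""
    out = []
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition('-')
        if prefix == 'O':
            out.append('O')
            continue
        # A chunk ends here unless the next tag continues it with I-<same type>.
        nxt = tags[i + 1] if i + 1 < len(tags) else 'O'
        ends = nxt != 'I-' + name
        if prefix == 'B':
            out.append(('S-' if ends else 'B-') + name)
        else:
            out.append(('E-' if ends else 'I-') + name)
    return out


example = ['B-NP', 'I-NP', 'I-NP', 'B-MV', 'O']
print(bio_to_beiso(example))
# ['B-NP', 'I-NP', 'E-NP', 'S-MV', 'O']
```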
Install
Install the package from PyPI:
pip install bioeval
Change Log
- pypi release and automated CI releases
- bioeval now supports pandas DataFrame objects through bioeval.evaluate_df.
Usage
The library supports two ways of evaluating span annotations: a native format based on sets of tuples, and a pandas DataFrame format.
Native input format
The native input format is a set of tuples, where each tuple holds the group of tokens in a span. Each token is itself a tuple and must be unique; one way to guarantee this is to add a unique identifier to each token, as in the example below. In the example, chunk_col=3 points at the position of the chunk tag inside each token tuple.
from bioeval import evaluate

# gold chunks: each element is one span, given as a tuple of token tuples
# token layout: (unique id, word, POS tag, chunk tag)
chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP'),
        (7, 'square', 'N', 'I-NP')
    ),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# candidate chunks
guess_chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP')
    ),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# evaluation: chunk_col is the index of the chunk tag in each token tuple
f1, pr, re = evaluate(gold_sequence=chunk, guess_sequence=guess_chunk, chunk_col=3)
print(f1)
# 71.43
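The score above can be understood as exact matching between the two sets. The following is a minimal sketch of that idea, not the library's actual implementation: it assumes that chunks tagged O are ignored and that a candidate chunk counts as correct only if it is identical to a gold chunk. `score_chunks` and the toy data are hypothetical names used for illustration.

```python
def score_chunks(gold, guess, chunk_col):
    """Return (f1, precision, recall) as percentages for two sets of chunks."""
    def labelled(chunks):
        # Assumption: chunks whose first token is tagged 'O' are not scored.
        return {c for c in chunks if c[0][chunk_col] != 'O'}

    gold, guess = labelled(gold), labelled(guess)
    correct = len(gold & guess)  # exact span-and-tag matches
    precision = 100 * correct / len(guess) if guess else 0.0
    recall = 100 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return round(f1, 2), round(precision, 2), round(recall, 2)


# Toy data; token layout is (unique id, word, chunk tag).
toy_gold = {
    ((1, 'The', 'B-NP'), (2, 'cat', 'I-NP')),
    ((3, 'sat', 'B-VP'),),
    ((4, '.', 'O'),),
}
toy_guess = {
    ((1, 'The', 'B-NP'),),
    ((2, 'cat', 'B-NP'),),
    ((3, 'sat', 'B-VP'),),
    ((4, '.', 'O'),),
}
print(score_chunks(toy_gold, toy_guess, chunk_col=2))
# (40.0, 33.33, 50.0)
```

Counted the same way, the gold and candidate sets in the example above each contain 7 scored chunks, 5 of which match exactly, which is consistent with the printed 71.43.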
DataFrame format
The library also accepts pandas DataFrame input through the evaluate_df function, which takes the additional chunkcol and guesscol parameters to specify the columns holding the gold and candidate tags.
import pandas as pd
from bioeval import evaluate_df

# input data: one row per token, built from a list of dicts
df = pd.DataFrame(
    [
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'I-foo'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-bar', 'guesstag': 'B-bar'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'I-foo'},
        {'chunktag': 'B-bar', 'guesstag': 'B-bar'},
        {'chunktag': 'I-bar', 'guesstag': 'I-bar'},
        {'chunktag': 'O', 'guesstag': 'O'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'B-bar', 'guesstag': 'I-foo'},
        {'chunktag': 'B-foo', 'guesstag': 'B-foo'},
        {'chunktag': 'I-foo', 'guesstag': 'B-foo'}
    ]
)

f1, pr, re = evaluate_df(df=df, chunkcol='chunktag', guesscol='guesstag')
print(f1)
# 62.5
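To see where this number comes from, the two tag columns can be decoded into spans and compared by hand. The sketch below assumes conventional BIO decoding (a B- tag opens a chunk, a matching I- tag extends it, anything else closes it) and exact span matching; `bio_spans` is a hypothetical helper, not part of bioeval. The two lists are the columns from the example, as `df['chunktag'].tolist()` and `df['guesstag'].tolist()` would return them.

```python
def bio_spans(tags):
    """Decode a BIO tag sequence into a set of (start, end, label) spans."""
    spans, start, label = set(), None, None
    # The trailing 'O' is a sentinel that closes a span still open at the end.
    for i, tag in enumerate(list(tags) + ['O']):
        prefix, _, name = tag.partition('-')
        if prefix == 'I' and start is not None and name == label:
            continue  # the current span is extended
        if start is not None:
            spans.add((start, i - 1, label))
            start, label = None, None
        if prefix in ('B', 'I'):  # an orphan I- tag opens a new span
            start, label = i, name
    return spans


gold_tags = ['B-foo', 'I-foo', 'O', 'B-bar', 'B-foo', 'O', 'B-foo', 'I-foo',
             'B-bar', 'I-bar', 'O', 'B-foo', 'B-bar', 'B-foo', 'I-foo']
guess_tags = ['B-foo', 'I-foo', 'O', 'B-bar', 'B-foo', 'O', 'B-foo', 'I-foo',
              'B-bar', 'I-bar', 'O', 'B-foo', 'I-foo', 'B-foo', 'B-foo']

gold, guess = bio_spans(gold_tags), bio_spans(guess_tags)
correct = len(gold & guess)
precision = 100 * correct / len(guess)
recall = 100 * correct / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(len(gold), len(guess), correct, f1)
# 8 8 5 62.5
```

Both columns decode to 8 spans, 5 of which agree, so precision, recall, and F1 all come out at 62.5.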
File details
Details for the file bioeval-1.1.14.tar.gz.
File metadata
- Download URL: bioeval-1.1.14.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.2 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | a442e44d78ac1b8700e35f014d00183f13f70bb87355260696e492306dec1fb6
MD5 | dbe4ad1c92031b0a86f5d8c9de9987cf
BLAKE2b-256 | 6c659cc3c8cb1f918b15cd86df9811fddfdb6fe9cb555906e4541708ccd25416
File details
Details for the file bioeval-1.1.14-py3-none-any.whl.
File metadata
- Download URL: bioeval-1.1.14-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.2 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2256de82a6019e713d8646a7e7850b28866e3b4f682c5a6d9d21cb68397a53bd
MD5 | 6f7e8c4df65c6a94def1b87ee369eed6
BLAKE2b-256 | 9c5b85f8d2a297b135a5f687225d4dab609d0fa9ac383a6f469cc9d6e87f1b93