BIO and BEISO evaluation library

bioeval

License: MIT

CoNLL-2000 style evaluation of data that uses the BIO or BEISO representation for multi-token entities (i.e. chunks).

Install

Install from PyPI with pip:

pip install bioeval

Change Log

  • pypi release and automated CI releases
  • bioeval now supports pandas DataFrame objects through bioeval.evaluate_df.

Usage

The library supports two ways of evaluating span annotations: the native format, which takes sets of token tuples, and a pandas DataFrame format.

Native input format

The native input format is a set of tuples, where each tuple holds the tokens of one span. Each token is itself a tuple and must be unique within the set. One way to guarantee that is to add a unique identifier to each token, as in the example below.

from bioeval import evaluate


# gold chunks
chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP'),
        (7, 'square', 'N', 'I-NP')
    ),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# candidate chunks
guess_chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP')
    ),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# evaluation
f1, pr, re = evaluate(gold_sequence=chunk, guess_sequence=guess_chunk, chunk_col=3)
print(f1)
# 71.43
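
The score above is consistent with exact-match chunk scoring: a guessed chunk counts as correct only if an identical tuple of tokens appears in the gold set, and chunks tagged O are left out of both counts. The sketch below reproduces the number in plain Python under that assumption. The names chunk_f1 and is_outside are illustrative and not part of bioeval, and the library's actual implementation may differ. The data mirrors the example above.

```python
def is_outside(chunk, chunk_col):
    """True if every token in the chunk carries the 'O' tag."""
    return all(token[chunk_col] == 'O' for token in chunk)


def chunk_f1(gold, guess, chunk_col):
    """Exact-match F1, precision and recall (in percent) over two sets of chunks.

    Assumption: 'O' chunks are ignored; this is not bioeval's own code.
    """
    gold = {c for c in gold if not is_outside(c, chunk_col)}
    guess = {c for c in guess if not is_outside(c, chunk_col)}
    correct = len(gold & guess)  # chunks identical in tokens and tags
    precision = 100.0 * correct / len(guess) if guess else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall


gold = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    ((5, 'The', 'D', 'B-NP'), (6, 'red', 'J', 'I-NP'), (7, 'square', 'N', 'I-NP')),
    ((8, 'is', 'V', 'B-MV'),),
    ((9, 'very', 'A', 'B-AP'), (10, 'boring', 'J', 'I-AP')),
    ((11, '.', '.', 'O'),),
}
guess = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    ((5, 'The', 'D', 'B-NP'), (6, 'red', 'J', 'I-NP')),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    ((9, 'very', 'A', 'B-AP'), (10, 'boring', 'J', 'I-AP')),
    ((11, '.', '.', 'O'),),
}

f1, pr, re = chunk_f1(gold, guess, chunk_col=3)
print(round(f1, 2))
# 71.43
```

Five of the seven non-O guessed chunks match a gold chunk exactly, and the gold set also has seven non-O chunks, so precision and recall are both 5/7.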

Dataframe format

The library also accepts pandas DataFrame input through the evaluate_df function, which takes the additional chunkcol and guesscol parameters naming the columns that hold the gold and candidate tags.

import pandas as pd
from bioeval import evaluate_df

# input data: a list of records (one per token) loaded into a DataFrame
df = pd.DataFrame(
    [
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'I-bar','guesstag': 'I-bar'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'B-bar','guesstag': 'I-foo'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'B-foo'}
    ]
)

f1, pr, re = evaluate_df(df=df, chunkcol='chunktag', guesscol='guesstag')

print(f1)
# 62.5
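
With tag columns, a CoNLL-2000 style evaluation first turns each tag sequence into typed spans and then compares the two span sets. The sketch below shows that step on plain lists. It assumes that a chunk starts at every B- tag (or at an I- tag that does not continue a chunk of the same type) and ends at the next B- tag, O tag, or type change. The helpers bio_spans and span_f1 are hypothetical, not bioeval functions, and BEISO tags are not handled.

```python
def bio_spans(tags):
    """Return the set of (start, end, type) spans encoded by a BIO tag list.

    `end` is exclusive. Illustrative helper, not part of bioeval.
    """
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition('-')
        continues = prefix == 'I' and label == kind
        if start is not None and not continues:
            spans.add((start, i, kind))  # close the open chunk
            start, kind = None, None
        if prefix == 'B' or (prefix == 'I' and not continues):
            start, kind = i, label  # open a new chunk
    if start is not None:
        spans.add((start, len(tags), kind))
    return spans


def span_f1(gold_tags, guess_tags):
    """Exact-match F1, precision and recall (in percent) over typed spans."""
    gold, guess = bio_spans(gold_tags), bio_spans(guess_tags)
    correct = len(gold & guess)
    precision = 100.0 * correct / len(guess) if guess else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall


# The same tags as the chunktag and guesstag columns above.
gold_tags = ['B-foo', 'I-foo', 'O', 'B-bar', 'B-foo', 'O', 'B-foo', 'I-foo',
             'B-bar', 'I-bar', 'O', 'B-foo', 'B-bar', 'B-foo', 'I-foo']
guess_tags = ['B-foo', 'I-foo', 'O', 'B-bar', 'B-foo', 'O', 'B-foo', 'I-foo',
              'B-bar', 'I-bar', 'O', 'B-foo', 'I-foo', 'B-foo', 'B-foo']

f1, pr, re = span_f1(gold_tags, guess_tags)
print(f1)
# 62.5
```

Both columns encode eight spans, and five of them agree in position and type, which gives a precision and recall of 5/8 each.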
