
A Python package for converting various Named Entity Recognition (NER) formats to BRAT and BIO formats.


NER Reformat

NER Reformat is a Python package that converts Named Entity Recognition (NER) annotations into the BRAT and BIO formats. For some corpora, it also converts Named Entity Linking annotations to BRAT format. The supported formats are listed in the Supported Corpora section.

Installation

You can install NER Reformat using pip:

pip3 install ner-reformat

Usage

Here's a basic example of how to use NER Reformat:

from ner_reformat import ncbi_to_brat

path_from = "your path/NCBI diseases"  # location of the source NCBI corpus
path_to = "your path/NCBI diseases"  # location for the converted BRAT output
ncbi_to_brat(path_from=path_from, path_to=path_to)

Supported Corpora

  • IOB formatted, including:
    • CoNLL
    • OntoNotes
    • MultiNERD
    • WikiNeural
    • WNUT
    • MIT Movies
    • MIT Restaurants
  • BRAT formatted, including:
    • CADEC
  • NCBI
  • IEER
  • BioCreative
  • Groningen Meaning Bank
  • GeoVirus
  • MalwareTextDB

Annotation Schemes of BRAT and IOB

BRAT (Brat Rapid Annotation Tool) format is a standoff annotation format used for text annotation tasks: the raw text is stored in a .txt file and the annotations in a separate .ann file. Each entity is represented on its own line of the .ann file, giving an ID, the entity type, the start and end character offsets into the text, and the annotated text. Example:

example.txt

The following month, he signed a contract to play for the Newark Bears in the International League.

example.ann

T1    ORG 58 70    Newark Bears
T2    ORG 78 98    International League
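
The offsets in the example above are 0-based character positions with an exclusive end, so slicing the text with them returns the annotated string. The following is a minimal sketch of reading such lines and checking them against the text. It is not part of the NER Reformat API: `Entity` and `parse_ann_line` are hypothetical helper names, and the sketch assumes tab-separated, contiguous text-bound annotations as in the standard BRAT standoff layout.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical container for one text-bound BRAT annotation."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_ann_line(line: str) -> Entity:
    """Parse one line of the form ID<TAB>TYPE START END<TAB>TEXT.

    Assumption: the span is contiguous (no ';'-separated fragments).
    """
    ann_id, span, text = line.rstrip("\n").split("\t")
    label, start, end = span.split(" ")
    return Entity(ann_id, label, int(start), int(end), text)


document = (
    "The following month, he signed a contract to play for the "
    "Newark Bears in the International League."
)
ann_lines = [
    "T1\tORG 58 70\tNewark Bears",
    "T2\tORG 78 98\tInternational League",
]

entities = [parse_ann_line(line) for line in ann_lines]
for entity in entities:
    # The end offset is exclusive, so a plain slice recovers the mention.
    assert document[entity.start:entity.end] == entity.text
```

A check like the final loop is a cheap way to catch offset drift after a conversion, for example when line endings or whitespace were changed in the text file.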

IOB (Inside, Outside, Beginning) format, also known as BIO, is a tagging scheme used for token-level annotation in NLP tasks such as Named Entity Recognition. The B- prefix marks the first token of an entity, the I- prefix marks a later token inside the same entity, and the O tag marks tokens outside any entity. Example, with one token per line (index, token, tag):

0    The    O
1    following    O
2    month    O
3    ,    O
4    he    O
5    signed    O
6    a    O
7    contract    O
8    to    O
9    play    O
10    for    O
11    the    O
12    Newark    B-ORG
13    Bears    I-ORG
14    in    O
15    the    O
16    International    B-ORG
17    League    I-ORG
18    .    O
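
To illustrate how the two schemes relate, here is a minimal sketch that derives the BIO tags above from the BRAT character spans. It is an illustration under stated assumptions, not the package's implementation: `spans_to_bio` is a hypothetical helper, and the regex tokenizer (runs of word characters, or single punctuation marks) is a simplification that real corpora may not follow.

```python
import re


def spans_to_bio(text, spans):
    """Tag each token of `text` with a BIO label.

    `spans` is a list of (start, end, label) tuples with an exclusive end,
    as in BRAT. Assumption: spans do not overlap and align with token
    boundaries produced by the simple regex tokenizer below.
    """
    rows = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if start <= match.start() and match.end() <= end:
                # The token that opens the span gets B-, the rest get I-.
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        rows.append((match.group(), tag))
    return rows


text = (
    "The following month, he signed a contract to play for the "
    "Newark Bears in the International League."
)
spans = [(58, 70, "ORG"), (78, 98, "ORG")]

rows = spans_to_bio(text, spans)
for index, (token, tag) in enumerate(rows):
    print(f"{index}\t{token}\t{tag}")
```

Because the mapping depends on tokenization, the same BRAT file can yield different BIO output under a different tokenizer; that is why the tokenizer here is called out as an assumption.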

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
