Ontonotes-5-parsing: parser of Ontonotes 5.0 to transform this corpus to a simple JSON format.
Project description
A simple parser for the well-known Ontonotes 5 dataset: https://catalog.ldc.upenn.edu/LDC2013T19
This dataset is very useful for experiments with Named Entity Recognition (NER). In addition, Ontonotes 5 covers three languages (English, Arabic, and Chinese), which makes it attractive for experiments with multilingual NER. However, the source format of Ontonotes 5 is, in my view, very intricate. Accordingly, the goal of this project is a dedicated parser that transforms Ontonotes 5 into a simple JSON format. In this format, each annotated sentence is represented as a dictionary with five keys: text, morphology, syntax, entities, and language. In turn, morphology, syntax, and entities are themselves dictionaries, each of which describes labels (part-of-speech labels, syntactic tags, or entity classes) and their bounds in the corresponding text.
More detailed information about this Ontonotes 5 parser is available in the short documentation: https://github.com/nsu-ai/ontonotes-5-parsing/blob/master/readme.md
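To make the sentence format described above more concrete, here is a minimal sketch of what one record might look like and how its bounds could be read back. Only the five key names come from the description; the example sentence, the label names, the `"english"` language value, the `[start, end)` character-offset layout of the bounds, and the helper `spans_to_strings` are all assumptions made for illustration. The linked documentation remains the authority on the real layout.

```python
import json

# Hypothetical record. The five keys are taken from the project description;
# the labels, the offsets, and the [start, end) character-span layout are assumptions.
sentence = {
    "text": "John lives in Paris.",
    "morphology": {
        "NNP": [[0, 4], [14, 19]],
        "VBZ": [[5, 10]],
        "IN": [[11, 13]],
        ".": [[19, 20]],
    },
    "syntax": {
        "S": [[0, 20]],
        "NP": [[0, 4], [14, 19]],
        "VP": [[5, 19]],
        "PP": [[11, 19]],
    },
    "entities": {
        "PERSON": [[0, 4]],
        "GPE": [[14, 19]],
    },
    "language": "english",
}


def spans_to_strings(text, labelled_spans):
    """Map each label to the substrings selected by its [start, end) bounds."""
    return {
        label: [text[start:end] for start, end in bounds]
        for label, bounds in labelled_spans.items()
    }


# Recover the surface form of every entity in the sentence.
entity_strings = spans_to_strings(sentence["text"], sentence["entities"])
print(json.dumps(entity_strings, indent=2))
```

Storing bounds instead of token-level tags keeps the original text untouched, so the same helper works for morphology, syntax, and entities alike.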
Download files
Hashes for ontonotes-5-parsing-0.0.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 620c2b6efea7f2e0edcb19bc1def91efc9232897e969c91b858d02241cd7f738
MD5 | dfaef59f2ec5a7da5f6439a58deac635
BLAKE2b-256 | cf97edf5c59ddebeeef0162867b17050b71c845694a4b7735dc5ea69b1bd8700
Hashes for ontonotes_5_parsing-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1379683fa315418313006e75c678f74a3a9e7b853c3996ccab4891eb1acce3f7
MD5 | 52c9c3838f2d7db6bc696735f20fea63
BLAKE2b-256 | 5bee886a2b5faea82c9e665caf055dcdd868ce806682b1f44f6bd898d7405c3c
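The published digests can be used to check a downloaded file before installing it. The sketch below uses only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path in the commented call is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for ontonotes-5-parsing-0.0.5.tar.gz.
PUBLISHED_SHA256 = "620c2b6efea7f2e0edcb19bc1def91efc9232897e969c91b858d02241cd7f738"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the archive sits in the current directory:
# matches_published_digest("ontonotes-5-parsing-0.0.5.tar.gz", PUBLISHED_SHA256)
```

Reading in chunks keeps memory use flat regardless of file size; a mismatch means the download should not be installed.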