Ontonotes-5-parsing: a parser that converts the Ontonotes 5.0 corpus to a simple JSON format.
Project description
A simple parser for the well-known Ontonotes 5 dataset (https://catalog.ldc.upenn.edu/LDC2013T19).
This dataset is very useful for experiments with Named Entity Recognition (NER). In addition, Ontonotes 5 covers three languages (English, Arabic, and Chinese), which makes it attractive for experiments with multilingual NER. However, the source format of Ontonotes 5 is quite intricate, in my view. Accordingly, the goal of this project is a dedicated parser that transforms Ontonotes 5 into a simple JSON format. In this format, each annotated sentence is represented as a dictionary with five keys: text, morphology, syntax, entities, and language. In turn, morphology, syntax, and entities are dictionaries too: each one describes labels (part-of-speech labels, syntactic tags, or entity classes) and their bounds in the corresponding text.
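As a rough illustration of the format described above, a single sentence record might look like the sketch below. Only the five top-level keys come from the description. The sample sentence, the label names, the `"english"` value, and the representation of bounds as end-exclusive `[start, end]` character offsets are assumptions made for this example, and `extract_entities` is a hypothetical helper, not part of the package.

```python
import json

# Hypothetical record. Only the five top-level keys are taken from the
# project description; the layout of the label dictionaries is assumed.
sentence = {
    "text": "Barack Obama visited Paris.",
    "morphology": {
        "NNP": [[0, 6], [7, 12], [21, 26]],
        "VBD": [[13, 20]],
        ".": [[26, 27]],
    },
    "syntax": {
        "NP": [[0, 12], [21, 26]],
        "VP": [[13, 26]],
    },
    "entities": {
        "PERSON": [[0, 12]],
        "GPE": [[21, 26]],
    },
    "language": "english",
}


def extract_entities(record):
    """Return (label, substring) pairs ordered by position in the text.

    Assumes each bound is an end-exclusive [start, end] character span.
    """
    found = []
    for label, bounds in record["entities"].items():
        for start, end in bounds:
            found.append((start, label, record["text"][start:end]))
    found.sort()
    return [(label, fragment) for _, label, fragment in found]


if __name__ == "__main__":
    # The record contains only JSON-compatible types, so it round-trips.
    restored = json.loads(json.dumps(sentence, ensure_ascii=False))
    for label, fragment in extract_entities(restored):
        print(label, fragment)
```

Keeping bounds as character offsets into `text` means the original sentence never has to be re-tokenized to recover a labeled fragment.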
More detailed information about this Ontonotes 5 parser is available in the [short documentation](https://github.com/nsu-ai/ontonotes-5-parsing/blob/master/readme.md).
Download files
Source Distribution
Hashes for ontonotes-5-parsing-0.0.2.tar.gz

Algorithm | Hash digest
---|---
SHA256 | d1af1cb0fb03c0e2ac5d3f5f667d516755c35b7ef856c3d9fdf72fdcaad7a60f
MD5 | cc0901e9e4c3163328904e600cb36257
BLAKE2b-256 | 1ebe4734d78b580c100b04f08fcc58d01c9fe922ad9b09e02afec1c6c36a1b30

Built Distribution
Hashes for ontonotes_5_parsing-0.0.2-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 4eceedeaadd1f02b028d1d34c873280ef571b349d47f9b7833d6fc328fdf840c
MD5 | b2154fb56db95d1c6269d9819ed7e526
BLAKE2b-256 | 9a8f3be277766a8fb714a2eeabdca6f3ca6ae0ff8d02aab083ace54246053501
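The digests listed above can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib`. The SHA256 value is the one listed for the source distribution; the local file location and the helper names `sha256_of_file` and `matches_expected` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for ontonotes-5-parsing-0.0.2.tar.gz.
EXPECTED_SHA256 = "d1af1cb0fb03c0e2ac5d3f5f667d516755c35b7ef856c3d9fdf72fdcaad7a60f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("ontonotes-5-parsing-0.0.2.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print("archive not found:", archive)
```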