OntoNotes Normal Form Parser
Introduction
onf-parser is a lightweight pure Python library for parsing the OntoNotes Normal Form format (.onf – cf. section 6.3).
Installation
Python 3.7 or later is required, because the library depends on dataclasses (part of the standard library since Python 3.7).
pip install onf-parser
Usage
There are three top-level functions:
from onf_parser import parse_files, parse_file, parse_file_string
# read a single file
sections = parse_file('ontonotes/some/file.onf')
# or parse a raw string
sections = parse_file_string(s)
# read all .onf files in a single directory
files = parse_files('ontonotes/')
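The directory call above is described as reading a single directory. If your copy of the corpus is spread over nested folders, one option is to walk the tree yourself and hand each file to a single-file parser. The sketch below is not part of onf-parser: `parse_onf_tree` is a hypothetical helper, and it takes the parser as an argument (for example `parse_file`) so that it makes no assumptions about the library beyond what is shown above.

```python
from pathlib import Path
from typing import Callable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def parse_onf_tree(root: str, parse_one: Callable[[str], T]) -> Iterator[Tuple[str, T]]:
    """Yield (filepath, parsed result) for every .onf file under root, recursively.

    `parse_one` is whatever single-file parser you pass in,
    e.g. onf_parser.parse_file. This helper is hypothetical, not library API.
    """
    # rglob descends into subdirectories; sorted() keeps the order deterministic
    for path in sorted(Path(root).rglob("*.onf")):
        yield str(path), parse_one(str(path))
```

With the library installed, `parse_onf_tree('ontonotes/', parse_file)` would yield each path together with that file's list of sections.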
For each file, a list of Section objects (which correspond to documents for the purposes of annotation) will be available:
for filepath, sections in files:
    for section in sections:
        coref_chains = section.chains
        for chain in coref_chains:
            print(chain.type)
            print(chain.id)
            print(chain.mentions)
            for mention in chain.mentions:
                print(mention.sentence_id)
                print(mention.tokens)
        for sentence in section.sentences:
            print(sentence.plain_sentence)
            print(sentence.plain_sentence.string)
            print(sentence.treebanked_sentence)
            print(sentence.treebanked_sentence.string)
            print(sentence.treebanked_sentence.tokens)
            print(sentence.speaker_information)
            print(sentence.speaker_information.name)
            print(sentence.speaker_information.start_time)
            print(sentence.speaker_information.stop_time)
            print(sentence.tree)
            print(sentence.tree.tree_string)
            print(sentence.leaves)
            for leaf in sentence.leaves:
                print(leaf.token)
                print(leaf.token_id)
                # NER
                print(leaf.name)
                print(leaf.name.type)
                print(leaf.name.token_id_range)
                print(leaf.name.tokens)
                # Coreference
                print(leaf.coref)
                print(leaf.coref.type)
                print(leaf.coref.token_id_range)
                print(leaf.coref.tokens)
                # WordNet sense
                print(leaf.sense)
                print(leaf.sense.label)
                # PropBank
                print(leaf.prop.label)
                print(leaf.prop)
                for arg_label, arg_spans in leaf.prop.args.items():
                    print(arg_label)
                    for arg_span in arg_spans:
                        print(arg_span)
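As a small illustration of working with the chain attributes listed above, the following sketch groups mention text by chain id. `mentions_by_chain` is a hypothetical helper, not library API, and it assumes that `mention.tokens` is a sequence of token strings. The demo data is an invented stand-in built with `SimpleNamespace` so that the block runs without a corpus.

```python
from collections import defaultdict
from types import SimpleNamespace


def mentions_by_chain(section) -> dict:
    """Map each coreference chain id to the surface text of its mentions."""
    grouped = defaultdict(list)
    for chain in section.chains:
        for mention in chain.mentions:
            # ASSUMPTION: mention.tokens is a sequence of token strings
            grouped[chain.id].append(" ".join(str(t) for t in mention.tokens))
    return dict(grouped)


# Hypothetical stand-in data (not real OntoNotes content)
demo_section = SimpleNamespace(chains=[
    SimpleNamespace(id="1", type="IDENT", mentions=[
        SimpleNamespace(sentence_id=0, tokens=["the", "parser"]),
        SimpleNamespace(sentence_id=1, tokens=["it"]),
    ]),
])

print(mentions_by_chain(demo_section))  # {'1': ['the parser', 'it']}
```

Because the helper only reads `chains`, `mentions`, `id`, and `tokens`, it should work on a real Section object if those attributes behave as assumed.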
See model classes for more information.
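The walkthrough accesses nested attributes such as `leaf.name.type` unconditionally. Whether those annotations can be absent on a given leaf is something to confirm in the model classes. The sketch below assumes that a missing annotation is represented as `None` and reads the labels defensively. `label_of` and `leaf_summary` are hypothetical helpers, and the demo leaf is an invented stand-in.

```python
from types import SimpleNamespace
from typing import Optional


def label_of(annotation, attr: str) -> Optional[str]:
    """Return annotation.<attr>, or None when the annotation itself is missing."""
    return None if annotation is None else getattr(annotation, attr, None)


def leaf_summary(leaf) -> dict:
    """Collect the per-leaf labels from the walkthrough without raising on gaps."""
    # ASSUMPTION: an unannotated layer is stored as None on the leaf
    return {
        "token": leaf.token,
        "ner": label_of(getattr(leaf, "name", None), "type"),
        "coref": label_of(getattr(leaf, "coref", None), "type"),
        "sense": label_of(getattr(leaf, "sense", None), "label"),
        "prop": label_of(getattr(leaf, "prop", None), "label"),
    }


# Hypothetical stand-in leaf carrying only a named-entity annotation
demo_leaf = SimpleNamespace(
    token="Paris", token_id=3,
    name=SimpleNamespace(type="GPE"),
    coref=None, sense=None, prop=None,
)

print(leaf_summary(demo_leaf))
# {'token': 'Paris', 'ner': 'GPE', 'coref': None, 'sense': None, 'prop': None}
```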