graph-based processing of multi-level annotated corpora
DiscourseGraphs
This library enables you to process linguistic corpora with multiple levels of annotations by:
converting the different annotation formats into separate graphs and
merging these graphs into a single multidigraph (based on the common tokenization of the annotation layers)
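The merge step described above can be illustrated with a small, self-contained sketch. It does not use the DiscourseGraphs API: the `MultiDiGraph` class, the `merge` function, the node identifiers and the edge labels are all hypothetical names chosen for this example. It only assumes that each annotation layer refers to the same token sequence, so that token nodes can act as the shared anchor between layers.

```python
class MultiDiGraph:
    """Minimal directed multigraph (hypothetical, for illustration only).

    Parallel edges between the same pair of nodes are allowed, so
    several annotation layers can connect the same nodes.
    """

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.edges = []  # list of (source, target, attribute dict)

    def add_node(self, node_id, **attrs):
        # Adding a node twice merges its attributes instead of duplicating it.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        self.add_node(source)
        self.add_node(target)
        self.edges.append((source, target, attrs))


def merge(*graphs):
    """Combine graphs; nodes with identical ids (the tokens) are unified."""
    merged = MultiDiGraph()
    for graph in graphs:
        for node_id, attrs in graph.nodes.items():
            merged.add_node(node_id, **attrs)
        for source, target, attrs in graph.edges:
            merged.add_edge(source, target, **attrs)
    return merged


# Both layers annotate the same tokenization.
tokens = ["It", "rains"]

# Layer 1: a toy syntax tree whose root "S" dominates both tokens.
syntax = MultiDiGraph()
for i, tok in enumerate(tokens):
    syntax.add_node(("token", i), word=tok)
syntax.add_edge("S", ("token", 0), layer="syntax", label="SB")
syntax.add_edge("S", ("token", 1), layer="syntax", label="HD")

# Layer 2: a toy discourse segment spanning the same tokens.
rst = MultiDiGraph()
for i, tok in enumerate(tokens):
    rst.add_node(("token", i), word=tok)
rst.add_edge("segment1", ("token", 0), layer="rst")
rst.add_edge("segment1", ("token", 1), layer="rst")

merged = merge(syntax, rst)

# Each token is now reachable from both annotation layers.
parents_of_first_token = sorted(
    source for source, target, _ in merged.edges if target == ("token", 0)
)
```

Because both layers use the same token identifiers, the merged graph contains each token only once, while the edges of every layer are kept and remain distinguishable by their `layer` attribute.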
So far, the following formats can be imported and merged:
TigerXML (a format for representing tree-like syntax graphs with secondary edges)
RS3 (a format used by RSTTool to annotate documents with Rhetorical Structure Theory)
an ad-hoc plain text format for annotating expletives (which you’re probably not interested in)
Installation
git clone https://github.com/arne-cl/discoursegraphs.git
cd discoursegraphs
python setup.py install  # prepend 'sudo' if needed
Requirements
If you’d like to visualize your graphs, you will also need:
License
3-Clause BSD.
People who downloaded this also like
SaltNPepper (a converter framework for various linguistic data formats)
News
0.1
Release date: 24-Apr-2014
first public release
imports RS3, TigerXML and an ad-hoc format for expletive annotation
merges these formats/files into a single multidigraph
generates simple dot/graphviz-based visualizations