
graph-based processing of multi-level annotated corpora

Project description


This library enables you to process linguistic corpora with multiple levels of annotations by:

  1. converting the different annotation formats into separate graphs and

  2. merging these graphs into a single multidigraph (based on the common tokenization of the annotation layers)
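The two steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the discoursegraphs API: the MultiDiGraph class, align_tokens and merge_graphs are hypothetical stand-ins. The sketch assumes that non-token node IDs are already namespaced per annotation layer (here "s:" for syntax and "x:" for expletives) so they cannot collide, and that all layers share exactly the same tokenization.

```python
class MultiDiGraph:
    """Minimal multidigraph: node attribute dicts plus a list of edges,
    so parallel edges between the same two nodes are allowed."""

    def __init__(self):
        self.nodes = {}   # node ID -> attribute dict
        self.edges = []   # (source, target, attribute dict) triples

    def add_node(self, node_id, **attrs):
        # adding an existing node merges (updates) its attributes
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        self.add_node(source)
        self.add_node(target)
        self.edges.append((source, target, attrs))


def align_tokens(base_tokens, other_tokens):
    """Map token IDs of one layer onto another, given (token_id, token) lists.
    Assumes both layers share exactly the same tokenization."""
    if [tok for _, tok in base_tokens] != [tok for _, tok in other_tokens]:
        raise ValueError("layers do not share a common tokenization")
    return {other_id: base_id
            for (base_id, _), (other_id, _) in zip(base_tokens, other_tokens)}


def merge_graphs(base, other, token_map):
    """Copy all nodes and edges of `other` into `base`, replacing the token
    node IDs of `other` by the aligned token node IDs of `base`."""
    def rename(node_id):
        return token_map.get(node_id, node_id)

    for node_id, attrs in other.nodes.items():
        base.add_node(rename(node_id), **attrs)
    for source, target, attrs in other.edges:
        base.add_edge(rename(source), rename(target), **attrs)
    return base


# Usage: a tiny syntax layer and an expletive layer over the same two tokens.
syntax = MultiDiGraph()
syntax.add_node("s:t1", token="Es")
syntax.add_node("s:t2", token="regnet")
syntax.add_edge("s:nt1", "s:t1", label="SB")
syntax.add_edge("s:nt1", "s:t2", label="HD")

expletives = MultiDiGraph()
expletives.add_node("x:1", token="Es", expletive=True)
expletives.add_node("x:2", token="regnet")

mapping = align_tokens([("s:t1", "Es"), ("s:t2", "regnet")],
                       [("x:1", "Es"), ("x:2", "regnet")])
merged = merge_graphs(syntax, expletives, mapping)
```

Because token nodes are unified while all other nodes keep their own IDs, the expletive annotation ends up attached to the same "Es" node that the syntax tree points to.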

So far, the following formats can be imported and merged:

  • TigerXML (a format for representing tree-like syntax graphs with secondary edges)

  • RS3 (a format used by RSTTool to annotate documents with Rhetorical Structure Theory)

  • an ad-hoc plain text format for annotating expletives (which you’re probably not interested in)
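As an illustration of the conversion step for one of these formats, the following sketch reads a single TigerXML sentence with Python's standard xml.etree.ElementTree module and returns plain node and edge lists. It assumes the usual TigerXML sentence layout (<t> terminals, <nt> nonterminals with <edge> children, optional <secedge> children on either) and is not the importer shipped with discoursegraphs; tiger_sentence_to_graph and the sample sentence are hypothetical.

```python
import xml.etree.ElementTree as ET

# hypothetical one-sentence TigerXML fragment
TIGER_SENTENCE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Es" pos="PPER"/>
      <t id="s1_2" word="regnet" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_1"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def tiger_sentence_to_graph(sentence_xml):
    """Return (nodes, edges): nodes maps IDs to attribute dicts, edges is a
    list of (source, target, attrs) triples, including secondary edges."""
    sentence = ET.fromstring(sentence_xml.strip())
    nodes, edges = {}, []
    for t in sentence.iter("t"):
        # keep all terminal attributes (word, pos, ...) except the ID
        nodes[t.get("id")] = {k: v for k, v in t.attrib.items() if k != "id"}
    for nt in sentence.iter("nt"):
        nodes[nt.get("id")] = {k: v for k, v in nt.attrib.items() if k != "id"}
        for edge in nt.findall("edge"):
            edges.append((nt.get("id"), edge.get("idref"),
                          {"label": edge.get("label"), "type": "primary"}))
    # secondary edges may hang off terminals as well as nonterminals
    for element in [*sentence.iter("t"), *sentence.iter("nt")]:
        for secedge in element.findall("secedge"):
            edges.append((element.get("id"), secedge.get("idref"),
                          {"label": secedge.get("label"), "type": "secondary"}))
    return nodes, edges


nodes, edges = tiger_sentence_to_graph(TIGER_SENTENCE)
```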


git clone
cd discoursegraphs
python setup.py install # prepend 'sudo' if needed


If you’d like to visualize your graphs, you will also need:


License: 3-Clause BSD.


Author: Arne Neumann

People who downloaded this also like

  • SaltNPepper (a converter framework for various linguistic data formats)



Release date: 24-Apr-2014

  • first public release

  • imports: RS3, TigerXML and an ad-hoc format for expletive annotation

  • merges these formats/files into a single multidigraph

  • generates simple dot/graphviz-based visualization
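A dot/graphviz visualization of such a graph can be approximated by writing DOT text directly. The sketch below is hypothetical (to_dot is not a discoursegraphs function) and assumes nodes carry a "word" or "cat" attribute to use as a label. It emits a non-strict digraph, which in Graphviz keeps parallel edges, so a multidigraph is drawn faithfully. The printed text can be saved to a file and rendered with the Graphviz command line, e.g. dot -Tpng graph.dot -o graph.png.

```python
def to_dot(nodes, edges, name="G"):
    """Render nodes (ID -> attribute dict) and (source, target, attrs) edges
    as a DOT digraph; a non-strict digraph keeps parallel edges."""
    def quote(text):
        # DOT quoted string: escape backslashes and double quotes
        return '"' + str(text).replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for node_id, attrs in nodes.items():
        label = attrs.get("word") or attrs.get("cat") or node_id
        lines.append(f"  {quote(node_id)} [label={quote(label)}];")
    for source, target, attrs in edges:
        lines.append(f"  {quote(source)} -> {quote(target)} "
                     f"[label={quote(attrs.get('label', ''))}];")
    lines.append("}")
    return "\n".join(lines)


# Usage with a hypothetical two-node graph
nodes = {"s1_500": {"cat": "S"}, "s1_1": {"word": "Es"}}
edges = [("s1_500", "s1_1", {"label": "SB"})]
dot_text = to_dot(nodes, edges)
print(dot_text)
```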


Source distribution: discoursegraphs-0.1.0.tar.gz (17.9 kB)
