
graph-based processing of multi-level annotated corpora

Project Description



This library enables you to process linguistic corpora with multiple levels of annotations by:

  1. converting the different annotation formats into separate graphs and
  2. merging these graphs into a single multidigraph (based on the common tokenization of the annotation layers)
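The two steps above can be illustrated with a minimal sketch in plain networkx. It does not use discoursegraphs itself: the helper names (toy_layer, merge_layers), the node IDs and the attribute names are assumptions made up for this example, and the library's own merging code may work differently. Each annotation layer becomes its own multidigraph over the same token nodes, and the merge takes the union of nodes and edges once the tokenizations agree.

```python
import networkx as nx


def toy_layer(name, tokens, edges):
    """Build a toy annotation layer (hypothetical helper, not library API)."""
    graph = nx.MultiDiGraph(name=name)
    # Token nodes get the same IDs in every layer, so they can be matched later.
    for index, token in enumerate(tokens):
        graph.add_node(f"token:{index}", token=token)
    for source, target, label in edges:
        graph.add_edge(source, target, label=label, layer=name)
    return graph


def tokens_of(graph):
    """Return the token strings of a layer, ordered by token index."""
    token_nodes = [(node, attrs) for node, attrs in graph.nodes(data=True)
                   if "token" in attrs]
    token_nodes.sort(key=lambda item: int(item[0].split(":")[1]))
    return [attrs["token"] for _, attrs in token_nodes]


def merge_layers(first, second):
    """Merge two layers that share a tokenization into one multidigraph."""
    if tokens_of(first) != tokens_of(second):
        raise ValueError("layers do not share a common tokenization")
    merged = nx.MultiDiGraph()
    for layer in (first, second):
        merged.add_nodes_from(layer.nodes(data=True))
        # Edges are added without keys, so parallel edges are kept, not overwritten.
        merged.add_edges_from(layer.edges(data=True))
    return merged


tokens = ["It", "rains"]
syntax = toy_layer("syntax", tokens, [("syntax:S", "token:0", "dominates"),
                                      ("syntax:S", "token:1", "dominates")])
rst = toy_layer("rst", tokens, [("rst:segment1", "token:0", "spans"),
                                ("rst:segment1", "token:1", "spans")])
merged = merge_layers(syntax, rst)
```

After the merge, each token node is reachable from both the syntax node and the RST node, which is what makes queries across layers possible.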

So far, the following formats can be imported and merged:

  • TigerXML (a format for representing tree-like syntax graphs with secondary edges)
  • RS3 (a format used by RSTTool to annotate documents with Rhetorical Structure Theory)
  • an ad-hoc plain text format for annotating expletives (which you’re probably not interested in)


Install from PyPI

pip install discoursegraphs # prepend 'sudo' if needed

or, if you’re oldschool:

easy_install discoursegraphs # prepend 'sudo' if needed

Install from source

git clone
cd discoursegraphs
python setup.py install # prepend 'sudo' if needed


Right now, there’s only a primitive command line interface that merges the syntax, RST and expletive annotation layers into one graph and generates a dot file from it.

discoursegraphs syntax/doc.xml rst/doc.rs3 expletives/doc.txt
dot -Tpdf > discoursegraph.pdf # generates a PDF from the dot file

If you’re interested in working with just one of those layers, you’ll have to call the code directly:

from discoursegraphs import readwrite
tiger_docgraph = readwrite.TigerDocumentGraph('syntax/doc.xml')
rst_docgraph = readwrite.RSTGraph('rst/doc.rs3')
expletives_docgraph = readwrite.AnaphoraDocumentGraph('expletives/doc.txt')

All the document graphs generated in this example are derived from the networkx.MultiDiGraph class, so you should be able to use all of its methods.
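As a rough illustration of what "all of its methods" means, the sketch below builds a small networkx.MultiDiGraph by hand as a stand-in for a document graph and queries it with standard networkx calls. The node IDs and the attribute names (kind, text, relation) are assumptions for this example only; the attributes that discoursegraphs actually stores on its nodes and edges may be named differently.

```python
import networkx as nx

# Hand-built stand-in for a document graph (not produced by discoursegraphs).
docgraph = nx.MultiDiGraph()
docgraph.add_node("s1", kind="sentence")
docgraph.add_node("t1", kind="token", text="It")
docgraph.add_node("t2", kind="token", text="rains")
docgraph.add_edge("s1", "t1", relation="dominates")
docgraph.add_edge("s1", "t2", relation="dominates")
docgraph.add_edge("t1", "t2", relation="precedes")

# Filter nodes by attribute (nodes are iterated in insertion order).
token_texts = [attrs["text"]
               for _, attrs in docgraph.nodes(data=True)
               if attrs.get("kind") == "token"]

# Follow outgoing edges from the sentence node.
children = sorted(docgraph.successors("s1"))

# Inspect incoming edges of a token together with one edge attribute.
incoming = sorted(docgraph.in_edges("t2", data="relation"))
```

The same calls should work on any object derived from MultiDiGraph, since they rely only on the networkx base class.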


Source code documentation is available here, but you can always get an up-to-date local copy using Sphinx.

You can generate an HTML or PDF version by running these commands in the docs directory:

make latexpdf

to produce a PDF (docs/_build/latex/discoursegraphs.pdf) and

make html

to produce a set of HTML files (docs/_build/html/index.html).


If you’d like to visualize your graphs, you will also need:


3-Clause BSD.


Arne Neumann

People who downloaded this also like

  • SaltNPepper (a converter framework for various linguistic data formats)


0.1.2 (2014-05-13)

Release date: 13-May-2014

  • added basic Geoff and Neo4j exporter (not yet available via the command line)
  • added sphinx-based documentation

0.1.1 (2014-04-25)

Release date: 25-Apr-2014

  • small improvements
  • added usage examples to readme
  • discoursegraphs script now uses the commandline interface of the merging module

0.1.0 (2014-04-24)

Release date: 24-Apr-2014

  • first public release
  • imports: RS3, TigerXML and an ad-hoc format for expletive annotation
  • merge these formats/files into a single multidigraph
  • generates simple dot/graphviz-based visualization
