A set of utilities for interacting with Penn-Treebank .mrg-formatted parses and identifying syntactic heads

Project description

mrg_utils.py
Created by Robert Elwell
University of Texas at Austin, Department of Linguistics
http://comp.ling.utexas.edu/relwell

Licensed under GPL

This is a set of Python classes for processing Penn Treebank-style combined parses, also known as the .mrg format in PTB release two. The files should be fairly self-explanatory.

The canonical entry point is mrg_utils.py, but mrg_document.py and node.py may be more informative for someone starting out.

This could save you up to a month of writing and debugging, and was designed to be scalable.

You can use this to extract features, easily run statistics, and navigate syntactic trees.
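
For readers unfamiliar with the .mrg format, here is a minimal, standalone sketch of what reading a bracketed parse and walking the resulting tree can look like. It does not use this package's API: the names Node, parse, and SAMPLE are hypothetical and exist only for illustration, and the sample sentence is made up rather than taken from the treebank.

```python
from collections import Counter

# Hypothetical example sentence in PTB bracketed (.mrg-style) notation.
SAMPLE = "( (S (NP-SBJ (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .)) )"


class Node:
    """Illustrative tree node; NOT the node class shipped in node.py."""

    def __init__(self, label):
        self.label = label    # syntactic category or POS tag
        self.word = None      # set only on preterminal (POS) nodes
        self.children = []

    def walk(self):
        """Yield this node and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def leaves(self):
        """Return the words under this node, left to right."""
        return [n.word for n in self.walk() if n.word is not None]


def parse(text):
    """Parse one bracketed tree into Node objects."""
    # Literal parentheses are escaped as -LRB-/-RRB- in the PTB,
    # so splitting on raw brackets is safe.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # consume "("
        if tokens[pos] == "(":
            label = ""  # unlabeled outer bracket around the sentence
        else:
            label = tokens[pos]
            pos += 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume ")"
        return node

    return read()


tree = parse(SAMPLE)
words = tree.leaves()
# Simple statistic: how often each node label occurs in the tree.
label_counts = Counter(n.label for n in tree.walk() if n.label)
```

The same walk-and-count pattern extends to feature extraction, for example collecting every NP subtree or pairing each word with its POS tag.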

This code is built from an API originally designed to interface with Stanford Parser-style dependency parse outputs (de Marneffe et al., 2006), Penn Discourse Treebank data, and more. Code or guidance will be furnished upon request; email me at robert.elwell@gmail.com.

Good luck, and enjoy.

Download files

Source Distribution

mrg_utils-0.0.1.tar.gz (8.9 kB)