Python interface for converting Penn Treebank trees to Stanford Dependencies

Example usage

Start by getting a StanfordDependencies instance with StanfordDependencies.get_instance():

>>> import StanfordDependencies
>>> sd = StanfordDependencies.get_instance(backend='subprocess')

get_instance() takes several options. backend can currently be subprocess or jpype (see below). If you have an existing Stanford CoreNLP or Stanford Parser jar file, use the jar_filename parameter to point to the full path of the jar file. Otherwise, PyStanfordDependencies will download a jar file for you and store it locally (~/.local/share/pystanforddeps). You can request a specific version with the version parameter, e.g., version='3.4.1'.

To convert trees, use the convert_tree() or convert_trees() method. These return a sentence (a list of Token objects) or a list of sentences (a list of lists of Token objects), respectively:

>>> sent = sd.convert_tree('(S1 (NP (DT some) (JJ blue) (NN moose)))')
>>> for token in sent:
...     print(token)
...
Token(index=1, form='some', cpos='DT', pos='DT', head=3, deprel='det')
Token(index=2, form='blue', cpos='JJ', pos='JJ', head=3, deprel='amod')
Token(index=3, form='moose', cpos='NN', pos='NN', head=0, deprel='root')

This tells you that moose is the head of the sentence and is modified by some (with a det = determiner relation) and blue (with an amod = adjective modifier relation). Fields on Token objects are readable as attributes. See the docs for additional options in convert_tree() and convert_trees().
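Because the fields are readable as attributes, a converted sentence can be turned into (head, relation, dependent) triples. The sketch below is illustrative only: it uses a stand-in namedtuple with the same field names as the output above rather than the library's own Token class, so it runs without Java or a jar file. The helper name dependency_triples is hypothetical, and treating head=0 as the root marker follows the example output.

```python
from collections import namedtuple

# Stand-in for the library's Token objects (an assumption for illustration):
# only the field names shown in the example output are mirrored here.
Token = namedtuple('Token', 'index form cpos pos head deprel')

sent = [
    Token(1, 'some', 'DT', 'DT', 3, 'det'),
    Token(2, 'blue', 'JJ', 'JJ', 3, 'amod'),
    Token(3, 'moose', 'NN', 'NN', 0, 'root'),
]

def dependency_triples(sentence):
    """Return (head form, relation, dependent form) for each token."""
    # Token indices are 1-based; head=0 marks the root of the sentence.
    forms = {token.index: token.form for token in sentence}
    forms[0] = 'ROOT'
    return [(forms[token.head], token.deprel, token.form) for token in sentence]

triples = dependency_triples(sent)
for head, rel, dep in triples:
    print('%s(%s, %s)' % (rel, head, dep))
```

The same function should work on the list returned by convert_tree(), provided the real Token objects expose index, form, head, and deprel as shown above.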

Backends

Currently PyStanfordDependencies includes two backends:

  • subprocess (works anywhere with a java binary; slow, so batch conversion with convert_trees() is recommended)

  • jpype (requires jpype1; faster than subprocess, and includes access to the Stanford CoreNLP lemmatizer)

By default, PyStanfordDependencies will attempt to use the jpype backend and fall back to subprocess with a warning.
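To avoid the fallback warning, the backend can be chosen explicitly before calling get_instance(). The following is a minimal sketch, not the library's own selection logic: choose_backend and instance_options are hypothetical helper names, and testing for the jpype module with importlib is an assumption about how availability might be checked. Only the backend, jar_filename, and version parameters described above are used.

```python
import importlib.util

def choose_backend(jpype_available=None):
    """Pick 'jpype' when the jpype module is importable, else 'subprocess'."""
    if jpype_available is None:
        # Assumption: the jpype1 package is importable as the module "jpype".
        jpype_available = importlib.util.find_spec('jpype') is not None
    return 'jpype' if jpype_available else 'subprocess'

def instance_options(jar_filename=None, version=None, jpype_available=None):
    """Build keyword arguments for StanfordDependencies.get_instance()."""
    options = {'backend': choose_backend(jpype_available)}
    # Only pass the optional parameters the caller actually supplied.
    if jar_filename is not None:
        options['jar_filename'] = jar_filename
    if version is not None:
        options['version'] = version
    return options

options = instance_options(version='3.4.1', jpype_available=False)
print(options)

# With PyStanfordDependencies and Java installed, the options would be used as:
#   import StanfordDependencies
#   sd = StanfordDependencies.get_instance(**options)
#   # Batch several trees into one call, since subprocess is slow per call
#   # (assumes convert_trees() accepts a list of tree strings):
#   sentences = sd.convert_trees(['(S1 (NP (DT some) (JJ blue) (NN moose)))'])
```

Keeping the option building separate from the get_instance() call makes the choice easy to log or test without starting a JVM.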

More information

Licensed under Apache 2.0.

Written by David McClosky (homepage, code)

Bug reports and feature requests: GitHub issue tracker

Source Distribution

PyStanfordDependencies-0.1.1.tar.gz (12.0 kB)
