Python interface for converting Penn Treebank trees to Stanford Dependencies
Example usage
Start by getting a StanfordDependencies instance with StanfordDependencies.get_instance():
>>> import StanfordDependencies
>>> sd = StanfordDependencies.get_instance(backend='subprocess')
get_instance() takes several options. backend can currently be subprocess or jpype (see below). If you have an existing Stanford CoreNLP or Stanford Parser jar file, use the jar_filename parameter to give its full path. Otherwise, PyStanfordDependencies will download a jar file for you and store it locally (~/.local/share/pystanforddeps). You can request a specific version with the version flag, e.g., version='3.4.1'.

To convert trees, use the convert_tree() or convert_trees() method. These return a sentence (a list of Token objects) or a list of sentences (a list of lists of Token objects), respectively:
>>> sent = sd.convert_tree('(S1 (NP (DT some) (JJ blue) (NN moose)))')
>>> for token in sent:
...     print(token)
...
Token(index=1, form='some', cpos='DT', pos='DT', head=3, deprel='det')
Token(index=2, form='blue', cpos='JJ', pos='JJ', head=3, deprel='amod')
Token(index=3, form='moose', cpos='NN', pos='NN', head=0, deprel='root')
This tells you that moose is the head of the sentence and is modified by some (with a det = determiner relation) and blue (with an amod = adjective modifier relation). Fields on Token objects are readable as attributes. See docs for additional options in convert_tree() and convert_trees().
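To make the head and deprel fields concrete, here is a minimal sketch that groups tokens by their head index and reads off the root and its modifiers. The Token namedtuple below is only a stand-in that mirrors the six fields printed above, so the example runs without Java or the package installed; dependents_by_head is a hypothetical helper, not part of PyStanfordDependencies.

```python
from collections import namedtuple

# Stand-in for the library's Token objects (assumption: only the six
# fields shown in the printed output above are modeled here).
Token = namedtuple('Token', 'index form cpos pos head deprel')

sent = [
    Token(1, 'some', 'DT', 'DT', 3, 'det'),
    Token(2, 'blue', 'JJ', 'JJ', 3, 'amod'),
    Token(3, 'moose', 'NN', 'NN', 0, 'root'),
]


def dependents_by_head(tokens):
    """Map each head index to the tokens attached to it (head 0 is the root slot)."""
    children = {}
    for token in tokens:
        children.setdefault(token.head, []).append(token)
    return children


children = dependents_by_head(sent)
root = children[0][0]  # the token whose head is 0
modifiers = [(t.form, t.deprel) for t in children[root.index]]
print(root.form, modifiers)
```

Because head values are 1-based token indices, the same grouping should work on a sentence returned by convert_tree(), assuming its tokens expose head and index as attributes in the way the output above suggests.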
If you have the asciitree package, you can use a prettier formatter:
>>> print(sent.as_asciitree())
 moose [root]
  +-- some [det]
  +-- blue [amod]
Backends
Currently PyStanfordDependencies includes two backends:
subprocess (works anywhere with a java binary; slow, so batched conversions with convert_trees() are recommended)
jpype (requires jpype1, faster than subprocess, includes access to the Stanford CoreNLP lemmatizer)
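The batching advice for the subprocess backend can be sketched as a small helper. This assumes convert_trees() accepts a list of tree strings and returns one sentence per tree. convert_in_batches, batch_size, and FakeConverter are hypothetical names introduced for illustration. Passing everything in a single convert_trees() call is the simplest form of batching; chunking as below only matters when the corpus is too large to hand over at once.

```python
def convert_in_batches(sd, trees, batch_size=100):
    """Call sd.convert_trees() once per chunk and return one flat list of sentences."""
    sentences = []
    for start in range(0, len(trees), batch_size):
        batch = trees[start:start + batch_size]
        sentences.extend(sd.convert_trees(batch))
    return sentences


class FakeConverter(object):
    """Hypothetical stand-in that records batch sizes instead of running Java."""

    def __init__(self):
        self.calls = []

    def convert_trees(self, trees):
        self.calls.append(len(trees))
        # Pretend each tree converts to a one-element "sentence".
        return [[tree] for tree in trees]


fake = FakeConverter()
converted = convert_in_batches(fake, ['t1', 't2', 't3', 't4', 't5'], batch_size=2)
print(fake.calls, len(converted))
```

With a real instance from get_instance(), the same helper would be called as convert_in_batches(sd, trees), provided the assumption about convert_trees() holds.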
By default, PyStanfordDependencies will attempt to use the jpype backend and fall back to subprocess with a warning if jpype isn't available or crashes on startup.
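The fallback behavior described above follows a common pattern: try the preferred option, and if it fails, warn and move to the next. The sketch below illustrates that pattern only and is not the library's actual code. first_working_backend and the two factory functions are hypothetical, and the simulated ImportError stands in for jpype being unavailable.

```python
import warnings


def first_working_backend(factories):
    """Try (name, factory) pairs in order; return (name, instance) for the first that starts."""
    errors = []
    for position, (name, factory) in enumerate(factories):
        try:
            instance = factory()
        except Exception as error:
            errors.append((name, error))
            continue
        if position > 0:
            # Something earlier in the list failed, so tell the caller.
            warnings.warn('fell back to %r backend after: %r' % (name, errors))
        return name, instance
    raise RuntimeError('no backend could be started: %r' % (errors,))


def broken_jpype():
    # Simulates jpype being missing on this machine.
    raise ImportError('No module named jpype')


def working_subprocess():
    return object()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    name, backend = first_working_backend(
        [('jpype', broken_jpype), ('subprocess', working_subprocess)])

print(name, len(caught))
```

To skip the fallback entirely, pass backend explicitly to get_instance(), as in the first example.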
More information
Licensed under Apache 2.0.
Written by David McClosky (homepage, code)
Bug reports and feature requests: GitHub issue tracker
Release summaries
0.1.5 (2015.01.10): Support for ASCII tree formatting
0.1.4 (2015.01.07): Fix CCprocessed support
0.1.3 (2015.01.03): Bugfixes, coveralls integration, refactoring
0.1.2 (2015.01.02): Better CoNLL structures, test suite and Travis-CI support, bugfixes
0.1.1 (2014.12.15): More docs, fewer bugs
0.1 (2014.12.14): Initial version
Source Distribution
Hashes for PyStanfordDependencies-0.1.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b041ce347ec93e51b1104e1de5521ef6c899e989afb4d8c91e1153e15419e01e
MD5 | ce0591afe0d2497d032c9b2dd3f50f79
BLAKE2b-256 | 8ca2cde2af0d4a188931a953452b86a429fdab1369268d67993b7e16e3dcba23