Python interface for converting Penn Treebank trees to Stanford Dependencies
Example usage
Start by getting a StanfordDependencies instance with StanfordDependencies.get_instance():
>>> import StanfordDependencies
>>> sd = StanfordDependencies.get_instance(backend='subprocess')
get_instance() takes several options. backend can currently be subprocess or jpype (see below). If you have an existing Stanford CoreNLP or Stanford Parser jar file, set the jar_filename parameter to the full path of that jar file. Otherwise, PyStanfordDependencies will download a jar file for you and store it locally (~/.local/share/pystanforddeps). You can request a specific version with the version flag, e.g., version='3.4.1'.

To convert trees, use the convert_tree() or convert_trees() method. These return a sentence (a list of Token objects) or a list of sentences (a list of lists of Token objects), respectively:
>>> sent = sd.convert_tree('(S1 (NP (DT some) (JJ blue) (NN moose)))')
>>> for token in sent:
...     print(token)
...
Token(index=1, form='some', cpos='DT', pos='DT', head=3, deprel='det')
Token(index=2, form='blue', cpos='JJ', pos='JJ', head=3, deprel='amod')
Token(index=3, form='moose', cpos='NN', pos='NN', head=0, deprel='root')
This tells you that moose is the head of the sentence and is modified by some (with a det = determiner relation) and blue (with an amod = adjective modifier relation). Fields on Token objects are readable as attributes. See the docs for additional options in convert_tree() and convert_trees().
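Because Token fields are readable as attributes, a converted sentence can be turned into plain (relation, head word, dependent word) triples. The following is a minimal sketch, not part of the library: dependency_edges and StandInToken are hypothetical names, and StandInToken only mirrors the fields shown in the Token output above so the example runs without Java or the package installed. It also assumes, as in the example output, that head=0 marks the root.

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as the Token repr shown above.
# It exists only so this sketch runs without the library or a Java install.
StandInToken = namedtuple("StandInToken", "index form cpos pos head deprel")


def dependency_edges(sentence):
    """Return (deprel, head_form, dependent_form) triples for one sentence.

    `sentence` is any list of objects with index, form, head and deprel
    attributes. A head of 0 is rendered as the placeholder word "ROOT".
    """
    forms = {token.index: token.form for token in sentence}
    forms[0] = "ROOT"  # assumption: head=0 means "attached to the root"
    return [(token.deprel, forms[token.head], token.form) for token in sentence]


example = [
    StandInToken(1, "some", "DT", "DT", 3, "det"),
    StandInToken(2, "blue", "JJ", "JJ", 3, "amod"),
    StandInToken(3, "moose", "NN", "NN", 0, "root"),
]

for edge in dependency_edges(example):
    print(edge)
```

With a real instance, the list returned by sd.convert_tree(...) should be usable in place of example, since the helper only reads the attributes listed above.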
Backends
Currently PyStanfordDependencies includes two backends:
subprocess (works anywhere with a java binary; slow, so batched conversions with convert_trees() are recommended)
jpype (requires jpype1, faster than subprocess, includes access to the Stanford CoreNLP lemmatizer)
By default, PyStanfordDependencies will attempt to use the jpype backend and fall back to subprocess with a warning if jpype isn’t available or crashes on startup.
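The recommendation to batch conversions on the subprocess backend can be sketched as a small helper that sends trees to convert_trees() in fixed-size groups instead of calling convert_tree() once per tree. This is an illustrative sketch only: chunked, convert_in_batches and the batch_size default are hypothetical, and it assumes that the per-call overhead of the backend is what makes single-tree calls slow.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_in_batches(converter, trees, batch_size=100):
    """Convert many trees with one convert_trees() call per batch.

    `converter` is any object exposing convert_trees(list_of_tree_strings),
    such as the instance returned by StanfordDependencies.get_instance().
    Returns a flat list of sentences in the original order.
    """
    sentences = []
    for batch in chunked(list(trees), batch_size):
        # One backend call per batch rather than one per tree.
        sentences.extend(converter.convert_trees(batch))
    return sentences
```

For inputs that fit comfortably in memory, a single sd.convert_trees(trees) call is simpler; the chunking is only meant to bound the size of each call.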
More information
Licensed under Apache 2.0.
Written by David McClosky.
Bug reports and feature requests: GitHub issue tracker
Release summaries
0.1.3 (2015.01.03): Bugfixes, coveralls integration, refactoring
0.1.2 (2015.01.02): Better CoNLL structures, test suite and Travis-CI support, bugfixes
0.1.1 (2014.12.15): More docs, fewer bugs
0.1 (2014.12.14): Initial version
Source Distribution
Hashes for PyStanfordDependencies-0.1.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | dfe74f899cc51a5e4094b669c83c0947e4ad4abc8d63c48b90e56827755b4bca
MD5 | ff88d6cb4f0b2560585a49841bbc8967
BLAKE2b-256 | 7ee78d1cb0cafccba4be3478d9982d6ed2a98edbb9344138766d5ad0b1bce2a6
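A downloaded source distribution can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: sha256_of and matches_published_hash are hypothetical helper names, and the local file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published above for PyStanfordDependencies-0.1.3.tar.gz.
EXPECTED_SHA256 = "dfe74f899cc51a5e4094b669c83c0947e4ad4abc8d63c48b90e56827755b4bca"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

Calling matches_published_hash() with the path to the downloaded archive returns False if the file was corrupted or altered in transit.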