Dependency parse searching
depgrep
Dependency parse searching for CONLL-U DataFrames
Version 0.1.3
Note: this tool currently has no tests or CI. It is not yet advisable to use it outside of the depgrep methods provided by the buzz library.
Installation
pip install depgrep
Usage
The tool is designed to work with corpora made from CONLL-U files and parsed into DataFrames by buzz. The best thing to do is use buzz to model corpora, and then use its depgrep method.
pip install buzz
Then, in Python:
from buzz import Corpus
corpus = Corpus('path/to/conll/files')
query = 'l"have"' # match the lemma "have"
Syntax
depgrep searches work through a combination of nodes and relations, just like Tgrep2, on which this tool is based.
Nodes
A node targets one token feature (word, lemma, POS, wordclass, dependency role, etc.). It may be specified as a regular expression or a simple string match: f/amod|nsubj/ will match tokens filling the nsubj or amod role; l"be" will match the lemma be.
The first part of the node query chooses which token attribute is to be searched. It can be any of:
w : word
l : lemma
p : part of speech tag
x : wordclass / XPOS
f : dependency role
i : index in sentence
s : sentence number
Case sensitivity is controlled by the case of the attribute letter you are searching: p/VB/ is case-insensitive, and P/VB/ is case-sensitive. Therefore, the following query matches words ending in ing, ING, Ing, etc.:
w/ing$/
For case-insensitivity across the whole query, use the case_sensitive=False keyword argument.
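To make the node syntax concrete, here is a minimal sketch of how a single node could be turned into a test on one token. It is not depgrep's implementation: the function name compile_node, the column names in ATTRS, and the choice of full-string comparison for quoted matches are all assumptions made for illustration.

```python
import re

# Attribute letters from the list above. The column names are hypothetical.
ATTRS = {"w": "word", "l": "lemma", "p": "pos", "x": "xpos",
         "f": "function", "i": "index", "s": "sentence"}

# One attribute letter, then either /regex/ or "literal".
NODE = re.compile(r'^([A-Za-z])(?:/(.*)/|"(.*)")$')


def compile_node(node):
    """Turn a node such as w/ing$/ or l"be" into a predicate over a token dict."""
    m = NODE.match(node)
    if m is None:
        raise ValueError(f"not a node: {node!r}")
    letter, regex, literal = m.groups()
    column = ATTRS[letter.lower()]
    # Lowercase letter: case-insensitive. Uppercase letter: case-sensitive.
    flags = re.IGNORECASE if letter.islower() else 0
    if regex is not None:
        pattern = re.compile(regex, flags)
        return lambda token: pattern.search(str(token[column])) is not None
    # Assumption: a quoted string must equal the whole attribute value.
    pattern = re.compile(re.escape(literal), flags)
    return lambda token: pattern.fullmatch(str(token[column])) is not None


token = {"word": "RunnING", "lemma": "run"}
print(compile_node("w/ing$/")(token))  # True: lowercase w ignores case
print(compile_node("W/ing$/")(token))  # False: uppercase W is case-sensitive
```

The regular expression is applied with search rather than a full match, which is consistent with the anchored examples in this document (w/ing$/ and l/^[abc]/).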
Relations
Relations specify the relationship between nodes. For example, we can use f"nsubj" <- f"ROOT" to locate nominal subjects governed by nodes in the role of ROOT. The thing you want to find is the leftmost node in the query. So, while the above query finds nominal subject tokens, you could use the inverse relation, f"ROOT" -> f"nsubj", to return the ROOT tokens.
Available relations:
a = b : a and b are the same node
a & b : a and b are the same node (same as =)
a <- b : a is a dependent of b
a <<- b : a is a descendant of b, with any distance in between
a <-: b : a is the only dependent of b
a <-N b : a is a descendant of b by N generations
a -> b : a is the governor of b
a ->> b : a is an ancestor of b, with any distance in between
a ->: b : a is the only governor of b (as is normal in many grammars)
a ->N b : a is an ancestor of b by N generations
a + b : a is immediately to the left of b
a +N b : a is N places to the left of b
a <| b : a is left of b, with any distance in between
a - b : a is immediately to the right of b
a -N b : a is N places to the right of b
a |> b : a is right of b, with any distance in between
a $ b : a and b share a governor (i.e. are sisters)
a $> b : a is a sister of and to the right of b
a $< b : a is a sister of and to the left of b
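The sketch below spells out what a few of these relations mean on a toy sentence. It illustrates the definitions only and does not show how depgrep evaluates them; the SENT table and all function names are hypothetical.

```python
# Toy sentence "The cats chase small mice".
# index -> (word, dependency role, governor index); 0 means "no governor".
SENT = {
    1: ("The", "det", 2),
    2: ("cats", "nsubj", 3),
    3: ("chase", "ROOT", 0),
    4: ("small", "amod", 5),
    5: ("mice", "obj", 3),
}


def governor(i):
    return SENT[i][2]


def is_dependent(a, b):
    """a <- b : a is a dependent of b."""
    return governor(a) == b


def is_descendant(a, b):
    """a <<- b : a is a descendant of b, at any distance."""
    g = governor(a)
    while g != 0:
        if g == b:
            return True
        g = governor(g)
    return False


def is_governor(a, b):
    """a -> b : a is the governor of b (inverse of <-)."""
    return is_dependent(b, a)


def immediately_left(a, b):
    """a + b : a is immediately to the left of b."""
    return b - a == 1


def sisters(a, b):
    """a $ b : a and b are distinct tokens sharing a governor."""
    return a != b and governor(a) == governor(b)


print([i for i in SENT if is_dependent(i, 3)])   # [2, 5]
print([i for i in SENT if is_descendant(i, 3)])  # [1, 2, 4, 5]
```

Here "cats" and "mice" are direct dependents of "chase", while "The" and "small" are reached only through the any-distance relation.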
Negation
Add ! before a relation to negate it: f"ROOT" != x"VERB" will find non-verbal ROOT nodes.
Brackets
Brackets can be used to make more complex queries:
f"amod" = l/^[abc]/ <- (f/nsubj/ != x/NOUN/)
The above translates to: match adjectival modifiers starting with a, b or c, which are governed by nominal subjects that are not nouns.
Note that without brackets, each relation/node refers to the leftmost node. In the following, the plural noun must be the same node as the nsubj, not the ROOT:
f"nsubj" <- f"ROOT" = p"NNS"
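The difference can be sketched with plain Python filters over a toy sentence. This is only an illustration of the scoping rule: the TOKENS table and function names are hypothetical, and the bracketed query f"nsubj" <- (f"ROOT" = p"NNS") is a contrast case constructed here rather than one taken from the documentation.

```python
# Toy sentence "Cats chase mice"; gov is the governor's index, 0 means none.
TOKENS = {
    1: {"word": "Cats", "f": "nsubj", "p": "NNS", "gov": 2},
    2: {"word": "chase", "f": "ROOT", "p": "VBP", "gov": 0},
    3: {"word": "mice", "f": "obj", "p": "NNS", "gov": 2},
}


def gov_of(i):
    """Return the governor token of token i, or None if it has none."""
    return TOKENS.get(TOKENS[i]["gov"])


def unbracketed(i):
    # f"nsubj" <- f"ROOT" = p"NNS": the NNS test applies to the leftmost node.
    tok, gov = TOKENS[i], gov_of(i)
    return (tok["f"] == "nsubj"
            and gov is not None and gov["f"] == "ROOT"
            and tok["p"] == "NNS")


def bracketed(i):
    # f"nsubj" <- (f"ROOT" = p"NNS"): the NNS test applies to the governor.
    tok, gov = TOKENS[i], gov_of(i)
    return (tok["f"] == "nsubj"
            and gov is not None and gov["f"] == "ROOT" and gov["p"] == "NNS")


flat = [TOKENS[i]["word"] for i in TOKENS if unbracketed(i)]
nested = [TOKENS[i]["word"] for i in TOKENS if bracketed(i)]
print(flat)    # ['Cats']
print(nested)  # []
```

Without brackets the plural subject "Cats" matches; with brackets nothing matches, because the ROOT "chase" is not tagged NNS.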
Or expressions
You can use the pipe (|) to create an OR expression.
# match all kinds of modifiers
x"ADJ" | f"amod" | f"appos" | p/^JJ/
x"NOUN" <- f"ROOT" | = p"NNS"
Above, we match nouns that are either governed by ROOT, or are plural.
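Read this way, the second query is "NOUN and (governed by ROOT or plural)". A small sketch of that reading, using a hypothetical token table in which gov_role stands for the dependency role of each token's governor:

```python
# Hypothetical tokens; gov_role is the role of the token's governor.
TOKENS = [
    {"word": "Cats", "x": "NOUN", "p": "NNS", "gov_role": "ROOT"},
    {"word": "chase", "x": "VERB", "p": "VBP", "gov_role": None},
    {"word": "dog", "x": "NOUN", "p": "NN", "gov_role": "nmod"},
    {"word": "mice", "x": "NOUN", "p": "NNS", "gov_role": "nmod"},
]


def matches(tok):
    # x"NOUN" <- f"ROOT" | = p"NNS"
    # Both alternatives attach to the leftmost node, the noun.
    return tok["x"] == "NOUN" and (tok["gov_role"] == "ROOT" or tok["p"] == "NNS")


hits = [tok["word"] for tok in TOKENS if matches(tok)]
print(hits)  # ['Cats', 'mice']
```

The singular noun "dog" is excluded because it satisfies neither alternative, and "chase" is excluded because it is not a noun.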
Wildcard
You can use __ or * to stand in for any token. To match any token that is the governor of a verb, do:
__ -> x"VERB"