depgrep

Dependency parse searching for CONLL-U DataFrames

Version 0.1.3

Note: this tool currently doesn't have tests, CI, etc. It is not yet advised to use this tool outside of the depgrep methods provided by the buzz library.

Installation

pip install depgrep

Usage

The tool is designed to work with corpora made from CONLL-U files and parsed into DataFrames by buzz. The best thing to do is use buzz to model corpora, and then use its depgrep method.

pip install buzz

Then, in Python:

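from buzz import Corpus
corpus = Corpus('path/to/conll/files')
query = 'l"have"'  # match the lemma "have"
# Run the search with buzz's depgrep method. Calling it directly on the
# Corpus object is an assumption here; check the buzz documentation.
results = corpus.depgrep(query)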

Syntax

depgrep searches work through a combination of nodes and relations, just like Tgrep2, on which this tool is based.

Nodes

A node targets one token feature (word, lemma, POS, wordclass, dependency role, etc.). It may be specified as a regular expression or a simple string match: f/amod|nsubj/ matches tokens filling the nsubj or amod role; l"be" matches the lemma be.

The first part of the node query chooses which token attribute is to be searched. It can be any of:

w : word
l : lemma
p : part of speech tag
x : wordclass / XPOS
f : dependency role
i : index in sentence
s : sentence number

Case sensitivity is controlled by the case of the attribute you are searching: p/VB/ is case-insensitive, and P/VB/ is case sensitive. Therefore, the following query matches words ending in ing, ING, Ing, etc:

w/ing$/

For case-insensitivity across the query, use the case_sensitive=False keyword argument.
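To make the node syntax concrete, here is a minimal sketch of how a node string could be turned into a test on a token. It is not depgrep's own implementation: the field names, the compile_node helper and the toy tokens are hypothetical, and it assumes that regular expressions are searched anywhere in the value while quoted strings must equal the whole value.

```python
import re

# Attribute letters from the list above; the field names on the right are
# hypothetical and chosen only for this sketch.
FIELDS = {"w": "word", "l": "lemma", "p": "pos", "x": "wordclass",
          "f": "role", "i": "index", "s": "sentence"}


def compile_node(node):
    """Turn a node such as w/ing$/ or l"be" into a predicate over token dicts."""
    letter, body = node[0], node[1:]
    field = FIELDS[letter.lower()]
    # Assumed rule: an upper-case attribute letter makes the match case-sensitive.
    flags = 0 if letter.isupper() else re.IGNORECASE
    if len(body) >= 2 and body[0] == body[-1] == "/":
        # /.../ is a regular expression, searched anywhere in the value.
        pattern = re.compile(body[1:-1], flags)
        return lambda token: pattern.search(str(token[field])) is not None
    if len(body) >= 2 and body[0] == body[-1] == '"':
        # "..." is a plain string that must equal the whole value.
        pattern = re.compile(re.escape(body[1:-1]), flags)
        return lambda token: pattern.fullmatch(str(token[field])) is not None
    raise ValueError(f"cannot parse node: {node!r}")


# Toy tokens, invented for illustration.
tokens = [
    {"word": "Swimming", "lemma": "swim"},
    {"word": "IS", "lemma": "be"},
    {"word": "fun", "lemma": "fun"},
    {"word": "KING", "lemma": "king"},
]


def search(node):
    """Return the words of all toy tokens matched by the node."""
    match = compile_node(node)
    return [token["word"] for token in tokens if match(token)]


print(search("w/ing$/"))  # lower-case w: case-insensitive
print(search("W/ing$/"))  # upper-case W: case-sensitive
print(search('l"be"'))
```

With these toy tokens, the lower-case w query picks up both Swimming and KING, while the upper-case W query keeps only Swimming.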

Relations

Relations specify the relationship between nodes. For example, we can use f"nsubj" <- f"ROOT" to locate nominal subjects governed by nodes in the role of ROOT. The thing you want to find is the leftmost node in the query. So, while the above query finds nominal subject tokens, you could use the inverse relation, f"ROOT" -> f"nsubj", to return the ROOT tokens instead.
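The difference between the two directions can be sketched on a toy sentence. This is an illustration only, not how depgrep evaluates queries: the token dicts, their field names and both helper functions are hypothetical, and each token is assumed to store the index of its governor (0 for the root).

```python
# "Dogs chase cats", with each token pointing at its governor's index.
sentence = [
    {"index": 1, "word": "Dogs", "role": "nsubj", "governor": 2},
    {"index": 2, "word": "chase", "role": "ROOT", "governor": 0},
    {"index": 3, "word": "cats", "role": "obj", "governor": 2},
]


def dependents_of(sentence, dep_role, gov_role):
    """Sketch of f"dep_role" <- f"gov_role": the dependents are returned."""
    by_index = {token["index"]: token for token in sentence}
    hits = []
    for token in sentence:
        governor = by_index.get(token["governor"])
        if (token["role"] == dep_role and governor is not None
                and governor["role"] == gov_role):
            hits.append(token["word"])
    return hits


def governors_of(sentence, gov_role, dep_role):
    """Sketch of f"gov_role" -> f"dep_role": the governors are returned."""
    governor_ids = {t["governor"] for t in sentence if t["role"] == dep_role}
    return [t["word"] for t in sentence
            if t["role"] == gov_role and t["index"] in governor_ids]


print(dependents_of(sentence, "nsubj", "ROOT"))  # the subject token
print(governors_of(sentence, "ROOT", "nsubj"))   # the ROOT token
```

Both calls describe the same pair of tokens; only the node written on the left, and therefore the token returned, changes.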

Available relations:

a = b   : a and b are the same node
a & b   : a and b are the same node (same as =)

a <- b  : a is a dependent of b
a <<- b : a is a descendant of b, with any distance in between
a <-: b : a is the only dependent of b
a <-N b : a is a descendant of b by N generations

a -> b  : a is the governor of b
a ->> b : a is an ancestor of b, with any distance in between
a ->: b : a is the only governor of b (as is normal in many grammars)
a ->N b : a is ancestor of b by N generations

a + b   : a is immediately to the left of b
a +N b  : a is N places to the left of b
a <| b  : a is left of b, with any distance in between

a - b   : a is immediately to the right of b
a -N b  : a is N places to the right of b
a |> b  : a is right of b, with any distance in between

a $ b   : a and b share a governor (i.e. are sisters)

a $> b  : a is a sister of and to the right of b
a $< b  : a is a sister of and to the left of b
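Several of the relations above reduce to simple checks on governor pointers and token positions. The sketch below is a hedged illustration, not depgrep's code: the GOVERNOR table and all function names are hypothetical, tokens are identified by their 1-based position in the sentence, and N = 1 is assumed to mean a direct dependent.

```python
# "The quick fox jumps": position -> position of the governor (0 = no governor).
GOVERNOR = {1: 3, 2: 3, 3: 4, 4: 0}


def ancestors(index):
    """Yield (distance, governor position) pairs, walking up to the root."""
    distance = 0
    while GOVERNOR[index] != 0:
        index = GOVERNOR[index]
        distance += 1
        yield distance, index


def is_descendant(a, b, n=None):
    """a <<- b when n is None; a <-N b when n is given."""
    return any(gov == b and (n is None or dist == n)
               for dist, gov in ancestors(a))


def are_sisters(a, b):
    """a $ b: two different tokens that share a governor."""
    return a != b and GOVERNOR[a] == GOVERNOR[b]


def places_left(a, b, n=1):
    """a + b when n is 1; a +N b otherwise."""
    return b - a == n


print(is_descendant(1, 4))       # "The" is below "jumps" at some distance
print(is_descendant(1, 4, n=2))  # exactly two generations apart
print(are_sisters(1, 2))         # "The" and "quick" share the governor "fox"
print(places_left(2, 3))         # "quick" is immediately left of "fox"
```

The ancestor relations (->>, ->N) follow by swapping the arguments, and the rightward relations (-, -N) by reversing the sign of the position difference.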

Negation

Add ! before a relation to negate it: f"ROOT" != x"VERB" will find non-verbal ROOT nodes.

Brackets

Brackets can be used to make more complex queries:

f"amod" = l/^[abc]/ <- (f/nsubj/ != x/NOUN/)

The above translates to: match adjectival modifiers starting with a, b or c that are governed by nominal subjects that are not nouns.

Note that without brackets, each relation/node refers to the leftmost node. In the following, the plural noun must be the same node as the nsubj, not the ROOT:

f"nsubj" <- f"ROOT" = p"NNS"
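One way to read an unbracketed query like this is as a list of conditions that must all hold for the same candidate token. The sketch below spells that reading out on two toy sentences; the token dicts, field names and the function are hypothetical and only illustrate the semantics described here.

```python
# Two toy sentences: "Dogs bark" and "He barks".
sentences = [
    [{"index": 1, "word": "Dogs", "pos": "NNS", "role": "nsubj", "governor": 2},
     {"index": 2, "word": "bark", "pos": "VBP", "role": "ROOT", "governor": 0}],
    [{"index": 1, "word": "He", "pos": "PRP", "role": "nsubj", "governor": 2},
     {"index": 2, "word": "barks", "pos": "VBZ", "role": "ROOT", "governor": 0}],
]


def plural_subjects_of_root(sentence):
    """Sketch of f"nsubj" <- f"ROOT" = p"NNS": every test targets one token."""
    by_index = {token["index"]: token for token in sentence}
    hits = []
    for token in sentence:
        governor = by_index.get(token["governor"])
        is_nsubj = token["role"] == "nsubj"                           # f"nsubj"
        under_root = governor is not None and governor["role"] == "ROOT"  # <- f"ROOT"
        is_plural = token["pos"] == "NNS"                             # = p"NNS"
        if is_nsubj and under_root and is_plural:
            hits.append(token["word"])
    return hits


matches = [word for s in sentences for word in plural_subjects_of_root(s)]
print(matches)
```

Only Dogs survives: He is a subject governed by ROOT but is not tagged NNS, and the plural test is never applied to the ROOT tokens themselves.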

Or expressions

You can use the pipe (|) to create an OR expression.

# match all kinds of modifiers
x"ADJ" | f"amod" | f"appos" | p/^JJ/
x"NOUN" <- f"ROOT" | = p"NNS"

The first query matches any token that satisfies at least one of the four nodes. The second matches nouns that are either governed by ROOT or plural.
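The query x"NOUN" <- f"ROOT" | = p"NNS" can be read as one required condition followed by two alternatives. A minimal sketch of that reading on a toy sentence follows; it is not depgrep's evaluation code, and the token dicts, field names and the matches helper are hypothetical.

```python
# "Dogs like food in bowls of plastic", with governor positions (0 = root).
sentence = [
    {"index": 1, "word": "Dogs", "wordclass": "NOUN", "pos": "NNS", "role": "nsubj", "governor": 2},
    {"index": 2, "word": "like", "wordclass": "VERB", "pos": "VBP", "role": "ROOT", "governor": 0},
    {"index": 3, "word": "food", "wordclass": "NOUN", "pos": "NN", "role": "obj", "governor": 2},
    {"index": 4, "word": "in", "wordclass": "ADP", "pos": "IN", "role": "case", "governor": 5},
    {"index": 5, "word": "bowls", "wordclass": "NOUN", "pos": "NNS", "role": "nmod", "governor": 3},
    {"index": 6, "word": "of", "wordclass": "ADP", "pos": "IN", "role": "case", "governor": 7},
    {"index": 7, "word": "plastic", "wordclass": "NOUN", "pos": "NN", "role": "nmod", "governor": 5},
]
by_index = {token["index"]: token for token in sentence}


def matches(token):
    """Sketch of x"NOUN" <- f"ROOT" | = p"NNS"."""
    if token["wordclass"] != "NOUN":          # x"NOUN" is always required
        return False
    governor = by_index.get(token["governor"])
    governed_by_root = governor is not None and governor["role"] == "ROOT"
    is_plural = token["pos"] == "NNS"
    return governed_by_root or is_plural       # either alternative suffices


nouns = [token["word"] for token in sentence if matches(token)]
print(nouns)
```

Here food matches only through the first alternative, bowls only through the second, and plastic through neither.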

Wildcard

You can use __ or * to stand in for any token. To match any token that is the governor of a verb, do:

__ -> x"VERB"
