Generating XPath queries based on a Dutch Alpino syntax tree and user-specified token properties.

Alpino Query

pip install alpino-query

When running locally without installing, use python -m alpino_query instead of alpino-query.

Mark

Mark which part of the treebank should be selected for filtering. It has three inputs:

Lassy/Alpino XML
the tokens of the sentence
for each token, the properties which should be marked

For example:

alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"

It is also possible to mark multiple properties for a token by separating them with a comma. Each property can also be negated by prefixing it with -; negated properties are marked as 'exclude' in the tree.

alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
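
The property syntax above can be illustrated with a small sketch. The helper parse_token_properties is hypothetical and not part of alpino-query; it only assumes what is described above: properties are separated by commas and a leading - negates a property.

```python
def parse_token_properties(spec):
    """Split one token's property spec into (included, excluded) lists.

    Hypothetical helper for illustration; not part of alpino-query.
    """
    included, excluded = [], []
    for prop in spec.split(","):
        prop = prop.strip()
        if not prop:
            continue
        if prop.startswith("-"):
            # A leading "-" marks the property as negated ('exclude').
            excluded.append(prop[1:])
        else:
            included.append(prop)
    return included, excluded


tokens = "Dit is een voorbeeldzin .".split()
specs = "pos pos,-word,rel pos pos pos".split()

# Each token needs exactly one property spec.
assert len(tokens) == len(specs)

parsed = [parse_token_properties(spec) for spec in specs]
print(parsed[1])  # (['pos', 'rel'], ['word'])
```

For the second token ("is"), pos and rel are marked for inclusion and word for exclusion.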

Subtree

Generates a subtree containing only the marked properties. The subtree also carries additional attributes indicating which properties should be excluded and/or matched case-sensitively.

The second argument can be empty, cat, rel or both (i.e. catrel or cat,rel). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.

alpino-query subtree "$(<tests/data/001.marked.xml)" cat
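
As a rough illustration of the accepted values, the sketch below maps the argument string to a list of attribute names. The function parse_remove_argument is a hypothetical helper, not part of alpino-query, and the substring check is only an assumption about how catrel and cat,rel could be treated alike.

```python
def parse_remove_argument(value):
    """Return the top-node attributes named in the argument, as a list.

    Hypothetical helper for illustration; not part of alpino-query.
    """
    # Substring matching treats "catrel" and "cat,rel" the same way.
    return [name for name in ("cat", "rel") if name in value]


for value in ["", "cat", "rel", "catrel", "cat,rel"]:
    print(repr(value), parse_remove_argument(value))
```

An empty argument yields an empty list, meaning no attributes are removed from the top node.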

XPath

Generates an XPath to query a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.

alpino-query xpath "$(<tests/data/001.subtree.xml)" 0

Using as Module

from alpino_query import AlpinoQuery

alpino_xml = "<Alpino xml as string>"
tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]

query = AlpinoQuery()
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml) # query.marked contains the lxml Element

query.generate_subtree(["rel", "cat"])
print(query.subtree_xml) # query.subtree contains the lxml Element

query.generate_xpath(False) # True to make order sensitive
print(query.xpath)

Considerations

Exclusive

When querying a node, exclusion can be expressed in more than one way. For example:

a node should not be a noun: node[@pos!="noun"]
it should not have a node which is a noun: not(node[@pos="noun"])

The first statement requires the existence of a node, whereas the second also holds true if there is no node at all. When a token is only exclusive (e.g. not a noun), a query of the second form is generated. If a token has both inclusive and exclusive properties, a query of the first form is generated.
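
The difference between the two forms can be checked with a minimal sketch. It uses only the Python standard library and emulates both predicates in plain Python instead of evaluating the XPath expressions directly. The XML fragments are hand-written and simplified, not real Alpino output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragments; not real Alpino output.
parents = {
    "verb child": ET.fromstring('<node><node pos="verb"/></node>'),
    "noun child": ET.fromstring('<node><node pos="noun"/></node>'),
    "no children": ET.fromstring("<node/>"),
}


def has_non_noun_child(parent):
    """Emulates node[@pos!="noun"]: a child exists whose pos is not noun."""
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent):
    """Emulates not(node[@pos="noun"]): no child is a noun."""
    return not any(child.get("pos") == "noun" for child in parent.findall("node"))


for label, parent in parents.items():
    print(label, has_non_noun_child(parent), has_no_noun_child(parent))
# verb child True True
# noun child False False
# no children False True
```

Only the parent without children distinguishes the two forms: the first predicate fails because no node exists, while the second succeeds.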

Relations

@cat and @rel are always preserved for nodes which have children. They are only dropped when all the children are removed by specifying the na property for the child tokens.

Upload to PyPI

pip install twine
python setup.py sdist
twine upload dist/*
