A library for the generation of minimal-pair syntactic tests from treebanks for Targeted Syntactic Evaluation of LLMs.
About
Grew-TSE is a tool for the query-based generation of custom minimal-pair syntactic tests from treebanks for Targeted Syntactic Evaluation of LLMs. The query language of choice is GREW (Graph Rewriting for NLP).
For full details of package installation and usage, see the Grew-TSE Documentation.
Purpose
The general research question that Grew-TSE aims to help answer is:
Can language models distinguish grammatical from ungrammatical sentences across syntactic phenomena and languages?
This means that if you speak a language, especially a low-resource one, you likely have something novel to test in this area.
The pipeline generally looks something like the following:
- Parse a Universal Dependencies treebank in CoNLL-U format.
- Isolate a specific syntactic phenomenon (e.g. verbal agreement) using a GREW query.
- Convert these isolated sentences into masked- or prompt-based datasets.
- Search the original treebank for words that differ by one syntactic feature to form a minimal pair.
- Evaluate a model available on the Hugging Face platform and view metrics such as accuracy, precision, recall, and the F1 score.
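To make the first step concrete, here is a minimal, illustrative CoNLL-U reader. It is not Grew-TSE's own parser; it only shows the standard 10-column token format and how the FEATS column (e.g. `Number=Sing|Person=3`) maps to a feature dict. The function names `parse_feats` and `parse_sentence` and the toy sentence annotation are assumptions made for this sketch.

```python
# Illustrative CoNLL-U reader; not Grew-TSE's own parser.
# Each token line holds 10 tab-separated columns:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_feats(raw: str) -> dict[str, str]:
    """Turn 'Number=Sing|Person=3' into {'Number': 'Sing', 'Person': '3'}; '_' means none."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_sentence(block: str) -> list[dict]:
    """Parse one sentence block into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as '# text = ...'
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        token["feats"] = parse_feats(token["feats"])
        tokens.append(token)
    return tokens


# Toy sentence with hand-written annotation, for illustration only
rows = [
    ["1", "It", "it", "PRON", "PRP", "Number=Sing|Person=3", "2", "nsubj", "_", "_"],
    ["2", "seems", "seem", "VERB", "VBZ", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"],
    ["3", "clear", "clear", "ADJ", "JJ", "Degree=Pos", "2", "xcomp", "_", "_"],
]
sample = "# text = It seems clear\n" + "\n".join("\t".join(r) for r in rows)
tokens = parse_sentence(sample)
print(tokens[1]["form"], tokens[1]["feats"]["Number"])  # seems Sing
```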
What does a "minimal-pair syntactic test" look like?
To analyse models in this way, we use what are called minimal pairs. A minimal pair consists of either
(1) two sentences that differ by one syntactic feature, or
(2) one sentence with a "gap" (or one that simply ends mid-sentence, as in next-token prediction) and two accompanying lexical items (e.g. is/are), one deemed grammatical in the given context and the other not.
Grew-TSE is concerned with the latter and focuses on generating minimal pairs of words (W1, W2) for the same context.
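A common way to score such a pair in targeted syntactic evaluation is to check whether the model assigns W1 a higher score (e.g. log-probability) than W2 in the same context, and to report accuracy as the share of pairs where it does. The sketch below assumes that scoring rule; it is not necessarily how Grew-TSE computes its metrics. `MinimalPair`, `pair_correct`, and `accuracy` are hypothetical names, and the scores are made-up placeholders rather than real model outputs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MinimalPair:
    context: str        # sentence with a [MASK] gap, or a prompt
    grammatical: str    # W1
    ungrammatical: str  # W2


# A scorer maps (context, word) to a number such as a log-probability.
Scorer = Callable[[str, str], float]


def pair_correct(pair: MinimalPair, score: Scorer) -> bool:
    """A pair counts as correct if W1 outscores W2 in the same context."""
    return score(pair.context, pair.grammatical) > score(pair.context, pair.ungrammatical)


def accuracy(pairs: list[MinimalPair], score: Scorer) -> float:
    """Fraction of pairs for which the grammatical form wins."""
    return sum(pair_correct(p, score) for p in pairs) / len(pairs)


# Made-up scores standing in for a real model's log-probabilities.
ramadi = "In Ramadi, there [MASK] a big demonstration..."
lobe = "Jim Lobe [MASK] more on the political implications..."
fake_logprobs = {
    (ramadi, "was"): -1.2, (ramadi, "were"): -3.4,
    (lobe, "has"): -2.5, (lobe, "have"): -2.1,
}
pairs = [MinimalPair(ramadi, "was", "were"), MinimalPair(lobe, "has", "have")]
print(accuracy(pairs, lambda ctx, w: fake_logprobs[(ctx, w)]))  # 0.5
```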
The table below shows some example tests, generated with Grew-TSE from the English EWT UD Treebank.
| masked_text | form_grammatical | form_ungrammatical |
|---|---|---|
| It [MASK] clear to me that the manhunt for high Ba... | seems | seem |
| In Ramadi, there [MASK] a big demonstration... | was | were |
| As the survey cited in the above-linked article [MASK]... | shows | show |
| Jim Lobe [MASK] more on the political implications... | has | have |
The tests above are for models trained on a masked language modelling (MLM) task; however, you can also generate prompt-based datasets for next-token prediction with Grew-TSE.
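One simple way to relate the two formats is to keep only the text before the gap, so the model must predict the missing word as the next token. The sketch below assumes exactly that; Grew-TSE's own prompt format may differ, and `masked_to_prompt` is a hypothetical helper.

```python
MASK = "[MASK]"


def masked_to_prompt(masked_text: str, mask: str = MASK) -> str:
    """Keep only the text before the gap, so the model has to predict the next word."""
    if masked_text.count(mask) != 1:
        raise ValueError("expected exactly one mask token")
    return masked_text.split(mask, 1)[0].rstrip()


print(masked_to_prompt("In Ramadi, there [MASK] a big demonstration..."))  # In Ramadi, there
```

The same grammatical/ungrammatical forms (was/were) are then compared as candidate continuations of the prompt.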
Try out the Dashboard on Hugging Face🤗
You can try out the official Grew-TSE dashboard, available as a Hugging Face Space. It is currently intended primarily for demonstration, but it can also be useful for quickly running syntactic evaluations.
Basic Usage
The first step in using this package is to parse a treebank, which provides the lexical item set: a fancy way of saying a dataset of words and their features. This set is used to find the ungrammatical counterpart of every grammatical word that your Grew query isolates.
```python
from grewtse.pipeline import GrewTSEPipe

g_pipe = GrewTSEPipe()

# the first step is always to load in a UD treebank
# you can supply either a single file path or a list of file paths
treebank_path = "./my-treebanks/german.conllu"
g_pipe.parse_treebank(treebank_path)
```
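To give a feel for what a lexical item set contains, here is a minimal sketch that stores each distinct combination of form, lemma, UPOS, and features once. The `LexicalItem` type, `build_lexical_item_set` function, and the toy German tokens are assumptions for illustration; Grew-TSE's internal representation may differ.

```python
from typing import NamedTuple


class LexicalItem(NamedTuple):
    form: str
    lemma: str
    upos: str
    feats: frozenset  # frozenset of (feature, value) pairs, so items are hashable


def build_lexical_item_set(tokens: list[dict]) -> set[LexicalItem]:
    """Store each distinct (form, lemma, UPOS, features) combination once."""
    return {
        LexicalItem(t["form"], t["lemma"], t["upos"], frozenset(t["feats"].items()))
        for t in tokens
    }


# Hand-written toy tokens, e.g. as produced by a CoNLL-U reader
tokens = [
    {"form": "Mann", "lemma": "Mann", "upos": "NOUN", "feats": {"Case": "Acc", "Number": "Sing"}},
    {"form": "Mannes", "lemma": "Mann", "upos": "NOUN", "feats": {"Case": "Gen", "Number": "Sing"}},
    {"form": "Mann", "lemma": "Mann", "upos": "NOUN", "feats": {"Case": "Acc", "Number": "Sing"}},
]
items = build_lexical_item_set(tokens)
print(len(items))  # 2: the repeated accusative "Mann" is stored once
```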
The deeper your knowledge of a language, the better you will be at choosing syntactic phenomena to evaluate. Treebanks with richer feature annotation let you ask more questions, and larger treebanks are more likely to contain suitable minimal pairs. Minimal pairs are found by isolating the target word and its features, then altering (typically) one of those features, for instance by changing an accusative noun to a genitive one. Note that morphological constraints (e.g. Case, Gender, Number) are passed separately from universal part-of-speech (UPOS) constraints. The morphological change is specified in a dict, like so:
```python
morphology_change = {
    "case": "Gen"
}
```
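The lookup this dict drives can be sketched as follows: keep the target word's lemma and UPOS, overwrite the changed feature, and search the lexicon for a form with exactly the resulting features. This is a hypothetical helper, not Grew-TSE's API; in particular, mapping lower-case keys such as `"case"` onto UD feature names such as `Case` is an assumption of this sketch.

```python
def find_counterparts(item: dict, change: dict[str, str], lexicon: list[dict]) -> list[str]:
    """Hypothetical helper (not Grew-TSE's API): forms sharing `item`'s lemma and UPOS
    whose features equal `item`'s features with `change` applied."""
    # Assumption: lower-case keys such as "case" map onto UD names such as "Case".
    normalized = {k[:1].upper() + k[1:]: v for k, v in change.items()}
    wanted = {**item["feats"], **normalized}
    return [
        entry["form"]
        for entry in lexicon
        if entry["lemma"] == item["lemma"]
        and entry["upos"] == item["upos"]
        and entry["feats"] == wanted
        # skip syncretic forms identical to the original: they cannot form a pair
        and entry["form"] != item["form"]
    ]


lexicon = [
    {"form": "Mann", "lemma": "Mann", "upos": "NOUN",
     "feats": {"Case": "Acc", "Gender": "Masc", "Number": "Sing"}},
    {"form": "Mannes", "lemma": "Mann", "upos": "NOUN",
     "feats": {"Case": "Gen", "Gender": "Masc", "Number": "Sing"}},
    {"form": "Männer", "lemma": "Mann", "upos": "NOUN",
     "feats": {"Case": "Nom", "Gender": "Masc", "Number": "Plur"}},
]
morphology_change = {"case": "Gen"}
print(find_counterparts(lexicon[0], morphology_change, lexicon))  # ['Mannes']
```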
A Grew query and a target are the means by which we isolate an individual phenomenon and the target word (typically the grammatical word) of each grammatical-ungrammatical minimal pair. Feature values in the Grew query may differ between treebanks, but the logic of the query should stay the same. The target, or dependency node, is the variable in the Grew query that represents the target word, and it must be a variable declared in the query. For instance, in the query below, V is the node representing the verb. This fancy-schmancy query isolates non-negated transitive verb phrases whose direct object is in the genitive:
```python
grew_query = """
pattern {
  V [upos=VERB];
  DirObj [Case=Gen];
  V -[obj]-> DirObj;
}
without {
  NEG [upos=PART, Polarity=Neg];
  V -[advmod:neg]-> NEG;
}
"""
target = "V"
```
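To clarify what the `pattern` and `without` blocks mean, here is a hand-written approximation in plain Python over toy token dicts: a match requires a VERB with an `obj` dependent in `Case=Gen`, and is rejected if the verb also has an `advmod:neg` dependent that is a PART with `Polarity=Neg`. This is for illustration only; real queries should be run with Grew itself, and `match_query`, `tok`, and the hand-annotated German sentence are assumptions of this sketch.

```python
def match_query(tokens: list[dict]) -> list[tuple[int, int]]:
    """Hand-written approximation of the Grew query above (illustration only).

    Returns (V, DirObj) id pairs where V is a VERB with an `obj` dependent in
    Case=Gen, excluding any V that has an `advmod:neg` dependent which is a
    PART with Polarity=Neg (the `without` block).
    """
    children: dict[int, list[dict]] = {}
    for t in tokens:
        children.setdefault(t["head"], []).append(t)
    matches = []
    for v in tokens:
        if v["upos"] != "VERB":
            continue
        deps = children.get(v["id"], [])
        is_negated = any(
            d["deprel"] == "advmod:neg" and d["upos"] == "PART"
            and d["feats"].get("Polarity") == "Neg"
            for d in deps
        )
        if is_negated:
            continue  # the `without` block matched, so reject this verb
        for d in deps:
            if d["deprel"] == "obj" and d["feats"].get("Case") == "Gen":
                matches.append((v["id"], d["id"]))
    return matches


def tok(i, form, upos, head, deprel, **feats):
    return {"id": i, "form": form, "upos": upos, "head": head, "deprel": deprel, "feats": feats}


# Toy German sentence "Er bedarf der Hilfe" ('He needs help'), hand-annotated
plain = [
    tok(1, "Er", "PRON", 2, "nsubj", Case="Nom"),
    tok(2, "bedarf", "VERB", 0, "root"),
    tok(3, "der", "DET", 4, "det", Case="Gen"),
    tok(4, "Hilfe", "NOUN", 2, "obj", Case="Gen"),
]
negated = plain + [tok(5, "nicht", "PART", 2, "advmod:neg", Polarity="Neg")]
print(match_query(plain))    # [(2, 4)]
print(match_query(negated))  # []
```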
The grammatical-ungrammatical minimal pairs for each sentence, together with the automatic masking of that sentence, can then be generated as follows:
```python
# generate a dataset from the treebank that creates masked
# sentences for masked language modeling (MLM)
masked_df = g_pipe.generate_masked_dataset(
    grew_query,
    target
)

# generate a dataset from the treebank that creates prompts
# for next-word prediction
prompt_df = g_pipe.generate_prompt_dataset(
    grew_query,
    target
)

# can only occur after a masked or prompt dataset
# has been generated
mp_dataset = g_pipe.generate_minimal_pair_dataset(
    morphology_change,
)
```
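As a rough picture of what masking produces, the sketch below replaces the target word of a tokenized sentence with `[MASK]` and records both forms, using the column names from the example table. `make_masked_row` is a hypothetical helper; the DataFrames returned by `generate_masked_dataset` and `generate_minimal_pair_dataset` may contain other or additional columns.

```python
def make_masked_row(forms: list[str], target_index: int, ungrammatical: str,
                    mask: str = "[MASK]") -> dict[str, str]:
    """Hypothetical helper: mask the target word and record both forms of the pair.

    Joins words with single spaces, so original spacing (e.g. before punctuation)
    is not preserved.
    """
    masked = forms[:target_index] + [mask] + forms[target_index + 1:]
    return {
        "masked_text": " ".join(masked),
        "form_grammatical": forms[target_index],
        "form_ungrammatical": ungrammatical,
    }


words = ["Jim", "Lobe", "has", "more", "on", "the", "political", "implications"]
row = make_masked_row(words, 2, "have")
print(row["masked_text"])  # Jim Lobe [MASK] more on the political implications
```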
Built With
Grew-TSE is written entirely in Python and is distributed as a Python package. It makes use of the Hugging Face Transformers library, as well as plotnine for plotting.
Of course, the grewpy package was essential for this project.
License
This project is licensed under the GNU General Public License (GPL). See the LICENSE file for full details.
Download files
File details
Details for the file grew_tse-0.1.3.tar.gz.
File metadata
- Download URL: grew_tse-0.1.3.tar.gz
- Upload date:
- Size: 44.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 916f67bb39d6de0e32b03a4bfeea256bd6d22aab6ea5a3afefe1aff1abdcf344 |
| MD5 | 975298aa924fa674c41969802a825595 |
| BLAKE2b-256 | 5dd18f1a49483f888dbe9f7addcba3f4854d299f8b46e3faf40f1a38b0276eaa |
File details
Details for the file grew_tse-0.1.3-py3-none-any.whl.
File metadata
- Download URL: grew_tse-0.1.3-py3-none-any.whl
- Upload date:
- Size: 41.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cfe143bfab451266a830f6d0e95e4e1b596dad1cda631eaf7cede885e54e1a0 |
| MD5 | 3775968894917701c51f102dd2c9ea67 |
| BLAKE2b-256 | 867f7d4ee8290c9e49dbb6fa319a3bd6c29966bf27978f619b1434573481ae2a |