
A library for the generation of minimal-pair syntactic tests from treebanks for Targeted Syntactic Evaluation of LLMs.


About

Grew-TSE is a tool for the query-based generation of custom minimal-pair syntactic tests from treebanks for Targeted Syntactic Evaluation of LLMs. The query language of choice is GREW (Graph Rewriting for NLP).

For full details of package installation and usage, see the Grew-TSE Documentation.

Purpose

The general research question that Grew-TSE aims to help answer is:
Can language models distinguish grammatical from ungrammatical sentences across syntactic phenomena and languages?

This means that if you speak a language, especially a low-resource one, you likely have something novel to test in this area.

The pipeline generally looks something like the following:

  1. Parse a Universal Dependencies treebank in CoNLL-U format.
  2. Isolate a specific syntactic phenomenon (e.g. verbal agreement) using a GREW query.
  3. Convert these isolated sentences into masked- or prompt-based datasets.
  4. Search the original treebank for words that differ by one syntactic feature to form a minimal pair.
  5. Evaluate a model available on the Hugging Face platform and view metrics such as accuracy, precision, recall, and the F1 score.
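Step 5 reports metrics such as accuracy. As a rough sketch of the most common minimal-pair metric (not necessarily how Grew-TSE computes it), accuracy can be taken as the share of pairs for which the model scores the grammatical form above the ungrammatical one. The function name and the scores below are hypothetical; precision, recall and F1 depend on how Grew-TSE frames the task and are not reproduced here.

```python
from typing import Sequence


def minimal_pair_accuracy(
    grammatical_scores: Sequence[float],
    ungrammatical_scores: Sequence[float],
) -> float:
    """Share of minimal pairs where the grammatical form outscores the ungrammatical one.

    Scores are assumed to be (log-)probabilities a model assigns to each form
    in the same context; ties count as errors.
    """
    if len(grammatical_scores) != len(ungrammatical_scores):
        raise ValueError("score lists must have the same length")
    if not grammatical_scores:
        raise ValueError("need at least one minimal pair")
    correct = sum(
        g > u for g, u in zip(grammatical_scores, ungrammatical_scores)
    )
    return correct / len(grammatical_scores)


# Made-up log-probabilities for three pairs; the model gets the first two right.
acc = minimal_pair_accuracy([-1.2, -0.7, -2.0], [-3.4, -0.9, -1.5])
print(f"accuracy = {acc:.3f}")  # accuracy = 0.667
```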

What does a "minimal-pair syntactic test" look like?

To analyse models in this way, we use minimal pairs. A minimal pair consists of either:
(1) two sentences that differ by one syntactic feature, or
(2) one sentence containing a "gap" (or ending mid-sentence, as in next-token prediction) together with two candidate lexical items (e.g. is/are), only one of which is grammatical in the given context.
Grew-TSE concerns itself with the latter, generating minimal pairs (W1, W2) for the same context.
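The two kinds of minimal pair are closely related: a shared context with forms (W1, W2) can always be expanded into two full sentences. A minimal sketch, using one of the English EWT example contexts; the class name is hypothetical and its fields simply mirror the column names used in Grew-TSE's example output.

```python
from dataclasses import dataclass
from typing import Tuple

MASK = "[MASK]"


@dataclass(frozen=True)
class MinimalPair:
    """One context shared by a grammatical (W1) and an ungrammatical (W2) form."""

    masked_text: str
    form_grammatical: str
    form_ungrammatical: str

    def sentences(self) -> Tuple[str, str]:
        """Fill the gap with each form, giving the two-sentence view of the pair."""
        if self.masked_text.count(MASK) != 1:
            raise ValueError("expected exactly one mask token in the context")
        return (
            self.masked_text.replace(MASK, self.form_grammatical),
            self.masked_text.replace(MASK, self.form_ungrammatical),
        )


pair = MinimalPair("In Ramadi, there [MASK] a big demonstration...", "was", "were")
good, bad = pair.sentences()
print(good)  # In Ramadi, there was a big demonstration...
print(bad)   # In Ramadi, there were a big demonstration...
```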

Some example tests generated with Grew-TSE from the English EWT UD Treebank are shown in the table below.

| masked_text | form_grammatical | form_ungrammatical |
|---|---|---|
| It [MASK] clear to me that the manhunt for high Ba... | seems | seem |
| In Ramadi, there [MASK] a big demonstration... | was | were |
| As the survey cited in the above-linked article [MASK]... | shows | show |
| Jim Lobe [MASK] more on the political implications... | has | have |

The tests above are for models trained on a masked language modelling (MLM) task; however, you can also generate prompt-based datasets with Grew-TSE.
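For next-token prediction, the context simply ends where the gap was. A minimal sketch of turning a masked context into a prompt, assuming the prompt is just the left context before the mask; the helper name is hypothetical, and Grew-TSE's own prompt datasets may be built differently.

```python
MASK = "[MASK]"


def masked_to_prompt(masked_text: str, mask: str = MASK) -> str:
    """Keep only the left context before the mask, for next-token prediction.

    A causal LM sees just this prefix and is asked which of the two forms
    is more likely to come next.
    """
    if mask not in masked_text:
        raise ValueError("no mask token found")
    prefix, _, _ = masked_text.partition(mask)
    return prefix.rstrip()


print(masked_to_prompt("Jim Lobe [MASK] more on the political implications..."))
# Jim Lobe
```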

Try out the Dashboard on Hugging Face🤗

You can try out the official Grew-TSE dashboard, available as a Hugging Face Space. It is currently intended primarily for demonstration, but it can also be useful for quickly running syntactic evaluations.


Basic Usage

The first step in using this package is to parse a UD treebank, from which a lexical item set is created: a fancy way of saying a dataset of words and their features. This set is used to find the ungrammatical counterpart of every grammatical word that you isolate with your Grew query.

from grewtse.pipeline import GrewTSEPipe
g_pipe = GrewTSEPipe()

# the first step is always to load in a UD Treebank
# you can supply either a single file path or a list of file paths
treebank_path = "./my-treebanks/german.conllu"
g_pipe.parse_treebank(treebank_path)
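To make the idea of a lexical item set concrete, here is a minimal pure-Python sketch (not Grew-TSE's implementation) that reads CoNLL-U text and collects each distinct combination of word form, lemma, UPOS and morphological features. `LexicalItem` and `build_lexical_item_set` are hypothetical names.

```python
from collections import namedtuple

# One entry per distinct (form, lemma, upos, feats) combination.
LexicalItem = namedtuple("LexicalItem", ["form", "lemma", "upos", "feats"])


def parse_feats(field: str) -> frozenset:
    """Turn a CoNLL-U FEATS column like 'Case=Acc|Number=Sing' into (name, value) pairs."""
    if field == "_":
        return frozenset()
    return frozenset(tuple(kv.split("=", 1)) for kv in field.split("|"))


def build_lexical_item_set(conllu_text: str) -> set:
    items = set()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        form, lemma, upos, feats = cols[1], cols[2], cols[3], cols[5]
        items.add(LexicalItem(form, lemma, upos, parse_feats(feats)))
    return items


SAMPLE = "\n".join([
    "# text = Der Hund bellt.",
    "\t".join(["1", "Der", "der", "DET", "_", "Case=Nom|Gender=Masc|Number=Sing", "2", "det", "_", "_"]),
    "\t".join(["2", "Hund", "Hund", "NOUN", "_", "Case=Nom|Gender=Masc|Number=Sing", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "bellt", "bellen", "VERB", "_", "Number=Sing|Person=3|VerbForm=Fin", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
])
items = build_lexical_item_set(SAMPLE)
print(len(items))  # 4
```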

The deeper your knowledge of a language, the better you'll be at choosing syntactic phenomena to evaluate. Treebanks with richer feature annotation let you ask more questions, and larger treebanks are more likely to contain suitable minimal pairs.

A minimal pair is found by isolating the target word and its features, then altering (typically) one of those features: for instance, changing an accusative noun into a genitive one. Note that morphological constraints (e.g. Case, Gender, Number) are passed separately from universal constraints (upos). The morphological change is specified as a dict, like so:

morphology_change = {
  "case": "Gen"
}
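As an illustration of "altering one feature", here is a sketch of how an ungrammatical counterpart could be looked up: keep the lemma and UPOS fixed, swap in the requested feature value, and return every form in the lexicon whose features then match exactly. The function and helper names are hypothetical, and matching dict keys such as `case` case-insensitively against UD names such as `Case` is an assumption about how the dict is interpreted.

```python
from collections import namedtuple

LexicalItem = namedtuple("LexicalItem", ["form", "lemma", "upos", "feats"])


def find_counterparts(target, lexicon, morphology_change):
    """Forms sharing the target's lemma and UPOS whose features equal the
    target's features with only the requested values changed."""
    wanted = dict(target.feats)
    # Assumption: keys like "case" are matched case-insensitively to UD
    # feature names like "Case"; unknown keys are added as given.
    canonical = {name.lower(): name for name in wanted}
    for feat, value in morphology_change.items():
        wanted[canonical.get(feat.lower(), feat)] = value
    wanted = frozenset(wanted.items())
    if wanted == target.feats:
        return []  # the change is a no-op, so no minimal pair exists
    return sorted(
        item.form
        for item in lexicon
        if item.lemma == target.lemma
        and item.upos == target.upos
        and item.feats == wanted
    )


def noun(form, **feats):
    """Toy German lexicon entry for the lemma 'Mann'."""
    return LexicalItem(form, "Mann", "NOUN", frozenset(feats.items()))


lexicon = {
    noun("Mann", Case="Nom", Number="Sing"),
    noun("Mann", Case="Acc", Number="Sing"),
    noun("Mannes", Case="Gen", Number="Sing"),
    noun("Männer", Case="Nom", Number="Plur"),
}
acc_noun = noun("Mann", Case="Acc", Number="Sing")
print(find_counterparts(acc_noun, lexicon, {"case": "Gen"}))  # ['Mannes']
```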

A Grew query and a target are how we isolate an individual phenomenon and the target word, typically the grammatical word of our grammatical-ungrammatical minimal pair. The feature values in a Grew query may change between treebanks, but the logic of the query should stay the same. The target (or dependency node) is the variable in the Grew query that represents the target word, so it must be one of the variables declared in the query. In the query below, for instance, V is the target and represents the verb. This fancy-schmancy query isolates non-negated verbs that take a genitive direct object:

grew_query = """
  pattern {
    V [upos=VERB];
    DirObj [Case=Gen];
    V -[obj]-> DirObj;
  }

  without {
    NEG [upos=PART, Polarity=Neg];
    V -[advmod:neg]-> NEG;
  }
"""

target = "V"
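To spell out what the query above matches (not how Grew evaluates patterns), here is a plain-Python restatement over toy, pre-parsed sentences: a VERB with an `obj` dependent whose Case is Gen, excluding verbs that have a PART dependent with Polarity=Neg attached via `advmod:neg`. The returned verb id corresponds to the target `V`. The token dict layout and function names are hypothetical.

```python
def match_non_negated_gen_objects(tokens):
    """Return (verb_id, object_id) pairs mimicking the Grew query above."""
    by_id = {t["id"]: t for t in tokens}
    matches = []
    for obj in tokens:
        # pattern: DirObj [Case=Gen]; V -[obj]-> DirObj
        if obj["deprel"] != "obj" or obj["feats"].get("Case") != "Gen":
            continue
        verb = by_id.get(obj["head"])
        # pattern: V [upos=VERB]
        if verb is None or verb["upos"] != "VERB":
            continue
        # without: NEG [upos=PART, Polarity=Neg]; V -[advmod:neg]-> NEG
        negated = any(
            t["head"] == verb["id"]
            and t["deprel"] == "advmod:neg"
            and t["upos"] == "PART"
            and t["feats"].get("Polarity") == "Neg"
            for t in tokens
        )
        if not negated:
            matches.append((verb["id"], obj["id"]))
    return matches


def tok(idx, upos, head, deprel, **feats):
    """Build a toy token; feats are UD morphological features."""
    return {"id": idx, "upos": upos, "head": head, "deprel": deprel, "feats": feats}


affirmative = [
    tok(1, "NOUN", 2, "nsubj", Case="Nom"),
    tok(2, "VERB", 0, "root"),
    tok(3, "NOUN", 2, "obj", Case="Gen"),
]
negated = [
    tok(1, "NOUN", 3, "nsubj", Case="Nom"),
    tok(2, "PART", 3, "advmod:neg", Polarity="Neg"),
    tok(3, "VERB", 0, "root"),
    tok(4, "NOUN", 3, "obj", Case="Gen"),
]
print(match_non_negated_gen_objects(affirmative))  # [(2, 3)]
print(match_non_negated_gen_objects(negated))      # []
```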

Each sentence matched by the query can then be automatically masked (or turned into a prompt), and its grammatical-ungrammatical minimal pair generated, as follows:

# generate a dataset from the treebank that creates masked
# sentences for masked language modeling (MLM)
masked_df = g_pipe.generate_masked_dataset(
    grew_query, 
    target
)

# generate a dataset from the treebank that creates prompts
# for next-word prediction
prompt_df = g_pipe.generate_prompt_dataset(
    grew_query, 
    target
)

# can only occur after a masked or prompt dataset
# has been generated
mp_dataset = g_pipe.generate_minimal_pair_dataset(
    morphology_change,
)
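Conceptually, each match contributes one row like those in the example table: the target token is masked, its original form becomes `form_grammatical`, and the counterpart found via `morphology_change` becomes `form_ungrammatical`. A minimal sketch with a hypothetical helper and a hand-supplied counterpart; it naively joins tokens with spaces and ignores CoNLL-U's `SpaceAfter=No`, which a real implementation would need to respect.

```python
MASK = "[MASK]"


def make_masked_row(forms, target_index, ungrammatical_form, mask=MASK):
    """Mask the target token and pair its original form with a counterpart.

    Tokens are naively joined with spaces; a real implementation would use
    the CoNLL-U SpaceAfter=No annotations to rebuild the original spacing.
    """
    if not 0 <= target_index < len(forms):
        raise IndexError("target_index out of range")
    masked = list(forms)
    grammatical_form = masked[target_index]
    masked[target_index] = mask
    return {
        "masked_text": " ".join(masked),
        "form_grammatical": grammatical_form,
        "form_ungrammatical": ungrammatical_form,
    }


row = make_masked_row(["Jim", "Lobe", "has", "more"], 2, "have")
print(row["masked_text"])  # Jim Lobe [MASK] more
```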

Built With

Grew-TSE is written entirely in Python and is distributed as a Python package. It makes use of the Hugging Face Transformers library, as well as plotnine for plotting.

  • Python
  • Hugging Face

Of course, the grewpy package was essential for this project.

License

This project is licensed under the GNU General Public License (GPL). See the LICENSE file for full details.





Download files


Source Distribution

grew_tse-0.1.3.tar.gz (44.1 kB)

Built Distribution


grew_tse-0.1.3-py3-none-any.whl (41.2 kB)

File details

Details for the file grew_tse-0.1.3.tar.gz.

File metadata

  • Download URL: grew_tse-0.1.3.tar.gz
  • Upload date:
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for grew_tse-0.1.3.tar.gz
Algorithm Hash digest
SHA256 916f67bb39d6de0e32b03a4bfeea256bd6d22aab6ea5a3afefe1aff1abdcf344
MD5 975298aa924fa674c41969802a825595
BLAKE2b-256 5dd18f1a49483f888dbe9f7addcba3f4854d299f8b46e3faf40f1a38b0276eaa


File details

Details for the file grew_tse-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: grew_tse-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 41.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for grew_tse-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1cfe143bfab451266a830f6d0e95e4e1b596dad1cda631eaf7cede885e54e1a0
MD5 3775968894917701c51f102dd2c9ea67
BLAKE2b-256 867f7d4ee8290c9e49dbb6fa319a3bd6c29966bf27978f619b1434573481ae2a

