CLI for the GELATO Dataset for Legislative NER

The GELATO Dataset for Legislative NER

This repository contains the code, data, and scores for The GELATO Dataset for Legislative NER (LREC 2026).

Original Paper

The preprint of the original paper is available on arXiv:

The GELATO Dataset for Legislative NER (https://arxiv.org/abs/2603.14130)

CLI

The core of the project is a CLI to make it easy to run experiments on the GELATO dataset.

Installation

This project uses uv to manage the environment and internal dependencies.

With uv installed, run uv sync in the project root to create a .venv managed by uv. Then, run:

uv run gelato --help

to see the available commands.

Optionally, install the CLI as a tool on your $PATH via:

uv tool install .

and simply run

gelato --help

from anywhere to access the CLI.

Commands

The CLI has a variety of commands to facilitate working with gelato.

For help, run

uv run gelato --help
Usage: gelato [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands:
  prompt-optimize  Use DSPy to optimize level two type prompts for a level one type
  predict          Load a DSPy-optimized program to predict level two labels 
                   from CoNLL-formatted level one predictions
  fine-tune        Fine-tune a HuggingFace Transformer using `wandb`
  train-model      Train the desired model with the provided parameters
  score            Score a model on the datset at the provided path
  align            Align predictions with tokens if the tokenizer aggregation 
                   pipeline fails. Applies first label wins strategy for
                   aggregation of text and labels. Useful as non-word-based
                   tokenizers sometimes struggle to rebuild and aggregate certain
                   words.
  confusion        Generate confusion matrices from CoNLL-formatted predictions 
                   and their reference counterpart

prompt-optimize

The prompt-optimize command simplifies using DSPy to optimize level two type prompts for each level one type prediction.

uv run gelato prompt-optimize --help
Usage: gelato prompt-optimize [OPTIONS] TRAIN_PATH DEV_PATH MODEL

  Use DSPy to optimize level two type prompts for a level one type

Arguments:
  TRAIN_PATH  Path to CoNLL-formatted train dataset  [required]
  DEV_PATH    Path to CoNLL-formatted test dataset  [required]
  MODEL       LLM to prompt as a HuggingFace ID e.g. 'Qwen/Qwen3-32B'
              [required]

Options:
  --level-one-type    [Abstraction|Act|Class|Document|Organization|Person]
                      Level one type to fine-tune a prompt for its
                      level two types  [required]
  --module            [ChainOfThought|Predict]
                      What dspy.Module to use  [required]
  --optimizer         [BetterTogether|BootstrapFewShot|BootstrapFewShotWithRandomSearch|
                      BootstrapFinetune|BootstrapRS|COPRO|Ensemble|InferRules|
                      KNNFewShot|LabeledFewShot|MIPROv2|SIMBA]
                      What dspy.Optimizer [required]
  --window INTEGER    The left-right context window to provide the
                      LLM for each mention  [default: 50]
  --base-url TEXT     URL endpoint for an OpenAI-compatible LLM
                      chat server e.g. 'http://localhost:8000/v1'
                      [default: http://localhost:8000/v1]
  --api-key TEXT      API key for OpenAI LLM endpoint. Defaults to
                      'LOCAL' for self-hosted models that do not
                      require authentication.  [default: LOCAL]
  --k INTEGER         'k' to use when generating kNN if
                      'KNNFewShot' is the Optimizer  [default: 10]
  --help              Show this message and exit.
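
The --window option sets how much left and right context the LLM sees for each mention. The following is a minimal sketch of that idea, not the project's implementation: it assumes the window is counted in tokens (the help text does not state the unit), and the function name context_window is hypothetical.

```python
def context_window(tokens, start, end, window=50):
    """Split tokens into (left, mention, right) around the mention tokens[start:end].

    Assumption: the window is measured in tokens on each side of the mention.
    """
    left = tokens[max(0, start - window):start]  # clamp at the start of the text
    mention = tokens[start:end]
    right = tokens[end:end + window]             # slicing clamps at the end of the text
    return left, mention, right


tokens = "The Committee on Finance reported the bill to the Senate".split()
left, mention, right = context_window(tokens, start=1, end=4, window=2)
print(left, mention, right)
```

A larger window gives the model more surrounding text to disambiguate a mention, at the cost of longer prompts.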

predict

Load a DSPy-optimized program to predict level two labels from CoNLL-formatted level one predictions.

uv run gelato predict --help
Usage: gelato predict [OPTIONS] TEST_PATH MODEL

  Load a DSPy-optimized program to predict level two labels from CoNLL-
  formatted level one predictions

Arguments:
  TEST_PATH  Path to CoNLL-formatted test dataset  [required]
  MODEL      LLM to prompt as a HuggingFace ID 
              e.g. 'Qwen/Qwen3-32B' [required]

Options:
  --abstraction-path PATH   Path to optimized Abstraction program  [required]
  --act-path PATH           Path to optimized Act program  [required]
  --class-path PATH         Path to optimized Class program  [required]
  --document-path PATH      Path to optimized Document program  [required]
  --organization-path PATH  Path to optimized Organization program
                            [required]
  --person-path PATH        Path to optimized Person program  [required]
  --output-path PATH        Output path for serialized predictions
                            [required]
  --window INTEGER          The left-right context window to provide the LLM
                            for each mention  [default: 50]
  --base-url TEXT           URL endpoint for an OpenAI-compatible LLM chat
                            server e.g. 'http://localhost:8000/v1'  
                            [default: http://localhost:8000/v1]
  --api-key TEXT            API key for OpenAI LLM endpoint. Defaults to
                            'LOCAL' for self-hosted models that do not require
                            authentication.  [default: LOCAL]
  --help                    Show this message and exit.
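
Most commands read CoNLL-formatted files. As a rough illustration of what such a file contains, the sketch below parses one under a common convention: one token per line, the label in the last whitespace-separated column, and a blank line between sentences. The exact column layout and tagging scheme of the gelato files are not described here, so both the layout and the BIO-style labels in the sample are assumptions, and read_conll is a hypothetical helper.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of sentences of (token, label) pairs.

    Assumption: first column is the token, last column is the label,
    and blank lines separate sentences.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


sample = """Senate B-Organization
passed O
the O
Act B-Act

It O
""".splitlines()

sentences = read_conll(sample)
print(len(sentences), sentences[0])
```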

fine-tune

The fine-tune command simplifies fine-tuning a HuggingFace Transformer using wandb.

uv run gelato fine-tune --help
Usage: gelato fine-tune [OPTIONS] TRAIN_PATH TEST_PATH MODEL

  Fine-tune a HuggingFace Transformer using `wandb`

Arguments:
  TRAIN_PATH  Path to CoNLL-formatted train dataset  [required]
  TEST_PATH   Path to CoNLL-formatted test dataset  [required]
  MODEL       Model to fine-tune as a HuggingFace ID e.g. 'FacebookAI/xlm-
              roberta-base'. Assumes model is compatible with HuggingFace
              transformers.  [required]

Options:
  --output-dir PATH       output directory for wandb logs  [required]
  --wandb-project TEXT    Name of wandb project to track sweeps e.g. 'gelato'
                          [default: gelato]
  --sweeps INTEGER RANGE  Number of wandb sweeps to perform
                          [default: 1; 1<=x<=64]
  --help                  Show this message and exit.

train-model

Train the desired HuggingFace-compatible transformer model with the provided parameters.

uv run gelato train-model --help
Usage: gelato train-model [OPTIONS] MODEL_ID

  Train the desired model with the provided parameters.

Arguments:
  MODEL_ID  The HuggingFace model id of the model to train 
            e.g.'google-bert/bert-base-cased'  [required]

Options:
  --train-path TEXT      The path to the training dataset e.g.
                         'data/train.conll'  [required]
  --dev-path TEXT        The path to the dev dataset e.g. 'data/dev.conll'
                         [required]
  --learning-rate FLOAT  Learning rate of the model e.g. '0.003'  [required]
  --batch-size INTEGER   Learning and eval batch size e.g. '16'  [required]
  --epochs INTEGER       Number of training epochs e.g. '42'  [required]
  --weight-decay FLOAT   Training weight decay e.g. '0.3'  [required]
  --warmup-ratio FLOAT   Training warmup ratio e.g. '0.1'  [required]
  --output-dir TEXT      output directory for wandb logs  [required]
  --help                 Show this message and exit.
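
Because every train-model option is required, it can help to assemble the call programmatically when scripting several runs. The sketch below only builds and prints the command line from the options listed above. The hyperparameter values are the illustrative ones from the help text rather than recommended settings, the output directory is a hypothetical path, and train_model_argv is a hypothetical helper.

```python
import shlex


def train_model_argv(model_id, train_path, dev_path, output_dir,
                     learning_rate, batch_size, epochs, weight_decay, warmup_ratio):
    """Build the argument list for `uv run gelato train-model`."""
    return [
        "uv", "run", "gelato", "train-model", model_id,
        "--train-path", train_path,
        "--dev-path", dev_path,
        "--learning-rate", str(learning_rate),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--weight-decay", str(weight_decay),
        "--warmup-ratio", str(warmup_ratio),
        "--output-dir", output_dir,
    ]


argv = train_model_argv(
    "google-bert/bert-base-cased",
    train_path="data/train.conll",
    dev_path="data/dev.conll",
    output_dir="runs/bert-base-cased",  # hypothetical path
    learning_rate=0.003,
    batch_size=16,
    epochs=42,
    weight_decay=0.3,
    warmup_ratio=0.1,
)
print(shlex.join(argv))
# To launch the run: import subprocess; subprocess.run(argv, check=True)
```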

score

Score a model on the dataset at the provided path.

uv run gelato score --help
Usage: gelato score [OPTIONS] DATASET_PATH MODEL

  Score a model on the datset at the provided path

Arguments:
  DATASET_PATH  Path to CoNLL-formatted dataset to evaluate  [required]
  MODEL         Model to test as a HuggingFace ID e.g.
                'Wollaston/gelato-roberta-large'  [required]

Options:
  --help  Show this message and exit.

align

Align predictions with tokens if the tokenizer aggregation pipeline fails. The command applies a first-label-wins strategy when aggregating text and labels. This is useful because non-word-based tokenizers sometimes struggle to rebuild and aggregate certain words.

uv run gelato align --help
Usage: gelato align [OPTIONS] PREDICTIONS_PATH REFERENCE_PATH

  Align predictions with tokens if the tokenizer aggregation pipeline fails.
  Applies first label wins strategy for aggregation of text and labels. Useful
  as non-word-based tokenizers sometimes struggle to rebuild and aggregate
  certain words.

Arguments:
  PREDICTIONS_PATH  Path to CoNLL-formatted predictions to align  [required]
  REFERENCE_PATH    Path to CoNLL-formatted reference data to align tokens to
                    [required]

Options:
  --help  Show this message and exit.
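
The first-label-wins strategy can be sketched as follows. This is an illustration under stated assumptions rather than the command's actual code: it assumes the predicted pieces concatenate exactly to each reference word (no subword markers such as ## or ▁), and align_first_label_wins is a hypothetical name.

```python
def align_first_label_wins(pieces, reference_words):
    """Merge predicted (text, label) pieces into reference words.

    Each rebuilt word takes the label of its first piece.
    Assumption: pieces concatenate exactly to the reference words.
    """
    aligned = []
    i = 0
    for word in reference_words:
        built, label = "", None
        while built != word:
            if i >= len(pieces) or not word.startswith(built + pieces[i][0]):
                raise ValueError(f"cannot rebuild {word!r} from predictions")
            text, piece_label = pieces[i]
            if label is None:  # first piece of the word decides the label
                label = piece_label
            built += text
            i += 1
        aligned.append((word, label))
    return aligned


pieces = [
    ("Sen", "B-Organization"), ("ate", "O"),
    ("passed", "O"),
    ("H.R.", "B-Act"), ("1", "I-Act"),
]
aligned = align_first_label_wins(pieces, ["Senate", "passed", "H.R.1"])
print(aligned)
```

Taking the first piece's label keeps a word's entity boundary where the model first predicted it, even when later pieces of the same word disagree.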

confusion

Generate confusion matrices from CoNLL-formatted predictions and their reference counterpart.

uv run gelato confusion --help
Usage: gelato confusion [OPTIONS] PREDICTIONS REFERENCES OUTPUT_PATH

  Generate confusion matrices from CoNLL-formatted predictions and their
  reference counterpart

Arguments:
  PREDICTIONS  Path to CoNLL-formatted predictions  [required]
  REFERENCES   Path to CoNLL-formatted references  [required]
  OUTPUT_PATH  Path to save generated confusion matrix  [required]

Options:
  --help  Show this message and exit.
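
For intuition, a confusion matrix over aligned label sequences can be tallied with the standard library alone. The sketch below counts (reference, predicted) pairs at the token level. Whether the confusion command works at the token or mention level is not stated here, so treat the granularity as an assumption; confusion_counts and format_matrix are hypothetical helpers.

```python
from collections import Counter


def confusion_counts(reference_labels, predicted_labels):
    """Count (reference, predicted) label pairs for two aligned sequences."""
    if len(reference_labels) != len(predicted_labels):
        raise ValueError("sequences must be the same length; align them first")
    return Counter(zip(reference_labels, predicted_labels))


def format_matrix(counts):
    """Render the counts as a plain-text table (rows: reference, columns: predicted)."""
    labels = sorted({label for pair in counts for label in pair})
    width = max(len(label) for label in labels) + 2
    rows = [" " * width + "".join(label.ljust(width) for label in labels)]
    for ref in labels:
        cells = "".join(str(counts[(ref, pred)]).ljust(width) for pred in labels)
        rows.append(ref.ljust(width) + cells)
    return "\n".join(rows)


reference = ["Act", "Act", "Person", "O"]
predicted = ["Act", "Document", "Person", "O"]
counts = confusion_counts(reference, predicted)
print(format_matrix(counts))
```

Off-diagonal cells, such as the Act row and Document column here, show which types the model confuses with each other.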

Checkpoints

We released our gelato checkpoints on HuggingFace.

Data

All gelato data, including level one and two splits, as well as original annotation data, can be found in the data/ folder.

We have also uploaded our data to HuggingFace. The level one and level two datasets are organized as subsets on HuggingFace, and each subset has its train, dev, and test splits.

Optimizers

The final DSPy optimizers can be found in the optimizers/ folder.

Scores

The CoNLL-formatted files for our reported scores can be found in the scores/ folder.

Citing GELATO

If you use our work in your research, please cite us:

@misc{flynn2026gelatodatasetlegislativener,
      title={The GELATO Dataset for Legislative NER}, 
      author={Matthew Flynn and Timothy Obiso and Sam Newman},
      year={2026},
      eprint={2603.14130},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.14130}, 
}
