Lexicon and Template Collection Construction Pipeline for Acceptability and Inference Judgment Data

bead

CI Python 3.13 License: MIT Documentation

A Python framework for constructing, deploying, and analyzing large-scale linguistic judgment experiments with active learning.

Overview

bead implements a complete pipeline for linguistic research: from lexical resource construction through experimental deployment to model training with active learning. It handles the combinatorial explosion of linguistic stimuli while maintaining full provenance tracking.
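
To give a sense of the scale involved, the arithmetic below counts how many filled sentences exhaustive filling would produce. The slot names, lexicon sizes, and template count are made-up numbers for illustration, not figures from bead or from any real experiment.

```python
import math

# Hypothetical slot inventory: slot name -> number of candidate fillers.
slot_sizes = {"subject": 50, "verb": 200, "object": 50}

# Exhaustive filling yields one sentence per combination of fillers,
# so the count per template is the product of the slot sizes.
per_template = math.prod(slot_sizes.values())

# Hypothetical number of templates sharing the same three slots.
n_templates = 10
total = per_template * n_templates

print(per_template)  # 500000
print(total)         # 5000000
```

Even modest lexicons reach millions of candidate stimuli, which is why sampling strategies and constraints matter downstream.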

The name refers to the bead of sealant applied when glazing a window, a play on the glazing package, which provides access to VerbNet, PropBank, and FrameNet.

Installation

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install bead
uv pip install bead

# With optional dependencies
uv pip install "bead[api]"       # OpenAI, Anthropic, Google APIs
uv pip install "bead[training]"  # PyTorch Lightning, TensorBoard

Development

git clone https://github.com/FACTSlab/bead.git
cd bead
uv sync --all-extras
uv run pytest tests/

Always run project commands through uv run so that they execute inside the project's managed environment.

Quick Start

from bead.resources import LexicalItem, Template, Lexicon
from bead.templates import TemplateFiller
from bead.items import ItemConstructor
from bead.lists import ListPartitioner

# 1. Define resources
verbs = Lexicon(items=[
    LexicalItem(lemma="walk", pos="VERB", features={"transitive": False}),
    LexicalItem(lemma="eat", pos="VERB", features={"transitive": True}),
])

template = Template(
    text="The person {verb} the thing",
    slots=["verb"],
    language_code="en"
)

# 2. Fill templates
filler = TemplateFiller(strategy="exhaustive")
filled = filler.fill(templates=[template], lexicons={"verbs": verbs})

# 3. Construct items
constructor = ItemConstructor(models=["gpt2"])
items = constructor.construct_forced_choice_items(filled, n_alternatives=2)

# 4. Partition into lists
partitioner = ListPartitioner()
lists = partitioner.partition(items.get_uuids(), n_lists=4)

# 5. Deploy
lists.save("lists/experiment.jsonl")
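
The "exhaustive" strategy in step 2 can be pictured as a Cartesian product over slot fillers. The sketch below uses only the standard library and is not bead's implementation: fill_exhaustively is a hypothetical helper, and it substitutes bare lemmas without inflecting them.

```python
from itertools import product


def fill_exhaustively(template: str, fillers: dict[str, list[str]]) -> list[str]:
    """Return every string obtained by substituting one filler per slot."""
    slots = sorted(fillers)  # fixed slot order keeps the output deterministic
    results = []
    for combo in product(*(fillers[slot] for slot in slots)):
        # Pair each slot name with its chosen filler and substitute.
        results.append(template.format(**dict(zip(slots, combo))))
    return results


sentences = fill_exhaustively(
    "The person {verb} the thing",
    {"verb": ["walk", "eat"]},
)
print(sentences)
# ['The person walk the thing', 'The person eat the thing']
```

With more than one slot, the number of results is the product of the filler counts, so an exhaustive fill is best reserved for small lexicons or tightly constrained slots.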

Pipeline Stages

Stage      | Purpose                            | Output
Resources  | Define lexical items and templates | lexicons/*.jsonl, templates/*.jsonl
Templates  | Fill templates with lexical items  | filled_templates/*.jsonl
Items      | Construct experimental items       | items/*.jsonl
Lists      | Partition into balanced lists      | lists/*.jsonl
Deployment | Generate jsPsych experiments       | deployment/*.jzip
Training   | Active learning until convergence  | Model checkpoints
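
Because most stages hand off JSONL files, a later stage can point back at earlier records by identifier instead of copying their content. The sketch below illustrates that idea with the standard library only. The field names (id, text, task, filled_template_ids) and file names are assumptions chosen for illustration and do not reflect bead's actual schema.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


workdir = Path(tempfile.mkdtemp())

# Stage output 1: filled templates, each with its own UUID.
filled = [
    {"id": str(uuid.uuid4()), "text": "The person walk the thing"},
    {"id": str(uuid.uuid4()), "text": "The person eat the thing"},
]
write_jsonl(workdir / "filled_templates.jsonl", filled)

# Stage output 2: an item that refers to filled templates by UUID only.
items = [
    {
        "id": str(uuid.uuid4()),
        "task": "forced_choice",
        "filled_template_ids": [record["id"] for record in filled],
    }
]
write_jsonl(workdir / "items.jsonl", items)

# Resolving the references recovers the text without duplicating it in the item.
by_id = {r["id"]: r for r in read_jsonl(workdir / "filled_templates.jsonl")}
item = read_jsonl(workdir / "items.jsonl")[0]
resolved = [by_id[ref]["text"] for ref in item["filled_template_ids"]]
print(resolved)
# ['The person walk the thing', 'The person eat the thing']
```

Keeping references rather than copies means any item can be traced back to the exact upstream record that produced it.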

Key Features

  • Stand-off annotation: UUID-based references for full provenance tracking
  • 8 task types: forced-choice, ordinal scale, binary, categorical, multi-select, magnitude, free text, cloze
  • Constraint satisfaction: batch and list-level constraints for balanced designs
  • Model integration: HuggingFace, OpenAI, Anthropic with caching
  • Active learning: uncertainty sampling with convergence detection
  • Annotation protocols: type-theoretic stack of SemanticAnchor (the question type), ProtocolContext (the dependent index), RealizationStrategy (template / contextual / LM phrasings), and DriftGuard (the type-checker over realized prompts), composed into conditional AnnotationProtocols
  • jsPsych 8.x: Material Design UI with JATOS deployment
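
The active learning feature above can be sketched in a few lines. This is a generic illustration of entropy-based uncertainty sampling with a simple plateau check, not bead's implementation: the function names, the patience and tol parameters, and the example probabilities are all assumptions.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: dict[str, list[float]], k: int) -> list[str]:
    """Return the k item ids whose predicted distributions have the highest entropy."""
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]


def has_converged(history: list[float], patience: int = 3, tol: float = 0.01) -> bool:
    """True when the metric gained less than tol over the last `patience` rounds."""
    if len(history) <= patience:
        return False
    return history[-1] - history[-1 - patience] < tol


# Hypothetical model outputs for three unlabeled items (binary task).
predictions = {
    "item-a": [0.95, 0.05],
    "item-b": [0.50, 0.50],
    "item-c": [0.70, 0.30],
}
print(select_most_uncertain(predictions, k=2))  # ['item-b', 'item-c']

# Hypothetical per-round accuracy; the last three rounds gain under 0.01 in total.
print(has_converged([0.60, 0.70, 0.75, 0.755, 0.757, 0.758]))  # True
```

Each round, the selected items would be sent out for human judgments, the model retrained, and the loop stopped once the plateau check fires.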

CLI

bead init my-experiment     # Create project structure
bead templates fill         # Fill templates
bead items construct        # Construct items
bead lists partition        # Create experiment lists
bead deploy                 # Generate jsPsych experiment
bead training run           # Train with active learning
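
To run the stages in order from a script, something like the following may be a starting point. It assumes each subcommand can be invoked exactly as listed above, with no extra arguments, which may not hold for a real project. run_pipeline is a hypothetical helper that defaults to a dry run so nothing is executed by accident.

```python
import subprocess

# Subcommands copied from the list above, in pipeline order.
PIPELINE = [
    ["bead", "templates", "fill"],
    ["bead", "items", "construct"],
    ["bead", "lists", "partition"],
    ["bead", "deploy"],
    ["bead", "training", "run"],
]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Run commands in order, stopping at the first failure.

    Returns the command lines that were run (or would be run in a dry run).
    """
    executed = []
    for cmd in commands:
        line = " ".join(cmd)
        executed.append(line)
        if dry_run:
            print(f"[dry-run] {line}")
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return executed


planned = run_pipeline(PIPELINE, dry_run=True)
```

Passing dry_run=False executes the commands for real, so that call belongs inside a project created with bead init.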

Documentation

Full documentation: bead.readthedocs.io

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

Citation

@software{white2026bead,
  author = {White, Aaron Steven},
  title = {bead: A framework for large-scale linguistic judgment experiments},
  year = {2026},
  url = {https://github.com/FACTSlab/bead},
}

License

MIT License. See LICENSE for details.

Acknowledgments

This project was developed by Aaron Steven White at the University of Rochester with support from the National Science Foundation (NSF-BCS-2237175 CAREER: Logical Form Induction, NSF-BCS-2040831 Computational Modeling of the Internal Structure of Events). It was architected and implemented with the assistance of Claude Code.
