Lexicon and Template Collection Construction Pipeline for Acceptability and Inference Judgment Data
bead
A Python framework for constructing, deploying, and analyzing large-scale linguistic judgment experiments with active learning.
Overview
bead implements a complete pipeline for linguistic research: from lexical resource construction through experimental deployment to model training with active learning. It handles the combinatorial explosion of linguistic stimuli while maintaining full provenance tracking.
The name refers to the way sealant is applied while glazing a window, a play on the glazing package for accessing VerbNet, PropBank, and FrameNet.
Installation
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install bead
uv pip install bead
# With optional dependencies
uv pip install "bead[api]" # OpenAI, Anthropic, Google APIs
uv pip install "bead[training]" # PyTorch Lightning, TensorBoard
Development
git clone https://github.com/FACTSlab/bead.git
cd bead
uv sync --all-extras
uv run pytest tests/
In a development checkout, always run commands through uv run so they execute in the project's synced environment.
Quick Start
from bead.resources import LexicalItem, Template, Lexicon
from bead.templates import TemplateFiller
from bead.items import ItemConstructor
from bead.lists import ListPartitioner
# 1. Define resources
verbs = Lexicon(items=[
    LexicalItem(lemma="walk", pos="VERB", features={"transitive": False}),
    LexicalItem(lemma="eat", pos="VERB", features={"transitive": True}),
])
template = Template(
    text="The person {verb} the thing",
    slots=["verb"],
    language_code="en"
)
# 2. Fill templates
filler = TemplateFiller(strategy="exhaustive")
filled = filler.fill(templates=[template], lexicons={"verbs": verbs})
# 3. Construct items
constructor = ItemConstructor(models=["gpt2"])
items = constructor.construct_forced_choice_items(filled, n_alternatives=2)
# 4. Partition into lists
partitioner = ListPartitioner()
lists = partitioner.partition(items.get_uuids(), n_lists=4)
# 5. Save lists for deployment
lists.save("lists/experiment.jsonl")
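
To illustrate what an "exhaustive" filling strategy implies, and why the number of stimuli grows so quickly, here is a minimal standard-library sketch. It does not use bead's API: the function name fill_exhaustively and the example slots are hypothetical, and the real TemplateFiller may behave differently (for instance by applying constraints).

```python
from itertools import product


def fill_exhaustively(template: str, fillers: dict[str, list[str]]) -> list[str]:
    """Return every string obtained by choosing one filler per slot.

    Hypothetical helper for illustration only; not part of bead.
    """
    slots = sorted(fillers)  # fixed slot order keeps the output deterministic
    combos = product(*(fillers[slot] for slot in slots))
    return [template.format(**dict(zip(slots, combo))) for combo in combos]


sentences = fill_exhaustively(
    "The {noun} {verb} the thing",
    {"noun": ["person", "child"], "verb": ["ate", "saw", "moved"]},
)
print(len(sentences))  # 2 nouns x 3 verbs = 6
```

The count is the product of the slot sizes, so each added slot or larger lexicon multiplies the number of filled templates.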
Pipeline Stages
| Stage | Purpose | Output |
|---|---|---|
| Resources | Define lexical items and templates | lexicons/*.jsonl, templates/*.jsonl |
| Templates | Fill templates with lexical items | filled_templates/*.jsonl |
| Items | Construct experimental items | items/*.jsonl |
| Lists | Partition into balanced lists | lists/*.jsonl |
| Deployment | Generate jsPsych experiments | deployment/*.jzip |
| Training | Active learning until convergence | Model checkpoints |
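
As a rough picture of what the Lists stage aims for, the sketch below deals item UUIDs into lists so that each condition is spread as evenly as possible. It is an assumption-laden illustration, not bead's ListPartitioner: the function partition_balanced, the condition labels, and the round-robin strategy are all hypothetical.

```python
from collections import defaultdict
from uuid import UUID, uuid4


def partition_balanced(items: dict[UUID, str], n_lists: int) -> list[list[UUID]]:
    """Deal item UUIDs into n_lists, balancing each condition across lists.

    `items` maps an item UUID to its condition label (hypothetical schema).
    """
    by_condition: dict[str, list[UUID]] = defaultdict(list)
    for item_id, condition in items.items():
        by_condition[condition].append(item_id)

    lists: list[list[UUID]] = [[] for _ in range(n_lists)]
    offset = 0
    for condition in sorted(by_condition):
        for i, item_id in enumerate(by_condition[condition]):
            lists[(offset + i) % n_lists].append(item_id)
        # Continue dealing where the previous condition stopped,
        # so overall list sizes also stay within one item of each other.
        offset += len(by_condition[condition])
    return lists


items = {uuid4(): ("transitive" if i % 2 else "intransitive") for i in range(16)}
lists = partition_balanced(items, n_lists=4)
print([len(lst) for lst in lists])  # [4, 4, 4, 4]
```

Only UUIDs are moved around here, which mirrors the stand-off style: the lists reference items rather than copying them.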
Key Features
- Stand-off annotation: UUID-based references for full provenance tracking
- 8 task types: forced-choice, ordinal scale, binary, categorical, multi-select, magnitude, free text, cloze
- Constraint satisfaction: batch and list-level constraints for balanced designs
- Model integration: HuggingFace, OpenAI, Anthropic with caching
- Active learning: uncertainty sampling with convergence detection
- jsPsych 8.x: Material Design UI with JATOS deployment
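
The active learning feature can be pictured with a generic sketch of entropy-based uncertainty sampling plus a simple plateau check. The README does not specify bead's acquisition function or convergence criterion, so everything below (function names, the patience and tol parameters, the example probabilities) is an assumption for illustration.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: dict[str, list[float]], k: int) -> list[str]:
    """Return the k item IDs whose predictions have the highest entropy."""
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:k]


def has_converged(history: list[float], patience: int = 3, tol: float = 0.01) -> bool:
    """True when the metric varied by less than tol over the last patience + 1 rounds."""
    if len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    return max(recent) - min(recent) < tol


# Hypothetical model outputs for three unlabeled items (binary task).
predictions = {
    "item-a": [0.98, 0.02],
    "item-b": [0.55, 0.45],
    "item-c": [0.70, 0.30],
}
queue = select_most_uncertain(predictions, k=2)
print(queue)  # ['item-b', 'item-c']
```

In a loop of this kind, the selected items would be sent out for human judgments, the model retrained, and the validation metric appended to history until has_converged returns True.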
CLI
bead init my-experiment # Create project structure
bead templates fill # Fill templates
bead items construct # Construct items
bead lists partition # Create experiment lists
bead deploy # Generate jsPsych experiment
bead training run # Train with active learning
Documentation
Full documentation: bead.readthedocs.io
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
Citation
@software{white2025bead,
  author = {White, Aaron Steven},
  title = {bead: A framework for linguistic judgment experiments},
  year = {2025},
  url = {https://github.com/FACTSlab/bead},
}
License
MIT License. See LICENSE for details.
Acknowledgments
This project was developed by Aaron Steven White at the University of Rochester with support from the National Science Foundation (NSF-BCS-2237175 CAREER: Logical Form Induction, NSF-BCS-2040831 Computational Modeling of the Internal Structure of Events). It was architected and implemented with the assistance of Claude Code.