Transcript annotator for speaker-scoped CHAT corpus correction
talk-tag
Adapter-only TalkBank CHAT morphosyntactic error annotator for .cha and .jsonl.
The runtime deployment path is fixed to:
- Base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- Adapter: mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407
No merged-model runtime path is used.
The package bundles the deployed CHAT token augmentation list and injects those tokens into the tokenizer before loading the PEFT adapter. This step is required to keep the tokenizer/model vocabulary aligned with the adapter checkpoint.
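The load order described above can be sketched with the public transformers and peft APIs. This is a minimal illustration, not talk-tag's own loader: the function name load_adapter_runtime and the extra_tokens argument are hypothetical, the bundled token list is not reproduced here, and treating the CHAT tokens as ordinary (non-special) tokens is an assumption.

```python
BASE_MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
ADAPTER = "mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407"


def load_adapter_runtime(extra_tokens, base_id=BASE_MODEL, adapter_id=ADAPTER):
    """Hypothetical helper: extend the tokenizer first, then attach the adapter."""
    # Imported lazily so this sketch can be read or imported without the runtime extras.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the CHAT augmentation tokens are added as ordinary tokens.
    num_added = tokenizer.add_tokens(list(extra_tokens))

    model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    if num_added:
        # Grow the embedding matrix so every new token id has a row.
        model.resize_token_embeddings(len(tokenizer))

    # Attach the adapter only after tokenizer and model vocabulary sizes agree.
    model = PeftModel.from_pretrained(model, adapter_id)
    model.eval()
    return tokenizer, model
```

If the tokens were injected after the adapter was attached, or not at all, the tokenizer could emit ids that the loaded weights do not cover, which is the misalignment the package avoids.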
Install
Python requirement: >=3.10.
pip install "talk-tag[runtime]"
Runtime extras include torch, transformers, and peft.
Hugging Face access
You need Hub access to both repositories above. Set a token before first run:
export HF_TOKEN=...
If the token is missing or lacks access to either repository, talk-tag doctor and
talk-tag model pull report authentication or gated-repo errors.
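Before running talk-tag doctor, a quick local check can confirm that the variable is visible to Python at all. The sketch below only inspects the environment; the helper name hf_token_status is hypothetical, and it does not contact the Hub or verify repository access.

```python
import os


def hf_token_status(env=None):
    """Report whether HF_TOKEN is set; does not validate the token itself."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "missing: set HF_TOKEN before the first run"
    return "present: HF_TOKEN is set (repository access is not verified here)"


print(hf_token_status())
```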
First-run workflow
- Check environment:
talk-tag doctor
- Pull/warm model assets:
talk-tag model pull --device auto
- Run annotation:
talk-tag annotate \
--input-dir ./input \
--output-dir ./output \
--target-speaker "*CHI" \
--device auto
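For scripted setups, the three steps above can be assembled as argument lists and passed to subprocess. This is a sketch that uses only the subcommands and flags shown in this README; the helper name first_run_commands is hypothetical.

```python
import shlex


def first_run_commands(input_dir, output_dir, target_speaker="*CHI", device="auto"):
    """Return the doctor, model pull, and annotate commands in run order."""
    return [
        ["talk-tag", "doctor"],
        ["talk-tag", "model", "pull", "--device", device],
        [
            "talk-tag", "annotate",
            "--input-dir", input_dir,
            "--output-dir", output_dir,
            "--target-speaker", target_speaker,
            "--device", device,
        ],
    ]


# Print shell-safe versions of each command for inspection.
for command in first_run_commands("./input", "./output"):
    print(shlex.join(command))
```

Each list can be handed to subprocess.run(command, check=True); passing a list avoids shell expansion of the asterisk in "*CHI".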
Single-file .cha example:
talk-tag annotate \
--input-path ./input/sample.cha \
--output-dir ./output \
--target-speaker "*CHI" \
--device auto
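To make the speaker scope concrete: in CHAT, main-tier lines start with an asterisk, a speaker code, and a colon, while header lines start with @ and dependent tiers with %. The sketch below selects the utterances for one speaker from a tiny invented transcript. It illustrates what --target-speaker "*CHI" refers to and is not talk-tag's parser; continuation lines (which begin with a tab) are ignored for brevity.

```python
def utterances_for_speaker(chat_text, speaker="*CHI"):
    """Return main-tier utterances whose line starts with the given speaker code."""
    prefix = speaker + ":"
    selected = []
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            # Drop the speaker prefix and the tab that follows it.
            selected.append(line[len(prefix):].strip())
    return selected


# Invented sample data, for illustration only.
sample = "\n".join([
    "@Begin",
    "*MOT:\twhat is that ?",
    "*CHI:\tdoggie run .",
    "%com:\tpoints at the window",
    "*CHI:\tmore juice .",
    "@End",
])

print(utterances_for_speaker(sample))
```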
Inference defaults
- batch_size = 4
- max_new_tokens = 128
- max_seq_length = 512
- max_context_chars = 1200
- limit = 0
- greedy decoding (do_sample = false)
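The defaults above can be mirrored in a small settings object, which is handy when recording the configuration used for a run. The class name InferenceDefaults and its generate_kwargs method are hypothetical; only max_new_tokens and do_sample correspond to standard Hugging Face generation arguments, and how talk-tag consumes the remaining values is not shown here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceDefaults:
    """Mirror of the documented defaults (illustrative, not talk-tag's own class)."""
    batch_size: int = 4
    max_new_tokens: int = 128
    max_seq_length: int = 512
    max_context_chars: int = 1200
    limit: int = 0
    do_sample: bool = False  # greedy decoding

    def generate_kwargs(self):
        # The subset that maps onto standard generation arguments.
        return {"max_new_tokens": self.max_new_tokens, "do_sample": self.do_sample}


defaults = InferenceDefaults()
print(asdict(defaults))
print(defaults.generate_kwargs())
```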
Supported runtime inputs
- .cha
- .jsonl (requires --speaker-field and --text-field)
The annotate command accepts either:
- --input-dir for folder annotation
- --input-path for a single .cha or .jsonl file
Other previously supported formats (.txt, .csv, .json, .xlsx) are rejected in adapter-only deployment mode.
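As an illustration of the .jsonl input shape, the sketch below writes one JSON object per line and checks file suffixes against the two supported types. The field names speaker and text are hypothetical, as are the speaker values; whatever names your data uses would be passed via --speaker-field and --text-field.

```python
import json
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".cha", ".jsonl"}


def is_supported(path):
    """Return True for the two input types listed above."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def write_jsonl(path, records):
    """Write one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical field names and invented utterances, for illustration only.
records = [
    {"speaker": "*CHI", "text": "doggie run ."},
    {"speaker": "*MOT", "text": "yes, the doggie is running ."},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.jsonl"
    write_jsonl(target, records)
    lines = target.read_text(encoding="utf-8").splitlines()

print(len(lines), is_supported("sample.jsonl"), is_supported("sample.csv"))
```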
Colab quickstart
See examples/colab_quickstart.ipynb for a minimal setup flow.
Download files
File details
Details for the file talk_tag-0.3.0.tar.gz.
File metadata
- Download URL: talk_tag-0.3.0.tar.gz
- Upload date:
- Size: 30.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bcae51e158ecf0a75e6ba226b46238babf72483fa426dadb12950b647f5f810 |
| MD5 | 8ee95b11decbfc5d51fd425f03e9334d |
| BLAKE2b-256 | 62be512b5583a5aeb4eb2a63fb4909b0404afc477651d335e5c1e5392748226e |
File details
Details for the file talk_tag-0.3.0-py3-none-any.whl.
File metadata
- Download URL: talk_tag-0.3.0-py3-none-any.whl
- Upload date:
- Size: 30.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b805cc9168148840a3b7518d01034ffd5a679e08c484d191b2e5585cae7b2e1 |
| MD5 | 6938b950a370ac3bbe9c6094291b982e |
| BLAKE2b-256 | c066800db03aed908a536631098ad9d7a872a390f95e56d1f38b41dbcad23364 |
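A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the helper names sha256_of and matches_published_digest are hypothetical, and the local file paths are whatever you saved the downloads as.

```python
import hashlib

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "talk_tag-0.3.0.tar.gz":
        "6bcae51e158ecf0a75e6ba226b46238babf72483fa426dadb12950b647f5f810",
    "talk_tag-0.3.0-py3-none-any.whl":
        "4b805cc9168148840a3b7518d01034ffd5a679e08c484d191b2e5585cae7b2e1",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, file_name):
    """Compare a local file's SHA256 with the published value for file_name."""
    return sha256_of(path) == PUBLISHED_SHA256[file_name]
```

For example, matches_published_digest("./talk_tag-0.3.0.tar.gz", "talk_tag-0.3.0.tar.gz") returns True only when the local bytes match the published source distribution.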