LLM-powered TalkBank CHAT annotator for speaker-targeted morphosyntactic transcript correction
talk-tag
talk-tag is an adapter-only TalkBank CHAT morphosyntactic error annotator for
.cha and .jsonl inputs.
Runtime model contract
The deployment path is fixed:
- Base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- Adapter: mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407
No merged-model runtime path is used. The package bundles CHAT token augmentation entries and injects them into the tokenizer before adapter loading so tokenizer and checkpoint vocabulary stay aligned.
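The ordering matters because an adapter trained against an augmented vocabulary expects an embedding matrix of matching size. The following is a minimal sketch of that load order, assuming the Hugging Face `transformers` and `peft` libraries. The helper names `tokens_to_add` and `load_adapter_model` are hypothetical, the CHAT token list is left to the caller, and the package's own loader may differ in its details.

```python
from typing import Iterable

BASE_MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
ADAPTER = "mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407"


def tokens_to_add(vocab: Iterable[str], extra_tokens: Iterable[str]) -> list[str]:
    """Return the extra tokens missing from the vocabulary, in order, without duplicates."""
    known = set(vocab)
    missing: list[str] = []
    for token in extra_tokens:
        if token not in known:
            known.add(token)
            missing.append(token)
    return missing


def load_adapter_model(extra_tokens: list[str]):
    """Hypothetical loader: augment the tokenizer, resize embeddings, then attach the adapter."""
    # Imported lazily so the helper above works without the heavy runtime dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.add_tokens(tokens_to_add(tokenizer.get_vocab(), extra_tokens))

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    # The embedding matrix has to match the augmented vocabulary before
    # the adapter weights are applied on top of the base model.
    model.resize_token_embeddings(len(tokenizer))
    model = PeftModel.from_pretrained(model, ADAPTER)
    return tokenizer, model
```

Both repositories require credentials, so calling `load_adapter_model` only works once `HF_TOKEN` is set as described in the Quickstart.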
Install
Python >=3.10 is required.
pip install "talk-tag[runtime]"
Quickstart
Set Hugging Face credentials (required for the fixed base + adapter repositories):
export HF_TOKEN=...
On PowerShell:
$env:HF_TOKEN = "..."
Run preflight checks:
talk-tag doctor
Warm model assets:
talk-tag model pull --device auto
Annotate a folder:
talk-tag annotate \
--input-dir ./input \
--output-dir ./output \
--target-speaker "*CHI" \
--device auto
Annotate one file:
talk-tag annotate \
--input-path ./input/sample.cha \
--output-dir ./output \
--target-speaker "*CHI" \
--device auto
CLI commands
- talk-tag annotate: annotate .cha or .jsonl data.
- talk-tag doctor: run runtime, dependency, and model-access checks.
- talk-tag model pull: pre-download model assets and optionally verify load.
.jsonl inputs require --speaker-field and --text-field.
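As an illustration of the two required flags, the sketch below serializes a few records to JSONL text and assembles a matching `annotate` invocation. The field names `speaker` and `text`, the file paths, the sample utterances, and the use of `*CHI` as the speaker label inside the records are all assumptions made for the example. Only the flag names come from this README.

```python
import json
import shlex


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def build_annotate_command(
    input_path: str,
    output_dir: str,
    target_speaker: str,
    speaker_field: str,
    text_field: str,
) -> list[str]:
    """Assemble the argument list for a .jsonl annotation run."""
    return [
        "talk-tag", "annotate",
        "--input-path", input_path,
        "--output-dir", output_dir,
        "--target-speaker", target_speaker,
        "--speaker-field", speaker_field,  # key holding the speaker label
        "--text-field", text_field,        # key holding the utterance text
    ]


if __name__ == "__main__":
    # Hypothetical records; the key names are chosen for this example only.
    sample = [
        {"speaker": "*CHI", "text": "he goed home ."},
        {"speaker": "*MOT", "text": "he went home ?"},
    ]
    print(to_jsonl(sample), end="")
    command = build_annotate_command(
        "./input/sample.jsonl", "./output", "*CHI", "speaker", "text"
    )
    # shlex.join quotes "*CHI" so the shell does not expand the asterisk.
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting concerns.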
Inference defaults
- batch_size = 4
- max_new_tokens = 128
- max_seq_length = 512
- max_context_chars = 1200
- limit = 0
- Greedy decoding (do_sample = false)
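For orientation, these defaults can be pictured as a small configuration object. The sketch below is illustrative only: `InferenceDefaults`, `generation_kwargs`, and `batches` are hypothetical names rather than the package's API, and the way the values feed into generation is an assumption based on the usual `transformers` `generate` keyword arguments.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class InferenceDefaults:
    """Hypothetical container mirroring the documented default values."""

    batch_size: int = 4
    max_new_tokens: int = 128
    max_seq_length: int = 512
    max_context_chars: int = 1200
    limit: int = 0
    do_sample: bool = False  # False selects greedy decoding

    def generation_kwargs(self) -> dict:
        """Keyword arguments in the shape accepted by transformers' generate()."""
        return {"max_new_tokens": self.max_new_tokens, "do_sample": self.do_sample}


def batches(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With greedy decoding, repeated runs over the same input are expected to pick the same highest-probability token at each step, which suits an annotation task where reproducibility matters more than variety.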
Documentation and support
- Documentation: https://oliverhennhoefer.github.io/talk-tag/
- Changelog: CHANGELOG.md
- Security policy: SECURITY.md
- Contributing guide: CONTRIBUTING.md
Notebook example