
LLM-powered TalkBank CHAT annotator for speaker-targeted morphosyntactic transcript correction


talk-tag


talk-tag is an adapter-only TalkBank CHAT morphosyntactic error annotator for .cha and .jsonl inputs.
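For readers new to CHAT, the following is a minimal sketch of what "speaker-targeted" selection means at the file level. It relies only on the CHAT convention that main-tier lines start with a speaker code such as `*CHI:` and that continuation lines start with a tab. The function name and the sample transcript are illustrative assumptions, not part of talk-tag.

```python
def utterances_for_speaker(chat_text: str, speaker: str = "*CHI") -> list[str]:
    """Collect main-tier utterances for one speaker from CHAT text."""
    prefix = speaker + ":"
    utterances: list[str] = []
    collecting = False
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            # New main-tier line for the target speaker.
            utterances.append(line[len(prefix):].strip())
            collecting = True
        elif line.startswith("\t") and collecting:
            # Tab-indented continuation of the previous main-tier line.
            utterances[-1] += " " + line.strip()
        else:
            # Headers (@), dependent tiers (%), and other speakers end the run.
            collecting = False
    return utterances


# Illustrative transcript, invented for this sketch.
sample = "\n".join([
    "@Begin",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\twhat did you do ?",
    "*CHI:\the goed to the park .",
    "%com:\tillustrative dependent tier",
    "*CHI:\tI want",
    "\tthat one .",
    "@End",
])
print(utterances_for_speaker(sample))
```

How talk-tag itself parses .cha files is not described here; the sketch only shows which lines a `*CHI` target refers to.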

Runtime model contract

The deployment path is fixed:

  1. Base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  2. Adapter: mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407

No merged-model runtime path is used. The package bundles CHAT token augmentation entries and injects them into the tokenizer before adapter loading so tokenizer and checkpoint vocabulary stay aligned.
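The ordering described above (extend the tokenizer, then load the adapter) can be sketched with the public transformers and peft APIs. This is an assumption about how such a pipeline is typically wired, not talk-tag's actual code: the function name and the `extra_tokens` argument are hypothetical, and the bundled CHAT token list is not reproduced here.

```python
BASE_MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
ADAPTER = (
    "mash-mash/talkbank-morphosyntax-annotator-"
    "final-recon_full_comp_preserve_final_seed3407"
)


def load_base_with_adapter(extra_tokens: list[str]):
    """Load the base model, extend its vocabulary, then attach the adapter.

    `extra_tokens` is a hypothetical stand-in for the CHAT token entries
    that talk-tag bundles.
    """
    # Imported lazily so this module can be imported without the heavy deps.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

    # 1. Grow the tokenizer vocabulary first.
    tokenizer.add_tokens(extra_tokens)
    # 2. Resize the embedding matrix so it matches the new vocabulary size.
    model.resize_token_embeddings(len(tokenizer))
    # 3. Only then load adapter weights that were trained against that vocabulary.
    model = PeftModel.from_pretrained(model, ADAPTER)
    return tokenizer, model
```

If the adapter were loaded before the resize, any adapter weights tied to the extended vocabulary would not match the base model's embedding shape, which is the misalignment the injection step avoids. Running the function requires the runtime dependencies, Hugging Face access, and hardware that supports the 4-bit base model.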

Install

Python >=3.10 is required.

pip install "talk-tag[runtime]"

Quickstart

Set Hugging Face credentials (required for the fixed base + adapter repositories):

export HF_TOKEN=...

On PowerShell:

$env:HF_TOKEN = "..."
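When working from a notebook or script, a quick check that the variable is visible to the Python process can save a failed download later. The helper below is a hypothetical convenience, not a talk-tag function.

```python
import os


def hf_token_present(env=None) -> bool:
    """Return True if HF_TOKEN is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if not hf_token_present():
    print("HF_TOKEN is not set; export it before pulling model assets.")
```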

Run preflight checks:

talk-tag doctor

Warm model assets:

talk-tag model pull --device auto

Annotate a folder:

talk-tag annotate \
  --input-dir ./input \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --device auto

Annotate one file:

talk-tag annotate \
  --input-path ./input/sample.cha \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --device auto
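To drive the same commands from Python, one option is to build the argument list explicitly and hand it to `subprocess.run`. The sketch below uses only the flags shown above; the helper name is hypothetical, and the rule that exactly one of `--input-dir` or `--input-path` is passed is an assumption drawn from the two examples.

```python
import shlex


def build_annotate_argv(*, output_dir, input_dir=None, input_path=None,
                        target_speaker="*CHI", device="auto"):
    """Build the argv list for `talk-tag annotate` from the documented flags."""
    # Assumption: a run takes either a folder or a single file, not both.
    if (input_dir is None) == (input_path is None):
        raise ValueError("pass exactly one of input_dir or input_path")
    if input_dir is not None:
        source = ["--input-dir", str(input_dir)]
    else:
        source = ["--input-path", str(input_path)]
    return [
        "talk-tag", "annotate", *source,
        "--output-dir", str(output_dir),
        "--target-speaker", target_speaker,
        "--device", device,
    ]


argv = build_annotate_argv(input_path="./input/sample.cha", output_dir="./output")
print(shlex.join(argv))
# To execute: import subprocess; subprocess.run(argv, check=True)
```

Passing a list rather than a shell string means the `*` in `*CHI` is never subject to shell globbing, so no extra quoting is needed.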

CLI commands

  • talk-tag annotate: annotate .cha or .jsonl data.
  • talk-tag doctor: run runtime, dependency, and model-access checks.
  • talk-tag model pull: pre-download model assets and optionally verify load.

.jsonl inputs require --speaker-field and --text-field.
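As a rough illustration, a .jsonl input is one JSON object per line. The sketch below assumes that `--speaker-field` and `--text-field` name the keys holding the speaker code and the utterance; the key names `speaker` and `text`, the records, and the form of the speaker values are all hypothetical.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical records; the real schema depends on your data.
records = [
    {"speaker": "CHI", "text": "he goed to the park ."},
    {"speaker": "MOT", "text": "what did you do ?"},
]
jsonl_text = to_jsonl(records)

# Flags that would then point talk-tag at those keys (assumed semantics).
extra_flags = ["--speaker-field", "speaker", "--text-field", "text"]
print(jsonl_text, end="")
```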

Inference defaults

  • batch_size = 4
  • max_new_tokens = 128
  • max_seq_length = 512
  • max_context_chars = 1200
  • limit = 0
  • Greedy decoding (do_sample = false)
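The defaults above can be captured in a small config object for reference. This is a sketch, not talk-tag's internal structure: the class and helper names are hypothetical, and the mapping to generation arguments assumes the standard Hugging Face `generate()` parameters `max_new_tokens` and `do_sample`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceDefaults:
    """Values copied from the list of inference defaults."""
    batch_size: int = 4
    max_new_tokens: int = 128
    max_seq_length: int = 512
    max_context_chars: int = 1200
    limit: int = 0
    do_sample: bool = False  # greedy decoding


def generation_kwargs(cfg: InferenceDefaults) -> dict:
    """Pick out the fields that correspond to generate() arguments (assumed mapping)."""
    return {"max_new_tokens": cfg.max_new_tokens, "do_sample": cfg.do_sample}


defaults = InferenceDefaults()
print(asdict(defaults))
print(generation_kwargs(defaults))
```

With `do_sample` set to false, decoding is deterministic for a given input and model state, which makes annotation runs easier to reproduce.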

Documentation and support

Notebook example

See examples/colab_quickstart.ipynb.

Download files

Source distribution: talk_tag-0.4.0.tar.gz (31.9 kB)

  • SHA256: 20fa8ee7b431c3a80f2dd87bfa97c9caf82ce42c66c4ec8bc5a623d9e995342f
  • MD5: f77f28f9fa4e5dd231f0a121f1e2f763
  • BLAKE2b-256: 268f945f1e566f8ae55966d28d150f4210708d2e76bdbac61ef99839a557e1d2

Built distribution: talk_tag-0.4.0-py3-none-any.whl (30.7 kB, Python 3)

  • SHA256: dcb06db7cd3144483d2c02b5df58ef039f3cff795a346edcf7dba9cda3b725b5
  • MD5: a008de76956ebb8cfcad606e2f7124f0
  • BLAKE2b-256: 0ebe68393d528c2ba9f4064f518954a623735bd77632f893f57cde601bfc65c6

Both files were uploaded via uv 0.9.9, without Trusted Publishing.
