
Transcript annotator for speaker-scoped CHAT corpus correction


talk-tag

Adapter-only TalkBank CHAT morphosyntactic error annotator for .cha and .jsonl files.

The runtime deployment path is fixed to:

  1. Base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  2. Adapter: mash-mash/talkbank-morphosyntax-annotator-final-recon_full_comp_preserve_final_seed3407

No merged-model runtime path is used.

The package bundles the deployed CHAT token augmentation list and injects those tokens into the tokenizer before loading the PEFT adapter. This step is required to keep the tokenizer/model vocabulary aligned with the adapter checkpoint.
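The load order described above can be sketched roughly as follows. This is a minimal illustration built on the public transformers and peft APIs, not talk-tag's actual loader: the function name load_annotator and the extra_tokens parameter are hypothetical, and the contents of the bundled token list are not reproduced here, so the caller has to supply them.

```python
BASE_MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
ADAPTER = (
    "mash-mash/talkbank-morphosyntax-annotator-"
    "final-recon_full_comp_preserve_final_seed3407"
)


def load_annotator(extra_tokens, base_model=BASE_MODEL, adapter=ADAPTER):
    """Hypothetical loader: add tokens, resize embeddings, then attach the adapter."""
    # Imported lazily so this module can be imported without the runtime extras.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # 1. Inject the augmentation tokens; tokens already in the vocabulary
    #    are skipped, and the number of newly added tokens is returned.
    num_added = tokenizer.add_tokens(list(extra_tokens))

    # 2. Grow the embedding matrix to the new vocabulary size before the
    #    adapter is loaded, so the adapter weights line up with the base model.
    if num_added:
        model.resize_token_embeddings(len(tokenizer))

    # 3. Attach the PEFT adapter on top of the resized base model (no merge).
    model = PeftModel.from_pretrained(model, adapter)
    return tokenizer, model
```

Skipping step 2, or running it after step 3, is the kind of mismatch the required injection step is meant to prevent.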

Install

Python requirement: >=3.10.

pip install "talk-tag[runtime]"

Runtime extras include torch, transformers, and peft.

Hugging Face access

You need Hub access to both repositories above. Set a token before first run:

export HF_TOKEN=...

If the token is missing or lacks access to either repository, talk-tag doctor and talk-tag model pull will report auth or gated-repo errors.
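If you launch talk-tag from a script, a quick pre-flight check on the environment variable can catch the most common cause of these errors early. The helper below is a hypothetical sketch: it only tests whether HF_TOKEN is set and non-empty, and does not verify that the token actually has access to the two repositories.

```python
import os


def hf_token_status(env=None):
    """Return "set" if HF_TOKEN is present and non-empty, else "missing"."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return "set" if token else "missing"


if __name__ == "__main__":
    print(f"HF_TOKEN is {hf_token_status()}")
```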

First-run workflow

  1. Check environment:
talk-tag doctor
  2. Pull/warm model assets:
talk-tag model pull --device auto
  3. Run annotation:
talk-tag annotate \
  --input-dir ./input \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --device auto

Single-file .cha example:

talk-tag annotate \
  --input-path ./input/sample.cha \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --device auto
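For batch jobs it can be convenient to drive the same command from Python. The sketch below only assembles the flags shown above; the helper names build_annotate_cmd and run_annotate are hypothetical, and it assumes that exactly one input flag is passed per run and that the CLI exits with a non-zero status on failure.

```python
import subprocess


def build_annotate_cmd(output_dir, input_dir=None, input_path=None,
                       target_speaker="*CHI", device="auto"):
    """Return the argv list for `talk-tag annotate` (hypothetical helper)."""
    # Assumption: exactly one of the two input flags is given per run.
    if (input_dir is None) == (input_path is None):
        raise ValueError("pass exactly one of input_dir or input_path")
    cmd = ["talk-tag", "annotate"]
    if input_dir is not None:
        cmd += ["--input-dir", str(input_dir)]
    else:
        cmd += ["--input-path", str(input_path)]
    cmd += [
        "--output-dir", str(output_dir),
        "--target-speaker", target_speaker,
        "--device", device,
    ]
    return cmd


def run_annotate(**kwargs):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_annotate_cmd(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the "*CHI" speaker code.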

Inference defaults

  • batch_size = 4
  • max_new_tokens = 128
  • max_seq_length = 512
  • max_context_chars = 1200
  • limit = 0
  • greedy decoding (do_sample = false)
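For reference, the defaults above can be captured in a small config object. This is an illustrative sketch, not talk-tag's internal representation: the InferenceDefaults class is hypothetical, and only max_new_tokens and do_sample are known to correspond directly to arguments of the transformers generate() method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceDefaults:
    """Hypothetical container mirroring the documented defaults."""
    batch_size: int = 4
    max_new_tokens: int = 128
    max_seq_length: int = 512
    max_context_chars: int = 1200
    limit: int = 0
    do_sample: bool = False  # greedy decoding

    def generate_kwargs(self):
        """Subset that maps directly onto transformers' generate() arguments."""
        return {
            "max_new_tokens": self.max_new_tokens,
            "do_sample": self.do_sample,
        }
```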

Supported runtime inputs

  • .cha
  • .jsonl (requires --speaker-field and --text-field)
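A .jsonl input holds one JSON object per line, with one field naming the speaker and one holding the utterance text. The sketch below writes such a file; the field names "speaker" and "text" and the speaker values are assumptions chosen for illustration, and whatever names your data uses would be passed via --speaker-field and --text-field.

```python
import json
from pathlib import Path

# Hypothetical field names; use whatever your data already has.
SPEAKER_FIELD = "speaker"
TEXT_FIELD = "text"


def write_jsonl(path, utterances):
    """Write (speaker, text) pairs as one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for speaker, text in utterances:
            record = {SPEAKER_FIELD: speaker, TEXT_FIELD: text}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

With these names, the matching flags would be --speaker-field speaker --text-field text.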

The annotate command accepts either:

  • --input-dir for folder annotation
  • --input-path for a single .cha or .jsonl file

Other previously supported formats (.txt, .csv, .json, .xlsx) are rejected in adapter-only deployment mode.
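Before pointing --input-dir at a mixed folder, it may help to see which files fall outside the supported set. The helper below is a hypothetical pre-check, not part of talk-tag; it compares extensions case-insensitively, which is an assumption about how the CLI itself matches them.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".cha", ".jsonl"}


def split_by_support(paths):
    """Partition paths into (supported, rejected) lists by file extension."""
    supported, rejected = [], []
    for p in map(Path, paths):
        if p.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(p)
        else:
            rejected.append(p)
    return supported, rejected
```

For example, split_by_support(Path("./input").iterdir()) lists the .txt, .csv, .json, or .xlsx files that would need converting first.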

Colab quickstart

See examples/colab_quickstart.ipynb for a minimal setup flow.

