backchannel classifier - detect backchannels vs real responses in thai and japanese asr output
backchannel classifier
detects backchannel responses vs real user input for voice ai systems. supports thai and japanese (aizuchi).
install
pip install backchannel-classifier
usage
from backchannel_classifier import is_backchannel
# thai (default)
is_backchannel("ครับ") # (True, 0.91)
is_backchannel("ไม่ครับ") # (False, 0.01)
is_backchannel("ใช่ แต่ว่า") # (False, 0.01)
# japanese
is_backchannel("はい", lang="ja") # (True, 0.99)
is_backchannel("そうですね", lang="ja") # (True, 0.99)
is_backchannel("予約したいです", lang="ja") # (False, 0.0001)
# direct import
from backchannel_classifier.jp import is_backchannel_ja
is_backchannel_ja("なるほど") # (True, 0.99)
returns (is_backchannel: bool, confidence: float).
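a minimal sketch of consuming that tuple in a voice pipeline, assuming you want to drop confident backchannels before they reach the llm. `should_forward`, `threshold` and `stub_classifier` are hypothetical names for illustration, not part of the package. the sketch also assumes the float is the backchannel probability, which the examples above suggest (False results come with near-zero values) but do not state.

```python
from typing import Callable, Tuple

# Matches the documented return shape: (is_backchannel, confidence).
Classifier = Callable[[str], Tuple[bool, float]]


def should_forward(text: str, classify: Classifier, threshold: float = 0.9) -> bool:
    """Return True if the transcript should be passed on for processing.

    Assumption: a transcript is dropped only when it is flagged as a
    backchannel with confidence >= threshold.
    """
    text = text.strip()
    if not text:
        return False  # empty asr output: nothing to process
    is_bc, confidence = classify(text)
    return not (is_bc and confidence >= threshold)


# Stub standing in for backchannel_classifier.is_backchannel so the sketch
# runs without the package; values copied from the usage examples.
_EXAMPLES = {"ครับ": (True, 0.91), "ไม่ครับ": (False, 0.01)}


def stub_classifier(text: str) -> Tuple[bool, float]:
    return _EXAMPLES.get(text, (False, 0.0))
```

in real use you would pass `is_backchannel` (or a `lambda` that fixes `lang="ja"`) as `classify` and tune `threshold` for your own traffic.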
why
voice bots using asr → llm → tts pipelines need to distinguish between backchannels (acknowledgment sounds that should be ignored) and real responses that need processing. simple exact matching fails on asr variants and misses edge cases.
approach
gradient boosting classifier with handcrafted language-specific features. key idea: strip known backchannel components from the text, measure what's left (remaining_ratio). if nothing remains, it's a backchannel.
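the strip-and-measure idea can be sketched as below. this is an illustration under assumptions, not the package's implementation: the component list is a tiny hypothetical sample, and the real feature may be computed differently (for example on tokens rather than characters).

```python
# Hypothetical component list for illustration only; the package's real
# lists are not shown in this readme.
BACKCHANNEL_COMPONENTS = ["ครับ", "ค่ะ", "อืม", "ใช่"]


def remaining_ratio(text: str, components=BACKCHANNEL_COMPONENTS) -> float:
    """Fraction of non-space characters left after stripping known components."""
    stripped = "".join(text.split())  # ignore whitespace
    if not stripped:
        return 0.0
    rest = stripped
    # Remove longer components first so a short one cannot split a longer match.
    for comp in sorted(components, key=len, reverse=True):
        rest = rest.replace(comp, "")
    return len(rest) / len(stripped)
```

with this sketch, `remaining_ratio("ครับ")` is 0.0 because nothing is left, while `remaining_ratio("ไม่ครับ")` is positive because the negation survives the stripping.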
thai (26 features)
| feature | importance |
|---|---|
| remaining_ratio | 0.9098 |
| has_request | 0.0406 |
| has_negation | 0.0274 |
| particle_ratio | 0.0108 |
- polite particle detection (ครับ/ค่ะ/จ้ะ variants)
- backchannel sound patterns (อืม/อ๋อ/เออ with tone variants)
- question/negation/request/continuation markers
- handles asr misspellings (ค่า→ค่ะ, คับ→ครับ, อื้ม→อืม)
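one way to handle the asr misspellings listed above is a small variant table applied before feature extraction. the sketch below is an assumption about how this could be done, not the package's code; `normalize_asr` and `ASR_VARIANTS` are hypothetical names, and only the three pairs named in this readme are included.

```python
# Variant -> canonical pairs taken from the feature list above.
ASR_VARIANTS = {"ค่า": "ค่ะ", "คับ": "ครับ", "อื้ม": "อืม"}


def normalize_asr(text: str, variants=ASR_VARIANTS) -> str:
    """Map known misspellings to canonical forms, token by token.

    Only whole whitespace-separated tokens are replaced, so a longer word
    that merely contains a variant (e.g. ค่าใช้จ่าย) is left untouched.
    """
    return " ".join(variants.get(tok, tok) for tok in text.split())
```

token-level matching is deliberately conservative: it misses variants glued to other words, but it avoids rewriting legitimate words that share the same characters.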
japanese (27 features)
| feature | importance |
|---|---|
| remaining_ratio | 0.7765 |
| remaining_len | 0.0484 |
| katakana | 0.0347 |
| word_count | 0.0325 |
| kanji_ratio | 0.0206 |
- core aizuchi (はい/ええ/うん/そう)
- agreement, understanding, surprise, filler, reaction markers
- question/continuation/request/negation/verb negative indicators
- handles asr elongation variants (はーーい, えーーー)
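elongation variants can be reduced before matching. the sketch below is one possible approach, not the package's code: `collapse_elongation` and `is_core_aizuchi` are hypothetical helpers, and the core set is just the four forms listed above.

```python
import re

# Two or more consecutive prolonged sound marks.
_LONG_MARK_RUN = re.compile("ー{2,}")

# Core aizuchi forms named in the feature list above.
CORE_AIZUCHI = {"はい", "ええ", "うん", "そう"}


def collapse_elongation(text: str) -> str:
    """Collapse runs of the prolonged sound mark into a single mark."""
    return _LONG_MARK_RUN.sub("ー", text)


def is_core_aizuchi(text: str) -> bool:
    """Check membership in the core set after dropping all prolonged marks."""
    return text.strip().replace("ー", "") in CORE_AIZUCHI
```

under this sketch `はーーい` collapses to `はーい` and still matches the core form `はい`, while `えーーー` collapses to `えー` and does not match the core set, so it would need its own filler pattern.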
results
thai
- 99.49% f1 (5-fold cv)
- test suite: 94/94 (100%)
japanese
- 98.37% f1 (5-fold cv)
- test suite: 119/119 (100%)
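for readers unfamiliar with the metric, f1 is the harmonic mean of precision and recall. the sketch below computes it from boolean labels, assuming "backchannel" is the positive class (the readme does not say which class is positive). `f1_from_labels` is a hypothetical helper and is not how the reported numbers were produced.

```python
from typing import Sequence


def f1_from_labels(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 for the positive class, computed from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

in 5-fold cross-validation this score would be computed on each held-out fold and then averaged.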
test coverage
thai (94 cases)
- backchannels (49): ครับ, ค่ะ, อืม, ใช่, อ๋อ, เหรอ, ฮัลโหล, asr variants...
- real responses (45): สวัสดีครับ, ไม่ครับ, ราคาเท่าไหร่ครับ, edge cases (ใช่ แต่ว่า, ครับ แล้วก็)...
japanese (119 cases)
- aizuchi (63): はい, うん, そうですね, なるほど, へー, まじで, えーと, すごい, 承知しました, compounds...
- real responses (56): ありがとうございます, いくらですか, 予約したいです, edge cases (はい、質問があります, そうですね、でも...)...
testing
python3 -m pytest tests/ -v
files
- backchannel_classifier/__init__.py - thai classifier + unified api
- backchannel_classifier/jp.py - japanese classifier
- train.py - thai training script
- train_ja.py - japanese training script
- tests/test_classifier.py - thai test suite (94 cases)
- tests/test_classifier_ja.py - japanese test suite (119 cases)
requirements
- python 3.8+
- scikit-learn
- numpy
memory
~3.7 MB per language model, lazy-loaded on first use. if you only use thai, the japanese model is never loaded (zero overhead).
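lazy loading of this kind is commonly done with a cached loader. the sketch below shows the pattern under assumptions: `get_model` and `LOAD_LOG` are hypothetical, the returned dict is a placeholder for a real model object, and the package's actual loading code is not shown in this readme.

```python
from functools import lru_cache

LOAD_LOG = []  # records which languages were actually loaded


@lru_cache(maxsize=None)
def get_model(lang: str) -> dict:
    """Load the model for `lang` on first use and reuse it afterwards."""
    LOAD_LOG.append(lang)
    # Placeholder for reading a serialized model from disk (hypothetical).
    return {"lang": lang}
```

calling `get_model("th")` twice runs the loader once, and nothing is loaded for "ja" until it is requested.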