Thai filler word classifier for voice bots: picks the right acknowledgment phrase while the LLM thinks

filler-classifier

Thai filler word classifier for voice bots. Classifies customer input into categories and returns the appropriate filler phrase to play instantly while the LLM generates a full response.

Built for the ingfah.ai voice bot, but adaptable to any Thai voice AI system.

Why

Voice bots have a latency problem: the user speaks, ASR transcribes the audio, and then the LLM takes 1-3 seconds to respond. Dead silence feels broken. The solution is to play a short filler phrase such as "สักครู่นะคะ" ("one moment, please") or "ขออภัยด้วยน่ะคะ" ("I do apologize") immediately while the LLM thinks.

But you can't play the same filler for everything. If someone is angry, "ได้เลยค่ะ" ("sure thing") sounds dismissive. If someone asks a question, the apology "ขออภัยด้วยน่ะคะ" makes no sense.

This classifier picks the right filler by category.
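
The latency-hiding pattern described above can be sketched with asyncio. This is only an illustration: slow_llm_reply, play_audio, and handle_turn are hypothetical stand-ins for a real LLM client and TTS playback, not part of this package.

```python
import asyncio

async def slow_llm_reply(text: str) -> str:
    # Hypothetical stand-in for the LLM call; the short sleep simulates latency.
    await asyncio.sleep(0.05)
    return f"full answer to: {text}"

async def play_audio(phrase: str, log: list) -> None:
    # Hypothetical stand-in for TTS playback; records what would be spoken.
    log.append(phrase)

async def handle_turn(text: str, pick_filler, log: list) -> str:
    # Start the LLM request first so it runs while the filler is playing.
    llm_task = asyncio.create_task(slow_llm_reply(text))
    await play_audio(pick_filler(text), log)
    reply = await llm_task
    await play_audio(reply, log)
    return reply

def run_demo() -> list:
    log = []
    # A fixed filler is used here; in practice pick_filler would be the classifier.
    asyncio.run(handle_turn("มีโปรอะไรบ้างครับ", lambda _text: "สักครู่นะคะ", log))
    return log
```

The key design point is ordering: the LLM task is created before the filler is played, so the filler adds no extra delay to the full response.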

Categories

Category | When | Example Fillers
complaint | Angry, frustrated, profanity, threats | ขออภัยด้วยน่ะคะ
question | Asking for info, pricing, how-to | สักครู่นะคะ, ตรวจสอบให้นะคะ
default | Greetings, agreements, requests, everything else | รับทราบค่ะ, ได้เลยค่ะ

Default Filler Phrases

Category | Fillers
complaint | ขออภัยด้วยน่ะคะ
question | สักครู่นะคะ, สักครู่ค่ะ, ตรวจสอบให้นะคะ
default | รับทราบค่ะ, ค่ะ ได้ค่ะ, ได้เลยค่ะ, ดีเลยค่ะ, เข้าใจค่ะ

A random filler is picked from the matching category each time. These are designed to be short (~0.3-0.5s when synthesized) for minimal latency.
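
As a rough sketch of that selection step, assuming the phrases are held in a plain dict keyed by category: the FILLERS dict and pick_filler helper below are illustrative names copied from the table above, not the package's internals, and the fallback to "default" for unknown categories is an assumption of this sketch.

```python
import random

# Phrases copied from the table above (illustrative copy, not the library's own object).
FILLERS = {
    "complaint": ["ขออภัยด้วยน่ะคะ"],
    "question": ["สักครู่นะคะ", "สักครู่ค่ะ", "ตรวจสอบให้นะคะ"],
    "default": ["รับทราบค่ะ", "ค่ะ ได้ค่ะ", "ได้เลยค่ะ", "ดีเลยค่ะ", "เข้าใจค่ะ"],
}

def pick_filler(category, rng=random):
    # Unknown categories fall back to "default" (an assumption of this sketch).
    phrases = FILLERS.get(category, FILLERS["default"])
    # Uniform random choice, so repeated turns do not always sound identical.
    return rng.choice(phrases)
```

Passing a seeded random.Random instance as rng makes the choice reproducible in tests.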

How It Works

The classifier uses intfloat/multilingual-e5-small embeddings with centroid-based cosine similarity:

  1. Each category has ~30-60 anchor phrases (real Thai customer service examples)
  2. On init, all anchors are embedded and averaged into category centroids
  3. At inference, the input is embedded and compared to centroids via cosine similarity
  4. The closest category wins, and a random filler from that category is returned
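
The steps above can be sketched in plain Python. This is a minimal illustration, not the package's code: the 3-dimensional vectors are toy stand-ins for real model embeddings, and normalizing the anchors before averaging is an assumption of this sketch.

```python
import math

def normalize(vec):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def centroid(vectors):
    # Average the unit-length anchor vectors, then renormalize the mean.
    unit = [normalize(v) for v in vectors]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return normalize(mean)

def classify(vec, centroids):
    query = normalize(vec)
    # For unit vectors, the dot product equals the cosine similarity.
    scores = {
        cat: sum(q * c for q, c in zip(query, cen))
        for cat, cen in centroids.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy "embeddings": two anchors per category instead of ~30-60.
anchors = {
    "complaint": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "question": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
    "default": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
centroids = {cat: centroid(vecs) for cat, vecs in anchors.items()}
category, score = classify([0.05, 0.8, 0.1], centroids)
```

Because the centroids are computed once up front, each classification costs one embedding plus one dot product per category.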

Performance

  • Accuracy: 89.6% on 1,000 Thai customer service sentences
  • Inference: <10ms per classification (after model load)
  • Init: ~200ms for centroid computation
  • Model size: ~118MB (multilingual-e5-small)

Installation

pip install filler-classifier

Usage

from filler_classifier import FillerClassifier

# loads model automatically on first init
clf = FillerClassifier()

# classify: returns (category, confidence, filler); the filler is picked at random, so the values in the comments are examples
category, confidence, filler = clf.classify("อยากถามเรื่องบิลครับ")
# ("question", 0.872, "สักครู่นะคะ")

category, confidence, filler = clf.classify("ใช้งานไม่ได้เลย")
# ("complaint", 0.891, "ขออภัยด้วยน่ะคะ")

category, confidence, filler = clf.classify("ได้ครับ ตกลง")
# ("default", 0.845, "ได้เลยค่ะ")

# or just get the filler phrase directly
filler = clf.get_filler("มีโปรอะไรบ้างครับ")
# "ตรวจสอบให้นะคะ"
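
One possible use of the confidence value is to fall back to a neutral phrase when the classifier is unsure. The helper below is a hypothetical sketch, not part of the package: filler_or_fallback, the 0.5 threshold, and the fallback phrase are all illustrative choices, and a real threshold would need tuning on your own data. It accepts any callable with the same return shape as clf.classify.

```python
def filler_or_fallback(classify, text, min_confidence=0.5, fallback="รับทราบค่ะ"):
    # classify is any callable returning (category, confidence, filler),
    # for example the bound method clf.classify.
    _category, confidence, filler = classify(text)
    # Below the (illustrative) threshold, use a neutral acknowledgment instead.
    if confidence < min_confidence:
        return fallback
    return filler
```

With a classifier instance this would be called as filler_or_fallback(clf.classify, text).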

Sharing the model

If you already have a SentenceTransformer instance loaded (e.g., for other tasks), pass it in to avoid loading twice:

from sentence_transformers import SentenceTransformer
from filler_classifier import FillerClassifier

model = SentenceTransformer("intfloat/multilingual-e5-small")
clf = FillerClassifier(model=model)

Customizing Fillers

Override CATEGORY_FILLERS to use your own phrases:

import filler_classifier

filler_classifier.CATEGORY_FILLERS["complaint"] = ["ขออภัยค่ะ", "เข้าใจค่ะ"]
filler_classifier.CATEGORY_FILLERS["question"] = ["รอสักครู่นะคะ"]

License

MIT
