Khmer Speech-to-Text Inference API using Wav2Vec2 with Pretrained Model
Sdab
Khmer Automatic Speech Recognition (Whisper + Wav2Vec2)
Sdab is a lightweight helper around Hugging Face ASR models with a focus on the Khmer language. It can load sequence-to-sequence Whisper checkpoints (the default) or CTC-style Wav2Vec2 models, convert audio to the expected format, and return a transcription in a single call.
- License: Apache-2.0
- Default Whisper model: metythorn/whisper-large-v3
- Optional turbo Whisper model: metythorn/whisper-large-v3-turbo
- Example Wav2Vec2 model: metythorn/wav2vec2-xls-r-300m
Features
- 🔁 Automatically detects Whisper vs Wav2Vec2 when you pass a Hugging Face repo ID or local directory.
- 🎧 Handles loading, mono conversion, and resampling to 16 kHz with torchaudio.
- ⚙️ Lets you pick CPU/GPU device and numerical precision to match your hardware.
- 🧪 Includes a sample audio clip for quick testing.
Installation
It is highly recommended to work inside a virtual environment.
python -m pip install --upgrade pip
pip install torch torchaudio transformers soundfile
pip install sdab
GPU users should install the right torch/torchaudio binaries for their CUDA version as described on https://pytorch.org/get-started/locally/.
To install from source:
git clone https://github.com/MetythornPenn/sdab.git
cd sdab
pip install -e .
Quick Start
Download the bundled sample audio (Khmer speech, 16 kHz WAV):
wget -O audio.wav https://raw.githubusercontent.com/MetythornPenn/sdab/main/sample/audio.wav
Whisper (default)
from sdab import Sdab
sd = Sdab("audio.wav") # defaults to metythorn/whisper-large-v3
print(sd.transcribe())
Explicit Whisper model
from sdab import Sdab
sd = Sdab(
"audio.wav",
model_name="metythorn/whisper-large-v3",
device="cuda:0", # or "cpu"
)
print(sd.transcribe())
Need the faster turbo checkpoint? Provide it explicitly:
sd = Sdab("audio.wav", model_name="metythorn/whisper-large-v3-turbo")
print(sd.transcribe())
Wav2Vec2 / CTC model
from sdab import Sdab
sd = Sdab(
"audio.wav",
model_name="metythorn/wav2vec2-xls-r-300m",
model_type="wav2vec2", # optional; Sdab will infer from the model name
)
print(sd.transcribe())
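For readers unfamiliar with the "CTC" in the heading above: a CTC model emits one label per audio frame, including a special blank label, and a transcription is recovered by collapsing consecutive repeats and then dropping the blanks. The sketch below shows that greedy decoding step without any libraries. The function name `ctc_greedy_collapse`, the toy vocabulary, and the blank index 0 are illustrative assumptions; this is not Sdab's internal code, and in practice the Hugging Face processor that ships with the model performs the decoding.

```python
from typing import List, Optional, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse consecutive repeated labels, then drop blank labels."""
    collapsed: List[int] = []
    previous: Optional[int] = None
    for label in frame_ids:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank_id:
            collapsed.append(label)
        previous = label
    return collapsed


# Toy vocabulary (assumption): index 0 is the blank, 1..3 are placeholder characters.
vocab = {1: "a", 2: "b", 3: "c"}

# Per-frame argmax labels as a CTC model might emit them.
frames = [0, 1, 1, 0, 1, 2, 2, 0, 0, 3]

ids = ctc_greedy_collapse(frames)
text = "".join(vocab[i] for i in ids)
print(ids, text)
```

The blank between the two runs of label 1 is what lets the decoder keep a genuinely doubled character instead of merging it into one.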
Important parameters
- file_path: Path to your WAV/FLAC/etc. file.
- model_name: Hugging Face repo ID or local directory with the pretrained model.
- model_type: Force "whisper" or "wav2vec2" if autodetection picks the wrong type.
- device: "cpu" or any PyTorch device string (for example "cuda:0").
- torch_dtype: Override the dtype (defaults to float32 on CPU and float16 on CUDA).
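The defaulting behaviour described for model_type and torch_dtype can be pictured roughly as follows. This is a hypothetical sketch, not Sdab's source: the helper names `guess_model_type` and `default_dtype_name` are invented for illustration, the name-matching heuristic is only an assumption about how autodetection might work, and dtypes are returned as plain strings so the snippet runs without PyTorch installed.

```python
from typing import Optional

SUPPORTED_TYPES = ("whisper", "wav2vec2")


def guess_model_type(model_name: str, model_type: Optional[str] = None) -> str:
    """Return an explicit model_type if given, else guess from the model name."""
    if model_type is not None:
        if model_type not in SUPPORTED_TYPES:
            raise ValueError(f"model_type must be one of {SUPPORTED_TYPES}")
        return model_type
    # Assumed heuristic: look for a telltale substring in the repo ID or path.
    if "wav2vec2" in model_name.lower():
        return "wav2vec2"
    # Assumed fallback: Whisper is the documented default.
    return "whisper"


def default_dtype_name(device: str) -> str:
    """Mirror the documented default: float16 on CUDA devices, float32 otherwise."""
    return "float16" if device.startswith("cuda") else "float32"


print(guess_model_type("metythorn/wav2vec2-xls-r-300m"))
print(guess_model_type("metythorn/whisper-large-v3-turbo"))
print(default_dtype_name("cpu"), default_dtype_name("cuda:0"))
```

A heuristic like this fails for a local directory whose name mentions neither architecture, which is the situation where passing model_type explicitly is useful.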
Tips
- Whisper expects mono 16 kHz input; Sdab automatically resamples the audio and converts it to a single channel.
- Models are downloaded from Hugging Face the first time you reference them. Keep an eye on cache size in ~/.cache/huggingface.
- For long recordings consider chunking/streaming outside of Sdab to stay within GPU memory.
- Results are returned from sd.transcribe() directly; the class no longer stores a separate sd.result.
- Errors while loading a model are wrapped in a helpful RuntimeError with the model name.
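For the chunking tip above, one possible approach is to split a long WAV file into fixed-length pieces before transcription. The sketch below uses only the standard-library `wave` module. The helper name `split_wav`, the 30-second default, and the output naming scheme are assumptions for illustration. It cuts at fixed offsets, so a word that straddles a boundary may be split across two chunks.

```python
import tempfile
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, chunk_seconds: float = 30.0) -> List[Path]:
    """Write consecutive fixed-length chunks of a PCM WAV file; return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Copy channel count, sample width, and rate from the source;
                # the frame count in the header is corrected when the file closes.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths


# Demo on a synthetic clip: 2.5 s of 16 kHz, 16-bit mono silence.
with tempfile.TemporaryDirectory() as tmp:
    src_path = str(Path(tmp) / "long.wav")
    with wave.open(src_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 40000)
    chunks = split_wav(src_path, str(Path(tmp) / "chunks"), chunk_seconds=1.0)
    chunk_names = [p.name for p in chunks]
    chunk_frames = []
    for p in chunks:
        with wave.open(str(p), "rb") as r:
            chunk_frames.append(r.getnframes())

print(chunk_names)
print(chunk_frames)
```

Each resulting path could then be passed to `Sdab(...)` and the returned transcriptions joined. Whether constructing a new `Sdab` per chunk reloads the model is not stated in this document, so it is worth measuring before using this on many chunks.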
References
- Inspired by Bong Vitou Phy and the accompanying Techcast episode.
- Khmer word segmentation libraries from SeangHay: khmercut and khmersegment.
- Whisper: paper | Hugging Face models.
- Wav2Vec2 paper and resources from Facebook AI Research: fairseq examples.
File details
Details for the file sdab-1.0.1.tar.gz.
File metadata
- Download URL: sdab-1.0.1.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 614d892f47bfa2c291456f05b350ecd2e21e03eb30e2395eeb7278d4b8ac3dbb |
| MD5 | fc682f14a5ef5d432427035dd774907e |
| BLAKE2b-256 | afacfee69425b8f970cd0ba00d651450636b02f4153e440f35428e3c6da3b4ec |
File details
Details for the file sdab-1.0.1-py3-none-any.whl.
File metadata
- Download URL: sdab-1.0.1-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 736cd1b44ff7410c75f8b42df038355183a217df31976dc7975b75e94a4ff96a |
| MD5 | 563a5783a0db8c5a8ae7ff77d8151a87 |
| BLAKE2b-256 | ce4173ff0fa901defacd4eaac61a915614f6b1408a05edf08a03a2bf4e045d19 |
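The SHA256 digests listed for each file can be checked against a downloaded copy before installing it. A minimal sketch using the standard-library `hashlib` follows; the helper name `sha256_of` is made up for illustration, and the path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Usage (assumes sdab-1.0.1.tar.gz is in the current directory):
# expected = "614d892f47bfa2c291456f05b350ecd2e21e03eb30e2395eeb7278d4b8ac3dbb"
# if sha256_of("sdab-1.0.1.tar.gz") != expected:
#     raise SystemExit("SHA256 mismatch: do not install this file")
```

Reading in blocks keeps memory use flat regardless of file size, which matters little for a 9 kB archive but makes the helper reusable for model checkpoints.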