Khmer Speech To Text Inference API using Wav2Vec2 with Pretrained Model

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: Apache Software License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Sdab

Khmer Automatic Speech Recognition (Whisper + Wav2Vec2)

Sdab is a lightweight helper around Hugging Face ASR models with a focus on the Khmer language. It can load sequence-to-sequence Whisper checkpoints (the default) or CTC-style Wav2Vec2 models, convert audio to the format the model expects, and return a transcription in a single call.

License: Apache-2.0
Default Whisper model: metythorn/whisper-large-v3
Optional turbo Whisper model: metythorn/whisper-large-v3-turbo
Example Wav2Vec2 model: metythorn/wav2vec2-xls-r-300m

Features

🔁 Automatically detects Whisper vs Wav2Vec2 when you pass a Hugging Face repo ID or local directory.
🎧 Handles loading, mono conversion, and resampling to 16 kHz with torchaudio.
⚙️ Lets you pick CPU/GPU device and numerical precision to match your hardware.
🧪 Includes a sample audio clip for quick testing.
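To make "mono conversion and resampling to 16 kHz" concrete, here is a dependency-free sketch of the two steps. It is not Sdab's code: Sdab does this with torchaudio, whose resampler uses band-limited (windowed sinc) interpolation, whereas the naive linear interpolation below has no anti-aliasing filter and is only meant to show how the signal's shape changes. The helper names `to_mono` and `resample_linear` are hypothetical.

```python
from typing import List, Sequence

TARGET_SR = 16_000  # sample rate the Whisper and Wav2Vec2 checkpoints expect


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average each frame across channels (one equal-length sequence per channel)."""
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


def resample_linear(samples: Sequence[float], orig_sr: int, new_sr: int = TARGET_SR) -> List[float]:
    """Naive linear-interpolation resampler; illustration only (no anti-aliasing)."""
    if orig_sr == new_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * new_sr / orig_sr))
    last = len(samples) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * orig_sr / new_sr  # position of output sample i, measured in input samples
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the end of the signal
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For example, a 48 kHz stereo clip would go through `resample_linear(to_mono([left, right]), 48_000)` to end up as a single 16 kHz channel.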

Installation

It is highly recommended to work inside a virtual environment.

python -m pip install --upgrade pip
pip install torch torchaudio transformers soundfile
pip install sdab

GPU users should install the torch/torchaudio binaries that match their CUDA version, as described at https://pytorch.org/get-started/locally/.
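After installing, a quick way to confirm that every dependency is importable is a small standard-library check like the one below. The `check_modules` helper is hypothetical (not part of Sdab); it uses `importlib.util.find_spec`, which locates top-level modules without importing heavy packages such as torch. To check the GPU build specifically, `torch.cuda.is_available()` in a Python shell reports whether PyTorch can see a CUDA device.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the packages installed by the commands above.
REQUIRED = ("torch", "torchaudio", "transformers", "soundfile", "sdab")


def check_modules(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Return {module_name: True if it can be found on the import path}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

Running `print(check_modules())` prints one entry per package; any `False` value points to a missing install.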

To install from source:

git clone https://github.com/MetythornPenn/sdab.git
cd sdab
pip install -e .

Quick Start

Download the bundled sample audio (Khmer speech, 16 kHz WAV):

wget -O audio.wav https://raw.githubusercontent.com/MetythornPenn/sdab/main/sample/audio.wav
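Before transcribing, you may want to confirm the file's format. A minimal standard-library check could look like this; the `inspect_wav` helper and `WavInfo` type are hypothetical, not part of Sdab, and Python's `wave` module only reads uncompressed PCM WAV files.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Read channel count, sample rate, sample width and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(wf.getnchannels(), rate, wf.getsampwidth(), wf.getnframes() / rate)
```

Calling `inspect_wav("audio.wav")` on a file that matches the description above should report `channels=1` and `sample_rate=16000`; other formats are converted by Sdab when it loads the audio.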

Whisper (default)

from sdab import Sdab

sd = Sdab("audio.wav")  # defaults to metythorn/whisper-large-v3
print(sd.transcribe())

Explicit Whisper model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/whisper-large-v3",
    device="cuda:0",         # or "cpu"
)
print(sd.transcribe())

Need the faster turbo checkpoint? Provide it explicitly:

sd = Sdab("audio.wav", model_name="metythorn/whisper-large-v3-turbo")
print(sd.transcribe())

Wav2Vec2 / CTC model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/wav2vec2-xls-r-300m",
    model_type="wav2vec2",  # optional; Sdab will infer from the model name
)
print(sd.transcribe())
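"CTC-style" means the Wav2Vec2 model outputs a score for every vocabulary token at every audio frame, and the text is recovered by decoding those frame scores. Sdab presumably delegates this step to the Hugging Face processor; the sketch below is only an illustration of greedy CTC decoding (best token per frame, collapse repeats, drop blanks), not Sdab's code. It assumes the Hugging Face Wav2Vec2 tokenizer conventions of `<pad>` as the CTC blank and `|` as the word delimiter.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: Sequence[str],
                      blank: str = "<pad>", word_delimiter: str = "|") -> str:
    """Greedy CTC decoding of per-frame token scores into a string."""
    # 1. Pick the highest-scoring token for each frame (argmax over the vocabulary).
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in frame_scores]
    # 2. Collapse consecutive repeats and 3. drop blanks. A blank between two equal
    #    tokens separates them, so "a <pad> a" decodes to "aa".
    tokens: List[str] = []
    prev: Optional[str] = None
    for tok in best:
        if tok != prev and tok != blank:
            tokens.append(tok)
        prev = tok
    # 4. Turn word delimiters back into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

Greedy decoding is the simplest option; beam search with a language model can improve accuracy but needs extra dependencies.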

Important parameters

file_path: Path to the input audio file (WAV, FLAC, etc.).
model_name: Hugging Face repo ID or local directory with the pretrained model.
model_type: Force "whisper" or "wav2vec2" if autodetection picks the wrong architecture.
device: "cpu" or any PyTorch device string (for example "cuda:0").
torch_dtype: Override the dtype (defaults to float32 on CPU and float16 on CUDA).
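The defaults above can be summarized as two small resolution rules. The sketch below is a hypothetical reconstruction from this README, not Sdab's source: it assumes the model type is guessed from the name (as the Wav2Vec2 example's comment suggests, falling back to the documented Whisper default) and represents dtypes as strings instead of `torch.dtype` objects so it runs without PyTorch. Devices other than CUDA are assumed to fall back to float32.

```python
from typing import Optional

VALID_TYPES = ("whisper", "wav2vec2")


def resolve_model_type(model_name: str, model_type: Optional[str] = None) -> str:
    """Hypothetical name-based guess; an explicit model_type always wins."""
    if model_type is not None:
        if model_type not in VALID_TYPES:
            raise ValueError(f"model_type must be one of {VALID_TYPES}, got {model_type!r}")
        return model_type
    if "wav2vec2" in model_name.lower():
        return "wav2vec2"
    return "whisper"  # Whisper is the documented default


def resolve_dtype(device: str, torch_dtype: Optional[str] = None) -> str:
    """Documented default: float16 on CUDA, float32 on CPU (assumed for other devices)."""
    if torch_dtype is not None:
        return torch_dtype
    return "float16" if device.startswith("cuda") else "float32"
```

A name-based guess is cheap but fragile, which is why the explicit `model_type` override exists.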

Tips

Whisper and Wav2Vec2 models expect mono 16 kHz input; Sdab automatically resamples audio recorded at other rates and converts multi-channel audio to mono.
Models are downloaded from Hugging Face the first time you reference them and cached in ~/.cache/huggingface by default; keep an eye on the cache size, since large checkpoints take up significant disk space.
For long recordings, consider chunking or streaming the audio outside of Sdab to stay within GPU memory.
Results are returned from sd.transcribe() directly; the class no longer stores a separate sd.result.
Errors while loading a model are wrapped in a RuntimeError whose message includes the model name.
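One way to chunk a long recording outside of Sdab is to split it into overlapping windows, write each window to its own WAV file, and transcribe the files one by one. The sketch below uses only the standard library and assumes an uncompressed PCM WAV input; the helper names, the 30-second default (chosen because Whisper processes audio in 30-second windows), and the 1-second overlap are illustrative assumptions.

```python
import wave
from typing import List, Tuple


def chunk_bounds(n_frames: int, sr: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 1.0) -> List[Tuple[int, int]]:
    """(start, end) frame indices of overlapping windows that cover the whole signal."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk must be positive and longer than the overlap")
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < n_frames:
        end = min(start + chunk, n_frames)
        bounds.append((start, end))
        if end == n_frames:  # the last window reached the end of the audio
            break
        start += step
    return bounds


def split_wav(path: str, out_prefix: str,
              chunk_s: float = 30.0, overlap_s: float = 1.0) -> List[str]:
    """Write each window of a PCM WAV file to <out_prefix>_NNN.wav and return the paths."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)  # whole file in memory: fine for a sketch
    bytes_per_frame = params.sampwidth * params.nchannels
    paths: List[str] = []
    for i, (start, end) in enumerate(chunk_bounds(params.nframes, params.framerate, chunk_s, overlap_s)):
        out_path = f"{out_prefix}_{i:03d}.wav"
        with wave.open(out_path, "wb") as dst:
            dst.setparams(params)  # same channels/width/rate; frame count is patched on write
            dst.writeframes(frames[start * bytes_per_frame:end * bytes_per_frame])
        paths.append(out_path)
    return paths
```

Each path can then be passed to `Sdab(path).transcribe()` and the texts joined. The overlap reduces words cut at chunk boundaries but can repeat a word or two, so some cleanup of the joined text may be needed.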

References

Inspired by Bong Vitou Phy and the accompanying Techcast episode.
Khmer word segmentation libraries from SeangHay: khmercut and khmersegment.
Whisper: the original paper and the Hugging Face models.
Wav2Vec2 paper and resources from Facebook AI Research: fairseq examples.

Release history

- 1.0.1 (this version): Nov 22, 2025
- 1.0.0: Nov 22, 2025
- 0.1.2: May 30, 2024
- 0.1.1: May 30, 2024
- 0.1.0: May 30, 2024

Download files

Source Distribution

sdab-1.0.1.tar.gz (9.0 kB)

Uploaded Nov 22, 2025 Source

Built Distribution

sdab-1.0.1-py3-none-any.whl (9.3 kB)

Uploaded Nov 22, 2025 Python 3

File details

Details for the file sdab-1.0.1.tar.gz.

File metadata

Download URL: sdab-1.0.1.tar.gz
Upload date: Nov 22, 2025
Size: 9.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for sdab-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`614d892f47bfa2c291456f05b350ecd2e21e03eb30e2395eeb7278d4b8ac3dbb`
MD5	`fc682f14a5ef5d432427035dd774907e`
BLAKE2b-256	`afacfee69425b8f970cd0ba00d651450636b02f4153e440f35428e3c6da3b4ec`

File details

Details for the file sdab-1.0.1-py3-none-any.whl.

File metadata

Download URL: sdab-1.0.1-py3-none-any.whl
Upload date: Nov 22, 2025
Size: 9.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for sdab-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`736cd1b44ff7410c75f8b42df038355183a217df31976dc7975b75e94a4ff96a`
MD5	`563a5783a0db8c5a8ae7ff77d8151a87`
BLAKE2b-256	`ce4173ff0fa901defacd4eaac61a915614f6b1408a05edf08a03a2bf4e045d19`
