Khmer speech-to-text inference API using pretrained Whisper and Wav2Vec2 models

Sdab

Khmer Automatic Speech Recognition (Whisper + Wav2Vec2)

Sdab is a lightweight helper around Hugging Face ASR models with a focus on the Khmer language. It can load sequence-to-sequence Whisper checkpoints (the default) or CTC-style Wav2Vec2 models, convert audio to the format the model expects, and return a transcription in a single call.

Features

  • 🔁 Automatically detects Whisper vs Wav2Vec2 when you pass a Hugging Face repo ID or local directory.
  • 🎧 Handles loading, mono conversion, and resampling to 16 kHz with torchaudio.
  • ⚙️ Lets you pick CPU/GPU device and numerical precision to match your hardware.
  • 🧪 Includes a sample audio clip for quick testing.
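
The automatic Whisper vs Wav2Vec2 detection can be pictured as a simple name-based check. The sketch below is only an illustration under that assumption: `guess_model_type` is a hypothetical helper, not part of the Sdab API, and the real detection logic may inspect other things (for example the model's configuration).

```python
def guess_model_type(model_name: str, default: str = "whisper") -> str:
    """Guess the ASR family from a repo ID or directory name.

    Hypothetical heuristic for illustration; Sdab's own logic may differ.
    """
    name = model_name.lower()
    # Wav2Vec2 / XLS-R checkpoints are CTC-style models.
    if "wav2vec2" in name or "xls-r" in name:
        return "wav2vec2"
    # Whisper checkpoints are sequence-to-sequence models.
    if "whisper" in name:
        return "whisper"
    # Fall back to the documented default family.
    return default


print(guess_model_type("metythorn/wav2vec2-xls-r-300m"))
print(guess_model_type("metythorn/whisper-large-v3-turbo"))
```

If a heuristic like this guesses wrong for a custom checkpoint name, passing `model_type` explicitly avoids relying on it.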

Installation

It is highly recommended to work inside a virtual environment.

python -m pip install --upgrade pip
pip install torch torchaudio transformers soundfile
pip install sdab

GPU users should install the right torch/torchaudio binaries for their CUDA version as described on https://pytorch.org/get-started/locally/.

To install from source:

git clone https://github.com/MetythornPenn/sdab.git
cd sdab
pip install -e .

Quick Start

Download the bundled sample audio (Khmer speech, 16 kHz WAV):

wget -O audio.wav https://raw.githubusercontent.com/MetythornPenn/sdab/main/sample/audio.wav

Whisper (default)

from sdab import Sdab

sd = Sdab("audio.wav")  # defaults to metythorn/whisper-large-v3
print(sd.transcribe())

Explicit Whisper model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/whisper-large-v3",
    device="cuda:0",         # or "cpu"
)
print(sd.transcribe())

Need the faster turbo checkpoint? Provide it explicitly:

sd = Sdab("audio.wav", model_name="metythorn/whisper-large-v3-turbo")
print(sd.transcribe())

Wav2Vec2 / CTC model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/wav2vec2-xls-r-300m",
    model_type="wav2vec2",  # optional; Sdab will infer from the model name
)
print(sd.transcribe())

Important parameters

  • file_path: Path to the input audio file (WAV, FLAC, etc.).
  • model_name: Hugging Face repo ID or local directory with the pretrained model.
  • model_type: Set to "whisper" or "wav2vec2" to override autodetection when it picks the wrong family.
  • device: "cpu" or any PyTorch device string (for example "cuda:0").
  • torch_dtype: Override the dtype (defaults to float32 on CPU and float16 on CUDA).
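
The documented `torch_dtype` default (float32 on CPU, float16 on CUDA) amounts to a small rule keyed on the device string. A minimal sketch of that rule follows; `default_dtype_name` is a hypothetical helper written for illustration and returns the dtype name as a string so the snippet runs without PyTorch installed.

```python
def default_dtype_name(device: str) -> str:
    """Return the dtype name matching the documented defaults.

    Hypothetical helper: float16 for CUDA devices, float32 otherwise.
    """
    return "float16" if device.startswith("cuda") else "float32"


for device in ("cpu", "cuda:0"):
    print(device, "->", default_dtype_name(device))

# With PyTorch available, the name maps to a real dtype via
# getattr(torch, default_dtype_name(device)), e.g. torch.float16.
```

To override the default, for example to force full precision on a GPU, pass `torch_dtype=torch.float32` alongside `device="cuda:0"`.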

Tips

  • Whisper expects mono 16 kHz input; Sdab automatically resamples the audio to 16 kHz and converts it to mono.
  • Models are downloaded from Hugging Face the first time you reference them. Keep an eye on cache size in ~/.cache/huggingface.
  • For long recordings consider chunking/streaming outside of Sdab to stay within GPU memory.
  • Results are returned from sd.transcribe() directly; the class no longer stores a separate sd.result.
  • Errors while loading a model are wrapped in a helpful RuntimeError with the model name.
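
For the chunking tip above, one possible approach is to split a long recording into fixed-length windows with a small overlap and transcribe each window separately. The sketch below only computes the sample ranges; `chunk_bounds`, the 30-second window, and the 1-second overlap are illustrative assumptions, not Sdab features.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16_000,
    chunk_seconds: float = 30.0,
    overlap_seconds: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices covering the whole recording.

    Consecutive chunks overlap by overlap_seconds so that words cut at a
    boundary appear in full in at least one chunk.
    """
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_seconds must be positive and larger than overlap_seconds")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds


# A 70-second recording at 16 kHz split into 30-second windows.
print(chunk_bounds(70 * 16_000))
```

Because Sdab takes a file path, each slice would need to be written to its own audio file (for example with soundfile) before being passed to `Sdab(...)`; merging the overlapping transcripts is left to the caller.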
