Khmer Speech-to-Text Inference API Using Pretrained Whisper and Wav2Vec2 Models

Sdab

Khmer Automatic Speech Recognition (Whisper + Wav2Vec2)

Sdab is a lightweight helper around Hugging Face ASR models, with a focus on the Khmer language. It can load sequence-to-sequence Whisper checkpoints (the default) or CTC-style Wav2Vec2 models, convert audio to the format the model expects, and return a transcription in a single call.

Features

  • 🔁 Automatically detects Whisper vs Wav2Vec2 when you pass a Hugging Face repo ID or local directory.
  • 🎧 Handles loading, mono conversion, and resampling to 16 kHz with torchaudio.
  • ⚙️ Lets you pick CPU/GPU device and numerical precision to match your hardware.
  • 🧪 Includes a sample audio clip for quick testing.
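
As an illustration of the first feature, here is a minimal sketch of how a Whisper vs Wav2Vec2 guess could be made from a repo ID or local directory. It is not Sdab's actual implementation: the function name `guess_model_type` and the fallback order are assumptions. It relies only on the Hugging Face convention that a checkpoint's `config.json` carries a `model_type` field.

```python
import json
from pathlib import Path


def guess_model_type(model_name: str, default: str = "whisper") -> str:
    """Guess "whisper" or "wav2vec2" for a repo ID or local directory.

    Hypothetical helper for illustration; Sdab's own logic may differ.
    """
    config_path = Path(model_name) / "config.json"
    if config_path.is_file():
        # Local checkpoints record their architecture family in config.json.
        with config_path.open(encoding="utf-8") as handle:
            declared = json.load(handle).get("model_type", "")
        if declared in ("whisper", "wav2vec2"):
            return declared

    # Otherwise fall back to a substring check on the name itself.
    lowered = model_name.lower()
    if "wav2vec2" in lowered:
        return "wav2vec2"
    if "whisper" in lowered:
        return "whisper"
    return default


print(guess_model_type("metythorn/wav2vec2-xls-r-300m"))     # wav2vec2
print(guess_model_type("metythorn/whisper-large-v3-turbo"))  # whisper
```

A name-based guess fails for checkpoints whose names mention neither family, which is the case the `model_type` parameter is there to cover.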

Installation

It is highly recommended to work inside a virtual environment.

python -m pip install --upgrade pip
pip install torch torchaudio transformers soundfile
pip install sdab

GPU users should install the right torch/torchaudio binaries for their CUDA version as described on https://pytorch.org/get-started/locally/.

To install from source:

git clone https://github.com/MetythornPenn/sdab.git
cd sdab
pip install -e .

Quick Start

Download the bundled sample audio (Khmer speech, 16 kHz WAV):

wget -O audio.wav https://raw.githubusercontent.com/MetythornPenn/sdab/main/sample/audio.wav

Whisper (default)

from sdab import Sdab

sd = Sdab("audio.wav")  # defaults to metythorn/whisper-large-v3
print(sd.transcribe())

Explicit Whisper model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/whisper-large-v3",
    device="cuda:0",         # or "cpu"
)
print(sd.transcribe())

Need the faster turbo checkpoint? Provide it explicitly:

sd = Sdab("audio.wav", model_name="metythorn/whisper-large-v3-turbo")
print(sd.transcribe())

Wav2Vec2 / CTC model

from sdab import Sdab

sd = Sdab(
    "audio.wav",
    model_name="metythorn/wav2vec2-xls-r-300m",
    model_type="wav2vec2",  # optional; Sdab will infer from the model name
)
print(sd.transcribe())

Important parameters

  • file_path: Path to your WAV/FLAC/etc. file.
  • model_name: Hugging Face repo ID or local directory with the pretrained model.
  • model_type: Force "whisper" or "wav2vec2" if autodetection picks the wrong type.
  • device: "cpu" or any PyTorch device string (for example "cuda:0").
  • torch_dtype: Override the dtype (defaults to float32 on CPU and float16 on CUDA).
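
The documented dtype defaults (float32 on CPU, float16 on CUDA) can be written as a one-line rule. The sketch below is illustrative only: `default_dtype_name` is a hypothetical helper, not part of Sdab, and it returns a dtype name rather than a torch object so that it runs without PyTorch installed.

```python
def default_dtype_name(device: str) -> str:
    """Return the dtype name matching the documented defaults.

    Hypothetical helper: float16 for CUDA devices, float32 otherwise.
    """
    # "cuda", "cuda:0", "cuda:1", ... all count as CUDA devices.
    return "float16" if device.lower().startswith("cuda") else "float32"


for device in ("cpu", "cuda:0"):
    print(device, "->", default_dtype_name(device))
```

With PyTorch available, `getattr(torch, default_dtype_name(device))` yields the corresponding torch dtype. Passing a value such as `torch.float32` as `torch_dtype` is presumably how you would force full precision on a GPU, though the exact accepted type is an assumption here.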

Tips

  • Whisper expects mono 16 kHz input; Sdab automatically resamples the audio to 16 kHz and converts multi-channel audio to mono.
  • Models are downloaded from Hugging Face the first time you reference them. Keep an eye on cache size in ~/.cache/huggingface.
  • For long recordings, consider chunking or streaming outside of Sdab to stay within GPU memory.
  • Results are returned from sd.transcribe() directly; the class no longer stores a separate sd.result.
  • Errors while loading a model are wrapped in a helpful RuntimeError with the model name.
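
One way to chunk a long recording outside of Sdab is to split the WAV file into fixed-length pieces with the standard-library `wave` module and transcribe each piece separately. This is a rough sketch under stated assumptions: the input is an uncompressed PCM WAV, and the helper name `split_wav`, the `chunk_NNNN.wav` naming scheme, and the 30-second default are illustrative choices rather than anything Sdab prescribes.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, chunk_seconds: float = 30.0) -> List[str]:
    """Split a PCM WAV file into consecutive chunks and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[str] = []

    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")

        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            chunk_path = out / f"chunk_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width, and rate; the wave module
                # corrects the frame count in the header when writing.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(str(chunk_path))
            index += 1

    return paths
```

Each returned path can then be passed to `Sdab(path).transcribe()` and the pieces joined in order. Hard cuts can split a word across two chunks, so cutting at silences or overlapping the chunks slightly may give cleaner results.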
