VAD-Enhanced ASR with Word-Level Timestamps
Project description
Praasper
Praasper is an Automatic Speech Recognition (ASR) application designed to help researchers transcribe audio files into both word- and phoneme-level text.
Praasper adopts a simple, straightforward pipeline to extract phoneme-level information from audio files. The pipeline combines Whisper and Praditor.
Praasper currently supports Mandarin (zh). Support for Cantonese (yue) and English (en) is planned for the near future.
For languages that are not yet supported, you can still obtain a word-level annotation with precise external boundaries, though the inner boundaries may be inaccurate due to how Whisper generates timestamps.
How to use
The default model is large-v3-turbo.
I personally recommend using the state-of-the-art model, since processing time is rarely a concern for offline work.
import praasper
model = praasper.init_model(model_name="large-v3-turbo")
model.annote(input_path="data") # the folder containing your .wav files
# If you want to know what other models are available:
# import whisper
# print(whisper.available_models())
The output should look like this:
[00:00:242] Loading Whisper model: large-v3-turbo
[00:07:472] Model loaded successfully. Current device in use: cuda:0
[00:07:472] 1 valid audio files detected in C:\Users\User\Desktop\Praasper\data
[00:07:472] Processing test_audio.wav (1/1)
[00:07:472] (test_audio.wav) VAD processing started...
[00:09:202] (test_audio.wav) Drawing onset(s) (7/7, 100%)
[00:09:553] (test_audio.wav) Drawing offset(s) (7/7, 100%)
[00:09:555] (test_audio.wav) VAD results saved
[00:12:181] (test_audio.wav) Transcribing into zh...
[00:12:183] (test_audio.wav) Whisper word-level transcription saved
[00:12:183] (test_audio.wav) Trimming word-level annotation...
[00:12:211] (test_audio.wav) Phoneme-level segmentation saved
[00:12:213] Processing completed.
Mechanism
Whisper is used to transcribe the audio file into word-level text. At this stage, the detected speech onsets and offsets can deviate from the true boundaries on the order of seconds.
Praditor then applies a Voice Activity Detection (VAD) algorithm to trim the existing word/character-level timestamps down to millisecond precision. Praditor is a speech onset detection algorithm we developed for language researchers.
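The trimming step can be sketched as follows. This is a minimal illustration, not Praasper's actual implementation; the function name, the data shapes, and the nearest-interval snapping rule are all assumptions:

```python
def trim_words_to_vad(words, vad_intervals):
    """Snap coarse word-level timestamps to detected speech boundaries.

    words: list of dicts like {"word": str, "start": float, "end": float},
           with Whisper's second-level timestamps.
    vad_intervals: list of (onset, offset) pairs in seconds from the VAD step.
    """
    trimmed = []
    for w in words:
        # Pick the VAD interval whose midpoint is closest to the word's midpoint.
        mid = (w["start"] + w["end"]) / 2
        onset, offset = min(
            vad_intervals,
            key=lambda iv: abs((iv[0] + iv[1]) / 2 - mid),
        )
        # Clamp the coarse boundaries to the detected speech region.
        trimmed.append({
            "word": w["word"],
            "start": max(w["start"], onset),
            "end": min(w["end"], offset),
        })
    return trimmed

words = [{"word": "你好", "start": 0.10, "end": 0.90}]
vad = [(0.242, 0.815)]
print(trim_words_to_vad(words, vad))
# -> [{'word': '你好', 'start': 0.242, 'end': 0.815}]
```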
To extract phoneme boundaries, we designed an edge detection algorithm:
- The audio file is first resampled to 16 kHz to remove noise in the high-frequency domain.
- A kernel, [-1, 0, 1], is then applied in the frequency domain to enhance the edge(s) between phonetic segments.
- The n most prominent peaks are then selected to match the expected number of phonemes.
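The steps above can be sketched as follows. This is a toy illustration, assuming the [-1, 0, 1] kernel slides along the time axis of a magnitude spectrogram and that boundaries are the frames with the largest total spectral change; the real implementation may differ:

```python
import numpy as np

def phoneme_edges(spec, n_peaks):
    """Pick candidate phoneme boundaries from a spectrogram.

    spec: 2-D array (freq_bins x time_frames) of magnitudes.
    n_peaks: how many boundary candidates to keep.
    """
    # Edge response per frame: x[t+1] - x[t-1], i.e. the [-1, 0, 1] kernel
    # cross-correlated along the time axis (border frames are skipped).
    edge = spec[:, 2:] - spec[:, :-2]
    # Collapse frequency bins into one novelty curve: total absolute
    # spectral change at each interior frame.
    novelty = np.abs(edge).sum(axis=0)
    # Keep the n_peaks most prominent frames; +1 restores the frame offset
    # lost by skipping the first border frame.
    frames = np.argsort(novelty)[-n_peaks:] + 1
    return np.sort(frames)

# Toy spectrogram: energy ramps up around frame 5, so the strongest
# spectral edge should be detected there.
spec = np.zeros((4, 10))
spec[:, 5] = 0.5
spec[:, 6:] = 1.0
print(phoneme_edges(spec, 1))  # -> [5]
```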
Setup
pip installation
pip install -U praasper
If the installation succeeds and you don't need GPU acceleration, you can stop here.
GPU Acceleration (Windows/Linux)
Whisper automatically detects the best available device. However, you still need to install a GPU-enabled build of torch to enable CUDA acceleration.
- For macOS users, Whisper only supports CPU as the processing device.
- For Windows/Linux users, the priority order is: CUDA -> CPU.
If you have no experience in installing CUDA, follow the steps below:
First, go to command line and check the latest CUDA version your system supports:
nvidia-smi
The output should look like this (here, the device supports CUDA up to version 12.9):
| NVIDIA-SMI 576.80 Driver Version: 576.80 CUDA Version: 12.9 |
Next, go to the NVIDIA CUDA Toolkit page and download the latest version, or whichever version fits your system and needs.
Lastly, install a torch build that matches your CUDA version. Find the correct pip command in this link.
Here is an example for CUDA 12.9:
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu129
(Advanced) uv installation
uv is also highly recommended for a much faster installation. First, make sure uv is installed in your default environment:
pip install uv
Then, create a virtual environment (e.g., .venv):
uv venv .venv
You should now see a new .venv folder in your project directory. (You might also need to restart the terminal.)
Lastly, install praasper (by adding uv before pip):
uv pip install -U praasper
For CUDA support,
uv pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu129
# Or whichever index matches your CUDA version
Project details
Release history
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file praasper-0.2.1.tar.gz.
File metadata
- Download URL: praasper-0.2.1.tar.gz
- Upload date:
- Size: 25.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `11c00ef61b640f801b8b834b628894dbc7f7c58e33664f8ff5aaa4d7e666e5eb` |
| MD5 | `dd69efbd2cffa43077f01554cbf123e7` |
| BLAKE2b-256 | `0dcdda36f70a3db74c136c004a2cc872d7f310b785d83f3ba1736b1e5f76437a` |
File details
Details for the file praasper-0.2.1-py3-none-any.whl.
File metadata
- Download URL: praasper-0.2.1-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `79c46977ae8af5fdee997c72d57b827c03b625bc48c6dff3db04c5d86c2e9948` |
| MD5 | `dc5b8f2b03d6d186c92005a94f870c43` |
| BLAKE2b-256 | `395b6ae29beeba03165a8f03a5d52462ab7374450d77085186e30db91024f90f` |