VAD-Enhanced ASR with Word- and Phoneme-Level Timestamps

Praasper

Praasper is an Automatic Speech Recognition (ASR) application designed to help researchers transcribe audio files into both word- and phoneme-level text.

Praasper uses a simple, straightforward pipeline to extract phoneme-level information from audio files. The pipeline combines Whisper and Praditor.

Praasper currently supports Mandarin. We plan to add support for Cantonese and English in the near future.

How to use

The default model is large-v3-turbo.

I personally recommend using the SOTA model, since processing time is rarely a real concern for offline work.

import praasper

model = praasper.init_model(model_name="large-v3-turbo")  
model.annote(input_path="data")  # The folder where your .wav files are stored

# If you want to know what other models are available:

# import whisper
# print(whisper.available_models())

Mechanism

Whisper is first used to transcribe the audio file into word-level text. At this stage, the speech onsets and offsets are only accurate to the order of seconds.
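As a rough illustration of this first step, the sketch below shows how word-level timestamps can be obtained from the openai-whisper package directly. It is not Praasper's internal code: `flatten_words` and `transcribe_words` are hypothetical helper names, and `example.wav` is a placeholder path.

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper result dictionary."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words


def transcribe_words(wav_path, model_name="large-v3-turbo"):
    """Run Whisper with word timestamps enabled (requires openai-whisper)."""
    import whisper  # imported lazily so flatten_words works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(wav_path, word_timestamps=True)
    return flatten_words(result)


# Hypothetical usage:
# for word, start, end in transcribe_words("example.wav"):
#     print(f"{start:7.2f} {end:7.2f} {word}")
```

The start and end values returned here are the coarse timestamps that the next step refines.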

Praditor then performs Voice Activity Detection (VAD) to tighten the existing word/character-level timestamps to millisecond precision. Praditor is a Speech Onset Detection (SOT) algorithm we developed for language researchers.
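To give a feel for what tightening a coarse timestamp means, here is a minimal sketch that uses plain short-time energy thresholding. This is a deliberately simple stand-in and NOT Praditor's algorithm; the function name `refine_boundaries` and the parameters `frame_ms` and `rel_threshold` are assumptions made for illustration.

```python
import numpy as np


def refine_boundaries(signal, sr, start, end, frame_ms=5.0, rel_threshold=0.1):
    """Shrink a coarse [start, end] interval (seconds) to the region whose
    short-time energy exceeds a fraction of the interval's peak energy.

    Simple energy thresholding; a stand-in, NOT Praditor's algorithm.
    """
    frame = max(1, int(sr * frame_ms / 1000))
    lo, hi = int(start * sr), int(end * sr)
    chunk = np.asarray(signal[lo:hi], dtype=float)
    n_frames = len(chunk) // frame
    if n_frames == 0:
        return start, end
    # Root-mean-square energy per non-overlapping frame
    frames = chunk[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    if rms.max() == 0:
        return start, end  # pure silence: nothing to refine
    active = np.flatnonzero(rms >= rel_threshold * rms.max())
    onset = (lo + active[0] * frame) / sr
    offset = (lo + (active[-1] + 1) * frame) / sr
    return onset, offset
```

With 5 ms frames the refined boundaries are resolved to a few milliseconds, compared with the much coarser interval passed in.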

To extract phoneme boundaries, we designed an edge detection algorithm.

  • The audio file is first resampled to 16 kHz to remove high-frequency noise.
  • A kernel, [-1, 0, 1], is then applied to the frequency domain to enhance the edge(s) between phonetic segments.
  • The n most prominent peaks are then selected to match the expected number of phonemes.
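The steps above can be sketched roughly as follows. This is an illustrative reading of the description, not Praasper's actual implementation. It assumes the signal is already at 16 kHz, that the kernel is applied along the time axis of a magnitude spectrogram, and that "prominent" simply means the highest local maxima. All function names and the `n_fft`/`hop` values are hypothetical.

```python
import numpy as np


def spectrogram(signal, n_fft=256, hop=64):
    """Magnitude spectrogram with a Hann window; shape (freq_bins, frames)."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def edge_strength(spec):
    """Apply the kernel [-1, 0, 1] along time and sum |response| over frequency."""
    diff = np.zeros_like(spec)
    diff[:, 1:-1] = spec[:, 2:] - spec[:, :-2]  # -1*prev + 0*current + 1*next
    return np.abs(diff).sum(axis=0)


def top_peaks(strength, n_peaks):
    """Return the frame indices of the n highest local maxima, in time order."""
    inner = np.arange(1, len(strength) - 1)
    is_peak = (strength[inner] > strength[inner - 1]) & (strength[inner] >= strength[inner + 1])
    peaks = inner[is_peak]
    best = peaks[np.argsort(strength[peaks])[::-1][:n_peaks]]
    return np.sort(best)


def phoneme_boundaries(signal, sr=16000, n_peaks=1, n_fft=256, hop=64):
    """Estimate boundary times (seconds) inside one word-sized audio chunk."""
    strength = edge_strength(spectrogram(np.asarray(signal, dtype=float), n_fft, hop))
    frames = top_peaks(strength, n_peaks)
    return (frames * hop + n_fft / 2) / sr  # centre of each analysis frame
```

For example, on a chunk made of a 300 Hz tone followed by a 2000 Hz tone, `phoneme_boundaries(chunk, n_peaks=1)` should return a single time close to the point where the tone changes.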

Setup

pip installation

pip install -U praasper

If the installation succeeded and you don't need GPU acceleration, you can stop here.

GPU Acceleration (Windows/Linux)

Whisper automatically detects the best available device. However, you first need to install a GPU-enabled build of torch to enable CUDA acceleration.

  • For macOS users, Whisper only detects CPU as the processing device.
  • For Windows/Linux users, the priority order should be: CUDA -> CPU.

If you have no experience in installing CUDA, follow the steps below:

First, open a command line and check the latest CUDA version your system supports:

nvidia-smi

The output should include a line like this:

| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |

It means that this device supports CUDA up to version 12.9.

Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version fits your system and needs.

Next, uninstall the default CPU-only torch:

pip uninstall torch

Lastly, install the torch build that matches your CUDA version. You can find the correct pip command at this link.

Here is an example for CUDA 12.9:

pip install torch --index-url https://download.pytorch.org/whl/cu129
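After reinstalling, a quick way to check whether the CUDA build is active is to ask torch directly. The snippet below is a small sketch that mirrors the CUDA -> CPU priority order described above; `pick_device` is a hypothetical helper name, not part of Praasper or Whisper.

```python
def pick_device():
    """Return "cuda" if a CUDA-enabled torch build can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this still prints cpu on a machine with an NVIDIA GPU, the CPU-only torch build is most likely still the one installed.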

(Advanced) uv installation

uv is also highly recommended for much faster installation. First, make sure uv is installed in your default environment:

pip install uv

Then, create a virtual environment (e.g., .venv):

uv venv .venv

You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)

Lastly, install praasper (by adding uv before pip):

uv pip install praasper

For CUDA support:

uv pip install --reinstall torch --index-url https://download.pytorch.org/whl/cu129
# Or whichever build matches your CUDA version
