VAD-Enhanced ASR Framework for Researchers

Praasper

Praasper is an Automatic Speech Recognition (ASR) framework designed to help researchers transcribe audio files into utterance-level text with accurate transcription and timestamps.

In Praasper, we adopt a rather simple and straightforward pipeline to extract utterance-level information from audio files. The pipeline includes SenseVoiceSmall and Praditor.

For more information about supported languages, please refer to the FunASR repository.

How to use

The default model is iic/SenseVoiceSmall.

I personally recommend using the SOTA model, since processing time is rarely a concern for offline processing.

Here is the simplest example:

import praasper

model = praasper.init_model()
model.annote(input_path="data")  # The folder where you store .wav

Here are some other parameters you can pass to the annote method:

model.annote(
    input_path="data",
    min_pause=.8,  # Minimum pause duration between two utterances, 0.2 seconds as default.
    language=None,  # "zh" for Mandarin, "yue" for Cantonese, "en" for English, None for automatic language detection
    seg_dur=15.,  # Segment large audio into pieces, 15 seconds as default.
)
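To make the role of min_pause more concrete, here is a minimal sketch of how a minimum-pause threshold can merge neighbouring speech segments into utterances. It is an illustration only and not Praasper's actual implementation; the function name merge_by_pause and the (start, end) tuple layout are assumptions made for this example.

```python
def merge_by_pause(segments, min_pause=0.2):
    """Merge (start, end) speech segments separated by less than min_pause seconds.

    Hypothetical helper for illustration; not part of the praasper API.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < min_pause:
            # Gap is too short to count as a pause: extend the previous utterance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


segments = [(0.0, 1.0), (1.1, 2.0), (3.0, 4.0)]
print(merge_by_pause(segments, min_pause=0.2))  # two utterances
print(merge_by_pause(segments, min_pause=1.5))  # one utterance
```

In this sketch, a larger min_pause yields fewer but longer utterances.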

Mechanism

Praditor performs Voice Activity Detection (VAD) to trim the existing word/character-level timestamps to millisecond precision. It is a Speech Onset Detection (SOT) algorithm we developed for language researchers.

SenseVoiceSmall is used to transcribe the audio file. It does not provide timestamps on its own, but it has better support for short audio files than Whisper.
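To give a rough idea of what detecting speech onsets and offsets means, here is a small sketch that uses a plain frame-energy threshold on a synthetic signal. This is a deliberately simple stand-in and not Praditor's algorithm; the function name detect_speech_spans, the frame size, and the threshold are all assumptions chosen for the example.

```python
def detect_speech_spans(samples, sample_rate, frame_ms=10, threshold=0.01):
    """Return (onset_s, offset_s) spans whose frame energy exceeds threshold.

    A plain energy threshold used as a stand-in; this is NOT Praditor's algorithm.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans, onset = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy > threshold and onset is None:
            onset = i / sample_rate  # speech starts at this frame
        elif energy <= threshold and onset is not None:
            spans.append((onset, i / sample_rate))  # speech ended before this frame
            onset = None
    if onset is not None:  # audio ended while still in speech
        spans.append((onset, len(samples) / sample_rate))
    return spans


# Synthetic signal at 1 kHz: 0.1 s silence, 0.2 s of "speech", 0.1 s silence.
sample_rate = 1000
samples = [0.0] * 100 + [0.5, -0.5] * 100 + [0.0] * 100
print(detect_speech_spans(samples, sample_rate))
```

Each detected span would then be passed to the transcription model, so that every utterance ends up with both text and timestamps.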

Setup

pip installation

pip install -U praasper

If the installation succeeded and you don't need GPU acceleration, you can stop here.
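If you want to confirm that the package was installed, one option is to query the installed version through the standard library. This is a generic check that works for any package; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("praasper:", installed_version("praasper"))
```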

GPU Acceleration (Windows/Linux)

Whisper can automatically detect the best available device. However, you first need to install a GPU-enabled build of torch to enable CUDA acceleration.

  • For macOS users, Whisper only supports CPU as the processing device.
  • For Windows/Linux users, the priority order should be: CUDA -> CPU.
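To see which device torch would report on your machine, a quick check like the following can help. It only mirrors the CUDA -> CPU priority described above and makes no claim about how device selection is done internally; pick_device is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when a CUDA-enabled torch build sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints cpu on a machine with an NVIDIA GPU, the installed torch build most likely lacks CUDA support.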

If you have no experience in installing CUDA, follow the steps below:

First, open a command line and check the latest CUDA version your system supports:

nvidia-smi

The output should look like this (here, the device supports CUDA up to version 12.9):

| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |

Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version fits your system and needs.

Lastly, install torch that fits your CUDA version. Find the correct pip command in this link.

Here is an example for CUDA 12.9:

pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu129
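The link between the nvidia-smi output and the index URL can be sketched as follows. The helper cuda_wheel_tag is hypothetical and simply turns "12.9" into "cu129". Note that PyTorch does not necessarily publish wheels for every CUDA version, so treat the resulting URL as a candidate and confirm it on the PyTorch installation page.

```python
import re


def cuda_wheel_tag(smi_header):
    """Turn the 'CUDA Version: X.Y' field of nvidia-smi output into a tag such as 'cu129'."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        return None  # no CUDA version found in the given text
    major, minor = match.groups()
    return f"cu{major}{minor}"


header = "| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |"
tag = cuda_wheel_tag(header)
print(f"https://download.pytorch.org/whl/{tag}")
```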

(Advanced) uv installation

uv is also highly recommended for much faster installation. First, make sure uv is installed in your default environment:

pip install uv

Then, create a virtual environment (e.g., .venv):

uv venv .venv

You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)

Lastly, install praasper (by adding uv before pip):

uv pip install -U praasper

For CUDA support:

uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
# Or whichever version that matches your CUDA version
