
VAD-Enhanced ASR Framework for Researchers


Praasper


Setup | Usage | Mechanism

Praasper is an Automatic Speech Recognition (ASR) framework designed to help researchers transcribe audio files into utterances, from a single word to a complete sentence, with a decent level of accuracy in both transcription and timestamps.


In Praasper, we adopt a rather simple and straightforward pipeline to extract utterance-level information from audio files. The pipeline includes VAD (Praditor), ASR (SenseVoiceSmall) and LLM (Qwen).

How to use

Here is one of the simplest examples:

import praasper

model = praasper.init_model()
model.annote("data_folder")

Here are the parameters you can pass to init_model (ASR and LLM) and to the annote method (all others):

| Param | Default | Description |
|---|---|---|
| ASR | iic/SenseVoiceSmall | Model name as the ASR core. Check out FunASR's model list for available models. |
| LLM | Qwen/Qwen2.5-1.5B-Instruct | Model name as the LLM core. Check out Qwen's model list for available models. |
| input_path | - | Path to the folder where audio files are stored. |
| seg_dur | 10. | Segment large audio into pieces, in seconds. |
| min_pause | 0.2 | Minimum pause duration between two utterances, in seconds. |
| min_speech | 0.2 | Minimum duration for an utterance, in seconds. |
| language | None | "zh" for Mandarin, "yue" for Cantonese, "en" for English, "ja" for Japanese, "ko" for Korean, and None for automatic language detection. |
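A typo in the language code is easier to catch before a long transcription run than after it. The following is a minimal sketch that checks a value against the codes listed in the table. The helper `check_language` and the dictionary `SUPPORTED_LANGUAGES` are hypothetical names introduced here for illustration; they are not part of Praasper.

```python
# Codes copied from the parameter table; None means automatic detection.
SUPPORTED_LANGUAGES = {
    "zh": "Mandarin",
    "yue": "Cantonese",
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
}


def check_language(language):
    """Return the code unchanged if it is None or listed in the table.

    Hypothetical helper, not a Praasper function.
    """
    if language is None:
        return None
    if language not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unknown language code {language!r}; expected one of: {known}, or None")
    return language
```

The returned value can then be passed as the language argument.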

Here is a code example showing how to use these parameters:

import praasper

model = praasper.init_model(
    ASR="iic/SenseVoiceSmall",
    LLM="Qwen/Qwen2.5-1.5B-Instruct"
)

model.annote(
    input_path="data_folder",
    min_pause=.8,
    min_speech=.2,
    language=None,
    seg_dur=15.
)
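To make the roles of min_pause and min_speech more concrete, here is a minimal sketch of how such thresholds could act on a list of detected speech intervals. It is an illustration under assumptions, not Praasper's or Praditor's actual implementation: the function `apply_pause_and_speech_limits` is hypothetical, and it assumes that pauses shorter than min_pause join neighbouring intervals and that intervals shorter than min_speech are dropped.

```python
def apply_pause_and_speech_limits(intervals, min_pause=0.2, min_speech=0.2):
    """Merge intervals split by short pauses, then drop short utterances.

    intervals: list of (onset, offset) tuples in seconds.
    Hypothetical helper for illustration only.
    """
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset - merged[-1][1] < min_pause:
            # The pause is too short to count as a boundary: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    # Keep only utterances that last at least min_speech seconds.
    return [(a, b) for a, b in merged if b - a >= min_speech]


detected = [(0.0, 1.0), (1.5, 2.0), (3.0, 3.1), (5.0, 6.0)]
utterances = apply_pause_and_speech_limits(detected, min_pause=0.8, min_speech=0.2)
```

Under these assumptions, a larger min_pause yields fewer, longer utterances, while a larger min_speech filters out more brief sounds.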

Fine-tune Praditor

Praasper ships with a default set of parameters for Praditor, but the defaults are not always optimal. In that case, we recommend using a custom set of Praditor parameters:

  1. Use the latest version of Praditor (v1.3.1). It supports VAD.
  2. Annotate the audio file. Fine-tune the parameters until the results fit your standard.
  3. Click Save under the Current mode (top-right corner).

Praditor will then save a .txt param file to the same folder as the input audio file. Praasper uses this file to override the default params.

ASR/LLM model recommendation

For the ASR core, iic/SenseVoiceSmall is the only recommendation at the moment.

For the LLM core, the recommended models are (from large to small): Qwen/Qwen3-4B-Instruct-2507 and Qwen/Qwen2.5-1.5B-Instruct (default). The default is small but good enough for laptop users. You are also welcome to try other Qwen models.

Mechanism

Praditor performs Voice Activity Detection (VAD) to (1) segment large audio files into smaller pieces and (2) extract utterances. It can generate intervals with millisecond-level precision. It was originally a Speech Onset Detection (SOT) algorithm we developed for language researchers.
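One way to picture the first step, cutting a long recording into pieces of roughly seg_dur seconds, is to place the cuts in the pauses between detected utterances. The sketch below illustrates that idea only. The function `split_at_pauses` is hypothetical, and the grouping rule is an assumption rather than a description of how Praasper or Praditor actually segments audio.

```python
def split_at_pauses(utterances, seg_dur=10.0):
    """Group consecutive (onset, offset) utterances into segments.

    A new segment starts whenever adding the next utterance would make the
    current segment span more than seg_dur seconds, so cuts always fall in
    a pause. A single utterance longer than seg_dur stays in one piece.
    Hypothetical helper for illustration only.
    """
    segments = []
    current = []
    for onset, offset in utterances:
        if current and offset - current[0][0] > seg_dur:
            # Close the current segment at the end of its last utterance.
            segments.append((current[0][0], current[-1][1]))
            current = []
        current.append((onset, offset))
    if current:
        segments.append((current[0][0], current[-1][1]))
    return segments


speech = [(0.0, 4.0), (5.0, 9.0), (9.5, 14.0), (15.0, 18.0)]
pieces = split_at_pauses(speech, seg_dur=10.0)
```

Cutting in pauses rather than at fixed time marks avoids splitting a word across two pieces, which is the main reason to run VAD before transcription in a pipeline like this.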

SenseVoiceSmall transcribes the audio but does not provide timestamps. It is a lightweight ASR model that can run even on a laptop, and it handles short audio files better than Whisper.

In addition, in case users want to designate a single language throughout the transcription, an LLM (Qwen/Qwen2.5-1.5B-Instruct) is added to the framework to correct potential errors in the transcription.

Setup

pip installation

pip install -U praasper

If the installation succeeded and you don't need GPU acceleration, you can stop here.

GPU Acceleration (Windows/Linux)

Currently, Praasper utilizes SenseVoiceSmall from FunASR as the ASR core.

FunASR automatically detects the best available device. However, you still need to install a GPU-enabled build of torch first in order to enable CUDA acceleration.

  • For macOS users, only CPU is supported as the processing device.
  • For Windows/Linux users, the priority order should be: CUDA -> CPU.

If you have no experience in installing CUDA, follow the steps below:

First, go to command line and check the latest CUDA version your system supports:

nvidia-smi

The output should look like this (here, the device supports CUDA up to version 12.9):

| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |
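If you want to read that version from a script rather than by eye, the header line can be parsed with a regular expression. This is a minimal sketch: the function `parse_cuda_version` is a hypothetical helper, and it assumes the header keeps the "CUDA Version: X.Y" wording shown above.

```python
import re


def parse_cuda_version(nvidia_smi_output):
    """Return (major, minor) from nvidia-smi header text, or None if absent.

    Hypothetical helper; assumes the "CUDA Version: X.Y" wording.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |"
version = parse_cuda_version(sample)
```

In practice the text would come from running nvidia-smi, for example through `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.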

Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version fits your system and needs.

Lastly, install torch that fits your CUDA version. Find the correct pip command in this link.

Here is an example for CUDA 12.9:

pip install --force-reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
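After installing, you may want to confirm that the torch build can actually see the GPU. The following is a minimal sketch of such a check. The function `pick_device` is a hypothetical helper that mirrors the CUDA -> CPU priority described above; it is not FunASR's own device detection.

```python
import importlib.util


def pick_device():
    """Return "cuda" if torch is installed and sees a CUDA device, else "cpu".

    Hypothetical helper for a quick sanity check.
    """
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this environment.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed torch is likely a CPU-only build or does not match the installed CUDA version.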

(Advanced) uv installation

uv is also highly recommended for much faster installation. First, make sure uv is installed in your default environment:

pip install uv

Then, create a virtual environment (e.g., .venv):

uv venv .venv

You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)

Lastly, install praasper (by adding uv before pip):

uv pip install -U praasper

For CUDA support, here is an example for downloading torch that fits CUDA 12.9:

uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129

Dev Plan

  • Add support for more LLM models.
  • Separate LLM strategies for error correction and language correction.
