
VAD-Enhanced ASR Framework for Researchers


Praasper



Praasper is an Automatic Speech Recognition (ASR) framework designed to help researchers transcribe audio files into utterances, from a single word to a complete sentence, with a decent level of accuracy in both transcription and timestamps.


Praasper uses a simple, straightforward pipeline to extract utterance-level information from audio files. The pipeline has three stages: VAD (Praditor), ASR (SenseVoiceSmall) and an LLM (Qwen).

How to use

Here is a minimal example:

import praasper

model = praasper.init_model()
model.annote("data_folder")

Here are the parameters you can pass. ASR and LLM go to init_model; the rest go to the annote method:

| Param | Default | Description |
| --- | --- | --- |
| ASR | iic/SenseVoiceSmall | Model name of the ASR core. See FunASR's model list for available models. |
| LLM | Qwen/Qwen2.5-1.5B-Instruct | Model name of the LLM core. See Qwen's model list for available models. |
| input_path | - | Path to the folder containing the audio files. |
| seg_dur | 10. | Length, in seconds, of the pieces that large audio files are segmented into. |
| min_pause | 0.2 | Minimum pause duration between two utterances, in seconds. |
| min_speech | 0.2 | Minimum duration of an utterance, in seconds. |
| language | None | "zh" for Mandarin, "yue" for Cantonese, "en" for English, "ja" for Japanese, "ko" for Korean, or None for automatic language detection. |

Here is a code example showing how to use these parameters:

import praasper

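model = praasper.init_model(
    ASR="iic/SenseVoiceSmall",          # ASR core (FunASR model name)
    LLM="Qwen/Qwen2.5-1.5B-Instruct"    # LLM core (Qwen model name)
)

model.annote(
    input_path="data_folder",  # folder containing the audio files
    min_pause=.8,              # minimum pause between two utterances, in seconds
    min_speech=.2,             # minimum utterance duration, in seconds
    language=None,             # None = automatic language detection
    seg_dur=15.                # segment large audio into 15-second pieces
)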
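To illustrate what seg_dur controls, here is a minimal sketch of splitting a long recording into pieces of at most seg_dur seconds, preferring to cut in the middle of a pause and falling back to a hard cut when no pause is available. This is an assumption for illustration only: split_at_pauses and its list-of-pauses input are hypothetical, and the actual splitting (done by Praditor) may follow different rules.

```python
def split_at_pauses(
    total_dur: float,
    pauses: list[tuple[float, float]],
    seg_dur: float = 10.0,
) -> list[tuple[float, float]]:
    """Split [0, total_dur] into pieces no longer than seg_dur seconds.

    pauses: hypothetical list of (start, end) silent intervals, in seconds.
    """
    if seg_dur <= 0:
        raise ValueError("seg_dur must be positive")
    pieces = []
    start = 0.0
    while total_dur - start > seg_dur:
        limit = start + seg_dur
        # Midpoints of pauses that fall strictly after the current start
        # and no later than the seg_dur limit.
        candidates = [(a + b) / 2 for a, b in pauses if start < (a + b) / 2 <= limit]
        # Cut at the latest suitable pause, or hard-cut at the limit.
        cut = max(candidates) if candidates else limit
        pieces.append((start, cut))
        start = cut
    pieces.append((start, total_dur))  # remainder shorter than seg_dur
    return pieces


print(split_at_pauses(40.0, [(9.0, 9.5), (21.0, 22.0)], seg_dur=15.0))
# [(0.0, 9.25), (9.25, 21.5), (21.5, 36.5), (36.5, 40.0)]
```

Cutting inside a pause avoids splitting a word across two pieces; the hard-cut fallback guarantees that no piece exceeds seg_dur even in continuous speech.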

Fine-tune Praditor

Praasper ships with a default set of parameters for Praditor, but the defaults may not always be optimal. In that case, we recommend using a custom set of Praditor parameters:

  1. Use the latest version of Praditor (v1.3.1), which supports VAD.
  2. Annotate the audio file and fine-tune the parameters until the results meet your standards.
  3. Click Save under the Current mode (top-right corner).

Praditor will then save a .txt parameter file in the same folder as the input audio file, and Praasper will use it to override the default parameters.

ASR/LLM model recommendation

For the ASR core, iic/SenseVoiceSmall is the only recommendation at the moment.

For the LLM core, the recommended models (from largest to smallest) are Qwen/Qwen3-4B-Instruct-2507 and Qwen/Qwen2.5-1.5B-Instruct (default). The default is small but good enough for laptop users. You are also welcome to try other Qwen models.

Mechanism

Praditor performs Voice Activity Detection (VAD) to (1) segment large audio files into smaller pieces and (2) extract utterances. It generates intervals with millisecond-level precision. It was originally a Speech Onset Detection (SOT) algorithm that we developed for language researchers.
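One common way to apply thresholds such as min_pause and min_speech to raw VAD intervals is to merge intervals separated by a gap shorter than min_pause, then drop intervals shorter than min_speech. The sketch below shows that technique; it is an assumption about how such thresholds behave, not Praditor's actual code, and postprocess_intervals is a hypothetical name.

```python
def postprocess_intervals(
    intervals: list[tuple[float, float]],
    min_pause: float = 0.2,
    min_speech: float = 0.2,
) -> list[tuple[float, float]]:
    """Merge and filter (start, end) speech intervals, in seconds."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < min_pause:
            # Pause shorter than min_pause: extend the previous utterance.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    # Keep only utterances lasting at least min_speech seconds.
    return [(s, e) for s, e in merged if e - s >= min_speech]


print(postprocess_intervals(
    [(0.0, 1.0), (1.125, 2.0), (3.0, 3.125), (4.0, 5.0)],
    min_pause=0.25,
    min_speech=0.25,
))
# [(0.0, 2.0), (4.0, 5.0)]
```

Merging before filtering means two short bursts separated by a tiny pause can still add up to one valid utterance instead of both being discarded.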

SenseVoiceSmall transcribes the audio but does not provide timestamps, so the timing information comes from the intervals produced by Praditor. It is a lightweight ASR model that runs even on a laptop, and it handles short audio clips better than Whisper.

In addition, when users want to designate a single language for the whole transcription, an additional LLM (Qwen/Qwen2.5-1.5B-Instruct by default) corrects potential errors in the transcription.
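The data flow described in this section can be summarized in a minimal sketch: each VAD interval is transcribed in turn, and the LLM correction runs only when a language is specified, as this section suggests. The three stages are passed in as hypothetical callables; run_pipeline and its arguments are illustrative placeholders, not Praasper's internal API.

```python
from typing import Callable, Optional

Interval = tuple[float, float]


def run_pipeline(
    audio_path: str,
    vad: Callable[[str], list[Interval]],          # stand-in for Praditor
    asr: Callable[[str, Interval], str],           # stand-in for SenseVoiceSmall
    llm_correct: Callable[[str, str], str],        # stand-in for the Qwen LLM
    language: Optional[str] = None,
) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for every utterance found by the VAD stage."""
    results = []
    for start, end in vad(audio_path):
        # Timestamps come from the VAD interval; the ASR only supplies text.
        text = asr(audio_path, (start, end))
        if language is not None:
            # Language designated: let the LLM correct the transcription.
            text = llm_correct(text, language)
        results.append((start, end, text))
    return results


# Usage with dummy stages, just to show the data flow.
print(run_pipeline(
    "example.wav",
    vad=lambda path: [(0.0, 1.0), (2.0, 3.0)],
    asr=lambda path, iv: f"utterance at {iv[0]}",
    llm_correct=lambda text, lang: f"[{lang}] {text}",
    language="en",
))
```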

Setup

pip installation

pip install -U praasper

If the installation succeeded and you don't need GPU acceleration, you can stop here.

GPU Acceleration (Windows/Linux)

Currently, Praasper utilizes SenseVoiceSmall from FunASR as the ASR core.

FunASR can automatically detect the best available device. However, to enable CUDA acceleration you first need to install a CUDA-enabled build of torch.

  • For macOS users, only CPU is supported as the processing device.
  • For Windows/Linux users, the priority order should be: CUDA -> CPU.
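The CUDA -> CPU priority above can be sketched as follows. This is not FunASR's own device-detection code; pick_device is a hypothetical helper that mirrors the rules listed here. It also doubles as a quick check of whether your torch installation can see the GPU.

```python
import sys


def pick_device() -> str:
    """Return "cuda" or "cpu" following the CUDA -> CPU priority."""
    if sys.platform == "darwin":
        # macOS: only CPU is supported as the processing device.
        return "cpu"
    try:
        import torch
    except ImportError:
        # torch is not installed, so CUDA cannot be used.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints "cpu" on a Windows/Linux machine with an NVIDIA GPU, the installed torch is most likely a CPU-only build.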

If you have never installed CUDA before, follow the steps below:

First, open a command line and check the latest CUDA version your system supports:

nvidia-smi

The output should include a line like the one below (here, the device supports CUDA up to version 12.9):

| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |

Next, go to the NVIDIA CUDA Toolkit page and download the latest version, or whichever version fits your system and needs.

Lastly, install the torch build that matches your CUDA version. You can find the correct pip command on the PyTorch website.

Here is an example for CUDA 12.9:

pip install --force-reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
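The cu129 suffix in the index URL follows the CUDA version reported by nvidia-smi (12.9 becomes cu129). As a minimal sketch, the helper below extracts that version from the nvidia-smi header line and builds the matching URL; torch_index_url is a hypothetical helper. Note that nvidia-smi reports the highest CUDA version the driver supports, and PyTorch does not publish wheels for every CUDA version, so you may need to pick an older tag that PyTorch actually offers.

```python
import re


def torch_index_url(nvidia_smi_output: str) -> str:
    """Build a PyTorch wheel index URL from nvidia-smi's "CUDA Version" field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' found in nvidia-smi output")
    major, minor = match.groups()
    # e.g. "12.9" -> "cu129"
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


header = "| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |"
print(torch_index_url(header))
# https://download.pytorch.org/whl/cu129
```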

(Advanced) uv installation

uv is also highly recommended for much faster installation. First, make sure uv is installed in your default environment:

pip install uv

Then, create a virtual environment (e.g., .venv):

uv venv .venv

You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)

Lastly, install praasper (by adding uv before pip):

uv pip install -U praasper

For CUDA support, here is an example of installing the torch build for CUDA 12.9:

uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129

Dev Plan

  • Support more LLM models.
  • Separate the LLM strategies for error correction and language correction.


Download files

Download the file for your platform.

Source Distribution

praasper-0.5.2.tar.gz (30.1 kB)

Uploaded Source

Built Distribution


praasper-0.5.2-py3-none-any.whl (26.7 kB)

Uploaded Python 3

File details

Details for the file praasper-0.5.2.tar.gz.

File metadata

  • Download URL: praasper-0.5.2.tar.gz
  • Upload date:
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for praasper-0.5.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 84e35eb3fa11b5bd4aaa3cad8749f0b087bb25f2dc1a79a83f8b214d3ec8175e |
| MD5 | 89c49c2c6c680451365402017d2dc0a5 |
| BLAKE2b-256 | 5edc655136748c31869e6264780a2831e1b3f6aa47e61d9057325329102127b2 |


File details

Details for the file praasper-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: praasper-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 26.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for praasper-0.5.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5f922b83b61aa4d80cdc5bca63b0158975dfa9da185406952018a2c99a580dd0 |
| MD5 | 53ffa138f86c1c035bb66f58adad5368 |
| BLAKE2b-256 | 4ab58cc968cb2e9c1e4b70d56ab669db9173467d835235a1431afca39a5a3563 |

