VAD-Enhanced ASR Framework for Researchers
Praasper
Praasper is an Automatic Speech Recognition (ASR) framework designed to help researchers transcribe audio files into utterances, from a single word to a complete sentence, with a decent level of accuracy in both transcription and timestamps.
In Praasper, we adopt a rather simple and straightforward pipeline to extract utterance-level information from audio files. The pipeline includes VAD (Praditor), ASR (SenseVoiceSmall of FunASR) and LLM (Qwen2.5-1.5B-Instruct).
How to use
Here is one of the simplest examples:
import praasper
model = praasper.init_model()
model.annote("data")
Here are the other parameters you can set. In the example that follows the table, ASR and LLM are passed to init_model and the rest to the annote method:
| Param | Default | Description |
|---|---|---|
| ASR | iic/SenseVoiceSmall | Model name as the ASR core. Check out FunASR's model list for available models. |
| LLM | Qwen/Qwen2.5-1.5B-Instruct | Model name as the LLM core. Check out Qwen's model list for available models. |
| input_path | - | Path to the folder where audio files are stored. |
| seg_dur | 10. | Segment large audio into pieces, in seconds. |
| min_pause | 0.2 | Minimum pause duration between two utterances, in seconds. |
| min_speech | 0.2 | Minimum duration for an utterance, in seconds. |
| language | None | "zh" for Mandarin, "yue" for Cantonese, "en" for English, "ja" for Japanese, "ko" for Korean, and None for automatic language detection. |
Here is a code example showing how you can use these parameters:
import praasper

model = praasper.init_model(
    ASR="iic/SenseVoiceSmall",
    LLM="Qwen/Qwen2.5-1.5B-Instruct"
)
model.annote(
    input_path="data",
    min_pause=.8,
    min_speech=.2,
    language=None,
    seg_dur=15.
)
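To build intuition for `min_pause` and `min_speech`, here is a minimal sketch of how such thresholds are commonly applied to a list of detected speech intervals. It is an illustration only: the helper `merge_and_filter` and the sample intervals are hypothetical, and Praasper's internal logic may differ.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def merge_and_filter(
    intervals: List[Interval],
    min_pause: float = 0.2,
    min_speech: float = 0.2,
) -> List[Interval]:
    """Merge intervals separated by short pauses, then drop short ones.

    Hypothetical helper for illustration; not part of the praasper API.
    """
    merged: List[Interval] = []
    for onset, offset in sorted(intervals):
        if merged and onset - merged[-1][1] < min_pause:
            # The gap is shorter than min_pause: extend the previous utterance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    # Keep only utterances that last at least min_speech seconds.
    return [(on, off) for on, off in merged if off - on >= min_speech]


raw = [(0.0, 0.5), (0.6, 1.2), (2.5, 2.6), (4.0, 5.0)]
print(merge_and_filter(raw, min_pause=0.8, min_speech=0.2))
# [(0.0, 1.2), (4.0, 5.0)]
```

With `min_pause=0.8`, the first two intervals are joined because only 0.1 s separates them, and the 0.1 s blip at 2.5 s is discarded because it is shorter than `min_speech`.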
Fine-tune Praditor
Praasper ships with a default set of parameters for Praditor, but the defaults may not always be optimal. In that case, we recommend using a custom set of parameters for Praditor.
- Use the latest version of Praditor (v1.3.1). It supports VAD.
- Annotate the audio file. Fine-tune the parameters until the results fit your standard.
- Click Save under the Current mode (top-right corner).
Praditor will then save a .txt param file to the same folder as the input audio file. Praasper will use this file to override the default params.
ASR/LLM model recommendation
For the ASR core, iic/SenseVoiceSmall is the only recommendation at this moment.
For the LLM core, the recommended models are (from largest to smallest): Qwen/Qwen3-4B-Instruct-2507, Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-1.5B-Instruct (default). The default is small but good enough for laptop users. You are also welcome to try other Qwen models.
Mechanism
Praditor performs Voice Activity Detection (VAD) to segment large audio files into smaller pieces. It can generate intervals with millisecond-level precision. It is a Speech Onset Detection (SOT) algorithm we developed for language researchers.
SenseVoiceSmall, which does not provide timestamps on its own, is used to transcribe the audio. It is a lightweight ASR model that can run even on a laptop. Compared to Whisper, it has better support for short audio files.
In addition, in case users want to designate one language throughout the transcription, an additional LLM (Qwen/Qwen2.5-1.5B-Instruct) is added to the framework to correct potential errors in the transcription.
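The data flow described above can be sketched roughly as follows. This is a simplified illustration with stand-in functions, not Praasper's actual code: `run_pipeline`, `Utterance`, `fake_asr`, and `fake_llm` are hypothetical names, and the assumption that the LLM step only runs when a language is designated is inferred from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Utterance:
    onset: float   # seconds, taken from the VAD interval
    offset: float  # seconds, taken from the VAD interval
    text: str      # transcription of that interval


def run_pipeline(
    intervals: List[Tuple[float, float]],
    asr: Callable[[float, float], str],
    llm_correct: Optional[Callable[[str, str], str]] = None,
    language: Optional[str] = None,
) -> List[Utterance]:
    """Combine VAD timestamps with ASR text, optionally corrected by an LLM."""
    results: List[Utterance] = []
    for onset, offset in intervals:
        # The ASR model gives text only; timestamps come from the VAD step.
        text = asr(onset, offset)
        # Assumption: correction is applied only when a language is designated.
        if language is not None and llm_correct is not None:
            text = llm_correct(text, language)
        results.append(Utterance(onset, offset, text))
    return results


# Stand-ins for the real models, for demonstration only.
def fake_asr(onset: float, offset: float) -> str:
    return f"speech from {onset:.1f}s to {offset:.1f}s"


def fake_llm(text: str, language: str) -> str:
    return f"[{language}] {text}"


utts = run_pipeline([(0.0, 1.2), (4.0, 5.0)], fake_asr, fake_llm, language="en")
for u in utts:
    print(u.onset, u.offset, u.text)
```

The point of the sketch is the division of labor: the VAD step owns the timestamps, the ASR step owns the text, and the LLM step only post-processes text.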
Setup
pip installation
pip install -U praasper
If the installation succeeds and you do not need GPU acceleration, you can stop right here.
GPU Acceleration (Windows/Linux)
Currently, Praasper utilizes SenseVoiceSmall from FunASR as the ASR core.
FunASR can automatically detect the best available device to use. However, you still need to install a GPU-enabled version of torch first in order to enable CUDA acceleration.
- For macOS users, only CPU is supported as the processing device.
- For Windows/Linux users, the priority order should be: CUDA -> CPU.
If you have no experience in installing CUDA, follow the steps below:
First, open a command line and check the latest CUDA version your system supports:
nvidia-smi
The output should look like this (here, the device supports CUDA up to version 12.9):
| NVIDIA-SMI 576.80 Driver Version: 576.80 CUDA Version: 12.9 |
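If you prefer to read the version programmatically, the sketch below extracts it from the `nvidia-smi` header and builds a wheel tag in the `cuXYZ` style used by the index URL later in this section. The helper name `cuda_wheel_tag` is hypothetical, and PyTorch does not publish wheels for every CUDA version, so treat the result as a starting point and confirm it against the official install selector.

```python
import re
from typing import Optional


def cuda_wheel_tag(smi_output: str) -> Optional[str]:
    """Return a tag such as 'cu129' from nvidia-smi output, or None if absent.

    Hypothetical helper for illustration; not part of praasper or torch.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


header = "| NVIDIA-SMI 576.80 Driver Version: 576.80 CUDA Version: 12.9 |"
tag = cuda_wheel_tag(header)
print(tag)  # cu129
print(f"https://download.pytorch.org/whl/{tag}")
```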
Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version fits your system and needs.
Lastly, install torch that fits your CUDA version. Find the correct pip command in this link.
Here is an example for CUDA 12.9:
pip install --force-reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
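After installing, you may want to confirm that torch can actually see the GPU. The following is a small sketch using `torch.cuda.is_available()`; the helper `pick_device` is a hypothetical name, and it falls back to `"cpu"` when torch is not installed.

```python
def pick_device() -> str:
    """Return 'cuda' if a CUDA-enabled torch is usable, otherwise 'cpu'."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed torch build is most likely the CPU-only one.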
(Advanced) uv installation
uv is also highly recommended for much faster installation. First, make sure uv is installed in your default environment:
pip install uv
Then, create a virtual environment (e.g., .venv):
uv venv .venv
You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)
Lastly, install praasper (by adding uv before pip):
uv pip install -U praasper
For CUDA support, here is an example for downloading torch that fits CUDA 12.9:
uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
Dev Plan
- Add support for more LLM models.
- Separate LLM strategies for error correction and language correction.