A tool for automatic speech recognition and annotation
Project description
Praasper
Praasper is a speech processing framework designed to help researchers transcribe audio files into word-level timestamps — from a single word to a complete sentence — with high accuracy in both transcription and timestamps.
In Praasper, the pipeline has three steps. First, VAD (Praditor) segments audio into chunks and extracts speech intervals with millisecond-level precision. Second, ASR (Fun-ASR-Nano) transcribes each chunk with word-level timestamps. Third, timestamps are aligned to VAD intervals and exported as a TextGrid file where each interval contains one word and its start/end time.
How to use
Here is one of the simplest examples:
import praasper
model = praasper.init_model()
model.annote("data_folder")
Here are the parameters you can pass to init_model and annote:
| Param | Default | Description |
|---|---|---|
infer_mode |
"local" |
ASR backend: "local" for on-device FunASR-Nano, "api" for DashScope cloud API. |
device |
"auto" |
Hardware for local inference: "auto", "cuda", or "cpu". Ignored in API mode. |
ASR |
FunAudioLLM/Fun-ASR-Nano-2512 | Advanced: override the default local ASR model. See FunASR model zoo. |
api_key |
None |
DashScope API key. Required when infer_mode="api". Can also be set via DASHSCOPE_API_KEY env var. |
input_path |
— | Path to the folder where audio files are stored. |
seg_dur |
10. | Segment large audio into pieces, in seconds. |
min_pause |
0.2 | Minimum pause duration between two utterances, in seconds. |
Here are code examples showing how to use these parameters:
import praasper
# Local inference (default)
model = praasper.init_model()
# Local inference on GPU
model = praasper.init_model(device="cuda")
# DashScope cloud API
model = praasper.init_model(infer_mode="api", api_key="sk-...")
# Custom local ASR model (advanced)
model = praasper.init_model(ASR="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
model.annote(
input_path="data_folder",
min_pause=.8,
seg_dur=15.
)
Fine-tune Praditor
Praasper is embedded with a default set of parameters for Praditor. But the default parameters may not always be optimal. In that case, you are recommended to use a custom set of parameters for Praditor.
- Use the latest version of Praditor (v1.3.1). It supports VAD.
- Annotate the audio file. Fine-tune the parameters until the results fit your standard.
- Click
Saveunder theCurrentmode (top-right corner).
Praditor will then save a .txt param file to the same folder as the input audio file, which Praasper will use to override the default params.
Here is a code example showing how to override the default parameters for a specific audio file. The VAD parameters are passed as a dict with onset and offset sections:
import praasper
model = praasper.init_model()
# Define custom VAD parameters
custom_params = {
"onset": {"amp": "1.05", "cutoff0": "60", "cutoff1": "10800", "numValid": "475", "eps_ratio": "0.05"},
"offset": {"amp": "1.05", "cutoff0": "60", "cutoff1": "10800", "numValid": "475", "eps_ratio": "0.05"},
}
model.annote(
input_path="data_folder",
params=custom_params,
)
Alternatively, save the parameters to a .txt file and pass the file path instead:
model.annote(
input_path="data_folder",
params="/path/to/custom_params.txt",
)
In both cases, Praasper will use your custom VAD parameters instead of running the auto grid search. To export the current parameters to a .txt file for later reuse:
model.export_params("/path/to/custom_params.txt")
ASR: local vs. cloud
Praasper supports two ASR backends, chosen via the infer_mode parameter:
infer_mode |
Backend | Best for |
|---|---|---|
"local" (default) |
FunASR-Nano — lightweight, runs on laptop CPU/GPU | Offline work, no API costs, privacy |
"api" |
DashScope (AliCloud) — cloud ASR with stronger accuracy | High-accuracy needs, server deployments |
Local mode (infer_mode="local")
model = praasper.init_model() # auto-detect GPU/CPU
model = praasper.init_model(device="cuda") # force GPU
model = praasper.init_model(device="cpu") # force CPU
The default model is FunAudioLLM/Fun-ASR-Nano-2512, which supports Mandarin, Cantonese, English, Japanese, and Korean with word-level timestamps. Power users can swap in a different FunASR model via the ASR parameter.
API mode (infer_mode="api")
model = praasper.init_model(infer_mode="api", api_key="sk-...")
# or set DASHSCOPE_API_KEY environment variable:
# model = praasper.init_model(infer_mode="api")
Note: DashScope requires audio files to be hosted at a public URL (HTTP/HTTPS/OSS). Local file paths are not supported in API mode.
Mechanism
Praditor is applied to perform Voice Activity Detection (VAD) — it segments large audio files into smaller pieces and extracts speech intervals with millisecond-level precision. It was originally developed as a Speech Onset Detection (SOT) algorithm for language researchers.
The default ASR engine is Fun-ASR-Nano (local mode), a lightweight model that runs on laptop CPU/GPU and supports multiple languages including Mandarin, Cantonese, English, Japanese, and Korean — all with word-level timestamps. For higher accuracy, Praasper can also use DashScope cloud ASR via infer_mode="api".
The VAD intervals and ASR timestamps are then aligned via overlap matching: each word from the ASR output is assigned to the VAD interval it overlaps with most. The result is exported as a TextGrid file where each interval contains one word with its start time, end time, and text.
Setup
pip installation
pip install -U praasper
If you have a successful installation and don't care about GPU acceleration, you can stop right here.
GPU Acceleration (Windows/Linux)
Currently, Praasper utilizes Fun-ASR-Nano from FunASR as the default local ASR engine. Cloud ASR is also available via DashScope (infer_mode="api").
FunASRautomatically detects the best currently available device to use. But you still need to first install the GPU-support version oftorchin order to enable CUDA acceleration.
- For macOS users, only
CPUis supported as the processing device. - For Windows/Linux users, the priority order should be:
CUDA->CPU.
If you have no experience in installing CUDA, follow the steps below:
First, go to command line and check the latest CUDA version your system supports:
nvidia-smi
Results should pop up like this (It means that this device supports CUDA up to version 12.9).
| NVIDIA-SMI 576.80 Driver Version: 576.80 CUDA Version: 12.9 |
Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version that fits your system/need.
Lastly, install torch that fits your CUDA version. Find the correct pip command in this link.
Here is an example for CUDA 12.9:
pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
(Advanced) uv installation
uv is also highly recommended for a much faster installation. First, make sure uv is installed in your default environment:
pip install uv
Then, create a virtual environment (e.g., .venv):
uv venv .venv
You should see a new .venv folder appear in your project directory. (You may also want to restart the terminal.)
Lastly, install praasper (by prefixing pip with uv):
uv pip install -U praasper
For CUDA support, here is an example for downloading torch that fits CUDA 12.9:
uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file praasper-0.6.0.tar.gz.
File metadata
- Download URL: praasper-0.6.0.tar.gz
- Upload date:
- Size: 26.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e9541945aa17d4556216e21b5bfe9e2bcc23029bffced06b650bcc98e5596ee5
|
|
| MD5 |
0955de9db05e044ca7a799de54485128
|
|
| BLAKE2b-256 |
d40734bdd25cfe8031c13c3aa75f69f721c0a9266e283e1532621a6cc958bb1b
|
File details
Details for the file praasper-0.6.0-py3-none-any.whl.
File metadata
- Download URL: praasper-0.6.0-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
222a04a274543e3b61a2c2ce50d8c49ff506019bf97950056d812c30e504dbc4
|
|
| MD5 |
91178d9244651d173565890ada68b560
|
|
| BLAKE2b-256 |
e107ffb5fef2e62ff54824e48c0626ccf303191285898e1da34773cac10c1493
|