fsmn vad model for fasr
fasr-vad-fsmn
A voice activity detection (VAD) model plugin based on FSMN (Feedforward Sequential Memory Networks). It runs inference with ONNX Runtime and gives fasr efficient offline VAD.
Installation
pip install fasr-vad-fsmn
Registered models
| Registered name | Class | Description |
|---|---|---|
| fsmn | FSMNForVAD | Offline FSMN VAD, ONNX Runtime inference, built-in default weights |
Usage
Using it in a pipeline
from fasr import AudioPipeline
pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)
Using the model on its own
from fasr.config import registry
from fasr.data import Waveform
model = registry.vad_models.get("fsmn")()
model.from_checkpoint()  # use the built-in weights
waveform = Waveform.from_file("example.wav")
segments = model.detect(waveform)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
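Downstream code often wants the detected segments as plain time spans. The sketch below is not part of fasr: it assumes only that each segment exposes `start_ms` and `end_ms` as in the loop above, and the helper names `merge_close_spans` and `total_speech_ms` as well as the sample values are hypothetical.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms)


def merge_close_spans(spans: Iterable[Span], max_gap_ms: int = 300) -> List[Span]:
    """Merge spans whose gap to the previous span is at most max_gap_ms."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap_ms:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def total_speech_ms(spans: Iterable[Span]) -> int:
    """Sum of span durations in milliseconds."""
    return sum(end - start for start, end in spans)


# Hypothetical values standing in for
# [(seg.start_ms, seg.end_ms) for seg in segments]
example = [(0, 1200), (1350, 2800), (5000, 6100)]
merged = merge_close_spans(example, max_gap_ms=300)
print(merged)
print(total_speech_ms(merged))
```

Merging is a post-processing choice made here for illustration; the model's own segment boundaries are controlled by the VAD configuration parameters.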
Updating the VAD configuration dynamically
model.update_config(
    max_end_silence_time=500,
    speech_noise_thres=0.5,
)
print(model.get_config())
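Before passing user-supplied values to `update_config()`, it can help to sanity-check them. The following is a minimal sketch under assumptions: `check_vad_overrides` is a hypothetical helper, not a fasr API, and the ranges it enforces are inferred from the parameter descriptions in this README rather than from the library's own validation.

```python
from typing import Any, Dict

# Millisecond-valued parameters documented in this README (assumed non-negative).
MS_PARAMS = {
    "max_end_silence_time",
    "max_start_silence_time",
    "max_single_segment_time",
    "sil_to_speech_time_thres",
    "speech_to_sil_time_thres",
    "window_size_ms",
}


def check_vad_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError on obviously invalid values; return a copy otherwise."""
    for name, value in overrides.items():
        if name == "speech_noise_thres":
            if not 0.0 <= float(value) <= 1.0:
                raise ValueError(f"{name} must be within [0, 1], got {value!r}")
        elif name == "detect_mode":
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        elif name in MS_PARAMS:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative int, got {value!r}")
    # Names not covered by the checks above are passed through unchanged.
    return dict(overrides)


checked = check_vad_overrides({"max_end_silence_time": 500, "speech_noise_thres": 0.5})
print(checked)
# With a loaded model: model.update_config(**checked)
```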
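from_checkpoint parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| checkpoint_dir | str \| Path \| None | Built-in weights directory | Model weights directory; must contain config.yaml and the ONNX model file |
| device_id | str \| int \| None | None | ONNX Runtime device ID |
| num_threads | int | 2 | Number of inference threads |
| compile | bool | False | Whether to compile preprocessing |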
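When pointing `checkpoint_dir` at a custom directory, a quick layout check can catch mistakes before loading. This is a sketch under assumptions: `missing_checkpoint_files` is a hypothetical helper, and it assumes the ONNX model file sits directly in the directory with a `.onnx` extension, which this README does not state.

```python
from pathlib import Path
from typing import List, Union


def missing_checkpoint_files(checkpoint_dir: Union[str, Path]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems: List[str] = []
    if not (root / "config.yaml").is_file():
        problems.append("config.yaml not found")
    # Assumption: the model file is a top-level *.onnx file.
    if not any(root.glob("*.onnx")):
        problems.append("no *.onnx file found")
    return problems
```

If the returned list is empty, the directory can then be passed to `from_checkpoint` through `checkpoint_dir`, for example together with `num_threads=4`.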
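VAD configuration parameters
The following parameters can be set at instantiation or updated dynamically with update_config():
| Parameter | Type | Default | Description |
|---|---|---|---|
| detect_mode | int | 1 | Detection mode: 0 = single utterance, 1 = multiple utterances |
| max_end_silence_time | int | 800 | Maximum silence after the end of a speech segment (ms) |
| max_start_silence_time | int | 3000 | Maximum silence before speech starts (ms) |
| max_single_segment_time | int | 60000 | Maximum duration of a single segment (ms) |
| speech_noise_thres | float | 0.6 | Speech/noise decision threshold (0-1) |
| sil_to_speech_time_thres | int | 150 | Silence-to-speech transition threshold (ms) |
| speech_to_sil_time_thres | int | 150 | Speech-to-silence transition threshold (ms) |
| window_size_ms | int | 200 | Detection window size (ms) |
| do_start_point_detection | bool | True | Whether to detect start points |
| do_end_point_detection | bool | True | Whether to detect end points |
| do_extend | int | 1 | Whether to extend speech segment boundaries |
| lookback_time_start_point | int | 200 | Look-back time at the start point (ms) |
| lookahead_time_end_point | int | 100 | Look-ahead time at the end point (ms) |
| snr_mode | int | 0 | SNR mode |
| snr_thres | float | -100.0 | SNR threshold |
| decibel_thres | float | -100.0 | Decibel threshold |
| speech_2_noise_ratio | float | 1.0 | Speech-to-noise ratio |
| fe_prior_thres | float | 1e-4 | Front-end prior threshold |
| frame_in_ms | int | 10 | Frame interval (ms) |
| frame_length_ms | int | 25 | Frame length (ms) |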
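Because the time-based thresholds are given in milliseconds while the frame interval is 10 ms, it can be useful to see them as frame counts. The sketch below shows only the standard framing arithmetic with the documented defaults; whether the model performs exactly this conversion internally is not stated here, and the function names are hypothetical.

```python
def ms_to_frames(duration_ms: int, frame_in_ms: int = 10) -> int:
    """Number of whole frame intervals that fit in duration_ms."""
    if frame_in_ms <= 0:
        raise ValueError("frame_in_ms must be positive")
    return duration_ms // frame_in_ms


def num_frames(audio_ms: int, frame_length_ms: int = 25, frame_in_ms: int = 10) -> int:
    """Count of full analysis frames in audio_ms of audio, without padding."""
    if audio_ms < frame_length_ms:
        return 0
    return 1 + (audio_ms - frame_length_ms) // frame_in_ms


# Documented defaults, in milliseconds.
defaults_ms = {
    "max_end_silence_time": 800,
    "max_start_silence_time": 3000,
    "sil_to_speech_time_thres": 150,
    "window_size_ms": 200,
}
for name, ms in defaults_ms.items():
    print(f"{name}: {ms} ms = {ms_to_frames(ms)} frames")

print(num_frames(1000))  # full 25 ms frames in one second at a 10 ms interval
```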
Dependencies
- fasr
- funasr, funasr-onnx
- torch, torchaudio
- sentencepiece
- Python 3.10–3.12