FSMN VAD model for fasr
fasr-vad-fsmn
A voice activity detection (VAD) model plugin based on FSMN (Feedforward Sequential Memory Networks). It runs inference with ONNX Runtime and gives fasr efficient offline VAD.
Installation
pip install fasr-vad-fsmn
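To confirm that the install succeeded without importing the plugin itself, the distribution metadata can be queried with the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("fasr-vad-fsmn")
    print(found if found else "fasr-vad-fsmn is not installed")
```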
Registered models
| Registered name | Class | Description |
|---|---|---|
| `fsmn` | `FSMNForVAD` | Offline FSMN VAD, ONNX Runtime inference, built-in default weights |
| `stream_fsmn` / `stream_fsmn.torch` | `FSMNForStreamVAD` | Streaming FSMN VAD, based on funasr AutoModel |
| `stream_fsmn.onnx` | `FSMNForStreamVADOnnx` | Streaming FSMN VAD, based on funasr-onnx |
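Choosing among the registered names comes down to two questions: offline or streaming, and for streaming, which backend. The sketch below encodes that choice. `vad_registry_name` is a hypothetical helper and not part of fasr or fasr-vad-fsmn; only the returned strings come from the table.

```python
def vad_registry_name(streaming: bool = False, backend: str = "torch") -> str:
    """Map a use case to one of the registered names listed in the table.

    Hypothetical helper for illustration; `backend` is ignored for offline use.
    """
    if not streaming:
        return "fsmn"  # offline model, ONNX Runtime inference
    if backend == "torch":
        return "stream_fsmn.torch"  # same class as the "stream_fsmn" alias
    if backend == "onnx":
        return "stream_fsmn.onnx"
    raise ValueError(f"unknown streaming backend: {backend!r}")
```

The returned string is meant to be used wherever a model name is expected, for example as the `model=` argument of `add_pipe`.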
Usage
Use in a pipeline
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)
Use the model standalone
Instantiating the model automatically runs download_checkpoint() and load_checkpoint(), so neither needs to be called manually:
from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("fsmn")()  # loads the built-in weights automatically
audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
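Once the segments are available, common follow-up steps are totalling the detected speech and merging segments separated by short pauses. The sketch below relies only on the `start_ms` and `end_ms` attributes used in the loop above. `Segment` is a stand-in class, and the helper names, the 300 ms gap, and the demo values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Stand-in for the objects returned by model.detect()."""
    start_ms: int
    end_ms: int


def total_speech_ms(segments: Iterable[Segment]) -> int:
    """Sum the duration of all segments, in milliseconds."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)


def merge_close_segments(segments: Iterable[Segment], max_gap_ms: int = 300) -> List[Segment]:
    """Merge neighbouring segments whose gap is at most max_gap_ms."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start_ms):
        if merged and seg.start_ms - merged[-1].end_ms <= max_gap_ms:
            # Extend the previous segment instead of starting a new one.
            last = merged[-1]
            merged[-1] = Segment(last.start_ms, max(last.end_ms, seg.end_ms))
        else:
            merged.append(Segment(seg.start_ms, seg.end_ms))
    return merged


if __name__ == "__main__":
    demo = [Segment(0, 1200), Segment(1400, 2500), Segment(4000, 5000)]  # made-up values
    print(total_speech_ms(demo))       # 3300
    print(merge_close_segments(demo))  # two segments: 0-2500 and 4000-5000
```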
To use a custom weights directory, call load_checkpoint again:
model.load_checkpoint("/path/to/custom/fsmn-vad")
Update the VAD configuration dynamically
model.update_config(
    max_end_silence_time=500,
    speech_noise_thres=0.5,
)
print(model.get_config())
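This README does not say whether update_config() rejects misspelled or out-of-range values, so a small pre-check can catch mistakes before they reach the model. The sketch below uses the parameter names documented in this README. `check_vad_overrides` is a hypothetical helper, and the range check only reflects the documented 0-1 range of speech_noise_thres.

```python
# Parameter names copied from the VAD configuration table in this README.
KNOWN_VAD_KEYS = frozenset({
    "detect_mode", "max_end_silence_time", "max_start_silence_time",
    "max_single_segment_time", "speech_noise_thres",
    "sil_to_speech_time_thres", "speech_to_sil_time_thres", "window_size_ms",
    "do_start_point_detection", "do_end_point_detection", "do_extend",
    "lookback_time_start_point", "lookahead_time_end_point", "snr_mode",
    "snr_thres", "decibel_thres", "speech_2_noise_ratio", "fe_prior_thres",
    "frame_in_ms", "frame_length_ms",
})


def check_vad_overrides(**overrides):
    """Hypothetical pre-check: return the overrides unchanged if they look sane."""
    unknown = sorted(set(overrides) - KNOWN_VAD_KEYS)
    if unknown:
        raise KeyError(f"unknown VAD config keys: {unknown}")
    thres = overrides.get("speech_noise_thres")
    if thres is not None and not 0.0 <= thres <= 1.0:
        raise ValueError("speech_noise_thres is documented as a value in 0-1")
    return overrides
```

A call would then look like `model.update_config(**check_vad_overrides(max_end_silence_time=500, speech_noise_thres=0.5))`.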
Runtime / session parameters
Pass these at construction time, or override them by assigning to the fields:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `checkpoint` | `str \| None` | `None` | Remote repo_id; when set, instantiation automatically downloads it to `cache_dir` |
| `cache_dir` | `str \| Path \| None` | `None` | Cache directory; `None` uses `fasr.utils.get_cache_dir()` |
| `endpoint` | `Literal["modelscope", "huggingface", "hf-mirror"]` | `"hf-mirror"` | Download endpoint |
| `device_id` | `str \| int \| None` | `None` | ONNX Runtime device ID (`None` / `-1` means CPU) |
| `num_threads` | `int` | `2` | Number of ONNX Runtime intra-op threads |
| `compile_preprocessor` | `bool` | `False` | Whether to compile the fbank preprocessing with `torch.compile` |
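The device_id convention (None or -1 for CPU, anything else for a device) can be made concrete with a small sketch. How the plugin actually turns device_id into an ONNX Runtime session is not documented here; the mapping to `CUDAExecutionProvider` below is an assumption, and both helper names are hypothetical. The provider strings themselves are standard ONNX Runtime execution provider names.

```python
from typing import List, Union


def uses_cpu(device_id: Union[str, int, None]) -> bool:
    """None and -1 select the CPU, following the device_id row of the table."""
    return device_id is None or str(device_id).strip() == "-1"


def onnx_providers(device_id: Union[str, int, None]) -> List[str]:
    """Assumed mapping from device_id to an ONNX Runtime provider list."""
    if uses_cpu(device_id):
        return ["CPUExecutionProvider"]
    # Assumption: a non-CPU ID means a CUDA device, with CPU as the fallback.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]
```

Similarly, num_threads presumably ends up in ONNX Runtime's `SessionOptions.intra_op_num_threads`, which controls parallelism inside a single operator.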
VAD configuration parameters
The following parameters can be set at instantiation or updated dynamically via update_config():
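| Parameter | Type | Default | Description |
|---|---|---|---|
| `detect_mode` | `int` | `1` | Detection mode: 0 = single utterance, 1 = multiple utterances |
| `max_end_silence_time` | `int` | `800` | Maximum silence after the end of a speech segment (ms) |
| `max_start_silence_time` | `int` | `3000` | Maximum silence before speech starts (ms) |
| `max_single_segment_time` | `int` | `60000` | Maximum duration of a single segment (ms) |
| `speech_noise_thres` | `float` | `0.6` | Speech/noise decision threshold (0-1) |
| `sil_to_speech_time_thres` | `int` | `150` | Silence-to-speech transition threshold (ms) |
| `speech_to_sil_time_thres` | `int` | `150` | Speech-to-silence transition threshold (ms) |
| `window_size_ms` | `int` | `200` | Detection window size (ms) |
| `do_start_point_detection` | `bool` | `True` | Whether to detect start points |
| `do_end_point_detection` | `bool` | `True` | Whether to detect end points |
| `do_extend` | `int` | `1` | Whether to extend speech segment boundaries |
| `lookback_time_start_point` | `int` | `200` | Look-back time at the start point (ms) |
| `lookahead_time_end_point` | `int` | `100` | Look-ahead time at the end point (ms) |
| `snr_mode` | `int` | `0` | SNR mode |
| `snr_thres` | `float` | `-100.0` | SNR threshold |
| `decibel_thres` | `float` | `-100.0` | Decibel threshold |
| `speech_2_noise_ratio` | `float` | `1.0` | Speech-to-noise ratio |
| `fe_prior_thres` | `float` | `1e-4` | Front-end prior threshold |
| `frame_in_ms` | `int` | `10` | Frame interval (ms) |
| `frame_length_ms` | `int` | `25` | Frame length (ms) |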
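Most of the millisecond settings are easier to reason about as frame counts. The sketch below converts between the two, reading frame_in_ms as the hop between consecutive frames. The 16 kHz sample rate and the no-padding framing are assumptions, not something this README states, and both function names are hypothetical.

```python
def ms_to_frames(duration_ms: int, frame_in_ms: int = 10) -> int:
    """Number of whole frame hops that fit in a duration."""
    return duration_ms // frame_in_ms


def num_frames(num_samples: int, sample_rate: int = 16000,
               frame_in_ms: int = 10, frame_length_ms: int = 25) -> int:
    """Frames produced by sliding a window over the signal without padding.

    Assumes 16 kHz audio by default; adjust sample_rate for other inputs.
    """
    shift = sample_rate * frame_in_ms // 1000       # 160 samples at 16 kHz
    length = sample_rate * frame_length_ms // 1000  # 400 samples at 16 kHz
    if num_samples < length:
        return 0
    return 1 + (num_samples - length) // shift


if __name__ == "__main__":
    print(ms_to_frames(800))   # 80 frames for the default max_end_silence_time
    print(ms_to_frames(200))   # 20 frames for the default window_size_ms
    print(num_frames(16000))   # 98 frames for one second of 16 kHz audio
```

With the defaults, the 150 ms transition thresholds correspond to 15 frames each, and the 60000 ms segment limit to 6000 frames.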
Dependencies
- fasr
- funasr, funasr-onnx
- torch, torchaudio
- sentencepiece
- Python 3.10–3.12