fsmn vad model for fasr

Project description

fasr-vad-fsmn

A voice activity detection (VAD) model plugin based on FSMN (Feedforward Sequential Memory Networks). It runs inference with ONNX Runtime and gives fasr efficient offline VAD.
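
In an FSMN, a hidden layer can be followed by a memory block that adds a learned, element-wise weighted sum of neighbouring frames (a per-dimension FIR filter over time), so temporal context is captured without recurrence. The pure-Python sketch below shows one such block in the compact-FSMN style with a skip connection. The function name is hypothetical, and the filter orders and weights of the model shipped in this package are not documented here.

```python
from typing import List, Sequence

Vector = List[float]


def fsmn_memory(
    frames: Sequence[Sequence[float]],
    lookback: Sequence[Sequence[float]],
    lookahead: Sequence[Sequence[float]],
) -> List[Vector]:
    """One vectorized FSMN memory block with a skip connection (illustrative).

    out[t] = frames[t]
             + sum_i lookback[i]  * frames[t - i]      (i = 0 .. len(lookback) - 1)
             + sum_j lookahead[j] * frames[t + 1 + j]  (j = 0 .. len(lookahead) - 1)
    Products are element-wise; frames outside the sequence count as zeros.
    """
    num_frames = len(frames)
    out: List[Vector] = []
    for t in range(num_frames):
        acc = list(frames[t])  # skip (residual) connection
        for i, coeffs in enumerate(lookback):
            if t - i >= 0:
                acc = [a + c * x for a, c, x in zip(acc, coeffs, frames[t - i])]
        for j, coeffs in enumerate(lookahead):
            if t + 1 + j < num_frames:
                acc = [a + c * x for a, c, x in zip(acc, coeffs, frames[t + 1 + j])]
        out.append(acc)
    return out


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fsmn_memory(frames, lookback=[[0.5, 0.5], [0.25, 0.25]], lookahead=[[0.125, 0.125]]))
# [[1.5, 0.125], [0.375, 1.625], [1.5, 1.75]]
```

Dropping the lookahead taps gives a causal (unidirectional) block, which is the kind of constraint a streaming detector has to respect.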

Installation

pip install fasr-vad-fsmn

Registered models

  • fsmn (FSMNForVAD): offline FSMN VAD with ONNX Runtime inference and built-in default weights
  • stream_fsmn / stream_fsmn.torch (FSMNForStreamVAD): streaming FSMN VAD built on funasr AutoModel
  • stream_fsmn.onnx (FSMNForStreamVADOnnx): streaming FSMN VAD built on funasr-onnx
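
Two names mapping to FSMNForStreamVAD simply means several registry keys resolve to the same class. The toy registry below illustrates that lookup; Registry and the stub classes are hypothetical stand-ins, and fasr's registry.vad_models is not necessarily implemented this way.

```python
from typing import Callable, Dict


class Registry:
    """Toy name -> class registry; NOT fasr's actual registry implementation."""

    def __init__(self) -> None:
        self._entries: Dict[str, type] = {}

    def register(self, *names: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            for name in names:  # every alias points at the same class object
                if name in self._entries:
                    raise KeyError(f"name already registered: {name}")
                self._entries[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> type:
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"unknown model {name!r}; known: {sorted(self._entries)}") from None


vad_models = Registry()


@vad_models.register("fsmn")
class FSMNForVAD: ...


@vad_models.register("stream_fsmn", "stream_fsmn.torch")
class FSMNForStreamVAD: ...


@vad_models.register("stream_fsmn.onnx")
class FSMNForStreamVADOnnx: ...


print(vad_models.get("stream_fsmn") is vad_models.get("stream_fsmn.torch"))  # True
```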

Usage

Using it in a pipeline

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)
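
The chained calls work because each add_pipe returns the pipeline itself. The toy stand-in below (not fasr's AudioPipeline) shows that builder pattern and, assuming stages run in the order they are added, how the detector's segments flow into the recognizer and then the sentencizer. The stage functions are hypothetical and return hard-coded values.

```python
from typing import Any, Callable, List, Tuple


class ToyPipeline:
    """Toy stand-in for a chainable pipeline; NOT fasr's AudioPipeline."""

    def __init__(self) -> None:
        self._stages: List[Tuple[str, Callable[[Any], Any]]] = []

    def add_pipe(self, name: str, fn: Callable[[Any], Any]) -> "ToyPipeline":
        self._stages.append((name, fn))
        return self  # returning self is what allows .add_pipe(...).add_pipe(...)

    @property
    def names(self) -> List[str]:
        return [name for name, _ in self._stages]

    def __call__(self, data: Any) -> Any:
        # Each stage consumes the previous stage's output, in insertion order.
        for _, fn in self._stages:
            data = fn(data)
        return data


# Hypothetical stages with hard-coded outputs, just to show the data flow.
pipeline = (
    ToyPipeline()
    .add_pipe("detector", lambda audio: [(0, 1200), (1800, 2500)])
    .add_pipe("recognizer", lambda segs: [f"text[{s}-{e}]" for s, e in segs])
    .add_pipe("sentencizer", lambda texts: " ".join(texts) + ".")
)
print(pipeline("audio"))  # text[0-1200] text[1800-2500].
```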

Using the model standalone

Instantiating the model runs download_checkpoint() and then load_checkpoint() automatically, so there is no need to call them manually:

from fasr.config import registry
from fasr.data import Waveform

model = registry.vad_models.get("fsmn")()  # loads the built-in weights automatically

waveform = Waveform.from_file("example.wav")
segments = model.detect(waveform)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
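
Detected segments are reported in milliseconds, so cutting the audio requires converting them to sample indices. The sketch below assumes a 16 kHz sample rate and uses a hypothetical Segment dataclass that only mirrors the start_ms / end_ms fields printed above; it does not touch fasr's Waveform API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in exposing the start_ms / end_ms fields used above."""
    start_ms: int
    end_ms: int


def segment_to_samples(seg: Segment, sample_rate: int = 16000) -> Tuple[int, int]:
    """Convert a millisecond range to [start, end) sample indices."""
    return seg.start_ms * sample_rate // 1000, seg.end_ms * sample_rate // 1000


def cut_segments(
    samples: Sequence[float], segments: Sequence[Segment], sample_rate: int = 16000
) -> List[Sequence[float]]:
    """Slice the raw samples for each detected segment, clamped to the signal."""
    chunks = []
    for seg in segments:
        start, end = segment_to_samples(seg, sample_rate)
        chunks.append(samples[max(start, 0):min(end, len(samples))])
    return chunks


def total_speech_ms(segments: Sequence[Segment]) -> int:
    return sum(seg.end_ms - seg.start_ms for seg in segments)


samples = [0.0] * 16000  # 1 s of silence at the assumed 16 kHz rate
segments = [Segment(100, 400), Segment(600, 900)]
print([segment_to_samples(s) for s in segments])  # [(1600, 6400), (9600, 14400)]
print([len(c) for c in cut_segments(samples, segments)])  # [4800, 4800]
print(total_speech_ms(segments))  # 600
```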

To use a custom weights directory, call load_checkpoint again with its path:

model.load_checkpoint("/path/to/custom/fsmn-vad")

Updating the VAD configuration at runtime

model.update_config(
    max_end_silence_time=500,
    speech_noise_thres=0.5,
)
print(model.get_config())
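
One way to validate keyword overrides like these is sketched below, using a frozen dataclass that holds a subset of the documented defaults. VADConfig and this standalone update_config are hypothetical; unlike model.update_config, which in the example above changes the model's own configuration, the sketch returns a new config object and leaves the original untouched.

```python
from dataclasses import asdict, dataclass, fields, replace


@dataclass(frozen=True)
class VADConfig:
    # A subset of the documented defaults; field names copied from the parameter list.
    max_end_silence_time: int = 800
    speech_noise_thres: float = 0.6
    sil_to_speech_time_thres: int = 150
    speech_to_sil_time_thres: int = 150


def update_config(config: VADConfig, **overrides: object) -> VADConfig:
    """Return a copy of config with overrides applied, rejecting unknown keys."""
    known = {f.name for f in fields(config)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown VAD config keys: {sorted(unknown)}")
    thres = overrides.get("speech_noise_thres", config.speech_noise_thres)
    if not 0.0 <= thres <= 1.0:  # the threshold is documented as lying in 0 to 1
        raise ValueError("speech_noise_thres must be within [0, 1]")
    return replace(config, **overrides)


cfg = update_config(VADConfig(), max_end_silence_time=500, speech_noise_thres=0.5)
print(asdict(cfg))
# {'max_end_silence_time': 500, 'speech_noise_thres': 0.5, 'sil_to_speech_time_thres': 150, 'speech_to_sil_time_thres': 150}
```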

Runtime / session parameters

Pass these to the constructor, or override them later by assigning to the corresponding fields:

  • checkpoint (str | None, default None): remote repo_id; when set, instantiation downloads it to cache_dir automatically
  • cache_dir (str | Path | None, default None): cache directory; None uses fasr.utils.get_cache_dir()
  • endpoint (Literal["modelscope", "huggingface", "hf-mirror"], default "hf-mirror"): download endpoint
  • device_id (str | int | None, default None): ONNX Runtime device ID (None or -1 means CPU)
  • num_threads (int, default 2): number of ONNX Runtime intra-op threads
  • compile_preprocessor (bool, default False): whether to compile the fbank preprocessing with torch.compile
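
The device_id convention (None or -1 means CPU) maps naturally onto an ONNX Runtime providers list. The helper below is a hypothetical sketch that assumes any other ID refers to a CUDA GPU; the package may choose providers differently. The trailing comments show how such a list and num_threads would typically be passed to onnxruntime, with "model.onnx" as a placeholder path.

```python
from typing import Dict, List, Optional, Tuple, Union

# An ONNX Runtime provider entry is either a name or a (name, options) pair.
ProviderSpec = Union[str, Tuple[str, Dict[str, int]]]


def build_providers(device_id: Optional[Union[str, int]]) -> List[ProviderSpec]:
    """Map a device_id to an ONNX Runtime providers list.

    None or -1 selects CPU only; any other ID is assumed to be a CUDA GPU,
    with CPU kept as the fallback provider.
    """
    if device_id is None or int(device_id) == -1:
        return ["CPUExecutionProvider"]
    return [("CUDAExecutionProvider", {"device_id": int(device_id)}), "CPUExecutionProvider"]


print(build_providers(None))  # ['CPUExecutionProvider']
print(build_providers("0"))   # [('CUDAExecutionProvider', {'device_id': 0}), 'CPUExecutionProvider']

# With onnxruntime installed, the list and num_threads would be used roughly like this:
#   import onnxruntime as ort
#   opts = ort.SessionOptions()
#   opts.intra_op_num_threads = 2  # cf. num_threads
#   session = ort.InferenceSession("model.onnx", sess_options=opts, providers=build_providers(0))
```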

VAD configuration parameters

These can be set at instantiation time or updated later with update_config():

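  • detect_mode (int, default 1): detection mode; 0 = single utterance, 1 = multiple utterances
  • max_end_silence_time (int, default 800): maximum silence allowed after a speech segment ends (ms)
  • max_start_silence_time (int, default 3000): maximum silence allowed before speech starts (ms)
  • max_single_segment_time (int, default 60000): maximum duration of a single segment (ms)
  • speech_noise_thres (float, default 0.6): speech/noise decision threshold (0 to 1)
  • sil_to_speech_time_thres (int, default 150): silence-to-speech transition threshold (ms)
  • speech_to_sil_time_thres (int, default 150): speech-to-silence transition threshold (ms)
  • window_size_ms (int, default 200): detection window size (ms)
  • do_start_point_detection (bool, default True): whether to detect start points
  • do_end_point_detection (bool, default True): whether to detect end points
  • do_extend (int, default 1): whether to extend speech segment boundaries
  • lookback_time_start_point (int, default 200): look-back time at the start point (ms)
  • lookahead_time_end_point (int, default 100): look-ahead time at the end point (ms)
  • snr_mode (int, default 0): SNR mode
  • snr_thres (float, default -100.0): SNR threshold
  • decibel_thres (float, default -100.0): decibel threshold
  • speech_2_noise_ratio (float, default 1.0): speech-to-noise ratio
  • fe_prior_thres (float, default 1e-4): front-end prior threshold
  • frame_in_ms (int, default 10): frame shift (ms)
  • frame_length_ms (int, default 25): frame length (ms)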
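
Several of these parameters (speech_noise_thres, sil_to_speech_time_thres, max_end_silence_time, max_single_segment_time, frame_in_ms) fit together in a hangover-style endpointing loop. The following is a simplified sketch of that idea over per-frame speech probabilities, for illustration only: detect_segments is a hypothetical name, and it is not the post-processing used by fasr or FunASR, which also takes the remaining parameters into account.

```python
from typing import List, Sequence, Tuple


def detect_segments(
    speech_prob: Sequence[float],
    frame_in_ms: int = 10,
    speech_noise_thres: float = 0.6,
    sil_to_speech_time_thres: int = 150,
    max_end_silence_time: int = 800,
    max_single_segment_time: int = 60000,
) -> List[Tuple[int, int]]:
    """Simplified hangover endpointing; returns (start_ms, end_ms) pairs."""
    start_frames = max(1, sil_to_speech_time_thres // frame_in_ms)
    end_frames = max(1, max_end_silence_time // frame_in_ms)
    max_frames = max(1, max_single_segment_time // frame_in_ms)

    segments: List[Tuple[int, int]] = []
    in_speech = False
    seg_start = 0    # first frame of the open segment
    speech_run = 0   # consecutive speech frames while in silence
    silence_run = 0  # consecutive silence frames while in speech

    for t, prob in enumerate(speech_prob):
        is_speech = prob >= speech_noise_thres
        if not in_speech:
            speech_run = speech_run + 1 if is_speech else 0
            if speech_run >= start_frames:
                # Enough speech in a row: open a segment, backdated to its first frame.
                in_speech = True
                seg_start = t - speech_run + 1
                silence_run = 0
        else:
            silence_run = 0 if is_speech else silence_run + 1
            if silence_run >= end_frames:
                # Trailing silence is long enough: close at the first silent frame.
                segments.append((seg_start * frame_in_ms, (t - silence_run + 1) * frame_in_ms))
                in_speech = False
                speech_run = 0
            elif t + 1 - seg_start >= max_frames:
                # Segment hit the length cap: cut here and stay in speech.
                segments.append((seg_start * frame_in_ms, (t + 1) * frame_in_ms))
                seg_start = t + 1
                silence_run = 0

    if in_speech:
        # Input ended inside a segment: close it before any trailing silence.
        seg_end = len(speech_prob) - silence_run
        if seg_end > seg_start:
            segments.append((seg_start * frame_in_ms, seg_end * frame_in_ms))
    return segments


probs = [0.1] * 20 + [0.9] * 50 + [0.1] * 100  # one probability per 10 ms frame
print(detect_segments(probs))  # [(200, 700)]
```

With frame_in_ms = 10 and frame_length_ms = 25 at an assumed 16 kHz sample rate, each frame advances by 160 samples and spans 400 samples, so the default sil_to_speech_time_thres of 150 ms corresponds to 15 frames and the default max_end_silence_time of 800 ms to 80 frames.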

Dependencies

  • fasr
  • funasr / funasr-onnx
  • torch / torchaudio
  • sentencepiece
  • Python 3.10–3.12
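
A quick pre-flight check for the requirements above can be done with the standard library. The sketch assumes the import names fasr, funasr, funasr_onnx, torch, torchaudio and sentencepiece; in particular, funasr_onnx as the import name of the funasr-onnx distribution is an assumption. The helper names are hypothetical.

```python
import importlib.util
import sys
from typing import List, Sequence

# Assumed import names for the listed distributions (funasr-onnx -> funasr_onnx).
REQUIRED_MODULES = ("fasr", "funasr", "funasr_onnx", "torch", "torchaudio", "sentencepiece")


def python_supported(version: Sequence[int] = sys.version_info) -> bool:
    """True if (major, minor) falls within the documented 3.10 to 3.12 range."""
    return (3, 10) <= tuple(version[:2]) <= (3, 12)


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Missing modules:", missing_modules())
```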

Download files

Source Distribution

fasr_vad_fsmn-0.5.0.tar.gz (3.2 MB)

Built Distribution

fasr_vad_fsmn-0.5.0-py3-none-any.whl (3.2 MB)

File details

Details for the file fasr_vad_fsmn-0.5.0.tar.gz.

File metadata

  • Download URL: fasr_vad_fsmn-0.5.0.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11

File hashes

Hashes for fasr_vad_fsmn-0.5.0.tar.gz
  • SHA256: 729284f10301747cde5c601cc88cadfe5442b68b0a67d5a3981a14851bddbba7
  • MD5: f3e78f32b3f7c6525ea4c7e93426b6ec
  • BLAKE2b-256: 179925f994a979be64f4c4ccfcfb1ab7d83e620162537fdfd2dc84d30172a9a0
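
The published SHA256 digests can be checked locally after downloading a file. The sketch below streams the file through Python's hashlib; the helper names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare against a published hex digest (case and surrounding spaces ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, run next to the downloaded sdist:
# verify_sha256(
#     "fasr_vad_fsmn-0.5.0.tar.gz",
#     "729284f10301747cde5c601cc88cadfe5442b68b0a67d5a3981a14851bddbba7",
# )
```

pip can enforce the same check at install time: put fasr-vad-fsmn==0.5.0 --hash=sha256:<digest> in a requirements file and run pip install --require-hashes -r requirements.txt. In that mode pip expects every requirement, including dependencies, to be pinned with a hash.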

File details

Details for the file fasr_vad_fsmn-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: fasr_vad_fsmn-0.5.0-py3-none-any.whl
  • Size: 3.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11

File hashes

Hashes for fasr_vad_fsmn-0.5.0-py3-none-any.whl
  • SHA256: 66cc405136ac83733d3f0b465ff4d80bb6be7d54cd608a4b463315d6ff8b2e90
  • MD5: 7dbe0c87f5e458ebe18be4240a355995
  • BLAKE2b-256: c856bf8caa4934925538c4f89442ef30a3a20e333d7ed2bd35346af0fed5c749
