
FSMN VAD model for fasr


fasr-vad-fsmn

A voice activity detection (VAD) model plugin based on FSMN (Feedforward Sequential Memory Networks). It runs inference with ONNX Runtime and gives fasr efficient offline VAD.

Installation

pip install fasr-vad-fsmn

Registered models

| Registered name | Class | Description |
| --- | --- | --- |
| fsmn | FSMNForVAD | Offline FSMN VAD, ONNX Runtime inference, ships with default weights |

Usage

In a pipeline

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)

Using the model on its own

from fasr.config import registry
from fasr.data import Waveform

model = registry.vad_models.get("fsmn")()
model.from_checkpoint()  # use the built-in weights

waveform = Waveform.from_file("example.wav")
segments = model.detect(waveform)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
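Downstream code often needs sample indices rather than milliseconds, for example to slice the raw audio for each detected segment. The sketch below shows that conversion. The `Segment` dataclass is a hypothetical stand-in for whatever `model.detect()` returns (only the `start_ms` and `end_ms` attributes are taken from the example above), and the 16 kHz sample rate is an assumption; use the rate of your own audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for the objects returned by model.detect()."""
    start_ms: int
    end_ms: int


def segment_to_sample_range(seg: Segment, sample_rate: int) -> tuple[int, int]:
    """Convert a millisecond span into [start, end) sample indices."""
    start = seg.start_ms * sample_rate // 1000
    end = seg.end_ms * sample_rate // 1000
    return start, end


def total_speech_ms(segments: list[Segment]) -> int:
    """Sum the durations of all segments, in milliseconds."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)


# Made-up segments for illustration; 16000 Hz is an assumed sample rate.
segments = [Segment(120, 1480), Segment(2300, 4050)]
ranges = [segment_to_sample_range(s, sample_rate=16000) for s in segments]
print(ranges)
print(total_speech_ms(segments))
```

Integer arithmetic keeps the indices exact for whole-millisecond boundaries, so the resulting ranges can be used directly as slice bounds on a one-dimensional sample array.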

Updating the VAD configuration dynamically

model.update_config(
    max_end_silence_time=500,
    speech_noise_thres=0.5,
)
print(model.get_config())
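When overrides come from user input or a config file, it can help to check them before passing them to `update_config()`. The helper below is a minimal sketch, not part of fasr: `validate_vad_overrides` is a hypothetical name, the 0 to 1 range for `speech_noise_thres` follows the parameter table in this README, and the requirement that the millisecond durations be positive is an assumption.

```python
def validate_vad_overrides(**overrides):
    """Check a few VAD overrides and return them as a dict.

    Hypothetical helper; it does not call into fasr.
    """
    checked = dict(overrides)

    thres = checked.get("speech_noise_thres")
    if thres is not None and not 0.0 <= thres <= 1.0:
        raise ValueError(f"speech_noise_thres must be in [0, 1], got {thres}")

    # Assumption: these millisecond durations are only meaningful when positive.
    for key in ("max_end_silence_time", "max_start_silence_time",
                "max_single_segment_time"):
        value = checked.get(key)
        if value is not None and value <= 0:
            raise ValueError(f"{key} must be a positive number of ms, got {value}")

    return checked


overrides = validate_vad_overrides(max_end_silence_time=500, speech_noise_thres=0.5)
print(overrides)
# With a loaded model, the checked values could then be applied with:
# model.update_config(**overrides)
```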

from_checkpoint parameters

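| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| checkpoint_dir | str \| Path \| None | built-in weights directory | Directory containing the model weights; must include config.yaml and the ONNX model file |
| device_id | str \| int \| None | None | ONNX Runtime device ID |
| num_threads | int | 2 | Number of inference threads |
| compile | bool | False | Whether to compile the preprocessing step |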
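Since a custom `checkpoint_dir` must contain `config.yaml` and an ONNX model file, a quick check before loading can give a clearer error than a failed load. This is a minimal sketch: `missing_checkpoint_files` is a hypothetical helper, and matching the model file by the `.onnx` extension is an assumption, because the README does not name the file.

```python
from pathlib import Path


def missing_checkpoint_files(checkpoint_dir):
    """Return the names or patterns missing from a checkpoint directory."""
    root = Path(checkpoint_dir)
    missing = []
    if not (root / "config.yaml").is_file():
        missing.append("config.yaml")
    # Assumption: the ONNX model file uses the .onnx extension.
    if not any(root.glob("*.onnx")):
        missing.append("*.onnx")
    return missing


# "my_fsmn_checkpoint" is a placeholder path for illustration.
problems = missing_checkpoint_files("my_fsmn_checkpoint")
if problems:
    print("checkpoint directory is incomplete:", problems)
```

An empty list means both expected files were found, at which point the directory could be passed as `checkpoint_dir` to `from_checkpoint()`.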

VAD configuration parameters

The following parameters can be set at instantiation or updated dynamically via update_config():

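| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| detect_mode | int | 1 | Detection mode: 0 = single utterance, 1 = multiple utterances |
| max_end_silence_time | int | 800 | Maximum silence after the end of a speech segment (ms) |
| max_start_silence_time | int | 3000 | Maximum silence before speech starts (ms) |
| max_single_segment_time | int | 60000 | Maximum duration of a single segment (ms) |
| speech_noise_thres | float | 0.6 | Speech/noise decision threshold (0-1) |
| sil_to_speech_time_thres | int | 150 | Silence-to-speech transition threshold (ms) |
| speech_to_sil_time_thres | int | 150 | Speech-to-silence transition threshold (ms) |
| window_size_ms | int | 200 | Detection window size (ms) |
| do_start_point_detection | bool | True | Whether to detect start points |
| do_end_point_detection | bool | True | Whether to detect end points |
| do_extend | int | 1 | Whether to extend speech segment boundaries |
| lookback_time_start_point | int | 200 | Lookback time for the start point (ms) |
| lookahead_time_end_point | int | 100 | Lookahead time for the end point (ms) |
| snr_mode | int | 0 | SNR mode |
| snr_thres | float | -100.0 | SNR threshold |
| decibel_thres | float | -100.0 | Decibel threshold |
| speech_2_noise_ratio | float | 1.0 | Speech-to-noise ratio |
| fe_prior_thres | float | 1e-4 | Front-end prior threshold |
| frame_in_ms | int | 10 | Frame shift (ms) |
| frame_length_ms | int | 25 | Frame length (ms) |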
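To build intuition for how `speech_noise_thres`, `max_end_silence_time`, and `frame_in_ms` interact, the toy segmenter below groups per-frame speech probabilities into segments. It is a deliberately simplified sketch and not the plugin's actual algorithm: it ignores the sliding window, the transition thresholds, the lookback and lookahead, and the SNR settings, and `toy_segments` is a hypothetical name. Only the parameter names and defaults are taken from the table above.

```python
def toy_segments(speech_probs, frame_in_ms=10, speech_noise_thres=0.6,
                 max_end_silence_time=800):
    """Group per-frame speech probabilities into (start_ms, end_ms) spans.

    A frame counts as speech when its probability reaches speech_noise_thres.
    An open segment closes once the silence after its last speech frame
    reaches max_end_silence_time.
    """
    max_silence_frames = max_end_silence_time // frame_in_ms
    segments = []
    start = None        # index of the first speech frame of the open segment
    last_speech = None  # index of the most recent speech frame
    for i, prob in enumerate(speech_probs):
        if prob >= speech_noise_thres:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= max_silence_frames:
            segments.append((start * frame_in_ms, (last_speech + 1) * frame_in_ms))
            start = None
    if start is not None:
        # The input ended while a segment was still open.
        segments.append((start * frame_in_ms, (last_speech + 1) * frame_in_ms))
    return segments


# Made-up probabilities: 100 ms of speech, a 30 ms pause, 40 ms of speech.
probs = [0.1] * 5 + [0.9] * 10 + [0.1] * 3 + [0.9] * 4 + [0.1] * 20
print(toy_segments(probs, max_end_silence_time=50))
print(toy_segments(probs, max_end_silence_time=20))
```

With a 50 ms end-silence limit the 30 ms pause is bridged and the result is a single segment, `[(50, 220)]`; with a 20 ms limit the pause splits the audio into `[(50, 150), (180, 220)]`. This is the trade-off a larger `max_end_silence_time` makes: fewer, longer segments at the cost of later end points.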

Dependencies

  • fasr
  • funasr, funasr-onnx
  • torch, torchaudio
  • sentencepiece
  • Python 3.10–3.12
