FSMN VAD model for fasr

fasr-vad-fsmn

A voice activity detection (VAD) model plugin based on FSMN (Feedforward Sequential Memory Networks). It runs inference with ONNX Runtime and gives fasr an efficient offline VAD capability.

Installation

pip install fasr-vad-fsmn

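Registered models

| Registered name | Class | Description |
| --- | --- | --- |
| fsmn | FSMNForVAD | FSMN offline VAD, ONNX Runtime inference, built-in default weights |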

Usage

Use in a pipeline

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)

Use the model on its own

from fasr.config import registry
from fasr.data import Waveform

model = registry.vad_models.get("fsmn")()
model.from_checkpoint()  # use the built-in weights

waveform = Waveform.from_file("example.wav")
segments = model.detect(waveform)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
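
The loop above prints segment bounds in milliseconds. To slice the underlying audio you usually need sample indices instead. The following is a minimal sketch of that conversion, not part of the plugin: `Segment` is a hypothetical stand-in that only mirrors the `start_ms` and `end_ms` attributes used above, and the 16 kHz sample rate is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a VAD segment (only start_ms/end_ms are mirrored)."""
    start_ms: int
    end_ms: int


def segment_to_samples(seg: Segment, sample_rate: int = 16000) -> tuple[int, int]:
    """Convert millisecond bounds to [start, end) sample indices.

    The 16 kHz default is an assumption; pass the real rate of your audio.
    """
    start = seg.start_ms * sample_rate // 1000
    end = seg.end_ms * sample_rate // 1000
    return start, end


def total_speech_ms(segments: list[Segment]) -> int:
    """Sum the durations of all detected segments."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)


# Made-up segments, for illustration only.
segments = [Segment(0, 1500), Segment(2300, 4000)]
ranges = [segment_to_samples(s) for s in segments]
```

With the made-up segments above, `ranges` holds one `(start, end)` pair per segment, which can be used to slice a one-dimensional sample array.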

Update the VAD configuration dynamically

model.update_config(
    max_end_silence_time=500,
    speech_noise_thres=0.5,
)
print(model.get_config())
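
This README does not say how `update_config()` treats a misspelled key, so it can be worth checking override names before passing them in. The sketch below is a hypothetical helper, not part of the plugin; the key set is copied from the VAD configuration parameters documented in this README.

```python
# Parameter names as documented in this README's VAD configuration table.
KNOWN_VAD_KEYS = {
    "detect_mode", "max_end_silence_time", "max_start_silence_time",
    "max_single_segment_time", "speech_noise_thres",
    "sil_to_speech_time_thres", "speech_to_sil_time_thres",
    "window_size_ms", "do_start_point_detection", "do_end_point_detection",
    "do_extend", "lookback_time_start_point", "lookahead_time_end_point",
    "snr_mode", "snr_thres", "decibel_thres", "speech_2_noise_ratio",
    "fe_prior_thres", "frame_in_ms", "frame_length_ms",
}


def check_overrides(overrides: dict) -> dict:
    """Return overrides unchanged, or raise KeyError if any key is not documented."""
    unknown = sorted(set(overrides) - KNOWN_VAD_KEYS)
    if unknown:
        raise KeyError(f"unknown VAD config keys: {unknown}")
    return overrides


# Hypothetical usage with the model from the example above:
# model.update_config(**check_overrides({"max_end_silence_time": 500}))
```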

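from_checkpoint parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| checkpoint_dir | str \| Path \| None | Built-in weights directory | Model weights directory; must contain config.yaml and the ONNX model file |
| device_id | str \| int \| None | None | ONNX Runtime device ID |
| num_threads | int | 2 | Number of inference threads |
| compile | bool | False | Whether to compile the preprocessing step |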
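
When passing a custom `checkpoint_dir`, a quick pre-flight check can catch a missing file before loading. This is a minimal sketch under assumptions: the helper name is hypothetical, and it assumes the ONNX model uses a `.onnx` extension and sits at the top level of the directory, which this README does not specify.

```python
from pathlib import Path


def missing_checkpoint_files(checkpoint_dir) -> list[str]:
    """List the required entries that are absent from checkpoint_dir.

    Assumption: the ONNX model is a top-level file ending in ".onnx".
    """
    root = Path(checkpoint_dir)
    missing = []
    if not (root / "config.yaml").is_file():
        missing.append("config.yaml")
    if not any(root.glob("*.onnx")):
        missing.append("*.onnx")
    return missing


# Hypothetical usage before calling model.from_checkpoint(checkpoint_dir=...):
# problems = missing_checkpoint_files("path/to/weights")
# if problems:
#     raise FileNotFoundError(f"checkpoint is missing: {problems}")
```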

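VAD configuration parameters

The following parameters can be set at instantiation or updated dynamically via update_config():

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| detect_mode | int | 1 | Detection mode: 0 = single utterance, 1 = multiple utterances |
| max_end_silence_time | int | 800 | Maximum silence after the end of a speech segment (ms) |
| max_start_silence_time | int | 3000 | Maximum silence before speech starts (ms) |
| max_single_segment_time | int | 60000 | Maximum duration of a single segment (ms) |
| speech_noise_thres | float | 0.6 | Speech/noise decision threshold (0-1) |
| sil_to_speech_time_thres | int | 150 | Silence-to-speech transition threshold (ms) |
| speech_to_sil_time_thres | int | 150 | Speech-to-silence transition threshold (ms) |
| window_size_ms | int | 200 | Detection window size (ms) |
| do_start_point_detection | bool | True | Whether to detect start points |
| do_end_point_detection | bool | True | Whether to detect end points |
| do_extend | int | 1 | Whether to extend speech segment boundaries |
| lookback_time_start_point | int | 200 | Look-back time at the start point (ms) |
| lookahead_time_end_point | int | 100 | Look-ahead time at the end point (ms) |
| snr_mode | int | 0 | SNR mode |
| snr_thres | float | -100.0 | SNR threshold |
| decibel_thres | float | -100.0 | Decibel threshold |
| speech_2_noise_ratio | float | 1.0 | Speech-to-noise ratio |
| fe_prior_thres | float | 1e-4 | Front-end prior threshold |
| frame_in_ms | int | 10 | Frame interval (ms) |
| frame_length_ms | int | 25 | Frame length (ms) |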
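
Most of the thresholds above are given in milliseconds, while detection operates on frames. The sketch below shows the arithmetic relating the two under the documented defaults. It assumes standard sliding-window framing without padding, which may differ from what the plugin's front end actually does, and both helper names are hypothetical.

```python
def ms_to_frames(duration_ms: int, frame_in_ms: int = 10) -> int:
    """Number of whole frame intervals that fit in duration_ms."""
    return duration_ms // frame_in_ms


def num_frames(audio_ms: int, frame_length_ms: int = 25, frame_in_ms: int = 10) -> int:
    """Frame count for audio_ms of audio, assuming no padding at the edges."""
    if audio_ms < frame_length_ms:
        return 0
    return 1 + (audio_ms - frame_length_ms) // frame_in_ms


# Documented defaults expressed in frames (10 ms frame interval).
end_silence_frames = ms_to_frames(800)    # max_end_silence_time
window_frames = ms_to_frames(200)         # window_size_ms
transition_frames = ms_to_frames(150)     # sil_to_speech_time_thres
one_second_frames = num_frames(1000)      # frames in 1 s of audio
```

Under these assumptions, lowering `max_end_silence_time` from 800 to 500 means a segment is closed after 50 frames of silence instead of 80.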

Dependencies

  • fasr
  • funasr, funasr-onnx
  • torch, torchaudio
  • sentencepiece
  • Python 3.10–3.12
