
NVIDIA MarbleNet VAD model for fasr

Project description

fasr-vad-marblenet

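A VAD plugin that wraps an NVIDIA MarbleNet ONNX inference script and adds offline voice activity detection to fasr. The plugin ships with a bundled model.onnx, so the model loads out of the box.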

Installation

```bash
pip install fasr-vad-marblenet
```

Registered models

| Registry name | Description |
| --- | --- |
| `marblenet` | `MarbleNetForVAD`: NVIDIA MarbleNet non-streaming VAD, run with ONNX Runtime |

Usage

Using the model in a pipeline

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="marblenet", checkpoint_dir="/path/to/onnx_dir")
    .add_pipe("recognizer", model="paraformer")
)
```

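Using the model standalone

Instantiating the model automatically runs `download_checkpoint()` followed by `load_checkpoint()`. By default it uses the ONNX model bundled with the plugin, so no manual call is needed: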

```python
from fasr.config import registry
from fasr.data import Waveform

model = registry.vad_models.get("marblenet")()

waveform = Waveform.from_file("example.wav")
segments = model.detect(waveform)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
```
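The segments above report their boundaries in milliseconds. If you need the matching audio samples, a minimal sketch is to convert milliseconds to sample indices and slice. It assumes 16 kHz mono audio held in a plain Python list; `ms_to_sample` and `slice_segment` are hypothetical helpers, not part of fasr.

```python
def ms_to_sample(ms: float, sample_rate: int = 16000) -> int:
    """Convert a time in milliseconds to a sample index (rounded down)."""
    return int(ms * sample_rate // 1000)


def slice_segment(samples, start_ms: float, end_ms: float, sample_rate: int = 16000):
    """Hypothetical helper: return the samples between start_ms and end_ms."""
    start = ms_to_sample(start_ms, sample_rate)
    end = ms_to_sample(end_ms, sample_rate)
    return samples[start:end]


# One second of silent dummy audio at an assumed 16 kHz sample rate.
audio = [0.0] * 16000
chunk = slice_segment(audio, 250, 500)
print(len(chunk))  # 4000 samples = 250 ms at 16 kHz
```

With real output you would pass `seg.start_ms` and `seg.end_ms` from the loop above, using whatever sample rate your audio actually has.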

To use a custom weights directory, call `load_checkpoint` again:

```python
model.load_checkpoint("/path/to/custom/marblenet")
```

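Runtime / session parameters

Pass these to the constructor, or override them afterwards by assigning to the corresponding fields: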

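| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `checkpoint` | `str \| None` | `None` | Remote `repo_id`. When set, instantiation automatically downloads it into `cache_dir`; by default the bundled ONNX model is used |
| `cache_dir` | `str \| Path \| None` | `None` | Cache directory; `None` uses `fasr.utils.get_cache_dir()` |
| `endpoint` | `Literal["modelscope", "huggingface", "hf-mirror"]` | `"hf-mirror"` | Download endpoint |
| `model_path` | `str \| Path \| None` | `None` | Path to a specific `.onnx` file; takes precedence over `checkpoint_dir` |
| `providers` | `list[str] \| None` | `["CPUExecutionProvider"]` | ONNX Runtime execution providers |
| `intra_op_num_threads` | `int` | `2` | Number of threads ONNX Runtime uses to parallelize work within a single operator |
| `inter_op_num_threads` | `int` | `0` | Number of threads ONNX Runtime uses to run independent operators in parallel |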
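The table documents only part of the loading logic: `model_path` overrides `checkpoint_dir`, a non-empty `checkpoint` is downloaded into `cache_dir`, and otherwise the bundled ONNX model is used. The sketch below strings those rules into one hypothetical resolver, `resolve_onnx_path`, which is not part of fasr. The relative order of `checkpoint_dir` and `checkpoint`, the `model.onnx` file name, the cache layout, and the `bundled` / `default_cache` placeholders (standing in for the plugin's real bundled file and `fasr.utils.get_cache_dir()`) are all assumptions.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_onnx_path(
    model_path: Optional[PathLike] = None,
    checkpoint_dir: Optional[PathLike] = None,
    checkpoint: Optional[str] = None,
    cache_dir: Optional[PathLike] = None,
    bundled: PathLike = "bundled/model.onnx",  # placeholder for the plugin's built-in file
    default_cache: PathLike = "~/.cache/fasr",  # placeholder for fasr.utils.get_cache_dir()
) -> Path:
    """Hypothetical resolver; only model_path > checkpoint_dir is documented."""
    # 1. An explicit .onnx file takes precedence over a checkpoint directory.
    if model_path is not None:
        return Path(model_path)
    # 2. Assumption: a local weights directory contains a file named model.onnx.
    if checkpoint_dir is not None:
        return Path(checkpoint_dir) / "model.onnx"
    # 3. A remote repo_id is downloaded under cache_dir (download step omitted;
    #    the <cache>/<repo_id>/model.onnx layout is an assumption).
    if checkpoint:
        root = Path(cache_dir) if cache_dir is not None else Path(default_cache).expanduser()
        return root / checkpoint / "model.onnx"
    # 4. Otherwise fall back to the ONNX model shipped with the plugin.
    return Path(bundled)


print(resolve_onnx_path(model_path="my.onnx", checkpoint_dir="/models/marblenet"))
print(resolve_onnx_path(checkpoint_dir="/models/marblenet"))
```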

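VAD algorithm parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `speaking_score` | `float` | `0.5` | Speech onset threshold: a score above this value marks the start of speech |
| `silence_score` | `float` | `0.5` | Speech offset threshold: a score above this value marks the return to silence |
| `fusion_threshold` | `float` | `0.1` | Merge threshold for neighboring segments, in seconds: segments separated by a gap shorter than this are merged |
| `min_speech_duration` | `float` | `0.05` | Minimum speech segment duration, in seconds: shorter segments are discarded |
| `output_frame_length` | `int` | `320` | Samples per frame; the default corresponds to 20 ms at 16 kHz |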
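The table does not spell out how the two thresholds interact. A minimal sketch of one common scheme, hysteresis thresholding, assumes the model yields a per-frame speech probability `p` and treats `1 - p` as the silence score: a segment opens when `p` exceeds `speaking_score` and closes when `1 - p` exceeds `silence_score`. Each frame spans `output_frame_length` samples, so frame `i` starts at `i * output_frame_length / sample_rate` seconds. `scores_to_segments` is hypothetical and is not the plugin's actual implementation.

```python
from typing import List, Tuple


def scores_to_segments(
    speech_probs: List[float],
    speaking_score: float = 0.5,
    silence_score: float = 0.5,
    output_frame_length: int = 320,
    sample_rate: int = 16000,
) -> List[Tuple[float, float]]:
    """Turn per-frame speech probabilities into (start_ms, end_ms) segments."""
    frame_ms = output_frame_length * 1000 / sample_rate  # 320 samples @ 16 kHz -> 20.0 ms
    segments: List[Tuple[float, float]] = []
    start = None  # index of the frame where the current segment began
    for i, p in enumerate(speech_probs):
        if start is None and p > speaking_score:
            start = i  # speech onset
        elif start is not None and (1.0 - p) > silence_score:
            segments.append((start * frame_ms, i * frame_ms))  # back to silence
            start = None
    if start is not None:  # speech continues to the last frame
        segments.append((start * frame_ms, len(speech_probs) * frame_ms))
    return segments


probs = [0.1, 0.9, 0.8, 0.3, 0.2, 0.7, 0.6]
print(scores_to_segments(probs))                     # [(20.0, 60.0), (100.0, 140.0)]
print(scores_to_segments(probs, silence_score=0.3))  # [(20.0, 60.0), (100.0, 120.0)]
```

Lowering `silence_score` to 0.3 closes the second segment one frame earlier, which matches the tuning advice that a lower `silence_score` produces shorter segments.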

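Tuning VAD parameters

Segment length is controlled mainly by `silence_score` and `fusion_threshold`:

| Problem | Adjustment | Effect |
| --- | --- | --- |
| Segments too long | Lower `silence_score` / `fusion_threshold` | The model splits more readily |
| Segments too short / fragmented | Raise `silence_score` / `fusion_threshold` | Segmentation becomes more stable and neighboring segments are merged |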
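To see why raising `fusion_threshold` reduces fragmentation, here is a minimal sketch of gap-based merging followed by a minimum-duration filter, using the units from the parameter table (seconds). The helper `merge_and_filter`, the `(start_ms, end_ms)` tuple format, and the merge-then-filter order are assumptions for illustration, not the plugin's code.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_ms, end_ms)


def merge_and_filter(
    segments: List[Segment],
    fusion_threshold: float = 0.1,
    min_speech_duration: float = 0.05,
) -> List[Segment]:
    """Merge segments separated by less than fusion_threshold seconds,
    then drop segments shorter than min_speech_duration seconds."""
    gap_ms = fusion_threshold * 1000
    min_ms = min_speech_duration * 1000
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < gap_ms:
            # Gap is small enough: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Filter only after merging, so short pieces can first join a neighbor.
    return [(s, e) for s, e in merged if e - s >= min_ms]


segs = [(0, 200), (260, 300), (800, 820)]
print(merge_and_filter(segs))                         # [(0, 300)]
print(merge_and_filter(segs, fusion_threshold=0.05))  # [(0, 200)]
```

With the default `fusion_threshold=0.1`, the 60 ms gap between the first two segments is bridged; lowering it to 0.05 keeps them apart, and the now-isolated 40 ms segment falls below `min_speech_duration`.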

Parameter tuning examples

Shorter segments (finer-grained splitting):

```python
pipeline.add_pipe(
    "detector",
    model="marblenet",
    silence_score=0.3,        # lower threshold: end of speech is detected more readily
    fusion_threshold=0.05,    # lower merge threshold: fewer segments get merged
)
```

Longer segments (less fragmentation):

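```python
pipeline.add_pipe(
    "detector",
    model="marblenet",
    silence_score=0.7,        # higher threshold: more confidence needed to end a segment
    fusion_threshold=0.3,     # higher merge threshold: more neighboring segments get merged
    min_speech_duration=0.1,  # discard short segments
)
```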

Adjusting parameters when using the model standalone:

```python
from fasr.config import registry

model = registry.vad_models.get("marblenet")(
    silence_score=0.3,
    fusion_threshold=0.05,
)
# ...or, equivalently, assign the fields after construction
model.silence_score = 0.3
model.fusion_threshold = 0.05
```

Dependencies

  • fasr
  • numpy >= 1.24
  • onnxruntime >= 1.16.0
  • Python 3.10–3.12



Download files


Source Distribution

fasr_vad_marblenet-0.5.0.tar.gz (1.1 MB)

Built Distribution


fasr_vad_marblenet-0.5.0-py3-none-any.whl (1.1 MB)

File details

Details for the file fasr_vad_marblenet-0.5.0.tar.gz.

File metadata

  • Download URL: fasr_vad_marblenet-0.5.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11 (Ubuntu 22.04)

File hashes

Hashes for fasr_vad_marblenet-0.5.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | e4e585b862c0e99abace557c62b495a5c5b1e1dbc295d592782983857ac853ef |
| MD5 | 0b9e75893faf04363d0ce54817714774 |
| BLAKE2b-256 | 10bf6de8b71d80b824f0fd786fcc77d310544a5d1e71ef04c7c8bcdde81773d5 |


File details

Details for the file fasr_vad_marblenet-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: fasr_vad_marblenet-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11 (Ubuntu 22.04)

File hashes

Hashes for fasr_vad_marblenet-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | de5b1352a4d1bde182398d3190573bf4a97a352959fca24f6e87fb327345c58e |
| MD5 | 61b9c0bf6c23c065162db58a8298a73b |
| BLAKE2b-256 | 4df8ba8ecd4f1fc4fda13c08082ad7d85ec437c6f465c3ce6d79f7492a1d2dd9 |

