NVIDIA MarbleNet VAD model for fasr

fasr-vad-marblenet

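A VAD plugin that wraps the NVIDIA MarbleNet ONNX inference script and provides offline voice activity detection for fasr. The plugin ships with model.onnx built in, so it can be loaded out of the box.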

Installation

pip install fasr-vad-marblenet

Registered models

| Registry name | Class | Description |
| --- | --- | --- |
| marblenet | MarbleNetForVAD | NVIDIA MarbleNet non-streaming VAD, inference via ONNX Runtime |

Usage

In a pipeline

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="marblenet", checkpoint_dir="/path/to/onnx_dir")
    .add_pipe("recognizer", model="paraformer")
)

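Using the model on its own

Instantiating the model automatically runs download_checkpoint() and load_checkpoint(), using the bundled ONNX by default, so there is no need to call them manually: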

from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("marblenet")()

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")

To use a custom weights directory, call load_checkpoint again:

model.load_checkpoint("/path/to/custom/marblenet")
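
The segments returned by detect() expose start_ms and end_ms, as the loop above shows. A minimal sketch of summarising them follows; the Seg namedtuple is a hypothetical stand-in for whatever segment type fasr returns, and speech_stats is an illustrative helper, not part of the plugin.

```python
from collections import namedtuple

# Hypothetical stand-in: only the start_ms / end_ms attributes are taken from the README.
Seg = namedtuple("Seg", ["start_ms", "end_ms"])


def speech_stats(segments, total_ms):
    """Return (total speech time in ms, fraction of the audio that is speech)."""
    speech_ms = sum(seg.end_ms - seg.start_ms for seg in segments)
    ratio = speech_ms / total_ms if total_ms else 0.0
    return speech_ms, ratio


segments = [Seg(0, 1500), Seg(2000, 3000)]
speech_ms, ratio = speech_stats(segments, total_ms=5000)
print(f"{speech_ms} ms of speech, {ratio:.0%} of the audio")
```

With real output from model.detect(audio), the same function should work as long as each segment carries start_ms and end_ms.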

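Runtime / session parameters

Pass these at construction time, or override them by assigning to the field:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| checkpoint | `str \| None` | `None` | Remote repo_id; when set, instantiation automatically downloads it to cache_dir. By default the bundled ONNX is used |
| cache_dir | `str \| Path \| None` | `None` | Cache directory; None uses fasr.utils.get_cache_dir() |
| endpoint | `Literal["modelscope", "huggingface", "hf-mirror"]` | `"hf-mirror"` | Download endpoint |
| model_path | `str \| Path \| None` | `None` | Direct path to a .onnx file; takes precedence over checkpoint_dir |
| providers | `list[str] \| None` | `["CPUExecutionProvider"]` | List of ONNX Runtime providers |
| intra_op_num_threads | `int` | `2` | Number of ONNX Runtime intra-op parallel threads |
| inter_op_num_threads | `int` | `0` | Number of ONNX Runtime inter-op parallel threads |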
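
The precedence stated above (model_path over checkpoint_dir, with the bundled ONNX as the fallback) can be pictured with a short sketch. It is an illustration only: resolve_onnx_path and its bundled_path argument are hypothetical, and the assumption that a checkpoint directory contains a file named model.onnx is borrowed from the name of the bundled file rather than confirmed by the plugin.

```python
from pathlib import Path


def resolve_onnx_path(model_path=None, checkpoint_dir=None,
                      bundled_path="model.onnx"):
    """Hypothetical helper mirroring the documented precedence."""
    if model_path is not None:
        # An explicit .onnx path wins over everything else.
        return Path(model_path)
    if checkpoint_dir is not None:
        # Assumption: the directory holds a file called model.onnx.
        return Path(checkpoint_dir) / "model.onnx"
    # Nothing given: fall back to the ONNX file shipped with the plugin.
    return Path(bundled_path)


print(resolve_onnx_path(model_path="weights/custom.onnx", checkpoint_dir="ckpt"))
print(resolve_onnx_path(checkpoint_dir="ckpt"))
```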

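VAD algorithm parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| speaking_score | `float` | `0.5` | Speech activation threshold; values above it are treated as the start of speech |
| silence_score | `float` | `0.5` | Speech end threshold; values above it are treated as a return to silence |
| fusion_threshold | `float` | `0.1` | Merge threshold for neighboring segments (seconds); segments separated by a gap shorter than this are merged |
| min_speech_duration | `float` | `0.05` | Minimum speech segment duration (seconds); shorter segments are filtered out |
| max_speech_duration | `float \| None` | `None` | Maximum duration of a single speech segment (seconds); longer segments are force-split at a fixed length during post-processing |
| output_frame_length | `int` | `320` | Samples per frame; the default corresponds to 20 ms at 16 kHz |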
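
One way to picture how speaking_score, silence_score, and output_frame_length interact is the sketch below. It is not the plugin's implementation: the function frames_to_segments, the sample_rate argument, and the choice to treat the silence score as 1 minus the per-frame speech probability are all assumptions made for illustration.

```python
def frames_to_segments(speech_probs, speaking_score=0.5, silence_score=0.5,
                       output_frame_length=320, sample_rate=16000):
    """Turn per-frame speech probabilities into (start_s, end_s) segments."""
    frame_s = output_frame_length / sample_rate  # 320 / 16000 = 0.02 s per frame
    segments = []
    start = None  # index of the frame where the current segment began
    for i, p in enumerate(speech_probs):
        if start is None:
            if p > speaking_score:          # speech begins
                start = i
        elif (1.0 - p) > silence_score:     # assumed silence score: 1 - p
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:                   # audio ended while still in speech
        segments.append((start * frame_s, len(speech_probs) * frame_s))
    return segments


probs = [0.1, 0.9, 0.9, 0.6, 0.2, 0.1]
print(frames_to_segments(probs))                     # default thresholds
print(frames_to_segments(probs, silence_score=0.3))  # segment ends one frame earlier
```

Under these assumptions, lowering silence_score closes a segment sooner, which matches the tuning advice for that parameter.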

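Tuning VAD parameters

Segment length is mainly controlled by silence_score and fusion_threshold:

| Problem | Adjustment | Notes |
| --- | --- | --- |
| Segments too long | Lower silence_score and fusion_threshold | Makes the model split more readily |
| Segments too short / fragmented | Raise silence_score and fusion_threshold | Makes the model more stable and merges neighboring segments |
| Need to cap the duration of a single segment | Set max_speech_duration | Continuous speech beyond the threshold is hard-split, independent of silence boundaries |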
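
The effect of fusion_threshold and min_speech_duration on fragmentation can be sketched as a small post-processing step. The helper merge_and_filter is hypothetical, and the order used here (merge first, then drop short segments) is an assumption; the plugin may order these steps differently.

```python
def merge_and_filter(segments, fusion_threshold=0.1, min_speech_duration=0.05):
    """Merge (start_s, end_s) segments separated by small gaps, then drop short ones."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < fusion_threshold:
            # Gap to the previous segment is below the threshold: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Keep only segments that last at least min_speech_duration seconds.
    return [(s, e) for s, e in merged if e - s >= min_speech_duration]


raw = [(0.0, 1.0), (1.08, 2.0), (2.5, 2.52), (3.0, 4.0)]
print(merge_and_filter(raw))                         # defaults
print(merge_and_filter(raw, fusion_threshold=0.05))  # fewer merges, more segments
print(merge_and_filter(raw, fusion_threshold=0.6))   # everything merges into one
```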

Parameter tuning examples

Shorter segments (finer-grained splitting):

pipeline.add_pipe(
    "detector",
    model="marblenet",
    silence_score=0.3,        # lower threshold: the end of speech is detected more readily
    fusion_threshold=0.05,    # lower merge threshold: fewer segments get merged
)

Longer segments (less fragmentation):

pipeline.add_pipe(
    "detector",
    model="marblenet",
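    silence_score=0.7,        # higher threshold: more confidence is needed to end a segment
    fusion_threshold=0.3,     # higher merge threshold: more neighboring segments get merged
    min_speech_duration=0.1,  # filter out short segments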
)

Adjusting parameters when using the model on its own:

model = registry.vad_models.get("marblenet")(
    silence_score=0.3,
    fusion_threshold=0.05,
)
# or mutate after construction
model.silence_score = 0.3
model.fusion_threshold = 0.05

Capping the maximum duration of a single segment (hard split):

pipeline.add_pipe(
    "detector",
    model="marblenet",
    max_speech_duration=15.0,  # Force-split any speech segment longer than 15s
)
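
What a fixed-length hard split could look like is sketched below. This is an illustration under stated assumptions, not the plugin's code: split_long_segments is a hypothetical name, segments are plain (start_s, end_s) tuples, and the last chunk is allowed to be shorter than the cap.

```python
def split_long_segments(segments, max_speech_duration=None):
    """Cut any segment longer than max_speech_duration into fixed-length chunks."""
    if max_speech_duration is None:
        return list(segments)  # no cap configured: leave segments untouched
    out = []
    for start, end in segments:
        # Peel off full-length chunks until the remainder fits under the cap.
        while end - start > max_speech_duration:
            out.append((start, start + max_speech_duration))
            start += max_speech_duration
        out.append((start, end))
    return out


print(split_long_segments([(0.0, 40.0), (50.0, 55.0)], max_speech_duration=15.0))
```

Because the cut points depend only on elapsed time, a split can land in the middle of a word; that is the trade-off of not relying on silence boundaries.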

Dependencies

  • fasr
  • numpy >= 1.24
  • onnxruntime >= 1.16.0
  • Python 3.10–3.12
