NVIDIA MarbleNet VAD model for fasr
fasr-vad-marblenet
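A VAD plugin that wraps the NVIDIA MarbleNet ONNX inference script and provides offline voice activity detection for fasr. The plugin bundles model.onnx, so it can be loaded directly by default.
Installation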
pip install fasr-vad-marblenet
Registered models
| Registered name | Class | Description |
|---|---|---|
| `marblenet` | `MarbleNetForVAD` | NVIDIA MarbleNet non-streaming VAD, ONNX Runtime inference |
Usage
Using in a pipeline
from fasr import AudioPipeline
pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="marblenet", checkpoint_dir="/path/to/onnx_dir")
    .add_pipe("recognizer", model="paraformer")
)
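Using the model standalone
Instantiating the model automatically runs download_checkpoint() + load_checkpoint() and uses the plugin's bundled ONNX by default, so no manual call is needed: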
from fasr.config import registry
from fasr.data import AudioSpan, Waveform
model = registry.vad_models.get("marblenet")()
audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for seg in segments:
    print(f"{seg.start_ms}ms - {seg.end_ms}ms")
To use a custom weights directory, call load_checkpoint again:
model.load_checkpoint("/path/to/custom/marblenet")
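The millisecond boundaries printed above can be mapped back to sample indices when you need to cut the audio yourself. The following is a minimal sketch, not part of the plugin: `Segment` is a hypothetical stand-in that only mirrors the `start_ms` / `end_ms` attributes used in the example, and the 16 kHz sample rate is an assumption.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


@dataclass
class Segment:
    """Hypothetical stand-in for a detected segment (only start_ms/end_ms)."""
    start_ms: int
    end_ms: int


def to_sample_range(seg: Segment, sample_rate: int = SAMPLE_RATE) -> tuple[int, int]:
    """Convert millisecond boundaries to [start, end) sample indices."""
    return seg.start_ms * sample_rate // 1000, seg.end_ms * sample_rate // 1000


def total_speech_ms(segments: list[Segment]) -> int:
    """Sum the duration of all segments in milliseconds."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)


segments = [Segment(120, 980), Segment(1500, 2300)]
ranges = [to_sample_range(s) for s in segments]
print(ranges)                     # [(1920, 15680), (24000, 36800)]
print(total_speech_ms(segments))  # 1660
```

The resulting index pairs can be used to slice any array of raw samples at the same sample rate.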
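Runtime / session parameters
Pass these at construction time, or override them by assigning to the fields:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `checkpoint` | `str \| None` | `None` | Remote repo_id; when set, instantiation automatically downloads it to cache_dir. When left as None, the plugin's bundled ONNX is used |
| `cache_dir` | `str \| Path \| None` | `None` | Cache directory; None uses fasr.utils.get_cache_dir() |
| `endpoint` | `Literal["modelscope", "huggingface", "hf-mirror"]` | `"hf-mirror"` | Download endpoint |
| `model_path` | `str \| Path \| None` | `None` | Direct path to a .onnx file; takes precedence over checkpoint_dir |
| `providers` | `list[str] \| None` | `["CPUExecutionProvider"]` | ONNX Runtime provider list |
| `intra_op_num_threads` | `int` | `2` | Number of ONNX Runtime intra-op parallel threads |
| `inter_op_num_threads` | `int` | `0` | Number of ONNX Runtime inter-op parallel threads |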
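VAD algorithm parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `speaking_score` | `float` | `0.5` | Speech activation threshold; a score above this value is treated as the start of speech |
| `silence_score` | `float` | `0.5` | Speech end threshold; a score above this value is treated as a return to silence |
| `fusion_threshold` | `float` | `0.1` | Merge threshold for neighbouring segments (seconds); segments separated by a gap shorter than this are merged |
| `min_speech_duration` | `float` | `0.05` | Minimum speech segment duration (seconds); shorter segments are filtered out |
| `max_speech_duration` | `float \| None` | `None` | Maximum duration of a single speech segment (seconds); longer segments are force-split at a fixed length during post-processing |
| `output_frame_length` | `int` | `320` | Samples per frame; the default corresponds to 20 ms at 16 kHz |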
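To make the two thresholds concrete, here is a minimal sketch of hysteresis-style segmentation over per-frame scores. It is not the plugin's implementation. It assumes the model yields one speech probability `p` per frame, that the "silence" score is `1 - p`, and that frame indices map to time through `output_frame_length` at 16 kHz; the function name `frames_to_segments` is hypothetical.

```python
def frames_to_segments(
    speech_probs: list[float],
    speaking_score: float = 0.5,
    silence_score: float = 0.5,
    output_frame_length: int = 320,
    sample_rate: int = 16_000,
) -> list[tuple[int, int]]:
    """Turn per-frame speech probabilities into (start_ms, end_ms) segments."""

    def frame_to_ms(index: int) -> int:
        # 320 samples at 16 kHz is 20 ms per frame
        return index * output_frame_length * 1000 // sample_rate

    segments = []
    start = None  # frame index where the current segment began, if any
    for i, p in enumerate(speech_probs):
        if start is None:
            if p > speaking_score:  # assumed rule: speech score crosses threshold
                start = i
        elif (1.0 - p) > silence_score:  # assumed rule: silence score crosses threshold
            segments.append((frame_to_ms(start), frame_to_ms(i)))
            start = None
    if start is not None:  # close a segment that is still open at end of audio
        segments.append((frame_to_ms(start), frame_to_ms(len(speech_probs))))
    return segments


probs = [0.1, 0.9, 0.8, 0.6, 0.2, 0.1, 0.7, 0.9]
print(frames_to_segments(probs))                     # [(20, 80), (120, 160)]
print(frames_to_segments(probs, silence_score=0.3))  # [(20, 60), (120, 160)]
```

Under these assumptions, lowering `silence_score` ends the first segment one frame earlier, which matches the direction of the tuning advice in this README.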
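VAD parameter tuning
Segment length is mainly controlled by silence_score and fusion_threshold:
| Problem | Adjustment | Notes |
|---|---|---|
| Segments too long | Lower silence_score or fusion_threshold | Makes the model split more readily |
| Segments too short / fragmented | Raise silence_score or fusion_threshold | Makes the model more stable and merges neighbouring segments |
| Need to cap the duration of a single segment | Set max_speech_duration | Continuous speech that exceeds the threshold is hard-split, independent of silence boundaries |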
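The effect of `fusion_threshold` and `min_speech_duration` can be illustrated with a small post-processing sketch. This is an assumption about how such a step typically works, not the plugin's code: the function name `merge_and_filter` is hypothetical, segments are `(start_s, end_s)` tuples in seconds, and merging is assumed to run before filtering.

```python
def merge_and_filter(
    segments: list[tuple[float, float]],
    fusion_threshold: float = 0.1,
    min_speech_duration: float = 0.05,
) -> list[tuple[float, float]]:
    """Merge segments separated by short gaps, then drop very short ones."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < fusion_threshold:
            # Gap to the previous segment is below the threshold: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Keep only segments that last at least min_speech_duration seconds.
    return [(s, e) for s, e in merged if e - s >= min_speech_duration]


raw = [(0.0, 1.0), (1.05, 2.0), (2.5, 2.52), (4.0, 5.0)]
print(merge_and_filter(raw))  # [(0.0, 2.0), (4.0, 5.0)]
```

With a 0.2 s gap between two segments, the default `fusion_threshold=0.1` keeps them separate, while `fusion_threshold=0.3` joins them into one.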
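Parameter tuning examples
Shorter segments (finer-grained splitting):
pipeline.add_pipe(
    "detector",
    model="marblenet",
    silence_score=0.3,  # lower threshold: end of speech is detected more readily
    fusion_threshold=0.05,  # lower merge threshold: fewer segments get merged
)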
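Longer segments (less fragmentation):
pipeline.add_pipe(
    "detector",
    model="marblenet",
    silence_score=0.7,  # higher threshold: more confidence needed to end a segment
    fusion_threshold=0.3,  # higher merge threshold: more neighbouring segments get merged
    min_speech_duration=0.1,  # filter out short segments
)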
Adjusting parameters when using the model standalone:
model = registry.vad_models.get("marblenet")(
    silence_score=0.3,
    fusion_threshold=0.05,
)
# or mutate after construction
model.silence_score = 0.3
model.fusion_threshold = 0.05
Capping the duration of a single segment (hard split):
pipeline.add_pipe(
    "detector",
    model="marblenet",
    max_speech_duration=15.0,  # Force-split any speech segment longer than 15s
)
Dependencies
- fasr
- numpy >= 1.24
- onnxruntime >= 1.16.0
- Python 3.10–3.12