NVIDIA MarbleNet VAD model for fasr

fasr-vad-marblenet

NVIDIA MarbleNet voice activity detection for fasr. The plugin ships a bundled ONNX model, so the default marblenet registry entry works without downloading extra weights.

Install

pip install fasr-vad-marblenet

Registered Model

| Registry name | Class | Best for |
| --- | --- | --- |
| marblenet | MarbleNetForVAD | Offline, CPU-friendly VAD with ONNX Runtime |

Pipeline Usage

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="marblenet",
        speaking_score=0.55,
        silence_score=0.45,
        fusion_threshold=0.2,
    )
    .add_pipe("recognizer", model="paraformer")
)

Quick choices:

| Goal | Use | Result |
| --- | --- | --- |
| Reduce false starts from noise | speaking_score=0.65 | Speech starts only when the model is more confident |
| Keep quiet speech | speaking_score=0.35 | More sensitive starts, with more risk of noise |
| End speech sooner | silence_score=0.35 | Shorter segments, less trailing silence |
| Avoid fragmented segments | fusion_threshold=0.3 | Merges speech pieces separated by short pauses |
| Drop clicks or very short bursts | min_speech_duration=0.1 | Filters segments shorter than 100 ms |
| Cap ASR segment length | max_speech_duration=15.0 | Hard-splits long speech spans into 15-second pieces |
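
To make the last three rows concrete, here is a minimal pure-Python sketch of a merge, filter, and split pass over (start, end) pairs in seconds. It is not the plugin's code: the postprocess helper, the tuple representation, and the order of the three steps are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) in seconds


def postprocess(
    segments: List[Segment],
    fusion_threshold: float = 0.1,
    min_speech_duration: float = 0.05,
    max_speech_duration: Optional[float] = None,
) -> List[Segment]:
    """Illustrative merge -> filter -> split pass (hypothetical helper)."""
    # 1) Merge neighbours whose gap is at most fusion_threshold seconds.
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= fusion_threshold:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # 2) Drop segments shorter than min_speech_duration seconds.
    kept = [(s, e) for s, e in merged if e - s >= min_speech_duration]

    # 3) Hard-split anything longer than max_speech_duration seconds.
    if max_speech_duration is None:
        return kept
    out: List[Segment] = []
    for s, e in kept:
        while e - s > max_speech_duration:
            out.append((s, s + max_speech_duration))
            s += max_speech_duration
        out.append((s, e))
    return out


raw = [(0.0, 1.0), (1.25, 2.0), (5.0, 5.0625), (10.0, 40.0)]
print(postprocess(raw, fusion_threshold=0.3,
                  min_speech_duration=0.1, max_speech_duration=15.0))
```

In this sketch the 0.25 s pause is bridged, the 62.5 ms burst is dropped, and the 30 s span is cut into two 15 s pieces.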

Confection Config

[vad_model]
@vad_models = "marblenet"
speaking_score = 0.55
silence_score = 0.45
fusion_threshold = 0.2

Inside a pipeline:

[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]

[pipeline.pipes]

[pipeline.pipes.detector]
@pipes = "thread_pipe"

[pipeline.pipes.detector.component]
@components = "detector"

[pipeline.pipes.detector.component.model]
@vad_models = "marblenet"
speaking_score = 0.55
silence_score = 0.45
fusion_threshold = 0.2

Direct Model Usage

from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("marblenet")(
    speaking_score=0.55,
    silence_score=0.45,
)

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")
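
When tuning, it can help to summarise the detected segments as a single number. The sketch below relies only on the start_ms and end_ms attributes used in the loop above; FakeSegment, total_speech_ms, and speech_ratio are hypothetical names introduced here, and the helpers assume the segments do not overlap.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Hypothetical stand-in exposing only start_ms and end_ms."""
    start_ms: int
    end_ms: int


def total_speech_ms(segments: Iterable) -> int:
    """Sum segment durations in milliseconds (assumes no overlap)."""
    return sum(seg.end_ms - seg.start_ms for seg in segments)


def speech_ratio(segments: Iterable, audio_ms: int) -> float:
    """Fraction of the audio covered by detected speech."""
    if audio_ms <= 0:
        raise ValueError("audio_ms must be positive")
    return total_speech_ms(segments) / audio_ms


demo = [FakeSegment(0, 1500), FakeSegment(2000, 2500)]
print(total_speech_ms(demo), speech_ratio(demo, audio_ms=4000))
```

Any object with the same two attributes should work in place of FakeSegment, which would include the segments returned by model.detect if they behave as shown above.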

Use a local ONNX directory when needed:

model.load_checkpoint("/path/to/marblenet")

Parameters

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
| --- | --- | --- | --- | --- | --- |
| speaking_score | float, 0.0 to 1.0 | 0.5 | More conservative starts | More sensitive starts | Starts are too eager or quiet speech is missed |
| silence_score | float, 0.0 to 1.0 | 0.5 | Speech ends later | Speech ends sooner | Segments are too long or clipped |
| fusion_threshold | float >= 0, seconds | 0.1 | Merges wider gaps | Keeps nearby segments separate | Output is too fragmented or too merged |
| min_speech_duration | float >= 0, seconds | 0.05 | Filters more short segments | Keeps shorter bursts | Clicks leak through, or short words disappear |
| max_speech_duration | float > 0 or None, seconds | None | Longer hard-split limit | Shorter hard-split limit | ASR works better with bounded segments |
| intra_op_num_threads | int >= 0 | 2 | More CPU parallelism | Less CPU usage | CPU throughput needs tuning |
| inter_op_num_threads | int >= 0 | 0 | More operator-level parallelism | 0 lets ONNX Runtime decide | Advanced ONNX Runtime tuning |
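
The ranges and defaults in the table can be checked before a model is built. The following is a small sketch of such a check; MarbleNetParams is a hypothetical helper, not a class shipped by the plugin, and it only encodes what the table states.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MarbleNetParams:
    """Hypothetical container mirroring the table's defaults and ranges."""
    speaking_score: float = 0.5
    silence_score: float = 0.5
    fusion_threshold: float = 0.1
    min_speech_duration: float = 0.05
    max_speech_duration: Optional[float] = None
    intra_op_num_threads: int = 2
    inter_op_num_threads: int = 0

    def __post_init__(self) -> None:
        # Scores are probabilities-like thresholds in [0.0, 1.0].
        for name in ("speaking_score", "silence_score"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
        # Durations and thread counts must not be negative.
        for name in ("fusion_threshold", "min_speech_duration",
                     "intra_op_num_threads", "inter_op_num_threads"):
            value = getattr(self, name)
            if value < 0:
                raise ValueError(f"{name} must be >= 0, got {value}")
        # max_speech_duration is either None (no cap) or strictly positive.
        if self.max_speech_duration is not None and self.max_speech_duration <= 0:
            raise ValueError("max_speech_duration must be > 0 or None")

    def as_kwargs(self) -> dict:
        """Return the parameters as a plain keyword-argument dict."""
        return asdict(self)


params = MarbleNetParams(speaking_score=0.65, max_speech_duration=15.0)
print(params.as_kwargs())
```

The resulting dict could then be unpacked into the model factory from Direct Model Usage, on the assumption that the factory accepts every parameter in the table as a keyword argument.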

Tuning Guide

| Symptom | Try first |
| --- | --- |
| Noise starts speech segments | Raise speaking_score to 0.6 or 0.7 |
| Quiet speech start is missed | Lower speaking_score to 0.35 or 0.4 |
| Segment tail is too long | Lower silence_score to 0.35 or 0.4 |
| Speech is cut too early | Raise silence_score to 0.6 |
| Segments are too fragmented | Raise fusion_threshold to 0.2 or 0.3 |
| Very short false segments appear | Raise min_speech_duration to 0.1 |
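
The direction of the two score adjustments can be illustrated with a toy frame-level state machine. This is one possible reading of the parameters, not the plugin's documented algorithm: it assumes a segment opens when the per-frame speech probability reaches speaking_score and closes when the silence probability, taken here as 1 minus the speech probability, reaches silence_score. The function name and the frame-index output are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def frames_to_segments(
    speech_probs: Sequence[float],
    speaking_score: float = 0.5,
    silence_score: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return half-open (start_frame, end_frame) pairs (illustrative only)."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(speech_probs):
        if start is None:
            # Assumed start rule: speech probability reaches speaking_score.
            if p >= speaking_score:
                start = i
        elif (1.0 - p) >= silence_score:
            # Assumed end rule: silence probability reaches silence_score.
            segments.append((start, i))
            start = None
    if start is not None:
        # Close a segment that is still open at the end of the audio.
        segments.append((start, len(speech_probs)))
    return segments


probs = [0.1, 0.6, 0.9, 0.5, 0.3, 0.1, 0.7]
print(frames_to_segments(probs, speaking_score=0.55, silence_score=0.6))
print(frames_to_segments(probs, speaking_score=0.65, silence_score=0.6))
print(frames_to_segments(probs, speaking_score=0.55, silence_score=0.4))
```

Under these assumptions, raising speaking_score delays the first segment's start by one frame, and lowering silence_score ends it one frame earlier, which matches the directions in the table.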

Dependencies

  • fasr
  • numpy >= 1.24
  • onnxruntime >= 1.16.0
  • Python 3.10-3.12
