NVIDIA MarbleNet VAD model for fasr
fasr-vad-marblenet
NVIDIA MarbleNet voice activity detection for fasr. The plugin ships a bundled
ONNX model, so the default marblenet registry entry works without downloading
extra weights.
Install
pip install fasr-vad-marblenet
Registered Model
| Registry name | Class | Best for |
|---|---|---|
| `marblenet` | `MarbleNetForVAD` | Offline CPU-friendly VAD with ONNX Runtime |
Pipeline Usage
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="marblenet",
        speaking_score=0.55,
        silence_score=0.45,
        fusion_threshold=0.2,
    )
    .add_pipe("recognizer", model="paraformer")
)
Quick choices:
| Goal | Use | Result |
|---|---|---|
| Reduce false starts from noise | `speaking_score=0.65` | Speech starts only when the model is more confident |
| Keep quiet speech | `speaking_score=0.35` | More sensitive starts, with more risk of noise |
| End speech sooner | `silence_score=0.35` | Shorter segments, less trailing silence |
| Avoid fragmented segments | `fusion_threshold=0.3` | Merges speech pieces separated by short pauses |
| Drop clicks or very short bursts | `min_speech_duration=0.1` | Filters segments shorter than 100 ms |
| Cap ASR segment length | `max_speech_duration=15.0` | Hard-splits long speech spans into 15-second pieces |
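The three duration-related options in the table above can be pictured as a post-processing pass over raw speech segments. The following is a minimal sketch under stated assumptions, not the plugin's implementation: the `(start_s, end_s)` tuple representation, the `postprocess` helper, and the order of the steps (merge, then filter, then split) are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start_s, end_s) in seconds.
Segment = Tuple[float, float]


def postprocess(
    segments: List[Segment],
    fusion_threshold: float = 0.1,
    min_speech_duration: float = 0.05,
    max_speech_duration: Optional[float] = None,
) -> List[Segment]:
    """Merge close segments, drop short ones, then hard-split long ones."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # Merge when the gap to the previous segment is within the threshold.
        if merged and start - merged[-1][1] <= fusion_threshold:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Drop segments shorter than the minimum duration.
    kept = [(s, e) for s, e in merged if e - s >= min_speech_duration]

    if max_speech_duration is None:
        return kept

    # Hard-split anything longer than the cap into fixed-size pieces.
    result: List[Segment] = []
    for s, e in kept:
        while e - s > max_speech_duration:
            result.append((s, s + max_speech_duration))
            s += max_speech_duration
        result.append((s, e))
    return result


raw = [(0.0, 1.0), (1.25, 2.0), (5.0, 5.0625), (10.0, 40.0)]
print(postprocess(raw, fusion_threshold=0.3,
                  min_speech_duration=0.1, max_speech_duration=15.0))
```

In this sketch the 0.25 s pause is bridged, the 62.5 ms burst is dropped, and the 30-second span is cut into two 15-second pieces.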
Confection Config
[vad_model]
@vad_models = "marblenet"
speaking_score = 0.55
silence_score = 0.45
fusion_threshold = 0.2
Inside a pipeline:
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]
[pipeline.pipes]
[pipeline.pipes.detector]
@pipes = "thread_pipe"
[pipeline.pipes.detector.component]
@components = "detector"
[pipeline.pipes.detector.component.model]
@vad_models = "marblenet"
speaking_score = 0.55
silence_score = 0.45
fusion_threshold = 0.2
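Before handing a config like the ones above to fasr, it can help to sanity-check the numeric values against the ranges documented for this plugin. The sketch below uses only the Python standard library (`configparser` and `json`) as a rough stand-in for confection's own parser; `read_section` and `check_vad_settings` are hypothetical helpers, and decoding each value as JSON is an assumption that happens to fit the quoted-string style used here.

```python
import configparser
import json

CONFIG_TEXT = """
[vad_model]
@vad_models = "marblenet"
speaking_score = 0.55
silence_score = 0.45
fusion_threshold = 0.2
"""


def read_section(text: str, section: str) -> dict:
    """Parse one section and decode each value as JSON (stdlib approximation)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: json.loads(value) for key, value in parser[section].items()}


def check_vad_settings(settings: dict) -> list:
    """Return human-readable problems; an empty list means the values look sane."""
    problems = []
    for name in ("speaking_score", "silence_score"):
        value = settings.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0, got {value}")
    for name in ("fusion_threshold", "min_speech_duration"):
        value = settings.get(name)
        if value is not None and value < 0:
            problems.append(f"{name} must be >= 0, got {value}")
    return problems


settings = read_section(CONFIG_TEXT, "vad_model")
print(settings)
print(check_vad_settings(settings))
```

This only checks values; resolving the `@vad_models` entry into a model object is still the job of fasr and its registry.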
Direct Model Usage
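from fasr.config import registry
from fasr.data import AudioSpan, Waveform

# Look up the registered class and instantiate it with custom thresholds.
model = registry.vad_models.get("marblenet")(
    speaking_score=0.55,
    silence_score=0.45,
)

# Wrap the waveform in an AudioSpan that starts at 0 ms.
audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)

# Print the boundaries of each detected speech segment.
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")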
Use a local ONNX directory when needed:
model.load_checkpoint("/path/to/marblenet")
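Once `detect` has returned segments, a quick summary of how much of the audio was classified as speech is often useful when comparing threshold settings. The sketch below relies only on the `start_ms` and `end_ms` attributes shown in the usage example; `FakeSegment` and `speech_summary` are hypothetical names, and `FakeSegment` is a stand-in so the example runs without fasr installed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Hypothetical stand-in exposing only start_ms and end_ms."""
    start_ms: int
    end_ms: int


def speech_summary(segments: Iterable, total_ms: int) -> dict:
    """Count segments and compute the fraction of audio marked as speech."""
    segs = list(segments)
    speech_ms = sum(s.end_ms - s.start_ms for s in segs)
    ratio = speech_ms / total_ms if total_ms else 0.0
    return {"count": len(segs), "speech_ms": speech_ms, "speech_ratio": ratio}


demo = [FakeSegment(0, 1500), FakeSegment(2000, 3000)]
print(speech_summary(demo, total_ms=5000))
```

Any object with the same two attributes can be passed in place of `FakeSegment`.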
Parameters
| Parameter | Type / range | Default | Higher value | Lower value | Change when |
|---|---|---|---|---|---|
| `speaking_score` | float, 0.0 to 1.0 | 0.5 | More conservative starts | More sensitive starts | Starts are too eager or quiet speech is missed |
| `silence_score` | float, 0.0 to 1.0 | 0.5 | Speech ends later | Speech ends sooner | Segments are too long or clipped |
| `fusion_threshold` | float >= 0, seconds | 0.1 | Merges wider gaps | Keeps nearby segments separate | Output is too fragmented or too merged |
| `min_speech_duration` | float >= 0, seconds | 0.05 | Filters more short segments | Keeps shorter bursts | Clicks leak through, or short words disappear |
| `max_speech_duration` | float > 0 or None, seconds | None | Longer hard-split limit | Shorter hard-split limit | ASR works better with bounded segments |
| `intra_op_num_threads` | int >= 0 | 2 | More CPU parallelism | Less CPU usage | CPU throughput needs tuning |
| `inter_op_num_threads` | int >= 0 | 0 | More operator-level parallelism | Lets ORT decide | Advanced ONNX Runtime tuning |
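The directions listed for `speaking_score` and `silence_score` can be reproduced with a simple two-threshold rule over per-frame speech probabilities. The sketch below is one reading that is consistent with the table, not a description of the plugin's internals: it assumes speech starts when the speech probability reaches `speaking_score` and ends when the silence probability (`1 - p`) reaches `silence_score`. The function name and the frame-index output are hypothetical.

```python
from typing import List, Tuple


def hysteresis_segments(
    speech_probs: List[float],
    speaking_score: float = 0.5,
    silence_score: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(speech_probs):
        if start is None:
            # ASSUMPTION: speech begins once p reaches speaking_score.
            if p >= speaking_score:
                start = i
        elif (1.0 - p) >= silence_score:
            # ASSUMPTION: speech ends once silence confidence reaches silence_score.
            segments.append((start, i))
            start = None
    if start is not None:
        # Close a segment that is still open at the end of the audio.
        segments.append((start, len(speech_probs)))
    return segments


probs = [0.1, 0.6, 0.9, 0.5, 0.3, 0.1, 0.7, 0.8]
print(hysteresis_segments(probs, speaking_score=0.55, silence_score=0.45))
print(hysteresis_segments(probs, speaking_score=0.55, silence_score=0.75))
print(hysteresis_segments(probs, speaking_score=0.65, silence_score=0.45))
```

Under these assumptions, raising `silence_score` keeps the first segment open for two more frames, and raising `speaking_score` delays its start by one frame, which matches the "Higher value" column.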
Tuning Guide
| Symptom | Try first |
|---|---|
| Noise starts speech segments | Raise speaking_score to 0.6 or 0.7 |
| Quiet speech start is missed | Lower speaking_score to 0.35 or 0.4 |
| Segment tail is too long | Lower silence_score to 0.35 or 0.4 |
| Speech is cut too early | Raise silence_score to 0.6 |
| Segments are too fragmented | Raise fusion_threshold to 0.2 or 0.3 |
| Very short false segments appear | Raise min_speech_duration to 0.1 |
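To decide which row of the tuning guide applies, it can help to measure segment durations and the gaps between them rather than judging by ear. The following sketch is only illustrative: `segment_stats`, the `(start_ms, end_ms)` tuple input, and the `short_ms` and `gap_ms` cutoffs are hypothetical choices, not part of the plugin.

```python
from typing import List, Tuple


def segment_stats(
    segments_ms: List[Tuple[int, int]],
    short_ms: int = 100,
    gap_ms: int = 300,
) -> dict:
    """Count very short segments and small gaps between consecutive segments."""
    ordered = sorted(segments_ms)
    durations = [end - start for start, end in ordered]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "segments": len(ordered),
        # Many short segments suggest raising min_speech_duration.
        "short_segments": sum(d < short_ms for d in durations),
        # Many small gaps suggest raising fusion_threshold.
        "small_gaps": sum(g < gap_ms for g in gaps),
    }


print(segment_stats([(0, 50), (200, 1200), (1350, 2000), (5000, 6000)]))
```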
Dependencies
- fasr
- numpy >= 1.24
- onnxruntime >= 1.16.0
- Python 3.10-3.12