# fasr-vad-fsmn

FSMN voice activity detection for fasr. The offline `fsmn` model delegates feature extraction and ONNX inference to `funasr_onnx`; the plugin also provides `fsmn_online` for streaming VAD.
## Install

```bash
pip install fasr-vad-fsmn
```
## Registered Models

| Registry name | Class | Best for |
|---|---|---|
| `fsmn` | `FSMNVad` | Offline VAD, segmenting complete audio into speech spans |
| `fsmn_online` | `FSMNVadOnline` | Streaming VAD, emitting speech chunks as audio arrives |
## Pipeline Usage

Any keyword argument after `component`, `model`, `batch_size`, and the other pipe options is forwarded to the detector model, so FSMN parameters go directly on the detector pipe:

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="fsmn",
        max_end_silence_time=600,
        speech_noise_thres=0.55,
        num_threads=4,
    )
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)
```
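The forwarding convention can be sketched in plain Python. This is a toy illustration of keyword forwarding, not fasr's actual `add_pipe` implementation:

```python
def add_pipe(name, model=None, batch_size=1, batch_timeout=0.1, **model_kwargs):
    """Toy sketch: the known pipe options are consumed here; every other
    keyword argument falls through to the model's constructor."""
    return {
        "pipe": name,
        "model": model,
        "pipe_options": {"batch_size": batch_size, "batch_timeout": batch_timeout},
        "model_kwargs": model_kwargs,  # e.g. FSMN parameters
    }

pipe = add_pipe("detector", model="fsmn", max_end_silence_time=600, num_threads=4)
print(pipe["model_kwargs"])  # {'max_end_silence_time': 600, 'num_threads': 4}
```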
Quick choices:

| Goal | Use | Result |
|---|---|---|
| Keep long sentences together | `max_end_silence_time=1000` | Short pauses inside a sentence are less likely to split the segment |
| Lower endpoint latency | `max_end_silence_time=300` | Segments end sooner, but sentences may be split more often |
| Suppress noisy backgrounds | `speech_noise_thres=0.7` | Fewer noise false positives, with a higher risk of missing quiet speech |
| Keep quiet or far-field speech | `speech_noise_thres=0.45` | More sensitive detection, with a higher risk of including noise |
| Increase CPU throughput | `num_threads=4` or `num_threads=8` | More ONNX Runtime CPU parallelism, with higher CPU usage |
| Use GPU | `device_id=0` | Uses GPU 0 through ONNX Runtime, after installing `onnxruntime-gpu` |
## Confection Config

fasr config files use Confection's TOML-style syntax, not YAML. To configure only the VAD model:

```toml
[vad_model]
@vad_models = "fsmn"
max_end_silence_time = 600
speech_noise_thres = 0.55
num_threads = 4
```
Inside a pipeline, model parameters live under `pipeline.pipes.detector.component.model`:

```toml
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]

[pipeline.pipes]

[pipeline.pipes.detector]
@pipes = "thread_pipe"
batch_size = 4
batch_timeout = 0.1

[pipeline.pipes.detector.component]
@components = "detector"
num_threads = 2
max_segment_duration = 30.0

[pipeline.pipes.detector.component.model]
@vad_models = "fsmn"
max_end_silence_time = 600
speech_noise_thres = 0.55
num_threads = 4
```
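Confection resolves dotted section names into a nested structure, which is why `[pipeline.pipes.detector.component.model]` ends up as the `model` field of the detector component. A rough stdlib-only sketch of that nesting (not Confection's actual resolver; it also skips `@`-factory resolution):

```python
def nest(sections):
    """Turn flat {'a.b.c': {...}} section maps into nested dicts."""
    root = {}
    for dotted, values in sections.items():
        node = root
        for part in dotted.split("."):
            node = node.setdefault(part, {})
        node.update(values)
    return root

config = nest({
    "pipeline": {"@pipelines": "AudioPipeline.v1"},
    "pipeline.pipes.detector": {"@pipes": "thread_pipe", "batch_size": 4},
    "pipeline.pipes.detector.component.model": {"@vad_models": "fsmn"},
})
print(config["pipeline"]["pipes"]["detector"]["component"]["model"])  # {'@vad_models': 'fsmn'}
```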
## Direct Model Usage

Model construction automatically downloads and loads the checkpoint.

```python
from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("fsmn")(
    max_end_silence_time=600,
    speech_noise_thres=0.55,
)

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")
```

Use a local checkpoint directory when needed:

```python
model.load_checkpoint("/path/to/fsmn-vad")
```
## Parameters

The offline `fsmn` model exposes only the parameters that still affect `funasr_onnx` inference. Generic checkpoint fields such as `checkpoint`, `cache_dir`, `endpoint`, `revision`, and `force_download` are inherited from the base model.

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
|---|---|---|---|---|---|
| `sample_rate` | int, recommended 16000 | `16000` | Not recommended; adds resampling/inference cost | Not recommended; may lose speech detail | Usually never; keep model input at 16 kHz |
| `device_id` | `None`, `-1`, `"cpu"`, or a GPU id like `0` | `None` | A GPU id uses that GPU | `None` / `-1` / `"cpu"` uses CPU | You need lower latency or higher concurrency |
| `num_threads` | int >= 0 | `2` | Often faster on CPU, but uses more cores | Saves CPU, may slow inference | CPU deployment needs tuning |
| `max_end_silence_time` | int >= 0, milliseconds | `800` | More tolerant of pauses; longer, more complete segments; later endpoint | Faster endpoint; more fragmented segments | Sentences are split too often, or endpoint latency is too high |
| `speech_noise_thres` | float, 0.0 to 1.0 | `0.6` | More conservative; fewer noise false positives; may miss quiet speech | More sensitive; keeps weak speech; may include noise | Noise is detected as speech, or quiet speech is missed |
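How `speech_noise_thres` and `max_end_silence_time` interact can be illustrated with a toy segmenter over per-frame speech probabilities. This is a conceptual sketch of threshold-plus-endpointing behavior, not the FSMN model's actual decision logic:

```python
def segment(frame_probs, frame_ms=10, speech_noise_thres=0.6, max_end_silence_time=800):
    """Toy endpointing: open a segment on a speech frame, close it once
    trailing silence exceeds max_end_silence_time. Returns (start_ms, end_ms) pairs."""
    segments, start, last_end, silence_ms = [], None, 0, 0
    for i, p in enumerate(frame_probs):
        t = i * frame_ms
        if p >= speech_noise_thres:          # frame counts as speech
            if start is None:
                start = t
            last_end = t + frame_ms
            silence_ms = 0
        elif start is not None:              # silence inside an open segment
            silence_ms += frame_ms
            if silence_ms > max_end_silence_time:
                segments.append((start, last_end))
                start, silence_ms = None, 0
    if start is not None:                    # flush an open segment at end of audio
        segments.append((start, last_end))
    return segments

# 50 ms of speech, a 300 ms pause, then 50 ms of speech
probs = [0.9] * 5 + [0.1] * 30 + [0.9] * 5
print(segment(probs, max_end_silence_time=200))  # [(0, 50), (350, 400)] -- pause splits
print(segment(probs, max_end_silence_time=400))  # [(0, 400)] -- pause tolerated
```

A higher `max_end_silence_time` tolerates the pause and keeps one segment; a lower value splits it, mirroring the trade-off in the table above.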
## Tuning Guide

| Symptom | Try first |
|---|---|
| One sentence is split into many pieces | Raise `max_end_silence_time` to 1000 or 1200 |
| Speech end is detected too late | Lower `max_end_silence_time` to 300 to 500 |
| Background noise becomes speech | Raise `speech_noise_thres` to 0.7 or 0.8 |
| Quiet or far-field speech is missed | Lower `speech_noise_thres` to 0.45 or 0.5 |
| CPU usage is too high | Lower `num_threads` |
| CPU inference is too slow | Raise `num_threads`, or install `onnxruntime-gpu` and set `device_id=0` |
For `fsmn_online`, use `device="cpu"` or `device="cuda"` instead of `device_id`. It also exposes `chunk_size_ms`: smaller chunks improve realtime responsiveness but increase scheduling overhead; larger chunks improve throughput but delay output. The default of 100 ms is a good starting point.
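The `chunk_size_ms` trade-off comes down to how many samples each streaming call receives. A minimal sketch of the chunking arithmetic (a hypothetical helper, not part of the plugin's API):

```python
def iter_chunks(samples, sample_rate=16000, chunk_size_ms=100):
    """Yield fixed-size sample chunks; the final chunk may be shorter."""
    step = sample_rate * chunk_size_ms // 1000
    for i in range(0, len(samples), step):
        yield samples[i:i + step]

# one second of 16 kHz audio in 100 ms chunks -> 10 chunks of 1600 samples each
chunks = list(iter_chunks([0.0] * 16000))
print(len(chunks), len(chunks[0]))  # 10 1600
```

Halving `chunk_size_ms` doubles the number of calls (lower latency, more scheduling overhead); doubling it does the reverse.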
## CPU / GPU

The default runtime is CPU ONNX Runtime. During model loading, the plugin logs whether CPU or GPU is being used. For GPU inference:

```bash
uv pip install onnxruntime-gpu
```

```python
model = registry.vad_models.get("fsmn")(device_id=0)
stream_model = registry.vad_models.get("fsmn_online")(device="cuda")
```
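The `device_id` convention from the parameters table can be sketched as a mapping onto ONNX Runtime execution providers. This illustrates the convention, not the plugin's actual loader:

```python
def select_providers(device_id=None):
    """CPU for None / -1 / "cpu"; otherwise CUDA on the given GPU id,
    with CPU as the fallback provider (ONNX Runtime provider-list format)."""
    if device_id in (None, -1, "cpu"):
        return ["CPUExecutionProvider"]
    return [
        ("CUDAExecutionProvider", {"device_id": int(device_id)}),
        "CPUExecutionProvider",
    ]

print(select_providers())    # ['CPUExecutionProvider']
print(select_providers(0))   # [('CUDAExecutionProvider', {'device_id': 0}), 'CPUExecutionProvider']
```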
## Dependencies

- fasr
- funasr-onnx
- numpy >= 1.24
- onnxruntime >= 1.16, < 1.24
- Python 3.10-3.12