
FSMN VAD model for fasr

Project description

fasr-vad-fsmn

Chinese documentation

FSMN voice activity detection for fasr. The offline fsmn model delegates feature extraction and ONNX inference to funasr_onnx; the plugin also provides fsmn_online for streaming VAD.

Install

pip install fasr-vad-fsmn

Registered Models

| Registry name | Class | Best for |
| --- | --- | --- |
| `fsmn` | `FSMNVad` | Offline VAD, segmenting complete audio into speech spans |
| `fsmn_online` | `FSMNVadOnline` | Streaming VAD, emitting speech chunks as audio arrives |

Pipeline Usage

Any keyword argument other than component, model, batch_size, and the other pipe options is forwarded to the detector's model. Put FSMN parameters directly on the detector pipe:

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="fsmn",
        max_end_silence_time=600,
        speech_noise_thres=0.55,
        num_threads=4,
    )
    .add_pipe("recognizer", model="paraformer")
    .add_pipe("sentencizer", model="ct_transformer")
)
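The forwarding rule above can be illustrated with a toy sketch (this is not the fasr implementation; the option names are taken from the example, and `split_pipe_kwargs` is a hypothetical helper): known pipe options stay on the pipe, everything else goes to the model.

```python
# Toy illustration of the kwargs-forwarding rule: keyword arguments that
# are not recognized pipe options are passed through to the model.
PIPE_OPTIONS = {"component", "model", "batch_size", "batch_timeout"}

def split_pipe_kwargs(**kwargs):
    pipe_opts = {k: v for k, v in kwargs.items() if k in PIPE_OPTIONS}
    model_opts = {k: v for k, v in kwargs.items() if k not in PIPE_OPTIONS}
    return pipe_opts, model_opts

pipe_opts, model_opts = split_pipe_kwargs(
    model="fsmn", batch_size=4, max_end_silence_time=600, num_threads=4
)
# pipe_opts  -> {"model": "fsmn", "batch_size": 4}
# model_opts -> {"max_end_silence_time": 600, "num_threads": 4}
```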

Quick choices:

| Goal | Use | Result |
| --- | --- | --- |
| Keep long sentences together | `max_end_silence_time=1000` | Short pauses inside a sentence are less likely to split the segment |
| Lower endpoint latency | `max_end_silence_time=300` | Segments end sooner, but sentences may be split more often |
| Suppress noisy backgrounds | `speech_noise_thres=0.7` | Fewer noise false positives, with higher risk of missing quiet speech |
| Keep quiet or far-field speech | `speech_noise_thres=0.45` | More sensitive detection, with higher risk of including noise |
| Increase CPU throughput | `num_threads=4` or `num_threads=8` | More ONNX Runtime CPU parallelism, with higher CPU usage |
| Use GPU | `device_id=0` | Uses GPU 0 through ONNX Runtime, after installing onnxruntime-gpu |

Confection Config

fasr config files use Confection's TOML-style syntax, not YAML.

To configure only the VAD model:

[vad_model]
@vad_models = "fsmn"
max_end_silence_time = 600
speech_noise_thres = 0.55
num_threads = 4

Inside a pipeline, model parameters live under pipeline.pipes.detector.component.model:

[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]

[pipeline.pipes]

[pipeline.pipes.detector]
@pipes = "thread_pipe"
batch_size = 4
batch_timeout = 0.1

[pipeline.pipes.detector.component]
@components = "detector"
num_threads = 2
max_segment_duration = 30.0

[pipeline.pipes.detector.component.model]
@vad_models = "fsmn"
max_end_silence_time = 600
speech_noise_thres = 0.55
num_threads = 4

Direct Model Usage

Model construction automatically downloads and loads the checkpoint.

from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("fsmn")(
    max_end_silence_time=600,
    speech_noise_thres=0.55,
)

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")
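To slice the waveform by a detected span, the millisecond boundaries map onto sample indices. `span_to_sample_range` is a hypothetical helper, not part of the fasr API, and it assumes the model's 16 kHz input rate.

```python
# Hypothetical helper: convert a span's millisecond boundaries into
# sample indices for a 16 kHz waveform (16 samples per millisecond).
def span_to_sample_range(start_ms, end_ms, sample_rate=16000):
    def to_samples(ms):
        return int(ms * sample_rate / 1000)
    return to_samples(start_ms), to_samples(end_ms)

start, end = span_to_sample_range(120, 980)  # -> (1920, 15680)
```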

Use a local checkpoint directory when needed:

model.load_checkpoint("/path/to/fsmn-vad")

Parameters

Offline fsmn exposes only the parameters that still affect funasr_onnx inference. Generic checkpoint fields such as checkpoint, cache_dir, endpoint, revision, and force_download are inherited from the base model.

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
| --- | --- | --- | --- | --- | --- |
| `sample_rate` | int, recommended 16000 | 16000 | Not recommended; adds resampling/inference cost | Not recommended; may lose speech detail | Usually never; keep model input at 16 kHz |
| `device_id` | `None`, `-1`, `"cpu"`, or a GPU id like `0` | `None` | A GPU id runs on that GPU | `None` / `-1` / `"cpu"` runs on CPU | You need lower latency or higher concurrency |
| `num_threads` | int >= 0 | 2 | Often faster on CPU, but uses more cores | Saves CPU, may slow inference | CPU deployment needs tuning |
| `max_end_silence_time` | int >= 0, milliseconds | 800 | More tolerant of pauses; longer, more complete segments; later endpoint | Faster endpoint; more fragmented segments | Sentences are split too often, or endpoint latency is too high |
| `speech_noise_thres` | float, 0.0 to 1.0 | 0.6 | More conservative; fewer noise false positives; may miss quiet speech | More sensitive; keeps weak speech; may include noise | Noise is detected as speech, or quiet speech is missed |
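The endpointing role of `max_end_silence_time` can be sketched as a loop over per-frame speech decisions. This is a toy illustration, not the FSMN implementation; the 10 ms frame size and the boolean frame labels are assumptions for the example.

```python
# Toy endpointer: close the segment once trailing silence after speech
# accumulates to max_end_silence_time milliseconds.
def find_endpoint(frame_is_speech, frame_ms=10, max_end_silence_time=800):
    """Return the frame index just past the last speech frame once the
    silence budget is exhausted, or None if no endpoint is reached."""
    silence_run = 0
    last_speech = None
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            silence_run = 0
            last_speech = i
        elif last_speech is not None:
            silence_run += frame_ms
            if silence_run >= max_end_silence_time:
                return last_speech + 1
    return None

frames = [True] * 5 + [False] * 100  # 50 ms of speech, then 1 s of silence
find_endpoint(frames, max_end_silence_time=800)   # endpoint after frame 5
find_endpoint(frames, max_end_silence_time=1200)  # silence budget not exhausted -> None
```

With a larger budget the same pause no longer triggers an endpoint, which is exactly why raising the value keeps sentences together at the cost of later endpoints.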

Tuning Guide

| Symptom | Try first |
| --- | --- |
| One sentence is split into many pieces | Raise `max_end_silence_time` to 1000 or 1200 |
| Speech end is detected too late | Lower `max_end_silence_time` to 300 to 500 |
| Background noise becomes speech | Raise `speech_noise_thres` to 0.7 or 0.8 |
| Quiet or far-field speech is missed | Lower `speech_noise_thres` to 0.45 or 0.5 |
| CPU usage is too high | Lower `num_threads` |
| CPU inference is too slow | Raise `num_threads`, or install onnxruntime-gpu and set `device_id=0` |

For fsmn_online, use device="cpu" or device="cuda" instead of device_id. It also exposes chunk_size_ms: smaller chunks improve realtime responsiveness but increase scheduling overhead; larger chunks improve throughput but delay output. The default 100 ms is a good starting point.
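The chunking trade-off can be made concrete with a minimal sketch. How chunks are fed to `fsmn_online` is not shown in this document; the helper below only illustrates the size arithmetic, where the 100 ms default maps to 1600 samples at 16 kHz.

```python
# Minimal sketch: split a waveform into fixed-size chunks for streaming.
# Smaller chunk_size_ms means more, smaller chunks (lower latency, more
# per-chunk overhead); larger means fewer, bigger chunks (higher throughput).
def chunk_waveform(samples, sample_rate=16000, chunk_size_ms=100):
    step = sample_rate * chunk_size_ms // 1000  # 1600 samples at 16 kHz / 100 ms
    return [samples[i:i + step] for i in range(0, len(samples), step)]

chunks = chunk_waveform([0.0] * 16000)  # one second of audio -> 10 chunks
```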

CPU / GPU

The default runtime is CPU ONNX Runtime. During model loading, the plugin logs whether CPU or GPU is being used.

For GPU inference:

uv pip install onnxruntime-gpu

model = registry.vad_models.get("fsmn")(device_id=0)
stream_model = registry.vad_models.get("fsmn_online")(device="cuda")

Dependencies

  • fasr
  • funasr-onnx
  • numpy >= 1.24
  • onnxruntime >= 1.16, < 1.24
  • Python 3.10-3.12
