FireRedVAD for fasr (bundled fireredvad inference)

fasr-vad-firered

FireRedVAD voice activity detection for fasr. This is an offline neural VAD that loads FireRed's PyTorch checkpoint and returns AudioSpan speech segments.

Install

pip install fasr-vad-firered

Registered Model

Registry name   Class           Best for
firered         FireRedForVAD   Offline VAD with FireRed checkpoints

The default checkpoint is FireRedTeam/FireRedVAD. A local checkpoint directory must contain the upstream VAD files, typically cmvn.ark and model.pth.tar, either at the top level or inside a VAD/ subdirectory.
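
Before pointing the model at local weights, it can help to confirm that the directory has the expected layout. The following is a minimal sketch using only the standard library. The helper name find_vad_dir is hypothetical and not part of fasr, and the two file names are the typical ones mentioned above, so adjust them if your checkpoint differs.

```python
from pathlib import Path

# Typical upstream file names from the description above.
# The helper itself is hypothetical and not part of fasr.
REQUIRED_FILES = ("cmvn.ark", "model.pth.tar")


def find_vad_dir(checkpoint_dir):
    """Return the directory that holds the VAD files, or None if they are missing."""
    root = Path(checkpoint_dir)
    # Check the top level first, then the VAD/ subdirectory.
    for candidate in (root, root / "VAD"):
        if all((candidate / name).is_file() for name in REQUIRED_FILES):
            return candidate
    return None
```

A None result means neither location contains both files, which is worth checking before loading the checkpoint.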

Pipeline Usage

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="firered",
        speech_threshold=0.4,
        use_gpu=False,
    )
    .add_pipe("recognizer", model="firered_aed")
    .add_pipe("sentencizer", model="ct_transformer")
)

Quick choices:

Goal                           Use                     Result
Reduce noise false positives   speech_threshold=0.55   Requires stronger speech posterior
Keep weak speech               speech_threshold=0.3    More sensitive, but may include noise
Use GPU inference              use_gpu=True            Faster when CUDA is available
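
To build intuition for speech_threshold, here is a toy sketch of thresholding per-frame speech posteriors into segments. It is not FireRedVAD's actual post-processing: the frames_to_segments helper, the 10 ms frame length, the >= comparison, and the sample posteriors are all assumptions made for illustration.

```python
def frames_to_segments(posteriors, threshold=0.4, frame_ms=10):
    """Group consecutive frames with posterior >= threshold into (start_ms, end_ms) pairs.

    Illustrative only: frame_ms and the >= comparison are assumptions,
    not the behavior of the bundled FireRedVAD inference.
    """
    segments = []
    start = None
    for i, p in enumerate(posteriors):
        if p >= threshold and start is None:
            start = i  # a speech run begins at this frame
        elif p < threshold and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None  # the speech run ended before this frame
    if start is not None:
        # Close a run that lasts until the final frame.
        segments.append((start * frame_ms, len(posteriors) * frame_ms))
    return segments


# Made-up posteriors, one value per frame.
posteriors = [0.1, 0.35, 0.5, 0.7, 0.45, 0.2, 0.58, 0.6, 0.1]
for threshold in (0.3, 0.4, 0.55):
    print(threshold, frames_to_segments(posteriors, threshold))
```

In this toy data, raising the threshold trims the weak frames at the edges of the first speech run, while lowering it pulls them in.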

Confection Config

[vad_model]
@vad_models = "firered"
use_gpu = false
speech_threshold = 0.4

Inside a pipeline:

[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]

[pipeline.pipes]

[pipeline.pipes.detector]
@pipes = "thread_pipe"

[pipeline.pipes.detector.component]
@components = "detector"

[pipeline.pipes.detector.component.model]
@vad_models = "firered"
use_gpu = false
speech_threshold = 0.4

Direct Model Usage

from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("firered")(
    speech_threshold=0.4,
    use_gpu=True,
)

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")
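
The returned segments can be summarized with plain Python. The sketch below relies only on the start_ms and end_ms attributes used in the loop above. The Segment dataclass is a stand-in so the snippet runs without fasr installed, and speech_ratio is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for the objects returned by model.detect().

    Only the start_ms and end_ms attributes are assumed.
    """
    start_ms: int
    end_ms: int


def speech_ratio(segments, total_ms):
    """Return the fraction of total_ms covered by the given segments."""
    speech_ms = sum(s.end_ms - s.start_ms for s in segments)
    return speech_ms / total_ms if total_ms > 0 else 0.0


# Two half-second segments in a two-second clip.
example = [Segment(0, 500), Segment(1000, 1500)]
print(f"speech ratio: {speech_ratio(example, 2000):.2f}")
```

A ratio that looks too high or too low for a known recording is a quick signal that speech_threshold needs adjusting.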

Use local weights:

model.load_checkpoint("/path/to/FireRedVAD")

Parameters

Parameter          Type / range        Default   Higher value                               Lower value                                 Change when
use_gpu            bool                False     Enables CUDA inference                     Uses CPU                                    You have CUDA available and need speed
speech_threshold   float, 0.0 to 1.0   0.4       More conservative; fewer false positives   More sensitive; more weak speech retained   Noise leaks in, or speech is missed

Generic checkpoint fields such as checkpoint, cache_dir, endpoint, revision, and force_download are inherited from the base model.

Tuning Guide

Symptom                       Try first
Noise is detected as speech   Raise speech_threshold to 0.5 or 0.6
Quiet speech is missed        Lower speech_threshold to 0.3
CPU inference is too slow     Set use_gpu=True on a CUDA machine
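
If the same script runs on both CPU-only and CUDA machines, the use_gpu value can be derived at runtime instead of hard-coded. This is a minimal sketch: cuda_available is a hypothetical helper, and it falls back to False when torch cannot be imported.

```python
def cuda_available():
    """Return True only if torch imports cleanly and reports a usable CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return False
    return bool(torch.cuda.is_available())


use_gpu = cuda_available()
print(f"use_gpu={use_gpu}")
```

The resulting value can then be passed as use_gpu when constructing the model or adding the detector pipe.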

Dependencies

  • fasr
  • torch >= 2.0.0
  • soundfile >= 0.12.0
  • kaldiio >= 2.18.0
  • kaldi-native-fbank >= 1.19.0
  • Python 3.10-3.12
