FireRedVAD integration for Pipecat — SOTA streaming Voice Activity Detection supporting 100+ languages

pipecat-ai-fireredvad

A Pipecat integration for FireRedVAD, a state-of-the-art, industrial-grade streaming Voice Activity Detection (VAD) model that supports 100+ languages. On the FLEURS-VAD-102 benchmark it is reported to outperform Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD:

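| Metric        | FireRedVAD | Silero-VAD | TEN-VAD | WebRTC-VAD |
|---------------|------------|------------|---------|------------|
| F1 Score ↑    | 97.57      | 95.95      | 95.19   | 52.30      |
| AUC-ROC ↑     | 99.60      | 97.99      | 97.81   | -          |
| False Alarm ↓ | 2.69       | 9.41       | 15.47   | 2.83       |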

Requirements

  • Python 3.10+
  • pipecat-ai >= 0.0.90
  • fireredvad (installed manually from GitHub — see setup below)
  • Audio: 16 kHz, 16-bit mono PCM

Installation

1. Install this package

pip install pipecat-firered-vad

2. Install FireRedVAD

fireredvad is not on PyPI. Clone and install it from GitHub:

git clone https://github.com/FireRedTeam/FireRedVAD.git
cd FireRedVAD
pip install -r requirements.txt
export PYTHONPATH=$PWD:$PYTHONPATH
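
Because FireRedVAD is exposed through PYTHONPATH rather than installed with pip, it can help to confirm that Python actually finds it before continuing. The following is a minimal sketch that assumes the importable top-level module is named `fireredvad` (the name used in the requirements list); the helper `is_importable` is a hypothetical name introduced here, not part of either package.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: "fireredvad" is the top-level module name of the cloned repo.
    name = "fireredvad"
    if is_importable(name):
        print(f"{name} is importable")
    else:
        print(f"{name} not found; check that PYTHONPATH includes the FireRedVAD clone")
```

Note that the `export` line only affects the current shell session, so the check should be run from the same shell (or after adding the export to your shell profile).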

3. Download model weights

# via Hugging Face
pip install -U "huggingface_hub[cli]"
huggingface-cli download FireRedTeam/FireRedVAD \
    --local-dir ./pretrained_models/FireRedVAD

# or via ModelScope (recommended if you're in China)
pip install -U modelscope
modelscope download --model xukaituo/FireRedVAD \
    --local_dir ./pretrained_models/FireRedVAD

4. Configure environment

cp .env.example .env
# Edit .env and set FIREREDVAD_MODEL_DIR to your downloaded weights path

Quick Start

import asyncio
from dotenv import load_dotenv
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat_ai_fireredvad import FireVadAnalyzer

load_dotenv()

async def main():
    vad = FireVadAnalyzer(
        model_dir=os.environ["FIREREDVAD_MODEL_DIR"],
        params=VADParams(
            confidence=0.7,
            start_secs=0.2,
            stop_secs=0.5,
        ),
        speech_threshold=0.4,
        smooth_window_size=5,
    )

    # Pass the analyzer to your transport, e.g. DailyTransport:
    # transport = DailyTransport(..., vad_analyzer=vad)

asyncio.run(main())

Configuration Reference

FireVadAnalyzer constructor parameters

| Parameter          | Type      | Default | Description                                                  |
|--------------------|-----------|---------|--------------------------------------------------------------|
| model_dir          | str       |         | Required. Path to the Stream-VAD model directory.            |
| sample_rate        | int       | None    | Must be 16000 if provided (enforced).                        |
| params             | VADParams | None    | Pipecat-level smoothing (confidence, start/stop secs).       |
| use_gpu            | bool      | False   | Run inference on GPU.                                        |
| smooth_window_size | int       | 5       | Frame-level confidence smoothing window inside FireRedVAD.   |
| speech_threshold   | float     | 0.4     | Raw model threshold for speech vs silence.                   |
| pad_start_frame    | int       | 5       | Frames prepended at speech onset to avoid clipping.          |
| min_speech_frame   | int       | 8       | Minimum consecutive frames before segment is confirmed.      |
| max_speech_frame   | int       | 2000    | Maximum frames in a single speech segment.                   |
| min_silence_frame  | int       | 20      | Silence frames required before a segment ends.               |
| max_buffer_frames  | int       | 50      | Ring buffer capacity (oldest frames evicted on overflow).    |
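
To build intuition for `smooth_window_size`, `speech_threshold`, and `max_buffer_frames`, here is a small illustrative sketch. It assumes the smoothing is a plain moving average over the most recent frames and that a frame counts as speech when the average reaches the threshold; FireRedVAD's actual smoothing and comparison may differ. The function `smooth_and_threshold` is a hypothetical helper, not part of the library.

```python
from collections import deque


def smooth_and_threshold(probs, window=5, threshold=0.4):
    """Average each frame probability over the last `window` frames,
    then flag the frame as speech when the average reaches `threshold`."""
    recent = deque(maxlen=window)  # the oldest value drops out automatically
    decisions = []
    for p in probs:
        recent.append(p)
        avg = sum(recent) / len(recent)
        decisions.append(avg >= threshold)
    return decisions


# A single noisy spike surrounded by silence is averaged away.
print(smooth_and_threshold([0.0, 0.0, 0.9, 0.0, 0.0]))

# A sustained run of high probabilities crosses the threshold.
print(smooth_and_threshold([0.1, 0.8, 0.9, 0.9, 0.9]))

# Ring-buffer behaviour in the spirit of max_buffer_frames: once the
# buffer is full, appending a new frame evicts the oldest one.
ring = deque(maxlen=3)
for frame_id in range(5):
    ring.append(frame_id)
print(list(ring))
```

Under these assumptions, a larger window suppresses more isolated false alarms but reacts more slowly at speech onset, which is the trade-off `pad_start_frame` is meant to compensate for.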

Audio requirements

FireRedVAD only accepts 16 kHz, 16-bit mono PCM. Convert other formats with:

ffmpeg -i input.wav -ar 16000 -ac 1 -acodec pcm_s16le output.wav
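
Before feeding a file to the analyzer, you can verify that it already matches the required format and skip the conversion when it does. This is a minimal sketch using only the standard-library `wave` module; `is_fireredvad_compatible` is a hypothetical helper name introduced for illustration.

```python
import wave


def is_fireredvad_compatible(path: str) -> bool:
    """Return True if the WAV file is 16 kHz, 16-bit, mono, uncompressed PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz sample rate
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wav.getnchannels() == 1      # mono
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```

For example, `is_fireredvad_compatible("output.wav")` should return True for a file produced by the ffmpeg command above.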

Environment Variables

| Variable             | Description                                       |
|----------------------|---------------------------------------------------|
| FIREREDVAD_MODEL_DIR | Path to the downloaded Stream-VAD model directory. |
| FIREREDVAD_USE_GPU   | Set to 1 to enable GPU inference (default: 0).    |

See .env.example for a ready-to-copy template.
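
One way to turn these variables into constructor arguments is sketched below. It assumes your application reads the environment itself (whether the package also reads these variables internally is not stated here), and both `FireRedVADSettings` and `load_settings` are hypothetical names introduced for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FireRedVADSettings:
    model_dir: str
    use_gpu: bool


def load_settings(env=None) -> FireRedVADSettings:
    """Read FireRedVAD settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    model_dir = env.get("FIREREDVAD_MODEL_DIR")
    if not model_dir:
        raise RuntimeError("FIREREDVAD_MODEL_DIR is not set")
    # Only the literal value "1" enables the GPU; anything else means CPU.
    use_gpu = env.get("FIREREDVAD_USE_GPU", "0") == "1"
    return FireRedVADSettings(model_dir=model_dir, use_gpu=use_gpu)
```

The resulting values can then be passed as `model_dir=settings.model_dir` and `use_gpu=settings.use_gpu` when constructing the analyzer. Failing early on a missing model path gives a clearer error than a bare `KeyError` from `os.environ`.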


Contributing

Pull requests are welcome. For major changes, please open an issue first.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-change
  3. Run linting: ruff check . && ruff format .
  4. Run tests: pytest
  5. Open a PR

License

Apache License 2.0 — see LICENSE for details.

FireRedVAD model weights are released under their own license. See the FireRedVAD repository for details.

Download files

Source Distribution

pipecat_firered_vad-0.1.0.tar.gz (171.9 kB)

Uploaded Source

Built Distribution

pipecat_firered_vad-0.1.0-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file pipecat_firered_vad-0.1.0.tar.gz.

File metadata

  • Download URL: pipecat_firered_vad-0.1.0.tar.gz
  • Size: 171.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pipecat_firered_vad-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ac5a218e744569a0e09c1886d97fda9275dd65bdad8329eff9c8a2944e4020ca
MD5 b1df59f34cddb283d85b22eb883453ee
BLAKE2b-256 45af9826c46aed02af3db7e0ab3f24521b77e0e1b1507d72c8f356ed4991fd3d

File details

Details for the file pipecat_firered_vad-0.1.0-py3-none-any.whl.

File hashes

Hashes for pipecat_firered_vad-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 feff5008bc7c8102e824d0378a8fb39255dd5e132eae2d5d55b7116e3e1bac23
MD5 c2615114186edbe62024a60c6d89771a
BLAKE2b-256 be29e85cd268540d72bd5c544c52080eab52820ded12ab9ae08cfcb6602656b5
