Rust speech gate with Python bindings

speechgate-rs

Rust implementation of the FASR energy speech gate with Python bindings.

Install

Install the released package from PyPI:

pip install speechgate-rs

Install the latest version directly from GitHub:

pip install "git+https://github.com/di-osc/speechgate-rs.git"

For local development, install an editable release build with maturin:

uv sync
env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release

Usage

import numpy as np
from speechgate_rs import EnergySpeechGate

audio = np.zeros(16000, dtype=np.float32)
gate = EnergySpeechGate(base_thresh=0.008, max_thresh=0.035)
mask = gate.compute_keep_mask(audio, sample_rate=16000)
gated = gate.apply_array(audio, sample_rate=16000)

For streaming audio, keep one stateful gate per stream:

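import numpy as np
from speechgate_rs import StreamingEnergySpeechGate

# Placeholder input: one second of silence split into ten 100 ms chunks.
audio = np.zeros(16000, dtype=np.float32)
chunks = np.split(audio, 10)

stream_gate = StreamingEnergySpeechGate(stream_context_ms=3000, fade_ms=5)
for i, chunk in enumerate(chunks):
    gated_chunk = stream_gate.process_chunk(
        chunk,
        sample_rate=16000,
        is_last=(i == len(chunks) - 1),  # True only for the final chunk
    )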
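
When the length of a stream is not known in advance, the final chunk can still be flagged with a one-item lookahead. The sketch below uses only the standard library; with_last_flag and fixed_size_chunks are hypothetical helper names for illustration and are not part of speechgate_rs.

```python
from typing import Iterable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def with_last_flag(items: Iterable[T]) -> Iterator[Tuple[T, bool]]:
    """Yield (item, is_last) pairs, holding back one item as lookahead."""
    iterator = iter(items)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty stream: nothing to yield
    for current in iterator:
        yield previous, False
        previous = current
    yield previous, True  # the held-back item is the final one


def fixed_size_chunks(samples: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Split a sequence into consecutive chunks; the final one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


# Illustrative input: 35 samples in chunks of 10 gives lengths 10, 10, 10, 5.
samples = [0.0] * 35
flags = [(len(chunk), is_last)
         for chunk, is_last in with_last_flag(fixed_size_chunks(samples, 10))]
```

Each (chunk, is_last) pair can then be passed to process_chunk in place of a hard-coded is_last=False. The lookahead delays every chunk by one item, which may matter for latency-sensitive services.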

For realtime services that process interleaved streams, use the multi-stream wrapper:

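import numpy as np
from speechgate_rs import MultiStreamEnergySpeechGate

chunk = np.zeros(1600, dtype=np.float32)  # placeholder: 100 ms at 16 kHz

gate = MultiStreamEnergySpeechGate()
gated_chunk = gate.process_chunk(
    "session-1",  # stream identifier
    chunk,
    sample_rate=16000,
    is_last=False,
)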
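
Conceptually, a multi-stream wrapper maps each stream identifier to its own stateful gate. The sketch below illustrates that idea only; it is not the implementation of MultiStreamEnergySpeechGate. PerStreamRegistry and ChunkCounter are hypothetical names, and releasing state when is_last is True is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

S = TypeVar("S")


class PerStreamRegistry(Generic[S]):
    """Lazily create and look up one state object per stream identifier."""

    def __init__(self, factory: Callable[[], S]) -> None:
        self._factory = factory
        self._states: Dict[Hashable, S] = {}

    def get(self, stream_id: Hashable) -> S:
        if stream_id not in self._states:
            self._states[stream_id] = self._factory()
        return self._states[stream_id]

    def release(self, stream_id: Hashable) -> None:
        self._states.pop(stream_id, None)

    def __len__(self) -> int:
        return len(self._states)


class ChunkCounter:
    """Stand-in for a stateful per-stream gate; it only counts chunks."""

    def __init__(self) -> None:
        self.seen = 0

    def process_chunk(self, chunk, is_last: bool = False):
        self.seen += 1
        return chunk


registry = PerStreamRegistry(ChunkCounter)


def process(stream_id: Hashable, chunk, is_last: bool):
    """Route a chunk to its stream's state; drop the state after the last chunk."""
    state = registry.get(stream_id)
    out = state.process_chunk(chunk, is_last=is_last)
    if is_last:
        registry.release(stream_id)  # assumed cleanup policy
    return out
```

Interleaved calls such as process("a", ...), process("b", ...), process("a", ...) then update two independent states, so one stream's history never leaks into another.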

The binding keeps the same energy-gate semantics as the Python EnergySpeechGate implementation in fasr-service-realtime: adaptive RMS thresholding, short voice-burst removal, short silence-gap filling, padding, silence pass windows, streaming context, cross-chunk fade continuity, and fade envelopes.
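
Three of the steps named above (short voice-burst removal, short silence-gap filling, and padding) operate on the per-window voice mask. The following is a minimal standard-library sketch of such mask cleanup, not the Rust implementation. The function names are hypothetical, and the order of the steps and the treatment of gaps at the edges of the audio are assumptions.

```python
from typing import List, Tuple


def runs(mask: List[bool]) -> List[Tuple[int, int, bool]]:
    """Return (start, end_exclusive, value) for each run of equal values."""
    out = []
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            out.append((start, i, mask[start]))
            start = i
    return out


def drop_short_bursts(mask: List[bool], min_len: int) -> List[bool]:
    """Clear voice runs shorter than min_len windows."""
    result = list(mask)
    for start, end, value in runs(mask):
        if value and end - start < min_len:
            result[start:end] = [False] * (end - start)
    return result


def fill_short_gaps(mask: List[bool], max_gap: int) -> List[bool]:
    """Mark silent runs of at most max_gap windows as voice when they sit
    between two voice runs (leading and trailing silence is left alone)."""
    result = list(mask)
    for start, end, value in runs(mask):
        interior = start > 0 and end < len(mask)
        if not value and interior and end - start <= max_gap:
            result[start:end] = [True] * (end - start)
    return result


def pad_voice(mask: List[bool], pad: int) -> List[bool]:
    """Extend every voice run by pad windows on both sides."""
    result = list(mask)
    for start, end, value in runs(mask):
        if value:
            for i in range(max(0, start - pad), min(len(mask), end + pad)):
                result[i] = True
    return result


def clean_mask(mask: List[bool], min_voice_windows: int = 5,
               max_silence_gap_windows: int = 8,
               pad_voice_windows: int = 2) -> List[bool]:
    """Apply the three steps in an assumed order: bursts, gaps, padding."""
    mask = drop_short_bursts(mask, min_voice_windows)
    mask = fill_short_gaps(mask, max_silence_gap_windows)
    return pad_voice(mask, pad_voice_windows)


# Illustrative mask: a 2-window click, then two 6-window phrases 3 windows apart.
raw = ([False] * 3 + [True] * 2 + [False] * 10 + [True] * 6
       + [False] * 3 + [True] * 6 + [False] * 5)
cleaned = clean_mask(raw)
```

In this example the 2-window click is removed, the 3-window pause between the phrases is filled, and the merged region grows by 2 windows on each side.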

Parameters

| Parameter | Default | Suggested range | Meaning | Increase / decrease effect |
|---|---|---|---|---|
| enabled | True | True or False | Enables the gate. When False, apply_array returns the input audio unchanged. | Turn on to filter silence/noise; turn off to bypass the gate completely. |
| window_ms | 10 | 5-30 ms | Analysis window length. RMS energy is computed once per window. | Larger is steadier but slower to react; smaller reacts faster but is more sensitive to clicks and short spikes. |
| base_thresh | 0.008 | 0.001-0.03 RMS | Minimum RMS threshold. The adaptive threshold will never go below this value. | Larger rejects more quiet speech/noise; smaller keeps softer speech but may pass more background noise. |
| threshold_ratio | 2.0 | 1.0-5.0 | Multiplier applied to the estimated noise floor before clamping. | Larger makes the gate stricter in noisy audio; smaller opens the gate more easily. |
| max_thresh | 0.035 | 0.01-0.1 RMS | Maximum RMS threshold. The adaptive threshold will never go above this value. | Larger allows the adaptive threshold to become stricter in loud noise; smaller protects quieter speech from being rejected. |
| smooth_alpha | 0.2 | 0.01-1.0 | Exponential smoothing factor for per-window RMS values. | Larger follows energy changes faster but may flicker; smaller is steadier but can lag at speech boundaries. |
| min_voice_windows | 5 | 1-20 windows | Minimum consecutive voice windows required to keep a speech region. | Larger removes more short bursts but can drop very short words; smaller keeps brief sounds but may pass clicks. |
| attenuation | 0.0 | 0.0-1.0 gain | Gain applied to rejected audio. 0.0 fully mutes it. | Larger keeps more background ambience; smaller makes rejected regions quieter. |
| noise_floor_percentile | 20.0 | 1.0-50.0 | Percentile of smoothed RMS values used as the adaptive noise floor estimate. | Larger estimates a higher noise floor and becomes stricter; smaller estimates quieter background and opens more easily. |
| max_silence_gap_windows | 8 | 0-30 windows | Maximum silent gap to fill between two voice regions. | Larger preserves pauses inside speech but may keep noise between phrases; smaller cuts internal pauses more aggressively. |
| fade_ms | 5 | 0-50 ms | Fade length when switching between kept and rejected audio. | Larger makes transitions smoother but may smear boundaries; smaller is tighter but can click or sound abrupt. |
| stream_context_ms | 3000 | 0-10000 ms | Context duration used by streaming integrations to preserve recent audio history. Stateless array APIs keep it for config parity. | Larger gives streaming code more history but uses more memory; smaller is lighter but has less context. |
| pad_voice_windows | 2 | 0-20 windows | Windows added before and after detected voice regions. | Larger protects speech starts/ends but keeps more surrounding noise; smaller trims tighter but can clip onsets or offsets. |
| pass_windows | 0 | 0-20 windows | Non-voice windows kept after a voice region as a trailing hold. | Larger makes streaming output less abrupt; smaller removes trailing silence sooner. |
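
The threshold-related parameters (window_ms, smooth_alpha, noise_floor_percentile, threshold_ratio, base_thresh, max_thresh) can be read as a small pipeline: per-window RMS, exponential smoothing, a percentile-based noise floor, then a clamped threshold. The sketch below follows that reading using only the standard library. It is an illustration, not the library's code: the function names, the handling of a trailing partial window, the percentile interpolation, and the strict `>` comparison are all assumptions.

```python
import math
from typing import List, Sequence


def window_rms(samples: Sequence[float], window: int) -> List[float]:
    """RMS of each consecutive full window (a trailing partial window is skipped)."""
    values = []
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        values.append(math.sqrt(sum(x * x for x in frame) / window))
    return values


def smooth(values: Sequence[float], alpha: float) -> List[float]:
    """Exponential smoothing: y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]."""
    out: List[float] = []
    for x in values:
        prev = out[-1] if out else x
        out.append(alpha * x + (1.0 - alpha) * prev)
    return out


def percentile(values: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile, with pct in the range 0..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def adaptive_threshold(smoothed: Sequence[float], base_thresh: float = 0.008,
                       threshold_ratio: float = 2.0, max_thresh: float = 0.035,
                       noise_floor_percentile: float = 20.0) -> float:
    """Scale the noise-floor estimate, then clamp to [base_thresh, max_thresh]."""
    noise_floor = percentile(smoothed, noise_floor_percentile)
    return min(max(noise_floor * threshold_ratio, base_thresh), max_thresh)


# Illustrative input at 16 kHz with window_ms=10, i.e. 160 samples per window:
# eight quiet windows (amplitude 0.01) followed by two loud ones (amplitude 0.2).
window = 16000 * 10 // 1000
samples = [0.01] * (8 * window) + [0.2] * (2 * window)
smoothed = smooth(window_rms(samples, window), alpha=0.2)
threshold = adaptive_threshold(smoothed)
voice_mask = [value > threshold for value in smoothed]
```

Here the noise floor is about 0.01, so the threshold lands near 0.02, inside the clamp range, and only the two loud windows are marked as voice. A noise floor below 0.004 would be held at base_thresh, and one above 0.0175 would be capped at max_thresh.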

Verify

For performance-sensitive checks, build the native extension in release mode before running tests:

env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release
uv run pytest tests -q

The test suite includes a NumPy reference implementation. It verifies that the Rust binding returns masks, gated output, and compacted output identical to the reference, and that it runs faster than the reference on the benchmark audio.
