speechgate-rs
Rust implementation of the FASR energy speech gate with Python bindings.
Install
Install the package from PyPI after a release is published:
pip install speechgate-rs
Install the latest version directly from GitHub:
pip install "git+https://github.com/di-osc/speechgate-rs.git"
For local development, install an editable release build with maturin:
uv sync
env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release
Usage
import numpy as np
from speechgate_rs import EnergySpeechGate
audio = np.zeros(16000, dtype=np.float32)
gate = EnergySpeechGate(base_thresh=0.008, max_thresh=0.035)
mask = gate.compute_keep_mask(audio, sample_rate=16000)
gated = gate.apply_array(audio, sample_rate=16000)
For streaming audio, keep one stateful gate per stream:
from speechgate_rs import StreamingEnergySpeechGate
stream_gate = StreamingEnergySpeechGate(stream_context_ms=3000, fade_ms=5)
for chunk in chunks:
    gated_chunk = stream_gate.process_chunk(
        chunk,
        sample_rate=16000,
        is_last=False,
    )
For realtime services that process interleaved streams, use the multi-stream wrapper:
from speechgate_rs import MultiStreamEnergySpeechGate
gate = MultiStreamEnergySpeechGate()
gated_chunk = gate.process_chunk(
    "session-1",
    chunk,
    sample_rate=16000,
    is_last=False,
)
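The per-stream bookkeeping that a multi-stream wrapper implies can be sketched in plain Python. This is only an illustration of the pattern, not the library's internals: `ToyStreamState` and `ToyMultiStream` are hypothetical names, and dropping a stream's state once `is_last=True` arrives is an assumption about how such a wrapper might behave.

```python
class ToyStreamState:
    """Hypothetical stand-in for one stream's gate state (not a speechgate_rs type)."""

    def __init__(self):
        self.samples_seen = 0

    def process_chunk(self, chunk, is_last=False):
        # A real gate would mute or attenuate rejected regions here;
        # this stand-in only tracks how much audio the stream has seen.
        self.samples_seen += len(chunk)
        return list(chunk)


class ToyMultiStream:
    """Routes interleaved chunks to one state object per stream id."""

    def __init__(self):
        self._streams = {}

    def process_chunk(self, stream_id, chunk, is_last=False):
        state = self._streams.get(stream_id)
        if state is None:
            state = ToyStreamState()
            self._streams[stream_id] = state
        out = state.process_chunk(chunk, is_last=is_last)
        if is_last:
            # Assumption: state is released when the stream ends.
            del self._streams[stream_id]
        return out

    def active_streams(self):
        return sorted(self._streams)


router = ToyMultiStream()
router.process_chunk("session-1", [0.0, 0.1])
router.process_chunk("session-2", [0.2])
router.process_chunk("session-1", [0.3], is_last=True)
remaining = router.active_streams()
```

Keying state by stream id is what lets chunks from different sessions arrive in any order without their histories mixing.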
The binding keeps the same energy-gate semantics as the Python EnergySpeechGate
implementation in fasr-service-realtime: adaptive RMS thresholding, short
voice-burst removal, short silence-gap filling, padding, silence pass windows,
streaming context, cross-chunk fade continuity, and fade envelopes.
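As a rough, dependency-free sketch of the adaptive RMS thresholding mentioned above: compute RMS per window, smooth it exponentially, take a percentile as the noise floor, scale it, and clamp it between the base and maximum thresholds. The helper names are hypothetical, the keyword names and defaults mirror the Parameters table, and details such as the percentile interpolation and the strict `>` comparison are assumptions rather than the binding's confirmed behavior.

```python
import math


def window_rms(samples, window_len):
    """RMS per window; a trailing partial window is kept as its own window."""
    out = []
    for start in range(0, len(samples), window_len):
        win = samples[start:start + window_len]
        out.append(math.sqrt(sum(x * x for x in win) / len(win)))
    return out


def smooth(values, alpha):
    """Exponential smoothing: s[i] = alpha * v[i] + (1 - alpha) * s[i - 1]."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def adaptive_threshold(smoothed_rms, base_thresh=0.008, threshold_ratio=2.0,
                       max_thresh=0.035, noise_floor_percentile=20.0):
    """Scale the noise-floor estimate, then clamp to [base_thresh, max_thresh]."""
    floor = percentile(smoothed_rms, noise_floor_percentile)
    return min(max(threshold_ratio * floor, base_thresh), max_thresh)


sample_rate = 16000
window_len = sample_rate * 10 // 1000  # window_ms=10 -> 160 samples
quiet = [0.001] * 8000                 # 0.5 s of near-silence
loud = [0.1 if i % 2 == 0 else -0.1 for i in range(8000)]  # 0.5 s at RMS 0.1
rms = smooth(window_rms(quiet + loud, window_len), alpha=0.2)
thresh = adaptive_threshold(rms)
voice = [r > thresh for r in rms]      # one flag per 10 ms window
```

In this toy signal the scaled noise floor falls below `base_thresh`, so the clamp decides the threshold; in louder background noise the `max_thresh` clamp plays the same role from above.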
Parameters
| Parameter | Default | Suggested range | Meaning | Increase / decrease effect |
|---|---|---|---|---|
| `enabled` | `True` | `True` or `False` | Enables the gate. When `False`, `apply_array` returns the input audio unchanged. | Turn on to filter silence/noise; turn off to bypass the gate completely. |
| `window_ms` | 10 | 5-30 ms | Analysis window length. RMS energy is computed once per window. | Larger is steadier but slower to react; smaller reacts faster but is more sensitive to clicks and short spikes. |
| `base_thresh` | 0.008 | 0.001-0.03 RMS | Minimum RMS threshold. The adaptive threshold will never go below this value. | Larger rejects more quiet speech/noise; smaller keeps softer speech but may pass more background noise. |
| `threshold_ratio` | 2.0 | 1.0-5.0 | Multiplier applied to the estimated noise floor before clamping. | Larger makes the gate stricter in noisy audio; smaller opens the gate more easily. |
| `max_thresh` | 0.035 | 0.01-0.1 RMS | Maximum RMS threshold. The adaptive threshold will never go above this value. | Larger allows the adaptive threshold to become stricter in loud noise; smaller protects quieter speech from being rejected. |
| `smooth_alpha` | 0.2 | 0.01-1.0 | Exponential smoothing factor for per-window RMS values. | Larger follows energy changes faster but may flicker; smaller is steadier but can lag at speech boundaries. |
| `min_voice_windows` | 5 | 1-20 windows | Minimum consecutive voice windows required to keep a speech region. | Larger removes more short bursts but can drop very short words; smaller keeps brief sounds but may pass clicks. |
| `attenuation` | 0.0 | 0.0-1.0 gain | Gain applied to rejected audio. 0.0 fully mutes it. | Larger keeps more background ambience; smaller makes rejected regions quieter. |
| `noise_floor_percentile` | 20.0 | 1.0-50.0 | Percentile of smoothed RMS values used as the adaptive noise floor estimate. | Larger estimates a higher noise floor and becomes stricter; smaller estimates quieter background and opens more easily. |
| `max_silence_gap_windows` | 8 | 0-30 windows | Maximum silent gap to fill between two voice regions. | Larger preserves pauses inside speech but may keep noise between phrases; smaller cuts internal pauses more aggressively. |
| `fade_ms` | 5 | 0-50 ms | Fade length when switching between kept and rejected audio. | Larger makes transitions smoother but may smear boundaries; smaller is tighter but can click or sound abrupt. |
| `stream_context_ms` | 3000 | 0-10000 ms | Context duration used by streaming integrations to preserve recent audio history. Stateless array APIs keep it for config parity. | Larger gives streaming code more history but uses more memory; smaller is lighter but has less context. |
| `pad_voice_windows` | 2 | 0-20 windows | Windows added before and after detected voice regions. | Larger protects speech starts/ends but keeps more surrounding noise; smaller trims tighter but can clip onsets or offsets. |
| `pass_windows` | 0 | 0-20 windows | Non-voice windows kept after a voice region as a trailing hold. | Larger makes streaming output less abrupt; smaller removes trailing silence sooner. |
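To make the window-count parameters more concrete, here is a small pure-Python sketch of how `min_voice_windows`, `max_silence_gap_windows`, and `pad_voice_windows` could reshape a per-window voice mask. The helper names are hypothetical, and the order of the three steps (burst removal, then gap filling, then padding) is an assumption; the Rust implementation may order or combine them differently.

```python
def runs(mask):
    """Yield (value, start, end) for each run of equal values; end is exclusive."""
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            yield mask[start], start, i
            start = i


def drop_short_voice(mask, min_voice_windows):
    """Reject voice runs shorter than min_voice_windows."""
    out = list(mask)
    for value, start, end in runs(mask):
        if value and end - start < min_voice_windows:
            out[start:end] = [False] * (end - start)
    return out


def fill_short_gaps(mask, max_silence_gap_windows):
    """Keep silent runs that sit between two voice runs and are short enough."""
    out = list(mask)
    for value, start, end in runs(mask):
        between_voice = start > 0 and end < len(mask)
        if not value and between_voice and end - start <= max_silence_gap_windows:
            out[start:end] = [True] * (end - start)
    return out


def pad_voice(mask, pad_voice_windows):
    """Extend every voice run by pad_voice_windows on both sides."""
    out = list(mask)
    for value, start, end in runs(mask):
        if value:
            lo = max(0, start - pad_voice_windows)
            hi = min(len(mask), end + pad_voice_windows)
            out[lo:hi] = [True] * (hi - lo)
    return out


def to_text(mask):
    """Render a mask with '#' for kept windows and '.' for rejected ones."""
    return "".join("#" if keep else "." for keep in mask)


# One isolated click, then two phrases separated by a 3-window pause.
raw = [c == "#" for c in "..#.." + "#" * 6 + "..." + "#" * 5 + "." * 12]
step1 = drop_short_voice(raw, min_voice_windows=5)
step2 = fill_short_gaps(step1, max_silence_gap_windows=8)
cleaned = pad_voice(step2, pad_voice_windows=2)
print(to_text(raw))
print(to_text(cleaned))
```

With the table defaults, the single-window click is dropped, the short pause between the two phrases is kept, and the merged region grows by two windows on each side.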
Verify
For performance-sensitive checks, build the native extension in release mode before running tests:
env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release
uv run pytest tests -q
The test suite includes a NumPy reference implementation. It verifies that the Rust binding returns masks, gated output, and compacted output identical to the reference, and that it runs faster than the reference on the benchmark audio.