
DTLN noise suppression plugin for LiveKit Agents — self-hosted, in-process, no cloud API


livekit-plugins-dtln

Python LiveKit plugin for DTLN (Dual-Signal Transformation LSTM Network) noise suppression — a fully self-hosted, open-source alternative to cloud-based noise cancellation services like Krisp or AI-coustics.

Runs entirely in-process using ONNX Runtime. No cloud API, no per-minute fees, no proprietary binaries. Works with self-hosted LiveKit servers.

Based on Westhausen & Meyer, "Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression", Interspeech 2020.
Original implementation: github.com/breizhn/DTLN

Live audio comparison demo →


Why DTLN?

| | DTLN (this plugin) | Krisp / AI-coustics |
| --- | --- | --- |
| Hosting | Self-hosted, in-process | Cloud API required |
| Cost | Free (open weights) | Per-minute billing |
| LiveKit | Works with self-hosted | Requires LiveKit Cloud |
| Latency | ~8 ms (one block shift) | Network round-trip |
| Privacy | Audio never leaves your server | Audio sent to third party |
| Real-time factor | ~0.05× | Varies |

Installation

pip:

pip install livekit-plugins-dtln

requirements.txt:

livekit-plugins-dtln

From source:

git clone https://github.com/aloware/livekit-plugins-dtln.git
pip install -e ./livekit-plugins-dtln

The pretrained ONNX model weights (~4 MB) are bundled in the PyPI wheel — no separate download step needed.


Usage

Session pipeline (recommended)

from livekit.agents import room_io
from livekit.plugins import dtln

await session.start(
    # ...,
    room_options=room_io.RoomOptions(
        audio_input=room_io.AudioInputOptions(
            noise_cancellation=dtln.noise_suppression(),
        ),
    ),
)

Custom AudioStream

from livekit import rtc
from livekit.plugins import dtln

stream = rtc.AudioStream.from_track(
    track=track,
    noise_cancellation=dtln.noise_suppression(),
)

Note: Create one dtln.noise_suppression() instance per session. Each instance holds stateful LSTM hidden states, so it must process exactly one audio stream and must not be shared or reused across sessions.
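
Why sharing an instance is a problem can be seen with a toy stateful filter. The sketch below does not use the plugin: `StatefulSmoother` is a hypothetical stand-in, and its single float plays the role of the LSTM hidden state.

```python
class StatefulSmoother:
    """Hypothetical stand-in for a stateful denoiser (not the plugin's class)."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.state = 0.0  # plays the role of the LSTM hidden state

    def process(self, samples: list[float]) -> list[float]:
        out = []
        for x in samples:
            # Each output depends on everything this instance has seen so far.
            self.state = self.alpha * x + (1 - self.alpha) * self.state
            out.append(self.state)
        return out


caller_a = [1.0, 1.0, 1.0]
caller_b = [0.0, 0.0, 0.0]

# One instance per stream: caller B's silence stays silent.
isolated_b = StatefulSmoother().process(caller_b)

# Shared instance: state left over from caller A leaks into caller B's output.
shared = StatefulSmoother()
shared.process(caller_a)
leaked_b = shared.process(caller_b)
```

With a fresh instance, `isolated_b` is all zeros; with the shared instance, `leaked_b` starts above zero because the filter still remembers caller A.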

Note: DTLN is trained on raw microphone audio. Do not chain it with another noise cancellation model: the second model in the chain receives already-processed audio that differs from its training data, and the results are unpredictable.

Tuning suppression strength

dtln.noise_suppression(
    strength=0.5,  # 0.0 = bypass, 1.0 = full suppression (default: 0.5)
)

strength is a wet/dry blend factor. At 0.5, the output is an equal mix of the denoised signal and the original. Lower values preserve more of the original audio — useful if the model is over-suppressing speech (e.g. on telephone/SIP audio). Higher values apply more aggressive noise reduction.
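
A wet/dry blend of this kind can be sketched as a per-sample linear mix. This is an illustration of the idea only, assuming a plain linear interpolation; `blend` is a hypothetical helper, not a function exported by the plugin.

```python
def blend(dry: list[float], wet: list[float], strength: float) -> list[float]:
    """Mix the original (dry) and denoised (wet) signals sample by sample."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    if len(dry) != len(wet):
        raise ValueError("dry and wet must have the same length")
    # strength = 0.0 returns the original, 1.0 returns the denoised signal.
    return [strength * w + (1.0 - strength) * d for d, w in zip(dry, wet)]


original = [0.8, -0.4, 0.2]
denoised = [0.6, -0.2, 0.0]
mixed = blend(original, denoised, strength=0.5)  # halfway between the two
```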

Debug logging

dtln.noise_suppression(debug_logging=True)

Logs per-block diagnostics (spectral mask mean/min/max, input and output RMS) at DEBUG level every 100 blocks (~800 ms). Useful for diagnosing over-suppression: if mask_mean is consistently below 0.3, the model is treating speech as noise — lower strength.
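
Python's default logging configuration discards DEBUG messages, so the application has to enable them before these diagnostics appear. A minimal sketch using only the standard library follows; the source does not state the plugin's logger name, so this configures the root logger instead of a specific one.

```python
import logging

# Show DEBUG-level records from every logger, including the plugin's.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,  # replace any handlers that were configured earlier
)

# Where the "~800 ms" figure comes from: one block is one 8 ms shift.
BLOCK_MS = 8
LOG_EVERY_N_BLOCKS = 100
log_interval_ms = BLOCK_MS * LOG_EVERY_N_BLOCKS
```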

Custom model paths

dtln.noise_suppression(
    model_1_path="/path/to/model_1.onnx",
    model_2_path="/path/to/model_2.onnx",
)

Requirements

  • Python >= 3.10
  • livekit >= 1.0.25
  • livekit-agents >= 1.4.4
  • onnxruntime >= 1.17.0
  • numpy >= 1.26.0

How It Works

DTLN uses two sequential LSTM-based models:

  1. Model 1 — Spectral masking: Computes the magnitude spectrum of a 32 ms window, runs it through an LSTM to produce a spectral mask, applies the mask in the frequency domain (preserving phase), and reconstructs the time-domain signal via IFFT.

  2. Model 2 — Time-domain refinement: Refines the output of Model 1 with a second LSTM that operates directly on the waveform, capturing residual artifacts that spectral processing misses.

The two models are chained: Model 1's output feeds Model 2. Both LSTMs are stateful — their hidden states persist across audio frames, giving the network temporal context across the full duration of a call.
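
The masking step in Model 1 (scale each frequency bin's magnitude, keep its phase) can be sketched in pure Python. This is a simplified illustration, not the plugin's code: it uses a naive DFT on an 8-sample frame instead of a 512-sample FFT, and the mask is supplied by hand where the real model would predict it with an LSTM. All function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform (fine for tiny frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def apply_spectral_mask(frame, mask):
    """Scale each bin's magnitude by the mask while keeping its phase."""
    spectrum = dft(frame)
    masked = [
        cmath.rect(abs(c) * m, cmath.phase(c))  # new magnitude, original phase
        for c, m in zip(spectrum, mask)
    ]
    return idft(masked)


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
passthrough = apply_spectral_mask(frame, [1.0] * 8)  # mask of ones: unchanged
attenuated = apply_spectral_mask(frame, [0.5] * 8)   # uniform mask: half amplitude
```

A mask value near 1.0 keeps a bin and a value near 0.0 removes it, which is why a consistently low mean mask indicates that most of the signal, speech included, is being suppressed.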

Signal flow:

Input frame (any sample rate, any channels)
  → downsample to 16 kHz mono
  → overlap-add loop (512-sample window, 128-sample shift)
      → FFT → magnitude → Model 1 (spectral mask) → masked IFFT
      → Model 2 (time-domain refinement)
  → upsample back to original sample rate
  → restore original channel count
→ Denoised output frame

The overlap-add synthesis uses 75% overlap (512-sample window, 128-sample shift), identical to the original DTLN paper. This gives ~8 ms of algorithmic latency at 16 kHz.
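
The framing numbers above follow directly from the window and shift sizes. The sketch below works through the arithmetic and shows how a sliding 512-sample buffer advances by 128 samples per block; `sliding_blocks` is a hypothetical helper that covers only the framing half of the loop (no model inference, no overlap-add of the outputs).

```python
SAMPLE_RATE = 16_000  # Hz, the rate the models run at
WINDOW = 512          # samples per analysis window
SHIFT = 128           # samples the window advances per block

window_ms = 1000 * WINDOW / SAMPLE_RATE  # 32.0
shift_ms = 1000 * SHIFT / SAMPLE_RATE    # 8.0
overlap = 1 - SHIFT / WINDOW             # 0.75


def sliding_blocks(samples, window=WINDOW, shift=SHIFT):
    """Yield the analysis buffer after each shift-sized chunk is pushed in."""
    buffer = [0.0] * window  # starts as silence
    for start in range(0, len(samples) - shift + 1, shift):
        # Drop the oldest `shift` samples and append the newest ones.
        buffer = buffer[shift:] + list(samples[start:start + shift])
        yield list(buffer)


blocks = list(sliding_blocks(list(range(1024))))
```

Each new block shares 384 of its 512 samples with the previous one, so every input sample is seen by four consecutive blocks.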


Performance

Benchmarked on Apple M3 Pro, processing 16 kHz mono audio:

| Metric | Value |
| --- | --- |
| Steady-state latency per block | ~0.7 ms |
| Real-time factor | ~0.05× |
| Cold-start (first inference) | ~500 ms (amortized by warmup in __init__) |

The __init__ method runs a dummy forward pass so that ONNX Runtime's one-time first-inference setup happens before the first real audio frame arrives, which avoids the cold-start stall.
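
The warmup pattern itself is simple and can be sketched without ONNX Runtime. `WarmedUpModel` below is a hypothetical wrapper, and the `time.sleep` call merely simulates a one-time setup cost; it is not how the plugin is implemented.

```python
import time


class WarmedUpModel:
    """Hypothetical wrapper; `_infer` stands in for a real inference call."""

    def __init__(self, block_size: int = 512) -> None:
        self._initialized = False
        # Dummy forward pass on silence so the one-time cost is paid here,
        # not on the first real audio block.
        self._infer([0.0] * block_size)

    def _infer(self, block):
        if not self._initialized:
            time.sleep(0.05)  # simulated one-time setup cost
            self._initialized = True
        return list(block)

    def process(self, block):
        return self._infer(block)


model = WarmedUpModel()
start = time.perf_counter()
model.process([0.1] * 512)
first_real_call_s = time.perf_counter() - start  # no setup cost included
```

A stateful model would also need its hidden states reset after the dummy pass so that the warmup does not influence real audio; the source does not say how the plugin handles this.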

Noise reduction on sample audio

Tested by running original audio files through DTLNNoiseSuppressor and measuring RMS reduction:

| File | Noise Level | RMS Reduction | Notes |
| --- | --- | --- | --- |
| krisp-original.mp3 | Moderate noise | 37.1% | Active suppression |
| taxi-sample.mp3 | Heavy background noise | 48.6% | Strong suppression |
| noproblem_raw.wav | Clean speech | 34.1% | Correctly preserves speech |

Run python tests/test_noise_suppression.py to reproduce.
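
For readers who want to compute the same kind of metric on their own recordings, RMS reduction can be sketched as below. This assumes the figure is one minus the ratio of output RMS to input RMS; the test script may compute it differently, and both helper names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def rms_reduction_percent(original, processed):
    """Percentage by which the RMS level dropped after processing."""
    return (1.0 - rms(processed) / rms(original)) * 100.0


noisy = [1.0, -1.0, 1.0, -1.0]
cleaned = [0.5, -0.5, 0.5, -0.5]  # same signal at half the amplitude
reduction = rms_reduction_percent(noisy, cleaned)
```

RMS reduction measures only how much energy was removed. It cannot tell removed noise from removed speech, so it is best read alongside a listening check.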


Models

Pretrained weights are the official DTLN models published by the original authors:

| File | Source |
| --- | --- |
| model_1.onnx | breizhn/DTLN · pretrained_model/ |
| model_2.onnx | breizhn/DTLN · pretrained_model/ |

The model files are not committed to this Git repository (to keep it lightweight), although the PyPI wheel bundles them. In a source checkout, download them with python agent.py download-files or by calling download_models() directly.


License

The plugin code in this repository is released under the MIT License.

The pretrained DTLN model weights are published by the original authors under the MIT License — see breizhn/DTLN.
