DTLN noise suppression plugin for LiveKit Agents — self-hosted, in-process, no cloud API

livekit-plugins-dtln

Python LiveKit plugin for DTLN (Dual-Signal Transformation LSTM Network) noise suppression — a fully self-hosted, open-source alternative to cloud-based noise cancellation services like Krisp or AI-coustics.

Runs entirely in-process using ONNX Runtime. No cloud API, no per-minute fees, no proprietary binaries. Works with self-hosted LiveKit servers.

Based on Westhausen & Meyer, "Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression", Interspeech 2020.

Original implementation: github.com/breizhn/DTLN

Live audio comparison demo →


Why DTLN?

|  | DTLN (this plugin) | Krisp / AI-coustics |
| --- | --- | --- |
| Hosting | Self-hosted, in-process | Cloud API required |
| Cost | Free (open weights) | Per-minute billing |
| LiveKit | Works with self-hosted | Requires LiveKit Cloud |
| Latency | ~8 ms (one block shift) | Network round-trip |
| Privacy | Audio never leaves your server | Audio sent to third party |
| Real-time factor | ~0.05× (20× faster than real-time) | Varies |

Installation

pip:

pip install livekit-plugins-dtln

requirements.txt:

livekit-plugins-dtln

From source:

git clone https://github.com/aloware/livekit-plugins-dtln.git
pip install -e ./livekit-plugins-dtln

The pretrained ONNX model weights (~4 MB) are bundled in the PyPI wheel — no separate download step needed.


Usage

Session pipeline (recommended)

from livekit.agents import room_io
from livekit.plugins import dtln

await session.start(
    # ...,
    room_options=room_io.RoomOptions(
        audio_input=room_io.AudioInputOptions(
            noise_cancellation=dtln.noise_suppression(),
        ),
    ),
)

Custom AudioStream

from livekit import rtc
from livekit.plugins import dtln

stream = rtc.AudioStream.from_track(
    track=track,
    noise_cancellation=dtln.noise_suppression(),
)

Note: Create one dtln.noise_suppression() instance per session. Each instance holds stateful LSTM hidden states that must be scoped to a single call.

Note: DTLN is trained on raw microphone audio. Do not chain it with another noise cancellation model — applying two models in series produces unexpected results.

Tuning suppression strength

dtln.noise_suppression(
    strength=0.5,  # 0.0 = bypass, 1.0 = full suppression (default: 0.5)
)

strength is a wet/dry blend factor. At 0.5, the output is an equal mix of the denoised signal and the original. Lower values preserve more of the original audio — useful if the model is over-suppressing speech (e.g. on telephone/SIP audio). Higher values apply more aggressive noise reduction.
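To make the blend concrete, here is a minimal sketch of a linear wet/dry mix. The function name blend_wet_dry is hypothetical, and the linear formula is an assumption consistent with the description above, not necessarily the plugin's exact code.

```python
import numpy as np


def blend_wet_dry(original: np.ndarray, denoised: np.ndarray, strength: float) -> np.ndarray:
    """Linear wet/dry mix (assumed formula, hypothetical helper).

    strength = 0.0 returns the original signal (bypass),
    strength = 1.0 returns the fully denoised signal.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    # Weighted sum of the denoised ("wet") and original ("dry") signals.
    return strength * denoised + (1.0 - strength) * original


# Example: at 0.5 each output sample is the average of the two inputs.
dry = np.array([1.0, -1.0, 0.5])
wet = np.array([0.0, 0.0, 0.5])
mixed = blend_wet_dry(dry, wet, strength=0.5)
```

Under this assumption, lowering strength moves the output linearly back toward the unprocessed audio, which is why it helps when speech is being over-suppressed.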

Debug logging

dtln.noise_suppression(debug_logging=True)

Logs per-block diagnostics (spectral mask mean/min/max, input and output RMS) at DEBUG level every 100 blocks (~800 ms). Useful for diagnosing over-suppression: if mask_mean is consistently below 0.3, the model is treating speech as noise — lower strength.
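The diagnostics described above can be illustrated with a small standalone sketch. The helper names (rms, summarize_block, is_over_suppressing) are hypothetical and are not part of the plugin; only the 0.3 threshold on mask_mean comes from the text above.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def summarize_block(mask, block_in, block_out):
    """Collect the per-block statistics named above (hypothetical helper)."""
    return {
        "mask_mean": sum(mask) / len(mask),
        "mask_min": min(mask),
        "mask_max": max(mask),
        "in_rms": rms(block_in),
        "out_rms": rms(block_out),
    }


def is_over_suppressing(mask_means, threshold=0.3):
    """True when every observed mask_mean is below the threshold."""
    return bool(mask_means) and all(m < threshold for m in mask_means)
```

If is_over_suppressing returns True for the mask_mean values collected while someone is speaking, lowering strength is the suggested remedy.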

Custom model paths

dtln.noise_suppression(
    model_1_path="/path/to/model_1.onnx",
    model_2_path="/path/to/model_2.onnx",
)

Requirements

  • Python >= 3.10
  • livekit >= 1.1.0
  • livekit-agents >= 1.4.4
  • onnxruntime >= 1.17.0
  • numpy >= 1.26.0

How It Works

DTLN uses two sequential LSTM-based models:

  1. Model 1 — Spectral masking: Computes the magnitude spectrum of a 32 ms window, runs it through an LSTM to produce a spectral mask, applies the mask in the frequency domain (preserving phase), and reconstructs the time-domain signal via IFFT.

  2. Model 2 — Time-domain refinement: Refines the output of Model 1 with a second LSTM that operates directly on the waveform, capturing residual artifacts that spectral processing misses.

The two models are chained: Model 1's output feeds Model 2. Both LSTMs are stateful — their hidden states persist across audio frames, giving the network temporal context across the full duration of a call.
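The spectral masking step of Model 1 can be sketched for a single 512-sample frame as follows. This is an illustration only: mask_fn is a hypothetical stand-in for the LSTM (the real model also carries hidden state between frames), and apply_spectral_mask is not a function from the plugin.

```python
import numpy as np

BLOCK_LEN = 512  # 32 ms at 16 kHz


def apply_spectral_mask(frame: np.ndarray, mask_fn) -> np.ndarray:
    """Mask the magnitude spectrum of one frame while keeping its phase.

    mask_fn is a hypothetical stand-in for Model 1: it receives the
    magnitude spectrum and returns a mask of the same shape.
    """
    spectrum = np.fft.rfft(frame)          # 257 complex bins for 512 samples
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    mask = mask_fn(magnitude)              # values near 0 suppress, near 1 keep
    masked = magnitude * mask * np.exp(1j * phase)  # reuse the original phase
    return np.fft.irfft(masked, n=len(frame))       # back to the time domain


# Example: an all-ones mask leaves the frame unchanged.
frame = np.random.default_rng(0).standard_normal(BLOCK_LEN)
unchanged = apply_spectral_mask(frame, lambda mag: np.ones_like(mag))
```

Because only the magnitude is scaled, the phase of the noisy input is carried through to the output, which is the part that Model 2 then refines in the time domain.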

Signal flow:

Input frame (any sample rate, any channels)
  → downsample to 16 kHz mono
  → overlap-add loop (512-sample window, 128-sample shift)
      → FFT → magnitude → Model 1 (spectral mask) → masked IFFT
      → Model 2 (time-domain refinement)
  → upsample back to original sample rate
  → restore original channel count
→ Denoised output frame

The overlap-add synthesis uses 75% overlap (512-sample window, 128-sample shift), identical to the original DTLN paper. This gives ~8 ms of algorithmic latency at 16 kHz.
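The block-shift loop can be sketched with two sliding buffers. This is a minimal illustration that assumes the buffering scheme of the original DTLN real-time script; process_block is a hypothetical stand-in for the two chained models, and the plugin's actual loop may differ.

```python
import numpy as np

SAMPLE_RATE = 16_000
BLOCK_LEN = 512    # 32 ms window
BLOCK_SHIFT = 128  # 8 ms hop, i.e. 75% overlap


def overlap_add(signal: np.ndarray, process_block) -> np.ndarray:
    """Run process_block over sliding 512-sample windows and overlap-add the results."""
    in_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)
    out_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)
    num_blocks = len(signal) // BLOCK_SHIFT
    output = np.zeros(num_blocks * BLOCK_SHIFT, dtype=np.float32)

    for i in range(num_blocks):
        start = i * BLOCK_SHIFT
        # Slide the input window left and append the 128 newest samples.
        in_buffer[:-BLOCK_SHIFT] = in_buffer[BLOCK_SHIFT:]
        in_buffer[-BLOCK_SHIFT:] = signal[start:start + BLOCK_SHIFT]

        processed = process_block(in_buffer)  # stand-in for Model 1 + Model 2

        # Slide the output accumulator, clear its newest slot, add this window.
        out_buffer[:-BLOCK_SHIFT] = out_buffer[BLOCK_SHIFT:]
        out_buffer[-BLOCK_SHIFT:] = 0.0
        out_buffer += processed

        # The oldest 128 samples have now received all four overlapping windows.
        output[start:start + BLOCK_SHIFT] = out_buffer[:BLOCK_SHIFT]
    return output


shift_ms = 1000 * BLOCK_SHIFT / SAMPLE_RATE  # 8.0 ms of new audio per step
overlap = 1 - BLOCK_SHIFT / BLOCK_LEN        # 0.75
```

With an identity process_block each sample is summed four times, so no synthesis window is applied here; the trained models are expected to produce output that sums correctly. Note that in this sketch the output trails the input by BLOCK_LEN - BLOCK_SHIFT samples, because a chunk is emitted only after every window covering it has been added; the plugin's buffering may differ.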


Performance

Benchmarked on Apple M3 Pro, processing 16 kHz mono audio:

| Metric | Value |
| --- | --- |
| Steady-state latency per block | ~0.7 ms |
| Real-time factor | ~0.05× |
| Headroom vs real-time | ~20× |
| Cold-start (first inference) | ~500 ms (amortized by warmup in __init__) |

The __init__ method runs a dummy forward pass so that ONNX Runtime's one-time first-inference setup happens before the first real audio frame arrives, eliminating the cold-start stall.
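For readers who want to reproduce this kind of measurement, the following is a rough timing harness, not the benchmark used for the numbers above. The names measure_rtf and real_time_factor are hypothetical, and process_block stands in for whatever per-block processing is being timed.

```python
import time

SAMPLE_RATE = 16_000
BLOCK_SHIFT = 128  # samples of new audio per block (8 ms)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by the duration of the audio processed."""
    return processing_seconds / audio_seconds


def measure_rtf(process_block, num_blocks: int = 200, warmup_blocks: int = 1) -> float:
    """Time process_block over num_blocks silent blocks, after a warm-up pass."""
    block = [0.0] * BLOCK_SHIFT
    # Warm-up calls are excluded from timing, mirroring the dummy pass in __init__.
    for _ in range(warmup_blocks):
        process_block(block)
    start = time.perf_counter()
    for _ in range(num_blocks):
        process_block(block)
    elapsed = time.perf_counter() - start
    audio_seconds = num_blocks * BLOCK_SHIFT / SAMPLE_RATE
    return real_time_factor(elapsed, audio_seconds)


# Example with a trivial stand-in; substitute the real per-block call.
rtf = measure_rtf(lambda block: [s * 0.5 for s in block])
headroom = 1.0 / rtf
```

A real-time factor below 1 means processing is faster than real time, and headroom is its reciprocal, so 0.05× corresponds to 20×.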


Models

Pretrained weights are the official DTLN models published by the original authors:

| File | Source |
| --- | --- |
| model_1.onnx | breizhn/DTLN · pretrained_model/ |
| model_2.onnx | breizhn/DTLN · pretrained_model/ |

The models are not stored in this Git repository (to keep it lightweight); the PyPI wheel bundles them. When working from a source checkout, they are downloaded by running python agent.py download-files or by calling download_models() directly.


License

The plugin code in this repository is released under the MIT License.

The pretrained DTLN model weights are published by the original authors under the MIT License — see breizhn/DTLN.
