CPU-only ONNX inference package for DPDFNet speech enhancement.

dpdfnet

Installation

pip install dpdfnet

Requirements

  • Python >=3.11
  • OS support for soundfile / libsndfile

Runtime dependencies are installed automatically:

  • numpy
  • librosa
  • soundfile
  • onnxruntime
  • filelock
  • tqdm

Supported Audio Formats

The following input formats are supported out of the box (via soundfile/libsndfile):

Format       Extensions
WAV          .wav
FLAC         .flac
Ogg Vorbis   .ogg
AIFF         .aiff, .aif
AU/SND       .au, .snd

MP3 and other compressed formats require the optional pydub dependency and ffmpeg on your PATH:

pip install 'dpdfnet[mp3]'
# also install ffmpeg, e.g.:
#   Ubuntu/Debian:  sudo apt install ffmpeg
#   macOS:          brew install ffmpeg
#   Windows:        https://ffmpeg.org/download.html

Once installed, these additional formats are supported:

Format       Extensions
MP3          .mp3
AAC / M4A    .aac, .m4a
WMA          .wma
Opus         .opus

Output is always written as PCM16 .wav regardless of the input format.
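
The format rules above can be turned into a small pre-flight check before handing files to the package. This is a minimal sketch and not part of dpdfnet: the helper names classify_input and output_path_for are hypothetical, and the extension sets are copied from the two tables, so the package's own detection logic may differ.

```python
from pathlib import Path

# Extensions copied from the tables above; dpdfnet's own checks may differ.
NATIVE_EXTS = {".wav", ".flac", ".ogg", ".aiff", ".aif", ".au", ".snd"}
EXTRA_EXTS = {".mp3", ".aac", ".m4a", ".wma", ".opus"}  # need dpdfnet[mp3] and ffmpeg


def classify_input(path):
    """Return 'native', 'needs-mp3-extra', or 'unsupported' for a file path."""
    ext = Path(path).suffix.lower()
    if ext in NATIVE_EXTS:
        return "native"
    if ext in EXTRA_EXTS:
        return "needs-mp3-extra"
    return "unsupported"


def output_path_for(path):
    """Swap the extension for .wav, since output is always written as WAV."""
    return Path(path).with_suffix(".wav")
```

The check looks only at the file extension, so a mislabeled file would still pass it and fail later at decode time.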

CLI

Show help:

dpdfnet --help

Commands:

  1. dpdfnet models
  • List supported models and local availability.
  2. dpdfnet enhance <input> <output.wav> [--model <name>] [-v|--verbose]
  • Enhance one audio file (any supported format; output is always .wav).
  3. dpdfnet enhance-dir <input_dir> <output_dir> [--model <name>] [--workers N] [-v|--verbose]
  • Enhance all supported audio files in a directory (non-recursive).
  • Files are processed concurrently; --workers sets the thread count (default: CPU count).
  4. dpdfnet download [model] [--force|--refresh] [-q|--quiet | -v|--verbose]
  • Download all models when model is omitted, or one model when provided.
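
The enhance-dir behaviour described above (non-recursive listing, a thread pool sized by --workers, CPU count as the default) can be approximated in a few lines. This is a sketch under stated assumptions and not the package's implementation: enhance_dir_sketch and list_audio_files are hypothetical names, the per-file work is a callable supplied by the caller, and only the formats that need no optional extra are listed.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Extensions handled without the optional mp3 extra (from the format table).
SUPPORTED_EXTS = {".wav", ".flac", ".ogg", ".aiff", ".aif", ".au", ".snd"}


def list_audio_files(input_dir):
    """Non-recursive listing: only direct children with a supported extension."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS
    )


def enhance_dir_sketch(input_dir, output_dir, enhance_one, workers=None):
    """Run enhance_one(src, dst) on each file using a thread pool.

    enhance_one is a caller-supplied callable (hypothetical), so this
    sketch makes no claim about dpdfnet internals.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(src, out_dir / (src.stem + ".wav")) for src in list_audio_files(input_dir)]
    max_workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises any worker exception.
        list(pool.map(lambda job: enhance_one(*job), jobs))
    return [dst for _, dst in jobs]
```

Two inputs that share a stem (for example a.wav and a.flac) would map to the same output name in this sketch; how the real command handles that case is not stated here.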

CLI examples:

# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
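
When the CLI has to be driven from a Python script, the enhance command can be wrapped with subprocess. This is a minimal sketch: build_enhance_cmd and run_enhance are hypothetical helpers, only flags documented above are used, and it assumes the dpdfnet executable is on PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_enhance_cmd(src, dst, model=None, verbose=False):
    """Build the argv list for `dpdfnet enhance`, using only documented flags."""
    cmd = ["dpdfnet", "enhance", str(src), str(dst)]
    if model is not None:
        cmd += ["--model", model]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_enhance(src, dst, model=None, verbose=False):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_enhance_cmd(src, dst, model, verbose), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.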

Python API

Top-level exports:

  • dpdfnet.enhance
  • dpdfnet.enhance_file
  • dpdfnet.available_models
  • dpdfnet.download

In-memory enhancement:

import soundfile as sf
import dpdfnet

audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

Enhance one file:

import dpdfnet

out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

Model listing:

import dpdfnet

for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])
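
The listing can also drive selective downloads. The sketch below assumes each row is a dict with the "name" and "cached" keys shown above and that a falsy "cached" means the model file is not yet on disk; models_to_download is a hypothetical helper and the example rows are stand-in data, not real output.

```python
def models_to_download(rows):
    """Names of models whose row reports cached as falsy."""
    return [row["name"] for row in rows if not row["cached"]]


# Stand-in rows shaped like the listing above; with the package installed,
# pass dpdfnet.available_models() instead.
example_rows = [
    {"name": "dpdfnet2", "ready": True, "cached": True},
    {"name": "dpdfnet4", "ready": False, "cached": False},
]
print(models_to_download(example_rows))  # ['dpdfnet4']
```

Each returned name can then be passed to dpdfnet.download(name).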

Download models via API:

import dpdfnet

dpdfnet.download()
dpdfnet.download("dpdfnet4")

Real-time Microphone Enhancement

Install sounddevice (not included in dpdfnet dependencies):

pip install sounddevice

StreamEnhancer processes audio chunk-by-chunk, preserving RNN state across calls. Any chunk size works; enhanced samples are returned as soon as enough data has accumulated for the first model frame (20 ms).

import numpy as np
import sounddevice as sd
import dpdfnet

INPUT_SR   = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)   # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0   # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
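
The same process()/flush() pattern also works without a sound device, for example to push a recorded signal through the enhancer block by block. The sketch below is an illustration only: stream_in_blocks is a hypothetical helper, and DelayOnlyEnhancer is a stand-in that merely buffers samples so the loop can run without the package installed. It assumes process() and flush() return sequences that support len(), as the microphone example suggests.

```python
def stream_in_blocks(enhancer, samples, sample_rate, block_size):
    """Feed samples block by block and collect every non-empty return."""
    chunks = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        out = enhancer.process(block, sample_rate=sample_rate)
        if len(out):
            chunks.append(out)
    tail = enhancer.flush()  # drain whatever is still buffered
    if len(tail):
        chunks.append(tail)
    return chunks


class DelayOnlyEnhancer:
    """Stand-in with the same call shape; it delays audio and changes nothing."""

    def __init__(self, window):
        self.window = window
        self.buffer = []
        self.primed = False

    def process(self, chunk, sample_rate):
        self.buffer.extend(chunk)
        if not self.primed and len(self.buffer) < self.window:
            return []  # still filling the first window
        self.primed = True
        out, self.buffer = self.buffer, []
        return out

    def flush(self):
        out, self.buffer = self.buffer, []
        return out


chunks = stream_in_blocks(DelayOnlyEnhancer(window=4), list(range(10)), 48000, 2)
print(chunks)  # [[0, 1, 2, 3], [4, 5], [6, 7], [8, 9]]
```

With the real package, a dpdfnet.StreamEnhancer instance would take the place of the stand-in and the collected chunks could be joined with numpy.concatenate.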

Notes:

  • Latency: the first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
  • Sample rate: StreamEnhancer resamples internally. Pass your device's native rate as sample_rate; the returned audio is at the same rate.
  • Block size: using BLOCK_SIZE = int(SR * 0.010) (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
  • Multiple streams: create a separate StreamEnhancer per stream. Call enhancer.reset() between independent audio segments to clear RNN state.
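
The hop and window durations quoted in the notes translate into sample counts that depend on the device rate. The following sketch works through that arithmetic for a few common rates; samples_for is a hypothetical helper, and it uses integer math rather than int(SR * 0.010) so the result never depends on floating-point rounding.

```python
HOP_MS = 10      # model hop stated in the notes
WINDOW_MS = 20   # model window stated in the notes


def samples_for(duration_ms, sample_rate):
    """Whole samples spanned by duration_ms, using integer math."""
    return sample_rate * duration_ms // 1000


for sr in (16000, 44100, 48000):
    print(sr, samples_for(HOP_MS, sr), samples_for(WINDOW_MS, sr))
# 16000 160 320
# 44100 441 882
# 48000 480 960
```

At 48 kHz this reproduces the 480-sample BLOCK_SIZE used in the microphone example.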
