CPU-only ONNX inference package for DPDFNet speech enhancement.

dpdfnet

Installation

pip install dpdfnet

Requirements

Python >=3.11
An operating system supported by soundfile / libsndfile

Runtime dependencies are installed automatically:

numpy
librosa
soundfile
onnxruntime
filelock
tqdm

Supported Audio Formats

The following input formats are supported out of the box (via soundfile/libsndfile):

Format	Extensions
WAV	`.wav`
FLAC	`.flac`
Ogg Vorbis	`.ogg`
AIFF	`.aiff`, `.aif`
AU/SND	`.au`, `.snd`

MP3 and other compressed formats require the optional pydub dependency and ffmpeg on your PATH:

pip install 'dpdfnet[mp3]'
# also install ffmpeg, e.g.:
#   Ubuntu/Debian:  sudo apt install ffmpeg
#   macOS:          brew install ffmpeg
#   Windows:        https://ffmpeg.org/download.html

Once installed, these additional formats are supported:

Format	Extensions
MP3	`.mp3`
AAC / M4A	`.aac`, `.m4a`
WMA	`.wma`
Opus	`.opus`
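The two tables above can be turned into a small pre-flight check. The following is a minimal sketch, not part of dpdfnet: the helper name classify_input and its return labels are hypothetical, and the case-insensitive matching is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions copied from the two tables above.
NATIVE_EXTS = {".wav", ".flac", ".ogg", ".aiff", ".aif", ".au", ".snd"}
FFMPEG_EXTS = {".mp3", ".aac", ".m4a", ".wma", ".opus"}


def classify_input(path):
    """Return 'native', 'needs-mp3-extra', or 'unsupported' for a file path.

    Hypothetical helper: matching is case-insensitive by assumption.
    """
    ext = Path(path).suffix.lower()
    if ext in NATIVE_EXTS:
        return "native"
    if ext in FFMPEG_EXTS:
        return "needs-mp3-extra"
    return "unsupported"


for name in ("talk.WAV", "podcast.mp3", "notes.txt"):
    print(name, "->", classify_input(name))
```

A check like this can report a missing 'dpdfnet[mp3]' extra before any audio is decoded.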

Output is always written as PCM16 .wav regardless of the input format.
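To illustrate what PCM16 means for the output file, here is a standard-library sketch that clips float samples to [-1.0, 1.0], scales them to signed 16-bit integers, and writes a mono WAV. It is not the package's own writer; the function names and the 32767 scale factor are assumptions chosen for the illustration.

```python
import math
import os
import struct
import tempfile
import wave


def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples as a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)            # 2 bytes per sample = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Demo: write 0.1 s of a 440 Hz tone at 16 kHz and read the header back.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "tone.wav")
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    write_pcm16_wav(out, tone, 16000)
    with wave.open(out, "rb") as wf:
        info = (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())
print(info)
```

The sample width of 2 bytes reported by the header is what marks the file as 16-bit PCM.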

CLI

Show help:

dpdfnet --help

Commands:

dpdfnet models

List supported models and local availability.

dpdfnet enhance <input> <output.wav> [--model <name>] [-v|--verbose]

Enhance one audio file (any supported format; output is always .wav).

dpdfnet enhance-dir <input_dir> <output_dir> [--model <name>] [--workers N] [-v|--verbose]

Enhance all supported audio files in a directory (non-recursive).
Files are processed concurrently; --workers sets the thread count (default: CPU count).

dpdfnet download [model] [--force|--refresh] [-q|--quiet | -v|--verbose]

Download all models when model is omitted, or one model when provided.
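The enhance-dir behavior described above (non-recursive, thread pool sized to the CPU count by default) can be sketched with the standard library. This is an illustration of the pattern, not the package's implementation: enhance_dir_sketch, the enhance_one callback, and the "<stem>.wav" output naming are all assumptions, and the extension set covers only the natively supported formats.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

AUDIO_EXTS = {".wav", ".flac", ".ogg", ".aiff", ".aif", ".au", ".snd"}


def enhance_dir_sketch(input_dir, output_dir, enhance_one, workers=None):
    """Apply enhance_one(src, dst) to each audio file directly in input_dir."""
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # iterdir() does not descend into subdirectories (non-recursive).
    sources = sorted(
        p for p in in_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
    # Assumed naming: same stem, always a .wav extension.
    jobs = [(src, out_dir / (src.stem + ".wav")) for src in sources]
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # list() waits for all jobs and re-raises any worker exception.
        list(pool.map(lambda job: enhance_one(*job), jobs))
    return [dst for _, dst in jobs]


def fake_enhance(src, dst):
    # Stand-in for the real per-file call; it only copies bytes.
    dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    noisy = Path(tmp, "noisy")
    (noisy / "sub").mkdir(parents=True)
    for name in ("a.wav", "b.flac", "notes.txt", "sub/c.wav"):
        (noisy / name).write_bytes(b"\0")
    results = enhance_dir_sketch(noisy, Path(tmp, "out"), fake_enhance, workers=2)
    written = [p.name for p in results]
print(written)
```

The file in the subdirectory and the non-audio file are both skipped, which mirrors the non-recursive, supported-formats-only description.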

CLI examples:

# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
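When the CLI is driven from a script, it can help to build the argument list in one place. The sketch below uses only the enhance flags listed above; build_enhance_cmd and run_enhance are hypothetical helpers, and relying on check=True assumes the CLI exits with a non-zero status on failure, which the description above does not state.

```python
import shlex
import subprocess


def build_enhance_cmd(src, dst, model=None, verbose=False):
    """Build the argv list for `dpdfnet enhance` from the documented flags."""
    cmd = ["dpdfnet", "enhance", str(src), str(dst)]
    if model:
        cmd += ["--model", model]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_enhance(src, dst, **kwargs):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_enhance_cmd(src, dst, **kwargs), check=True)


cmd = build_enhance_cmd("noisy.wav", "enhanced.wav", model="dpdfnet4")
print(shlex.join(cmd))
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.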

Python API

Top-level exports:

dpdfnet.enhance
dpdfnet.enhance_file
dpdfnet.available_models
dpdfnet.download

In-memory enhancement:

import soundfile as sf
import dpdfnet

audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

Enhance one file:

import dpdfnet

out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

Model listing:

import dpdfnet

for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])
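The rows from the listing can also drive model selection. The sketch below assumes each row behaves like a mapping with the "name" and "ready" keys used above, and that a truthy "ready" means the model can be used; neither point is spelled out here. first_ready is a hypothetical helper, and the sample rows are invented stand-ins for the real return value.

```python
def first_ready(rows, preferred):
    """Return the first name in `preferred` whose row is marked ready."""
    ready = {row["name"] for row in rows if row["ready"]}
    for name in preferred:
        if name in ready:
            return name
    raise LookupError("none of the preferred models is ready: %r" % (preferred,))


# Hand-written stand-in for dpdfnet.available_models(); values are invented.
rows = [
    {"name": "dpdfnet2", "ready": True, "cached": True},
    {"name": "dpdfnet4", "ready": False, "cached": False},
]
choice = first_ready(rows, ["dpdfnet4", "dpdfnet2"])
print(choice)
```

In real use, rows would come from dpdfnet.available_models() and the chosen name would be passed as the model argument.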

Download models via API:

import dpdfnet

dpdfnet.download()
dpdfnet.download("dpdfnet4")

Real-time Microphone Enhancement

Install sounddevice (not included in dpdfnet dependencies):

pip install sounddevice

StreamEnhancer processes audio chunk-by-chunk, preserving RNN state across calls. Any chunk size works; enhanced samples are returned as soon as enough data has accumulated for the first model frame (20 ms).

import numpy as np
import sounddevice as sd
import dpdfnet

INPUT_SR   = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)   # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0   # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()

Notes:

Latency - the first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
Sample rate - StreamEnhancer resamples internally. Pass your device's native rate as sample_rate; the return value is at the same rate.
Block size - using BLOCK_SIZE = int(SR * 0.010) (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
Multiple streams - create a separate StreamEnhancer per stream. Call enhancer.reset() between independent audio segments to clear RNN state.
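The latency and block-size notes can be made concrete with a toy buffer. This is not how StreamEnhancer is implemented (it also resamples and runs the model); HopBuffer and ms_to_samples are hypothetical names, and the "enhancement" here is an identity copy. The sketch only reproduces the output-length pattern described above, using the 10 ms hop and 20 ms window at 48 kHz.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * ms // 1000   # integer math avoids float rounding


class HopBuffer:
    """Toy buffer: emits one hop of output for every full window seen."""

    def __init__(self, hop, window):
        self.hop = hop
        self.window = window
        self.pending = []   # samples received but not yet emitted

    def process(self, chunk):
        self.pending.extend(chunk)
        out = []
        # Each full window yields one hop and advances the buffer by one hop.
        while len(self.pending) >= self.window:
            out.extend(self.pending[:self.hop])   # identity "enhancement"
            del self.pending[:self.hop]
        return out

    def flush(self):
        """Return whatever is still buffered and empty the buffer."""
        out, self.pending = self.pending, []
        return out


SR = 48000
HOP = ms_to_samples(10, SR)      # 480 samples
WINDOW = ms_to_samples(20, SR)   # 960 samples

# Hop-sized blocks: empty first return, then one hop per call.
buf = HopBuffer(HOP, WINDOW)
sizes = [len(buf.process([0.0] * HOP)) for _ in range(4)]

# A 300-sample block size: some calls return nothing while the buffer fills.
odd = HopBuffer(HOP, WINDOW)
odd_sizes = [len(odd.process([0.0] * 300)) for _ in range(6)]
tail_len = len(odd.flush())
print(sizes, odd_sizes, tail_len)
```

With hop-sized blocks the callback receives a full block on every call after the first; with other sizes it has to handle empty returns, as the callback in the example above does by writing silence.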

Release history

Version	Released
0.5.1	Apr 5, 2026
0.5.0	Mar 26, 2026
0.4.0 (this version)	Mar 20, 2026
0.3.0	Mar 8, 2026
0.2.0	Feb 24, 2026

Download files

Source Distribution

dpdfnet-0.4.0.tar.gz (29.7 kB)

Uploaded Mar 20, 2026 Source

Built Distribution

dpdfnet-0.4.0-py3-none-any.whl (23.5 kB)

Uploaded Mar 20, 2026 Python 3

File details

Details for the file dpdfnet-0.4.0.tar.gz.

File metadata

Download URL: dpdfnet-0.4.0.tar.gz
Upload date: Mar 20, 2026
Size: 29.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for dpdfnet-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`fa4305919951d5994f09dbadf52521e476ea8c521c88255cff49b5c0bc03fd4e`
MD5	`389bf9c780690fabfc7d46ac7b2d408f`
BLAKE2b-256	`9736cb9ab0d3002c8a3d4ab2ef7a79186c6801df37ef14b8448dc2331def8fbb`

File details

Details for the file dpdfnet-0.4.0-py3-none-any.whl.

File metadata

Download URL: dpdfnet-0.4.0-py3-none-any.whl
Upload date: Mar 20, 2026
Size: 23.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for dpdfnet-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d9504a746a1a339ac32106c47c1c1b6e3746ac1c1f23758f3d5954cabc571ddf`
MD5	`a78a005d0f8f3d61d34dd424af190d8b`
BLAKE2b-256	`fca8aa195d11913c1c1000cf48722fa18146d4e354381327300ca2bb8fb5f2ee`
