dpdfnet
CPU-only ONNX inference package for DPDFNet speech enhancement.
Installation
```bash
pip install dpdfnet
```
Requirements
- Python >= 3.11
- OS support for `soundfile`/`libsndfile`
Runtime dependencies are installed automatically:
- `numpy`
- `librosa`
- `soundfile`
- `onnxruntime`
- `filelock`
- `tqdm`
Supported Audio Formats
The following input formats are supported out of the box (via soundfile/libsndfile):
| Format | Extensions |
|---|---|
| WAV | .wav |
| FLAC | .flac |
| Ogg Vorbis | .ogg |
| AIFF | .aiff, .aif |
| AU/SND | .au, .snd |
MP3 and other compressed formats require the optional pydub dependency and
ffmpeg on your PATH:
```bash
pip install 'dpdfnet[mp3]'
# also install ffmpeg, e.g.:
# Ubuntu/Debian: sudo apt install ffmpeg
# macOS: brew install ffmpeg
# Windows: https://ffmpeg.org/download.html
```
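Because MP3/AAC/WMA/Opus decoding depends on two separate pieces, a quick pre-flight check can save a confusing failure later. A minimal standard-library sketch, assuming only what is stated above (the extra provides the `pydub` package, and `ffmpeg` must be on PATH); `mp3_backend_status` is a hypothetical helper, not part of dpdfnet:

```python
import importlib.util
import shutil


def mp3_backend_status() -> dict:
    """Report whether the optional compressed-audio backend looks usable.

    Checks only the two documented prerequisites: the `pydub` package is
    importable and an `ffmpeg` executable is on PATH. It does not import
    dpdfnet or try to decode any audio.
    """
    has_pydub = importlib.util.find_spec("pydub") is not None
    ffmpeg_path = shutil.which("ffmpeg")  # None when ffmpeg is not on PATH
    return {
        "pydub": has_pydub,
        "ffmpeg": ffmpeg_path,
        "ready": has_pydub and ffmpeg_path is not None,
    }


if __name__ == "__main__":
    status = mp3_backend_status()
    print(status)
    if not status["ready"]:
        print("Install with: pip install 'dpdfnet[mp3]' and put ffmpeg on PATH")
```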
Once installed, these additional formats are supported:
| Format | Extensions |
|---|---|
| MP3 | .mp3 |
| AAC / M4A | .aac, .m4a |
| WMA | .wma |
| Opus | .opus |
Output is always written as PCM16 .wav regardless of the input format.
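When scripting around the package, it can be handy to sort inputs by which backend they need and to derive the `.wav` output name up front. A sketch whose extension sets are copied from the tables above; the names `input_backend` and `wav_output_path` are hypothetical, and dpdfnet's internal extension list may differ:

```python
from pathlib import Path

# Extensions copied from the two format tables. Illustrative only:
# dpdfnet's own internal list may differ between versions.
NATIVE_EXTS = {".wav", ".flac", ".ogg", ".aiff", ".aif", ".au", ".snd"}
MP3_EXTRA_EXTS = {".mp3", ".aac", ".m4a", ".wma", ".opus"}


def input_backend(path: str) -> str:
    """Return 'soundfile', 'mp3-extra', or 'unsupported' for an input path."""
    ext = Path(path).suffix.lower()
    if ext in NATIVE_EXTS:
        return "soundfile"
    if ext in MP3_EXTRA_EXTS:
        return "mp3-extra"
    return "unsupported"


def wav_output_path(input_path: str, output_dir: str) -> Path:
    """Build a .wav output path, since output is always PCM16 WAV."""
    return Path(output_dir) / (Path(input_path).stem + ".wav")
```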
CLI
Show help:
```bash
dpdfnet --help
```
Commands:
- `dpdfnet models`
  - List supported models and local availability.
- `dpdfnet enhance <input> <output.wav> [--model <name>] [--attn-limit-db DB] [-v|--verbose]`
  - Enhance one audio file (any supported format; output is always `.wav`).
- `dpdfnet enhance-dir <input_dir> <output_dir> [--model <name>] [--workers N] [--attn-limit-db DB] [-v|--verbose]`
  - Enhance all supported audio files in a directory (non-recursive).
  - Files are processed concurrently; `--workers` sets the thread count (default: CPU count).
- `dpdfnet download [model] [--force|--refresh] [-q|--quiet | -v|--verbose]`
  - Download all models when `model` is omitted, or only the named model when provided.
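`enhance-dir` is described as non-recursive and concurrent, using threads with a CPU-count default. The sketch below illustrates that pattern with the standard library only; it is not dpdfnet's implementation. `process_dir`, `list_inputs`, and the `enhance_one(src, dst)` callback are hypothetical names, and threads are assumed to be a reasonable fit because the heavy lifting happens inside native code such as ONNX Runtime:

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def list_inputs(input_dir: str, exts: Iterable[str]) -> List[Path]:
    """Non-recursive listing: only files directly inside input_dir."""
    wanted = {e.lower() for e in exts}
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def process_dir(
    input_dir: str,
    output_dir: str,
    enhance_one: Callable[[Path, Path], None],
    exts: Iterable[str] = (".wav", ".flac"),
    workers: Optional[int] = None,
) -> List[Path]:
    """Run enhance_one(src, dst) for each input file on a thread pool.

    workers=None falls back to os.cpu_count(), mirroring the documented
    --workers default.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    n_workers = workers or os.cpu_count() or 1
    jobs = [(src, out / (src.stem + ".wav")) for src in list_inputs(input_dir, exts)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() waits for every job and re-raises the first worker exception.
        list(pool.map(lambda job: enhance_one(*job), jobs))
    return [dst for _, dst in jobs]
```

Because this sketch maps every input to `<stem>.wav`, files such as `a.wav` and `a.flac` in the same folder would collide on one output name.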
CLI examples:
```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4 --attn-limit-db 12
# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --attn-limit-db 12
# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4 --attn-limit-db 12
# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```
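The README does not spell out what `--attn-limit-db` (or `attn_limit_db`) computes. One common convention, used for example by DeepFilterNet's `atten_lim_db`, caps suppression by mixing a fixed fraction of the noisy input back into the output. Assuming DPDFNet behaves the same way (unverified), the arithmetic looks like this; both function names are hypothetical:

```python
import numpy as np


def attn_limit_gain(attn_limit_db: float) -> float:
    """Linear amplitude floor for a given attenuation limit in dB."""
    return 10.0 ** (-abs(attn_limit_db) / 20.0)


def apply_attn_limit(noisy: np.ndarray, enhanced: np.ndarray, attn_limit_db: float) -> np.ndarray:
    """Mix a fraction of the noisy input back in (assumed convention).

    Where the model removes a component entirely (enhanced == 0), the output
    keeps attn_limit_gain(db) of the input, i.e. at most `attn_limit_db`
    of attenuation.
    """
    lim = attn_limit_gain(attn_limit_db)
    return noisy * lim + enhanced * (1.0 - lim)
```

Under that assumption, `--attn-limit-db 12` keeps about 25% of the input amplitude (10^(-12/20) ≈ 0.251) where noise is fully suppressed, and a larger value permits stronger suppression.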
Python API
Top-level exports:
- `dpdfnet.enhance`
- `dpdfnet.enhance_file`
- `dpdfnet.available_models`
- `dpdfnet.download`
- `dpdfnet.StreamEnhancer`
In-memory enhancement:
```python
import soundfile as sf
import dpdfnet

audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4", attn_limit_db=12)
sf.write("enhanced.wav", enhanced, sr)
```
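`sf.read` returns a 2-D `(frames, channels)` array for multichannel files. The README does not say how `dpdfnet.enhance` treats multichannel input, so if you want a single enhanced channel, one option (an assumption, not documented behavior) is to downmix first. `to_mono` is a hypothetical helper:

```python
import numpy as np


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average the channels of a (frames, channels) array; 1-D input passes through."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio
    if audio.ndim == 2:
        return audio.mean(axis=1)
    raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")


# Usage with the in-memory API above (downmixing is an assumption, not a requirement):
# audio, sr = sf.read("noisy.wav", always_2d=True)
# enhanced = dpdfnet.enhance(to_mono(audio), sample_rate=sr, model="dpdfnet4", attn_limit_db=12)
```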
Enhance one file:
```python
import dpdfnet

out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2", attn_limit_db=12)
print(out_path)
```
Model listing:
```python
import dpdfnet

for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])
```
Download models via API:
```python
import dpdfnet

dpdfnet.download()
dpdfnet.download("dpdfnet4")
```
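Before running somewhere without network access, the `cached` field from `available_models()` can be used to fail fast if a model was never downloaded. A sketch assuming each row carries the `name` and `cached` keys shown above and that `cached` is falsy for models not on disk; `require_cached` is a hypothetical helper:

```python
from typing import Iterable, Mapping, Sequence


def require_cached(rows: Iterable[Mapping], required: Sequence[str]) -> None:
    """Raise if any required model is not reported as cached locally.

    Assumes each row has the "name" and "cached" keys used in the listing example.
    """
    cached = {row["name"] for row in rows if row["cached"]}
    missing = [name for name in required if name not in cached]
    if missing:
        raise RuntimeError(
            "models not cached locally: " + ", ".join(missing)
            + " (run `dpdfnet download <model>` first)"
        )


# Usage (hypothetical helper; the dpdfnet calls are the ones documented above):
# import dpdfnet
# require_cached(dpdfnet.available_models(), ["dpdfnet2", "dpdfnet4"])
```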
Real-time Microphone Enhancement
Install sounddevice (not included in dpdfnet dependencies):
```bash
pip install sounddevice
```
StreamEnhancer processes audio chunk-by-chunk, preserving RNN state across
calls. Any chunk size works; enhanced samples are returned as soon as enough
data has accumulated for the first model frame (20 ms).
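To see why output starts only after 20 ms and then keeps pace with the input, here is a toy model of hop-based buffering with a 20 ms window and a 10 ms hop (the figures quoted in this section). It is an illustration under that assumption, not `StreamEnhancer`'s internals; `HopBuffer` is hypothetical and passes audio through unchanged in place of the model:

```python
import numpy as np


class HopBuffer:
    """Toy model of window/hop buffering (not dpdfnet's code).

    Samples accumulate until one full window is available; each step then
    emits `hop` samples and advances by `hop`. The identity pass-through
    stands in for the model so only the timing is illustrated.
    """

    def __init__(self, sample_rate: int, hop_ms: float = 10.0, window_ms: float = 20.0):
        self.hop = int(round(sample_rate * hop_ms / 1000))
        self.window = int(round(sample_rate * window_ms / 1000))
        self._buf = np.zeros(0, dtype=np.float32)

    def process(self, chunk: np.ndarray) -> np.ndarray:
        self._buf = np.concatenate([self._buf, np.asarray(chunk, dtype=np.float32)])
        out = []
        while len(self._buf) >= self.window:
            out.append(self._buf[: self.hop])  # placeholder for one enhanced hop
            self._buf = self._buf[self.hop :]
        return np.concatenate(out) if out else np.zeros(0, dtype=np.float32)
```

At 48 kHz the first 480-sample call returns nothing, and every later 480-sample call returns exactly 480 samples, matching the "first output after ~20 ms, then one hop per callback" behavior described in the notes.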
```python
import numpy as np
import sounddevice as sd
import dpdfnet

INPUT_SR = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)  # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")


def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0  # silence while the first window accumulates


with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```
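To exercise the same `process()`/`flush()` pair without a microphone, for example to compare against offline `dpdfnet.enhance`, you can feed a pre-recorded array in fixed-size chunks. A minimal sketch; `stream_array` is hypothetical, and it assumes `flush()` returns a NumPy array (possibly empty), which the example above does not state:

```python
from typing import Callable

import numpy as np


def stream_array(
    audio: np.ndarray,
    sample_rate: int,
    process: Callable[..., np.ndarray],
    flush: Callable[[], np.ndarray],
    chunk_size: int,
) -> np.ndarray:
    """Feed a 1-D array through a chunked processor and collect all output.

    Calls process(chunk, sample_rate=sample_rate) for each slice, then
    flush() once to drain the final partial window.
    """
    pieces = []
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start : start + chunk_size]
        pieces.append(np.asarray(process(chunk, sample_rate=sample_rate)))
    pieces.append(np.asarray(flush()))
    return np.concatenate(pieces)


# Usage with the streaming API above (assumed return types):
# enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")
# out = stream_array(audio, 48000, enhancer.process, enhancer.flush, chunk_size=480)
# enhancer.reset()  # before the next independent recording
```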
Notes:
- Latency: the first enhanced output arrives after one full model window (~20 ms) has been buffered. Every subsequent block is returned with ~10 ms of additional delay.
- Sample rate: `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the return value is at the same rate.
- Block size: using `BLOCK_SIZE = int(SR * 0.010)` (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
- Multiple streams: create a separate `StreamEnhancer` per stream. Call `enhancer.reset()` between independent audio segments to clear RNN state.
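The "one enhancer per stream, reset between segments" rule can be enforced with a small registry keyed by stream ID. A sketch under that reading of the notes; `EnhancerPool` and its methods are hypothetical, and only `StreamEnhancer(...)` and `.reset()` come from the package as documented here:

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

E = TypeVar("E")


class EnhancerPool(Generic[E]):
    """Keep one enhancer per stream key so RNN state never leaks between streams."""

    def __init__(self, factory: Callable[[], E]):
        self._factory = factory
        self._by_key: Dict[Hashable, E] = {}

    def get(self, key: Hashable) -> E:
        # Lazily create a fresh enhancer the first time a stream appears.
        if key not in self._by_key:
            self._by_key[key] = self._factory()
        return self._by_key[key]

    def end_segment(self, key: Hashable) -> None:
        # Clear state between independent segments on the same stream.
        enhancer = self._by_key.get(key)
        if enhancer is not None:
            enhancer.reset()


# Usage with the package (model name taken from the example above):
# pool = EnhancerPool(lambda: dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr"))
# out = pool.get("mic-1").process(block, sample_rate=48000)
# pool.end_segment("mic-1")
```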
Links
- Homepage: https://github.com/ceva-ip/DPDFNet
- Issues: https://github.com/ceva-ip/DPDFNet/issues
File details
Details for the file dpdfnet-0.5.1.tar.gz.
File metadata
- Download URL: dpdfnet-0.5.1.tar.gz
- Upload date:
- Size: 32.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b789d4c490274c2304b6c9de5a365c8eaac75dfd8fdf90eb9f1e7bd26a8d9b01 |
| MD5 | 715429513e7cee8de6692eb39798aebe |
| BLAKE2b-256 | 01742d01ae24c652b05d040e8971eb4187098c28c637a04e2bc8faf4a0f95eb6 |
File details
Details for the file dpdfnet-0.5.1-py3-none-any.whl.
File metadata
- Download URL: dpdfnet-0.5.1-py3-none-any.whl
- Upload date:
- Size: 25.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61155d02f0f6a7e26e8a4f38c55cd271a1bcdcae328917fbdeaa772762f68806 |
| MD5 | 3a75b65ae7ad8c37775a577f3d37bfec |
| BLAKE2b-256 | 0728dfdeff44de10b89594836bf8c4d7b0664a6219b3d2a6efc647053d3abff7 |
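To confirm a downloaded file matches the digests listed above, hash it locally and compare. A standard-library sketch (`sha256_of` and `verify_sha256` are illustrative names):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True when the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example: check the wheel listed above after downloading it.
# verify_sha256("dpdfnet-0.5.1-py3-none-any.whl",
#               "61155d02f0f6a7e26e8a4f38c55cd271a1bcdcae328917fbdeaa772762f68806")
```

pip can also enforce this itself: with `--require-hashes`, each requirement line carries a `--hash=sha256:<digest>` option and installation fails on a mismatch.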