logWMSE is an audio quality metric and loss function with support for digital silence targets. It is useful for training and evaluating audio source separation systems.
torch-log-wmse-audio-quality
This repository contains a PyTorch implementation of an audio quality metric, logWMSE, originally proposed by Iver Jordal of Nomono. Beyond serving as a metric, this implementation can also be used as a loss function for training audio models.
logWMSE is a custom metric and loss function for audio signals that calculates the logarithm (log) of a frequency-weighted (W) Mean Squared Error (MSE). It is designed to address several shortcomings of common audio metrics, most importantly the lack of support for digital silence targets.
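The idea of a log frequency-weighted MSE can be sketched in plain Python. This is not the package's implementation, and every detail below is an assumption made for illustration: a first-order pre-emphasis filter stands in for the perceptual frequency weighting, the error is normalised by the RMS of the unprocessed input, and the log is taken as `10 * log10` with a small `eps`. The names `pre_emphasis` and `log_weighted_mse` are hypothetical.

```python
import math


def pre_emphasis(signal, coeff=0.95):
    """Hypothetical stand-in for a perceptual frequency weighting.

    A first-order high-pass filter, y[n] = x[n] - coeff * x[n-1], which
    attenuates low frequencies relative to high ones. It is NOT the
    weighting filter used by the package.
    """
    previous = [0.0] + list(signal[:-1])
    return [x - coeff * p for x, p in zip(signal, previous)]


def log_weighted_mse(unprocessed, processed, target, eps=1e-8):
    """Sketch of a log frequency-weighted MSE for one mono stem (lower = better)."""
    # Assumption: the scale reference is the unprocessed input, not the target,
    # so an all-zero (digitally silent) target never ends up in a denominator.
    input_rms = math.sqrt(sum(x * x for x in unprocessed) / len(unprocessed))
    scale = 1.0 / max(input_rms, eps)
    # The filter is linear, so weighting the error equals weighting each signal.
    error = [(p - t) * scale for p, t in zip(processed, target)]
    weighted = pre_emphasis(error)
    mse = sum(e * e for e in weighted) / len(weighted)
    # eps keeps the log finite for a perfect estimate (mse == 0).
    return 10 * math.log10(mse + eps)


mixture = [1.0, -1.0, 1.0, -1.0]
silent_target = [0.0] * 4
print(log_weighted_mse(mixture, [0.1 * x for x in mixture], silent_target))  # finite, about -15
print(log_weighted_mse(mixture, silent_target, silent_target))  # perfect estimate, about -80
```

Because the only divisor comes from the input mixture, a silent target is handled like any other target. The package's actual weighting filter, normalisation, and sign convention may differ from this sketch.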
Installation
```bash
pip install torch-log-wmse-audio-quality
```
Usage Example
```python
import torch
from torch_log_wmse_audio_quality import LogWMSE

# Tensor shapes
audio_length = 1.0  # seconds
sample_rate = 44100  # Hz
audio_channels = 2  # stereo
audio_stems = 3  # number of separated stems
batch = 4  # batch size

# Instantiate logWMSE; return_as_loss=True configures it for use as a training loss
log_wmse = LogWMSE(audio_length=audio_length, sample_rate=sample_rate, return_as_loss=True)

# Generate random inputs (scaled between -1 and 1)
audio_lengths_samples = int(audio_length * sample_rate)
# Unprocessed mixture: (batch, channels, samples)
unprocessed_audio = 2 * torch.rand(batch, audio_channels, audio_lengths_samples) - 1
# Processed stems: (batch, channels, stems, samples); each stem is the mixture at 10 % amplitude
processed_audio = unprocessed_audio.unsqueeze(2).expand(-1, -1, audio_stems, -1) * 0.1
# Target stems: digital silence, same shape as processed_audio
target_audio = torch.zeros(batch, audio_channels, audio_stems, audio_lengths_samples)

# Use a separate name so the LogWMSE module is not overwritten by its result
loss = log_wmse(unprocessed_audio, processed_audio, target_audio)
print(loss)  # Expected output: approx. -18.42
```
Motivation
- Supports digital silence targets, which other audio metrics such as (SI-)SDR, SIR, SAR, ISR, VISQOL_audio, STOI, CDPAM, and VISQOL do not support.
- Overcomes the awkwardly small value range of MSE (typically between 1e-8 and 1e-3), making numbers easier to format and to read at a glance. Values are scaled similarly to SI-SDR.
- Scale-invariant.
- Frequency-weighted to align with the frequency sensitivity of human hearing.
- Insensitive to tiny MSE errors that are inaudible to humans.
- Logarithmic, reflecting the logarithmic sensitivity of human hearing.
- Tailored specifically for audio signals.
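Two of the points above can be illustrated with a short plain-Python sketch: why a silent target breaks ratio-based metrics, and how a log scale turns MSE's tiny values into readable numbers. `sdr_db` and `mse_to_db` are hypothetical helpers written for this illustration, using the textbook SDR definition and a `10 * log10` decibel conversion; neither is part of this package.

```python
import math


def sdr_db(estimate, target):
    """Textbook SDR in dB: 10 * log10(||target||^2 / ||target - estimate||^2)."""
    signal_energy = sum(t * t for t in target)
    error_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    if signal_energy == 0.0:
        # Silent target: log10(0) diverges, so every estimate scores -inf.
        return -math.inf
    if error_energy == 0.0:
        # Perfect estimate of a non-silent target.
        return math.inf
    return 10 * math.log10(signal_energy / error_energy)


def mse_to_db(mse):
    """Express an MSE value on a decibel scale."""
    return 10 * math.log10(mse)


silent = [0.0] * 4
print(sdr_db([0.1, -0.1, 0.1, -0.1], silent))  # -inf
print(sdr_db([1e-6, -1e-6, 1e-6, -1e-6], silent))  # also -inf, although 100 dB quieter
print(mse_to_db(1e-8), mse_to_db(1e-3))  # the 1e-8..1e-3 range maps to about -80..-30 dB
```

SI-SDR fails in the same way: its projection step divides by the target energy, which is zero for digital silence.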
Contributing
Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or new features to suggest.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Acknowledgements
Thanks to Whitebalance for backing this project.
Download files
File details
Details for the file torch_log_wmse_audio_quality-0.1.1.tar.gz.
File metadata
- Download URL: torch_log_wmse_audio_quality-0.1.1.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75afabb3dbdaee7c2781c0bfbf74bb4a9d9242c3bacf0cde79b5864a37855e8c |
| MD5 | bcb95e782454626577759c829c9aa8e2 |
| BLAKE2b-256 | 2168b8ea169500270143a1254d1ce15369f76a0f69204059e13027b18321cc4b |
File details
Details for the file torch_log_wmse_audio_quality-0.1.1-py3-none-any.whl.
File metadata
- Download URL: torch_log_wmse_audio_quality-0.1.1-py3-none-any.whl
- Upload date:
- Size: 36.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5950f2b9197b935963fba1c1f9fa68bb79a1611f4847318ac6e9c7408971b3a |
| MD5 | 31550b3ab4b0e13f204a677d99df18e3 |
| BLAKE2b-256 | 948ff88b409c3982469d472de134fa91a792f18480417f2be039f0dbd906451c |
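A downloaded file can be checked against its published SHA256 digest with Python's standard `hashlib`. The sketch below is one way to do it; the helper names `sha256_of` and `verify_download` and the local wheel path are hypothetical. The `pip hash <file>` command offers a command-line alternative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Hypothetical local path to the downloaded wheel
wheel = Path("torch_log_wmse_audio_quality-0.1.1-py3-none-any.whl")
expected = "c5950f2b9197b935963fba1c1f9fa68bb79a1611f4847318ac6e9c7408971b3a"
if wheel.exists():
    print("OK" if verify_download(wheel, expected) else "MISMATCH")
```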