logWMSE is an audio quality metric and loss function that supports digital silence targets. It is useful for training and evaluating audio source separation systems.

torch-log-wmse-audio-quality

This repository contains a PyTorch implementation of logWMSE, an audio quality metric originally suggested by Iver Jordal of Nomono. In addition to serving as a metric, this implementation can be used as a loss function for training audio models.

logWMSE is a custom metric and loss function for audio signals that calculates the logarithm (log) of a frequency-weighted (W) Mean Squared Error (MSE). It is designed to address several shortcomings of common audio metrics, most importantly the lack of support for digital silence targets.
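
As a rough illustration of the "log of a frequency-weighted MSE" idea, here is a minimal, framework-free sketch. It is not this package's implementation: the function names, the per-bin `weights` curve, and the `eps` floor are all assumptions chosen to keep the arithmetic visible, and a naive DFT stands in for a proper filter.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for a short illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def log_weighted_mse(estimate, target, weights, eps=1e-8):
    """log(eps + MSE of the error after per-bin frequency weighting)."""
    n = len(target)
    error_spectrum = dft([e - t for e, t in zip(estimate, target)])
    # Parseval: the mean squared value of a signal equals sum(|X_k|^2) / N^2.
    wmse = sum((w * abs(c)) ** 2 for w, c in zip(weights, error_spectrum)) / n**2
    return math.log(eps + wmse)


N = 64
# Hypothetical weighting: attenuate the lowest bins, pass the rest.
# min(k, N - k) keeps the weights symmetric, so the weighted error stays real.
weights = [0.1 if min(k, N - k) < 8 else 1.0 for k in range(N)]

silence = [0.0] * N  # digital silence target
low_error = [0.1 * math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
mid_error = [0.1 * math.cos(2 * math.pi * 16 * t / N) for t in range(N)]

low_score = log_weighted_mse(low_error, silence, weights)
mid_score = log_weighted_mse(mid_error, silence, weights)
print(low_score < mid_score)  # True: the down-weighted band is penalized less
print(log_weighted_mse(silence, silence, weights))  # log(eps), still finite
```

Two errors of equal amplitude receive different scores depending on which band they fall in, and a perfect match against a silent target yields the finite value log(eps) rather than an undefined result.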

Installation

To use this repository, clone it and install the required dependencies:

git clone https://github.com/your-username/torch-log-wmse-audio-quality.git
cd torch-log-wmse-audio-quality
pip install -r requirements.txt

Usage Example

import torch
from torch_log_wmse_audio_quality import LogWMSE

# Audio and tensor parameters
audio_length = 1.0  # seconds
sample_rate = 44100
audio_channels = 2  # stereo
audio_stems = 3  # number of separated stems
batch = 4  # batch size

# Instantiate LogWMSE, configured to return a loss value
log_wmse_loss = LogWMSE(audio_length=audio_length, sample_rate=sample_rate, return_as_loss=True)

# Generate random inputs (scaled between -1 and 1)
audio_lengths_samples = int(audio_length * sample_rate)
# Shape: (batch, channels, samples)
unprocessed_audio = 2 * torch.rand(batch, audio_channels, audio_lengths_samples) - 1
# Shape: (batch, channels, stems, samples)
processed_audio = unprocessed_audio.unsqueeze(2).expand(-1, -1, audio_stems, -1) * 0.1
# Digital silence target with the same shape as processed_audio
target_audio = torch.zeros(batch, audio_channels, audio_stems, audio_lengths_samples)

log_wmse = log_wmse_loss(unprocessed_audio, processed_audio, target_audio)
print(log_wmse)  # Expected output: approx. -18.42

Motivation

  • Supports digital silence targets, which other audio metrics (e.g. (SI-)SDR, SIR, SAR, ISR, VISQOL_audio, STOI, CDPAM, and VISQOL) do not.
  • Overcomes the small value range of MSE (e.g. between 1e-8 and 1e-3), making number formatting and sight-reading easier. Scaled similarly to SI-SDR.
  • Scale-invariant, and aligned with the frequency sensitivity of human hearing.
  • Invariant to the tiny errors of MSE that are inaudible to humans.
  • Logarithmic, reflecting the logarithmic sensitivity of human hearing.
  • Tailored specifically for audio signals.
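
The digital silence and scale-invariance points can be made concrete with a small sketch. The helpers below are hypothetical and deliberately simplified (plain Python, no frequency weighting, an assumed error floor `EPS`), so they show the general idea rather than how this package computes its values.

```python
import math

EPS = 1e-8  # assumed error floor (hypothetical value, not taken from the library)


def sdr_db(target, estimate):
    """Plain SDR in dB. Returns NaN for an all-zero target, where it is undefined."""
    signal_power = sum(t * t for t in target)
    error_power = sum((e - t) ** 2 for t, e in zip(target, estimate))
    if signal_power == 0.0:
        return float("nan")  # a zero-power target would lead to log10(0)
    if error_power == 0.0:
        return float("inf")  # perfect estimate
    return 10.0 * math.log10(signal_power / error_power)


def log_mse_score(reference, target, estimate, eps=EPS):
    """-log(eps + MSE), with the error scaled by 1 / RMS(reference)."""
    rms = math.sqrt(sum(r * r for r in reference) / len(reference))
    scale = 1.0 / max(rms, eps)
    mse = sum(((e - t) * scale) ** 2 for t, e in zip(target, estimate)) / len(target)
    return -math.log(eps + mse)


# 10 ms of a 440 Hz tone at 44.1 kHz as the unprocessed input
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(441)]
silence = [0.0] * len(reference)         # digital silence target
estimate = [0.1 * r for r in reference]  # residual at one tenth of the input level

print(sdr_db(silence, estimate))                    # nan
print(log_mse_score(reference, silence, estimate))  # finite, about 4.6

# Halving the input and output levels together leaves the score unchanged.
quiet_reference = [0.5 * r for r in reference]
quiet_estimate = [0.5 * e for e in estimate]
same = log_mse_score(quiet_reference, silence, quiet_estimate)
```

Normalizing by the RMS of the unprocessed input is what makes the score independent of overall gain, while the floor inside the logarithm keeps the result finite when both the target and the error are zero.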

Contributing

Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or new features to suggest.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgements

Thanks to Whitebalance for backing this project.
