analyzeAudio

Measure one or more aspects of one or more audio files.

Note well: the FFmpeg and FFprobe binaries must be in PATH.

Download options for FFmpeg and FFprobe are listed at ffmpeg.org.
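
One way to confirm the PATH requirement before running any analysis is to look up both binaries with the standard library. This is a minimal sketch and not part of analyzeAudio; the helper name findFFmpegBinaries is hypothetical.

```python
import shutil
from typing import Optional


def findFFmpegBinaries() -> dict[str, Optional[str]]:
    """Return the resolved path of each required binary, or None if it is not in PATH."""
    # shutil.which searches PATH the same way the shell does.
    return {name: shutil.which(name) for name in ("ffmpeg", "ffprobe")}


for name, pathBinary in findFFmpegBinaries().items():
    print(f"{name}: {pathBinary or 'not found in PATH'}")
```

If either entry reports "not found in PATH", install FFmpeg or adjust PATH before calling the analyzers.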

Install FFmpeg on Google Colab

from analyzeAudio.ffmpeg import verifyFFmpegColab
verifyFFmpegColab()

pip install analyzeAudio
uv add analyzeAudio

What is in the package

analyzeAudio provides two user-facing kinds of audio analysis.

  • Audio aspects measure one audio file.
  • Audio contests compare two audio files, waveforms, tensors, or spectrograms.

The main user workflows are:

| What you want | Use |
| --- | --- |
| One value for each selected measurement on one file | analyzeAudioFile |
| The same selected measurements for many files | analyzeAudioListPathFilenames |
| A TSV, CSV, or other delimited output file | dataTabularTOpathFilenameDelimited |
| One specific measurement or detailed frame data | Import a direct analyzer function |
| One comparison score between two files | Import a filename contest function |
| One comparison score between two tensors or spectrograms | Import a tensor or spectrogram contest function |

One-file measurements

Use these names with analyzeAudioFile or analyzeAudioListPathFilenames. Names are case-sensitive.

Loudness and true peak:

| Name | What it measures |
| --- | --- |
| LUFS integrated | Whole-file integrated loudness. |
| LUFS momentary maximum | Maximum momentary loudness. |
| LUFS short-term maximum | Maximum short-term loudness. |
| LUFS loudness range | Loudness range. |
| LUFS low | Low loudness range boundary. |
| LUFS high | High loudness range boundary. |
| true_peak maximum | Maximum true peak level. |

Signal level, dynamics, and samples:

| Name | What it measures |
| --- | --- |
| RMS_level overall | Overall RMS level. |
| RMS_peak overall | Overall RMS peak. |
| RMS_trough overall | Overall RMS trough. |
| RMS_difference overall | Overall RMS difference between adjacent samples. |
| Peak_level overall | Overall peak level. |
| Peak_count total | Total detected peak count. |
| Abs_Peak_count total | Total absolute peak count. |
| Crest_factor mean | Mean crest factor. |
| Dynamic_range overall | Overall dynamic range. |
| DC_offset mean | Mean DC offset. |
| Bit_depth mean | Mean detected bit depth. |
| Entropy mean | Mean signal entropy. |
| Flat_factor mean | Mean flat factor. |
| Max_difference overall | Maximum sample difference. |
| Max_level overall | Maximum sample level. |
| Mean_difference mean | Mean adjacent-sample difference. |
| Min_difference overall | Minimum sample difference. |
| Min_level overall | Minimum sample level. |
| Noise_floor overall | Overall noise floor. |
| Noise_floor_count total | Total noise-floor count. |
| Number_of_samples total | Total samples. |
| Zero_crossings total | Total zero crossings. |
| Zero_crossings_rate overall | Overall zero-crossing rate. |

Spectral measurements:

| Name | What it measures |
| --- | --- |
| Power spectral density mean | Mean power spectral density. |
| Spectral centroid mean | Mean spectral center of mass from filename analysis. |
| Spectral crest mean | Mean spectral crest. |
| Spectral decrease mean | Mean spectral decrease. |
| Spectral entropy mean | Mean spectral entropy. |
| Spectral flatness mean | Mean spectral flatness from filename analysis. |
| Spectral flux mean | Mean spectral flux. |
| Spectral kurtosis mean | Mean spectral kurtosis. |
| Spectral rolloff mean | Mean spectral rolloff. |
| Spectral skewness mean | Mean spectral skewness. |
| Spectral slope mean | Mean spectral slope. |
| Spectral spread mean | Mean spectral spread. |
| Spectral variance mean | Mean spectral variance. |
| Spectral Bandwidth mean | Mean librosa spectral bandwidth. |
| Spectral Centroid mean | Mean librosa spectral centroid. |
| Spectral Contrast mean | Mean librosa spectral contrast. |
| Spectral Flatness mean | Mean librosa spectral flatness ratio. |
| Spectral Flatness dB mean | Mean librosa spectral flatness in decibels. |
| Chromagram mean | Mean chroma energy across pitch classes. |

Waveform, rhythm, and speech measurements:

| Name | What it measures |
| --- | --- |
| RMS Waveform mean | Mean waveform RMS amplitude. |
| RMS Waveform dB mean | Mean waveform RMS level in decibels. |
| Tempogram mean | Mean tempogram value. |
| Tempo mean | Mean estimated tempo. |
| Zero Crossing Rate mean | Mean waveform zero-crossing rate. |
| SRMR mean | Mean speech-to-reverberation modulation energy ratio. |

Some names intentionally differ only by capitalization or wording. For example, Spectral flatness mean and Spectral Flatness mean are different measurements. Use the exact name from the table.
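
Because near-identical names select different measurements, it can help to check requested names against the available list before running an analysis. The sketch below is not part of analyzeAudio: the helper name checkAspectNames is hypothetical, and the list of available names is passed in as plain strings. In practice that list could come from getListAvailableAudioAspects, assuming it returns the names as strings.

```python
import difflib


def checkAspectNames(listRequested, listAvailable):
    """Map each requested name that has no exact match to close candidates."""
    setAvailable = set(listAvailable)
    suggestions = {}
    for name in listRequested:
        if name in setAvailable:
            continue
        # Candidates that differ from the request only by capitalization.
        sameLetters = [candidate for candidate in listAvailable if candidate.lower() == name.lower()]
        # Candidates that are textually similar, for example a small typo.
        similar = difflib.get_close_matches(name, listAvailable, n=3, cutoff=0.8)
        # Merge both lists, keeping order and dropping duplicates.
        suggestions[name] = list(dict.fromkeys(sameLetters + similar))
    return suggestions


# A subset of the names from the tables above.
listAvailable = [
    "Spectral flatness mean",
    "Spectral Flatness mean",
    "Spectral Flatness dB mean",
    "LUFS integrated",
]

print(checkAspectNames(["LUFS Integrated", "Spectral Flatness mean"], listAvailable))
```

An exact match produces no entry, so an empty dictionary means every requested name is spelled as the package expects.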

Measure one file

from analyzeAudio import analyzeAudioFile

listAspectNames = [
    "LUFS integrated",
    "true_peak maximum",
    "RMS_level overall",
    "Spectral Flatness mean",
    "Zero Crossing Rate mean",
]

listValues = analyzeAudioFile("voice.wav", listAspectNames)
measurements = dict(zip(listAspectNames, listValues, strict=True))
print(measurements)

analyzeAudioFile returns one value for each requested name, in the same order. If a requested name is unavailable, that value is "not found".

Measure many files

from pathlib import Path
from analyzeAudio import analyzeAudioListPathFilenames

listPathFilenames = tuple(Path("audio").glob("*.wav"))
listAspectNames = [
    "LUFS integrated",
    "LUFS loudness range",
    "true_peak maximum",
]

rows = analyzeAudioListPathFilenames(
    listPathFilenames,
    listAspectNames,
    CPUlimit=4,
)

for row in rows:
    print(row)

Each row starts with the analyzed filename, followed by the requested values. Rows are returned as files finish, so row order can differ from input order.
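
If input order matters, the rows can be re-sorted after the fact. The sketch below assumes the first element of each row matches the filename that was passed in (as a string or a Path); that detail is an assumption, not something confirmed here. The helper name sortRowsLikeInput and the sample rows are hypothetical.

```python
from pathlib import Path


def sortRowsLikeInput(rows, listPathFilenames):
    """Return rows reordered to match the order of listPathFilenames."""
    # Assumption: row[0] is the filename as passed in, or its string form.
    position = {
        str(Path(pathFilename)): index
        for index, pathFilename in enumerate(listPathFilenames)
    }
    return sorted(rows, key=lambda row: position[str(Path(row[0]))])


# Hypothetical rows, listed in completion order rather than input order.
listPathFilenames = ["audio/a.wav", "audio/b.wav", "audio/c.wav"]
rows = [
    ["audio/c.wav", -20.5],
    ["audio/a.wav", -23.0],
    ["audio/b.wav", -18.7],
]

for row in sortRowsLikeInput(rows, listPathFilenames):
    print(row)
```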

Save measurements

from analyzeAudio import (
    analyzeAudioListPathFilenames,
    dataTabularTOpathFilenameDelimited,
)

listAspectNames = ["LUFS integrated", "true_peak maximum"]
rows = analyzeAudioListPathFilenames(["one.wav", "two.wav"], listAspectNames)

dataTabularTOpathFilenameDelimited(
    "measurements.tsv",
    rows,
    ["pathFilename", *listAspectNames],
)

For CSV output, use a comma delimiter:

dataTabularTOpathFilenameDelimited(
    "measurements.csv",
    rows,
    ["pathFilename", *listAspectNames],
    delimiterOutput=",",
)
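
To read saved measurements back, the standard csv module is usually enough. This sketch assumes the saved file is ordinary delimited text with the column names on the first line, which is an assumption about the output layout. So that it runs on its own, it writes a stand-in file with hypothetical values instead of calling analyzeAudio; the helper name readMeasurements is also hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def readMeasurements(pathFilename, delimiter="\t"):
    """Read a delimited measurements file into a list of dictionaries keyed by column name."""
    with open(pathFilename, newline="", encoding="utf-8") as readStream:
        return list(csv.DictReader(readStream, delimiter=delimiter))


# Stand-in file with hypothetical values, written here so the sketch is self-contained.
pathFilename = Path(tempfile.mkdtemp()) / "measurements.tsv"
with open(pathFilename, "w", newline="", encoding="utf-8") as writeStream:
    writer = csv.writer(writeStream, delimiter="\t")
    writer.writerow(["pathFilename", "LUFS integrated", "true_peak maximum"])
    writer.writerow(["one.wav", -23.0, -1.5])
    writer.writerow(["two.wav", -18.2, -0.3])

for row in readMeasurements(pathFilename):
    # csv returns every field as text, so convert before doing arithmetic.
    print(row["pathFilename"], float(row["LUFS integrated"]))
```

For a CSV file, pass delimiter="," to the reader.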

Get detailed arrays

Summary names usually return one number. Direct analyzer functions whose names lack a summary word, such as Mean or Overall, usually return per-frame, per-channel, or per-band values.

from analyzeAudio.analyzersUseFilename import (
    analyzeLUFSIntegratedOverall,
    analyzeLUFSMomentary,
)

integrated = analyzeLUFSIntegratedOverall("voice.wav")
momentaryFrames = analyzeLUFSMomentary("voice.wav")

Use audio already loaded in Python

Waveform analyzers accept waveform samples shaped as channels by samples.

import numpy
import soundfile
from analyzeAudio.analyzersUseWaveform import (
    analyzeRMSWaveformMean,
    analyzeTempoMean,
    analyzeZeroCrossingRateMean,
)

with soundfile.SoundFile("voice.wav") as audioFile:
    sampleRate = audioFile.samplerate
    waveform = audioFile.read(dtype="float32", always_2d=True).astype(numpy.float32).T

rms = analyzeRMSWaveformMean(waveform)
tempo = analyzeTempoMean(waveform, sampleRate)
zeroCrossingRate = analyzeZeroCrossingRateMean(waveform)
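
Audio from other sources often arrives as a one-dimensional mono array or as samples by channels. The sketch below shows one way to reach the channels-by-samples layout using only NumPy and synthetic tones; the helper name toChannelsBySamples is hypothetical and not part of analyzeAudio.

```python
import numpy


def toChannelsBySamples(samples):
    """Return a float32 array shaped (channels, samples)."""
    array = numpy.asarray(samples, dtype=numpy.float32)
    if array.ndim == 1:
        # Mono: add a leading channel axis.
        return array[numpy.newaxis, :]
    if array.ndim == 2:
        # Assumption: input is (samples, channels), the layout soundfile returns.
        return array.T
    raise ValueError(f"Expected 1 or 2 dimensions, got {array.ndim}.")


# Synthetic one-second test tones instead of a real file.
sampleRate = 8000
time = numpy.arange(sampleRate) / sampleRate
mono = numpy.sin(2 * numpy.pi * 440 * time)
stereo = numpy.stack([mono, 0.5 * mono], axis=1)  # shape (samples, channels)

print(toChannelsBySamples(mono).shape)
print(toChannelsBySamples(stereo).shape)
```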

Spectrogram analyzers accept magnitude or power spectrograms.

import librosa
import numpy
from analyzeAudio.analyzersUseSpectrogram import (
    analyzeChromagramMean,
    analyzeSpectralCentroidMean,
)

spectrogram = librosa.stft(waveform)
spectrogramMagnitude = numpy.absolute(spectrogram)
spectrogramPower = spectrogramMagnitude**2

spectralCentroid = analyzeSpectralCentroidMean(spectrogramMagnitude)
chromagram = analyzeChromagramMean(spectrogramPower, sampleRate)

Two-input comparisons

Filename contests compare two audio files:

| Function | What it compares |
| --- | --- |
| analyzePSNRmean | Mean peak signal-to-noise ratio. |
| analyzeSDRmean | Mean signal-to-distortion ratio. |
| analyzeSI_SDRmean | Mean scale-invariant signal-to-distortion ratio. |
| analyzeKPSNRmean | Bounded score from PSNR. |
| analyzeKSDRmean | Bounded score from SDR. |
| analyzeKSI_SDRmean | Bounded score from SI-SDR. |

from analyzeAudio.analyzersUseFilename import (
    analyzePSNRmean,
    analyzeSDRmean,
    analyzeSI_SDRmean,
)

pathReference = "reference.wav"
pathEstimate = "estimate.wav"

psnr = analyzePSNRmean(pathReference, pathEstimate)
sdr = analyzeSDRmean(pathReference, pathEstimate)
si_sdr = analyzeSI_SDRmean(pathReference, pathEstimate)

Tensor waveform contests usually compare two PyTorch waveform tensors:

| Function | What it compares |
| --- | --- |
| analyzeL1SNRMean | Mean L1 signal-to-noise ratio. |
| analyzeL1SNRDBMean | Mean L1 signal-to-noise ratio in decibels. |
| analyzeMultiL1SNRDBMean | Multi-source L1 SNR in decibels. |
| analyzeSTFTL1SNRDBMean | STFT-domain L1 SNR in decibels. |
| analyzeLogWMSEMean | Mean log weighted MSE audio-quality score for reference, estimate, and mixture tensors. |
| analyzeDCLoss | DC loss. |
| analyzeESRLoss | Error-to-signal ratio loss. |
| analyzeLogCoshLoss | Log-cosh loss. |
| analyzeSNRLoss | Signal-to-noise ratio loss. |
| analyzeSISDRLoss | Scale-invariant SDR loss. |
| analyzeSDSDRLoss | Scale-dependent SDR loss. |
| analyzeSTFTLoss | STFT loss. |
| analyzeMelSTFTLoss | Mel-STFT loss. |
| analyzeChromaSTFTLoss | Chroma-STFT loss. |
| analyzeMultiResolutionSTFTLoss | Multi-resolution STFT loss. |
| analyzeRandomResolutionSTFTLoss | Random-resolution STFT loss. |
| analyzeSumAndDifferenceSTFTLoss | Sum-and-difference STFT loss. |

from analyzeAudio.contestsTensor import (
    analyzeL1SNRDBMean,
    analyzeMultiResolutionSTFTLoss,
)

l1snrdb = analyzeL1SNRDBMean(tensorReference, tensorEstimate)
mrstft = analyzeMultiResolutionSTFTLoss(tensorReference, tensorEstimate)

analyzeLogWMSEMean also needs the original mixture and sample rate:

from analyzeAudio.contestsTensor import analyzeLogWMSEMean

logwmse = analyzeLogWMSEMean(
    tensorReference,
    tensorEstimate,
    tensorMixture,
    sampleRate,
)

Tensor spectrogram contests compare two PyTorch magnitude spectrogram tensors:

| Function | What it compares |
| --- | --- |
| analyzeSpectralConvergenceLoss | Spectral convergence loss. |
| analyzeSTFTMagnitudeLoss | STFT magnitude loss. |
| analyzeL1FrequencyLoss | L1 frequency score. |

from analyzeAudio.contestsTensorSpectrogram import (
    analyzeSpectralConvergenceLoss,
    analyzeSTFTMagnitudeLoss,
)

spectralConvergence = analyzeSpectralConvergenceLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)
stftMagnitude = analyzeSTFTMagnitudeLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)

NumPy spectrogram helpers compare two magnitude spectrograms:

| Function | What it returns |
| --- | --- |
| analyzeBleedFullMelDB | Arrays of added and missing mel-scaled dB content. |
| analyzeBleedFullMelDBMean | Two scores: bleed and full. |

from analyzeAudio.contestsSpectrogram import analyzeBleedFullMelDBMean

bleedFull = analyzeBleedFullMelDBMean(
    spectrogramMagnitudeReference,
    spectrogramMagnitudeEstimate,
)
print(bleedFull.bleed, bleedFull.full)

Exact-name checks

The tables above describe what is in the package. These helpers are available when you want a copyable list from the installed version:

from analyzeAudio import getListAvailableAudioAspects, getListAvailableAudioContests

print(getListAvailableAudioAspects())
print(getListAvailableAudioContests())

The terminal commands are:

whatAspects
whatContests

API standardization

A top priority for this package is a public API that is as standardized as possible across filename, waveform, spectrogram, tensor, and contest analyzers. The package wraps libraries with very different calling conventions, but analyzer function signatures should model this package's dispatcher inputs, not every underlying library option.

Wishlist

  • Overhaul the semiotic system.
  • Install FFmpeg in GitHub Actions for testing.
  • Improve speed.
    • Sophisticated caching of large objects and un-hashable objects.

Reference materials

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

Perceptual Effects of Spectral Modifications on Musical Timbres

Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments

Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music

A Robust Audio Classification and Segmentation Method

Music Type Classification by Spectral Contrast Feature

A Speech/Music Discriminator Based on RMS and Zero-Crossings

Zero-Crossing Rate

Performance Measurement in Blind Audio Source Separation

Automatic Chord Recognition from Audio Using a HMM with Supervised Learning

Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals

A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

Signal Processing for Music Analysis

The Timbre Toolbox: Extracting Audio Descriptors from Musical Signals

Blind Audio Watermarking Technique Based on Two Dimensional Cellular Automata

SDR - Half-Baked or Well Done?

Loudness Metering: EBU Mode Metering to Supplement Loudness Normalisation

Loudness Range: A Measure to Supplement EBU R 128 Loudness Normalisation

Algorithms to Measure Audio Programme Loudness and True-Peak Audio Level

An Overview on Sound Features in Time and Frequency Domain

Perceptual Loss Function for Neural Modelling of Audio Systems

Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder

logWMSE Audio Quality Metric and PyTorch Loss Implementation

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

auraloss: Audio focused loss functions in PyTorch

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

Neural source-filter waveform models for statistical parametric speech synthesis

DDSP: Differentiable Digital Signal Processing

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries

torch-l1-snr: L1 Signal-to-Noise Ratio Loss Functions for Audio Source Separation in PyTorch

Packages and documentation

CC-BY-NC-4.0
