
analyzeAudio

Measure one or more aspects of one or more audio files.

Note well: the FFmpeg and FFprobe binaries must be in your PATH.

Download options for FFmpeg and FFprobe are listed at ffmpeg.org.

pip install analyzeAudio

or

uv add analyzeAudio
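Because FFmpeg and FFprobe must be in PATH, you may want to confirm that before the first analysis. This is a minimal standard-library sketch, not part of analyzeAudio; `findMissingBinaries` is a hypothetical helper name. It only checks that `shutil.which` can locate each executable.

```python
import shutil
from collections.abc import Sequence


def findMissingBinaries(binaryNames: Sequence[str] = ('ffmpeg', 'ffprobe')) -> list[str]:
    """Return the names in binaryNames that shutil.which cannot find in PATH.

    Hypothetical helper for illustration; analyzeAudio does not provide it.
    """
    return [name for name in binaryNames if shutil.which(name) is None]


if __name__ == '__main__':
    # Report which required executables are missing, if any.
    print('Missing from PATH:', findMissingBinaries() or 'nothing')
```

An empty list means both executables were found. It does not verify their versions.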

Some ways to use this package

analyzeAudio works at five practical levels: audio path, waveform array, spectrogram array, waveform tensor, and magnitude-spectrogram tensor. The top-level API is the quickest way to ask for named measurements from audio paths. The lower-level modules are there when you already have decoded audio in memory, want the full array or tensor instead of one summary float, or want to call direct comparison and loss analyzers yourself.

Top-level exports you will probably reach for first

  • analyzeAudioFile(pathFilename, listAspectNames): analyze one audio path and return one result per requested registered aspect name.
  • analyzeAudioListPathFilenames(listPathFilenames, listAspectNames, CPUlimit=None): analyze many audio paths in parallel and return one completed row per audio path.
  • getListAvailableAudioAspects(): return the sorted list of registered aspect names.
  • audioAspects: registry mapping each aspect name to its analyzer callable and required parameter names.
  • truncateTensors(listTensors): trim multiple tensors to the same trailing length before direct comparison.
  • dataTabularTOpathFilenameDelimited(...): write batch results to a delimited text file.

The package also re-exports the type aliases Audio, Spectrogram, SpectrogramMagnitude, and SpectrogramPower so downstream code can annotate the same representations the analyzers expect.

Choose the module that matches the representation you already have

  • One or more audio paths: use the top-level API or analyzersUseFilename for named single-path measurements, paired-path comparisons, FFprobe/FFmpeg-derived arrays, and loudness.
  • A waveform numpy.ndarray plus sampleRate: use analyzersUseWaveform for tempogram, RMS, tempo, and zero-crossing arrays or their mean summaries.
  • A spectrogram magnitude or power numpy.ndarray: use analyzersUseSpectrogram for chroma and spectral-descriptor arrays or their mean summaries.
  • A waveform torch.Tensor: use analyzersUseTensor for SRMR, logWMSE, source-separation scores, and waveform-domain or STFT-domain loss analyzers.
  • A magnitude spectrogram torch.Tensor: use analyzersUseTensorSpectrogram for spectrogram-magnitude comparison losses such as spectral convergence and STFT-magnitude distance.

The registry spans more than single-path measurements. Some registered names are convenient single-audio measurements, some are paired-path comparisons, some expect waveform tensors, and some expect magnitude spectrogram tensors.

Use analyzeAudioFile to measure registered single-audio aspects from one path

from analyzeAudio import analyzeAudioFile

pathFilename = 'audio.wav'  # placeholder: path to one audio file

listAspectNames = [
    'LUFS integrated',
    'RMS peak',
    'SRMR mean',
    'Spectral Flatness mean',
]

listMeasurements = analyzeAudioFile(pathFilename, listAspectNames)
dictionaryMeasurements = dict(zip(listAspectNames, listMeasurements, strict=True))

analyzeAudioFile reads one audio path, prepares shared intermediate representations, and lets registered analyzers reuse those representations. Under the hood that means one call can support measurements that need the raw waveform, sampleRate, a torch.Tensor waveform, a complex STFT, spectrogram magnitude, or spectrogram power.
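The relationship between the complex STFT, spectrogram magnitude, and spectrogram power can be sketched in NumPy. This assumes the usual convention (magnitude is the absolute value of the complex STFT, power is magnitude squared); the Hann window, FFT size, and hop length below are illustrative assumptions, not analyzeAudio's settings, and `stftRepresentations` is a hypothetical helper.

```python
import numpy


def stftRepresentations(waveform: numpy.ndarray, lengthFFT: int = 2048, lengthHop: int = 512):
    """Return (complex STFT, magnitude, power) for a 1-D waveform.

    Assumptions: Hann window, no padding, frequency on axis 0, frames on axis 1.
    """
    if len(waveform) < lengthFFT:
        raise ValueError('waveform is shorter than one FFT frame')
    window = numpy.hanning(lengthFFT)
    countFrames = 1 + (len(waveform) - lengthFFT) // lengthHop
    # Slice overlapping frames and apply the window; shape (lengthFFT, countFrames).
    frames = numpy.stack([
        waveform[index * lengthHop : index * lengthHop + lengthFFT] * window
        for index in range(countFrames)
    ], axis=-1)
    spectrogramComplex = numpy.fft.rfft(frames, axis=0)  # (lengthFFT // 2 + 1, countFrames)
    spectrogramMagnitude = numpy.abs(spectrogramComplex)
    spectrogramPower = spectrogramMagnitude ** 2
    return spectrogramComplex, spectrogramMagnitude, spectrogramPower
```

Computing the complex STFT once and deriving magnitude and power from it mirrors the idea of sharing intermediate representations across analyzers.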

analyzeAudioFile preserves the order of listAspectNames. If a requested aspect name is not registered, the matching return entry is 'not found'.

Registered names are case-sensitive, and some intentionally similar names come from different analysis routes. For example, Spectral Flatness mean and Spectral flatness mean are different registered names, and so are Zero-crossing rate mean and Zero-crossings rate.
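To spot names that differ only by letter case, you can group the registry names case-insensitively. A sketch with a hypothetical helper, intended to be called as `findCaseOnlyCollisions(getListAvailableAudioAspects())`:

```python
from collections import defaultdict


def findCaseOnlyCollisions(listAspectNames: list[str]) -> list[list[str]]:
    """Return groups of names that are identical apart from letter case."""
    groups: dict[str, list[str]] = defaultdict(list)
    for aspectName in listAspectNames:
        groups[aspectName.casefold()].append(aspectName)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

This catches 'Spectral Flatness mean' versus 'Spectral flatness mean' but not 'Zero-crossing rate mean' versus 'Zero-crossings rate', which differ by more than case.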

Some registered names require inputs that cannot come from a single audio path alone. Examples include SI-SDR mean, LogWMSE, L1SNRDB, and SpectralConvergenceLoss. For those, inspect the registry and call the analyzer directly.

Use analyzeAudioListPathFilenames to batch single-audio measurements across many paths

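from analyzeAudio import analyzeAudioListPathFilenames, dataTabularTOpathFilenameDelimited

listPathFilenames = ['first.wav', 'second.wav']  # placeholder audio paths
pathFilenameOutput = 'measurements.txt'  # placeholder output path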

listAspectNames = ['LUFS integrated', 'Spectral Flatness mean']
rowsListFilenameAspectValues = analyzeAudioListPathFilenames(listPathFilenames, listAspectNames)

dataTabularTOpathFilenameDelimited(
    pathFilenameOutput,
    rowsListFilenameAspectValues,
    ['pathFilename', *listAspectNames],
)

Each returned row starts with the audio path converted to POSIX text, followed by the requested values. The rows are returned in worker-completion order rather than the original input order. Use CPUlimit when you want to cap the worker count explicitly.
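If you need the rows in the original input order, you can re-sort them by the first column. This sketch assumes the first element of each row equals `PurePath(path).as_posix()` for the matching input path; if the package normalizes paths differently (for example, by resolving them to absolute paths), adjust the key. `sortRowsByInputOrder` is a hypothetical helper.

```python
from pathlib import PurePath


def sortRowsByInputOrder(listPathFilenames, rowsListFilenameAspectValues):
    """Reorder result rows to follow the order of listPathFilenames.

    Assumes row[0] == PurePath(inputPath).as_posix() for each input path.
    """
    indexByPosix = {
        PurePath(pathFilename).as_posix(): index
        for index, pathFilename in enumerate(listPathFilenames)
    }
    return sorted(rowsListFilenameAspectValues, key=lambda row: indexByPosix[row[0]])
```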

Use getListAvailableAudioAspects and audioAspects to inspect the registry or call an analyzer directly

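from analyzeAudio import audioAspects, getListAvailableAudioAspects

pathFilenameAudioFile = 'reference.wav'  # placeholder: first audio path
pathFilenameDifferentAudioFile = 'estimate.wav'  # placeholder: second audio path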

print(getListAvailableAudioAspects())
print(audioAspects['Chromagram mean']['analyzerParameters'])
print(audioAspects['SI-SDR mean']['analyzerParameters'])
print(audioAspects['LogWMSE']['analyzerParameters'])

SI_SDR_channelsMean = audioAspects['SI-SDR mean']['analyzer'](
    pathFilenameAudioFile,
    pathFilenameDifferentAudioFile,
)

Check audioAspects[name]['analyzerParameters'] first. It tells you whether the registered name expects one audio path, two audio paths, waveform tensors, a reference-estimate-mixture triple, or spectrogram magnitudes.

That is the quickest way to discover whether a name is meant for the high-level single-audio API or for direct invocation.
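To see the whole registry at once, you can group aspect names by their required parameters. This sketch assumes each registry value has an 'analyzerParameters' sequence of names, as in `audioAspects[name]['analyzerParameters']`; `groupAspectsByParameters` is a hypothetical helper, and the parameter names in the test data are made up.

```python
from collections import defaultdict
from collections.abc import Mapping


def groupAspectsByParameters(registry: Mapping[str, Mapping]) -> dict[tuple[str, ...], list[str]]:
    """Map each distinct parameter tuple to the sorted aspect names that require it."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for aspectName, entry in registry.items():
        groups[tuple(entry['analyzerParameters'])].append(aspectName)
    return {parameters: sorted(names) for parameters, names in groups.items()}
```

Usage: `for parameters, names in groupAspectsByParameters(audioAspects).items(): print(parameters, names)`. Groups whose parameters are not a single audio path are candidates for direct invocation.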

Use the lower-level modules when you want the actual analyzer instead of one registry float

These are the actual analyzers, organized by the representation they consume.

  • analyzeAudio.analyzersUseFilename
    • paired-path comparison metrics: getPSNRmean, getSDRmean, getSI_SDRmean
    • framewise spectral arrays with matching mean wrappers: analyzeSpectralCentroid, analyzeSpectralCrest, analyzeSpectralDecrease, analyzeSpectralEntropy, analyzeSpectralFlatness, analyzeSpectralFlux, analyzeSpectralKurtosis, analyzeSpectralMean, analyzeSpectralRolloff, analyzeSpectralSkewness, analyzeSpectralSlope, analyzeSpectralSpread, analyzeSpectralVariance
    • file-level FFprobe astats scalars: analyzeZero_crossings, analyzeZero_crossings_rate, analyzeDCoffset, analyzeDynamicRange, analyzeSignalEntropy, analyzeNumber_of_samples, analyzePeak_level, analyzeRMS_level, analyzeCrest_factor, analyzeRMS_peak, analyzeAbs_Peak_count, analyzeBit_depth, analyzeFlat_factor, analyzeMax_difference, analyzeMax_level, analyzeMean_difference, analyzeMin_difference, analyzeMin_level, analyzeNoise_floor, analyzeNoise_floor_count, analyzePeak_count, analyzeRMS_difference, analyzeRMS_trough
    • loudness and true-peak arrays plus scalar summaries: analyzeTruePeak, analyzeLUFSMomentary, analyzeLUFSShortTerm, analyzeLUFSIntegrated, analyzeLRA, analyzeLUFSlow, analyzeLUFShigh, plus the matching ...Overall scalar functions
  • analyzeAudio.analyzersUseWaveform
    • raw arrays: analyzeTempogram, analyzeRMS, analyzeTempo, analyzeZeroCrossingRate
    • mean summaries: analyzeTempogramMean, analyzeRMSMean, analyzeTempoMean, analyzeZeroCrossingRateMean
  • analyzeAudio.analyzersUseSpectrogram
    • raw arrays: analyzeChromagram, analyzeSpectralContrast, analyzeSpectralBandwidth, analyzeSpectralCentroid, analyzeSpectralFlatness
    • mean summaries: analyzeChromagramMean, analyzeSpectralContrastMean, analyzeSpectralBandwidthMean, analyzeSpectralCentroidMean, analyzeSpectralFlatnessMean
  • analyzeAudio.analyzersUseTensor
    • reverberation and intelligibility: analyzeSRMR, analyzeSRMRMean
    • reference-estimate-mixture scoring: analyzeLogWMSEMean
    • source-separation scores: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
    • waveform-domain and STFT-domain loss analyzers: analyzeDCLoss, analyzeESRLoss, analyzeLogCoshLoss, analyzeSNRLoss, analyzeSISDRLoss, analyzeSDSDRLoss, analyzeSTFTLoss, analyzeMelSTFTLoss, analyzeChromaSTFTLoss, analyzeMultiResolutionSTFTLoss, analyzeRandomResolutionSTFTLoss, analyzeSumAndDifferenceSTFTLoss
  • analyzeAudio.analyzersUseTensorSpectrogram
    • magnitude-spectrogram comparison analyzers: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeL1FrequencyLoss
  • analyzeAudio.ffmpeg
    • environment check for Colab-style sessions: verifyFFmpegColab

Several concept names exist in more than one module. That is intentional. For example, Spectral flatness mean comes from the filename-based FFprobe route, while Spectral Flatness mean comes from the spectrogram route. Similar names do not necessarily mean duplicate implementations.

import soundfile

from analyzeAudio.analyzersUseWaveform import analyzeTempogram

pathFilename = 'audio.wav'  # placeholder: path to one audio file

with soundfile.SoundFile(pathFilename) as readSoundFile:
    sampleRate = readSoundFile.samplerate
    # For multichannel files soundfile returns (frames, channels); .T gives (channels, frames)
    waveform = readSoundFile.read(dtype='float32').T

tempogram = analyzeTempogram(waveform, sampleRate)

from analyzeAudio.analyzersUseTensor import analyzeL1SNRDBMean
from analyzeAudio.analyzersUseTensorSpectrogram import analyzeSpectralConvergenceLoss

# tensorAudioReference, tensorAudioEstimate: waveform torch.Tensor inputs you already have
# tensorMagnitudeReference, tensorMagnitudeEstimate: magnitude spectrogram torch.Tensor inputs
valueScore = analyzeL1SNRDBMean(tensorAudioReference, tensorAudioEstimate)
valueLoss = analyzeSpectralConvergenceLoss(tensorMagnitudeReference, tensorMagnitudeEstimate)
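In the waveform example, .T yields a channels-first array for multichannel files, but soundfile returns a 1-D array for mono files, which a transpose leaves unchanged. Whether each analyzer prefers 1-D or 2-D mono input is not stated here; if you want a consistent (channels, samples) layout, a hypothetical helper could look like this. Alternatively, `readSoundFile.read(dtype='float32', always_2d=True).T` asks soundfile for 2-D output directly.

```python
import numpy


def toChannelsFirst(waveform: numpy.ndarray) -> numpy.ndarray:
    """Return a float32 array shaped (channels, samples).

    Assumes soundfile's layout: 1-D for mono, (samples, channels) otherwise.
    """
    arrayWaveform = numpy.asarray(waveform, dtype=numpy.float32)
    if arrayWaveform.ndim == 1:
        # Mono: add a leading channel axis.
        return arrayWaveform[numpy.newaxis, :]
    return arrayWaveform.T
```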

Use truncateTensors when you want the aligned tensors yourself

Most tensor comparison analyzers already trim their inputs internally. Call truncateTensors yourself when you want the aligned tensors, for example to reuse them across several metrics.

from analyzeAudio import truncateTensors

tensorAudioReference, tensorAudioEstimate = truncateTensors([
    tensorAudioReference,
    tensorAudioEstimate,
])
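For intuition, here is a NumPy analog of trimming to a common trailing length. It assumes "trailing length" means the size of the last axis and that the start of each signal is kept; the package's actual truncateTensors works on torch tensors and may differ in detail. `truncateArrays` is a hypothetical helper.

```python
import numpy


def truncateArrays(listArrays: list[numpy.ndarray]) -> list[numpy.ndarray]:
    """Trim every array to the shortest length along its last axis, keeping leading samples."""
    lengthShortest = min(array.shape[-1] for array in listArrays)
    return [array[..., :lengthShortest] for array in listArrays]
```

Trimming along the last axis leaves any leading channel axis untouched, so stereo and mono arrays can be aligned together.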

Use whatMeasurements to list registered measurements from the command line

whatMeasurements

This prints the same sorted registry names returned by getListAvailableAudioAspects().



License: CC-BY-NC-4.0
