analyzeAudio
Measure one or more aspects of one or more audio files.
Note well: FFmpeg & FFprobe binaries must be in PATH
ffmpeg.org lists several options for downloading FFmpeg and FFprobe.
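Before calling any path-based analyzer, it can help to confirm that both binaries are actually reachable. The following is a minimal sketch that uses only the standard library; the helper names `findFFmpegBinaries` and `listMissingBinaries` are invented for this example and are not part of analyzeAudio.

```python
import shutil


def findFFmpegBinaries(listBinaryNames=("ffmpeg", "ffprobe")):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {binaryName: shutil.which(binaryName) for binaryName in listBinaryNames}


def listMissingBinaries(dictionaryBinaries):
    """Return the names whose lookup failed, in the order they were requested."""
    return [binaryName for binaryName, pathBinary in dictionaryBinaries.items() if pathBinary is None]


dictionaryBinaries = findFFmpegBinaries()
listMissing = listMissingBinaries(dictionaryBinaries)
if listMissing:
    print("Not found in PATH:", ", ".join(listMissing))
else:
    print("FFmpeg and FFprobe are both reachable.")
```

`shutil.which` follows the same PATH lookup the shell would use, so a `None` result usually means the install directory still needs to be added to PATH.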
Some ways to use this package
analyzeAudio works at five practical levels: audio path, waveform array, spectrogram array,
waveform tensor, and magnitude-spectrogram tensor. The top-level API is the quickest way to ask
for named measurements from audio paths. The lower-level modules are there when you already have
decoded audio in memory, want the full array or tensor instead of one summary float, or want to
call direct comparison and loss analyzers yourself.
Top-level exports you will probably reach for first
| Export | Purpose |
|---|---|
| analyzeAudioFile(pathFilename, listAspectNames) | Analyze one audio path and return one result per requested registered aspect name. |
| analyzeAudioListPathFilenames(listPathFilenames, listAspectNames, CPUlimit=None) | Analyze many audio paths in parallel and return one completed row per audio path. |
| getListAvailableAudioAspects() | Return the sorted list of registered aspect names. |
| audioAspects | Registry of aspect name -> analyzer callable + required parameter names. |
| truncateTensors(listTensors) | Trim multiple tensors to the same trailing length before direct comparison. |
| dataTabularTOpathFilenameDelimited(...) | Write batch results to a delimited text file. |
The package also re-exports the type aliases Audio, Spectrogram, SpectrogramMagnitude, and
SpectrogramPower so downstream code can annotate the same representations the analyzers expect.
Choose the module that matches the representation you already have
| If you already have... | Reach for... | What you get |
|---|---|---|
| one or more audio paths | top-level API or analyzersUseFilename | named single-path measurements, paired-path comparisons, FFprobe/FFmpeg-derived arrays, loudness |
| waveform numpy.ndarray + sampleRate | analyzersUseWaveform | tempogram, RMS, tempo, and zero-crossing arrays or their mean summaries |
| spectrogram magnitude/power numpy.ndarray | analyzersUseSpectrogram | chroma and spectral-descriptor arrays or their mean summaries |
| waveform torch.Tensor | analyzersUseTensor | SRMR, logWMSE, source-separation scores, and waveform/STFT-domain loss analyzers |
| magnitude spectrogram torch.Tensor | analyzersUseTensorSpectrogram | spectrogram-magnitude comparison losses such as spectral convergence and STFT-magnitude distance |
The registry spans more than single-path measurements. Some registered names are convenient single-audio measurements, some are paired-path comparisons, some expect waveform tensors, and some expect magnitude spectrogram tensors.
Use analyzeAudioFile to measure registered single-audio aspects from one path
from analyzeAudio import analyzeAudioFile

listAspectNames = [
    'LUFS integrated',
    'RMS peak',
    'SRMR mean',
    'Spectral Flatness mean',
]
listMeasurements = analyzeAudioFile(pathFilename, listAspectNames)
dictionaryMeasurements = dict(zip(listAspectNames, listMeasurements, strict=True))
analyzeAudioFile reads one audio path, prepares shared intermediate representations, and lets
registered analyzers reuse those representations. Under the hood that means one call can support
measurements that need the raw waveform, sampleRate, a torch.Tensor waveform, a complex STFT,
spectrogram magnitude, or spectrogram power.
analyzeAudioFile preserves the order of listAspectNames. If a requested aspect name is not
registered, the matching return entry is 'not found'.
Registered names are case-sensitive, and some intentionally similar names come from different
analysis routes. For example, Spectral Flatness mean and Spectral flatness mean are different
registered names, and so are Zero-crossing rate mean and Zero-crossings rate.
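A quick way to see which names differ only by capitalization is to group them after case folding. The sketch below runs on a hand-written list taken from the examples above; in practice the input would presumably be the list returned by `getListAvailableAudioAspects()`. The helper name is invented for this example.

```python
from collections import defaultdict


def groupNamesDifferingOnlyByCase(listNames):
    """Return groups of two or more names that are identical once case is ignored."""
    dictionaryGroups = defaultdict(list)
    for name in listNames:
        dictionaryGroups[name.casefold()].append(name)
    return [sorted(group) for group in dictionaryGroups.values() if len(group) > 1]


listNames = [
    'Spectral Flatness mean',
    'Spectral flatness mean',
    'Zero-crossing rate mean',
    'Zero-crossings rate',
    'LUFS integrated',
]
print(groupNamesDifferingOnlyByCase(listNames))
```

This only catches case-only differences. A pair such as Zero-crossing rate mean and Zero-crossings rate differs in spelling as well, so it still has to be checked by eye.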
Some registered names require inputs that cannot come from a single audio path alone. Examples
include SI-SDR mean, LogWMSE, L1SNRDB, and SpectralConvergenceLoss. For those, inspect the
registry and call the analyzer directly.
Use analyzeAudioListPathFilenames to batch single-audio measurements across many paths
from analyzeAudio import analyzeAudioListPathFilenames, dataTabularTOpathFilenameDelimited

listAspectNames = ['LUFS integrated', 'Spectral Flatness mean']
rowsListFilenameAspectValues = analyzeAudioListPathFilenames(listPathFilenames, listAspectNames)
dataTabularTOpathFilenameDelimited(
    pathFilenameOutput,
    rowsListFilenameAspectValues,
    ['pathFilename', *listAspectNames],
)
Each returned row starts with the audio path converted to POSIX text, followed by the requested
values. The rows are returned in worker-completion order rather than the original input order.
Use CPUlimit when you want to cap the worker count explicitly.
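If the original input order matters, the rows can be re-sorted after the batch finishes. The sketch below assumes that the first element of each row equals the POSIX form of the path as it was passed in; if the package normalizes paths differently (for example by resolving them to absolute paths), the lookup key would need to match that instead. `sortRowsByInputOrder` is a hypothetical helper, and the row values are made up.

```python
from pathlib import PurePath


def sortRowsByInputOrder(rows, listPathFilenames):
    """Reorder result rows to follow listPathFilenames; rows with unknown paths go last."""
    dictionaryIndex = {}
    for index, pathFilename in enumerate(listPathFilenames):
        # setdefault keeps the first index if the same path was submitted twice.
        dictionaryIndex.setdefault(PurePath(pathFilename).as_posix(), index)
    indexUnknown = len(listPathFilenames)
    # sorted is stable, so rows that share an index keep their completion order.
    return sorted(rows, key=lambda row: dictionaryIndex.get(row[0], indexUnknown))


listPathFilenames = ['audio/first.wav', 'audio/second.wav', 'audio/third.wav']
# Made-up rows in an arbitrary completion order.
rowsCompletionOrder = [
    ['audio/third.wav', -9.8, 0.12],
    ['audio/first.wav', -14.2, 0.03],
    ['audio/second.wav', -11.0, 0.07],
]
rowsInputOrder = sortRowsByInputOrder(rowsCompletionOrder, listPathFilenames)
print([row[0] for row in rowsInputOrder])
```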
Use getListAvailableAudioAspects and audioAspects to inspect the registry or call an analyzer directly
from analyzeAudio import audioAspects, getListAvailableAudioAspects

print(getListAvailableAudioAspects())
print(audioAspects['Chromagram mean']['analyzerParameters'])
print(audioAspects['SI-SDR mean']['analyzerParameters'])
print(audioAspects['LogWMSE']['analyzerParameters'])

SI_SDR_channelsMean = audioAspects['SI-SDR mean']['analyzer'](
    pathFilenameAudioFile,
    pathFilenameDifferentAudioFile,
)
Use audioAspects[name]['analyzerParameters'] first. It tells you whether the registered name
expects one audio path, two audio paths, waveform tensors, a reference-estimate-mixture triple, or
spectrogram magnitudes.
That is the quickest way to discover whether a name is meant for the high-level single-audio API or for direct invocation.
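To survey the whole registry at once rather than one name at a time, the entries can be grouped by their parameter lists. The sketch below works on any mapping shaped like `audioAspects` (name -> entry with an 'analyzerParameters' sequence). The stand-in registry, its aspect names, its analyzers, and its parameter names are all invented for illustration and do not reflect the real registry contents.

```python
from collections import defaultdict


def groupAspectNamesByParameters(registry):
    """Map each distinct tuple of parameter names to the sorted aspect names that use it."""
    dictionaryGroups = defaultdict(list)
    for aspectName, entry in registry.items():
        dictionaryGroups[tuple(entry['analyzerParameters'])].append(aspectName)
    return {parameters: sorted(listNames) for parameters, listNames in dictionaryGroups.items()}


# Hypothetical stand-in: aspect names, analyzers, and parameter names are invented.
registryStandIn = {
    'example single-path aspect': {'analyzer': len, 'analyzerParameters': ['pathFilename']},
    'another single-path aspect': {'analyzer': len, 'analyzerParameters': ['pathFilename']},
    'example paired-path aspect': {
        'analyzer': max,
        'analyzerParameters': ['pathFilename', 'pathFilenameComparison'],
    },
}

for parameters, listNames in groupAspectNamesByParameters(registryStandIn).items():
    print(parameters, listNames)
```

Passing the real `audioAspects` in place of `registryStandIn` should give one group per input signature, which makes it easier to tell single-audio names from those that need direct invocation.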
Use the lower-level modules when you want the actual analyzer instead of one registry float
These are the actual analyzers, organized by the representation they consume.
- analyzeAudio.analyzersUseFilename
  - paired-path comparison metrics: getPSNRmean, getSDRmean, getSI_SDRmean
  - framewise spectral arrays with matching mean wrappers: analyzeSpectralCentroid, analyzeSpectralCrest, analyzeSpectralDecrease, analyzeSpectralEntropy, analyzeSpectralFlatness, analyzeSpectralFlux, analyzeSpectralKurtosis, analyzeSpectralMean, analyzeSpectralRolloff, analyzeSpectralSkewness, analyzeSpectralSlope, analyzeSpectralSpread, analyzeSpectralVariance
  - file-level FFprobe astats scalars: analyzeZero_crossings, analyzeZero_crossings_rate, analyzeDCoffset, analyzeDynamicRange, analyzeSignalEntropy, analyzeNumber_of_samples, analyzePeak_level, analyzeRMS_level, analyzeCrest_factor, analyzeRMS_peak, analyzeAbs_Peak_count, analyzeBit_depth, analyzeFlat_factor, analyzeMax_difference, analyzeMax_level, analyzeMean_difference, analyzeMin_difference, analyzeMin_level, analyzeNoise_floor, analyzeNoise_floor_count, analyzePeak_count, analyzeRMS_difference, analyzeRMS_trough
  - loudness and true-peak arrays plus scalar summaries: analyzeTruePeak, analyzeLUFSMomentary, analyzeLUFSShortTerm, analyzeLUFSIntegrated, analyzeLRA, analyzeLUFSlow, analyzeLUFShigh, plus the matching ...Overall scalar functions
- analyzeAudio.analyzersUseWaveform
  - raw arrays: analyzeTempogram, analyzeRMS, analyzeTempo, analyzeZeroCrossingRate
  - mean summaries: analyzeTempogramMean, analyzeRMSMean, analyzeTempoMean, analyzeZeroCrossingRateMean
- analyzeAudio.analyzersUseSpectrogram
  - raw arrays: analyzeChromagram, analyzeSpectralContrast, analyzeSpectralBandwidth, analyzeSpectralCentroid, analyzeSpectralFlatness
  - mean summaries: analyzeChromagramMean, analyzeSpectralContrastMean, analyzeSpectralBandwidthMean, analyzeSpectralCentroidMean, analyzeSpectralFlatnessMean
- analyzeAudio.analyzersUseTensor
  - reverberation and intelligibility: analyzeSRMR, analyzeSRMRMean
  - reference-estimate-mixture scoring: analyzeLogWMSEMean
  - source-separation scores: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
  - waveform-domain and STFT-domain loss analyzers: analyzeDCLoss, analyzeESRLoss, analyzeLogCoshLoss, analyzeSNRLoss, analyzeSISDRLoss, analyzeSDSDRLoss, analyzeSTFTLoss, analyzeMelSTFTLoss, analyzeChromaSTFTLoss, analyzeMultiResolutionSTFTLoss, analyzeRandomResolutionSTFTLoss, analyzeSumAndDifferenceSTFTLoss
- analyzeAudio.analyzersUseTensorSpectrogram
  - magnitude-spectrogram comparison analyzers: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeL1FrequencyLoss
- analyzeAudio.ffmpeg
  - environment check for Colab-style sessions: verifyFFmpegColab
Several concept names exist in more than one module. That is intentional. For example,
Spectral flatness mean comes from the filename-based FFprobe route, while Spectral Flatness mean
comes from the spectrogram route. Similar names do not necessarily mean duplicate implementations.
import numpy
import soundfile
from analyzeAudio.analyzersUseWaveform import analyzeTempogram

with soundfile.SoundFile(pathFilename) as readSoundFile:
    sampleRate = readSoundFile.samplerate
    waveform = readSoundFile.read(dtype='float32').astype(numpy.float32).T
tempogram = analyzeTempogram(waveform, sampleRate)

from analyzeAudio.analyzersUseTensor import analyzeL1SNRDBMean
from analyzeAudio.analyzersUseTensorSpectrogram import analyzeSpectralConvergenceLoss

valueScore = analyzeL1SNRDBMean(tensorAudioReference, tensorAudioEstimate)
valueLoss = analyzeSpectralConvergenceLoss(tensorMagnitudeReference, tensorMagnitudeEstimate)
Use truncateTensors when you want the aligned tensors yourself
Most tensor comparison analyzers already trim inputs internally. truncateTensors is there for the
times when you want the aligned tensors yourself before reusing them across several metrics.
from analyzeAudio import truncateTensors

tensorAudioReference, tensorAudioEstimate = truncateTensors([
    tensorAudioReference,
    tensorAudioEstimate,
])
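The idea behind trimming to a common trailing length can be illustrated without tensors. The sketch below uses nested lists shaped as channels by samples and crops every input to the shortest sample count. It assumes that trimming keeps the leading samples and drops the excess at the end; that is an assumption about the behavior, not something confirmed here, and `truncateToShortestTrailingLength` is a hypothetical stand-in rather than the package function.

```python
def truncateToShortestTrailingLength(listChannelsBySamples):
    """Crop every (channels x samples) nested list to the shortest sample count among the inputs."""
    lengthMinimum = min(len(channel) for audio in listChannelsBySamples for channel in audio)
    return [[channel[:lengthMinimum] for channel in audio] for audio in listChannelsBySamples]


# Made-up mono signals of different lengths.
audioReference = [[0.1, 0.2, 0.3, 0.4]]
audioEstimate = [[0.1, 0.2, 0.3]]

audioReferenceTrimmed, audioEstimateTrimmed = truncateToShortestTrailingLength(
    [audioReference, audioEstimate]
)
print(len(audioReferenceTrimmed[0]), len(audioEstimateTrimmed[0]))
```

Trimming once up front and reusing the aligned inputs keeps several metrics comparing exactly the same samples.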
Use whatMeasurements to list registered measurements from the command line
whatMeasurements
This prints the same sorted registry names returned by getListAvailableAudioAspects().
Reference materials
A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis
- Common name: spectral flatness
- BibTeX citation.
- DOI: 10.1109/TASSP.1974.1162572
- IEEE Xplore: document 1162647
- Implementation:
- librosa/librosa.feature.spectral_flatness
Perceptual Effects of Spectral Modifications on Musical Timbres
Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments
- Common name: spectral entropy
- BibTeX citation.
- DOI: 10.21437/ICSLP.1998-527
- Proceedings: ISCA Archive
- Free author PDF: Columbia University
Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music
- Common name: chroma features
- BibTeX citation.
- Proceedings: University of Michigan ICMC archive
- CCRMA HTML copy: Stanford CCRMA
- Implementation:
- librosa/librosa.feature.chroma_stft
A Robust Audio Classification and Segmentation Method
- BibTeX citation.
- Technical report: Microsoft Research
- Free PDF: Microsoft Research
- Implementations:
Music Type Classification by Spectral Contrast Feature
- Common name: spectral contrast
- BibTeX citation.
- DOI: 10.1109/ICME.2002.1035731
- Free PDF: Tsinghua University
- Implementation:
- librosa/librosa.feature.spectral_contrast
A Speech/Music Discriminator Based on RMS and Zero-Crossings
- Common names: RMS, zero-crossing rate
- BibTeX citation.
- DOI: 10.1109/TMM.2004.840604
- Free author proof: University of Crete
- Implementation:
- librosa/librosa.feature.rms
- librosa/librosa.feature.zero_crossing_rate
Zero-Crossing Rate
- Common name: zero-crossing rate
- BibTeX citation.
- Online chapter: Introduction to Speech Processing
- Implementation:
- librosa/librosa.feature.zero_crossing_rate
Performance Measurement in Blind Audio Source Separation
- Common name: BSS Eval SDR
- BibTeX citation.
- DOI: 10.1109/TSA.2005.858005
- Free author PDF: IRISA
- Implementations:
Automatic Chord Recognition from Audio Using a HMM with Supervised Learning
- BibTeX citation.
- Proceedings: ISMIR 2006
- Free PDF: Stanford CCRMA
- Implementation:
- librosa/librosa.feature.chroma_stft
Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals
- Common name: tempogram
- BibTeX citation.
- DOI: 10.1109/ICASSP.2010.5495219
- Free author PDF: AudioLabs Erlangen
- Implementations:
- librosa/librosa.feature.tempogram
- Vamp Tempogram Plugin
A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech
- Common name: SRMR
- BibTeX citation.
- DOI: 10.1109/TASL.2010.2052247
- Free author PDF: MUSEA Lab
- Implementation:
Signal Processing for Music Analysis
- BibTeX citation.
- DOI: 10.1109/JSTSP.2011.2112333
- Free author PDF: Columbia University
- Implementation:
The Timbre Toolbox: Extracting Audio Descriptors from Musical Signals
- BibTeX citation.
- DOI: 10.1121/1.3642604
- Free PDF: McGill University
- Implementations:
Blind Audio Watermarking Technique Based on Two Dimensional Cellular Automata
- Common name: APSNR reference
- BibTeX citation.
- DOI: 10.14257/ijsia.2016.10.9.18
- Free repository copy: Universidad Autonoma de Madrid
- Implementation:
SDR - Half-Baked or Well Done?
- Common name: SI-SDR
- BibTeX citation. TeX Source with precise formulas for AI agents.
- DOI: 10.1109/ICASSP.2019.8683855
- Free author PDF: Jonathan Le Roux
- Implementations:
Loudness Metering: EBU Mode Metering to Supplement Loudness Normalisation
- Common name: momentary LUFS
- BibTeX citation.
- Standard: EBU Tech 3341
- Free PDF: EBU
- Implementation:
Loudness Range: A Measure to Supplement EBU R 128 Loudness Normalisation
- Common name: LUFS
- BibTeX citation.
- Standard: EBU Tech 3342
- Free PDF: EBU
- Implementation:
Algorithms to Measure Audio Programme Loudness and True-Peak Audio Level
- Common name: True peak
- BibTeX citation.
- Standard: ITU-R BS.1770-5
- Free PDF: ITU
- Implementation:
An Overview on Sound Features in Time and Frequency Domain
- BibTeX citation.
- DOI: 10.2478/ijasitels-2023-0006
- Open access article: Reference Global
- PDF: Reference Global
Perceptual Loss Function for Neural Modelling of Audio Systems
- Common names: ESR loss, DC loss
- Used by: analyzeESRLoss, analyzeDCLoss
- BibTeX citation.
- arXiv abstract: 1911.08922
- TeX source with formulas for AI agents: arXiv source
- PDF: arXiv PDF
Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder
- Common name: log-cosh loss
- Used by: analyzeLogCoshLoss
- BibTeX citation.
- OpenReview page: rkglvsC9Ym
- PDF: OpenReview PDF
logWMSE Audio Quality Metric and PyTorch Loss Implementation
- Common name: logWMSE
- Used by: analyzeLogWMSEMean
- Original implementation:
- PyTorch implementation:
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
- Common names: spectral convergence, STFT magnitude loss terms
- Used by: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1808.06719
- arXiv abstract: 1808.06719
- TeX source with formulas for AI agents: arXiv source
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation
- Used by: analyzeSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1904.04472
- arXiv abstract: 1904.04472
- TeX source with formulas for AI agents: arXiv source
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
- Common name: multi-resolution STFT
- Used by: analyzeMultiResolutionSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1910.11480
- arXiv abstract: 1910.11480
- TeX source with formulas for AI agents: arXiv source
auraloss: Audio focused loss functions in PyTorch
- Common names: random-resolution STFT loss implementation source
- Used by: analyzeRandomResolutionSTFTLoss
- BibTeX citation.
- Workshop paper PDF: DMRN+15 PDF
- Source:
Automatic multitrack mixing with a differentiable mixing console of neural audio effects
- Common names: sum-and-difference STFT loss in neural mixing
- Used by: analyzeSumAndDifferenceSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.2010.10291
- arXiv abstract: 2010.10291
- TeX source with formulas for AI agents: arXiv source
Neural source-filter waveform models for statistical parametric speech synthesis
- Related in auraloss docs for multi-resolution spectral training context
- BibTeX citation.
- DOI: 10.48550/arXiv.1904.12088
- arXiv abstract: 1904.12088
- TeX source with formulas for AI agents: arXiv source
DDSP: Differentiable Digital Signal Processing
- Related in auraloss docs for STFT-magnitude formulation context
- BibTeX citation.
- DOI: 10.48550/arXiv.2001.04643
- arXiv abstract: 2001.04643
- TeX source with formulas for AI agents: arXiv source
A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
- Common name: L1SNR reference
- Used by: analyzeL1SNRMean
- BibTeX citation.
- DOI: 10.1109/OJSP.2023.3339428
- arXiv abstract: 2309.02539
- TeX source with formulas for AI agents: arXiv source
A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems
- Common name: L1SNR reference
- Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- DOI: 10.48550/arXiv.2406.18747
- arXiv abstract: 2406.18747
- arXiv HTML used by docstrings: 2406.18747v2
- TeX source with formulas for AI agents: arXiv source
Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries
- Common name: L1SNRDB reference
- Used by: analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- DOI: 10.48550/arXiv.2501.16171
- arXiv abstract: 2501.16171
- arXiv HTML used by docstrings: 2501.16171v1
- TeX source with formulas for AI agents: arXiv source
torch-l1-snr: L1 Signal-to-Noise Ratio Loss Functions for Audio Source Separation in PyTorch
- Common name: torch-l1-snr
- Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- Source: crlandsc/torch-l1-snr
Packages and documentation
- analyzeAudio
- FFmpeg documentation
- librosa/librosa
- Lightning-AI/torchmetrics
- PyTorch torch.nn.Module
- sigsep/sigsep-mus-eval
- mir-evaluation/mir_eval
My recovery