Measure one or more aspects of one or more audio files.

Maintainers

hunterhogan

analyzeAudio

Measure one or more aspects of one or more audio files.

Note well: FFmpeg & FFprobe binaries must be in PATH

You can download FFmpeg and FFprobe from ffmpeg.org.
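Because both binaries must be discoverable on PATH, a small pre-flight check can fail early with a readable message. The following is a minimal sketch that uses only the standard library's `shutil.which`; the helper name `findMissingBinaries` is illustrative and is not part of analyzeAudio.

```python
import shutil

def findMissingBinaries(listBinaryNames):
    """Return the names that shutil.which cannot locate on PATH."""
    return [binaryName for binaryName in listBinaryNames if shutil.which(binaryName) is None]

# Hypothetical pre-flight check before calling any analyzer.
listMissing = findMissingBinaries(['ffmpeg', 'ffprobe'])
if listMissing:
    print(f"Not found on PATH: {', '.join(listMissing)}")
```

An empty list means both executables were found. The check does not verify versions or codec support.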

Some ways to use this package

analyzeAudio works at five practical levels: audio path, waveform array, spectrogram array, waveform tensor, and magnitude-spectrogram tensor. The top-level API is the quickest way to ask for named measurements from audio paths. The lower-level modules are there when you already have decoded audio in memory, want the full array or tensor instead of one summary float, or want to call direct comparison and loss analyzers yourself.

Top-level exports you will probably reach for first

| Export | Purpose |
| --- | --- |
| `analyzeAudioFile(pathFilename, listAspectNames)` | Analyze one audio path and return one result per requested registered aspect name. |
| `analyzeAudioListPathFilenames(listPathFilenames, listAspectNames, CPUlimit=None)` | Analyze many audio paths in parallel and return one completed row per audio path. |
| `getListAvailableAudioAspects()` | Return the sorted list of registered aspect names. |
| `audioAspects` | Registry of `aspect name -> analyzer callable + required parameter names`. |
| `truncateTensors(listTensors)` | Trim multiple tensors to the same trailing length before direct comparison. |
| `dataTabularTOpathFilenameDelimited(...)` | Write batch results to a delimited text file. |

The package also re-exports the type aliases Audio, Spectrogram, SpectrogramMagnitude, and SpectrogramPower so downstream code can annotate the same representations the analyzers expect.

Choose the module that matches the representation you already have

| If you already have... | Reach for... | What you get |
| --- | --- | --- |
| one or more audio paths | top-level API or `analyzersUseFilename` | named single-path measurements, paired-path comparisons, FFprobe/FFmpeg-derived arrays, loudness |
| waveform `numpy.ndarray` + `sampleRate` | `analyzersUseWaveform` | tempogram, RMS, tempo, and zero-crossing arrays or their mean summaries |
| spectrogram magnitude/power `numpy.ndarray` | `analyzersUseSpectrogram` | chroma and spectral-descriptor arrays or their mean summaries |
| waveform `torch.Tensor` | `analyzersUseTensor` | SRMR, logWMSE, source-separation scores, and waveform/STFT-domain loss analyzers |
| magnitude spectrogram `torch.Tensor` | `analyzersUseTensorSpectrogram` | spectrogram-magnitude comparison losses such as spectral convergence and STFT-magnitude distance |

The registry spans more than single-path measurements. Some registered names are convenient single-audio measurements, some are paired-path comparisons, some expect waveform tensors, and some expect magnitude spectrogram tensors.

Use `analyzeAudioFile` to measure registered single-audio aspects from one path

from analyzeAudio import analyzeAudioFile

listAspectNames = [
    'LUFS integrated',
    'RMS peak',
    'SRMR mean',
    'Spectral Flatness mean',
]

listMeasurements = analyzeAudioFile(pathFilename, listAspectNames)
dictionaryMeasurements = dict(zip(listAspectNames, listMeasurements, strict=True))

analyzeAudioFile reads one audio path, prepares shared intermediate representations, and lets registered analyzers reuse those representations. Under the hood that means one call can support measurements that need the raw waveform, sampleRate, a torch.Tensor waveform, a complex STFT, spectrogram magnitude, or spectrogram power.

analyzeAudioFile preserves the order of listAspectNames. If a requested aspect name is not registered, the matching return entry is 'not found'.

Registered names are case-sensitive, and some intentionally similar names come from different analysis routes. For example, `Spectral Flatness mean` and `Spectral flatness mean` are different registered names, and so are `Zero-crossing rate mean` and `Zero-crossings rate`.
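One way to spot names that differ only by letter case is to group them by their case-folded form. The following is a sketch with an illustrative helper name, `groupNamesDifferingOnlyByCase`; the sample list reuses the names mentioned above, and in practice you would pass the list returned by `getListAvailableAudioAspects()`.

```python
from collections import defaultdict

def groupNamesDifferingOnlyByCase(listAspectNames):
    """Return groups of names that become identical after case folding."""
    dictionaryGroups = defaultdict(list)
    for aspectName in listAspectNames:
        dictionaryGroups[aspectName.casefold()].append(aspectName)
    return [sorted(listNames) for listNames in dictionaryGroups.values() if len(listNames) > 1]

# Sample names taken from the text above, not the full registry.
listSampleNames = [
    'Spectral Flatness mean',
    'Spectral flatness mean',
    'Zero-crossing rate mean',
    'Zero-crossings rate',
    'LUFS integrated',
]
listGroups = groupNamesDifferingOnlyByCase(listSampleNames)
```

This only catches pairs that differ by case. `Zero-crossing rate mean` and `Zero-crossings rate` differ in wording as well, so they are not grouped.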

Some registered names require inputs that cannot come from a single audio path alone. Examples include `SI-SDR mean`, `LogWMSE`, `L1SNRDB`, and `SpectralConvergenceLoss`. For those, inspect the registry and call the analyzer directly.

Use `analyzeAudioListPathFilenames` to batch single-audio measurements across many paths

from analyzeAudio import analyzeAudioListPathFilenames, dataTabularTOpathFilenameDelimited

listAspectNames = ['LUFS integrated', 'Spectral Flatness mean']
rowsListFilenameAspectValues = analyzeAudioListPathFilenames(listPathFilenames, listAspectNames)

dataTabularTOpathFilenameDelimited(
    pathFilenameOutput,
    rowsListFilenameAspectValues,
    ['pathFilename', *listAspectNames],
)

Each returned row starts with the audio path converted to POSIX text, followed by the requested values. The rows are returned in worker-completion order rather than the original input order. Use CPUlimit when you want to cap the worker count explicitly.
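If you need the rows back in input order, you can re-sort them by the path in the first column. The following is a minimal sketch under one assumption: that the first entry of each row equals `pathlib.Path(pathFilename).as_posix()` for the corresponding input path. The helper name `sortRowsLikeInput` and the sample rows are illustrative.

```python
from pathlib import Path

def sortRowsLikeInput(listPathFilenames, rowsListFilenameAspectValues):
    """Return the rows ordered like listPathFilenames, matching on the POSIX path text."""
    # Assumption: row[0] is the input path rendered with Path(...).as_posix().
    dictionaryPosition = {
        Path(pathFilename).as_posix(): index
        for index, pathFilename in enumerate(listPathFilenames)
    }
    return sorted(rowsListFilenameAspectValues, key=lambda row: dictionaryPosition[row[0]])

# Made-up rows in completion order, standing in for a real batch result.
listPathFilenamesSample = ['b.wav', 'a.wav', 'c.wav']
rowsCompletionOrder = [['a.wav', 1.0], ['c.wav', 3.0], ['b.wav', 2.0]]
rowsInputOrder = sortRowsLikeInput(listPathFilenamesSample, rowsCompletionOrder)
```

If the same path appears more than once in the input, the rows for that path end up adjacent, at the position of its last occurrence.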

Use `getListAvailableAudioAspects` and `audioAspects` to inspect the registry or call an analyzer directly

from analyzeAudio import audioAspects, getListAvailableAudioAspects

print(getListAvailableAudioAspects())
print(audioAspects['Chromagram mean']['analyzerParameters'])
print(audioAspects['SI-SDR mean']['analyzerParameters'])
print(audioAspects['LogWMSE']['analyzerParameters'])

SI_SDR_channelsMean = audioAspects['SI-SDR mean']['analyzer'](
    pathFilenameAudioFile,
    pathFilenameDifferentAudioFile,
)

Use audioAspects[name]['analyzerParameters'] first. It tells you whether the registered name expects one audio path, two audio paths, waveform tensors, a reference-estimate-mixture triple, or spectrogram magnitudes.

That is the quickest way to discover whether a name is meant for the high-level single-audio API or for direct invocation.
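The same check can be scripted. The sketch below filters a registry-shaped dictionary for names whose required parameters are all ones you can supply. It assumes `analyzerParameters` is an iterable of parameter-name strings; the stand-in registry, its aspect names, and its parameter names are all hypothetical, so print the real `audioAspects` entries to see the actual names.

```python
def namesCallableWith(registry, parametersAvailable):
    """Return sorted names whose required parameters are all in parametersAvailable."""
    setAvailable = set(parametersAvailable)
    return sorted(
        aspectName
        for aspectName, entry in registry.items()
        if set(entry['analyzerParameters']) <= setAvailable
    )

# Hypothetical stand-in with the same shape as audioAspects; every name here is made up.
registryStandIn = {
    'Example single-path aspect': {'analyzerParameters': ['pathFilename']},
    'Example paired-path aspect': {'analyzerParameters': ['pathFilename', 'pathFilenameComparison']},
    'Example tensor aspect': {'analyzerParameters': ['tensorReference', 'tensorEstimate']},
}
listSinglePathNames = namesCallableWith(registryStandIn, ['pathFilename'])
```

With the real registry you would pass `audioAspects` as the first argument and the parameter names you can actually provide as the second.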

Use the lower-level modules when you want the actual analyzer instead of one registry float

These are the actual analyzers, organized by the representation they consume.

analyzeAudio.analyzersUseFilename
- paired-path comparison metrics: getPSNRmean, getSDRmean, getSI_SDRmean
- framewise spectral arrays with matching mean wrappers: analyzeSpectralCentroid, analyzeSpectralCrest, analyzeSpectralDecrease, analyzeSpectralEntropy, analyzeSpectralFlatness, analyzeSpectralFlux, analyzeSpectralKurtosis, analyzeSpectralMean, analyzeSpectralRolloff, analyzeSpectralSkewness, analyzeSpectralSlope, analyzeSpectralSpread, analyzeSpectralVariance
- file-level FFprobe astats scalars: analyzeZero_crossings, analyzeZero_crossings_rate, analyzeDCoffset, analyzeDynamicRange, analyzeSignalEntropy, analyzeNumber_of_samples, analyzePeak_level, analyzeRMS_level, analyzeCrest_factor, analyzeRMS_peak, analyzeAbs_Peak_count, analyzeBit_depth, analyzeFlat_factor, analyzeMax_difference, analyzeMax_level, analyzeMean_difference, analyzeMin_difference, analyzeMin_level, analyzeNoise_floor, analyzeNoise_floor_count, analyzePeak_count, analyzeRMS_difference, analyzeRMS_trough
- loudness and true-peak arrays plus scalar summaries: analyzeTruePeak, analyzeLUFSMomentary, analyzeLUFSShortTerm, analyzeLUFSIntegrated, analyzeLRA, analyzeLUFSlow, analyzeLUFShigh, plus the matching ...Overall scalar functions

analyzeAudio.analyzersUseWaveform
- raw arrays: analyzeTempogram, analyzeRMS, analyzeTempo, analyzeZeroCrossingRate
- mean summaries: analyzeTempogramMean, analyzeRMSMean, analyzeTempoMean, analyzeZeroCrossingRateMean

analyzeAudio.analyzersUseSpectrogram
- raw arrays: analyzeChromagram, analyzeSpectralContrast, analyzeSpectralBandwidth, analyzeSpectralCentroid, analyzeSpectralFlatness
- mean summaries: analyzeChromagramMean, analyzeSpectralContrastMean, analyzeSpectralBandwidthMean, analyzeSpectralCentroidMean, analyzeSpectralFlatnessMean

analyzeAudio.analyzersUseTensor
- reverberation and intelligibility: analyzeSRMR, analyzeSRMRMean
- reference-estimate-mixture scoring: analyzeLogWMSEMean
- source-separation scores: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- waveform-domain and STFT-domain loss analyzers: analyzeDCLoss, analyzeESRLoss, analyzeLogCoshLoss, analyzeSNRLoss, analyzeSISDRLoss, analyzeSDSDRLoss, analyzeSTFTLoss, analyzeMelSTFTLoss, analyzeChromaSTFTLoss, analyzeMultiResolutionSTFTLoss, analyzeRandomResolutionSTFTLoss, analyzeSumAndDifferenceSTFTLoss

analyzeAudio.analyzersUseTensorSpectrogram
- magnitude-spectrogram comparison analyzers: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeL1FrequencyLoss

analyzeAudio.ffmpeg
- environment check for Colab-style sessions: verifyFFmpegColab

Several concept names exist in more than one module. That is intentional. For example, `Spectral flatness mean` comes from the filename-based FFprobe route, while `Spectral Flatness mean` comes from the spectrogram route. Similar names do not necessarily mean duplicate implementations.

import numpy
import soundfile

from analyzeAudio.analyzersUseWaveform import analyzeTempogram

with soundfile.SoundFile(pathFilename) as readSoundFile:
    sampleRate = readSoundFile.samplerate
    waveform = readSoundFile.read(dtype='float32').astype(numpy.float32).T

tempogram = analyzeTempogram(waveform, sampleRate)

from analyzeAudio.analyzersUseTensor import analyzeL1SNRDBMean
from analyzeAudio.analyzersUseTensorSpectrogram import analyzeSpectralConvergenceLoss

valueScore = analyzeL1SNRDBMean(tensorAudioReference, tensorAudioEstimate)
valueLoss = analyzeSpectralConvergenceLoss(tensorMagnitudeReference, tensorMagnitudeEstimate)

Use `truncateTensors` when you want the aligned tensors yourself

Most tensor comparison analyzers already trim inputs internally. truncateTensors is there for the times when you want the aligned tensors yourself before reusing them across several metrics.

from analyzeAudio import truncateTensors

tensorAudioReference, tensorAudioEstimate = truncateTensors([
    tensorAudioReference,
    tensorAudioEstimate,
])

Use `whatMeasurements` to list registered measurements from the command line

whatMeasurements

This prints the same sorted registry names returned by getListAvailableAudioAspects().

Reference materials

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

Common name: spectral flatness
BibTeX citation.
DOI: 10.1109/TASSP.1974.1162572
IEEE Xplore: document 1162647
Implementation:
- librosa/librosa.feature.spectral_flatness

Perceptual Effects of Spectral Modifications on Musical Timbres

BibTeX citation.
DOI: 10.1121/1.381843

Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments

Common name: spectral entropy
BibTeX citation.
DOI: 10.21437/ICSLP.1998-527
Proceedings: ISCA Archive
Free author PDF: Columbia University

Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music

Common name: chroma features
BibTeX citation.
Proceedings: University of Michigan ICMC archive
CCRMA HTML copy: Stanford CCRMA
Implementation:
- librosa/librosa.feature.chroma_stft

A Robust Audio Classification and Segmentation Method

Music Type Classification by Spectral Contrast Feature

Common name: spectral contrast
BibTeX citation.
DOI: 10.1109/ICME.2002.1035731
Free PDF: Tsinghua University
Implementation:
- librosa/librosa.feature.spectral_contrast

A Speech/Music Discriminator Based on RMS and Zero-Crossings

Common names: RMS, zero-crossing rate
BibTeX citation.
DOI: 10.1109/TMM.2004.840604
Free author proof: University of Crete
Implementation:
- librosa/librosa.feature.rms
- librosa/librosa.feature.zero_crossing_rate

Zero-Crossing Rate

Common name: zero-crossing rate
BibTeX citation.
Online chapter: Introduction to Speech Processing
Implementation:
- librosa/librosa.feature.zero_crossing_rate

Performance Measurement in Blind Audio Source Separation

Automatic Chord Recognition from Audio Using a HMM with Supervised Learning

BibTeX citation.
Proceedings: ISMIR 2006
Free PDF: Stanford CCRMA
Implementation:
- librosa/librosa.feature.chroma_stft

Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals

Common name: tempogram
BibTeX citation.
DOI: 10.1109/ICASSP.2010.5495219
Free author PDF: AudioLabs Erlangen
Implementations:
- librosa/librosa.feature.tempogram
- Vamp Tempogram Plugin

A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

Common name: SRMR
BibTeX citation.
DOI: 10.1109/TASL.2010.2052247
Free author PDF: MUSEA Lab
Implementation:
- Lightning-AI/torchmetrics
  - SRMR official documentation.
  - Python source with implementation details for AI agents.

Signal Processing for Music Analysis

An Overview on Sound Features in Time and Frequency Domain

Perceptual Loss Function for Neural Modelling of Audio Systems

Common names: ESR loss, DC loss
Used by: analyzeESRLoss, analyzeDCLoss
BibTeX citation.
arXiv abstract: 1911.08922
TeX source with formulas for AI agents: arXiv source
PDF: arXiv PDF

Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder

Common name: log-cosh loss
Used by: analyzeLogCoshLoss
BibTeX citation.
OpenReview page: rkglvsC9Ym
PDF: OpenReview PDF

logWMSE Audio Quality Metric and PyTorch Loss Implementation

Common name: logWMSE
Used by: analyzeLogWMSEMean
Original implementation:
- BibTeX citation.
- nomonosound/log-wmse-audio-quality
PyTorch implementation:
- BibTeX citation.
- crlandsc/torch-log-wmse

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Common names: spectral convergence, STFT magnitude loss terms
Used by: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeSTFTLoss
BibTeX citation.
DOI: 10.48550/arXiv.1808.06719
arXiv abstract: 1808.06719
TeX source with formulas for AI agents: arXiv source

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

Used by: analyzeSTFTLoss
BibTeX citation.
DOI: 10.48550/arXiv.1904.04472
arXiv abstract: 1904.04472
TeX source with formulas for AI agents: arXiv source

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Common name: multi-resolution STFT
Used by: analyzeMultiResolutionSTFTLoss
BibTeX citation.
DOI: 10.48550/arXiv.1910.11480
arXiv abstract: 1910.11480
TeX source with formulas for AI agents: arXiv source

auraloss: Audio focused loss functions in PyTorch

Common names: random-resolution STFT loss implementation source
Used by: analyzeRandomResolutionSTFTLoss
BibTeX citation.
Workshop paper PDF: DMRN+15 PDF
Source:
- BibTeX citation for the source repository.
- csteinmetz1/auraloss

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

Common names: sum-and-difference STFT loss in neural mixing
Used by: analyzeSumAndDifferenceSTFTLoss
BibTeX citation.
DOI: 10.48550/arXiv.2010.10291
arXiv abstract: 2010.10291
TeX source with formulas for AI agents: arXiv source

Neural source-filter waveform models for statistical parametric speech synthesis

Related in auraloss docs for multi-resolution spectral training context
BibTeX citation.
DOI: 10.48550/arXiv.1904.12088
arXiv abstract: 1904.12088
TeX source with formulas for AI agents: arXiv source

DDSP: Differentiable Digital Signal Processing

Related in auraloss docs for STFT-magnitude formulation context
BibTeX citation.
DOI: 10.48550/arXiv.2001.04643
arXiv abstract: 2001.04643
TeX source with formulas for AI agents: arXiv source

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

Common name: L1SNR reference
Used by: analyzeL1SNRMean
BibTeX citation.
DOI: 10.1109/OJSP.2023.3339428
arXiv abstract: 2309.02539
TeX source with formulas for AI agents: arXiv source

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Common name: L1SNR reference
Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
BibTeX citation.
DOI: 10.48550/arXiv.2406.18747
arXiv abstract: 2406.18747
arXiv HTML used by docstrings: 2406.18747v2
TeX source with formulas for AI agents: arXiv source

Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries

Common name: L1SNRDB reference
Used by: analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
BibTeX citation.
DOI: 10.48550/arXiv.2501.16171
arXiv abstract: 2501.16171
arXiv HTML used by docstrings: 2501.16171v1
TeX source with formulas for AI agents: arXiv source

torch-l1-snr: L1 Signal-to-Noise Ratio Loss Functions for Audio Source Separation in PyTorch

Common name: torch-l1-snr
Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
BibTeX citation.
Source: crlandsc/torch-l1-snr

Packages and documentation

My recovery

Release history

0.8.0

Jul 1, 2026

0.7.0

Jun 26, 2026

0.6.0

Jun 16, 2026

0.5.0

Jun 14, 2026

0.4.0

Jun 10, 2026

0.3.0

Jun 8, 2026

This version

0.2.0

Jun 1, 2026

0.1.1

May 30, 2026

0.1.0

May 30, 2026

0.0.20

May 22, 2026

0.0.19

May 20, 2026

0.0.18

Feb 4, 2026

0.0.17

Jul 11, 2025

0.0.16

May 20, 2025

0.0.15

Mar 30, 2025

0.0.14

Mar 7, 2025

0.0.13

Mar 6, 2025

0.0.12

Mar 4, 2025

0.0.11

Jan 26, 2025
