Maintainers

hunterhogan

analyzeAudio

Measure one or more aspects of one or more audio files.

Note well: FFmpeg & FFprobe binaries must be in PATH

Download options for FFmpeg and FFprobe are listed at ffmpeg.org.
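
Before running an analysis, it can help to confirm that both binaries are reachable. The sketch below uses only the standard library; `findMissingBinaries` is a hypothetical helper written for this illustration, not part of analyzeAudio, and it assumes the executables are named `ffmpeg` and `ffprobe`.

```python
import shutil

def findMissingBinaries(listBinaryNames):
    """Return the names that shutil.which cannot locate on PATH."""
    return [binaryName for binaryName in listBinaryNames if shutil.which(binaryName) is None]

# Assumption: the executables use their conventional names.
listMissing = findMissingBinaries(["ffmpeg", "ffprobe"])
if listMissing:
    print("Not found on PATH:", ", ".join(listMissing))
else:
    print("FFmpeg and FFprobe are both on PATH.")
```

An empty result only shows that the names resolve on PATH; it does not check the FFmpeg version or build options.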

Install FFmpeg on Google Colab

from analyzeAudio.ffmpeg import verifyFFmpegColab
verifyFFmpegColab()

What is in the package

analyzeAudio provides two kinds of user-facing audio analysis:

Audio aspects measure one audio file.
Audio contests compare two audio files, waveforms, tensors, or spectrograms.

The main user workflows are:

What you want	Use
One value for each selected measurement on one file	`analyzeAudioFile`
The same selected measurements for many files	`analyzeAudioListPathFilenames`
A TSV, CSV, or other delimited output file	`dataTabularTOpathFilenameDelimited`
One specific measurement or detailed frame data	Import a direct analyzer function
One comparison score between two files	Import a filename contest function
One comparison score between two tensors or spectrograms	Import a tensor or spectrogram contest function

One-file measurements

Use these names with analyzeAudioFile or analyzeAudioListPathFilenames. Names are case-sensitive.

Loudness and true peak:

Name	What it measures
`LUFS integrated`	Whole-file integrated loudness.
`LUFS momentary maximum`	Maximum momentary loudness.
`LUFS short-term maximum`	Maximum short-term loudness.
`LUFS loudness range`	Loudness range.
`LUFS low`	Low loudness range boundary.
`LUFS high`	High loudness range boundary.
`true_peak maximum`	Maximum true peak level.

Signal level, dynamics, and samples:

Name	What it measures
`RMS_level overall`	Overall RMS level.
`RMS_peak overall`	Overall RMS peak.
`RMS_trough overall`	Overall RMS trough.
`RMS_difference overall`	Overall RMS difference between adjacent samples.
`Peak_level overall`	Overall peak level.
`Peak_count total`	Total detected peak count.
`Abs_Peak_count total`	Total absolute peak count.
`Crest_factor mean`	Mean crest factor.
`Dynamic_range overall`	Overall dynamic range.
`DC_offset mean`	Mean DC offset.
`Bit_depth mean`	Mean detected bit depth.
`Entropy mean`	Mean signal entropy.
`Flat_factor mean`	Mean flat factor.
`Max_difference overall`	Maximum sample difference.
`Max_level overall`	Maximum sample level.
`Mean_difference mean`	Mean adjacent-sample difference.
`Min_difference overall`	Minimum sample difference.
`Min_level overall`	Minimum sample level.
`Noise_floor overall`	Overall noise floor.
`Noise_floor_count total`	Total noise-floor count.
`Number_of_samples total`	Total samples.
`Zero_crossings total`	Total zero crossings.
`Zero_crossings_rate overall`	Overall zero-crossing rate.

Spectral measurements:

Name	What it measures
`Power spectral density mean`	Mean power spectral density.
`Spectral centroid mean`	Mean spectral center of mass from filename analysis.
`Spectral crest mean`	Mean spectral crest.
`Spectral decrease mean`	Mean spectral decrease.
`Spectral entropy mean`	Mean spectral entropy.
`Spectral flatness mean`	Mean spectral flatness from filename analysis.
`Spectral flux mean`	Mean spectral flux.
`Spectral kurtosis mean`	Mean spectral kurtosis.
`Spectral rolloff mean`	Mean spectral rolloff.
`Spectral skewness mean`	Mean spectral skewness.
`Spectral slope mean`	Mean spectral slope.
`Spectral spread mean`	Mean spectral spread.
`Spectral variance mean`	Mean spectral variance.
`Spectral Bandwidth mean`	Mean librosa spectral bandwidth.
`Spectral Centroid mean`	Mean librosa spectral centroid.
`Spectral Contrast mean`	Mean librosa spectral contrast.
`Spectral Flatness mean`	Mean librosa spectral flatness ratio.
`Spectral Flatness dB mean`	Mean librosa spectral flatness in decibels.
`Chromagram mean`	Mean chroma energy across pitch classes.

Waveform, rhythm, and speech measurements:

Name	What it measures
`RMS Waveform mean`	Mean waveform RMS amplitude.
`RMS Waveform dB mean`	Mean waveform RMS level in decibels.
`Tempogram mean`	Mean tempogram value.
`Tempo mean`	Mean estimated tempo.
`Zero Crossing Rate mean`	Mean waveform zero-crossing rate.
`SRMR mean`	Mean speech-to-reverberation modulation energy ratio.

Some names intentionally differ only by capitalization or wording. For example, `Spectral flatness mean` and `Spectral Flatness mean` are different measurements. Use the exact name from the table.
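
Because names are case-sensitive, a quick check before a long run may catch typos. This is a minimal sketch with the standard library: `setKnownNames` is a hand-copied subset of the tables above and `checkAspectNames` is a hypothetical helper, neither of which comes from the package.

```python
import difflib

# Hypothetical subset of names copied by hand from the tables above.
setKnownNames = {
    "Spectral flatness mean",
    "Spectral Flatness mean",
    "Spectral Flatness dB mean",
    "LUFS integrated",
    "true_peak maximum",
}

def checkAspectNames(listRequested, knownNames):
    """Map each unknown requested name to its closest known names."""
    suggestions = {}
    for name in listRequested:
        if name not in knownNames:
            suggestions[name] = difflib.get_close_matches(
                name, sorted(knownNames), n=3, cutoff=0.6
            )
    return suggestions

# "LUFS Integrated" has the wrong capitalization; the other name is exact.
suggestions = checkAspectNames(["LUFS Integrated", "Spectral Flatness mean"], setKnownNames)
print(suggestions)
```

Exact matches are left out of the result, so an empty dictionary means every requested name was found in the known set.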

Measure one file

from analyzeAudio import analyzeAudioFile

listAspectNames = [
    "LUFS integrated",
    "true_peak maximum",
    "RMS_level overall",
    "Spectral Flatness mean",
    "Zero Crossing Rate mean",
]

listValues = analyzeAudioFile("voice.wav", listAspectNames)
measurements = dict(zip(listAspectNames, listValues, strict=True))
print(measurements)

analyzeAudioFile returns one value for each requested name, in the same order. If a requested name is unavailable, that value is "not found".
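
One way to separate usable values from the `"not found"` placeholder is sketched below. The names and numbers are made up for illustration, and `splitFoundAndMissing` is a hypothetical helper, not a package function.

```python
# Hypothetical result shaped like the example above; the numbers are made up.
listAspectNames = ["LUFS integrated", "true_peak maximum", "No such aspect"]
listValues = [-23.1, -1.2, "not found"]

def splitFoundAndMissing(listNames, listResults, sentinel="not found"):
    """Return (dict of found values, list of names whose value is the sentinel)."""
    found = {}
    listMissing = []
    for name, value in zip(listNames, listResults):
        # Check the type first so array-like values are never compared to a string.
        if isinstance(value, str) and value == sentinel:
            listMissing.append(name)
        else:
            found[name] = value
    return found, listMissing

found, listMissing = splitFoundAndMissing(listAspectNames, listValues)
print(found)
print(listMissing)
```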

Measure many files

from pathlib import Path
from analyzeAudio import analyzeAudioListPathFilenames

listPathFilenames = tuple(Path("audio").glob("*.wav"))
listAspectNames = [
    "LUFS integrated",
    "LUFS loudness range",
    "true_peak maximum",
]

rows = analyzeAudioListPathFilenames(
    listPathFilenames,
    listAspectNames,
    CPUlimit=4,
)

for row in rows:
    print(row)

Each row starts with the analyzed filename, followed by the requested values. Rows are returned as files finish, so row order can differ from input order.
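
If downstream code needs rows in input order, they can be re-sorted after the fact. The sketch below uses made-up rows and assumes the first column matches the input path exactly once both are converted to strings; if the package reports filenames in a different form (for example, as absolute paths), the lookup key would need adjusting.

```python
# Hypothetical inputs and rows; the values are made up for illustration.
listPathFilenames = ["a.wav", "b.wav", "c.wav"]
rows = [
    ["c.wav", -20.5, 6.1],
    ["a.wav", -23.0, 4.2],
    ["b.wav", -18.7, 9.9],
]

def sortRowsLikeInput(listInputs, listRows):
    """Reorder rows to follow listInputs, matching on the first column."""
    positionByName = {str(name): index for index, name in enumerate(listInputs)}
    return sorted(listRows, key=lambda row: positionByName[str(row[0])])

rowsOrdered = sortRowsLikeInput(listPathFilenames, rows)
for row in rowsOrdered:
    print(row)
```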

Save measurements

from analyzeAudio import (
    analyzeAudioListPathFilenames,
    dataTabularTOpathFilenameDelimited,
)

listAspectNames = ["LUFS integrated", "true_peak maximum"]
rows = analyzeAudioListPathFilenames(["one.wav", "two.wav"], listAspectNames)

dataTabularTOpathFilenameDelimited(
    "measurements.tsv",
    rows,
    ["pathFilename", *listAspectNames],
)

For CSV output, use a comma delimiter:

dataTabularTOpathFilenameDelimited(
    "measurements.csv",
    rows,
    ["pathFilename", *listAspectNames],
    delimiterOutput=",",
)
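
A saved file can be read back with the standard `csv` module. This sketch assumes the output starts with a header row made from the column names passed above, which is an assumption about the file layout rather than documented behavior. It reads from an in-memory string with made-up values so that it runs without any audio files.

```python
import csv
import io

# Hypothetical file content: a header row, then one tab-delimited row per file.
textTSV = (
    "pathFilename\tLUFS integrated\ttrue_peak maximum\n"
    "one.wav\t-23.0\t-1.2\n"
    "two.wav\t-18.5\t-0.3\n"
)

def readDelimited(fileLike, delimiter="\t"):
    """Return a list of dicts keyed by the header row."""
    return list(csv.DictReader(fileLike, delimiter=delimiter))

listRecords = readDelimited(io.StringIO(textTSV))

# Values come back as strings, so convert before comparing.
loudest = max(listRecords, key=lambda record: float(record["LUFS integrated"]))
print(loudest["pathFilename"])
```

For a real file, pass an open file object (opened with `newline=""`) instead of `io.StringIO`, and use `delimiter=","` for the CSV variant.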

Get detailed arrays

Summary names usually return one number. Direct analyzer functions whose names lack a summary word (such as `Mean` or `Overall`) usually return the per-frame, per-channel, or per-band values.

from analyzeAudio.analyzersUseFilename import (
    analyzeLUFSIntegratedOverall,
    analyzeLUFSMomentary,
)

integrated = analyzeLUFSIntegratedOverall("voice.wav")
momentaryFrames = analyzeLUFSMomentary("voice.wav")
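
Frame data can then be reduced however you like. The sketch below uses a made-up flat list of momentary loudness values; the package may return an array with a different shape, so treat the input format as an assumption. `maximumFinite` is a hypothetical helper that ignores non-finite frames such as digital silence.

```python
import math

# Hypothetical per-frame momentary loudness values in LUFS; made up for illustration.
momentaryFrames = [-70.0, -31.4, -24.9, -22.3, -26.8, float("-inf")]

def maximumFinite(listFrames):
    """Return the largest finite value, or None when no frame is finite."""
    listFinite = [value for value in listFrames if math.isfinite(value)]
    return max(listFinite) if listFinite else None

print(maximumFinite(momentaryFrames))
```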

Use audio already loaded in Python

Waveform analyzers accept waveform samples in an array shaped (channels, samples).

import numpy
import soundfile
from analyzeAudio.analyzersUseWaveform import (
    analyzeRMSWaveformMean,
    analyzeTempoMean,
    analyzeZeroCrossingRateMean,
)

with soundfile.SoundFile("voice.wav") as audioFile:
    sampleRate = audioFile.samplerate
    waveform = audioFile.read(dtype="float32", always_2d=True).astype(numpy.float32).T

rms = analyzeRMSWaveformMean(waveform)
tempo = analyzeTempoMean(waveform, sampleRate)
zeroCrossingRate = analyzeZeroCrossingRateMean(waveform)
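
To make the quantities concrete, here is a minimal pure-Python sketch of RMS, decibel conversion, and zero-crossing rate on a tiny (channels, samples) waveform. It illustrates the textbook definitions only; the package's analyzers work frame by frame and may use different conventions, and all function names here are hypothetical.

```python
import math

# A tiny two-channel waveform shaped (channels, samples), as nested lists.
waveform = [
    [0.5, -0.5, 0.5, -0.5],
    [0.0, 0.25, 0.0, -0.25],
]

def rmsPerChannel(channels):
    """Root mean square of each channel."""
    return [
        math.sqrt(sum(sample * sample for sample in channel) / len(channel))
        for channel in channels
    ]

def amplitudeToDecibels(amplitude, floor=1e-10):
    """20 * log10(amplitude), floored to avoid taking the log of zero."""
    return 20.0 * math.log10(max(amplitude, floor))

def zeroCrossingRate(channel):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(channel, channel[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(channel) - 1)

listRMS = rmsPerChannel(waveform)
print(listRMS)
print([amplitudeToDecibels(value) for value in listRMS])
print([zeroCrossingRate(channel) for channel in waveform])
```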

Spectrogram analyzers accept magnitude or power spectrograms.

import librosa
import numpy
from analyzeAudio.analyzersUseSpectrogram import (
    analyzeChromagramMean,
    analyzeSpectralCentroidMean,
)

# waveform and sampleRate come from the soundfile example above.
spectrogram = librosa.stft(waveform)
spectrogramMagnitude = numpy.absolute(spectrogram)
spectrogramPower = spectrogramMagnitude**2

spectralCentroid = analyzeSpectralCentroidMean(spectrogramMagnitude)
chromagram = analyzeChromagramMean(spectrogramPower, sampleRate)
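
For intuition about two of the spectral measurements, the sketch below computes a spectral centroid and a spectral flatness for a single made-up frame using the common textbook definitions: magnitude-weighted mean frequency, and geometric mean over arithmetic mean. It is not the package's or librosa's implementation, and the helper names are hypothetical.

```python
import math

def spectralCentroid(listFrequencies, listMagnitudes):
    """Magnitude-weighted mean frequency of one frame."""
    weighted = sum(f * m for f, m in zip(listFrequencies, listMagnitudes))
    return weighted / sum(listMagnitudes)

def spectralFlatness(listPower, floor=1e-10):
    """Geometric mean divided by arithmetic mean of one frame's power values."""
    listClipped = [max(value, floor) for value in listPower]
    geometricMean = math.exp(sum(math.log(value) for value in listClipped) / len(listClipped))
    arithmeticMean = sum(listClipped) / len(listClipped)
    return geometricMean / arithmeticMean

# Made-up frame: four frequency bins in Hz.
frequencies = [0.0, 100.0, 200.0, 300.0]
frameFlat = [1.0, 1.0, 1.0, 1.0]    # equal energy in every bin
framePeaked = [0.0, 0.0, 1.0, 0.0]  # all energy in one bin

print(spectralCentroid(frequencies, frameFlat), spectralFlatness(frameFlat))
print(spectralCentroid(frequencies, framePeaked), spectralFlatness(framePeaked))
```

A flat frame gives a flatness of 1, while a frame with a single dominant bin gives a value near 0.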

Two-input comparisons

Filename contests compare two audio files:

Function	What it compares
`analyzePSNRmean`	Mean peak signal-to-noise ratio.
`analyzeSDRmean`	Mean signal-to-distortion ratio.
`analyzeSI_SDRmean`	Mean scale-invariant signal-to-distortion ratio.
`analyzeKPSNRmean`	Bounded score from PSNR.
`analyzeKSDRmean`	Bounded score from SDR.
`analyzeKSI_SDRmean`	Bounded score from SI-SDR.

from analyzeAudio.analyzersUseFilename import (
    analyzePSNRmean,
    analyzeSDRmean,
    analyzeSI_SDRmean,
)

pathReference = "reference.wav"
pathEstimate = "estimate.wav"

psnr = analyzePSNRmean(pathReference, pathEstimate)
sdr = analyzeSDRmean(pathReference, pathEstimate)
si_sdr = analyzeSI_SDRmean(pathReference, pathEstimate)
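
The difference between a plain energy ratio and its scale-invariant variant can be shown on short made-up signals. This is a pure-Python sketch of the commonly used definitions, not the package's implementation; BSS Eval SDR, cited in the references, additionally allows a distortion filter and is more involved. All function names are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def ratioDecibels(reference, estimate):
    """10 * log10(||reference||^2 / ||reference - estimate||^2)."""
    error = [r - e for r, e in zip(reference, estimate)]
    return 10.0 * math.log10(dot(reference, reference) / dot(error, error))

def scaleInvariantSDR(reference, estimate):
    """Project the estimate onto the reference, then compare target and residual energy."""
    scale = dot(estimate, reference) / dot(reference, reference)
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10(dot(target, target) / dot(residual, residual))

# Made-up signals; the second estimate is the first one doubled in amplitude.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.6, -0.4, 0.5, -0.5]
estimateRescaled = [2.0 * value for value in estimate]

print(ratioDecibels(reference, estimate), ratioDecibels(reference, estimateRescaled))
print(scaleInvariantSDR(reference, estimate), scaleInvariantSDR(reference, estimateRescaled))
```

Rescaling the estimate changes the plain ratio but leaves the scale-invariant value unchanged. Both functions divide by the error energy, so a perfect estimate would need special handling.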

Tensor waveform contests usually compare two PyTorch waveform tensors:

Function	What it compares
`analyzeL1SNRMean`	Mean L1 signal-to-noise ratio.
`analyzeL1SNRDBMean`	Mean L1 signal-to-noise ratio in decibels.
`analyzeMultiL1SNRDBMean`	Multi-source L1 SNR in decibels.
`analyzeSTFTL1SNRDBMean`	STFT-domain L1 SNR in decibels.
`analyzeLogWMSEMean`	Mean log weighted MSE audio-quality score for reference, estimate, and mixture tensors.
`analyzeDCLoss`	DC loss.
`analyzeESRLoss`	Error-to-signal ratio loss.
`analyzeLogCoshLoss`	Log-cosh loss.
`analyzeSNRLoss`	Signal-to-noise ratio loss.
`analyzeSISDRLoss`	Scale-invariant SDR loss.
`analyzeSDSDRLoss`	Scale-dependent SDR loss.
`analyzeSTFTLoss`	STFT loss.
`analyzeMelSTFTLoss`	Mel-STFT loss.
`analyzeChromaSTFTLoss`	Chroma-STFT loss.
`analyzeMultiResolutionSTFTLoss`	Multi-resolution STFT loss.
`analyzeRandomResolutionSTFTLoss`	Random-resolution STFT loss.
`analyzeSumAndDifferenceSTFTLoss`	Sum-and-difference STFT loss.

from analyzeAudio.contestsTensor import (
    analyzeL1SNRDBMean,
    analyzeMultiResolutionSTFTLoss,
)

# tensorReference and tensorEstimate are PyTorch waveform tensors you have already created.
l1snrdb = analyzeL1SNRDBMean(tensorReference, tensorEstimate)
mrstft = analyzeMultiResolutionSTFTLoss(tensorReference, tensorEstimate)
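
Several of the simpler losses in the table have short closed forms. The sketch below writes out error-to-signal ratio, DC loss, and log-cosh loss on plain Python lists, following the definitions in the papers listed under Reference materials as I understand them. The package operates on PyTorch tensors and its implementations may add pre-emphasis, epsilon terms, or a different reduction, so treat this as an illustration with hypothetical function names.

```python
import math

def errorToSignalRatio(reference, estimate):
    """Sum of squared errors divided by the reference energy."""
    numerator = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return numerator / sum(r * r for r in reference)

def dcLoss(reference, estimate):
    """Squared mean error divided by the mean reference energy."""
    count = len(reference)
    meanError = sum(r - e for r, e in zip(reference, estimate)) / count
    meanEnergy = sum(r * r for r in reference) / count
    return meanError ** 2 / meanEnergy

def logCoshLoss(reference, estimate):
    """Mean of log(cosh(estimate - reference))."""
    total = sum(math.log(math.cosh(e - r)) for r, e in zip(reference, estimate))
    return total / len(reference)

# Made-up signals: the estimate is the reference plus a constant offset of 0.5.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.5, -0.5, 1.5, -0.5]

print(errorToSignalRatio(reference, estimate))
print(dcLoss(reference, estimate))
print(logCoshLoss(reference, estimate))
```

With a pure constant offset, the error-to-signal ratio and the DC loss coincide because all of the error is DC.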

analyzeLogWMSEMean also needs the original mixture and sample rate:

from analyzeAudio.contestsTensor import analyzeLogWMSEMean

logwmse = analyzeLogWMSEMean(
    tensorReference,
    tensorEstimate,
    tensorMixture,
    sampleRate,
)

Tensor spectrogram contests compare two PyTorch magnitude spectrogram tensors:

Function	What it compares
`analyzeSpectralConvergenceLoss`	Spectral convergence loss.
`analyzeSTFTMagnitudeLoss`	STFT magnitude loss.
`analyzeL1FrequencyLoss`	L1 frequency score.

from analyzeAudio.contestsTensorSpectrogram import (
    analyzeSpectralConvergenceLoss,
    analyzeSTFTMagnitudeLoss,
)

spectralConvergence = analyzeSpectralConvergenceLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)
stftMagnitude = analyzeSTFTMagnitudeLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)

NumPy spectrogram helpers compare two magnitude spectrograms:

Function	What it returns
`analyzeBleedFullMelDB`	Arrays of added and missing mel-scaled dB content.
`analyzeBleedFullMelDBMean`	Two scores: `bleed` and `full`.

from analyzeAudio.contestsSpectrogram import analyzeBleedFullMelDBMean

bleedFull = analyzeBleedFullMelDBMean(
    spectrogramMagnitudeReference,
    spectrogramMagnitudeEstimate,
)
print(bleedFull.bleed, bleedFull.full)

Exact-name checks

The tables above describe what is in the package. These helpers are available when you want a copyable list from the installed version:

from analyzeAudio import getListAvailableAudioAspects, getListAvailableAudioContests

print(getListAvailableAudioAspects())
print(getListAvailableAudioContests())

The terminal commands are:

whatAspects
whatContests

API standardization

A top priority for this package is a public API that is as standardized as possible across filename, waveform, spectrogram, tensor, and contest analyzers. The package wraps libraries with very different calling conventions, but analyzer function signatures should model this package's dispatcher inputs, not every underlying library option.

Wishlist

Overhaul the semiotic system.
Install FFmpeg in GitHub Actions for testing.
Assist with installing FFmpeg in arbitrary environments.

Reference materials

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

Common name: spectral flatness
DOI: 10.1109/TASSP.1974.1162572. Download BibTeX citation.
Implementation: librosa/librosa.feature.spectral_flatness

Perceptual Effects of Spectral Modifications on Musical Timbres

DOI: 10.1121/1.381843. Download BibTeX citation.

Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments

Common name: spectral entropy
DOI: 10.21437/ICSLP.1998-527. Download BibTeX citation.

Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music

Common name: chroma features
Proceedings: University of Michigan ICMC archive. Download BibTeX citation.
Implementation: librosa/librosa.feature.chroma_stft

A Robust Audio Classification and Segmentation Method

Technical report: Microsoft Research. Download BibTeX citation.
Implementations:

Music Type Classification by Spectral Contrast Feature

Common name: spectral contrast
DOI: 10.1109/ICME.2002.1035731. Download BibTeX citation.
Free PDF: Tsinghua University
Implementation: librosa/librosa.feature.spectral_contrast

A Speech/Music Discriminator Based on RMS and Zero-Crossings

Common names: RMS, zero-crossing rate
DOI: 10.1109/TMM.2004.840604. Download BibTeX citation.
Free author proof: University of Crete
Implementations:
- librosa/librosa.feature.rms
- librosa/librosa.feature.zero_crossing_rate

Zero-Crossing Rate

Common name: zero-crossing rate
Online chapter: Introduction to Speech Processing. Download BibTeX citation.
Implementation: librosa/librosa.feature.zero_crossing_rate

Performance Measurement in Blind Audio Source Separation

Common name: BSS Eval SDR
DOI: 10.1109/TSA.2005.858005. Download BibTeX citation.
Free author PDF: IRISA
Implementations:

Automatic Chord Recognition from Audio Using a HMM with Supervised Learning

Proceedings: ISMIR 2006. Download BibTeX citation.
Implementation: librosa/librosa.feature.chroma_stft

Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals

Common name: tempogram
DOI: 10.1109/ICASSP.2010.5495219. Download BibTeX citation.
Free author PDF: AudioLabs Erlangen
Implementations:
- librosa/librosa.feature.tempogram
- Vamp Tempogram Plugin

A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

Common name: SRMR
DOI: 10.1109/TASL.2010.2052247. Download BibTeX citation.
Free author PDF: MUSEA Lab
Implementation:
- Lightning-AI/torchmetrics
  - SRMR official documentation.
  - Python source with implementation details for AI agents.

Perceptual Loss Function for Neural Modelling of Audio Systems

Common names: ESR loss, DC loss
arXiv abstract: 1911.08922. Download BibTeX citation.
TeX source with formulas for AI agents: arXiv source

Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder

Common name: log-cosh loss
OpenReview page: rkglvsC9Ym. Download BibTeX citation.

logWMSE Audio Quality Metric and PyTorch Loss Implementation

Common name: logWMSE
Implementations:
- nomonosound/log-wmse-audio-quality. Download BibTeX citation.
- crlandsc/torch-log-wmse. Download BibTeX citation.

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Common names: spectral convergence, STFT magnitude loss terms
DOI: 10.48550/arXiv.1808.06719. Download BibTeX citation.
arXiv abstract: 1808.06719; TeX source with formulas for AI agents: arXiv source

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

DOI: 10.48550/arXiv.1904.04472. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Common name: multi-resolution STFT
DOI: 10.48550/arXiv.1910.11480. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

auraloss: Audio focused loss functions in PyTorch

Common names: random-resolution STFT loss implementation source
Workshop paper PDF: DMRN+15 PDF. Download BibTeX citation.
Implementation: csteinmetz1/auraloss. BibTeX citation for the source repository.

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

Common names: sum-and-difference STFT loss in neural mixing
DOI: 10.48550/arXiv.2010.10291. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

Neural source-filter waveform models for statistical parametric speech synthesis

Related work cited in the auraloss documentation as context for multi-resolution spectral training
DOI: 10.48550/arXiv.1904.12088. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

DDSP: Differentiable Digital Signal Processing

DOI: 10.48550/arXiv.2001.04643. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

Common name: L1SNR reference
DOI: 10.1109/OJSP.2023.3339428. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Common name: L1SNR reference
DOI: 10.48550/arXiv.2406.18747. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries

Common name: L1SNRDB reference
DOI: 10.48550/arXiv.2501.16171. Download BibTeX citation. TeX source with formulas for AI agents: arXiv source

torch-l1-snr: L1 Signal-to-Noise Ratio Loss Functions for Audio Source Separation in PyTorch

Common name: torch-l1-snr
Source: crlandsc/torch-l1-snr. Download BibTeX citation.

Packages and documentation

My recovery

Release history

Version	Release date
0.8.0	Jul 1, 2026
0.7.0	Jun 26, 2026
0.6.0 (this version)	Jun 16, 2026
0.5.0	Jun 14, 2026
0.4.0	Jun 10, 2026
0.3.0	Jun 8, 2026
0.2.0	Jun 1, 2026
0.1.1	May 30, 2026
0.1.0	May 30, 2026
0.0.20	May 22, 2026
0.0.19	May 20, 2026
0.0.18	Feb 4, 2026
0.0.17	Jul 11, 2025
0.0.16	May 20, 2025
0.0.15	Mar 30, 2025
0.0.14	Mar 7, 2025
0.0.13	Mar 6, 2025
0.0.12	Mar 4, 2025
0.0.11	Jan 26, 2025

File metadata

Download URL: analyzeaudio-0.6.0.tar.gz
Upload date: Jun 16, 2026
Size: 97.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for analyzeaudio-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`923b78b3218de1b8d249e9107777c6e10bbb232568164d0cf94432a22a962401`
MD5	`17309cfa0b91c2797cb50f24f33e1ff0`
BLAKE2b-256	`6def058db92e178c3fd0f14fd5f4453f526688bf5e223f3e3167c7725426876e`
