Measure one or more aspects of one or more audio files.
analyzeAudio
Note well: the FFmpeg and FFprobe binaries must be in PATH.
Download options for FFmpeg and FFprobe are listed at ffmpeg.org.
Install FFmpeg on Google Colab
from analyzeAudio.ffmpeg import verifyFFmpegColab
verifyFFmpegColab()
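Outside Colab, a quick way to confirm the PATH requirement is to ask Python where it would find each binary. The following is a minimal sketch using only the standard library; `findFFmpegBinaries` is a hypothetical helper name and is not part of analyzeAudio.

```python
import shutil
from typing import Dict, Optional


def findFFmpegBinaries() -> Dict[str, Optional[str]]:
    """Map each required binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in ("ffmpeg", "ffprobe")}


# Hypothetical helper for illustration; analyzeAudio does not provide it.
binaries = findFFmpegBinaries()
for name, path in binaries.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If either line reports that the binary was not found, install FFmpeg or adjust PATH before calling the analyzers.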
What is in the package
analyzeAudio provides two user-facing kinds of audio analysis.
- Audio aspects measure one audio file.
- Audio contests compare two audio files, waveforms, tensors, or spectrograms.
The main user workflows are:
| What you want | Use |
|---|---|
| One value for each selected measurement on one file | analyzeAudioFile |
| The same selected measurements for many files | analyzeAudioListPathFilenames |
| A TSV, CSV, or other delimited output file | dataTabularTOpathFilenameDelimited |
| One specific measurement or detailed frame data | Import a direct analyzer function |
| One comparison score between two files | Import a filename contest function |
| One comparison score between two tensors or spectrograms | Import a tensor or spectrogram contest function |
One-file measurements
Use these names with analyzeAudioFile or
analyzeAudioListPathFilenames. Names are case-sensitive.
Loudness and true peak:
| Name | What it measures |
|---|---|
| LUFS integrated | Whole-file integrated loudness. |
| LUFS momentary maximum | Maximum momentary loudness. |
| LUFS short-term maximum | Maximum short-term loudness. |
| LUFS loudness range | Loudness range. |
| LUFS low | Low loudness range boundary. |
| LUFS high | High loudness range boundary. |
| true_peak maximum | Maximum true peak level. |
Signal level, dynamics, and samples:
| Name | What it measures |
|---|---|
| RMS_level overall | Overall RMS level. |
| RMS_peak overall | Overall RMS peak. |
| RMS_trough overall | Overall RMS trough. |
| RMS_difference overall | Overall RMS difference between adjacent samples. |
| Peak_level overall | Overall peak level. |
| Peak_count total | Total detected peak count. |
| Abs_Peak_count total | Total absolute peak count. |
| Crest_factor mean | Mean crest factor. |
| Dynamic_range overall | Overall dynamic range. |
| DC_offset mean | Mean DC offset. |
| Bit_depth mean | Mean detected bit depth. |
| Entropy mean | Mean signal entropy. |
| Flat_factor mean | Mean flat factor. |
| Max_difference overall | Maximum sample difference. |
| Max_level overall | Maximum sample level. |
| Mean_difference mean | Mean adjacent-sample difference. |
| Min_difference overall | Minimum sample difference. |
| Min_level overall | Minimum sample level. |
| Noise_floor overall | Overall noise floor. |
| Noise_floor_count total | Total noise-floor count. |
| Number_of_samples total | Total samples. |
| Zero_crossings total | Total zero crossings. |
| Zero_crossings_rate overall | Overall zero-crossing rate. |
Spectral measurements:
| Name | What it measures |
|---|---|
| Power spectral density mean | Mean power spectral density. |
| Spectral centroid mean | Mean spectral center of mass from filename analysis. |
| Spectral crest mean | Mean spectral crest. |
| Spectral decrease mean | Mean spectral decrease. |
| Spectral entropy mean | Mean spectral entropy. |
| Spectral flatness mean | Mean spectral flatness from filename analysis. |
| Spectral flux mean | Mean spectral flux. |
| Spectral kurtosis mean | Mean spectral kurtosis. |
| Spectral rolloff mean | Mean spectral rolloff. |
| Spectral skewness mean | Mean spectral skewness. |
| Spectral slope mean | Mean spectral slope. |
| Spectral spread mean | Mean spectral spread. |
| Spectral variance mean | Mean spectral variance. |
| Spectral Bandwidth mean | Mean librosa spectral bandwidth. |
| Spectral Centroid mean | Mean librosa spectral centroid. |
| Spectral Contrast mean | Mean librosa spectral contrast. |
| Spectral Flatness mean | Mean librosa spectral flatness ratio. |
| Spectral Flatness dB mean | Mean librosa spectral flatness in decibels. |
| Chromagram mean | Mean chroma energy across pitch classes. |
Waveform, rhythm, and speech measurements:
| Name | What it measures |
|---|---|
| RMS Waveform mean | Mean waveform RMS amplitude. |
| RMS Waveform dB mean | Mean waveform RMS level in decibels. |
| Tempogram mean | Mean tempogram value. |
| Tempo mean | Mean estimated tempo. |
| Zero Crossing Rate mean | Mean waveform zero-crossing rate. |
| SRMR mean | Mean speech-to-reverberation modulation energy ratio. |
Some names intentionally differ only by capitalization or wording. For example,
Spectral flatness mean and Spectral Flatness mean are different
measurements. Use the exact name from the table.
Measure one file
from analyzeAudio import analyzeAudioFile
listAspectNames = [
    "LUFS integrated",
    "true_peak maximum",
    "RMS_level overall",
    "Spectral Flatness mean",
    "Zero Crossing Rate mean",
]
listValues = analyzeAudioFile("voice.wav", listAspectNames)
measurements = dict(zip(listAspectNames, listValues, strict=True))
print(measurements)
analyzeAudioFile returns one value for each requested name, in the same order.
If a requested name is unavailable, that value is "not found".
Measure many files
from pathlib import Path
from analyzeAudio import analyzeAudioListPathFilenames
listPathFilenames = tuple(Path("audio").glob("*.wav"))
listAspectNames = [
    "LUFS integrated",
    "LUFS loudness range",
    "true_peak maximum",
]
rows = analyzeAudioListPathFilenames(
    listPathFilenames,
    listAspectNames,
    CPUlimit=4,
)
for row in rows:
    print(row)
Each row starts with the analyzed filename, followed by the requested values. Rows are returned as files finish, so row order can differ from input order.
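If downstream code needs rows in input order, they can be re-sorted by the filename in the first column. This is a minimal sketch that assumes the first element of each row is the same path that was passed in, as a string or a Path; `sortRowsLikeInput` is a hypothetical helper and the numbers are made up.

```python
from pathlib import Path


def sortRowsLikeInput(rows, listPathFilenames):
    """Return rows reordered to follow the order of listPathFilenames."""
    # Normalize through Path so "a.wav" and Path("a.wav") compare equal.
    position = {str(Path(p)): index for index, p in enumerate(listPathFilenames)}
    return sorted(rows, key=lambda row: position[str(Path(row[0]))])


listPathFilenamesExample = ["audio/a.wav", "audio/b.wav", "audio/c.wav"]
# Stand-in rows in completion order; real rows come from analyzeAudioListPathFilenames.
rowsAsFinished = [
    ["audio/c.wav", -18.0],
    ["audio/a.wav", -23.0],
    ["audio/b.wav", -20.5],
]
rowsOrdered = sortRowsLikeInput(rowsAsFinished, listPathFilenamesExample)
print([row[0] for row in rowsOrdered])
```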
Save measurements
from analyzeAudio import (
    analyzeAudioListPathFilenames,
    dataTabularTOpathFilenameDelimited,
)
listAspectNames = ["LUFS integrated", "true_peak maximum"]
rows = analyzeAudioListPathFilenames(["one.wav", "two.wav"], listAspectNames)
dataTabularTOpathFilenameDelimited(
    "measurements.tsv",
    rows,
    ["pathFilename", *listAspectNames],
)
For CSV output, use a comma delimiter:
dataTabularTOpathFilenameDelimited(
    "measurements.csv",
    rows,
    ["pathFilename", *listAspectNames],
    delimiterOutput=",",
)
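A saved file can be read back with the standard `csv` module. The sketch below assumes the output starts with a header row containing the column names passed as the third argument, and that the default delimiter is a tab; both are assumptions about the file layout, not documented behavior. `readMeasurements` is a hypothetical helper, and the stand-in file and its values exist only so the example runs without audio.

```python
import csv
import tempfile
from pathlib import Path


def readMeasurements(pathFilename, delimiter="\t"):
    """Read a delimited measurements file into a list of dicts keyed by column name."""
    with open(pathFilename, newline="", encoding="utf-8") as readStream:
        return list(csv.DictReader(readStream, delimiter=delimiter))


# Build a stand-in TSV (assumed layout: header row, then one row per file).
with tempfile.TemporaryDirectory() as pathTemporary:
    pathFilename = Path(pathTemporary) / "measurements.tsv"
    with open(pathFilename, "w", newline="", encoding="utf-8") as writeStream:
        writer = csv.writer(writeStream, delimiter="\t")
        writer.writerow(["pathFilename", "LUFS integrated", "true_peak maximum"])
        writer.writerow(["one.wav", "-23.0", "-1.5"])
        writer.writerow(["two.wav", "-19.2", "-0.3"])
    records = readMeasurements(pathFilename)

# Values are read as text, so convert before comparing.
loudest = max(records, key=lambda record: float(record["LUFS integrated"]))
print(loudest["pathFilename"])
```

For a CSV file, pass `delimiter=","` to the reader.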
Get detailed arrays
Summary names usually return one number. Direct analyzer functions whose names lack a summary word such as Mean or Overall usually return the per-frame, per-channel, or per-band values.
from analyzeAudio.analyzersUseFilename import (
    analyzeLUFSIntegratedOverall,
    analyzeLUFSMomentary,
)
integrated = analyzeLUFSIntegratedOverall("voice.wav")
momentaryFrames = analyzeLUFSMomentary("voice.wav")
Use audio already loaded in Python
Waveform analyzers accept waveform samples shaped as (channels, samples).
import numpy
import soundfile
from analyzeAudio.analyzersUseWaveform import (
    analyzeRMSWaveformMean,
    analyzeTempoMean,
    analyzeZeroCrossingRateMean,
)
with soundfile.SoundFile("voice.wav") as audioFile:
    sampleRate = audioFile.samplerate
    waveform = audioFile.read(dtype="float32", always_2d=True).astype(numpy.float32).T
rms = analyzeRMSWaveformMean(waveform)
tempo = analyzeTempoMean(waveform, sampleRate)
zeroCrossingRate = analyzeZeroCrossingRateMean(waveform)
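To make the channels-by-samples layout concrete, here is a pure-Python sketch of the textbook definitions of RMS and zero-crossing rate, computed per channel over the whole signal. It is not the package's implementation: the analyzers may work frame by frame and average the result, so their numbers can differ. Both function names are hypothetical.

```python
import math


def rmsPerChannel(waveform):
    """Root mean square of each channel; waveform is a list of channels, each a list of samples."""
    return [math.sqrt(sum(s * s for s in channel) / len(channel)) for channel in waveform]


def zeroCrossingRatePerChannel(waveform):
    """Fraction of adjacent sample pairs whose signs differ, per channel."""
    rates = []
    for channel in waveform:
        crossings = sum(
            1
            for previous, current in zip(channel, channel[1:])
            if (previous < 0) != (current < 0)
        )
        rates.append(crossings / (len(channel) - 1))
    return rates


# Two channels by four samples: shape (2, 4) in array terms.
waveformExample = [
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]
rmsExample = rmsPerChannel(waveformExample)
zeroCrossingExample = zeroCrossingRatePerChannel(waveformExample)
print(rmsExample)
print(zeroCrossingExample)
```

Both channels have the same RMS, but the first changes sign at every step while the second changes sign once, which is the distinction the zero-crossing rate captures.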
Spectrogram analyzers accept magnitude or power spectrograms.
import librosa
import numpy
from analyzeAudio.analyzersUseSpectrogram import (
    analyzeChromagramMean,
    analyzeSpectralCentroidMean,
)
spectrogram = librosa.stft(waveform)
spectrogramMagnitude = numpy.absolute(spectrogram)
spectrogramPower = spectrogramMagnitude**2
spectralCentroid = analyzeSpectralCentroidMean(spectrogramMagnitude)
chromagram = analyzeChromagramMean(spectrogramPower, sampleRate)
Two-input comparisons
Filename contests compare two audio files:
| Function | What it compares |
|---|---|
| analyzePSNRmean | Mean peak signal-to-noise ratio. |
| analyzeSDRmean | Mean signal-to-distortion ratio. |
| analyzeSI_SDRmean | Mean scale-invariant signal-to-distortion ratio. |
| analyzeKPSNRmean | Bounded score from PSNR. |
| analyzeKSDRmean | Bounded score from SDR. |
| analyzeKSI_SDRmean | Bounded score from SI-SDR. |
from analyzeAudio.analyzersUseFilename import (
    analyzePSNRmean,
    analyzeSDRmean,
    analyzeSI_SDRmean,
)
pathReference = "reference.wav"
pathEstimate = "estimate.wav"
psnr = analyzePSNRmean(pathReference, pathEstimate)
sdr = analyzeSDRmean(pathReference, pathEstimate)
si_sdr = analyzeSI_SDRmean(pathReference, pathEstimate)
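For intuition about what these scores mean, the sketch below computes SI-SDR and PSNR on two short mono signals using their standard definitions. It is an illustration under stated assumptions, not the package's code: the function names are hypothetical, the PSNR peak is assumed to be 1.0, and the package may handle channels, framing, and averaging differently. BSS Eval SDR is more involved and is not shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def scaleInvariantSDR(reference, estimate):
    """SI-SDR in dB: project the estimate onto the reference, then compare target and residual energy."""
    scale = dot(estimate, reference) / dot(reference, reference)
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(dot(target, target) / dot(residual, residual))


def peakSNR(reference, estimate, peak=1.0):
    """PSNR in dB, assuming signals whose peak amplitude is `peak`."""
    meanSquaredError = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return 10 * math.log10(peak**2 / meanSquaredError)


# Made-up signals: the estimate is the reference with small errors.
reference = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
estimate = [0.1, 0.5, 0.9, 0.5, 0.1, -0.5, -0.9, -0.5]
estimateLouder = [3 * e for e in estimate]

siSDR = scaleInvariantSDR(reference, estimate)
siSDRLouder = scaleInvariantSDR(reference, estimateLouder)
psnrExample = peakSNR(reference, estimate)
print(siSDR, siSDRLouder, psnrExample)
```

Scaling the estimate by a constant gain leaves SI-SDR essentially unchanged, which is the property that distinguishes it from plain SDR or PSNR.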
Tensor waveform contests usually compare two PyTorch waveform tensors:
| Function | What it compares |
|---|---|
| analyzeL1SNRMean | Mean L1 signal-to-noise ratio. |
| analyzeL1SNRDBMean | Mean L1 signal-to-noise ratio in decibels. |
| analyzeMultiL1SNRDBMean | Multi-source L1 SNR in decibels. |
| analyzeSTFTL1SNRDBMean | STFT-domain L1 SNR in decibels. |
| analyzeLogWMSEMean | Mean log weighted MSE audio-quality score for reference, estimate, and mixture tensors. |
| analyzeDCLoss | DC loss. |
| analyzeESRLoss | Error-to-signal ratio loss. |
| analyzeLogCoshLoss | Log-cosh loss. |
| analyzeSNRLoss | Signal-to-noise ratio loss. |
| analyzeSISDRLoss | Scale-invariant SDR loss. |
| analyzeSDSDRLoss | Scale-dependent SDR loss. |
| analyzeSTFTLoss | STFT loss. |
| analyzeMelSTFTLoss | Mel-STFT loss. |
| analyzeChromaSTFTLoss | Chroma-STFT loss. |
| analyzeMultiResolutionSTFTLoss | Multi-resolution STFT loss. |
| analyzeRandomResolutionSTFTLoss | Random-resolution STFT loss. |
| analyzeSumAndDifferenceSTFTLoss | Sum-and-difference STFT loss. |
from analyzeAudio.contestsTensor import (
    analyzeL1SNRDBMean,
    analyzeMultiResolutionSTFTLoss,
)
l1snrdb = analyzeL1SNRDBMean(tensorReference, tensorEstimate)
mrstft = analyzeMultiResolutionSTFTLoss(tensorReference, tensorEstimate)
analyzeLogWMSEMean also needs the original mixture and sample rate:
from analyzeAudio.contestsTensor import analyzeLogWMSEMean
logwmse = analyzeLogWMSEMean(
    tensorReference,
    tensorEstimate,
    tensorMixture,
    sampleRate,
)
Tensor spectrogram contests compare two PyTorch magnitude spectrogram tensors:
| Function | What it compares |
|---|---|
| analyzeSpectralConvergenceLoss | Spectral convergence loss. |
| analyzeSTFTMagnitudeLoss | STFT magnitude loss. |
| analyzeL1FrequencyLoss | L1 frequency score. |
from analyzeAudio.contestsTensorSpectrogram import (
    analyzeSpectralConvergenceLoss,
    analyzeSTFTMagnitudeLoss,
)
spectralConvergence = analyzeSpectralConvergenceLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)
stftMagnitude = analyzeSTFTMagnitudeLoss(
    tensorSpectrogramMagnitudeReference,
    tensorSpectrogramMagnitudeEstimate,
)
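As a rough guide to what these two losses measure, the sketch below implements the commonly cited definitions on small nested lists: spectral convergence as a ratio of Frobenius norms, and the magnitude term as a mean absolute difference of log magnitudes. This is an assumption-laden illustration; the package functions take PyTorch tensors and may reduce, scale, or clamp differently. The function names and the `epsilon` parameter are hypothetical.

```python
import math


def spectralConvergence(magnitudeReference, magnitudeEstimate):
    """Frobenius norm of the magnitude difference divided by the Frobenius norm of the reference."""
    differenceSquared = 0.0
    referenceSquared = 0.0
    for rowReference, rowEstimate in zip(magnitudeReference, magnitudeEstimate):
        for valueReference, valueEstimate in zip(rowReference, rowEstimate):
            differenceSquared += (valueReference - valueEstimate) ** 2
            referenceSquared += valueReference**2
    return math.sqrt(differenceSquared) / math.sqrt(referenceSquared)


def logMagnitudeL1(magnitudeReference, magnitudeEstimate, epsilon=1e-8):
    """Mean absolute difference of log magnitudes; epsilon guards against log(0)."""
    total = 0.0
    count = 0
    for rowReference, rowEstimate in zip(magnitudeReference, magnitudeEstimate):
        for valueReference, valueEstimate in zip(rowReference, rowEstimate):
            total += abs(math.log(valueReference + epsilon) - math.log(valueEstimate + epsilon))
            count += 1
    return total / count


# Made-up 2x2 magnitude spectrograms (frequency bins by frames).
magnitudeReference = [[3.0, 0.0], [0.0, 4.0]]
magnitudeEstimate = [[3.0, 0.0], [0.0, 1.0]]

convergenceExample = spectralConvergence(magnitudeReference, magnitudeEstimate)
logMagnitudeExample = logMagnitudeL1(magnitudeReference, magnitudeEstimate)
print(convergenceExample, logMagnitudeExample)
```

Spectral convergence emphasizes errors in high-energy bins, while the log-magnitude term weights quiet and loud bins more evenly.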
NumPy spectrogram helpers compare two magnitude spectrograms:
| Function | What it returns |
|---|---|
| analyzeBleedFullMelDB | Arrays of added and missing mel-scaled dB content. |
| analyzeBleedFullMelDBMean | Two scores: bleed and full. |
from analyzeAudio.contestsSpectrogram import analyzeBleedFullMelDBMean
bleedFull = analyzeBleedFullMelDBMean(
    spectrogramMagnitudeReference,
    spectrogramMagnitudeEstimate,
)
print(bleedFull.bleed, bleedFull.full)
Exact-name checks
The tables above describe what is in the package. These helpers are available when you want a copyable list from the installed version:
from analyzeAudio import getListAvailableAudioAspects, getListAvailableAudioContests
print(getListAvailableAudioAspects())
print(getListAvailableAudioContests())
The terminal commands are:
whatAspects
whatContests
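Since names are case-sensitive, a small pre-flight check can catch near misses before any audio is analyzed. The sketch below is a hypothetical helper, not part of analyzeAudio. It uses a stand-in list of available names; in real use you would pass the list returned by getListAvailableAudioAspects(), assuming it returns plain strings.

```python
import difflib


def suggestAspectNames(listAspectNames, listAvailable):
    """Map each requested name that is not an exact match to a list of likely intended names."""
    available = set(listAvailable)
    suggestions = {}
    for name in listAspectNames:
        if name in available:
            continue  # exact, case-sensitive match: nothing to report
        # Prefer names that differ only by capitalization, then fall back to fuzzy matches.
        sameLetters = [candidate for candidate in listAvailable if candidate.lower() == name.lower()]
        suggestions[name] = sameLetters or difflib.get_close_matches(name, listAvailable, n=3)
    return suggestions


# Stand-in list for illustration; names are taken from the tables in this document.
listAvailable = ["LUFS integrated", "Spectral flatness mean", "Spectral Flatness mean"]
listRequested = ["LUFS integrated", "lufs integrated", "Spectral flatnes mean"]

problems = suggestAspectNames(listRequested, listAvailable)
for name, candidates in problems.items():
    print(f"{name!r} is not an exact name; candidates: {candidates}")
```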
API standardization
A top priority for this package is a public API that is as standardized as possible across filename, waveform, spectrogram, tensor, and contest analyzers. The package wraps libraries with very different calling conventions, but analyzer function signatures should model this package's dispatcher inputs, not every underlying library option.
Wishlist
- Overhaul the semiotic system.
- Install FFmpeg in GitHub Actions for testing.
- Improve speed
- Sophisticated caching of large objects and un-hashable objects.
Reference materials
A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis
- Common name: spectral flatness
- BibTeX citation.
- DOI: 10.1109/TASSP.1974.1162572
- IEEE Xplore: document 1162647
- Implementation:
- librosa/librosa.feature.spectral_flatness
Perceptual Effects of Spectral Modifications on Musical Timbres
Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments
- Common name: spectral entropy
- BibTeX citation.
- DOI: 10.21437/ICSLP.1998-527
- Proceedings: ISCA Archive
- Free author PDF: Columbia University
Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music
- Common name: chroma features
- BibTeX citation.
- Proceedings: University of Michigan ICMC archive
- CCRMA HTML copy: Stanford CCRMA
- Implementation:
- librosa/librosa.feature.chroma_stft
A Robust Audio Classification and Segmentation Method
- BibTeX citation.
- Technical report: Microsoft Research
- Free PDF: Microsoft Research
- Implementations:
Music Type Classification by Spectral Contrast Feature
- Common name: spectral contrast
- BibTeX citation.
- DOI: 10.1109/ICME.2002.1035731
- Free PDF: Tsinghua University
- Implementation:
- librosa/librosa.feature.spectral_contrast
A Speech/Music Discriminator Based on RMS and Zero-Crossings
- Common names: RMS, zero-crossing rate
- BibTeX citation.
- DOI: 10.1109/TMM.2004.840604
- Free author proof: University of Crete
- Implementation:
- librosa/librosa.feature.rms
- librosa/librosa.feature.zero_crossing_rate
Zero-Crossing Rate
- Common name: zero-crossing rate
- BibTeX citation.
- Online chapter: Introduction to Speech Processing
- Implementation:
- librosa/librosa.feature.zero_crossing_rate
Performance Measurement in Blind Audio Source Separation
- Common name: BSS Eval SDR
- BibTeX citation.
- DOI: 10.1109/TSA.2005.858005
- Free author PDF: IRISA
- Implementations:
Automatic Chord Recognition from Audio Using a HMM with Supervised Learning
- BibTeX citation.
- Proceedings: ISMIR 2006
- Free PDF: Stanford CCRMA
- Implementation:
- librosa/librosa.feature.chroma_stft
Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals
- Common name: tempogram
- BibTeX citation.
- DOI: 10.1109/ICASSP.2010.5495219
- Free author PDF: AudioLabs Erlangen
- Implementations:
- librosa/librosa.feature.tempogram
- Vamp Tempogram Plugin
A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech
- Common name: SRMR
- BibTeX citation.
- DOI: 10.1109/TASL.2010.2052247
- Free author PDF: MUSEA Lab
- Implementation:
Signal Processing for Music Analysis
- BibTeX citation.
- DOI: 10.1109/JSTSP.2011.2112333
- Free author PDF: Columbia University
- Implementation:
The Timbre Toolbox: Extracting Audio Descriptors from Musical Signals
- BibTeX citation.
- DOI: 10.1121/1.3642604
- Free PDF: McGill University
- Implementations:
Blind Audio Watermarking Technique Based on Two Dimensional Cellular Automata
- Common name: APSNR reference
- BibTeX citation.
- DOI: 10.14257/ijsia.2016.10.9.18
- Free repository copy: Universidad Autonoma de Madrid
- Implementation:
SDR - Half-Baked or Well Done?
- Common name: SI-SDR
- BibTeX citation.
- TeX source with precise formulas for AI agents.
- DOI: 10.1109/ICASSP.2019.8683855
- Free author PDF: Jonathan Le Roux
- Implementations:
Loudness Metering: EBU Mode Metering to Supplement Loudness Normalisation
- Common name: momentary LUFS
- BibTeX citation.
- Standard: EBU Tech 3341
- Free PDF: EBU
- Implementation:
Loudness Range: A Measure to Supplement EBU R 128 Loudness Normalisation
- Common name: LUFS
- BibTeX citation.
- Standard: EBU Tech 3342
- Free PDF: EBU
- Implementation:
Algorithms to Measure Audio Programme Loudness and True-Peak Audio Level
- Common name: True peak
- BibTeX citation.
- Standard: ITU-R BS.1770-5
- Free PDF: ITU
- Implementation:
An Overview on Sound Features in Time and Frequency Domain
- BibTeX citation.
- DOI: 10.2478/ijasitels-2023-0006
- Open access article: Reference Global
- PDF: Reference Global
Perceptual Loss Function for Neural Modelling of Audio Systems
- Common names: ESR loss, DC loss
- Used by: analyzeESRLoss, analyzeDCLoss
- BibTeX citation.
- arXiv abstract: 1911.08922
- TeX source with formulas for AI agents: arXiv source
- PDF: arXiv PDF
Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder
- Common name: log-cosh loss
- Used by: analyzeLogCoshLoss
- BibTeX citation.
- OpenReview page: rkglvsC9Ym
- PDF: OpenReview PDF
logWMSE Audio Quality Metric and PyTorch Loss Implementation
- Common name: logWMSE
- Used by: analyzeLogWMSEMean
- Original implementation:
- PyTorch implementation:
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
- Common names: spectral convergence, STFT magnitude loss terms
- Used by: analyzeSpectralConvergenceLoss, analyzeSTFTMagnitudeLoss, analyzeSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1808.06719
- arXiv abstract: 1808.06719
- TeX source with formulas for AI agents: arXiv source
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation
- Used by: analyzeSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1904.04472
- arXiv abstract: 1904.04472
- TeX source with formulas for AI agents: arXiv source
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
- Common name: multi-resolution STFT
- Used by: analyzeMultiResolutionSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.1910.11480
- arXiv abstract: 1910.11480
- TeX source with formulas for AI agents: arXiv source
auraloss: Audio focused loss functions in PyTorch
- Common names: random-resolution STFT loss implementation source
- Used by: analyzeRandomResolutionSTFTLoss
- BibTeX citation.
- Workshop paper PDF: DMRN+15 PDF
- Source:
Automatic multitrack mixing with a differentiable mixing console of neural audio effects
- Common names: sum-and-difference STFT loss in neural mixing
- Used by: analyzeSumAndDifferenceSTFTLoss
- BibTeX citation.
- DOI: 10.48550/arXiv.2010.10291
- arXiv abstract: 2010.10291
- TeX source with formulas for AI agents: arXiv source
Neural source-filter waveform models for statistical parametric speech synthesis
- Related in auraloss docs for multi-resolution spectral training context
- BibTeX citation.
- DOI: 10.48550/arXiv.1904.12088
- arXiv abstract: 1904.12088
- TeX source with formulas for AI agents: arXiv source
DDSP: Differentiable Digital Signal Processing
- Related in auraloss docs for STFT-magnitude formulation context
- BibTeX citation.
- DOI: 10.48550/arXiv.2001.04643
- arXiv abstract: 2001.04643
- TeX source with formulas for AI agents: arXiv source
A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
- Common name: L1SNR reference
- Used by: analyzeL1SNRMean
- BibTeX citation.
- DOI: 10.1109/OJSP.2023.3339428
- arXiv abstract: 2309.02539
- TeX source with formulas for AI agents: arXiv source
A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems
- Common name: L1SNR reference
- Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- DOI: 10.48550/arXiv.2406.18747
- arXiv abstract: 2406.18747
- arXiv HTML used by docstrings: 2406.18747v2
- TeX source with formulas for AI agents: arXiv source
Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries
- Common name: L1SNRDB reference
- Used by: analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- DOI: 10.48550/arXiv.2501.16171
- arXiv abstract: 2501.16171
- arXiv HTML used by docstrings: 2501.16171v1
- TeX source with formulas for AI agents: arXiv source
torch-l1-snr: L1 Signal-to-Noise Ratio Loss Functions for Audio Source Separation in PyTorch
- Common name: torch-l1-snr
- Used by: analyzeL1SNRMean, analyzeL1SNRDBMean, analyzeMultiL1SNRDBMean, analyzeSTFTL1SNRDBMean
- BibTeX citation.
- Source: crlandsc/torch-l1-snr
Packages and documentation
- analyzeAudio
- FFmpeg documentation
- librosa/librosa
- Lightning-AI/torchmetrics
- PyTorch torch.nn.Module
- sigsep/sigsep-mus-eval
- mir-evaluation/mir_eval