Skip to main content

A comprehensive PyTorch-based audio feature extraction library for machine learning, research, and audio analysis

Project description

audiofeat

PyPI version Python License: MIT

A comprehensive PyTorch-based audio feature extraction library for speech research, music analysis, and audio ML pipelines. Extract 140+ features across temporal, spectral, cepstral, pitch, voice-quality, and rhythm domains — from a single pip install.

import audiofeat

features = audiofeat.extract_features_from_file("recording.wav")
print(features["f0_mean_hz"], features["rms_mean"], features["mfcc_0_mean"])

Why audiofeat?

  • One library, all features. Temporal, spectral, cepstral, pitch, voice quality, rhythm, formants, and tonal features in a single package.
  • PyTorch-first. Every feature returns a torch.Tensor. Plug directly into your training loop — no numpy-to-tensor conversion needed.
  • Librosa-grade accuracy. Primary paths delegate to librosa when available for bit-exact parity; pure-PyTorch fallbacks when it's not installed.
  • Beginner to production. Use individual functions for exploration, or the built-in CLI and batch extraction for production pipelines.
  • Validated. Built-in Praat comparison tooling and a gold-standard scorecard for reproducible research.

Features

Temporal Features

Feature Function Description
RMS rms() Root-mean-square amplitude per frame
Short-Time Energy short_time_energy() Sum of squared signal values in each frame
Zero-Crossing Rate zero_crossing_rate() Rate at which the signal changes sign
Zero-Crossing Count zero_crossing_count() Number of zero-crossings per frame
Loudness loudness() Perceptual loudness estimation
Log Attack Time log_attack_time() MPEG-7 style attack time (10%–90% rise)
Decay Time decay_time() Time for envelope to decay from peak
Temporal Centroid temporal_centroid() Center of gravity of the amplitude envelope
Amplitude Modulation amplitude_modulation_depth() Depth of amplitude modulation over a sliding window
Entropy of Energy entropy_of_energy() Abrupt changes in energy within a frame
Teager Energy teager_energy_operator() Teager-Kaiser energy for amplitude/frequency tracking
Breath Group Duration breath_group_duration() Estimated duration of breath groups
Speech Rate speech_rate() Syllables per second estimation
Tristimulus tristimulus() T1/T2/T3 timbre ratios from harmonic amplitudes

Spectral Features

Feature Function Description
Spectral Centroid spectral_centroid() Center of mass of the spectrum
Spectral Rolloff spectral_rolloff() Frequency below which X% of energy is concentrated
Spectral Flux spectral_flux() Rate of change of the power spectrum
Spectral Flatness spectral_flatness() How noise-like a sound is (Wiener entropy)
Spectral Entropy spectral_entropy() Randomness of the spectral distribution
Spectral Bandwidth spectral_bandwidth() Spread of the spectrum around the centroid
Spectral Spread spectral_spread() Standard deviation of the spectral distribution
Spectral Slope spectral_slope() Linear regression slope fitted to the spectrum
Spectral Skewness spectral_skewness() Asymmetry of the spectral distribution
Spectral Crest Factor spectral_crest_factor() Peak-to-average ratio (peakiness)
Spectral Contrast spectral_contrast() Peak-valley amplitude difference across sub-bands
Spectral Deviation spectral_deviation() Jaggedness of the spectral envelope
Spectral Sharpness spectral_sharpness() Perceived sharpness (Zwicker model)
Spectral Roughness spectral_roughness() Sensory dissonance measure
Spectral Tonality spectral_tonality() Tonal vs. noise-like character
Spectral Irregularity spectral_irregularity() Irregularity of the spectral envelope
Low-High Energy Ratio low_high_energy_ratio() Energy below 1 kHz vs. above 3 kHz
HNR harmonic_to_noise_ratio() Harmonic-to-noise ratio
Harmonic Richness harmonic_richness_factor() Richness of harmonic content
Inharmonicity inharmonicity_index() Inharmonicity of the spectrum
Phase Coherence phase_coherence() Phase coherence across frequency bins
Sibilant Peak sibilant_spectral_peak_frequency() Peak frequency in the sibilant region

Spectrograms & Transforms

Feature Function Description
Linear Spectrogram linear_spectrogram() STFT magnitude spectrogram
Mel Spectrogram mel_spectrogram() Mel-scaled frequency spectrogram
Log Mel Spectrogram log_mel_spectrogram() Log-scaled Mel spectrogram
CQT Spectrogram cqt_spectrogram() Constant-Q transform (log-frequency bins)
MFCCs mfcc() Mel-Frequency Cepstral Coefficients
Chroma chroma() 12-bin pitch class intensity (chromagram)
Tonnetz tonnetz() 6D tonal centroid features

Formant Analysis

Feature Function Description
Formant Frequencies formant_frequencies() Extract F1, F2, F3, ... via Burg LPC
Formant Contours formant_contours() Time-varying formant trajectories
Formant Bandwidths formant_bandwidths() Bandwidth of each formant
Formant Dispersion formant_dispersion() Average spacing between formants

Linear Prediction

Feature Function Description
LPC lpc_coefficients() Linear Prediction Coefficients (Burg method)
LSP lsp_coefficients() Line Spectral Pairs from LPC

Cepstral Features

Feature Function Description
LPCC lpcc() Linear Predictive Cepstral Coefficients
GTCC gtcc() Gammatone Cepstral Coefficients
GFCC gfcc() Gammatone Frequency Cepstral Coefficients
ERB Cepstral erb_cepstral_coefficients() ERB-scale cepstral coefficients
Delta delta() First-order derivative of a feature contour
Delta-Delta delta_delta() Second-order derivative (acceleration)

Pitch Features

Feature Function Description
F0 (Autocorrelation) fundamental_frequency_autocorr() F0 via autocorrelation
F0 (YIN) fundamental_frequency_yin() F0 via YIN algorithm
F0 (pYIN) fundamental_frequency_pyin() Probabilistic YIN (requires librosa)
F0 (Praat) fundamental_frequency_praat() Exact Praat parity (requires parselmouth)
Pitch Strength pitch_strength() Strength of periodicity
Semitone Std Dev semitone_sd() F0 variation in semitones

Voice Quality Features

Feature Function Description
Jitter jitter() Cycle-to-cycle F0 variation
Jitter (local) jitter_local() Average absolute period difference (%)
Jitter (PPQ5) jitter_ppq5() Five-point Period Perturbation Quotient
Jitter (DDP) jitter_ddp() Difference of Differences of Periods
Shimmer shimmer() Cycle-to-cycle amplitude variation
Shimmer (local) shimmer_local() Local shimmer (%)
Shimmer (dB) shimmer_local_db() Shimmer in decibels
Shimmer (APQ3) shimmer_apq3() Three-point Amplitude Perturbation Quotient
Shimmer (DDA) shimmer_dda() Difference of Differences of Amplitudes
CPP cepstral_peak_prominence() Cepstral Peak Prominence for dysphonia detection
Alpha Ratio alpha_ratio() Energy ratio: 50–1000 Hz vs 1–5 kHz
Hammarberg Index hammarberg_index() Max energy ratio: 0–2 kHz vs 2–5 kHz
Harmonic Differences harmonic_differences() H1-H2, H1-A3, and other harmonic ratios
SHR subharmonic_to_harmonic_ratio() Subharmonic-to-harmonic power ratio
NAQ normalized_amplitude_quotient() Normalized Amplitude Quotient
Closed Quotient closed_quotient() Closed phase ratio from EGG
Soft Phonation Index soft_phonation_index() Low/high band energy ratio
GNE glottal_to_noise_excitation() Glottal-to-Noise Excitation ratio
MFDR maximum_flow_declination_rate() Maximum Flow Declination Rate
Vocal Fry Index vocal_fry_index() Ratio of fry frames to voiced frames
VOT voice_onset_time() Voice Onset Time estimation
Vocal Tract Length vocal_tract_length() Estimated from F1 and F2
Nasality Index nasality_index() Nasal vs. oral microphone energy

Rhythm Features

Feature Function Description
Tempo tempo() BPM estimation from onset autocorrelation
Beat Tracking beat_track() Beat positions in the audio signal
Onset Detection onset_detect() Transient event detection

Statistical Functionals

Apply to any time-series feature via compute_functionals(): mean, standard deviation, min, max, skewness, kurtosis.

Architecture

audiofeat
├── temporal/      # RMS, ZCR, energy, attack, loudness, rhythm, ...
├── spectral/      # Centroid, rolloff, flux, MFCCs, chroma, formants, ...
├── cepstral/      # LPCC, GTCC, ERB cepstral, deltas
├── pitch/         # Autocorrelation, YIN, pYIN, Praat backends
├── voice/         # Jitter, shimmer, CPP, harmonic ratios, glottal flow
├── rhythm/        # Beat detection
├── stats/         # Statistical functionals
├── io/            # Audio loading, single-file & batch extraction, CSV export
├── validation/    # Praat comparison, gold-standard scorecard
├── standards/     # openSMILE eGeMAPS/ComParE wrappers
└── catalog/       # Auto-discovered feature catalog

How it works: Each feature function checks if librosa is available. If so, it delegates to librosa's implementation for bit-exact parity with the research standard. If librosa is not installed, a pure-PyTorch fallback computes the same feature. Either way, you always get a torch.Tensor back.

Installation

Python >=3.8 is required. We recommend creating a virtual environment first.

pip (from PyPI)

pip install audiofeat

From source

git clone https://github.com/ankitshah009/audiofeat.git
cd audiofeat
pip install -e .

With a virtual environment

# Option A: venv
python -m venv .venv
source .venv/bin/activate
pip install audiofeat

# Option B: conda
conda create -n audiofeat python=3.11 -y
conda activate audiofeat
pip install audiofeat

# Option C: uv
uv venv && source .venv/bin/activate
uv pip install audiofeat

Optional extras

Extra What it adds Install command
dev pytest, black, mypy, flake8 pip install "audiofeat[dev]"
examples matplotlib, librosa, soundfile pip install "audiofeat[examples]"
validation Praat/parselmouth backend pip install "audiofeat[validation]"
standards openSMILE eGeMAPS/ComParE pip install "audiofeat[standards]"
models ASR, diarization, VAD, denoising pip install "audiofeat[models]"
full examples + validation + standards pip install "audiofeat[full]"

Quick Start

Extract features from a file

The simplest way to get started. This extracts all core features and returns a flat dictionary of summary statistics:

from audiofeat.io.features import extract_features_from_file

features = extract_features_from_file("path/to/audio.wav")

# What you get back:
print(features["f0_mean_hz"])        # Mean fundamental frequency
print(features["rms_mean"])          # Mean RMS energy
print(features["spectral_centroid_mean"])  # Mean spectral centroid
print(features["mfcc_0_mean"])       # Mean of first MFCC coefficient

Compute individual features

For fine-grained control, call feature functions directly. Every function accepts a 1D torch.Tensor waveform:

import torch
import audiofeat

# Load your audio (or use a test signal)
sr = 22050
waveform = torch.randn(sr * 3)  # 3 seconds of noise

# Temporal features
rms = audiofeat.rms(waveform, frame_length=2048, hop_length=512)
zcr = audiofeat.zero_crossing_rate(waveform, frame_length=2048, hop_length=512)

# Spectral features
centroid = audiofeat.spectral_centroid(waveform, frame_length=2048, hop_length=512, sample_rate=sr)
rolloff = audiofeat.spectral_rolloff(waveform, frame_length=2048, hop_length=512, sample_rate=sr)
contrast = audiofeat.spectral_contrast(waveform, sample_rate=sr)

# Cepstral features
mfccs = audiofeat.mfcc(waveform, sr)
chroma = audiofeat.chroma(waveform, sr)

# Pitch
f0 = audiofeat.fundamental_frequency_yin(waveform, fs=sr, frame_length=2048, hop_length=512)

# Voice quality
jit = audiofeat.jitter(waveform, fs=sr)
shim = audiofeat.shimmer(waveform, fs=sr)

# Every result is a torch.Tensor
print(f"RMS shape: {rms.shape}")
print(f"MFCCs shape: {mfccs.shape}")
print(f"F0 shape: {f0.shape}")

Load a real audio file

from audiofeat.io import load_audio

waveform, sr = load_audio("path/to/audio.wav", target_sr=16000)
# waveform is a 1D torch.Tensor, sr is an int

Aggregate over time with statistical functionals

from audiofeat import compute_functionals

rms = audiofeat.rms(waveform, frame_length=2048, hop_length=512)
stats = compute_functionals(rms)
# {'mean': tensor(...), 'std': tensor(...), 'min': tensor(...),
#  'max': tensor(...), 'skewness': tensor(...), 'kurtosis': tensor(...)}

Batch extraction to CSV

from audiofeat.io.features import extract_features_for_directory

extract_features_for_directory("audio_folder/", "output.csv")

CLI

audiofeat includes a command-line interface for common workflows.

audiofeat --help

Extract features from a single file

audiofeat extract recording.wav --output features.json

Batch extract an entire directory

audiofeat batch-extract audio_folder/ output.csv

Diagnose your environment

audiofeat doctor --audio-dir examples

Checks installed dependencies, verifies audio files are valid, and reports any issues.

Browse available features

audiofeat list-features
audiofeat list-features --format markdown --output FEATURES.md

Advanced Topics

For Praat validation, openSMILE integration, the gold-standard scorecard, and troubleshooting, see docs/VALIDATION.md.

For the auto-generated feature catalog, see docs/FEATURE_CATALOG.md.

Testing

pytest -q

With coverage:

pytest --cov=audiofeat --cov-report=term-missing -q

Contributing

We welcome contributions! If you have new features, bug fixes, or improvements, please open a pull request on GitHub.

Citation

If you use audiofeat in your research, please cite:

@phdthesis{shah2024computational,
  title={Computational Audition with Imprecise Labels},
  author={Shah, Ankit Parag},
  year={2024},
  school={Carnegie Mellon University Pittsburgh, PA}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

audiofeat-1.1.1.tar.gz (103.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

audiofeat-1.1.1-py3-none-any.whl (114.2 kB view details)

Uploaded Python 3

File details

Details for the file audiofeat-1.1.1.tar.gz.

File metadata

  • Download URL: audiofeat-1.1.1.tar.gz
  • Upload date:
  • Size: 103.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for audiofeat-1.1.1.tar.gz
Algorithm Hash digest
SHA256 3d93056e42be72ac809c81eb4f8fae9e1f0a1768093df28ee3553a9044a214e7
MD5 c9e7e4d8171c7aa8f90ddc776e7fdb2e
BLAKE2b-256 21ca25b58c8c312c4399ba7f1b80e39a4fab4ce9cc5a28d31b311dfeebbacd85

See more details on using hashes here.

File details

Details for the file audiofeat-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: audiofeat-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 114.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for audiofeat-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1645386f2419b59f62e57c827b0441b2ba5d7fcce08e279ddbe04cfb54f6b70b
MD5 b661ecf7c0d5cca2223bb151ee245fe6
BLAKE2b-256 bdd9faa73b602930fc26c47ce52d311f015fbd8687d7406afeda38898c5b0c88

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page