A comprehensive PyTorch-based audio feature extraction library for machine learning, research, and audio analysis

These details have not been verified by PyPI

Project links

Project description

audiofeat

A comprehensive PyTorch-based audio feature extraction library for speech research, music analysis, and audio ML pipelines. Extract 140+ features across temporal, spectral, cepstral, pitch, voice-quality, and rhythm domains — from a single pip install.

import audiofeat

features = audiofeat.extract_features_from_file("recording.wav")
print(features["f0_mean_hz"], features["rms_mean"], features["mfcc_0_mean"])

Why audiofeat?

One library, all features. Temporal, spectral, cepstral, pitch, voice quality, rhythm, formants, and tonal features in a single package.
PyTorch-first. Every feature returns a torch.Tensor. Plug directly into your training loop — no numpy-to-tensor conversion needed.
Librosa-grade accuracy. Primary paths delegate to librosa when available for bit-exact parity; pure-PyTorch fallbacks when it's not installed.
Beginner to production. Use individual functions for exploration, or the built-in CLI and batch extraction for production pipelines.
Validated. Built-in Praat comparison tooling and a gold-standard scorecard for reproducible research.

Features

Temporal Features

Feature	Function	Description
RMS	`rms()`	Root-mean-square amplitude per frame
Short-Time Energy	`short_time_energy()`	Sum of squared signal values in each frame
Zero-Crossing Rate	`zero_crossing_rate()`	Rate at which the signal changes sign
Zero-Crossing Count	`zero_crossing_count()`	Number of zero-crossings per frame
Loudness	`loudness()`	Perceptual loudness estimation
Log Attack Time	`log_attack_time()`	MPEG-7 style attack time (10%–90% rise)
Decay Time	`decay_time()`	Time for envelope to decay from peak
Temporal Centroid	`temporal_centroid()`	Center of gravity of the amplitude envelope
Amplitude Modulation	`amplitude_modulation_depth()`	Depth of amplitude modulation over a sliding window
Entropy of Energy	`entropy_of_energy()`	Abrupt changes in energy within a frame
Teager Energy	`teager_energy_operator()`	Teager-Kaiser energy for amplitude/frequency tracking
Breath Group Duration	`breath_group_duration()`	Estimated duration of breath groups
Speech Rate	`speech_rate()`	Syllables per second estimation
Tristimulus	`tristimulus()`	T1/T2/T3 timbre ratios from harmonic amplitudes

Spectral Features

Feature	Function	Description
Spectral Centroid	`spectral_centroid()`	Center of mass of the spectrum
Spectral Rolloff	`spectral_rolloff()`	Frequency below which X% of energy is concentrated
Spectral Flux	`spectral_flux()`	Rate of change of the power spectrum
Spectral Flatness	`spectral_flatness()`	How noise-like a sound is (Wiener entropy)
Spectral Entropy	`spectral_entropy()`	Randomness of the spectral distribution
Spectral Bandwidth	`spectral_bandwidth()`	Spread of the spectrum around the centroid
Spectral Spread	`spectral_spread()`	Standard deviation of the spectral distribution
Spectral Slope	`spectral_slope()`	Linear regression slope fitted to the spectrum
Spectral Skewness	`spectral_skewness()`	Asymmetry of the spectral distribution
Spectral Crest Factor	`spectral_crest_factor()`	Peak-to-average ratio (peakiness)
Spectral Contrast	`spectral_contrast()`	Peak-valley amplitude difference across sub-bands
Spectral Deviation	`spectral_deviation()`	Jaggedness of the spectral envelope
Spectral Sharpness	`spectral_sharpness()`	Perceived sharpness (Zwicker model)
Spectral Roughness	`spectral_roughness()`	Sensory dissonance measure
Spectral Tonality	`spectral_tonality()`	Tonal vs. noise-like character
Spectral Irregularity	`spectral_irregularity()`	Irregularity of the spectral envelope
Low-High Energy Ratio	`low_high_energy_ratio()`	Energy below 1 kHz vs. above 3 kHz
HNR	`harmonic_to_noise_ratio()`	Harmonic-to-noise ratio
Harmonic Richness	`harmonic_richness_factor()`	Richness of harmonic content
Inharmonicity	`inharmonicity_index()`	Inharmonicity of the spectrum
Phase Coherence	`phase_coherence()`	Phase coherence across frequency bins
Sibilant Peak	`sibilant_spectral_peak_frequency()`	Peak frequency in the sibilant region

Spectrograms & Transforms

Feature	Function	Description
Linear Spectrogram	`linear_spectrogram()`	STFT magnitude spectrogram
Mel Spectrogram	`mel_spectrogram()`	Mel-scaled frequency spectrogram
Log Mel Spectrogram	`log_mel_spectrogram()`	Log-scaled Mel spectrogram
CQT Spectrogram	`cqt_spectrogram()`	Constant-Q transform (log-frequency bins)
MFCCs	`mfcc()`	Mel-Frequency Cepstral Coefficients
Chroma	`chroma()`	12-bin pitch class intensity (chromagram)
Tonnetz	`tonnetz()`	6D tonal centroid features

Formant Analysis

Feature	Function	Description
Formant Frequencies	`formant_frequencies()`	Extract F1, F2, F3, ... via Burg LPC
Formant Contours	`formant_contours()`	Time-varying formant trajectories
Formant Bandwidths	`formant_bandwidths()`	Bandwidth of each formant
Formant Dispersion	`formant_dispersion()`	Average spacing between formants

Linear Prediction

Feature	Function	Description
LPC	`lpc_coefficients()`	Linear Prediction Coefficients (Burg method)
LSP	`lsp_coefficients()`	Line Spectral Pairs from LPC

Cepstral Features

Feature	Function	Description
LPCC	`lpcc()`	Linear Predictive Cepstral Coefficients
GTCC	`gtcc()`	Gammatone Cepstral Coefficients
GFCC	`gfcc()`	Gammatone Frequency Cepstral Coefficients
ERB Cepstral	`erb_cepstral_coefficients()`	ERB-scale cepstral coefficients
Delta	`delta()`	First-order derivative of a feature contour
Delta-Delta	`delta_delta()`	Second-order derivative (acceleration)

Pitch Features

Feature	Function	Description
F0 (Autocorrelation)	`fundamental_frequency_autocorr()`	F0 via autocorrelation
F0 (YIN)	`fundamental_frequency_yin()`	F0 via YIN algorithm
F0 (pYIN)	`fundamental_frequency_pyin()`	Probabilistic YIN (requires librosa)
F0 (Praat)	`fundamental_frequency_praat()`	Exact Praat parity (requires parselmouth)
Pitch Strength	`pitch_strength()`	Strength of periodicity
Semitone Std Dev	`semitone_sd()`	F0 variation in semitones

Voice Quality Features

Feature	Function	Description
Jitter	`jitter()`	Cycle-to-cycle F0 variation
Jitter (local)	`jitter_local()`	Average absolute period difference (%)
Jitter (PPQ5)	`jitter_ppq5()`	Five-point Period Perturbation Quotient
Jitter (DDP)	`jitter_ddp()`	Difference of Differences of Periods
Shimmer	`shimmer()`	Cycle-to-cycle amplitude variation
Shimmer (local)	`shimmer_local()`	Local shimmer (%)
Shimmer (dB)	`shimmer_local_db()`	Shimmer in decibels
Shimmer (APQ3)	`shimmer_apq3()`	Three-point Amplitude Perturbation Quotient
Shimmer (DDA)	`shimmer_dda()`	Difference of Differences of Amplitudes
CPP	`cepstral_peak_prominence()`	Cepstral Peak Prominence for dysphonia detection
Alpha Ratio	`alpha_ratio()`	Energy ratio: 50–1000 Hz vs 1–5 kHz
Hammarberg Index	`hammarberg_index()`	Max energy ratio: 0–2 kHz vs 2–5 kHz
Harmonic Differences	`harmonic_differences()`	H1-H2, H1-A3, and other harmonic ratios
SHR	`subharmonic_to_harmonic_ratio()`	Subharmonic-to-harmonic power ratio
NAQ	`normalized_amplitude_quotient()`	Normalized Amplitude Quotient
Closed Quotient	`closed_quotient()`	Closed phase ratio from EGG
Soft Phonation Index	`soft_phonation_index()`	Low/high band energy ratio
GNE	`glottal_to_noise_excitation()`	Glottal-to-Noise Excitation ratio
MFDR	`maximum_flow_declination_rate()`	Maximum Flow Declination Rate
Vocal Fry Index	`vocal_fry_index()`	Ratio of fry frames to voiced frames
VOT	`voice_onset_time()`	Voice Onset Time estimation
Vocal Tract Length	`vocal_tract_length()`	Estimated from F1 and F2
Nasality Index	`nasality_index()`	Nasal vs. oral microphone energy

Rhythm Features

Feature	Function	Description
Tempo	`tempo()`	BPM estimation from onset autocorrelation
Beat Tracking	`beat_track()`	Beat positions in the audio signal
Onset Detection	`onset_detect()`	Transient event detection

Statistical Functionals

Apply to any time-series feature via compute_functionals(): mean, standard deviation, min, max, skewness, kurtosis.

Architecture

audiofeat
├── temporal/      # RMS, ZCR, energy, attack, loudness, rhythm, ...
├── spectral/      # Centroid, rolloff, flux, MFCCs, chroma, formants, ...
├── cepstral/      # LPCC, GTCC, ERB cepstral, deltas
├── pitch/         # Autocorrelation, YIN, pYIN, Praat backends
├── voice/         # Jitter, shimmer, CPP, harmonic ratios, glottal flow
├── rhythm/        # Beat detection
├── stats/         # Statistical functionals
├── io/            # Audio loading, single-file & batch extraction, CSV export
├── validation/    # Praat comparison, gold-standard scorecard
├── standards/     # openSMILE eGeMAPS/ComParE wrappers
└── catalog/       # Auto-discovered feature catalog

How it works: Each feature function checks if librosa is available. If so, it delegates to librosa's implementation for bit-exact parity with the research standard. If librosa is not installed, a pure-PyTorch fallback computes the same feature. Either way, you always get a torch.Tensor back.

Installation

Python >=3.8 is required. We recommend creating a virtual environment first.

pip (from PyPI)

pip install audiofeat

From source

git clone https://github.com/ankitshah009/audiofeat.git
cd audiofeat
pip install -e .

With a virtual environment

# Option A: venv
python -m venv .venv
source .venv/bin/activate
pip install audiofeat

# Option B: conda
conda create -n audiofeat python=3.11 -y
conda activate audiofeat
pip install audiofeat

# Option C: uv
uv venv && source .venv/bin/activate
uv pip install audiofeat

Optional extras

Extra	What it adds	Install command
`dev`	pytest, black, mypy, flake8	`pip install "audiofeat[dev]"`
`examples`	matplotlib, librosa, soundfile	`pip install "audiofeat[examples]"`
`validation`	Praat/parselmouth backend	`pip install "audiofeat[validation]"`
`standards`	openSMILE eGeMAPS/ComParE	`pip install "audiofeat[standards]"`
`models`	ASR, diarization, VAD, denoising	`pip install "audiofeat[models]"`
`full`	examples + validation + standards	`pip install "audiofeat[full]"`

Quick Start

Extract features from a file

The simplest way to get started. This extracts all core features and returns a flat dictionary of summary statistics:

from audiofeat.io.features import extract_features_from_file

features = extract_features_from_file("path/to/audio.wav")

# What you get back:
print(features["f0_mean_hz"])        # Mean fundamental frequency
print(features["rms_mean"])          # Mean RMS energy
print(features["spectral_centroid_mean"])  # Mean spectral centroid
print(features["mfcc_0_mean"])       # Mean of first MFCC coefficient

Compute individual features

For fine-grained control, call feature functions directly. Every function accepts a 1D torch.Tensor waveform:

import torch
import audiofeat

# Load your audio (or use a test signal)
sr = 22050
waveform = torch.randn(sr * 3)  # 3 seconds of noise

# Temporal features
rms = audiofeat.rms(waveform, frame_length=2048, hop_length=512)
zcr = audiofeat.zero_crossing_rate(waveform, frame_length=2048, hop_length=512)

# Spectral features
centroid = audiofeat.spectral_centroid(waveform, frame_length=2048, hop_length=512, sample_rate=sr)
rolloff = audiofeat.spectral_rolloff(waveform, frame_length=2048, hop_length=512, sample_rate=sr)
contrast = audiofeat.spectral_contrast(waveform, sample_rate=sr)

# Cepstral features
mfccs = audiofeat.mfcc(waveform, sr)
chroma = audiofeat.chroma(waveform, sr)

# Pitch
f0 = audiofeat.fundamental_frequency_yin(waveform, fs=sr, frame_length=2048, hop_length=512)

# Voice quality
jit = audiofeat.jitter(waveform, fs=sr)
shim = audiofeat.shimmer(waveform, fs=sr)

# Every result is a torch.Tensor
print(f"RMS shape: {rms.shape}")
print(f"MFCCs shape: {mfccs.shape}")
print(f"F0 shape: {f0.shape}")

Load a real audio file

from audiofeat.io import load_audio

waveform, sr = load_audio("path/to/audio.wav", target_sr=16000)
# waveform is a 1D torch.Tensor, sr is an int

Aggregate over time with statistical functionals

from audiofeat import compute_functionals

rms = audiofeat.rms(waveform, frame_length=2048, hop_length=512)
stats = compute_functionals(rms)
# {'mean': tensor(...), 'std': tensor(...), 'min': tensor(...),
#  'max': tensor(...), 'skewness': tensor(...), 'kurtosis': tensor(...)}

Batch extraction to CSV

from audiofeat.io.features import extract_features_for_directory

extract_features_for_directory("audio_folder/", "output.csv")

CLI

audiofeat includes a command-line interface for common workflows.

audiofeat --help

Extract features from a single file

audiofeat extract recording.wav --output features.json

Batch extract an entire directory

audiofeat batch-extract audio_folder/ output.csv

Diagnose your environment

audiofeat doctor --audio-dir examples

Checks installed dependencies, verifies audio files are valid, and reports any issues.

Browse available features

audiofeat list-features
audiofeat list-features --format markdown --output FEATURES.md

Advanced Topics

For Praat validation, openSMILE integration, the gold-standard scorecard, and troubleshooting, see docs/VALIDATION.md.

For the auto-generated feature catalog, see docs/FEATURE_CATALOG.md.

Testing

pytest -q

With coverage:

pytest --cov=audiofeat --cov-report=term-missing -q

Contributing

We welcome contributions! If you have new features, bug fixes, or improvements, please open a pull request on GitHub.

Citation

If you use audiofeat in your research, please cite:

@phdthesis{shah2024computational,
  title={Computational Audition with Imprecise Labels},
  author={Shah, Ankit Parag},
  year={2024},
  school={Carnegie Mellon University Pittsburgh, PA}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.1.1

Feb 12, 2026

1.0.0

Aug 4, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

audiofeat-1.1.1.tar.gz (103.7 kB view details)

Uploaded Feb 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

audiofeat-1.1.1-py3-none-any.whl (114.2 kB view details)

Uploaded Feb 12, 2026 Python 3

File details

Details for the file audiofeat-1.1.1.tar.gz.

File metadata

Download URL: audiofeat-1.1.1.tar.gz
Upload date: Feb 12, 2026
Size: 103.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for audiofeat-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3d93056e42be72ac809c81eb4f8fae9e1f0a1768093df28ee3553a9044a214e7`
MD5	`c9e7e4d8171c7aa8f90ddc776e7fdb2e`
BLAKE2b-256	`21ca25b58c8c312c4399ba7f1b80e39a4fab4ce9cc5a28d31b311dfeebbacd85`

See more details on using hashes here.

File details

Details for the file audiofeat-1.1.1-py3-none-any.whl.

File metadata

Download URL: audiofeat-1.1.1-py3-none-any.whl
Upload date: Feb 12, 2026
Size: 114.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for audiofeat-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1645386f2419b59f62e57c827b0441b2ba5d7fcce08e279ddbe04cfb54f6b70b`
MD5	`b661ecf7c0d5cca2223bb151ee245fe6`
BLAKE2b-256	`bdd9faa73b602930fc26c47ce52d311f015fbd8687d7406afeda38898c5b0c88`

See more details on using hashes here.

audiofeat 1.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

audiofeat

Why audiofeat?

Features

Temporal Features

Spectral Features

Spectrograms & Transforms

Formant Analysis

Linear Prediction

Cepstral Features

Pitch Features

Voice Quality Features

Rhythm Features

Statistical Functionals

Architecture

Installation

pip (from PyPI)

From source

With a virtual environment

Optional extras

Quick Start

Extract features from a file

Compute individual features

Load a real audio file

Aggregate over time with statistical functionals

Batch extraction to CSV

CLI

Extract features from a single file

Batch extract an entire directory

Diagnose your environment

Browse available features

Advanced Topics

Testing

Contributing

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes