High-performance audio analysis and music information retrieval in Rust
sonara
High-performance audio analysis library for Python, written in Rust.
It provides audio feature extraction, batch analysis, and built-in perceptual features for playlist generation.
sonara — from Latin sonare, "to sound, to resonate"
Installation
pip install sonara
Requires Python 3.9+. Pre-built wheels available for Linux, macOS (Intel & Apple Silicon), and Windows.
Build from source:
git clone https://github.com/kkollsga/sonara.git
cd sonara
pip install maturin
maturin develop --release
Quick Start
import sonara
import numpy as np
# Load audio
y, sr = sonara.load("track.mp3", sr=22050)
# STFT
D = sonara.stft(y)
S_db = sonara.amplitude_to_db(np.abs(D))
# Mel spectrogram + MFCC
mel = sonara.melspectrogram(y=y, sr=22050.0)
mfcc = sonara.mfcc(y=y, sr=22050.0, n_mfcc=13)
# Beat tracking
tempo, beats = sonara.beat_track(y=y, sr=22050)
# Chroma & HPCP
chroma = sonara.chroma_stft(y=y, sr=22050.0)
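# HPCP needs a power spectrogram and the matching FFT bin frequencies
power_spec = np.abs(D) ** 2
n_fft = 2 * (D.shape[0] - 1)  # recover the FFT size from the number of bins
freqs = sonara.fft_frequencies(sr=22050.0, n_fft=n_fft)
hpcp = sonara.hpcp(power_spec, freqs)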
# Pitch estimation
f0, voiced, prob = sonara.pyin(y, fmin=65.0, fmax=2093.0, sr=22050)
Analysis Pipeline
sonara includes a fused analysis pipeline that extracts all features in a single optimized pass. Three modes control the depth of analysis:
Modes
| Mode | Features | Time (10s track) | Use case |
|---|---|---|---|
| compact | 11 core features | ~1.2 ms | Fast scanning, metadata |
| playlist | 30+ features incl. tonal & perceptual | ~4 ms | Playlist generation, music discovery |
| full | All features incl. time signature | ~50 ms | Research, comprehensive analysis |
Compact mode (default)
Core signal features, always computed:
r = sonara.analyze_file("track.mp3", mode="compact")
r['bpm'] # Tempo (BPM)
r['beats'] # Beat frame positions
r['onset_frames'] # Onset positions
r['onset_density'] # Onsets per second
r['rms_mean'] # Average loudness (RMS)
r['rms_max'] # Peak loudness (RMS)
r['loudness_lufs'] # Integrated loudness (LUFS, ITU-R BS.1770-4)
r['dynamic_range_db'] # Loudness range (p95 - p5, dB)
r['spectral_centroid_mean'] # Brightness (Hz)
r['zero_crossing_rate'] # Percussiveness proxy
r['duration_sec'] # Track length
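The beat and onset positions above are frame indices, not timestamps. The following is a minimal sketch of the conversion to seconds, assuming a hop length of 512 samples at 22050 Hz (the values used elsewhere in this README; whether the pipeline uses them internally is an assumption). The helper names are hypothetical; sonara lists its own `frames_to_time` under Conversions.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to timestamps in seconds.

    sr and hop_length are assumed defaults, not confirmed pipeline settings.
    """
    return [f * hop_length / sr for f in frames]


def onsets_per_second(onset_frames, duration_sec):
    """Onset density: number of detected onsets divided by track length."""
    if duration_sec <= 0:
        return 0.0
    return len(onset_frames) / duration_sec


# Made-up frame indices, for illustration only.
beat_frames = [0, 43, 86]
beat_times = frames_to_seconds(beat_frames)
```

With these assumed settings one frame spans 512 / 22050 s, roughly 23 ms, which sets the time resolution of beat and onset positions.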
Playlist mode
Everything for playlist generation: spectral features, MFCCs (timbre fingerprint), chroma (harmony), tonal analysis (chords, dissonance), plus perceptual features:
r = sonara.analyze_file("track.mp3", mode="playlist")
# Perceptual features (0.0 - 1.0)
r['energy'] # Perceived intensity (loudness + brightness + activity)
r['danceability'] # Beat regularity + tempo sweet spot + rhythm
r['valence'] # Mood (0 = sad/dark, 1 = happy/bright)
r['acousticness'] # Acoustic vs electronic character
# Musical key
r['key'] # e.g. "C major", "A minor"
r['key_confidence'] # How confident the key detection is (0.0 - 1.0)
# Tonal analysis
r['chord_sequence'] # Beat-synchronous chord labels, e.g. ["Am", "F", "C", "G"]
r['predominant_chord'] # Most frequent chord
r['chord_change_rate'] # Chord changes per second (harmonic complexity)
r['dissonance'] # Sensory dissonance (0 = consonant, 1 = rough)
# Spectral features
r['spectral_bandwidth_mean'] # Frequency spread
r['spectral_rolloff_mean'] # Frequency below which 85% of energy sits
r['spectral_flatness_mean'] # Tonal (0) vs noise-like (1)
r['spectral_contrast_mean'] # Peak-valley ratio per band (7 values)
r['mfcc_mean'] # Timbre fingerprint (13 coefficients)
r['chroma_mean'] # Pitch class distribution (12 values)
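One way the summary vectors above might feed playlist generation is a nearest-neighbour ranking. The sketch below uses cosine similarity on `chroma_mean`; the helper names and the toy three-element vectors are made up for illustration (real `chroma_mean` vectors have 12 values), and nothing here is part of sonara's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(seed, candidates, key="chroma_mean"):
    """Sort candidate results from most to least similar to the seed."""
    return sorted(
        candidates,
        key=lambda r: cosine_similarity(seed[key], r[key]),
        reverse=True,
    )


# Toy data standing in for analysis results.
seed = {"name": "seed", "chroma_mean": [1.0, 0.0, 0.0]}
tracks = [
    {"name": "far", "chroma_mean": [0.0, 1.0, 0.0]},
    {"name": "close", "chroma_mean": [0.9, 0.1, 0.0]},
]
ranked = rank_by_similarity(seed, tracks)
```

Passing `key="mfcc_mean"` ranks by timbre instead of harmony; combining several features would need a weighting scheme, which is left open here.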
Full mode
Adds expensive rhythm analysis features on top of playlist mode:
r = sonara.analyze_file("track.mp3", mode="full")
r['tempo_curve'] # Per-beat BPM values
r['tempo_variability'] # Coefficient of variation of tempo
r['time_signature'] # e.g. "4/4", "3/4"
r['time_signature_confidence'] # Detection confidence
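The coefficient of variation behind `tempo_variability` is the standard deviation divided by the mean. A minimal sketch follows, assuming the population standard deviation; whether sonara uses the population or sample form is not stated here, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if undefined)."""
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


# Made-up per-beat BPM values for illustration.
steady = coefficient_of_variation([120.0, 120.0, 120.0, 120.0])
drifting = coefficient_of_variation([100.0, 140.0])
```

A perfectly steady tempo curve yields 0; larger values indicate more tempo drift relative to the average tempo.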
Custom feature selection
Cherry-pick specific features regardless of mode:
r = sonara.analyze_file("track.mp3", features=["bpm", "energy", "key", "chords"])
Valid feature names: bpm, beats, onsets, rms, dynamic_range, centroid, zcr, onset_density, bandwidth, rolloff, flatness, contrast, mfcc, chroma, chords, dissonance, energy, danceability, key, valence, acousticness, tempo_curve, time_signature
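Since feature names are plain strings, a typo is easy to make. The sketch below checks a request against the list above before any file is analyzed. The `check_features` helper is hypothetical, and how sonara itself reacts to an unknown name is not documented here.

```python
# Names copied from the list of valid feature names above.
VALID_FEATURES = frozenset({
    "bpm", "beats", "onsets", "rms", "dynamic_range", "centroid", "zcr",
    "onset_density", "bandwidth", "rolloff", "flatness", "contrast", "mfcc",
    "chroma", "chords", "dissonance", "energy", "danceability", "key",
    "valence", "acousticness", "tempo_curve", "time_signature",
})


def check_features(requested):
    """Return the requested names unchanged, or raise on unknown names."""
    unknown = sorted(set(requested) - VALID_FEATURES)
    if unknown:
        raise ValueError(f"Unknown feature name(s): {', '.join(unknown)}")
    return list(requested)


features = check_features(["bpm", "energy", "key", "chords"])
```

The validated list can then be passed as the `features=` argument shown above.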
Batch analysis
Analyze entire music libraries in parallel using all CPU cores:
import sonara
from pathlib import Path
files = [str(p) for p in Path("~/Music").expanduser().rglob("*.mp3")]
results = sonara.analyze_batch(files, mode="playlist")
for r in results:
    print(f"{r['bpm']:5.0f} BPM | {r['energy']:.2f} energy | "
          f"{r['key']:>10} | {r['predominant_chord']:>4} | "
          f"{r['dissonance']:.3f} diss | {r['valence']:.2f} valence")
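Batch results are often easier to work with once written to a table. The sketch below serializes a few scalar fields to CSV using only the standard library. It assumes each result behaves like a dict (supports `.get`), which goes slightly beyond the `r['bpm']` indexing shown above; the `results_to_csv` helper and the sample row are made up for illustration.

```python
import csv
import io

FIELDS = ["bpm", "energy", "key", "valence"]


def results_to_csv(results, fields=FIELDS):
    """Serialize selected scalar fields of each result to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for r in results:
        # Missing fields become empty cells; other keys are dropped.
        writer.writerow({f: r.get(f, "") for f in fields})
    return buf.getvalue()


# Made-up result standing in for sonara.analyze_batch output.
sample = [{"bpm": 120.0, "energy": 0.8, "key": "C major",
           "valence": 0.6, "dissonance": 0.1}]
csv_text = results_to_csv(sample)
```

Vector-valued fields such as `mfcc_mean` would need one column per coefficient or a different format such as JSON.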
Tonal Analysis
Standalone tonal functions for detailed harmonic analysis:
import sonara
import numpy as np
y, sr = sonara.load("track.mp3", sr=22050)
S = sonara.stft(y, n_fft=2048, hop_length=512)
power = np.abs(S) ** 2
freqs = sonara.fft_frequencies(sr=float(sr), n_fft=2048)
# HPCP — Harmonic Pitch Class Profile (Gomez 2006)
# More robust than energy-based chroma: uses spectral peaks + harmonic weighting
hpcp = sonara.hpcp(power, freqs) # shape (12, n_frames)
# Chord detection from HPCP + beats
tempo, beats = sonara.beat_track(y=y, sr=sr)
chords = sonara.chords_from_beats(hpcp, list(beats)) # ["Am", "F", "C", "G", ...]
desc = sonara.chord_descriptors(chords, len(y) / sr)
print(f"Predominant: {desc['predominant_chord']}, "
f"Changes: {desc['chord_change_rate']:.2f}/s, "
f"Unique: {desc['n_unique']}")
# Dissonance — Sethares (1998) Plomp-Levelt model
diss = sonara.dissonance(power, freqs) # mean dissonance (0-1)
# Or from specific peaks
d = sonara.dissonance_from_peaks([440.0, 466.16], [1.0, 1.0]) # minor 2nd
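For intuition about the Plomp-Levelt curve as parameterized by Sethares, here is a pure-Python sketch of the pairwise dissonance between partials. It uses the commonly quoted rounded constants (3.5, 5.75, 0.24, 0.021, 19) and the amplitude product as weighting; some formulations use the minimum amplitude instead. The output is not normalized to 0-1, and sonara's exact constants and normalization are not confirmed here.

```python
import math


def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Sensory dissonance of two partials (Sethares-style curve)."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    # Scale the frequency gap by the critical bandwidth near f_lo.
    s = 0.24 / (0.021 * f_lo + 19.0)
    x = s * (f_hi - f_lo)
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))


def total_dissonance(freqs, amps):
    """Sum the pairwise dissonance over all distinct pairs of partials."""
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            total += pair_dissonance(freqs[i], freqs[j], amps[i], amps[j])
    return total


minor_second = total_dissonance([440.0, 466.16], [1.0, 1.0])
perfect_fifth = total_dissonance([440.0, 660.0], [1.0, 1.0])
```

For pure tones the curve is zero at unison, peaks at a fraction of the critical bandwidth, and decays for wider intervals, so the minor second scores higher than the fifth.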
Display
import sonara
import sonara.display as display
import matplotlib.pyplot as plt
y, sr = sonara.load("track.mp3", sr=22050)
mel = sonara.melspectrogram(y=y, sr=22050.0)
mel_db = sonara.power_to_db(mel)
fig, ax = plt.subplots()
display.specshow(mel_db, x_axis='time', y_axis='mel', sr=22050, ax=ax)
plt.show()
Performance
All arithmetic uses f32 precision (matching native decoder format), with a parallelized fused FFT pipeline where all features (spectral, tonal, contrast) are computed in a single pass per frame — eliminating redundant FFT computation and keeping data in L1 cache.
Analysis pipeline benchmarks (Apple Silicon)
| Mode | 10s track | 3-min track | Features |
|---|---|---|---|
| compact | ~1.2 ms | ~39 ms | 11 core features |
| playlist | ~4 ms | ~80 ms | 30+ features |
| full | ~50 ms | ~510 ms | All features incl. time signature |
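The 3-minute timings can be restated as a real-time factor (seconds of audio processed per second of compute). This is plain arithmetic on the approximate figures in the table, so the results are only as precise as those figures; the helper name is hypothetical.

```python
def realtime_factor(audio_seconds, processing_ms):
    """Seconds of audio analyzed per second of wall-clock time."""
    return audio_seconds / (processing_ms / 1000.0)


# Approximate 3-min track timings from the benchmark table (milliseconds).
BENCHMARK_MS = {"compact": 39.0, "playlist": 80.0, "full": 510.0}

factors = {
    mode: realtime_factor(180.0, ms) for mode, ms in BENCHMARK_MS.items()
}
```

On these numbers, playlist mode runs at roughly 2250 times real time and full mode at roughly 350 times.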
Feature benchmarks (vs Python/librosa)
| Feature | Speedup |
|---|---|
| Mel spectrogram | ~3x |
| MFCC | ~3x |
| Beat tracking | ~4x |
| Onset detection | ~3x |
| Cold start (first call) | ~20-30x |
| Batch analysis (parallel) | ~5x |
Key optimizations
- Fused single-pass pipeline — one FFT per frame simultaneously produces mel, chroma, centroid, RMS, bandwidth, rolloff, flatness, spectral contrast, HPCP, and dissonance. No power spectrum matrix stored.
- Pre-computed DCT matrix — MFCCs use cached DCT-II coefficients (matrix multiply instead of per-element cos())
- Sparse filterbanks — both mel and chroma filterbanks skip zero entries (~97% sparsity for mel)
- Partial sort for contrast — uses O(n) selection instead of O(n log n) sort for percentile computation
- Top-N peak detection — spectral peaks sorted by magnitude for HPCP/dissonance, shared between both algorithms
- f32 precision — halves memory bandwidth vs f64, matches Symphonia's native decode format
- Parallel FFT frames — rayon parallelism across frames (for signals > 32 frames)
- Fast 2:1 decimation — half-band FIR filter for 44100-to-22050 Hz instead of full sinc resampling
- Thread-local caches — FFT plans, mel/chroma filterbanks, DCT matrix reused across calls
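To illustrate the pre-computed DCT idea from the list above, here is a pure-Python sketch: the DCT-II basis is built once per shape and reused, so each frame costs only a matrix-vector product. `functools.lru_cache` stands in for the thread-local caches of the Rust code, and the orthonormal scaling is an assumption; sonara's actual normalization is not stated here.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis with n_out rows, computed once per shape."""
    rows = []
    for k in range(n_out):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_in)
        rows.append(tuple(
            scale * math.cos(math.pi * k * (2 * n + 1) / (2 * n_in))
            for n in range(n_in)
        ))
    return tuple(rows)


def apply_dct(log_mel_frame, n_mfcc=13):
    """Project one log-mel frame onto the cached basis (no cos() per frame)."""
    basis = dct_matrix(n_mfcc, len(log_mel_frame))
    return [sum(w * x for w, x in zip(row, log_mel_frame)) for row in basis]


# A flat 40-band frame: all energy lands in coefficient 0.
coeffs = apply_dct([1.0] * 40)
```

The second and later calls with the same shape return the cached tuple, which is the point of the optimization: the trigonometric work happens once, not per frame.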
API Reference
sonara provides 100+ audio analysis functions:
Core Audio: load, stream, stft, istft, resample, to_mono, tone, chirp, clicks, autocorrelate, lpc, zero_crossings, mu_compress, mu_expand
Spectral Features: melspectrogram, mfcc, chroma_stft, tonnetz, spectral_centroid, spectral_bandwidth, spectral_rolloff, spectral_flatness, spectral_contrast, rms, zero_crossing_rate, poly_features
Tonal Analysis: hpcp, chords_from_beats, chords_from_frames, chord_descriptors, dissonance, dissonance_from_peaks
Rhythm: beat_track, onset_detect, onset_strength, onset_strength_multi, tempo, tempo_curve, tempo_variability, tempogram, fourier_tempogram, metrogram, detect_time_signature, plp
Pitch: yin, pyin, piptrack, estimate_tuning, pitch_tuning, salience, interp_harmonics, f0_harmonics
Transforms: cqt, vqt, icqt, hybrid_cqt, pseudo_cqt, griffinlim, griffinlim_cqt, phase_vocoder, iirt, reassigned_spectrogram, pcen, perceptual_weighting
Source Separation: hpss, harmonic, percussive, nn_filter, decompose_nmf
Effects: time_stretch, pitch_shift, trim, split, split_with_constraints, remix, melody_separate, preemphasis, deemphasis
Sequence Analysis: dtw, rqa, viterbi, viterbi_discriminative, viterbi_binary, recurrence_matrix, cross_similarity, path_enhance
Perceptual: loudness_lufs, energy, danceability, detect_key, valence, acousticness
Conversions (50+): hz_to_mel, mel_to_hz, hz_to_midi, midi_to_hz, note_to_hz, note_to_midi, hz_to_note, hz_to_octs, hz_to_svara_h, hz_to_svara_c, hz_to_fjs, fft_frequencies, mel_frequencies, cqt_frequencies, frames_to_time, time_to_frames, frequency weighting (A/B/C/D/Z), notation helpers, and more
Filters & DSP: mel filterbank, chroma filterbank, lfilter, filtfilt, sosfiltfilt, window functions (Hann, Hamming, Blackman, Kaiser, Tukey, Gaussian)
Pipeline: analyze_file, analyze_signal, analyze_batch
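For readers unfamiliar with the conversion helpers, the standard Hz/MIDI relationship (A4 = 440 Hz = MIDI note 69, 12 semitones per octave) can be sketched in pure Python. The `ref_` functions below are stand-ins for cross-checking, not sonara's implementation, and sonara's own signatures may differ.

```python
import math


def ref_hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency in Hz (A4 = 440 Hz)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def ref_midi_to_hz(midi):
    """Frequency in Hz for a (possibly fractional) MIDI note number."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


a4 = ref_hz_to_midi(440.0)
middle_c_hz = ref_midi_to_hz(60)
```

The two functions are inverses of each other, so a round trip through both should return the original value up to floating-point error.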
Architecture
sonara is a two-crate Rust workspace:
- sonara: pure Rust core library (~18,000 LOC)
- sonara-python: PyO3 bindings (~1,200 LOC)
sonara/src/
  analyze.rs — Fused analysis pipeline (compact/playlist/full modes)
  perceptual.rs — LUFS, energy, danceability, key detection, valence, acousticness
  tonal.rs — HPCP, chord detection, dissonance (Sethares 1998)
  beat.rs — Beat tracking (Ellis 2007 DP algorithm)
  onset.rs — Onset detection (spectral flux + peak picking)
  decompose.rs — HPSS, NMF
  effects.rs — Time stretch, pitch shift, trim, split
  segment.rs — Recurrence matrix, cross-similarity, path enhancement
  sequence.rs — DTW, RQA, Viterbi, transition matrices
  core/
    audio.rs — Audio I/O, resampling, fast 2:1 decimation
    spectrum.rs — STFT, CQT/VQT, phase vocoder, Griffin-Lim
    fft.rs — FFT with thread-local plan caching
    pitch.rs — YIN / pYIN pitch estimation
    harmonic.rs — Harmonic salience, interpolation
    convert.rs — Hz/mel/MIDI/note/SVara/FJS conversions, frequency weighting
  feature/
    spectral.rs — Mel, MFCC, chroma, centroid, bandwidth, rolloff, flatness, contrast
    rhythm.rs — Tempogram, metrogram, time signature detection
  dsp/
    windows.rs — Window functions (Hann, Hamming, Blackman, Kaiser, Tukey, Gaussian)
    iir.rs — IIR filters (lfilter, filtfilt, sosfiltfilt)
    extrema.rs — Local maxima/minima detection
    filters.rs — Mel/chroma filterbanks
License