Fast spectrogram computation library powered by Rust
Features
- Multiple Spectrogram Types: Linear, Mel, ERB frequency scales
- Multiple Amplitude Scales: Power, Magnitude, Decibels
- High Performance: Rust implementation with Python bindings
- Plan-based Computation: Reuse FFT plans for efficient batch processing
- Rich Audio Features: MFCC, Chromagram, CQT support
- Streaming Support: Frame-by-frame processing for real-time applications
Installation
pip install spectrograms
Benchmark Results
Check out the benchmark results for detailed performance comparisons against NumPy and SciPy implementations across various configurations and signal types.
Quick Start
import numpy as np
import spectrograms as sg
# Generate a test signal
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)  # 1 second of sample times
samples = np.sin(2 * np.pi * 440 * t)
# Create parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=sr)
# Compute spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)
print(f"Shape: {spec.shape}")
print(f"Frequency range: {spec.frequency_range()}")
print(f"Duration: {spec.duration():.2f}s")
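To sanity-check results, here is a minimal NumPy sketch of what a linear power spectrogram computes. It frames the signal, applies a periodic Hann window, takes the real FFT of each frame and squares the magnitude. It assumes no centre padding, so with the library's `centre=True` default the frame count will differ. `power_spectrogram_ref` is a hypothetical helper, not part of `spectrograms`.

```python
import numpy as np

def power_spectrogram_ref(samples, n_fft=512, hop_size=256):
    """Hypothetical NumPy reference: |rFFT(window * frame)|^2, no centre padding."""
    samples = np.asarray(samples, dtype=np.float64)
    # Periodic Hann window, a common choice for STFT analysis
    n = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop_size
    # Index matrix with one frame per column: shape (n_fft, n_frames)
    idx = np.arange(n_fft)[:, None] + hop_size * np.arange(n_frames)[None, :]
    frames = samples[idx] * window[:, None]
    spectrum = np.fft.rfft(frames, axis=0)  # (n_fft // 2 + 1, n_frames)
    return np.abs(spectrum) ** 2

sr = 16000
t = np.arange(sr) / sr
ref = power_spectrogram_ref(np.sin(2 * np.pi * 440 * t))
print(ref.shape)  # (257, 61)
```

With a 31.25 Hz bin spacing (16000 / 512), the 440 Hz tone peaks in bin 14.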
Mel Spectrogram Example
import numpy as np
import spectrograms as sg
# Load your audio data
samples = np.random.randn(16000) # Replace with real audio
sr = 16000
# Configure parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=sr)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)
# Compute mel spectrogram in dB scale
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)
# Access the data
spectrogram_data = mel_spec.data # NumPy array (n_mels, n_frames)
frequencies = mel_spec.frequencies # Mel frequencies
times = mel_spec.times # Time axis in seconds
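A mel spectrogram is a linear power (or magnitude) spectrogram multiplied by a bank of triangular filters spaced evenly on the mel scale. The sketch below builds such a filterbank in NumPy with the HTK mel formula, `mel = 2595 * log10(1 + f / 700)`. Whether this library uses the HTK or the Slaney variant, and how it normalises its filters, is an assumption here; `mel_filterbank` is a hypothetical helper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Hypothetical helper: unnormalised triangular filters, shape (n_mels, n_fft//2 + 1)."""
    f_max = sample_rate / 2 if f_max is None else f_max
    # n_mels + 2 edge points evenly spaced in mel, converted back to Hz
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle peaking at 1 on the centre frequency, zero outside [left, right]
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb = mel_filterbank(n_mels=80, n_fft=512, sample_rate=16000, f_min=0.0, f_max=8000.0)
# mel_power = fb @ linear_power  gives shape (80, n_frames)
```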
Efficient Batch Processing
For processing multiple audio files, use the planner API to reuse FFT plans:
import numpy as np
import spectrograms as sg
# Setup
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=16000)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)
# Create plan once
planner = sg.SpectrogramPlanner()
plan = planner.mel_db_plan(params, mel_params, db_params)
# Reuse the plan for every signal instead of re-planning the FFT each time
signals = [np.random.randn(16000) for _ in range(100)]
spectrograms = [plan.compute(signal) for signal in signals]
Advanced Features
MFCCs (Mel-Frequency Cepstral Coefficients)
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
mfcc_params = sg.MfccParams(n_mfcc=13)
mfccs = sg.compute_mfcc(samples, stft, sample_rate=16000, n_mels=40, mfcc_params=mfcc_params)
# Returns shape: (n_mfcc, n_frames)
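MFCCs are conventionally obtained by applying a type-II discrete cosine transform (DCT) along the mel axis of a log mel spectrogram and keeping the first `n_mfcc` coefficients. The NumPy sketch below shows that step with an orthonormal DCT-II. The library's exact log scaling, DCT normalisation and any liftering are not documented here, so treat this as an approximation; `mfcc_from_log_mel` is a hypothetical name.

```python
import numpy as np

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """Orthonormal DCT-II over the mel axis of log_mel (n_mels, n_frames)."""
    n_mels = log_mel.shape[0]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    # DCT-II basis cos(pi * k * (2n + 1) / (2 * n_mels)), shape (n_mfcc, n_mels)
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise
    scale = np.full((n_mfcc, 1), np.sqrt(2.0 / n_mels))
    scale[0, 0] = np.sqrt(1.0 / n_mels)
    return (scale * basis) @ log_mel  # (n_mfcc, n_frames)

log_mel = np.log(np.random.rand(40, 100) + 1e-6)
mfccs_ref = mfcc_from_log_mel(log_mel, n_mfcc=13)
print(mfccs_ref.shape)  # (13, 100)
```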
Chromagram (Pitch Class Profiles)
stft = sg.StftParams(n_fft=4096, hop_size=512, window=sg.WindowType.hanning)
chroma_params = sg.ChromaParams.music_standard()
chroma = sg.compute_chromagram(samples, stft, sample_rate=22050, chroma_params=chroma_params)
# Returns shape: (12, n_frames) - one row per pitch class
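Each chroma row gathers energy from all frequencies belonging to the same pitch class. Assuming A4 = 440 Hz tuning and rows ordered C, C#, ..., B (both assumptions; `ChromaParams` exposes a `tuning` field and the row order is not stated here), the frequency-to-row mapping can be sketched as follows. `pitch_class` is a hypothetical helper.

```python
import math

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, tuning_a4=440.0):
    """Map a frequency to a pitch-class index (0 = C) under equal temperament."""
    # Semitones relative to A4, rounded to the nearest equal-tempered note
    semitones_from_a4 = round(12 * math.log2(freq_hz / tuning_a4))
    # A is index 9 when C is index 0
    return (semitones_from_a4 + 9) % 12

print(PITCH_NAMES[pitch_class(440.0)])   # A
print(PITCH_NAMES[pitch_class(261.63)])  # C
```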
Raw STFT
params = sg.SpectrogramParams.music_default(sample_rate=44100)
stft_data = sg.compute_stft(samples, params)
# Returns complex-valued STFT matrix
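The three amplitude scales listed under Features can all be derived from a complex STFT. The sketch below uses the usual definitions in NumPy. How `LogParams(floor_db=...)` applies its floor (clamping to an absolute level, as here, rather than to a level relative to the maximum) is an assumption; the helper names are hypothetical.

```python
import numpy as np

def to_magnitude(stft):
    return np.abs(stft)

def to_power(stft):
    return np.abs(stft) ** 2

def to_db(stft, floor_db=-80.0):
    """10 * log10(power), clamped below at floor_db (assumed absolute floor)."""
    power = to_power(stft)
    # Guard against log10(0); the floor then applies to silent bins
    db = 10.0 * np.log10(np.maximum(power, 1e-30))
    return np.maximum(db, floor_db)

x = np.array([3 + 4j, 0j])
db = to_db(x)  # approximately [13.98, -80.0]
```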
Window Functions
Supported window functions:
- "hanning": Hann window (default)
- "hamming": Hamming window
- "blackman": Blackman window
- "rectangular": Rectangular window (no windowing)
- "kaiser=beta": Kaiser window with beta parameter (e.g., "kaiser=5.0")
- "gaussian=std": Gaussian window with std parameter (e.g., "gaussian=0.4")
Example:
stft = sg.StftParams(n_fft=512, hop_size=256, window="kaiser=8.0")
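Most of these windows have NumPy equivalents, which can help when comparing outputs. This sketch maps the documented window strings to NumPy functions. NumPy returns symmetric windows, whereas STFT code often uses periodic ones, and the Gaussian window is omitted because the unit of its `std` parameter is not documented here. `make_window` is a hypothetical helper, not part of `spectrograms`.

```python
import numpy as np

def make_window(spec, n):
    """Build a symmetric NumPy window from strings like "hanning" or "kaiser=8.0"."""
    name, _, arg = spec.partition("=")
    if name == "hanning":
        return np.hanning(n)
    if name == "hamming":
        return np.hamming(n)
    if name == "blackman":
        return np.blackman(n)
    if name == "rectangular":
        return np.ones(n)
    if name == "kaiser":
        return np.kaiser(n, float(arg))  # arg is the beta shape parameter
    raise ValueError(f"unsupported window spec: {spec!r}")

w = make_window("kaiser=8.0", 512)
```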
Default Presets
# Speech processing preset (n_fft=512, hop_size=160)
params = sg.SpectrogramParams.speech_default(sample_rate=16000)
# Music processing preset (n_fft=2048, hop_size=512)
params = sg.SpectrogramParams.music_default(sample_rate=44100)
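The preset values translate into time and frequency resolution: window length is `n_fft / sample_rate`, hop duration is `hop_size / sample_rate`, and FFT bin spacing is `sample_rate / n_fft`. A small helper (hypothetical, not part of the library) makes the arithmetic explicit:

```python
def resolution(sample_rate, n_fft, hop_size):
    """Return (window seconds, hop seconds, bin spacing in Hz)."""
    return n_fft / sample_rate, hop_size / sample_rate, sample_rate / n_fft

# Speech preset: 32 ms window, 10 ms hop, 31.25 Hz bins
print(resolution(16000, 512, 160))
# Music preset: about 46.4 ms window, 11.6 ms hop, 21.5 Hz bins
print(resolution(44100, 2048, 512))
```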
API Reference
Parameter Classes
- StftParams(n_fft, hop_size, window, centre=True): STFT configuration
- SpectrogramParams(stft, sample_rate): Base spectrogram parameters
- MelParams(n_mels, f_min, f_max): Mel filterbank parameters
- ErbParams(n_filters, f_min, f_max): ERB filterbank parameters
- LogParams(floor_db): Decibel conversion parameters
- CqtParams(bins_per_octave, n_octaves, f_min): Constant-Q parameters
- ChromaParams(tuning, f_min, f_max, norm): Chromagram parameters
- MfccParams(n_mfcc): MFCC parameters
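The `CqtParams` fields describe a geometric grid of centre frequencies: under the standard constant-Q definition, bin `k` sits at `f_min * 2 ** (k / bins_per_octave)`, giving `bins_per_octave * n_octaves` bins that all share the quality factor `Q = 1 / (2 ** (1 / bins_per_octave) - 1)`. That this library lays out its bins exactly this way is an assumption, and `cqt_frequencies` is a hypothetical helper.

```python
import numpy as np

def cqt_frequencies(bins_per_octave, n_octaves, f_min):
    """Centre frequencies (Hz) of a constant-Q grid starting at f_min."""
    k = np.arange(bins_per_octave * n_octaves)
    return f_min * 2.0 ** (k / bins_per_octave)

def cqt_q_factor(bins_per_octave):
    """Constant ratio of centre frequency to bandwidth."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

freqs = cqt_frequencies(bins_per_octave=12, n_octaves=7, f_min=32.70)  # starting at C1
print(len(freqs), round(cqt_q_factor(12), 2))  # 84 16.82
```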
Spectrogram Result
The Spectrogram object returned by all compute functions has:
- .data: NumPy array with shape (n_bins, n_frames)
- .frequencies: Frequency axis values (Hz or scale-specific)
- .times: Time axis values (seconds)
- .n_bins: Number of frequency bins
- .n_frames: Number of time frames
- .shape: Tuple (n_bins, n_frames)
- .frequency_range(): Minimum and maximum frequencies
- .duration(): Total duration in seconds
- .params: Original computation parameters
Note: The Spectrogram object can be directly used as a NumPy array. For example:
import numpy as np
import spectrograms as sg
SAMPLE_RATE = 16000
sine_wave = np.sin(2 * np.pi * 440 * np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False))
stft_params = sg.StftParams(n_fft=1024, hop_size=256, window=sg.WindowType.hanning)
spectrogram_params = sg.SpectrogramParams(stft_params, SAMPLE_RATE)
spectrogram = sg.compute_linear_power_spectrogram(sine_wave, spectrogram_params)
np.abs(spectrogram).shape  # NumPy functions accept the Spectrogram directly
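In Python, this kind of NumPy interoperability is typically provided through the `__array__` protocol: `np.asarray`, ufuncs such as `np.abs`, and most NumPy functions call it to obtain a plain `ndarray`. The class below is a hypothetical stand-in, not the library's `Spectrogram` type, illustrating the mechanism:

```python
import numpy as np

class ArrayBacked:
    """Hypothetical result object that NumPy can consume directly."""

    def __init__(self, data, times):
        self.data = np.asarray(data)
        self.times = np.asarray(times)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when it needs an ndarray; NumPy 2.x also passes copy
        arr = self.data if dtype is None else self.data.astype(dtype, copy=False)
        return arr.copy() if copy else arr

spec = ArrayBacked(np.ones((257, 10)), np.arange(10) * 0.016)
print(np.abs(spec).shape)  # (257, 10)
```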
Binaural Spectrograms
Binaural spectrograms capture spatial audio cues from stereo or binaural recordings. Based on Binaspect.
import spectrograms as sg
# stereo_audio: numpy array of shape (2, n_samples) — [left, right]
stft = sg.StftParams(n_fft=4096, hop_size=1024, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=44100)
# ITD — Interaural Time Difference (seconds), low-frequency localisation cue
itd_params = sg.ITDSpectrogramParams(params, start_freq=50.0, end_freq=620.0)
itd = sg.compute_itd_spectrogram(stereo_audio, itd_params)
# shape: (53, n_frames) [with n_fft=4096 at 44100 Hz]
# IPD — Interaural Phase Difference (radians), optionally phase-wrapped
ipd_params = sg.IPDSpectrogramParams(params, start_freq=50.0, end_freq=620.0, wrapped=True)
ipd = sg.compute_ipd_spectrogram(stereo_audio, ipd_params)
# ILD — Interaural Level Difference (dB), high-frequency localisation cue
ild_params = sg.ILDSpectrogramParams(params, start_freq=1700.0, end_freq=4600.0)
ild = sg.compute_ild_spectrogram(stereo_audio, ild_params)
# shape: (269, n_frames)
# ILR — Interaural Level Ratio (normalised, range [-1, 1])
ilr_params = sg.ILRSpectrogramParams(params, start_freq=1700.0, end_freq=4600.0)
ilr = sg.compute_ilr_spectrogram(stereo_audio, ilr_params)
# Comparison / diff functions
itd_diff, mean_degrees, mean_itd = sg.compute_itd_spectrogram_diff(
ref_audio, test_audio, itd_params
)
print(f"Mean ITD difference: {mean_degrees:.2f}° ({mean_itd*1e6:.1f} µs)")
ilr_diff, mean_ilr = sg.compute_ilr_spectrogram_diff(
ref_audio, test_audio, ilr_params
)
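Per STFT bin, the four binaural cues can be computed from the complex left and right spectra. The NumPy sketch below uses common textbook definitions: IPD as the angle of `L * conj(R)`, ITD as `IPD / (2 * pi * f)`, ILD as `20 * log10(|L| / |R|)`, and ILR as `(|L| - |R|) / (|L| + |R|)`, which lies in [-1, 1]. The exact formulas used by `spectrograms` (and by Binaspect) may differ in sign convention, smoothing and normalisation, so treat them as assumptions. `binaural_cues` is a hypothetical helper.

```python
import numpy as np

def binaural_cues(left, right, freqs, eps=1e-12):
    """left/right: complex STFTs (n_bins, n_frames); freqs: bin centres in Hz (> 0)."""
    cross = left * np.conj(right)
    ipd = np.angle(cross)                               # radians, wrapped to (-pi, pi]
    itd = ipd / (2 * np.pi * freqs[:, None])            # seconds
    mag_l, mag_r = np.abs(left), np.abs(right)
    ild = 20 * np.log10((mag_l + eps) / (mag_r + eps))  # dB
    ilr = (mag_l - mag_r) / (mag_l + mag_r + eps)       # in [-1, 1]
    return itd, ipd, ild, ilr

freqs = np.array([100.0, 200.0])
right = np.ones((2, 3), dtype=complex)
left = 2 * right * np.exp(1j * 0.1)  # left channel louder and leading in phase
itd, ipd, ild, ilr = binaural_cues(left, right, freqs)
```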
Convenience Functions
All compute functions release the Python GIL during computation.
Linear spectrograms:
- compute_linear_power_spectrogram(samples, params)
- compute_linear_magnitude_spectrogram(samples, params)
- compute_linear_db_spectrogram(samples, params, db_params)
Mel spectrograms:
- compute_mel_power_spectrogram(samples, params, mel_params)
- compute_mel_magnitude_spectrogram(samples, params, mel_params)
- compute_mel_db_spectrogram(samples, params, mel_params, db_params)
ERB spectrograms:
- compute_erb_power_spectrogram(samples, params, erb_params)
- compute_erb_magnitude_spectrogram(samples, params, erb_params)
- compute_erb_db_spectrogram(samples, params, erb_params, db_params)
Other features:
- compute_stft(samples, params): Raw STFT (complex output)
- compute_cqt(samples, sample_rate, cqt_params, hop_size): Constant-Q Transform
- compute_chromagram(samples, stft_params, sample_rate, chroma_params)
- compute_mfcc(samples, stft_params, sample_rate, n_mels, mfcc_params)
Binaural spectrograms:
- compute_itd_spectrogram(audio, params): Interaural Time Difference
- compute_itd_spectrogram_diff(reference, test, params): ITD comparison
- compute_ipd_spectrogram(audio, params): Interaural Phase Difference
- compute_ild_spectrogram(audio, params): Interaural Level Difference
- compute_ilr_spectrogram(audio, params): Interaural Level Ratio
- compute_ilr_spectrogram_diff(reference, test, params): ILR comparison
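ERB filterbanks are commonly spaced evenly on the ERB-rate scale of Glasberg and Moore (1990), `E(f) = 21.4 * log10(1 + 0.00437 * f)`, with bandwidth `ERB(f) = 24.7 * (0.00437 * f + 1)` Hz. Whether `ErbParams` uses exactly this scale, and whether `f_min` and `f_max` are themselves filter centres, are assumptions; the sketch below (hypothetical helper `erb_centres`) computes `n_filters` centre frequencies with both endpoints included.

```python
import numpy as np

def hz_to_erb_rate(f):
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f, dtype=np.float64))

def erb_rate_to_hz(e):
    return (10.0 ** (np.asarray(e, dtype=np.float64) / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f):
    """Equivalent rectangular bandwidth in Hz at centre frequency f."""
    return 24.7 * (0.00437 * np.asarray(f, dtype=np.float64) + 1.0)

def erb_centres(n_filters, f_min, f_max):
    """n_filters centre frequencies (Hz) equally spaced in ERB-rate, endpoints included."""
    e = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_filters)
    return erb_rate_to_hz(e)

centres = erb_centres(n_filters=64, f_min=50.0, f_max=8000.0)
```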
Planner API
Create a planner and reusable plans for batch processing:
planner = sg.SpectrogramPlanner()
# Create plans (one per spectrogram type)
plan = planner.linear_power_plan(params)
plan = planner.mel_db_plan(params, mel_params, db_params)
# ... and 7 other plan types
# Use plans
spec = plan.compute(samples)
frame = plan.compute_frame(samples, frame_idx)
shape = plan.output_shape(signal_length)
Available plan types match the convenience functions:
- linear_power_plan, linear_magnitude_plan, linear_db_plan
- mel_power_plan, mel_magnitude_plan, mel_db_plan
- erb_power_plan, erb_magnitude_plan, erb_db_plan
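Plans expose `compute_frame(samples, frame_idx)` for single frames, and the Features list mentions frame-by-frame streaming. On the input side, a live stream is commonly handled by buffering incoming samples and emitting an `n_fft`-sample window every `hop_size` samples. The class below is a hypothetical sketch of that buffering (not part of `spectrograms`); how its frames should be passed to `compute_frame` is not specified here.

```python
import numpy as np

class FrameBuffer:
    """Hypothetical streaming framer: yields n_fft-sample frames every hop_size samples."""

    def __init__(self, n_fft, hop_size):
        self.n_fft = n_fft
        self.hop_size = hop_size
        self.buffer = np.zeros(0)

    def push(self, chunk):
        """Append a chunk of samples and return every complete frame now available."""
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, dtype=np.float64)])
        frames = []
        while len(self.buffer) >= self.n_fft:
            frames.append(self.buffer[: self.n_fft].copy())
            # Drop hop_size samples so consecutive frames overlap by n_fft - hop_size
            self.buffer = self.buffer[self.hop_size :]
        return frames

framer = FrameBuffer(n_fft=512, hop_size=256)
first = framer.push(np.arange(1000))         # 2 frames: samples 0-511 and 256-767
second = framer.push(np.arange(1000, 1100))  # 1 frame: samples 512-1023
```

The total frame count (3 here) matches offline framing without padding: `1 + (1100 - 512) // 256`.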
Performance Notes
- Plan Reuse: Creating FFT plans is expensive. Reuse plans via the SpectrogramPlanner API so that batch processing pays the planning cost once rather than once per signal.
- FFT Size: Powers of 2 (256, 512, 1024, 2048) are typically much faster than arbitrary sizes.
- GIL Release: All compute functions release the Python GIL, allowing multiple audio files to be processed in parallel threads.
- Backend: The default realfft backend is pure Rust with no system dependencies. To use the FFTW backend, which may offer better performance, build from source.
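Because the compute functions release the GIL, a standard `concurrent.futures.ThreadPoolExecutor` can process several signals at once without multiprocessing. The helper below accepts any compute callable, for example `plan.compute`. Whether one plan object may safely be shared between threads is not stated in this README, so that is an assumption worth checking; `compute_batch` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_batch(compute, signals, max_workers=4):
    """Apply compute to each signal on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute, signals))

# Usage with the library (assumes a plan can be shared across threads):
# spectrograms = compute_batch(plan.compute, signals)
```

`pool.map` returns results in input order, so the output list lines up with `signals`.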
License
MIT License
Links
- GitHub: https://github.com/jmg049/Spectrograms
- Documentation: https://jmg049.github.io/Spectrograms
- PyPI: https://pypi.org/project/spectrograms/
Contributing
Contributions are welcome! Please see the main repository for contribution guidelines.
Download files
Source Distribution
Built Distribution
File details
Details for the file spectrograms-1.4.0.tar.gz.
File metadata
- Download URL: spectrograms-1.4.0.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe89595042ce9a5d882500d49018b20546d3d2f77a0f4aa9a491e565d056b8a7 |
| MD5 | 253b0969902e2ce792ffc37c6c09430d |
| BLAKE2b-256 | 8a3a631935164bb42b56d865a8dca451904f4f5da7a1acce7f54b0714273c224 |
File details
Details for the file spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 4.6 MB
- Tags: CPython 3.12, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c17e0dea6f1bdf7c4c2686f693d7ef3e900c1c57cf2dff1c5ff16b0c34877d2b |
| MD5 | 42bca663b30b0685cf86a307343e7728 |
| BLAKE2b-256 | 75b1ec280d0d6b17ee0a82fdb0e6a2618d8c68daa3c3d7a152396aaafcf07964 |