Spectrograms

Fast spectrogram computation library powered by Rust

Features

Multiple Spectrogram Types: Linear, Mel, ERB frequency scales
Multiple Amplitude Scales: Power, Magnitude, Decibels
High Performance: Rust implementation with Python bindings
Plan-based Computation: Reuse FFT plans for efficient batch processing
Rich Audio Features: MFCC, Chromagram, CQT support
Streaming Support: Frame-by-frame processing for real-time applications

Installation

pip install spectrograms

Benchmark Results

Check out the benchmark results for detailed performance comparisons against NumPy and SciPy implementations across various configurations and signal types.

Quick Start

import numpy as np
import spectrograms as sg

# Generate a test signal
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)  # exactly sr samples spaced 1/sr apart
samples = np.sin(2 * np.pi * 440 * t)

# Create parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=sr)

# Compute spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)

print(f"Shape: {spec.shape}")
print(f"Frequency range: {spec.frequency_range()}")
print(f"Duration: {spec.duration():.2f}s")
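
The printed shape follows from the STFT settings. The sketch below estimates it under common STFT conventions: a one-sided spectrum with n_fft // 2 + 1 bins, and 1 + n_samples // hop_size frames when frames are centred. These conventions are assumptions, not taken from the library, so treat the frame count as an estimate. The helper names estimate_shape and bin_frequency are hypothetical.

```python
# Estimate the output shape of a linear spectrogram from its STFT settings.
# The framing rules below are common conventions, assumed rather than
# confirmed for this library.

def estimate_shape(n_samples, n_fft, hop_size, centre=True):
    n_bins = n_fft // 2 + 1  # one-sided spectrum of a real signal
    if centre:
        # Signal is padded by n_fft // 2 on each side before framing
        n_frames = 1 + n_samples // hop_size
    else:
        # Only frames that fit entirely inside the signal
        n_frames = 1 + (n_samples - n_fft) // hop_size
    return n_bins, n_frames


def bin_frequency(k, sample_rate, n_fft):
    # Centre frequency in Hz of FFT bin k
    return k * sample_rate / n_fft


sr, n_fft, hop = 16000, 512, 256
shape_centred = estimate_shape(sr, n_fft, hop)
shape_uncentred = estimate_shape(sr, n_fft, hop, centre=False)
nearest_bin = round(440 * n_fft / sr)  # bin closest to the 440 Hz test tone
print(shape_centred, shape_uncentred, nearest_bin, bin_frequency(nearest_bin, sr, n_fft))
```

With these settings the bins are 31.25 Hz apart, so the 440 Hz tone falls between bins and its energy spreads into the neighbouring ones.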

Mel Spectrogram Example

import numpy as np
import spectrograms as sg

# Load your audio data
samples = np.random.randn(16000)  # Replace with real audio
sr = 16000

# Configure parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=sr)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)

# Compute mel spectrogram in dB scale
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)

# Access the data
spectrogram_data = mel_spec.data  # NumPy array (n_mels, n_frames)
frequencies = mel_spec.frequencies  # Mel frequencies
times = mel_spec.times  # Time axis in seconds
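
To give a feel for what the mel frequency axis looks like, the sketch below spaces 80 band centres evenly on the mel scale between f_min and f_max. It assumes the HTK-style formula mel = 2595 * log10(1 + f / 700); the library may use a different variant (for example the Slaney scale), so the exact values are illustrative only. The helper names are hypothetical.

```python
import math

# HTK-style mel conversion (an assumption; the library's formula may differ)
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(n_mels, f_min, f_max):
    # n_mels centres evenly spaced in mel, strictly between f_min and f_max
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


centres = mel_centres(80, 0.0, 8000.0)
print(len(centres), round(centres[0], 1), round(centres[-1], 1))
```

The centres are closely packed at low frequencies and spread out towards f_max, which is the point of the mel scale.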

Efficient Batch Processing

For processing multiple audio files, use the planner API to reuse FFT plans:

import numpy as np
import spectrograms as sg

# Setup
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=16000)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)

# Create plan once
planner = sg.SpectrogramPlanner()
plan = planner.mel_db_plan(params, mel_params, db_params)

# Reuse plan for multiple signals (much faster!)
signals = [np.random.randn(16000) for _ in range(100)]
spectrograms = [plan.compute(signal) for signal in signals]
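
The idea behind plan reuse can be illustrated without the library. The toy class below precomputes a window and the DFT twiddle factors once, then applies them to many frames. It is a conceptual sketch only: DftPlan is a hypothetical name, it uses a naive DFT rather than an FFT, and it says nothing about how the Rust implementation is organised.

```python
import cmath
import math


class DftPlan:
    """Toy plan: window and twiddle factors are computed once per frame size."""

    def __init__(self, n_fft):
        self.n_fft = n_fft
        # Periodic Hann window
        self.window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
        # exp(-2*pi*j*m/N) for every m, reused by every bin and every frame
        self.twiddles = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]

    def power_spectrum(self, frame):
        n = self.n_fft
        windowed = [x * w for x, w in zip(frame, self.window)]
        out = []
        for k in range(n // 2 + 1):  # one-sided spectrum
            acc = sum(windowed[i] * self.twiddles[(k * i) % n] for i in range(n))
            out.append(abs(acc) ** 2)
        return out


plan = DftPlan(32)  # setup cost paid once
frames = [
    [math.sin(2 * math.pi * 4 * i / 32 + phase) for i in range(32)]
    for phase in (0.0, 0.5, 1.0)
]
spectra = [plan.power_spectrum(f) for f in frames]  # plan reused per frame
peaks = [max(range(len(s)), key=s.__getitem__) for s in spectra]
print(peaks)
```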

Advanced Features

MFCCs (Mel-Frequency Cepstral Coefficients)

stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning)
mfcc_params = sg.MfccParams(n_mfcc=13)

mfccs = sg.compute_mfcc(samples, stft, sample_rate=16000, n_mels=40, mfcc_params=mfcc_params)
# Returns shape: (n_mfcc, n_frames)
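
MFCCs are conventionally obtained by taking a DCT-II of the log mel energies in each frame and keeping the first few coefficients. The sketch below shows that step for a single frame. It assumes orthonormal DCT scaling; the library's scaling and any liftering are not documented here and may differ. The input values are made up for illustration.

```python
import math


def dct_ii(values, n_coeffs):
    """Orthonormal DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(
            v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


# Hypothetical log mel energies for one frame with 40 mel bands
log_mel_frame = [math.log(1.0 + band) for band in range(40)]
mfcc_frame = dct_ii(log_mel_frame, 13)

# A perfectly flat spectrum puts all its energy into coefficient 0
flat_frame = dct_ii([2.0] * 40, 13)
print(len(mfcc_frame), round(flat_frame[0], 3))
```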

Chromagram (Pitch Class Profiles)

stft = sg.StftParams(n_fft=4096, hop_size=512, window=sg.WindowType.hanning)
chroma_params = sg.ChromaParams.music_standard()

chroma = sg.compute_chromagram(samples, stft, sample_rate=22050, chroma_params=chroma_params)
# Returns shape: (12, n_frames) - one row per pitch class
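
Each of the 12 rows collects energy from every frequency that belongs to one pitch class, regardless of octave. A minimal sketch of the frequency-to-pitch-class mapping, assuming equal temperament with A4 = 440 Hz and row 0 = C (the library's row ordering and tuning handling are not stated here):

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, tuning_hz=440.0):
    # MIDI note number (A4 = 69), rounded to the nearest semitone, folded to one octave
    midi = 69 + 12 * math.log2(freq_hz / tuning_hz)
    return round(midi) % 12


# C4, A4, A5, and E4 in Hz
names = [PITCH_CLASSES[pitch_class(f)] for f in (261.63, 440.0, 880.0, 329.63)]
print(names)
```

A4 and A5 land in the same row, which is what makes the chromagram octave-invariant.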

Raw STFT

params = sg.SpectrogramParams.music_default(sample_rate=44100)
stft_data = sg.compute_stft(samples, params)
# Returns complex-valued STFT matrix
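
Each entry of a complex STFT matrix carries both amplitude and phase. The magnitude and power scales used elsewhere in this document are derived from it as follows; the value below is a made-up bin, not library output.

```python
import cmath
import math

stft_bin = complex(3.0, 4.0)  # hypothetical value of one STFT bin

mag = abs(stft_bin)            # magnitude scale: |X|
power = mag ** 2               # power scale: |X|^2
phase = cmath.phase(stft_bin)  # phase in radians, in (-pi, pi]

# Magnitude and phase together recover the original complex value
rebuilt = cmath.rect(mag, phase)
print(mag, power, round(phase, 4))
```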

Window Functions

Supported window functions:

"hanning" - Hann window (default)
"hamming" - Hamming window
"blackman" - Blackman window
"rectangular" - Rectangular window (no windowing)
"kaiser=beta" - Kaiser window with beta parameter (e.g., "kaiser=5.0")
"gaussian=std" - Gaussian window with std parameter (e.g., "gaussian=0.4")

Example:

stft = sg.StftParams(n_fft=512, hop_size=256, window="kaiser=8.0")
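
For reference, the sketch below splits a window string into its name and optional parameter and generates the four parameter-free windows from their textbook cosine-sum definitions. The function names are hypothetical, and the periodic (DFT-even) form is an assumption; the library may use the symmetric form or parse the string differently.

```python
import math


def parse_window_spec(spec):
    """Split 'name' or 'name=value' into (name, value or None)."""
    name, sep, value = spec.partition("=")
    return name, (float(value) if sep else None)


def cosine_sum_window(n_fft, coeffs):
    """Periodic cosine-sum window: a0 - a1*cos(x) + a2*cos(2x), x = 2*pi*n/N."""
    window = []
    for n in range(n_fft):
        x = 2.0 * math.pi * n / n_fft
        window.append(sum(((-1) ** i) * a * math.cos(i * x) for i, a in enumerate(coeffs)))
    return window


# Textbook coefficients for the parameter-free windows
COEFFS = {
    "hanning": (0.5, 0.5),
    "hamming": (0.54, 0.46),
    "blackman": (0.42, 0.5, 0.08),
    "rectangular": (1.0,),
}


def make_window(name, n_fft):
    return cosine_sum_window(n_fft, COEFFS[name])


hann = make_window("hanning", 8)
print(parse_window_spec("kaiser=8.0"), [round(v, 3) for v in hann])
```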

Default Presets

# Speech processing preset (n_fft=512, hop_size=160)
params = sg.SpectrogramParams.speech_default(sample_rate=16000)

# Music processing preset (n_fft=2048, hop_size=512)
params = sg.SpectrogramParams.music_default(sample_rate=44100)
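
The preset values are easier to compare once converted to time. The arithmetic below uses only the n_fft, hop_size, and sample rates quoted in the comments above; frame_timing is a hypothetical helper.

```python
def frame_timing(n_fft, hop_size, sample_rate):
    window_ms = 1000.0 * n_fft / sample_rate  # analysis window length
    hop_ms = 1000.0 * hop_size / sample_rate  # time step between frames
    overlap = 1.0 - hop_size / n_fft          # fraction shared by adjacent frames
    return window_ms, hop_ms, overlap


speech = frame_timing(512, 160, 16000)
music = frame_timing(2048, 512, 44100)
print(speech, tuple(round(v, 2) for v in music))
```

The speech preset gives a 32 ms window with a 10 ms hop, and the music preset roughly a 46 ms window with an 11.6 ms hop at 75% overlap.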

API Reference

Parameter Classes

StftParams(n_fft, hop_size, window, centre=True) - STFT configuration
SpectrogramParams(stft, sample_rate) - Base spectrogram parameters
MelParams(n_mels, f_min, f_max) - Mel filterbank parameters
ErbParams(n_filters, f_min, f_max) - ERB filterbank parameters
LogParams(floor_db) - Decibel conversion parameters
CqtParams(bins_per_octave, n_octaves, f_min) - Constant-Q parameters
ChromaParams(tuning, f_min, f_max, norm) - Chromagram parameters
MfccParams(n_mfcc) - MFCC parameters
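
As an illustration of how the CqtParams arguments interact, the sketch below lists the bin centre frequencies of a constant-Q transform, which are geometrically spaced by definition. It assumes the total number of bins is bins_per_octave * n_octaves and that the first bin sits exactly at f_min; neither is confirmed by this document. The example values (12 bins per octave, 7 octaves, starting near C1) are arbitrary.

```python
def cqt_frequencies(bins_per_octave, n_octaves, f_min):
    # Geometric spacing: each bin is 2**(1/bins_per_octave) times the previous one
    n_bins = bins_per_octave * n_octaves
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


freqs = cqt_frequencies(bins_per_octave=12, n_octaves=7, f_min=32.7)
print(len(freqs), round(freqs[12], 1), round(freqs[-1], 1))
```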

Spectrogram Result

The Spectrogram object returned by all compute functions has:

.data - NumPy array with shape (n_bins, n_frames)
.frequencies - Frequency axis values (Hz or scale-specific)
.times - Time axis values (seconds)
.n_bins - Number of frequency bins
.n_frames - Number of time frames
.shape - Tuple (n_bins, n_frames)
.frequency_range() - Min/max frequencies
.duration() - Total duration in seconds
.params - Original computation parameters

Note: The Spectrogram object can be directly used as a NumPy array. For example:

import numpy as np
import spectrograms as sg

SAMPLE_RATE = 16000  # Hz

sine_wave = np.sin(2 * np.pi * 440 * np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False))

stft_params = sg.StftParams(n_fft=1024, hop_size=256, window=sg.WindowType.hanning)

spectrogram_params = sg.SpectrogramParams(stft_params, SAMPLE_RATE)

spectrogram = sg.compute_linear_power_spectrogram(sine_wave, spectrogram_params)

np.abs(spectrogram).shape  # works just fine

Binaural Spectrograms

Binaural spectrograms capture spatial audio cues from stereo or binaural recordings. This feature is based on Binaspect.

import spectrograms as sg

# stereo_audio: numpy array of shape (2, n_samples) — [left, right]
stft = sg.StftParams(n_fft=4096, hop_size=1024, window=sg.WindowType.hanning)
params = sg.SpectrogramParams(stft, sample_rate=44100)

# ITD — Interaural Time Difference (seconds), low-frequency localisation cue
itd_params = sg.ITDSpectrogramParams(params, start_freq=50.0, end_freq=620.0)
itd = sg.compute_itd_spectrogram(stereo_audio, itd_params)
# shape: (53, n_frames)  [with n_fft=4096 at 44100 Hz]

# IPD — Interaural Phase Difference (radians), optionally phase-wrapped
ipd_params = sg.IPDSpectrogramParams(params, start_freq=50.0, end_freq=620.0, wrapped=True)
ipd = sg.compute_ipd_spectrogram(stereo_audio, ipd_params)

# ILD — Interaural Level Difference (dB), high-frequency localisation cue
ild_params = sg.ILDSpectrogramParams(params, start_freq=1700.0, end_freq=4600.0)
ild = sg.compute_ild_spectrogram(stereo_audio, ild_params)
# shape: (269, n_frames)

# ILR — Interaural Level Ratio (normalised, range [-1, 1])
ilr_params = sg.ILRSpectrogramParams(params, start_freq=1700.0, end_freq=4600.0)
ilr = sg.compute_ilr_spectrogram(stereo_audio, ilr_params)

# Comparison / diff functions
# ref_audio and test_audio are assumed to be stereo arrays shaped like stereo_audio
itd_diff, mean_degrees, mean_itd = sg.compute_itd_spectrogram_diff(
    ref_audio, test_audio, itd_params
)
print(f"Mean ITD difference: {mean_degrees:.2f}°  ({mean_itd*1e6:.1f} µs)")

ilr_diff, mean_ilr = sg.compute_ilr_spectrogram_diff(
    ref_audio, test_audio, ilr_params
)
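
For intuition, the first three cues can be computed for a single frequency bin from the left and right complex STFT values. The sketch below uses textbook definitions (IPD as the phase of L times conj(R), ITD as IPD / (2*pi*f), ILD as 20*log10(|L| / |R|)). The sign conventions and any smoothing used by the library or by Binaspect are not documented here, so this is an assumption-laden illustration. ILR is omitted because its exact definition is not given in this document.

```python
import cmath
import math


def interaural_cues(left_bin, right_bin, freq_hz):
    # Phase difference between the ears, wrapped to (-pi, pi]
    ipd = cmath.phase(left_bin * right_bin.conjugate())
    # Equivalent time difference in seconds at this frequency
    itd = ipd / (2.0 * math.pi * freq_hz)
    # Level difference in dB (silent bins would need guarding against log of zero)
    ild = 20.0 * math.log10(abs(left_bin) / abs(right_bin))
    return ipd, itd, ild


freq = 500.0     # Hz, inside the low-frequency ITD band
delay = 0.0002   # right channel lags the left by 200 microseconds
left = cmath.rect(1.0, 0.0)
right = cmath.rect(0.5, -2.0 * math.pi * freq * delay)  # quieter and delayed

ipd, itd, ild = interaural_cues(left, right, freq)
print(round(ipd, 4), round(itd * 1e6, 1), round(ild, 2))
```

The recovered ITD matches the simulated 200 microsecond delay, and halving the right-channel amplitude yields an ILD of about 6 dB.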

Convenience Functions

All compute functions release the Python GIL during computation.

Linear spectrograms:

compute_linear_power_spectrogram(samples, params)
compute_linear_magnitude_spectrogram(samples, params)
compute_linear_db_spectrogram(samples, params, db_params)

Mel spectrograms:

compute_mel_power_spectrogram(samples, params, mel_params)
compute_mel_magnitude_spectrogram(samples, params, mel_params)
compute_mel_db_spectrogram(samples, params, mel_params, db_params)

ERB spectrograms:

compute_erb_power_spectrogram(samples, params, erb_params)
compute_erb_magnitude_spectrogram(samples, params, erb_params)
compute_erb_db_spectrogram(samples, params, erb_params, db_params)
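
The power, magnitude, and db variants of each family differ only in amplitude scale. A minimal sketch of the decibel conversion with a floor, assuming a reference power of 1.0 and an absolute floor; how the library actually applies floor_db (absolute, or relative to the peak) is not stated in this document. The function names are hypothetical.

```python
import math


def power_to_db(power, floor_db=-80.0, reference=1.0):
    # Non-positive power has no logarithm, so it maps straight to the floor
    if power <= 0.0:
        return floor_db
    return max(10.0 * math.log10(power / reference), floor_db)


def magnitude_to_db(mag, floor_db=-80.0):
    # 20*log10(|X|) is the same as 10*log10(|X|^2)
    return power_to_db(mag * mag, floor_db)


values = [power_to_db(p) for p in (100.0, 1.0, 1e-12, 0.0)]
print(values)
```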

Other features:

compute_stft(samples, params) - Raw STFT (complex output)
compute_cqt(samples, sample_rate, cqt_params, hop_size) - Constant-Q Transform
compute_chromagram(samples, stft_params, sample_rate, chroma_params)
compute_mfcc(samples, stft_params, sample_rate, n_mels, mfcc_params)

Binaural spectrograms:

compute_itd_spectrogram(audio, params) - Interaural Time Difference
compute_itd_spectrogram_diff(reference, test, params) - ITD comparison
compute_ipd_spectrogram(audio, params) - Interaural Phase Difference
compute_ild_spectrogram(audio, params) - Interaural Level Difference
compute_ilr_spectrogram(audio, params) - Interaural Level Ratio
compute_ilr_spectrogram_diff(reference, test, params) - ILR comparison

Planner API

Create a planner and reusable plans for batch processing:

planner = sg.SpectrogramPlanner()

# Create plans (one per spectrogram type)
plan = planner.linear_power_plan(params)
plan = planner.mel_db_plan(params, mel_params, db_params)
# ... and 7 other plan types

# Use plans
spec = plan.compute(samples)
frame = plan.compute_frame(samples, frame_idx)
shape = plan.output_shape(signal_length)
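
Frame-by-frame processing amounts to mapping a frame index to a slice of the input. The sketch below shows that mapping without centring or padding. How compute_frame indexes frames internally is not documented here, so frame_bounds and iter_frames are hypothetical helpers that only illustrate the general idea.

```python
def frame_bounds(frame_idx, n_fft, hop_size):
    # Sample range [start, stop) covered by one frame, without centring
    start = frame_idx * hop_size
    return start, start + n_fft


def iter_frames(samples, n_fft, hop_size):
    # Yield every frame that fits entirely inside the signal
    idx = 0
    while True:
        start, stop = frame_bounds(idx, n_fft, hop_size)
        if stop > len(samples):
            break
        yield samples[start:stop]
        idx += 1


signal = list(range(20))
frames = list(iter_frames(signal, n_fft=8, hop_size=4))
print(len(frames), frames[1])
```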

Available plan types match the convenience functions:

linear_power_plan, linear_magnitude_plan, linear_db_plan
mel_power_plan, mel_magnitude_plan, mel_db_plan
erb_power_plan, erb_magnitude_plan, erb_db_plan

Performance Notes

Plan Reuse: Creating FFT plans is expensive. Reuse plans via the SpectrogramPlanner API so that the setup cost is paid once per configuration rather than once per signal.
FFT Size: Powers of 2 (256, 512, 1024, 2048) are significantly faster than arbitrary sizes.
GIL Release: All compute functions release the Python GIL, allowing parallel processing of multiple audio files.
Backend: The default realfft backend is pure Rust with no system dependencies. Try building from source to enable the FFTW backend. It may offer better performance.
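
Because the compute functions release the GIL, a thread pool is one plausible way to process several files in parallel. The sketch below shows only the pattern: compute is a stand-in written in pure Python (so it will not itself run in parallel), and in practice it would be replaced by a call such as plan.compute(signal). Whether a single plan object may be shared across threads is not stated in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(signal):
    """Stand-in for a GIL-releasing call such as plan.compute(signal)."""
    return sum(x * x for x in signal)  # placeholder work, not a spectrogram


# Eight made-up signals standing in for eight audio files
signals = [[float(i + j) for j in range(100)] for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order, so results[i] belongs to signals[i]
    results = list(pool.map(compute, signals))

print(len(results))
```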

License

MIT License

Contributing

Contributions are welcome! Please see the main repository for contribution guidelines.

Release history

1.4.0 (this version): Apr 29, 2026
1.0.1: Feb 8, 2026
0.2.3: Jan 29, 2026
0.2.2: Jan 29, 2026
0.2.1: Jan 28, 2026
0.2.0: Jan 28, 2026

Download files

Source Distribution

spectrograms-1.4.0.tar.gz (2.0 MB)

Uploaded Apr 29, 2026 Source

Built Distribution

spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl (4.6 MB)

Uploaded Apr 29, 2026 CPython 3.12, manylinux: glibc 2.35+ x86-64

File details

Details for the file spectrograms-1.4.0.tar.gz.

File metadata

Download URL: spectrograms-1.4.0.tar.gz
Upload date: Apr 29, 2026
Size: 2.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.12.4

File hashes

Hashes for spectrograms-1.4.0.tar.gz
Algorithm	Hash digest
SHA256	`fe89595042ce9a5d882500d49018b20546d3d2f77a0f4aa9a491e565d056b8a7`
MD5	`253b0969902e2ce792ffc37c6c09430d`
BLAKE2b-256	`8a3a631935164bb42b56d865a8dca451904f4f5da7a1acce7f54b0714273c224`

File details

Details for the file spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl.

File metadata

Download URL: spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl
Upload date: Apr 29, 2026
Size: 4.6 MB
Tags: CPython 3.12, manylinux: glibc 2.35+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.12.4

File hashes

Hashes for spectrograms-1.4.0-cp312-cp312-manylinux_2_35_x86_64.whl
Algorithm	Hash digest
SHA256	`c17e0dea6f1bdf7c4c2686f693d7ef3e900c1c57cf2dff1c5ff16b0c34877d2b`
MD5	`42bca663b30b0685cf86a307343e7728`
BLAKE2b-256	`75b1ec280d0d6b17ee0a82fdb0e6a2618d8c68daa3c3d7a152396aaafcf07964`
