Fast spectrogram computation library powered by Rust
Features
- Multiple Spectrogram Types: Linear, Mel, ERB frequency scales
- Multiple Amplitude Scales: Power, Magnitude, Decibels
- High Performance: Rust implementation with Python bindings
- Plan-based Computation: Reuse FFT plans for efficient batch processing
- Rich Audio Features: MFCC, Chromagram, CQT support
- Streaming Support: Frame-by-frame processing for real-time applications
Installation
pip install spectrograms
For the FFTW-accelerated version (requires system FFTW library) you currently must build from source:
git clone https://github.com/jmg049/Spectrograms.git
cd Spectrograms/
# In pyproject.toml under [tool.maturin], change "realfft" to "fftw"
maturin develop --release
Benchmark Results
Check out the benchmark results for detailed performance comparisons against NumPy and SciPy implementations across various configurations and signal types.
Quick Start
import numpy as np
import spectrograms as sg
# Generate a test signal
sr = 16000
t = np.linspace(0, 1, sr)
samples = np.sin(2 * np.pi * 440 * t)
# Create parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning())
params = sg.SpectrogramParams(stft, sample_rate=sr)
# Compute spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)
print(f"Shape: {spec.shape}")
print(f"Frequency range: {spec.frequency_range()}")
print(f"Duration: {spec.duration():.2f}s")
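The shape printed above can be estimated from the STFT parameters alone. The sketch below is plain Python and not part of the library. It assumes a one-sided FFT (n_fft // 2 + 1 bins) and, when centre=True, padding of n_fft // 2 samples on each side of the signal. Both are common conventions that this README does not confirm, so treat plan.output_shape(signal_length) as the authoritative answer; expected_shape is a hypothetical helper name.

```python
def expected_shape(n_samples: int, n_fft: int, hop_size: int, centre: bool = True) -> tuple:
    """Estimate (n_bins, n_frames) for a linear spectrogram.

    Assumptions (not confirmed by the library): one-sided FFT output and,
    for centre=True, n_fft // 2 samples of padding on each side.
    """
    n_bins = n_fft // 2 + 1
    padded = n_samples + 2 * (n_fft // 2) if centre else n_samples
    if padded < n_fft:
        # Not enough samples for even a single full frame.
        return n_bins, 0
    n_frames = 1 + (padded - n_fft) // hop_size
    return n_bins, n_frames


# Parameters from the Quick Start: 1 s at 16 kHz, n_fft=512, hop_size=256.
print(expected_shape(16000, 512, 256))
```

If the estimate disagrees with spec.shape, the library is using a different padding or framing convention than the one assumed here.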
Mel Spectrogram Example
import numpy as np
import spectrograms as sg
# Load your audio data
samples = np.random.randn(16000) # Replace with real audio
sr = 16000
# Configure parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning())
params = sg.SpectrogramParams(stft, sample_rate=sr)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)
# Compute mel spectrogram in dB scale
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)
# Access the data
spectrogram_data = mel_spec.data # NumPy array (n_mels, n_frames)
frequencies = mel_spec.frequencies # Mel frequencies
times = mel_spec.times # Time axis in seconds
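To make the mel_params and db_params settings above more concrete, here is a minimal pure-Python sketch of the two conversions involved. It is an illustration only: it assumes the HTK-style mel formula and treats floor_db as an absolute lower clamp, neither of which this README confirms for the library. The helper names (hz_to_mel, mel_to_hz, mel_centres, power_to_db) are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Hz -> mel conversion (assumed; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(n_mels: int, f_min: float, f_max: float) -> list:
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def power_to_db(power: float, floor_db: float = -80.0) -> float:
    """10 * log10(power), clamped from below at floor_db (assumed semantics)."""
    if power <= 0.0:
        return floor_db
    return max(10.0 * math.log10(power), floor_db)


centres = mel_centres(n_mels=80, f_min=0.0, f_max=8000.0)
print(f"first/last band centre: {centres[0]:.1f} Hz / {centres[-1]:.1f} Hz")
print(f"power 1e-12 in dB with -80 dB floor: {power_to_db(1e-12)}")
```

The band centres are dense at low frequencies and sparse at high ones, which is the point of the mel scale.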
Efficient Batch Processing
For processing multiple audio files, use the planner API to reuse FFT plans:
import numpy as np
import spectrograms as sg
# Setup
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning())
params = sg.SpectrogramParams(stft, sample_rate=16000)
mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=8000.0)
db_params = sg.LogParams(floor_db=-80.0)
# Create plan once
planner = sg.SpectrogramPlanner()
plan = planner.mel_db_plan(params, mel_params, db_params)
# Reuse plan for multiple signals (much faster!)
signals = [np.random.randn(16000) for _ in range(100)]
spectrograms = [plan.compute(signal) for signal in signals]
The planner API provides a 1.5-3x speedup for batch processing by reusing FFT plans.
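The speedup depends on signal length, parameters, and hardware, so it is worth measuring on your own workload. Below is a small timing helper using only the standard library; best_time is a hypothetical name, and the commented usage relies solely on calls already shown in this README.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Usage with the names from the batch example above (requires spectrograms):
# one_shot = best_time(lambda: [sg.compute_mel_db_spectrogram(s, params, mel_params, db_params)
#                               for s in signals])
# planned = best_time(lambda: [plan.compute(s) for s in signals])
# print(f"speedup: {one_shot / planned:.2f}x")
```

Taking the minimum over several repeats reduces noise from other processes on the machine.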
Advanced Features
MFCCs (Mel-Frequency Cepstral Coefficients)
stft = sg.StftParams(n_fft=512, hop_size=256, window=sg.WindowType.hanning())
mfcc_params = sg.MfccParams(n_mfcc=13)
mfccs = sg.compute_mfcc(samples, stft, sample_rate=16000, n_mels=40, mfcc_params=mfcc_params)
# Returns shape: (n_mfcc, n_frames)
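For orientation, MFCCs are conventionally obtained by applying a type-II discrete cosine transform to the log-mel energies of each frame and keeping the first n_mfcc coefficients. The sketch below shows that last step in pure Python for a single frame. It uses an unnormalised DCT-II; the scaling and any liftering applied by the library are not documented here, so the numbers are not expected to match compute_mfcc exactly. dct2 is a hypothetical helper name.

```python
import math


def dct2(values: list, n_out: int) -> list:
    """Unnormalised DCT-II of `values`, truncated to the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        for k in range(n_out)
    ]


# One frame of made-up log-mel energies (40 bands, as in n_mels=40 above).
log_mel_frame = [math.log(1.0 + band) for band in range(40)]
mfcc_frame = dct2(log_mel_frame, n_out=13)
print(len(mfcc_frame))
```

Coefficient 0 tracks overall energy, while higher coefficients describe progressively finer detail of the spectral envelope.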
Chromagram (Pitch Class Profiles)
stft = sg.StftParams(n_fft=4096, hop_size=512, window=sg.WindowType.hanning())
chroma_params = sg.ChromaParams.music_standard()
chroma = sg.compute_chromagram(samples, stft, sample_rate=22050, chroma_params=chroma_params)
# Returns shape: (12, n_frames) - one row per pitch class
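The 12 rows correspond to the 12 pitch classes of the equal-tempered scale. As a rough illustration of how a frequency maps to a pitch class, the sketch below rounds to the nearest semitone relative to a 440 Hz reference. The row order (C first) and the reference tuning are assumptions made for this example, not documented behaviour of ChromaParams; pitch_class is a hypothetical helper.

```python
import math

# Assumed row order for illustration: index 0 is C, index 9 is A.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz: float, tuning_hz: float = 440.0) -> int:
    """Index (0-11) of the pitch class nearest to freq_hz in equal temperament."""
    semitones_from_a4 = round(12.0 * math.log2(freq_hz / tuning_hz))
    # A is 9 semitones above C, so shift before wrapping into one octave.
    return (semitones_from_a4 + 9) % 12


for f in (261.63, 329.63, 440.0, 880.0):
    print(f, PITCH_CLASSES[pitch_class(f)])
```

Because octaves are folded together, 440 Hz and 880 Hz land in the same row.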
Raw STFT
params = sg.SpectrogramParams.music_default(sample_rate=44100)
stft_data = sg.compute_stft(samples, params)
# Returns complex-valued STFT matrix
Window Functions
Supported window functions:
- "hanning" - Hann window (default)
- "hamming" - Hamming window
- "blackman" - Blackman window
- "rectangular" - Rectangular window (no windowing)
- "kaiser=beta" - Kaiser window with beta parameter (e.g., "kaiser=5.0")
- "gaussian=std" - Gaussian window with std parameter (e.g., "gaussian=0.4")
Example:
stft = sg.StftParams(n_fft=512, hop_size=256, window="kaiser=8.0")
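For readers unfamiliar with these windows, the textbook definitions of the Hann and Hamming windows are sketched below in pure Python. This uses the symmetric form (endpoints included); whether the library uses the symmetric or the periodic variant is not stated in this README, so values may differ slightly at the edges. hann and hamming are hypothetical helper names.

```python
import math


def hann(n: int) -> list:
    """Symmetric Hann window of length n: 0.5 - 0.5 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def hamming(n: int) -> list:
    """Symmetric Hamming window of length n: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


print([round(w, 3) for w in hann(5)])
print([round(w, 3) for w in hamming(5)])
```

The Hann window tapers to zero at both ends, while the Hamming window stops at 0.08, trading a slightly higher far-out sidelobe floor for a lower first sidelobe.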
Default Presets
# Speech processing preset (n_fft=512, hop_size=160)
params = sg.SpectrogramParams.speech_default(sample_rate=16000)
# Music processing preset (n_fft=2048, hop_size=512)
params = sg.SpectrogramParams.music_default(sample_rate=44100)
API Reference
Parameter Classes
- StftParams(n_fft, hop_size, window, centre=True) - STFT configuration
- SpectrogramParams(stft, sample_rate) - Base spectrogram parameters
- MelParams(n_mels, f_min, f_max) - Mel filterbank parameters
- ErbParams(n_filters, f_min, f_max) - ERB filterbank parameters
- LogParams(floor_db) - Decibel conversion parameters
- CqtParams(bins_per_octave, n_octaves, f_min) - Constant-Q parameters
- ChromaParams(tuning, f_min, f_max, norm) - Chromagram parameters
- MfccParams(n_mfcc) - MFCC parameters
Spectrogram Result
The Spectrogram object returned by all compute functions has:
- .data - NumPy array with shape (n_bins, n_frames)
- .frequencies - Frequency axis values (Hz or scale-specific)
- .times - Time axis values (seconds)
- .n_bins - Number of frequency bins
- .n_frames - Number of time frames
- .shape - Tuple (n_bins, n_frames)
- .frequency_range() - Min/max frequencies
- .duration() - Total duration in seconds
- .params - Original computation parameters
Note: The Spectrogram object can be directly used as a NumPy array. For example:
import numpy as np
import spectrograms as sg
SAMPLE_RATE = 16000  # example sample rate in Hz
sine_wave = np.sin(2 * np.pi * 440 * np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False))
stft_params = sg.StftParams(n_fft=1024, hop_size=256, window=sg.WindowType.hanning())
spectrogram_params = sg.SpectrogramParams(stft_params, SAMPLE_RATE)
spectrogram = sg.compute_linear_power_spectrogram(sine_wave, spectrogram_params)
np.abs(spectrogram).shape # works just fine
Convenience Functions
All compute functions release the Python GIL during computation.
Linear spectrograms:
- compute_linear_power_spectrogram(samples, params)
- compute_linear_magnitude_spectrogram(samples, params)
- compute_linear_db_spectrogram(samples, params, db_params)
Mel spectrograms:
- compute_mel_power_spectrogram(samples, params, mel_params)
- compute_mel_magnitude_spectrogram(samples, params, mel_params)
- compute_mel_db_spectrogram(samples, params, mel_params, db_params)
ERB spectrograms:
- compute_erb_power_spectrogram(samples, params, erb_params)
- compute_erb_magnitude_spectrogram(samples, params, erb_params)
- compute_erb_db_spectrogram(samples, params, erb_params, db_params)
Other features:
- compute_stft(samples, params) - Raw STFT (complex output)
- compute_cqt(samples, sample_rate, cqt_params, hop_size) - Constant-Q Transform
- compute_chromagram(samples, stft_params, sample_rate, chroma_params)
- compute_mfcc(samples, stft_params, sample_rate, n_mels, mfcc_params)
Planner API
Create a planner and reusable plans for batch processing:
planner = sg.SpectrogramPlanner()
# Create plans (one per spectrogram type)
plan = planner.linear_power_plan(params)
plan = planner.mel_db_plan(params, mel_params, db_params)
# ... and 7 other plan types
# Use plans
spec = plan.compute(samples)
frame = plan.compute_frame(samples, frame_idx)
shape = plan.output_shape(signal_length)
Available plan types match the convenience functions:
- linear_power_plan, linear_magnitude_plan, linear_db_plan
- mel_power_plan, mel_magnitude_plan, mel_db_plan
- erb_power_plan, erb_magnitude_plan, erb_db_plan
Performance Notes
- Plan Reuse: Creating FFT plans is expensive. Reuse plans via the SpectrogramPlanner API for 1.5-3x speedup in batch processing.
- FFT Size: Powers of 2 (256, 512, 1024, 2048) are significantly faster than arbitrary sizes.
- GIL Release: All compute functions release the Python GIL, allowing parallel processing of multiple audio files.
- Backend: The default realfft backend is pure Rust with no system dependencies. Try building from source to enable the FFTW backend. It may offer better performance.
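Because the compute functions release the GIL, a plain thread pool is one way to spread work across cores. The sketch below is generic standard-library Python: compute_all is a hypothetical helper, and the commented usage calls only functions listed in this README. Whether a single plan object can safely be shared between threads is not documented here, so the example calls the convenience function per signal instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def compute_all(signals: Iterable, compute_one: Callable, max_workers: int = 4) -> List:
    """Apply compute_one to every signal in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute_one, signals))


# Usage with the library (requires spectrograms and the parameter objects from earlier):
# specs = compute_all(
#     signals,
#     lambda s: sg.compute_mel_db_spectrogram(s, params, mel_params, db_params),
# )

# Stand-in workload so the sketch runs without the library installed.
print(compute_all([1, 2, 3], lambda x: x * 2))
```

Threads only help here because the heavy work happens outside the GIL; pure-Python preprocessing inside compute_one would still run serially.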
License
MIT License
Links
- GitHub: https://github.com/jmg049/Spectrograms
- Documentation: https://jmg049.github.io/Spectrograms
- PyPI: https://pypi.org/project/spectrograms/
Contributing
Contributions are welcome! Please see the main repository for contribution guidelines.