Phase-preserving spectrogram encoder/decoder for high-quality audio reconstruction
phase-spectrogram
This Python package implements a phase-preserving spectrogram encoder/decoder that converts audio waveforms to spectrograms and back to audio without loss of phase information, enabling high-quality audio reconstruction.
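To see why keeping the phase matters, the following minimal sketch uses a plain NumPy FFT on a single frame. This is an assumption made purely for illustration and is not the package's own transform. It inverts the same spectrum twice: once with the full complex values and once with the magnitudes only.

```python
import numpy as np

# Illustrative single-frame example; this is NOT the package's STFT.
n = np.arange(256)
signal = np.sin(2 * np.pi * 5 * n / 256 + 0.7) + 0.5 * np.cos(2 * np.pi * 12 * n / 256)

spectrum = np.fft.rfft(signal)

# Inverting the full complex spectrum (magnitude and phase) recovers the signal.
with_phase = np.fft.irfft(spectrum, n=256)

# Inverting the magnitudes alone discards the phase and changes the waveform.
magnitude_only = np.fft.irfft(np.abs(spectrum), n=256)

error_with_phase = np.max(np.abs(with_phase - signal))
error_magnitude_only = np.max(np.abs(magnitude_only - signal))
print(f"max error with phase:     {error_with_phase:.2e}")
print(f"max error magnitude only: {error_magnitude_only:.2e}")
```

Pipelines that store only magnitudes typically need a separate phase-estimation step to compensate; storing the phase alongside the magnitude avoids it.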
Installation
pip install phase-spectrogram
Quick Start
from phase import Phase
# Initialize with sample rate
phase = Phase(sample_rate=44100)
# Convert audio file to spectrogram image
phase.to_phase_wav('input.wav', 'output.png')
# Convert spectrogram image back to audio
phase.to_wav_png('output.png', 'reconstructed.wav')
Features
- Phase-Preserving: Retains both magnitude and phase information, so audio can be reconstructed directly from the spectrogram
- High-Quality Audio: Near-lossless audio reconstruction without iterative phase-estimation algorithms
- Multiple Sample Rates: Support for 8kHz, 11.025kHz, 16kHz, 22.05kHz, 24kHz, 32kHz, 44.1kHz, and 48kHz
- Flexible Formats: WAV and FLAC input support
- PNG Export: Save spectrograms as images for visualization or ML applications
- HDR Support: Optional 16-bit per channel PNG for higher dynamic range
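As a rough illustration of what the extra bit depth in HDR mode buys, the sketch below compares the worst-case rounding error of 8-bit and 16-bit quantization for values in [0, 1]. The `quantize` helper is hypothetical, and the package's actual mapping from spectrogram values to pixels is not documented here, so treat this as a generic bit-depth comparison only.

```python
def quantize(value, bits):
    """Round a value in [0, 1] to the nearest level representable with `bits` bits."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def max_quantization_error(bits, steps=1000):
    """Largest round-trip error over an evenly spaced grid of test values in [0, 1]."""
    return max(abs(i / steps - quantize(i / steps, bits)) for i in range(steps + 1))


error_8bit = max_quantization_error(8)
error_16bit = max_quantization_error(16)
print(f"8-bit worst-case error:  {error_8bit:.6f}")
print(f"16-bit worst-case error: {error_16bit:.6f}")
```

The error bound is half a quantization step, so going from 255 to 65535 levels shrinks it by a factor of roughly 257.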
Usage Examples
Basic Audio Processing
import numpy as np
from phase import Phase
# Create Phase encoder/decoder
phase = Phase(sample_rate=44100)
# Generate test audio (1 second of 440Hz sine wave)
# endpoint=False keeps the sample spacing at exactly 1/44100 s
t = np.linspace(0, 1.0, 44100, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)
# Convert to spectrogram
spectrogram = phase.to_phase(audio)
# Reconstruct audio
reconstructed = phase.from_phase(spectrogram)
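One way to check how close a reconstruction is to the original is a signal-to-noise ratio. The `snr_db` helper below is a hypothetical sketch, not part of the package. It assumes the two buffers are time-aligned and may differ slightly in length, which has not been verified against the package's output. The demonstration uses stand-in data so that it runs without the package installed.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` against `reference`, in decibels."""
    # Assumption: the two buffers may differ slightly in length, so compare the overlap.
    n = min(len(reference), len(estimate))
    reference = np.asarray(reference[:n], dtype=np.float64)
    estimate = np.asarray(estimate[:n], dtype=np.float64)
    noise_power = np.sum((reference - estimate) ** 2)
    if noise_power == 0:
        return float("inf")
    return float(10 * np.log10(np.sum(reference ** 2) / noise_power))


# Stand-in data: a sine wave and a copy attenuated by 1 %, which gives about 40 dB.
t = np.arange(44100) / 44100
audio = np.sin(2 * np.pi * 440 * t)
print(f"{snr_db(audio, 0.99 * audio):.1f} dB")
```

With the package installed, `snr_db(audio, reconstructed)` would apply the same measurement to the round trip above.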
File Conversion
from phase import Phase
phase = Phase(sample_rate=44100)
# WAV to PNG
phase.to_phase_wav('input.wav', 'spectrogram.png')
# FLAC to PNG
phase.to_phase_flac('input.flac', 'spectrogram.png')
# PNG back to WAV
phase.to_wav_png('spectrogram.png', 'output.wav')
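For converting a whole folder, a small wrapper around the file methods may be convenient. `convert_directory` below is a hypothetical helper, not part of the package. It takes the conversion function as an argument, and the demonstration passes a stand-in that only records its arguments so the sketch runs without audio files.

```python
import tempfile
from pathlib import Path


def convert_directory(src_dir, dst_dir, convert, pattern="*.wav"):
    """Call convert(input_path, output_path) for every matching file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    outputs = []
    for source in sorted(src.glob(pattern)):
        target = dst / (source.stem + ".png")
        convert(str(source), str(target))
        outputs.append(target)
    return outputs


# Demonstration with a stand-in converter that only records its arguments.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "b.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    outputs = convert_directory(tmp, Path(tmp) / "png", lambda i, o: calls.append((i, o)))
    output_names = [path.name for path in outputs]
print(output_names)
```

In real use, `convert` would be `phase.to_phase_wav`, or `phase.to_phase_flac` together with `pattern="*.flac"`.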
Advanced Configuration
from phase import Phase
# High Dynamic Range (16-bit per channel)
phase_hdr = Phase(
    sample_rate=48000,
    HDR=True,
    volume_boost=2.0,
    y_reverse=False
)
# Custom window and FFT resolution
phase_custom = Phase(
    sample_rate=44100,
    window=2560,
    resolut=8192
)
# With inverse hyperbolic sine compression
phase_ihs = Phase(
    sample_rate=44100,
    IHS=True
)
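For readers unfamiliar with inverse hyperbolic sine compression, the sketch below shows the general idea using only the standard library. How the package scales values before and after applying it is not documented here, so `compress` and `expand` are illustrative names rather than its API.

```python
import math


def compress(value):
    """Inverse hyperbolic sine: close to linear near zero, logarithmic for large inputs."""
    return math.asinh(value)


def expand(value):
    """Hyperbolic sine undoes the compression."""
    return math.sinh(value)


for amplitude in (0.001, 1.0, 1000.0):
    squeezed = compress(amplitude)
    print(f"{amplitude:>8} -> {squeezed:.4f} -> {expand(squeezed):.4f}")
```

Unlike a plain logarithm, the function is defined at zero and for negative inputs, which makes it a common choice for compressing signed values with a wide dynamic range.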
API Reference
Phase Class
Phase(sample_rate=None, num_freqs=None, window=1280,
      resolut=4096, y_reverse=True, volume_boost=0.0,
      HDR=False, IHS=False)
Parameters:
- sample_rate (int): Audio sample rate (8000, 11025, 16000, 22050, 24000, 32000, 44100, or 48000)
- num_freqs (int): Number of frequency bins (auto-set based on sample_rate if not provided)
- window (int): STFT window size (default: 1280)
- resolut (int): FFT resolution (default: 4096)
- y_reverse (bool): Flip Y-axis in PNG images (default: True)
- volume_boost (float): Volume multiplier for reconstruction (default: 0.0 = no boost)
- HDR (bool): Use 16 bits per channel PNG (default: False = 8 bits per channel)
- IHS (bool): Enable inverse hyperbolic sine compression (default: False)
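To get a feel for the window and resolut values, the sketch below converts them into a window duration and a frequency spacing. It assumes that both parameters are expressed in samples and that resolut is the FFT length. Neither is confirmed by the documentation above, and both helper functions are hypothetical.

```python
def window_duration_ms(window, sample_rate):
    """Length of one analysis window in milliseconds (assumes `window` is in samples)."""
    return 1000.0 * window / sample_rate


def bin_spacing_hz(resolut, sample_rate):
    """Spacing between FFT bins in Hz (assumes `resolut` is the FFT length in samples)."""
    return sample_rate / resolut


for window, resolut in ((1280, 4096), (2560, 8192)):
    duration = window_duration_ms(window, 44100)
    spacing = bin_spacing_hz(resolut, 44100)
    print(f"window={window}, resolut={resolut}: {duration:.2f} ms, {spacing:.2f} Hz per bin")
```

Under these assumptions, doubling both values halves the bin spacing while doubling the time span each frame covers.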
Methods
to_phase(audio_buffer)
Convert audio buffer to phase-preserving spectrogram.
Parameters:
- audio_buffer (numpy.ndarray): 1D array of float64 audio samples
Returns:
numpy.ndarray: 2D array of shape (time_frames * num_freqs, 2)
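The flat return shape can be awkward to work with per frame. The sketch below regroups such an array into one slice per time frame. It assumes rows are stored frame by frame with num_freqs rows each, which the documentation above does not state, and it makes no claim about what the two columns contain. `split_frames` is a hypothetical helper, shown here on stand-in data.

```python
import numpy as np


def split_frames(spectrogram, num_freqs):
    """View a (time_frames * num_freqs, 2) array as (time_frames, num_freqs, 2)."""
    spectrogram = np.asarray(spectrogram)
    if spectrogram.ndim != 2 or spectrogram.shape[1] != 2:
        raise ValueError("expected an array of shape (rows, 2)")
    if spectrogram.shape[0] % num_freqs != 0:
        raise ValueError("row count is not a multiple of num_freqs")
    # Assumption: rows are stored frame by frame, num_freqs rows per frame.
    return spectrogram.reshape(-1, num_freqs, 2)


# Stand-in data with the documented shape: 3 frames of 836 bins (44.1 kHz family).
fake = np.zeros((3 * 836, 2))
frames = split_frames(fake, 836)
print(frames.shape)  # (3, 836, 2)
```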
from_phase(spectrogram)
Reconstruct audio from phase-preserving spectrogram.
Parameters:
- spectrogram (numpy.ndarray): 2D array of shape (time_frames * num_freqs, 2)
Returns:
numpy.ndarray: 1D array of float64 audio samples
to_phase_wav(input_file, output_file)
Convert WAV file to PNG spectrogram.
Parameters:
- input_file (str): Path to input WAV file
- output_file (str): Path to output PNG file
to_phase_flac(input_file, output_file)
Convert FLAC file to PNG spectrogram.
Parameters:
- input_file (str): Path to input FLAC file
- output_file (str): Path to output PNG file
to_wav_png(input_file, output_file)
Convert PNG spectrogram to WAV file.
Parameters:
- input_file (str): Path to input PNG file
- output_file (str): Path to output WAV file
Returns:
int: Detected sample rate from the spectrogram
Supported Sample Rates
| Sample Rate | Frequency Bins | Family |
|---|---|---|
| 8000 Hz | 768 | 48k |
| 16000 Hz | 768 | 48k |
| 24000 Hz | 768 | 48k |
| 32000 Hz | 768 | 48k |
| 48000 Hz | 768 | 48k |
| 11025 Hz | 836 | 44.1k |
| 22050 Hz | 836 | 44.1k |
| 44100 Hz | 836 | 44.1k |
Note: HDR mode doubles the frequency bin count.
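The table and the HDR note can be captured in a small lookup, for example to validate a sample rate before constructing an encoder. `expected_num_freqs` is a hypothetical helper that only mirrors the numbers documented above. How the package derives them internally is not specified.

```python
# Frequency bin counts copied from the table above.
FREQUENCY_BINS = {
    8000: 768, 16000: 768, 24000: 768, 32000: 768, 48000: 768,  # 48k family
    11025: 836, 22050: 836, 44100: 836,                         # 44.1k family
}


def expected_num_freqs(sample_rate, hdr=False):
    """Documented bin count for a supported sample rate, doubled in HDR mode."""
    try:
        bins = FREQUENCY_BINS[sample_rate]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {sample_rate}") from None
    return bins * 2 if hdr else bins


print(expected_num_freqs(44100))            # 836
print(expected_num_freqs(48000, hdr=True))  # 1536
```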
Requirements
- Python >= 3.7
- numpy >= 1.20.0
- scipy >= 1.7.0
- soundfile >= 0.10.0
- Pillow >= 8.0.0
- pypng >= 0.20220715.0
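Installing with pip normally pulls in these dependencies automatically. If an environment needs checking by hand, a sketch along the following lines reports which distributions are absent. It uses `importlib.metadata`, which requires Python 3.8 or later, and it only checks presence, not the minimum versions listed above. `missing_distributions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["numpy", "scipy", "soundfile", "Pillow", "pypng"]


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print(missing_distributions(REQUIRED))
```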
License
See LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Related Projects
This is a Python implementation based on the Go package gomel.
File details
Details for the file phase_spectrogram-0.0.11.tar.gz.
File metadata
- Download URL: phase_spectrogram-0.0.11.tar.gz
- Upload date:
- Size: 63.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a56838d1bc85e0fff205419bc2089a6df46edd0bfd425e3838df7701aac699db |
| MD5 | b500f231a87e14c217a526700d120d84 |
| BLAKE2b-256 | fe7da732fd42c57bafdacc2a5736c917622047efd5fda5741669519ffae34972 |
Provenance
The following attestation bundles were made for phase_spectrogram-0.0.11.tar.gz:
Publisher: python-publish.yml on neurlang/gomel
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: phase_spectrogram-0.0.11.tar.gz
  - Subject digest: a56838d1bc85e0fff205419bc2089a6df46edd0bfd425e3838df7701aac699db
- Sigstore transparency entry: 1264462446
- Sigstore integration time:
- Permalink: neurlang/gomel@4daf8581366e427c3888046b1c475c445b4fcca6
- Branch / Tag: refs/tags/v0.0.11
- Owner: https://github.com/neurlang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4daf8581366e427c3888046b1c475c445b4fcca6
- Trigger Event: release
File details
Details for the file phase_spectrogram-0.0.11-py3-none-any.whl.
File metadata
- Download URL: phase_spectrogram-0.0.11-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66e91eab5739227029aa93c6d1208a0e36094a260627311b6158c7aca3164936 |
| MD5 | d70877fed6020ce845defaead1248b1e |
| BLAKE2b-256 | 52aadc58dd4ce8ea6d3971a0fdbb53f6e51a6a76582432fe9193e32850962544 |
Provenance
The following attestation bundles were made for phase_spectrogram-0.0.11-py3-none-any.whl:
Publisher: python-publish.yml on neurlang/gomel
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: phase_spectrogram-0.0.11-py3-none-any.whl
  - Subject digest: 66e91eab5739227029aa93c6d1208a0e36094a260627311b6158c7aca3164936
- Sigstore transparency entry: 1264462588
- Sigstore integration time:
- Permalink: neurlang/gomel@4daf8581366e427c3888046b1c475c445b4fcca6
- Branch / Tag: refs/tags/v0.0.11
- Owner: https://github.com/neurlang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4daf8581366e427c3888046b1c475c445b4fcca6
- Trigger Event: release