A library for converting between audio files and spectrograms

These details have not been verified by PyPI

Project links

Project description

Otozu (音図) - Audio Spectrogram Conversion Library

Otozu (音図), combining the Japanese words for "sound" (音) and "diagram" (図), is a Python library that simplifies the conversion between audio files and spectrograms. Whether you're working on audio analysis, machine learning with sound data, or just curious about sound visualization, Otozu provides an intuitive interface for bidirectional conversion between audio files and their spectrogram representations.

Why?

Ever needed to convert audio files into spectrograms? Maybe you're training a neural network for audio classification and want to understand what your model is actually learning. Otozu makes it easy to convert your audio dataset into spectrograms for training, but it goes further - you can also convert those learned representations back into audio. Want to know what features your CNN's first layer is detecting? Convert its activation patterns back into sound. Curious about what "maximum meowing" sounds like to your cat detector? Otozu can help you translate those neural patterns back into audio that humans can understand.

Features

Convert audio files to mel spectrograms (saved as PNG or NPY files)
Reconstruct audio from spectrograms using the Griffin-Lim algorithm
Support for multiple audio formats (WAV, MP3, FLAC, OGG)
Batch processing of entire directories
Command-line interface for easy usage
Python API for integration into your projects
Configurable spectrogram parameters
Preserves metadata for accurate audio reconstruction

Installation

pip install otozu

Quick Start

Command Line Usage

Convert an audio file to a spectrogram:

otozu audio-to-spec input.wav output_directory/

Convert all audio files in a directory:

otozu audio-to-spec input_directory/ output_directory/ --label dataset1

Reconstruct audio from a spectrogram:

otozu spec-to-audio spectrogram.png output_directory/

Python API Usage

from otozu import AudioSpectrogramConverter
from otozu.config import SpectrogramConfig

# Create a converter with custom settings
config = SpectrogramConfig(
    n_fft=2048,
    hop_length=512,
    n_mels=128
)
converter = AudioSpectrogramConverter(config)

# Convert audio to spectrogram
spec_path, metadata = converter.audio_to_spectrogram(
    "input.wav",
    "output_directory"
)

# Reconstruct audio from spectrogram
audio_path = converter.spectrogram_to_audio(
    "spectrogram.png",
    "reconstructed.wav"
)

Advanced Configuration

The SpectrogramConfig class allows you to customize various aspects of the spectrogram generation:

from otozu.config import SpectrogramConfig

config = SpectrogramConfig(
    n_fft=2048,          # FFT window size
    hop_length=512,      # Number of samples between successive frames
    n_mels=128,          # Number of mel bands
    sample_rate=44100,   # Target sample rate (None for original)
    center=False,        # Whether to pad signal at edges
    normalized_range=(0, 65535)  # Output range for normalization
)

Technical Details

Spectrogram Generation Process

Audio Loading: Files are loaded using librosa with configurable sample rate
Mel Spectrogram Creation: Converts audio to mel-scale spectrogram
Power to dB: Converts power spectrogram to decibel units
Normalization: Scales values to 16-bit range for PNG storage
Metadata Preservation: Stores original range and sample rate for reconstruction

Audio Reconstruction Process

Spectrogram Loading: Loads normalized spectrogram data
Denormalization: Restores original decibel scale
Mel to Linear: Converts mel-scale spectrogram to linear-scale
Griffin-Lim Algorithm: Estimates phase information
Waveform Generation: Produces final audio output

A Quick Note About Audio Reconstruction

When you convert a spectrogram back to audio, it won't sound exactly like the original. When we create spectrograms, we lose some information about the sound (specifically, the phase information). We do our best to guess what that missing information should be with the Griffin-Lim algorithm, but it's not perfect. The result is usually pretty good for most purposes, but it's going to be quite lossy.

Contributing

Found a bug? Have an idea for a feature? PRs are always welcome! If you're thinking of adding something big, maybe open an issue first so we can chat about it.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Otozu is basically just a friendly wrapper around several excellent libraries:

librosa for audio processing
numpy for numerical operations
Pillow for image handling
soundfile for audio file I/O
typer for CLI interface

Need Help?

If something's not working right or you're not sure how to do something, open an issue on GitHub. I'll do my best to help out when I can.

Citation

If you use Otozu in your research, please cite:

@software{otozu2024,
  title = {Otozu: Audio Spectrogram Conversion Library},
  author = {Yamashiro, John},
  year = {2024},
  url = {https://github.com/jyaaan/otozu}
}

Built with 🎵 by an engineer who got tired of writing the same spectrogram conversion code over and over again. Bulk of code was taken from my sneeze detector, Kushami.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Dec 5, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

otozu-0.1.0.tar.gz (7.6 kB view details)

Uploaded Dec 5, 2024 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

otozu-0.1.0-py3-none-any.whl (7.0 kB view details)

Uploaded Dec 5, 2024 Python 3

File details

Details for the file otozu-0.1.0.tar.gz.

File metadata

Download URL: otozu-0.1.0.tar.gz
Upload date: Dec 5, 2024
Size: 7.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.6

File hashes

Hashes for otozu-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`dabff51b0a27450749f5aeb8d70da98390364f3877f30fde5dd828d589aa3ac5`
MD5	`18c2f2f7f403a13b2a1babf638692d24`
BLAKE2b-256	`129192f75f275d716f01e75a82bd232046f133a2fc2c8690a1a2f895ece3dc15`

See more details on using hashes here.

File details

Details for the file otozu-0.1.0-py3-none-any.whl.

File metadata

Download URL: otozu-0.1.0-py3-none-any.whl
Upload date: Dec 5, 2024
Size: 7.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.6

File hashes

Hashes for otozu-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97f16ffd70372eaae9de14d0e3631fe61e245076e3ac86fb9ef3800fe60b8913`
MD5	`6f77e64a35ed948b79f823c80cc44d9c`
BLAKE2b-256	`139f726828decebf5f3c393c036962f13f321bead4730755b71da5ce3c96b9ca`

See more details on using hashes here.

otozu 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Otozu (音図) - Audio Spectrogram Conversion Library

Why?

Features

Installation

Quick Start

Command Line Usage

Python API Usage

Advanced Configuration

Technical Details

Spectrogram Generation Process

Audio Reconstruction Process

A Quick Note About Audio Reconstruction

Contributing

License

Acknowledgments

Need Help?

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes