
Simple Audio Pre-Processing Library for deep learning audio applications

Project description

🎶 SAPPL: Simple Audio Pre-Processing Library

SAPPL (Simple Audio Pre-Processing Library) is a toolkit for audio preprocessing, designed for deep learning applications such as speech classification and audio representation learning. With SAPPL, you can load, transform, and extract features from audio files with minimal boilerplate code, making it easy to build model-ready audio datasets.


Installation

Install the package directly from PyPI using pip:

pip install sappl

or clone the repository and install it locally:

git clone https://github.com/MorenoLaQuatra/sappl
cd sappl
pip install .

📦 Implemented Modules

1. io.py - Audio I/O Operations

Handles loading and saving audio files with support for various file formats. Resamples audio to the specified sample rate and converts it to mono if required.

Functions

  • load_audio(file_path, sample_rate=16000, mono=True): Loads audio, resamples to the target rate, and optionally converts to mono.
  • save_audio(file_path, audio, sample_rate=16000): Saves audio data to a specified path and supports .wav, .flac, .mp3, and .ogg formats.

Example

from sappl.io import load_audio, save_audio

audio_data = load_audio("path/to/file.wav", sample_rate=16000, mono=True)
print("Audio loaded:", audio_data.shape)

save_audio("path/to/save_file.wav", audio_data, sample_rate=16000)
print("Audio saved successfully.")

2. transform.py - Time-Frequency Transformations

Performs transformations such as the Short-Time Fourier Transform (STFT) and its inverse, as well as conversions between magnitude and phase. All outputs follow the (T, F) format, where T is the number of time frames and F is the number of frequency bins, so they can be fed directly into deep learning models.

Functions

  • stft(audio, n_fft=2048, hop_length=512, win_length=None, center=False, pad_mode='reflect'): Computes the STFT of an audio signal and returns a (T, F) matrix. If center=True, pads the signal so each frame is centered around its FFT window. The pad_mode parameter defines the padding mode when center=True.

  • istft(stft_matrix, hop_length=512, win_length=None, center=False): Reconstructs audio from an STFT matrix, with the option to trim padding if center=True.

  • magphase(stft_matrix): Separates the magnitude and phase components of the STFT matrix, returning both in (T, F) format.

  • compute_mel_spectrogram(audio, sample_rate=16000, n_fft=2048, hop_length=512, n_mels=128, f_min=0.0, f_max=None, center=False, pad_mode='reflect'): Computes the Mel spectrogram of the audio signal in dB scale. If center=True, pads the signal so each frame is centered around its FFT window, with pad_mode controlling the padding style.

  • reconstruct_waveform(magnitude, phase, hop_length=512, win_length=None, center=False): Reconstructs the waveform from magnitude and phase spectrograms, with the option to trim padding when center=True.

Example

from sappl.transform import stft, istft, magphase, compute_mel_spectrogram, reconstruct_waveform
from sappl.io import load_audio

# Load audio data
audio_data = load_audio("path/to/audio_file.wav", sample_rate=16000, mono=True)

# Compute STFT with center=False (no padding) and default hop length and FFT size
stft_matrix = stft(audio_data, center=False)
print("STFT shape:", stft_matrix.shape)

# Separate magnitude and phase
magnitude, phase = magphase(stft_matrix)
print("Magnitude shape:", magnitude.shape)
print("Phase shape:", phase.shape)

# Compute Mel spectrogram with center=False to avoid padding
mel_spec = compute_mel_spectrogram(audio_data, sample_rate=16000, center=False)
print("Mel spectrogram shape:", mel_spec.shape)

# Reconstruct waveform from magnitude and phase with center=False
reconstructed_audio = reconstruct_waveform(magnitude, phase, hop_length=512, center=False)
print("Reconstructed audio shape:", reconstructed_audio.shape)

3. utils.py - Utility Functions

Provides utilities for handling audio data, such as padding, truncation, mono conversion, and normalization methods.

Functions

  • convert_to_mono(audio): Converts stereo audio to mono by averaging channels.
  • pad_audio(audio, max_length, sample_rate, padding_value=0.0): Pads audio to a specified duration.
  • truncate_audio(audio, max_length, sample_rate): Truncates audio to a specified duration.
  • normalize(audio, method="min_max"): Normalizes audio using min_max, standard, peak, or rms methods.
  • rms_normalize(audio, target_db=-20.0): Normalizes audio to a target RMS level in dB.

Example

from sappl.utils import convert_to_mono, pad_audio, truncate_audio, normalize, rms_normalize

# Convert to mono
mono_audio = convert_to_mono(audio_data)
print("Mono audio shape:", mono_audio.shape)

# Pad to 5 seconds
padded_audio = pad_audio(mono_audio, max_length=5.0, sample_rate=16000)
print("Padded audio shape:", padded_audio.shape)

# Normalize using peak normalization
peak_normalized = normalize(mono_audio, method="peak")
print("Peak-normalized min/max:", peak_normalized.min(), peak_normalized.max())

# RMS normalize to -20 dB
rms_normalized = rms_normalize(mono_audio, target_db=-20.0)
print("RMS-normalized min/max:", rms_normalized.min(), rms_normalized.max())

4. feature_extraction.py - Feature Extraction

Extracts key audio features useful for machine learning tasks, such as MFCC, Chroma, Tonnetz, Zero-Crossing Rate, and Spectral Contrast.

Functions

  • extract_mfcc: Computes Mel-Frequency Cepstral Coefficients (MFCCs).
  • extract_chroma: Extracts chroma features.
  • extract_tonnetz: Extracts tonal centroid features.
  • extract_zero_crossing_rate: Computes the zero-crossing rate.
  • extract_spectral_contrast: Computes spectral contrast, capturing peak-valley differences in energy across frequency bands.

Example

from sappl.feature_extraction import extract_mfcc, extract_chroma, extract_tonnetz, extract_zero_crossing_rate, extract_spectral_contrast

# Extract MFCCs
mfcc = extract_mfcc(audio_data, sample_rate=16000)
print("MFCC shape:", mfcc.shape)

# Extract Chroma features
chroma = extract_chroma(audio_data, sample_rate=16000)
print("Chroma shape:", chroma.shape)

# Extract Tonnetz features
tonnetz = extract_tonnetz(audio_data, sample_rate=16000)
print("Tonnetz shape:", tonnetz.shape)

5. processor.py - Audio Processor Class

The AudioProcessor class provides centralized access to all core functions in SAPPL, acting as an adaptable interface for loading, saving, transforming, and extracting features from audio files. It simplifies audio processing by allowing users to set processing parameters once at initialization (e.g., n_fft, hop_length, n_mels), reducing the need to pass them repeatedly for each function call.

Example

from sappl.processor import AudioProcessor

# Initialize AudioProcessor with default parameters, setting them once
processor = AudioProcessor(sample_rate=16000, max_length=5.0, n_fft=1024, hop_length=256, n_mels=40)

# Load and preprocess audio
audio = processor.load_audio("path/to/file.wav")
audio_mono = processor.convert_to_mono(audio)
normalized_audio = processor.normalize(audio_mono, method="peak")

# Extract MFCC features
mfcc_features = processor.extract_mfcc(normalized_audio)
print("MFCC features shape:", mfcc_features.shape)

# Compute STFT and Mel Spectrogram
stft_matrix = processor.stft(normalized_audio)
mel_spectrogram = processor.compute_mel_spectrogram(normalized_audio)
print("Mel Spectrogram shape:", mel_spectrogram.shape)

# Reconstruct waveform from magnitude and phase
magnitude, phase = processor.magphase(stft_matrix)
reconstructed_audio = processor.reconstruct_waveform(magnitude, phase)
print("Reconstructed audio shape:", reconstructed_audio.shape)

🚀 Current Development

SAPPL is continuously expanding to include:

  • Augmentation: Adding pitch shifting, time-stretching, and other augmentation utilities.
  • Filtering: Implementing audio filters like band-pass and noise reduction for robust preprocessing.

📄 License

MIT License


SAPPL is built to streamline audio preprocessing for deep learning, with flexibility, consistency, and ease of use at its core. Happy coding! 🎉


❓ FAQ

1. Is SAPPL just a wrapper around librosa?

No. While SAPPL leverages librosa for many underlying functions, it provides additional features, modularity, and flexibility designed specifically for deep learning applications. SAPPL centralizes preprocessing functions, supports both NumPy arrays and PyTorch tensors, enforces consistent (T, F) output for deep learning, and includes custom utilities such as padding, truncation, and normalization. SAPPL is more than a wrapper; it is a specialized framework for streamlined and robust audio processing.

2. Does SAPPL support PyTorch Tensors natively?

Yes. SAPPL is designed to handle both NumPy arrays and PyTorch tensors. When a PyTorch tensor is passed, functions convert it to a NumPy array internally, so no manual conversion is required. This lets SAPPL fit seamlessly into PyTorch-based workflows.
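
For illustration, a minimal sketch assuming normalize accepts a tensor directly, as described above (the random waveform is just a placeholder signal):

import torch
from sappl.utils import normalize

# One second of random 16 kHz audio as a PyTorch tensor (placeholder signal)
waveform = torch.randn(16000)

# SAPPL converts the tensor to a NumPy array internally, so no manual .numpy() call is needed
normalized = normalize(waveform, method="peak")
print(type(normalized), normalized.shape)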

3. Can I use custom functions with the AudioProcessor class?

Absolutely! The AudioProcessor class includes an add_custom_function method that allows you to add any callable function dynamically. This feature lets you expand the functionality of SAPPL by adding new processing steps or custom feature extractors without modifying the core library.
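
A minimal sketch of the idea: the pre-emphasis function below is a hypothetical custom step, and the exact registration and call pattern of add_custom_function may differ from what is shown here.

import numpy as np
from sappl.processor import AudioProcessor

processor = AudioProcessor(sample_rate=16000)
audio_data = processor.load_audio("path/to/file.wav")

# Hypothetical custom step: a simple pre-emphasis filter
def pre_emphasis(audio, coeff=0.97):
    return np.append(audio[0], audio[1:] - coeff * audio[:-1])

# Register the callable (add_custom_function is documented; the call pattern below is assumed)
processor.add_custom_function(pre_emphasis)
emphasized = processor.pre_emphasis(audio_data)
print("Pre-emphasized audio shape:", emphasized.shape)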

4. How can I contribute to SAPPL?

Contributions are welcome, especially as SAPPL continues to grow. To contribute, please fork the repository, create a new branch for your feature or bug fix, and submit a pull request. For substantial changes, feel free to open an issue first to discuss it. Please ensure your code is well-documented and tested.

5. What features are planned for future versions of SAPPL?

SAPPL aims to include:

  • Data Augmentation: Options for pitch shifting, time-stretching, and adding noise to support robust training.
  • Filtering Options: Basic filtering utilities such as band-pass filters and noise reduction techniques.
  • More Feature Extraction: Additional features commonly used in speech and audio analysis.

Stay tuned and check our GitHub repository for updates on new releases!


