Kapre: Keras Audio Preprocessors. Tensorflow.Keras layers for audio pre-processing in deep learning

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.

Tested on Python 3.9+ with TensorFlow 2.16-2.20. Includes type hints for a better development experience.

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and more consistent
  • Your code and model have fewer dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc.), which is (trust me) trickier than you think.
  • Kapre layers extend the default tf.signal implementation with some additional APIs, such as:
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add a Kapre layer, e.g. kapre.time_frequency.STFT(), as the first layer of the model.
  3. The data loader simply loads audio signals and feeds them into the model.
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!
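Step 4 can be sketched as a small grid of DSP settings. This is only an illustration: the helper name dsp_search_space and the choice to tie win_length to n_fft and derive hop_length from a ratio are assumptions, not part of Kapre. Each resulting dict uses the STFT keyword names n_fft, win_length and hop_length.

```python
from itertools import product


def dsp_search_space(n_ffts, hop_ratios):
    """Enumerate candidate STFT settings (hypothetical helper, not a Kapre API).

    Assumes win_length equals n_fft and hop_length is n_fft divided by a ratio.
    """
    configs = []
    for n_fft, ratio in product(n_ffts, hop_ratios):
        configs.append(
            {"n_fft": n_fft, "win_length": n_fft, "hop_length": n_fft // ratio}
        )
    return configs


configs = dsp_search_space(n_ffts=(512, 1024, 2048), hop_ratios=(2, 4))
for cfg in configs:
    # Each cfg could be passed to the STFT layer as STFT(**cfg, ...)
    print(cfg)
```

Each configuration would then be trained and scored like any other hyperparameter setting.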

Installation

pip install kapre

Development

Kapre includes comprehensive type hints for better IDE support and development experience.

Type Checking

Run type checking with our included script:

python scripts/check_types.py

Or use your preferred type checker:

# With mypy
pip install mypy
mypy kapre/

# With pyright
pip install pyright
pyright kapre/

Development Setup

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run type checking
python scripts/check_types.py

# Format code
black kapre/ tests/

# Lint code
flake8 kapre/ tests/

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!) of a 1-second audio signal at 44.1 kHz, as an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
model.add(Input(shape=input_shape))
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2048, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last'))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!
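
To see what the convolutional layers receive, the spectrogram shape can be estimated by hand. The sketch below assumes the usual framing rule of an unpadded STFT (pad_end=False, no padding at the start) and a channels_last output ordered as (time, frequency, channel). The helper stft_output_shape is hypothetical, so confirm the result with model.summary().

```python
def stft_output_shape(n_samples, n_channels, n_fft, win_length, hop_length):
    """Estimate the per-example STFT output shape (hypothetical helper).

    Assumes no padding and n_samples >= win_length.
    """
    # Number of full windows that fit when sliding by hop_length.
    n_frames = 1 + (n_samples - win_length) // hop_length
    # A real-valued FFT keeps the non-negative frequency bins only.
    n_freq_bins = n_fft // 2 + 1
    return (n_frames, n_freq_bins, n_channels)


shape = stft_output_shape(
    n_samples=44100, n_channels=6, n_fft=2048, win_length=2048, hop_length=1024
)
print(shape)  # (42, 1025, 6)
```

Under these assumptions, the settings in the example above give 42 frames of 1025 frequency bins for each of the 6 channels.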

TFLite compatibility

The STFT layer is not TFLite compatible (due to tf.signal.stft). To create a TFLite-compatible model, first train using the normal Kapre layers, then create a new model that replaces STFT and Magnitude with STFTTflite and MagnitudeTflite. TFLite-compatible layers are restricted to a batch size of 1, which prevents their use during training.

# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()
model_tflite.add(Input(shape=input_shape))

model_tflite.add(STFTTflite(n_fft=2048, win_length=2048, hop_length=1024,
                            window_name=None, pad_end=False,
                            input_data_format='channels_last', output_data_format='channels_last'))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# load the trained weights into the tflite compatible model.
model_tflite.set_weights(model.get_weights())
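
Once the weights are copied, model_tflite could be exported with TensorFlow's standard converter. This is a minimal sketch that assumes TensorFlow is installed. The helper name convert_to_tflite and the output path are hypothetical, and this step is not something Kapre itself provides.

```python
from pathlib import Path


def convert_to_tflite(keras_model, output_path="model.tflite"):
    """Convert a Keras model to a TFLite flatbuffer and save it (hypothetical helper)."""
    # Imported lazily so the helper can be defined without TensorFlow loaded.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    tflite_bytes = converter.convert()
    Path(output_path).write_bytes(tflite_bytes)
    return tflite_bytes


# Usage (assumed), after model_tflite.set_weights(model.get_weights()):
# convert_to_tflite(model_tflite, "kapre_model.tflite")
```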

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}
