A fast, accurate Tempo Predictor

DeepRhythm: High-Speed Tempo Prediction

DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction for modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, and Raspbian).

Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM), drastically reducing computation time by avoiding the usual bottlenecks encountered in feature extraction.

more details here

Classification Process

  1. Split input audio into 8-second clips [len_batch, len_audio]
  2. Compute the HCQM of each clip
    1. Compute STFT [len_batch, stft_bands, len_audio/hop]
    2. Sum STFT bins into 8 log-spaced bands using filter matrix [len_batch, 8, len_audio/hop]
    3. Flatten bands for parallel CQT processing [len_batch*8, len_audio/hop]
    4. For each of the six harmonics, compute the CQT [6, len_batch*8, num_cqt_bins]
    5. Reshape [len_batch, num_cqt_bins, 8, 6]
  3. Feed HCQM through CNN [len_batch, num_classes (256)]
  4. Softmax the outputs to get probabilities
  5. Choose the class with the highest probability and convert to bpm (bpms = [len_batch])
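
As a rough illustration of steps 1, 4, and 5, here is a minimal pure-Python sketch. It is not the library's code: the helper names are made up for this example, dropping the trailing partial clip is an assumption, and the class-to-bpm mapping (class 0 = 30 bpm, 1 bpm per class) is a hypothetical placeholder, since the actual mapping is not stated above.

```python
import math

CLIP_SECONDS = 8    # clip length from step 1
NUM_CLASSES = 256   # CNN output size from step 3
MIN_BPM = 30        # ASSUMPTION: hypothetical bpm assigned to class 0


def split_into_clips(audio, sr, clip_seconds=CLIP_SECONDS):
    """Cut a 1-D sample sequence into full-length clips.

    ASSUMPTION: a trailing partial clip is dropped.
    """
    clip_len = clip_seconds * sr
    n_clips = len(audio) // clip_len
    return [audio[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def softmax(logits):
    """Numerically stable softmax over one clip's class scores (step 4)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_to_bpm(index, min_bpm=MIN_BPM):
    """ASSUMPTION: classes are spaced 1 bpm apart, starting at min_bpm."""
    return min_bpm + index


def decode(logits):
    """Pick the most probable class and return (bpm, probability) (step 5)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_to_bpm(best), probs[best]
```

The probability of the winning class is what a confidence score would naturally be built from, though how per-clip results are combined into one song-level tempo is not covered by this sketch.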

Benchmarks

Method                    Acc1 (%)  Acc2 (%)  Avg. Time (s)  Total Time (s)
DeepRhythm (cuda)         95.91     96.54     0.021          20.11
DeepRhythm (cpu)          95.91     96.54     0.12           115.02
TempoCNN (cnn)            84.78     97.69     1.21           1150.43
TempoCNN (fcn)            83.53     96.54     1.19           1131.51
Essentia (multifeature)   87.93     97.48     2.72           2595.64
Essentia (percival)       85.83     95.07     1.35           1289.62
Essentia (degara)         86.46     97.17     1.38           1310.69
Librosa                   66.84     75.13     0.48           460.52

  • Test done on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
  • Acc1 = Prediction within +/- 2% of actual bpm
  • Acc2 = Prediction within +/- 2% of actual bpm or a multiple (e.g. 120 ~= 60)
  • Timed from filepath in to bpm out (audio loading, feature extraction, model inference)
  • I could only get TempoCNN to run on CPU (it requires CUDA 10)

Installation

To install DeepRhythm, ensure you have Python and pip installed. Then run:

pip install deeprhythm

Usage

CLI Inference

Single

python -m deeprhythm.infer /path/to/song.wav -cq
> ([bpm], [confidence])

Flags:

  • -c, --conf - include confidence scores
  • -d, --device [cuda/cpu/mps] - specify model device
  • -q, --quiet - prints only bpm/conf

Batch

To predict the tempo of all songs in a directory, run

python -m deeprhythm.batch_infer /path/to/dir

This will create a JSONL file mapping each filepath to its predicted BPM.

Flags:

  • -o output_path.jsonl - provide a custom output path (default 'batch_results.jsonl')
  • -c, --conf - include confidence scores
  • -d, --device [cuda/cpu/mps] - specify model device
  • -q, --quiet - doesn't print status / logs
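
To work with the batch results afterwards, the output file can be read back one line at a time. The sketch below is a generic JSONL reader; `read_jsonl` is a helper name made up for this example, and it deliberately makes no assumption about the field names in each record, since they are not documented above.

```python
import json


def read_jsonl(path):
    """Parse one JSON value per non-empty line and return them as a list."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Calling `read_jsonl('batch_results.jsonl')` and printing the first record is a quick way to see which keys a given version writes.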

Python Inference

To predict the tempo of a song:

from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

tempo = model.predict('path/to/song.mp3')

# to include confidence
tempo, confidence = model.predict('path/to/song.mp3', include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")

Audio is loaded with librosa, which supports most audio formats.

If you have already loaded your audio with librosa, for example to carry out pre-processing steps, you can predict the tempo in the following way:

import librosa
from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

audio, sr = librosa.load('path/to/song.mp3')

# ... other steps for processing the audio ...

tempo = model.predict_from_audio(audio, sr)

# to include confidence
tempo, confidence = model.predict_from_audio(audio, sr, include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")
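
If you have several files to process from Python, it is reasonable to create the predictor once and reuse it. The helper below is a small sketch, not part of the package: `predict_many` and the result keys `bpm`, `confidence`, and `error` are names chosen for this example. It relies only on `predict(path, include_confidence=True)` as shown above, and catching every exception per file is a convenience assumption rather than documented behavior.

```python
def predict_many(model, paths):
    """Run model.predict on each path, collecting results and per-file errors.

    `model` is any object with a predict(path, include_confidence=True)
    method returning (tempo, confidence).
    """
    results = {}
    for path in paths:
        try:
            tempo, confidence = model.predict(path, include_confidence=True)
        except Exception as exc:  # e.g. an unreadable or unsupported file
            results[path] = {"error": str(exc)}
            continue
        results[path] = {"bpm": tempo, "confidence": confidence}
    return results
```

Pass a `DeepRhythmPredictor()` instance as `model` along with a list of file paths. For whole directories, the batch CLI described earlier is likely the faster route, since it is built for batch processing.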

References

[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.

[2] K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.
