A fast, accurate Tempo Predictor

DeepRhythm: High-Speed Tempo Prediction

Introduction

DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction for modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, and Raspbian).

HCQM

(reworded from “Deep-Rhythm for Global Tempo Estimation in Music”, by Foroughmand and Peeters [1].)

The Constant Q Transform (CQT) is a tool used to analyze sound frequencies over time. It breaks down the frequency spectrum into bins that are spaced logarithmically, meaning they're closer together at low frequencies and wider apart at high frequencies. This aligns with how we hear sounds, making it great for music analysis as it captures details of pitches and notes very precisely.
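To make the logarithmic spacing concrete, here is a minimal sketch that computes CQT bin center frequencies from the usual geometric rule, f_k = f_min * 2^(k / bins_per_octave). The function name and the values used (50 Hz lowest bin, 84 bins, 12 bins per octave) are illustrative assumptions, not DeepRhythm's settings.

```python
def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Center frequency of bin k is f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Illustrative values only: 50 Hz lowest bin, 12 bins per octave, 7 octaves.
freqs = cqt_center_frequencies(f_min=50.0, n_bins=84, bins_per_octave=12)

low_gap = freqs[1] - freqs[0]     # distance between the two lowest bins, in Hz
high_gap = freqs[-1] - freqs[-2]  # distance between the two highest bins, in Hz

print(f"lowest gap: {low_gap:.2f} Hz, highest gap: {high_gap:.2f} Hz")
print(f"ratio between neighbouring bins: {freqs[1] / freqs[0]:.4f}")
```

The ratio between neighbouring bins stays constant while the gap in Hz grows with frequency, which is the "closer together at low frequencies" behaviour described above.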

It is normally performed with a hop length of around 10-25 ms (the window size varies by frequency) and 80-120 bins (covering roughly 50 Hz to 5 kHz), which results in a solid melodic representation of the given audio.
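As a rough sanity check on those numbers, the sketch below converts a hop length in milliseconds to samples and works out how many bins per octave 80-120 bins give over a 50 Hz to 5 kHz range. The 44.1 kHz sample rate and both helper names are assumptions made for the example.

```python
import math


def hop_samples(hop_ms, sample_rate):
    """Hop length in samples for a hop given in milliseconds."""
    return round(hop_ms * sample_rate / 1000)


def bins_per_octave(n_bins, f_min, f_max):
    """Average number of bins per octave when n_bins span f_min..f_max."""
    octaves = math.log2(f_max / f_min)
    return n_bins / octaves


sample_rate = 44100  # assumed sample rate, not taken from the project

print(hop_samples(10, sample_rate), "samples per 10 ms hop")
print(f"{bins_per_octave(80, 50, 5000):.1f} bins/octave with 80 bins")
print(f"{bins_per_octave(120, 50, 5000):.1f} bins/octave with 120 bins")
```

50 Hz to 5 kHz is a little under seven octaves, so 80 bins works out to about one bin per semitone and 120 bins to about one and a half.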

With the HCQM (Harmonic Constant-Q Modulation), Foroughmand and Peeters creatively repurpose the CQT for rhythm detection. Instead of scanning a few milliseconds, they give it an 8-second window. Rather than the standard 81 bins covering 50 Hz to 1 kHz, it utilizes 256 bins tailored to span from 30 bpm to 286 bpm (approximately 0.5 Hz to 4.77 Hz). This adjustment results in a highly detailed, narrow, low-frequency window, which delineates how prevalent each potential bpm is within the track. For instance, in a song with a tempo of 120 bpm, this method would highlight spikes at 30, 60, 120 (predominantly), and 240 bpm. Each element of the song that recurs on this 8-second scale contributes to peaks in the CQT. For example, a quarter-note hi-hat would look like a continuous 2 Hz (120 bpm) tone in the transformed data.
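The "hi-hat as a 2 Hz tone" idea can be illustrated without any audio library. The sketch below is not the HCQM itself: it builds a synthetic 8-second envelope with one decaying pulse every half second and measures its plain Fourier magnitude at every integer tempo from 30 to 286 bpm. The envelope shape, the 100 Hz envelope rate, the integer bpm grid, and the function names are all assumptions chosen to keep the example small.

```python
import cmath
import math


def bpm_to_hz(bpm):
    """Beats per minute to cycles per second."""
    return bpm / 60.0


def tempo_strength(envelope, sample_rate, bpm):
    """Magnitude of the envelope's Fourier component at the given tempo."""
    step = -2j * math.pi * bpm_to_hz(bpm) / sample_rate
    return abs(sum(v * cmath.exp(step * n) for n, v in enumerate(envelope)))


# Synthetic stand-in for a quarter-note hi-hat at 120 bpm: one decaying
# pulse every 0.5 s, sampled at 100 Hz over an 8-second window.
sample_rate = 100
samples_per_beat = sample_rate // 2
envelope = [math.exp(-(n % samples_per_beat) / 5.0) for n in range(8 * sample_rate)]

# Remove the constant offset so it does not leak into the low tempo bins.
mean = sum(envelope) / len(envelope)
envelope = [v - mean for v in envelope]

# Integer bpm grid used for readability; the real transform uses 256 bins.
strengths = {bpm: tempo_strength(envelope, sample_rate, bpm) for bpm in range(30, 287)}
best = max(strengths, key=strengths.get)

print(f"strongest tempo: {best} bpm ({bpm_to_hz(best):.2f} Hz)")
```

In this toy signal the strongest component sits at 120 bpm, with a weaker one at 240 bpm. The 30 and 60 bpm spikes mentioned above come from elements that repeat every bar or half bar, which a perfectly regular pulse train does not contain.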

Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM), drastically reducing computation time by avoiding the usual bottlenecks encountered in feature extraction.
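Batch processing presupposes inputs of identical shape. One way to get there, assumed here for illustration rather than taken from DeepRhythm's code, is to cut each track into consecutive 8-second clips and drop the short remainder. The helper name and the tiny sample rate are hypothetical.

```python
def split_into_clips(samples, sample_rate, clip_seconds=8.0):
    """Cut a mono signal into consecutive, non-overlapping clips of equal length.

    A trailing remainder shorter than one clip is dropped so that every
    clip has the same number of samples and the clips can be stacked.
    """
    clip_len = int(clip_seconds * sample_rate)
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


sample_rate = 100                     # deliberately tiny, for the example only
track = [0.0] * (sample_rate * 30)    # 30 seconds of silence as placeholder audio
clips = split_into_clips(track, sample_rate)

print(len(clips), "clips of", len(clips[0]), "samples each")
```

Equal-length clips can then be stacked into a single array or tensor and transformed in one call instead of one clip at a time.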

Benchmarks

| Method                  | Acc1 (%) | Acc2 (%) | Avg. Time (s) | Total Time (s) |
|-------------------------|----------|----------|---------------|----------------|
| DeepRhythm (cuda)       | 95.91    | 96.54    | 0.021         | 20.11          |
| DeepRhythm (cpu)        | 95.91    | 96.54    | 0.12          | 115.02         |
| TempoCNN (cnn)          | 84.78    | 97.69    | 1.21          | 1150.43        |
| TempoCNN (fcn)          | 83.53    | 96.54    | 1.19          | 1131.51        |
| Essentia (multifeature) | 87.93    | 97.48    | 2.72          | 2595.64        |
| Essentia (percival)     | 85.83    | 95.07    | 1.35          | 1289.62        |
| Essentia (degara)       | 86.46    | 97.17    | 1.38          | 1310.69        |
| Librosa                 | 66.84    | 75.13    | 0.48          | 460.52         |
  • Tests run on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
  • Acc1 = Prediction within +/- 2% of actual bpm
  • Acc2 = Prediction within +/- 2% of actual bpm or of a multiple of it (e.g. 120 ~= 60)
  • Timed from filepath in to bpm out (audio loading, feature extraction, model inference)
  • I could only get TempoCNN to run on CPU (it requires CUDA 10)
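The two accuracy measures can be sketched as follows. The tolerance comes from the definitions above. The set of accepted factors for Acc2 (1/3, 1/2, 1, 2, 3) is the convention commonly used in tempo-estimation evaluation and is an assumption here, since the list above only gives the 120 ~= 60 example. Function names are hypothetical.

```python
ACC2_FACTORS = (1 / 3, 1 / 2, 1, 2, 3)  # assumed set of accepted tempo multiples


def acc1_hit(predicted, actual, tolerance=0.02):
    """True if the prediction is within +/- 2% of the actual bpm."""
    return abs(predicted - actual) <= tolerance * actual


def acc2_hit(predicted, actual, tolerance=0.02):
    """True if the prediction is within +/- 2% of the actual bpm or a multiple of it."""
    return any(acc1_hit(predicted, actual * f, tolerance) for f in ACC2_FACTORS)


def accuracy(pairs, hit):
    """Percentage of (predicted, actual) pairs that count as hits."""
    return 100 * sum(hit(p, a) for p, a in pairs) / len(pairs)


# Made-up (predicted, actual) pairs, only to exercise the functions.
pairs = [(121, 120), (60, 120), (100, 120), (128, 128)]

print("Acc1:", accuracy(pairs, acc1_hit))
print("Acc2:", accuracy(pairs, acc2_hit))
```

A prediction of 60 for a 120 bpm song misses Acc1 but counts for Acc2, which is why methods that tend to make octave errors show a large gap between the two columns.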

Installation

To install DeepRhythm, ensure you have Python and pip installed. Then run:

pip install deeprhythm

Usage

To predict the tempo of a song:

from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()
tempo = model.predict('path/to/song.mp3')
print(f"Predicted Tempo: {tempo} BPM")
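For more than one file, a small loop around `predict` is enough. The sketch below relies only on the `predict(path)` call shown above. The helper name `predict_folder` and the list of file suffixes are hypothetical, and which audio formats actually load depends on your installation.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}  # illustrative, adjust as needed


def predict_folder(predictor, folder):
    """Return {filename: bpm} for every audio file directly inside folder.

    `predictor` is any object with a predict(path) method, such as the
    DeepRhythmPredictor instance created in the usage example.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in AUDIO_SUFFIXES:
            results[path.name] = predictor.predict(str(path))
    return results
```

With the `model` from the usage example, this would be called as `predict_folder(model, 'path/to/songs')`. Creating the predictor once and reusing it avoids repeating the model setup for every song.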

References

[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.

[2] K. W. Cheuk, H. Anderson, K. Agres, and D. Herremans, “nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks”, in IEEE Access, vol. 8, pp. 161981–162003, 2020. doi: 10.1109/ACCESS.2020.3019084.
