A fast, accurate Tempo Predictor
DeepRhythm: High-Speed Tempo Prediction
DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction for modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, Raspbian).
Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM), drastically reducing computation time by avoiding the usual bottlenecks encountered in feature extraction.
Classification Process
- Split input audio into 8 second clips [len_batch, len_audio]
- Compute the HCQM of each clip
  - Compute STFT [len_batch, stft_bands, len_audio/hop]
  - Sum STFT bins into 8 log-spaced bands using filter matrix [len_batch, 8, len_audio/hop]
  - Flatten bands for parallel CQT processing [len_batch*8, len_audio/hop]
  - For each of the six harmonics, compute the CQT [6, len_batch*8, num_cqt_bins]
  - Reshape [len_batch, num_cqt_bins, 8, 6]
- Feed HCQM through CNN [len_batch, num_classes (256)]
- Softmax the outputs to get probabilities
- Choose the class with the highest probability and convert to bpm (bpms = [len_batch])
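The first and last steps of the process above can be sketched in plain Python. This is only an illustration, not the library's code: the helper names are hypothetical, dropping the trailing partial clip is an assumption, and the linear class-to-BPM mapping (`min_bpm`, `bpm_step`) is a placeholder, since the actual mapping is not given here.

```python
import math

CLIP_SECONDS = 8  # clip length stated in the steps above


def split_into_clips(samples, sample_rate, clip_seconds=CLIP_SECONDS):
    """Split a 1-D sequence of samples into full-length clips.

    Dropping the trailing partial clip is an assumption of this sketch.
    """
    clip_len = clip_seconds * sample_rate
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def softmax(logits):
    """Numerically stable softmax over one row of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_to_bpm(class_index, min_bpm=30.0, bpm_step=1.0):
    """Hypothetical linear class-to-BPM mapping (not taken from the library)."""
    return min_bpm + class_index * bpm_step


def predict_clip_bpms(batch_logits):
    """Return one (bpm, confidence) pair per clip.

    batch_logits has shape [len_batch, num_classes].
    """
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        # Index of the most probable class for this clip.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((class_to_bpm(best), probs[best]))
    return results
```

Here the confidence is simply the softmax probability of the winning class, which is one plausible reading of the confidence score mentioned under Usage.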
Benchmarks
| Method | Acc1 (%) | Acc2 (%) | Avg. Time (s) | Total Time (s) |
|---|---|---|---|---|
| DeepRhythm (cuda) | 95.91 | 96.54 | 0.021 | 20.11 |
| DeepRhythm (cpu) | 95.91 | 96.54 | 0.12 | 115.02 |
| TempoCNN (cnn) | 84.78 | 97.69 | 1.21 | 1150.43 |
| TempoCNN (fcn) | 83.53 | 96.54 | 1.19 | 1131.51 |
| Essentia (multifeature) | 87.93 | 97.48 | 2.72 | 2595.64 |
| Essentia (percival) | 85.83 | 95.07 | 1.35 | 1289.62 |
| Essentia (degara) | 86.46 | 97.17 | 1.38 | 1310.69 |
| Librosa | 66.84 | 75.13 | 0.48 | 460.52 |
- Tested on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
- Acc1 = Prediction within +/- 2% of actual bpm
- Acc2 = Prediction within +/- 2% of actual bpm or a multiple (e.g. 120 ~= 60)
- Timed from filepath in to bpm out (audio loading, feature extraction, model inference)
- I could only get TempoCNN to run on CPU (it requires CUDA 10)
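For reference, the two accuracy measures defined above could be computed roughly as follows. This is a sketch, not the benchmark script: the function names are hypothetical, and the set of factors accepted as a "multiple" for Acc2 (2, 3, 1/2, 1/3) is an assumption based on the common definition of this metric.

```python
# Factors accepted as a tempo "multiple" for Acc2. This set is an assumption.
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def within_tolerance(predicted, reference, tolerance=0.02):
    """True if predicted is within +/- tolerance (as a fraction) of reference."""
    return abs(predicted - reference) <= tolerance * reference


def is_acc1(predicted, reference):
    """Acc1: prediction within +/- 2% of the actual bpm."""
    return within_tolerance(predicted, reference)


def is_acc2(predicted, reference, factors=OCTAVE_FACTORS):
    """Acc2: prediction within +/- 2% of the actual bpm or of a multiple of it."""
    return any(within_tolerance(predicted, reference * f) for f in factors)


def accuracy(pairs, hit):
    """Percentage of (predicted, reference) pairs for which hit(...) is true."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return 100.0 * sum(hit(p, r) for p, r in pairs) / len(pairs)
```

For example, a prediction of 60 for a 120 bpm track counts toward Acc2 but not Acc1.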
Installation
To install DeepRhythm, ensure you have Python and pip installed. Then run:
pip install deeprhythm
Usage
CLI Inference
Single
python -m deeprhythm.infer /path/to/song.wav -cq
> ([bpm], [confidence])
Flags:
- -c, --conf - include confidence scores
- -d, --device [cuda/cpu/mps] - specify model device
- -q, --quiet - prints only bpm/conf
Batch
To predict the tempo of all songs in a directory, run
python -m deeprhythm.batch_infer /path/to/dir
This will create a jsonl file mapping each filepath to its predicted BPM.
Flags:
- -o output_path.jsonl - provide a custom output path (default 'batch_results.jsonl')
- -c, --conf - include confidence scores
- -d, --device [cuda/cpu/mps] - specify model device
- -q, --quiet - doesn't print status / logs
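The resulting jsonl file can be read back with the standard library. The sketch below only assumes that each non-empty line is one JSON value; the exact keys of each record are not documented here, so inspect a record before relying on any field name. `read_jsonl` is a hypothetical helper, not part of DeepRhythm.

```python
import json


def read_jsonl(path):
    """Parse a JSON Lines file into a list, one entry per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Typical use would be `records = read_jsonl("batch_results.jsonl")` followed by printing `records[0]` to see which fields are present.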
Python Inference
To predict the tempo of a song:
from deeprhythm import DeepRhythmPredictor
model = DeepRhythmPredictor()
tempo = model.predict('path/to/song.mp3')
# to include confidence
tempo, confidence = model.predict('path/to/song.mp3', include_confidence=True)
print(f"Predicted Tempo: {tempo} BPM")
Audio is loaded with librosa, which supports most audio formats.
If you have already loaded your audio with librosa, for example to carry out pre-processing steps, you can predict the tempo in the following way:
import librosa
from deeprhythm import DeepRhythmPredictor
model = DeepRhythmPredictor()
audio, sr = librosa.load('path/to/song.mp3')
# ... other steps for processing the audio ...
tempo = model.predict_from_audio(audio, sr)
# to include confidence
tempo, confidence = model.predict_from_audio(audio, sr, include_confidence=True)
print(f"Predicted Tempo: {tempo} BPM")
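When predicting several files from Python, it is usually worth creating the model once and reusing it. The helper below is a small sketch of that pattern: `predict_many` is a hypothetical name, and it relies only on the `predict(path)` method shown above. Errors are caught broadly because the exceptions the library may raise are not documented here.

```python
def predict_many(model, paths):
    """Predict a tempo for each path, reusing one already-constructed model.

    `model` is any object with a `predict(path)` method, such as a
    DeepRhythmPredictor instance. Failures are collected per file instead
    of aborting the whole run.
    """
    tempos, errors = {}, {}
    for path in paths:
        try:
            tempos[path] = model.predict(path)
        except Exception as exc:  # broad on purpose: error types are an unknown here
            errors[path] = str(exc)
    return tempos, errors
```

With the model from the examples above, this would be called as `tempos, errors = predict_many(model, ['a.mp3', 'b.wav'])`.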
References
[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.
[2] K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.