Ultra-fast, CPU-only classical audio feature extraction (BPM, Key, Loudness, Energy, Duration)

@ohnrshyp/dsp

Ultra-fast, CPU-only classical audio feature extraction library.

This module provides fast, lightweight audio signal processing utilities. It runs on standard CPU hardware and requires neither a GPU nor deep learning models, which makes it well suited to high-throughput batch upload pipelines.


🚀 Key Features

  • ⏱️ BPM & Tempo Tracking: Identifies the audio tempo (beats per minute) alongside a temporal stability-based confidence score.
  • 🎹 Musical Key & Scale Detection: Recognizes the global pitch center and scale mode (Major/Minor) using the Krumhansl-Schmuckler chromagram correlation algorithm.
  • 🔊 RMS Energy & Loudness: Computes integrated root-mean-square (RMS) energy and estimated loudness levels in decibels (dB).
  • 📊 Dynamic Range: Calculates the dB spread between the 95th and 10th percentiles of frame-wise RMS energy.
  • 💃 Danceability Estimator: Computes a composite danceability index based on tempo stability and acoustic energy distribution.
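
The RMS, loudness, and dynamic range features listed above can be illustrated with a minimal, standard-library-only sketch. The function names, the frame and hop sizes, and the percentile interpolation are assumptions chosen for illustration; the package's own engine uses Librosa and NumPy and may differ in every detail.

```python
import math

def frame_rms(samples, frame_length=2048, hop_length=512):
    """Return the RMS value of each frame of a mono signal (hypothetical helper)."""
    if not samples:
        raise ValueError("samples must not be empty")
    values = []
    last_start = max(1, len(samples) - frame_length + 1)
    for start in range(0, last_start, hop_length):
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values

def to_db(value, floor=1e-10):
    """Convert a linear amplitude to decibels, clamping to avoid log(0)."""
    return 20.0 * math.log10(max(value, floor))

def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def dynamic_range_db(samples, frame_length=2048, hop_length=512):
    """dB spread between the 95th and 10th percentiles of frame-wise RMS."""
    rms = frame_rms(samples, frame_length, hop_length)
    return to_db(percentile(rms, 95)) - to_db(percentile(rms, 10))

# A quiet passage followed by a passage ten times louder in amplitude.
quiet = [0.1 if i % 2 == 0 else -0.1 for i in range(8192)]
loud = [1.0 if i % 2 == 0 else -1.0 for i in range(8192)]
spread = dynamic_range_db(quiet + loud)
```

A tenfold amplitude difference corresponds to 20 dB, so `spread` lands at roughly 20 for this synthetic signal.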

🧬 Architectural & Mathematical Design

The core processing pipeline bridges a Node.js child-process wrapper and a high-performance Python DSP engine utilizing Librosa and NumPy.

1. Tempo (BPM) & Confidence

The BPM estimation pipeline first computes the onset strength envelope of the input signal: $$\text{Onset Strength}(t) = \sum_{f} \max(0, S(f, t) - S(f, t-1))$$ where $S(f, t)$ is the log-mel spectrogram at frequency band $f$ and frame $t$.

It then computes a Fourier-based tempogram (or, alternatively, an autocorrelation) of that envelope: $$\text{Tempogram}(\tau, t) = \sum_{n} \text{Onset Strength}(n) \cdot W(n - t) \cdot e^{-j 2\pi \tau n}$$ where $W$ is a window function centered on frame $t$ and $\tau$ is the candidate tempo frequency.

The dominant peak yields the primary BPM. The confidence score measures the dominance of the selected tempo frequency relative to the average spectral energy across the tempogram.
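
The two formulas above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: `onset_strength` and `estimate_bpm` are hypothetical names, the window $W$ is taken to be constant over the whole signal (a single global tempogram column), candidates are integer BPM values from 60 to 200, and the peak-to-average ratio is only one possible reading of the confidence description.

```python
import cmath
import math

def onset_strength(spectrogram):
    """Half-wave rectified spectral flux.

    spectrogram[t][f] holds the log-magnitude of band f at frame t.
    """
    envelope = [0.0]  # no previous frame to compare against at t = 0
    for prev, cur in zip(spectrogram, spectrogram[1:]):
        envelope.append(sum(max(0.0, c - p) for c, p in zip(cur, prev)))
    return envelope

def estimate_bpm(envelope, frame_rate, bpm_min=60, bpm_max=200):
    """Pick the tempo whose Fourier coefficient over the envelope is largest.

    Returns (bpm, dominance), where dominance is the peak magnitude divided
    by the average magnitude over all candidates (an assumed confidence proxy).
    """
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]  # remove the DC component
    magnitudes = {}
    for bpm in range(bpm_min, bpm_max + 1):
        cycles_per_frame = bpm / 60.0 / frame_rate
        acc = sum(
            v * cmath.exp(-2j * math.pi * cycles_per_frame * n)
            for n, v in enumerate(centered)
        )
        magnitudes[bpm] = abs(acc)
    best = max(magnitudes, key=magnitudes.get)
    average = sum(magnitudes.values()) / len(magnitudes)
    dominance = magnitudes[best] / average if average > 0 else 0.0
    return best, dominance

# Synthetic input: 10 s at 100 frames/s with a burst every 50 frames (2 beats/s).
frames = [[1.0] * 4 if t % 50 == 25 else [0.0] * 4 for t in range(1000)]
bpm, dominance = estimate_bpm(onset_strength(frames), frame_rate=100)
```

With one burst every half second, the estimator selects 120 BPM. Real onset envelopes are far noisier, which is why a windowed tempogram and a stability-based confidence are used in practice.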

2. Krumhansl-Schmuckler Key Detection

A Constant-Q Transform (CQT) converts the audio signal into a 12-bin chromagram (pitch class profile). Averaging the energy across time yields a 12-dimensional chroma vector $C$. This vector is normalized and rotated through all 12 semitone shifts, and each rotation is correlated against the major and minor key templates used by the Krumhansl-Schmuckler algorithm (the Krumhansl-Kessler probe-tone profiles):

  • Major Profile: [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
  • Minor Profile: [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

For each rotation $i$ and each template $T$, the Pearson correlation coefficient $r_i$ is computed over the 12 pitch classes $k$: $$r_i = \frac{\sum_{k} (C_i[k] - \bar{C}_i)(T[k] - \bar{T})}{\sqrt{\sum_{k} (C_i[k] - \bar{C}_i)^2 \sum_{k} (T[k] - \bar{T})^2}}$$ where $C_i$ is the chroma vector rotated by $i$ semitones. The rotation and template that maximize $r_i$ define the musical key and scale mode. The confidence is this correlation normalized to the $[0, 1]$ range.
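
The rotate-and-correlate step can be sketched with the two profiles listed above. The names `pearson` and `detect_key` are hypothetical, the chroma vector is assumed to start at pitch class C, and mapping the correlation to $[0, 1]$ as $(r + 1) / 2$ is an assumption, since the exact normalization is not specified here.

```python
import math

MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def detect_key(chroma):
    """Return the best-matching key for a 12-bin chroma vector (index 0 = C)."""
    chroma = list(chroma)
    best = None
    for tonic in range(12):
        # Rotate so the candidate tonic lines up with index 0 of the profile.
        rotated = chroma[tonic:] + chroma[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            r = pearson(rotated, profile)
            if best is None or r > best[0]:
                best = (r, NOTES[tonic], mode)
    r, key, mode = best
    return {
        "value": f"{key} {mode}",
        "key": key,
        "mode": mode,
        "confidence": (r + 1.0) / 2.0,  # assumed mapping from [-1, 1] to [0, 1]
    }

# Synthetic chroma: the minor profile placed so that its tonic falls on A (index 9).
a_minor_chroma = [0.0] * 12
for k, weight in enumerate(MINOR):
    a_minor_chroma[(9 + k) % 12] = weight
result = detect_key(a_minor_chroma)
```

Because the synthetic chroma is an exact copy of the minor profile shifted to A, the correlation for that rotation is 1 and the sketch reports A minor. Pearson correlation is invariant to scaling and offset, so the normalization of $C$ does not change which rotation wins.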


📦 Installation

Node.js (NPM Package)

npm install @ohnrshyp/dsp

Python (PyPI Package)

pip install orbit-dsp

Host Dependencies

This package delegates core DSP tasks to a local Python executable. Ensure Python 3.8+ is installed on the host along with the required libraries:

pip install librosa numpy scipy

🛠️ Node.js API Reference

analyze(input, [options])

Performs comprehensive acoustic analysis of the input audio source.

  • Parameters:

    • input (Buffer | string): Raw binary buffer or absolute path to the target audio file.
    • options (Object, optional):
      • maxLength (number): Limit processing to the first $N$ seconds of the file. Default is 120.
      • stemsDir (string | null): Optional path to a directory containing separated stems (e.g. Demucs vocal/bass/other stems), which are used to improve key detection.
      • verbose (boolean): Enable diagnostic logging. Default is false.
  • Returns: Promise<Object> containing the following schema:

    {
      "bpm": {
        "value": 128.0,
        "confidence": 0.8415
      },
      "key": {
        "value": "A minor",
        "key": "A",
        "mode": "minor",
        "confidence": 0.7912
      },
      "energy": 0.6843,
      "loudness_db": -14.21,
      "dynamic_range_db": 11.45,
      "duration": 240.5,
      "sample_rate": 22050,
      "analyzed_length": 120.0,
      "key_detection_source": "mix_hpss",
      "processingTimeMs": 420
    }
    

calculateDanceability(analysisResult)

Estimates danceability from the extracted BPM and energy metrics using a normalized sigmoid correlation.

  • Parameters:
    • analysisResult (Object): The JSON output returned from analyze().
  • Returns: number between 0 and 1.
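
The exact formula behind calculateDanceability() is not given here. The following Python sketch shows one way a sigmoid-normalized score could combine tempo and energy; every constant in it (`target_bpm`, `bpm_width`, `steepness`, the equal weighting) and the name `danceability_sketch` are hypothetical and should not be read as the package's behavior.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def danceability_sketch(result, target_bpm=120.0, bpm_width=40.0, steepness=6.0):
    """Illustrative score in (0, 1) built from an analyze()-shaped result."""
    bpm = result["bpm"]["value"]
    confidence = result["bpm"]["confidence"]
    energy = result["energy"]
    # Closeness of the tempo to an assumed "danceable" target, in (0, 1].
    tempo_fit = math.exp(-((bpm - target_bpm) / bpm_width) ** 2)
    # Equal-weight blend of a confidence-scaled tempo term and the energy.
    raw = 0.5 * tempo_fit * confidence + 0.5 * energy
    # Squash around the midpoint so the score stays strictly inside (0, 1).
    return sigmoid(steepness * (raw - 0.5))

sample = {"bpm": {"value": 128.0, "confidence": 0.8415}, "energy": 0.6843}
score = danceability_sketch(sample)
quieter = danceability_sketch({**sample, "energy": 0.2})
```

In this sketch a lower energy value with the same tempo always produces a lower score, which matches the intuition that both tempo stability and energy contribute.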

checkPythonEnvironment()

Verifies that the Python binary and dependencies (librosa, numpy) are available and operational.

  • Returns: Promise<Object>:
    {
      "available": true,
      "message": "Python environment ready for audio DSP analysis",
      "details": {
        "pythonVersion": "Python 3.10.8",
        "packages": ["librosa", "numpy"]
      }
    }
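
On the Python side, a check of this kind can be approximated with the standard library alone. This is a sketch that mirrors the object shape shown above; `check_environment` is a hypothetical name and the failure message is an assumption, not the wording the package emits.

```python
import importlib.util
import platform

def check_environment(required=("librosa", "numpy")):
    """Report which required top-level packages can be imported."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    found = [name for name in required if name not in missing]
    available = not missing
    if available:
        message = "Python environment ready for audio DSP analysis"
    else:
        # Assumed wording for the failure case.
        message = "Missing Python packages: " + ", ".join(missing)
    return {
        "available": available,
        "message": message,
        "details": {
            "pythonVersion": "Python " + platform.python_version(),
            "packages": found,
        },
    }

status = check_environment()
```

`importlib.util.find_spec` locates a module without importing it, so the check stays cheap even though importing Librosa itself is comparatively slow.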
    

💻 Code Examples

Analyzing raw audio buffer and determining danceability

const dsp = require('@ohnrshyp/dsp');
const fs = require('fs');

async function run() {
  // Check host environment first
  const env = await dsp.checkPythonEnvironment();
  if (!env.available) {
    console.error('Environment check failed:', env.message);
    // Print an install hint only when the check result actually provides one
    if (env.details?.install) console.log('Please run:', env.details.install);
    return;
  }

  const audioBuffer = fs.readFileSync('house-track.mp3');
  
  try {
    const analysis = await dsp.analyze(audioBuffer, {
      maxLength: 90, // process only first 90s for ultra-fast response
      verbose: true
    });

    const danceScore = dsp.calculateDanceability(analysis);

    console.log(`BPM: ${analysis.bpm.value} (Confidence: ${analysis.bpm.confidence})`);
    console.log(`Key: ${analysis.key.value}`);
    console.log(`Loudness: ${analysis.loudness_db} dB`);
    console.log(`Estimated Danceability: ${(danceScore * 100).toFixed(1)}%`);
  } catch (error) {
    console.error('Analysis failed:', error.message);
  }
}

run();

📄 License

Licensed under the Apache License, Version 2.0 (the "License"). See LICENSE in the project root for details.
