
TensorFlow generation of SoX-style spectrograms on the GPU

Project description

sox_tensorflow

TensorFlow implementation of SoX-style spectrogram generation that uses TensorFlow operations for GPU acceleration.


Our analysis shows:

  • On average, 99.81% of pixels in sox_tensorflow spectrograms exactly match the corresponding SoX pixels.
  • In every segment, each pixel is within ±2 pixel values of the SoX output.
  • The small residual error is concentrated in the darkest pixels (0–10% brightness decile: ~99.3% pixel accuracy) and almost entirely vanishes in the brighter regions where the signals appear.
  • When the spectrograms are passed through the PNW-Cnet v4 model, the top-5 ranked classes agree with those from SoX spectrograms in 100% of cases.
  • The model-output classes with the largest mean absolute difference are BUVI and PSFL (around 0.0004).
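The metrics in the list above can be computed along the following lines. This is a minimal sketch, not the scripts from the comparison folder: it assumes flat sequences of 8-bit pixel values, defines a brightness decile by the reference pixel's value range (0–25, 26–51, and so on), and treats top-5 agreement as the same classes in the same rank order. All function names are hypothetical.

```python
# Hypothetical helpers illustrating the comparison metrics; not part of
# the sox_tensorflow API. Pixel inputs are flat sequences of 0-255 values.


def exact_match_fraction(ours, reference):
    """Fraction of pixels whose values are identical."""
    if len(ours) != len(reference):
        raise ValueError("pixel sequences must have the same length")
    matches = sum(1 for a, b in zip(ours, reference) if a == b)
    return matches / len(reference)


def max_abs_difference(ours, reference):
    """Largest per-pixel absolute difference (2 for a ±2 bound)."""
    return max(abs(a - b) for a, b in zip(ours, reference))


def decile_accuracy(ours, reference, decile):
    """Exact-match fraction for reference pixels in one brightness decile.

    Assumption: decile 0 covers values 0-25, decile 9 covers 231-255.
    Returns None if no reference pixel falls in the decile.
    """
    low = decile * 256 / 10
    high = (decile + 1) * 256 / 10
    pairs = [(a, b) for a, b in zip(ours, reference) if low <= b < high]
    if not pairs:
        return None
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def top_k_agreement(scores_a, scores_b, k=5):
    """True if both score vectors rank the same k classes first, in order."""
    def ranks(scores):
        return sorted(range(len(scores)), key=lambda i: scores[i],
                      reverse=True)[:k]
    return ranks(scores_a) == ranks(scores_b)
```

For example, comparing [0, 10, 200, 255] against a reference of [0, 12, 200, 255] gives an exact-match fraction of 0.75 and a maximum difference of 2.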

[Figure: Pixel accuracy by brightness decile]

For more details, see the scripts and notebooks in the comparison folder.


QUICK START

import soundfile as sf
import tensorflow as tf
from sox_tensorflow import spectrogram, spectrogram_from_flac

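# From a NumPy array (converted to a tf.Tensor)
samples, sr = sf.read('audio.flac', dtype='float64', always_2d=True)
samples = samples[:, 0]  # keep the first channel only (mono)
png_path = spectrogram(
    audio_array=tf.constant(samples, dtype=tf.float64),
    shape=(257, 1000),
    sample_rate=sr,
    dest='spectrogram.png'  # with dest set, the PNG path is returned
)

# Omit dest to get the pixel values back as a uint8 tf.Tensor
pixels = spectrogram(
    audio_array=tf.constant(samples, dtype=tf.float64),
    shape=(257, 1000),
    sample_rate=sr
)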


# From a FLAC file directly
path = spectrogram_from_flac(
    flac_path='audio.flac',
    shape=(257, 1000),
    duration=12.0,
    segment=0,
    dest='spectrogram.png'
)

API

spectrogram(audio_array, shape, dest, ...)

Generates a SoX-matching spectrogram from an audio array or TensorFlow tensor.

Arguments:

  • audio_array (tf.Tensor or np.ndarray): Audio samples, float32/float64 in [-1, 1]
  • shape (tuple of two ints): Output shape as (height, width). Height determines frequency resolution: DFT size = 2 × (height − 1)
  • dest (str or Path, optional): Output PNG path. If None, returns a tf.Tensor
  • segment (int, optional): Segment index (0-based) to extract from the audio
  • segment_duration (float, optional): Duration of each segment in seconds
  • segment_overlap (float, optional): Overlap between segments in seconds
  • sample_rate (int): Sample rate of the input audio in Hz
  • output_sample_rate (int): Sample rate used for spectrogram generation (default: 8000)
  • db_range (int): Dynamic range in dB (default: 90)

Returns a tf.Tensor of pixel values (uint8, shape (height, width)) if dest is None, otherwise the path to the saved PNG.
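The relationship between height and DFT size can be checked with a little arithmetic. The sketch below assumes the rows correspond to evenly spaced DFT bins running from 0 Hz to the Nyquist frequency of output_sample_rate; that is consistent with DFT size = 2 × (height − 1) but is not stated explicitly above. The helper names are hypothetical.

```python
def dft_size(height):
    """DFT size implied by the output height: 2 * (height - 1)."""
    if height < 2:
        raise ValueError("height must be at least 2")
    return 2 * (height - 1)


def bin_spacing_hz(height, output_sample_rate=8000):
    """Frequency step between adjacent bins (assumed evenly spaced)."""
    return output_sample_rate / dft_size(height)


def bin_frequencies_hz(height, output_sample_rate=8000):
    """Frequencies of bins 0..height-1, i.e. 0 Hz up to Nyquist (assumed).

    Bin order is not necessarily the image row order.
    """
    step = bin_spacing_hz(height, output_sample_rate)
    return [k * step for k in range(height)]
```

For the Quick Start shape (257, 1000) at the default 8000 Hz, this gives a 512-point DFT, a bin spacing of 15.625 Hz and a top bin at 4000 Hz; a height of 513 would halve the spacing to 7.8125 Hz.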

spectrogram_from_flac(flac_path, shape, dest, ...)

Convenience wrapper that loads a FLAC file and generates a spectrogram in one call. Accepts the same shape/segment/dest arguments as spectrogram(), plus:

  • flac_path (str): Path to FLAC file
  • start_time (float, optional): Start time in seconds
  • duration (float): Duration in seconds (default: 12)
  • channel (int): Channel to extract (default: 0)

load_audio(flac_path, start_time, segment, duration, channel)

Reads a FLAC file into a tf.Tensor. Returns (tensor, sample_rate).
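How start_time, segment and duration combine is not spelled out above. One plausible reading, assumed here and not confirmed by the source, is that segment i starts at start_time + i × duration, and that with an overlap (as in spectrogram()'s segment_overlap) consecutive segments advance by duration − overlap. The hypothetical helpers below compute the resulting time window and sample range under that assumption.

```python
def segment_bounds(segment, duration, start_time=0.0, overlap=0.0):
    """(start, end) in seconds for a 0-based segment index.

    Assumption: consecutive segments advance by (duration - overlap).
    """
    if segment < 0:
        raise ValueError("segment index must be non-negative")
    if not 0.0 <= overlap < duration:
        raise ValueError("overlap must satisfy 0 <= overlap < duration")
    start = start_time + segment * (duration - overlap)
    return start, start + duration


def segment_sample_range(segment, duration, sample_rate,
                         start_time=0.0, overlap=0.0):
    """Convert segment_bounds() to [first, last) sample indices."""
    start, end = segment_bounds(segment, duration, start_time, overlap)
    return round(start * sample_rate), round(end * sample_rate)
```

Under this reading, segment 1 of a 12-second split at 8000 Hz covers samples 96000 up to (but not including) 192000.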


NOTES

PNW-Cnet compatibility

When loading H5 models saved with older TensorFlow/Keras versions, set:

export TF_USE_LEGACY_KERAS=1

This makes TensorFlow 2.16+ use the legacy Keras 2 implementation (provided by the separately installed tf-keras package), which remains compatible with older H5 model files.
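The same variable can also be set from Python, as long as this happens before TensorFlow is first imported in the process. A minimal sketch; the model file name is a placeholder, so the TensorFlow lines are left commented out.

```python
import os

# Set before TensorFlow is imported anywhere in this process; setting it
# afterwards is not guaranteed to switch tf.keras to the legacy version.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

# import tensorflow as tf                             # import only after this
# model = tf.keras.models.load_model("model.h5")     # placeholder path
```

Note that the variable applies to every package in the running Python process, not only to PNW-Cnet.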

SoX accuracy

The implementation follows SoX's spectrogram algorithm step by step:

  • Hann window with SoX-specific normalization
  • FFT with SoX edge handling (partial windows at start/end of signal)
  • dB conversion and pixel rendering matching SoX's palette

Resampling uses soxr (the SoX Resampler library) at HQ quality; together with the steps above, this yields an average exact-pixel match above 99.8% against the SoX binary.
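To make the three steps above concrete, here is a generic, plain-Python sketch of one frame's path through a windowed-DFT spectrogram: Hann window, DFT, dB conversion, pixel mapping. It uses a textbook Hann window, a naive DFT and a linear greyscale mapping over db_range. It deliberately does not reproduce SoX's window normalization, edge handling or palette, which are the parts sox_tensorflow matches, and the function names are hypothetical.

```python
import cmath
import math


def hann_window(n):
    # Textbook symmetric Hann window; SoX's own normalization is omitted.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_power_db(frame):
    """Power in dB of bins 0..n/2 of one Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann_window(n))]
    result = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power = abs(acc) ** 2
        result.append(10 * math.log10(power) if power > 0 else -math.inf)
    return result


def db_to_pixels(db_values, db_range=90):
    """Linear greyscale: peak -> 255, db_range dB (or more) below it -> 0."""
    peak = max(db_values)
    if peak == -math.inf:  # silent frame: every pixel is black
        return [0] * len(db_values)
    pixels = []
    for value in db_values:
        relative = max(value - peak, -db_range)  # clip to the dynamic range
        pixels.append(round(255 * (relative + db_range) / db_range))
    return pixels
```

A 32-sample cosine at bin 4 yields 17 pixel values whose brightest entry (255) is at index 4; a production version would use an FFT and process all frames at once, which is where TensorFlow's GPU operations help.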


STYLE-GUIDE

The code follows PEP 8; see setup.cfg for exceptions. Compliance is checked with pycodestyle.



Download files


Source Distribution

sox_tensorflow-0.1.2.tar.gz (22.7 kB)

Uploaded Source

Built Distribution


sox_tensorflow-0.1.2-py3-none-any.whl (23.2 kB)

Uploaded Python 3

File details

Details for the file sox_tensorflow-0.1.2.tar.gz.

File metadata

  • Download URL: sox_tensorflow-0.1.2.tar.gz
  • Upload date:
  • Size: 22.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sox_tensorflow-0.1.2.tar.gz
Algorithm Hash digest
SHA256 1275650744889fcbe51189d3a14f245da72dee0d26eb51eee23016374cec355f
MD5 0e2ef1678d651f7f3d75cb44ff076d45
BLAKE2b-256 f0f8dda21b8b75e9df5f3eeb1dcc0b683ed951d5ebbeb81ba849b7987f80c648


File details

Details for the file sox_tensorflow-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: sox_tensorflow-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sox_tensorflow-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ddd4e6f3bd5dd5e355435929c2fcee01f54436a47f54687038231e4e013862ec
MD5 dd8dd89c33e5d387705843d1bfaf1c98
BLAKE2b-256 f286379d7bd3d72bb788a0ab8194bc87e79319f18c3b2a8b78e9fef530c3e992

