Sub-audible STFT-domain audio perturbation research console.

umbr

umbr (from umbra, the fully shadowed core of a shadow) takes an input track and produces a transformed version that is intended to be perceptually indistinguishable from the original for a human listener, while its machine-readable representation is nudged away from the original. Every claim the tool makes about transparency, representation divergence, and processing robustness is measured and written to an audit report; nothing is asserted or simulated.

This is a research instrument for studying the tension between inaudibility and survivability under lossy processing. It does not encode messages or payloads: the perturbation is a deterministic, non-semantic sign lattice with no recoverable content.

Status: alpha / research prototype. Expect the honest finding that a perturbation quiet enough to be inaudible is largely erased by ordinary lossy compression. The point of the tool is to quantify that, not to hide it.

Installation

pip install umbr

umbr requires Python 3.12+ and depends on NumPy (>= 2.0).

It also shells out to FFmpeg for decoding, codec round-trips, and duration probing, so ffmpeg and ffprobe must be on your PATH (https://ffmpeg.org/download.html). The optional fingerprint cross-check uses the Chromaprint fpcalc binary if present (https://acoustid.org/chromaprint); when it is missing the stage degrades gracefully and says so in the audit.
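Before a first run it can help to confirm that the external binaries are reachable. The following is a minimal sketch using only the standard library; the check_tools helper and its two tuples are illustrative names, not part of umbr.

```python
import shutil

# ffmpeg and ffprobe are required by umbr; fpcalc (Chromaprint) is optional.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
OPTIONAL_TOOLS = ("fpcalc",)


def check_tools(which=shutil.which):
    """Return (missing_required, missing_optional) as lists of tool names.

    `which` is injectable so the lookup can be replaced in tests.
    """
    missing_required = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    missing_optional = [tool for tool in OPTIONAL_TOOLS if which(tool) is None]
    return missing_required, missing_optional


required, optional = check_tools()
if required:
    print("Missing required tools on PATH:", ", ".join(required))
if optional:
    print("Optional tools not found (cross-check will be skipped):", ", ".join(optional))
```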

Usage

The source track is required; everything else is derived or optional:

# Output derives to <source-stem>.umbr.wav next to the source.
umbr song.flac

# Explicit output, a research-strength preset, first 60s only.
umbr song.flac -o out/song.umbr.wav --strength research --limit-seconds 60

Key options (umbr --help for the full list):

| Option | Default | Meaning |
| --- | --- | --- |
| source | (required) | Input audio, any format FFmpeg can decode. |
| -o, --output | <stem>.umbr.wav | Rendered WAV. |
| --delta-output | <output-stem>.delta_x60dB.wav | Residual amplified +60 dB for spectrogram inspection. |
| --artifacts | artifacts | Directory for the CSV/JSON/Markdown audit. |
| --strength | medium | conservative / medium / research (quieter to louder). |
| --n-fft | 2048 | STFT window size in samples (power of two, 512-8192). |
| --hop | 512 | STFT hop size in samples. |
| --sample-rate | 44100 | Internal working rate in Hz (16000-96000). |
| --limit-seconds | 0 (full) | Process only the first N seconds. |
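As an illustration of the defaults and ranges in the table, the sketch below derives the default output path and checks the documented limits. The function names are hypothetical and the logic is a reading of the table, not umbr's actual argument handling.

```python
from pathlib import Path


def default_output(source: str) -> Path:
    """<source-stem>.umbr.wav next to the source, as the usage example describes."""
    src = Path(source)
    return src.with_name(f"{src.stem}.umbr.wav")


def is_valid_n_fft(n_fft: int) -> bool:
    """True for a power of two inside the documented 512-8192 range."""
    return 512 <= n_fft <= 8192 and (n_fft & (n_fft - 1)) == 0


def is_valid_sample_rate(rate_hz: int) -> bool:
    """True inside the documented 16000-96000 Hz working range."""
    return 16000 <= rate_hz <= 96000


print(default_output("music/song.flac"))
print(is_valid_n_fft(2048), is_valid_n_fft(3000))
```

With the default --n-fft 2048 and --hop 512, consecutive frames overlap by 75%, and at 44100 Hz each hop advances the analysis by roughly 11.6 ms.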

How it works

The pipeline follows the spec stages one for one; all analysis and perturbation happen in the short-time Fourier transform (STFT) domain.

  1. Ingestion / normalization - FFmpeg decodes the source to 16-bit PCM at the working sample rate.
  2. Spectrogram analysis - a Hann-windowed Short-Time Fourier Transform with overlap-add resynthesis (the Constant-OverLap-Add / COLA condition keeps reconstruction transparent).
  3. Psychoacoustic masking map - a conservative simultaneous-masking proxy combining a near-masker threshold, an absolute-threshold-of-hearing penalty, per-frame spectral flux and spectral flatness, and a loudness estimate. Only dense, masked time-frequency regions are eligible.
  4. Candidate perturbation - a deterministic +/- lattice (NumPy PCG64 generator), phase-shifted a quarter turn from the host and scaled under the masking budget.
  5. Surrogate evaluation - a small, explicitly labelled band-energy proxy for an audio-analysis system. Divergence against it is suggestive, not conclusive (see Caveats).
  6. Robustness refinement - the rendered probe is round-tripped through real lossy codecs (MP3 / LAME and AAC) via FFmpeg and re-measured, so robustness is observed rather than modelled.
  7. Perceptual quality scoring - residual RMS / peak in dBFS, a log-spectral distance proxy, and a transparency gate that the refinement loop must satisfy.
  8. Export - the transformed WAV plus a +60 dB amplified residual WAV for spectrogram inspection.
  9. Human-listening verification - the audit lists exactly which regions were modified and why they were judged psychoacoustically safe, for blind ABX listening.
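Stages 2, 4, and 7 can be sketched with NumPy alone. The code below is a simplified illustration, not umbr's implementation: it skips the masking map entirely (every in-band bin is treated as eligible), uses a synthetic host signal, and all function names are assumptions. It shows a Hann-windowed STFT with weighted overlap-add resynthesis, a seeded +/- lattice rotated a quarter turn from the host phase, and the residual RMS in dBFS.

```python
import numpy as np

SAMPLE_RATE, N_FFT, HOP = 44100, 2048, 512   # README defaults
STRENGTH_DB = -56.0                          # assumed "medium" level, relative to host magnitude
BAND_HZ = (120.0, 15500.0)                   # PERTURB_BAND_LOW/HIGH_HZ


def stft(x):
    """Hann-windowed STFT; returns a (frames x bins) complex array and the window."""
    window = np.hanning(N_FFT + 1)[:-1]      # periodic Hann, COLA at 75% overlap
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), window


def istft(spec, window):
    """Weighted overlap-add resynthesis, normalised by the summed squared window."""
    out = np.zeros((spec.shape[0] - 1) * HOP + N_FFT)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(spec, n=N_FFT, axis=1)):
        out[i * HOP:i * HOP + N_FFT] += frame * window
        norm[i * HOP:i * HOP + N_FFT] += window ** 2
    return out / np.maximum(norm, 1e-12)


def perturb(spec, seed=0):
    """Add a deterministic +/- lattice in quadrature with the host, in-band only."""
    rng = np.random.Generator(np.random.PCG64(seed))
    signs = rng.integers(0, 2, size=spec.shape) * 2 - 1    # values in {-1, +1}
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / SAMPLE_RATE)
    in_band = (freqs >= BAND_HZ[0]) & (freqs <= BAND_HZ[1])
    scale = 10.0 ** (STRENGTH_DB / 20.0)
    delta = signs * scale * (1j * spec)                    # multiplying by 1j rotates by pi/2
    return spec + delta * in_band


def rms_dbfs(x):
    """RMS level in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(max(float(np.sqrt(np.mean(np.square(x)))), 1e-12))


# Synthetic host: a 440 Hz tone plus a little noise, two seconds long.
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
host = 0.2 * np.sin(2 * np.pi * 440.0 * t) + 0.05 * np.random.default_rng(1).standard_normal(t.size)

spec, window = stft(host)
clean = istft(spec, window)
marked = istft(perturb(spec), window)

# Compare only the interior, where every sample is covered by all overlapping frames.
interior = slice(N_FFT, len(clean) - N_FFT)
recon_error = float(np.max(np.abs(clean[interior] - host[:len(clean)][interior])))
host_db = rms_dbfs(host)
residual_db = rms_dbfs(marked[interior] - clean[interior])
print(f"reconstruction error: {recon_error:.2e}")
print(f"host {host_db:.1f} dBFS, residual {residual_db:.1f} dBFS")
```

The residual is measured against the unperturbed resynthesis rather than the raw input so that edge frames, where the window sum is incomplete, do not distort the figure. A real masking map would further restrict which bins receive any perturbation at all.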

Documented constants

All tunables live as documented module-level constants near the top of umbr.py. The most important:

| Constant | Value | Reference |
| --- | --- | --- |
| STRENGTH presets (dB) | -62 / -56 / -50 | dB relative to local host magnitude; see dBFS. |
| TRANSPARENCY_GATE_DBFS | -85.0 | Residual RMS ceiling the refinement loop must meet. |
| NEAR_MASKER_DB | -38.0 | Masking threshold below the per-frame peak (auditory masking). |
| PERTURB_BAND_LOW/HIGH_HZ | 120 / 15500 | Eligible band; avoids fragile sub-bass and codec-stripped air. |
| ATH_PENALTY_* | 0.40 / 0.70 / 0.55 | Absolute threshold of hearing weighting. |
| PERTURB_PHASE_OFFSET | pi / 2 | Quadrature offset from the host phase. |
| SURROGATE_BAND_EDGES_HZ | 80 .. 15500 | Log-band edges of the surrogate embedding. |
| ROBUSTNESS_CODECS | MP3 128k, AAC 128k | Codecs exercised by the robustness round-trip. |
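To make the dB figures concrete, the sketch below converts the strength presets to linear scale factors and checks a residual against the transparency gate. The preset-to-name mapping assumes the three values are listed in the same quieter-to-louder order as the --strength choices; the helper names are illustrative.

```python
import math

# Assumed mapping: values listed in the same order as the --strength choices.
STRENGTH_PRESETS_DB = {"conservative": -62.0, "medium": -56.0, "research": -50.0}
TRANSPARENCY_GATE_DBFS = -85.0


def db_to_linear(db: float) -> float:
    """Amplitude ratio for a level in dB (20 * log10 convention)."""
    return 10.0 ** (db / 20.0)


def rms_dbfs(samples) -> float:
    """RMS level of float samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))


def passes_gate(residual) -> bool:
    """True when the residual RMS sits at or below the transparency ceiling."""
    return rms_dbfs(residual) <= TRANSPARENCY_GATE_DBFS


for name, level in STRENGTH_PRESETS_DB.items():
    print(f"{name:>12}: {level:6.1f} dB -> x{db_to_linear(level):.6f}")

gate_linear = db_to_linear(TRANSPARENCY_GATE_DBFS)
print(f"gate: {gate_linear:.3e} full scale, about {gate_linear * 32768:.2f} LSB RMS at 16-bit")
```

For scale, -85 dBFS corresponds to an RMS amplitude of about 5.6e-5 of full scale, which is under two least-significant bits of 16-bit PCM.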

Strength and CoverType are enum.Enum types rather than bare strings, so the CLI choices, the dB levels, and the audit phrasing stay in sync.
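One way such an enum can keep the CLI choices and dB levels together is sketched below. This is an assumed shape, not the definition in umbr.py: the member names, the db property, and the level mapping are illustrative, and CoverType is omitted because its members are not documented here.

```python
import argparse
import enum


class Strength(enum.Enum):
    """CLI-facing strength preset; the value doubles as the command-line token."""

    CONSERVATIVE = "conservative"
    MEDIUM = "medium"
    RESEARCH = "research"

    @property
    def db(self) -> float:
        """Perturbation level in dB relative to local host magnitude."""
        return _LEVELS_DB[self]


# Assumed mapping, in the documented quieter-to-louder order.
_LEVELS_DB = {
    Strength.CONSERVATIVE: -62.0,
    Strength.MEDIUM: -56.0,
    Strength.RESEARCH: -50.0,
}

parser = argparse.ArgumentParser(prog="umbr-sketch")
parser.add_argument(
    "--strength",
    type=Strength,              # converts "research" to Strength.RESEARCH
    choices=list(Strength),
    default=Strength.MEDIUM,
)

args = parser.parse_args(["--strength", "research"])
print(args.strength.value, args.strength.db)
```

Because the parser converts the token straight into an enum member, an unknown strength is rejected at parse time and downstream code never handles a bare string.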

Audit output

Written to the --artifacts directory:

  • umbr_audit.md - human-readable report: transparency gates, surrogate readout, codec robustness, and the top modified regions with their psychoacoustic rationale.
  • umbr_metrics.json - the full Metrics record.
  • umbr_regions.csv - every modified STFT frame.
  • umbr_spectrogram_bins.csv - the loudest modified bins for spectrogram overlay.

Caveats

  • The internal surrogate is self-defined; divergence against it does not prove divergence against a real retrieval/fingerprinting system. Install Chromaprint to enable the independent cross-check.
  • Sub-audible energy is, by construction, the first thing lossy codecs discard, so low surviving divergence is the expected result and is reported honestly.
  • Exposed vocals, fades, and sparse instruments overrule every numeric score and must be confirmed by blind listening.
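For readers who want to try an independent cross-check by hand, the sketch below compares raw Chromaprint fingerprints of two files by bit error rate. It assumes fpcalc accepts the -raw and -json flags and emits a JSON object with an integer "fingerprint" array; the helper names are hypothetical and this is not how umbr's own cross-check stage is necessarily written.

```python
import json
import shutil
import subprocess


def raw_fingerprint(path: str):
    """Return the raw fingerprint as a list of ints, or None if fpcalc is missing."""
    if shutil.which("fpcalc") is None:
        return None
    result = subprocess.run(
        ["fpcalc", "-raw", "-json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["fingerprint"]


def bit_error_rate(a, b) -> float:
    """Fraction of differing bits between two fingerprints of 32-bit words."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both fingerprints must be non-empty")
    # zip() stops at the shorter sequence; the mask keeps signed values to 32 bits.
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return differing / (32 * n)


print(bit_error_rate([0b1010, 0], [0b0110, 0]))
```

A bit error rate near zero between the original and the transformed track would indicate that the fingerprint is effectively unchanged, which is consistent with the expectation stated above for sub-audible perturbations.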

License

MIT.
