Sub-audible STFT-domain audio perturbation research console.
umbr
umbr (from umbra, the fully shadowed core of a shadow) takes an input track
and produces a transformed version that is intended to be perceptually
indistinguishable from the original for a human listener, while its
machine-readable representation is nudged away from the original. Every claim
the tool makes about transparency, representation divergence, and processing
robustness is measured and written to an audit report - nothing is asserted
or simulated.
This is a research instrument for studying the tension between inaudibility and survivability under lossy processing. It does not encode messages or payloads: the perturbation is a deterministic, non-semantic sign lattice with no recoverable content.
Status: alpha / research prototype. Expect the honest finding that a perturbation quiet enough to be inaudible is largely erased by ordinary lossy compression. The point of the tool is to quantify that, not to hide it.
Installation
pip install umbr
umbr requires Python 3.12+ and depends on NumPy (>= 2.0).
It also shells out to FFmpeg for decoding, codec round-trips, and duration
probing, so ffmpeg and ffprobe must be on your PATH
(https://ffmpeg.org/download.html). The optional fingerprint cross-check uses
the Chromaprint fpcalc binary if present
(https://acoustid.org/chromaprint); when it is missing the stage degrades
gracefully and says so in the audit.
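Before a first run it can help to confirm that the external binaries are reachable. The following is a minimal sketch, not part of umbr itself; the function name `check_tools` and its return shape are hypothetical, and only the binary names come from the text above.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")   # needed for decoding and codec round-trips
OPTIONAL_TOOLS = ("fpcalc",)             # Chromaprint cross-check, optional


def check_tools(which=shutil.which):
    """Return (missing_required, missing_optional) as lists of binary names.

    `which` is injectable so the lookup can be replaced in tests.
    """
    missing_required = [t for t in REQUIRED_TOOLS if which(t) is None]
    missing_optional = [t for t in OPTIONAL_TOOLS if which(t) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_tools()
    if required:
        print("Missing required tools:", ", ".join(required))
    if optional:
        print("Missing optional tools:", ", ".join(optional))
```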
Usage
The source track is required; everything else is derived or optional:
# Output derives to <source-stem>.umbr.wav next to the source.
umbr song.flac
# Explicit output, a research-strength preset, first 60s only.
umbr song.flac -o out/song.umbr.wav --strength research --limit-seconds 60
Key options (umbr --help for the full list):
| Option | Default | Meaning |
|---|---|---|
| source | (required) | Input audio, any format FFmpeg can decode. |
| -o, --output | <stem>.umbr.wav | Rendered WAV. |
| --delta-output | <output-stem>.delta_x60dB.wav | Residual amplified +60 dB for spectrogram inspection. |
| --artifacts | artifacts | Directory for the CSV/JSON/Markdown audit. |
| --strength | medium | conservative / medium / research (quieter to louder). |
| --n-fft | 2048 | STFT window size (power of two, 512-8192). |
| --hop | 512 | STFT hop size. |
| --sample-rate | 44100 | Internal working rate (16000-96000). |
| --limit-seconds | 0 (full) | Process only the first N seconds. |
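For batch experiments the CLI can be driven from a script. The sketch below only assembles an argument list from the options documented in the table; the helper name `build_umbr_command` is hypothetical, and it assumes the `umbr` executable is on your PATH.

```python
import subprocess


def build_umbr_command(source, output=None, strength="medium", limit_seconds=0):
    """Build an argv list for the umbr CLI using only documented options."""
    if strength not in ("conservative", "medium", "research"):
        raise ValueError(f"unknown strength preset: {strength!r}")
    cmd = ["umbr", str(source), "--strength", strength]
    if output is not None:
        cmd += ["-o", str(output)]
    if limit_seconds:                      # 0 means "process the full track"
        cmd += ["--limit-seconds", str(limit_seconds)]
    return cmd


if __name__ == "__main__":
    # Equivalent to the second shell example in the Usage section.
    argv = build_umbr_command("song.flac", output="out/song.umbr.wav",
                              strength="research", limit_seconds=60)
    subprocess.run(argv, check=True)
```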
How it works
The pipeline mirrors the spec stages, all in the short-time Fourier domain.
- Ingestion / normalization - FFmpeg decodes the source to 16-bit PCM at the working sample rate.
- Spectrogram analysis - a Hann-windowed Short-Time Fourier Transform with overlap-add resynthesis (the Constant-OverLap-Add / COLA condition keeps reconstruction transparent).
- Psychoacoustic masking map - a conservative simultaneous-masking proxy combining a near-masker threshold, an absolute-threshold-of-hearing penalty, per-frame spectral flux and spectral flatness, and a loudness estimate. Only dense, masked time-frequency regions are eligible.
- Candidate perturbation - a deterministic +/- lattice (NumPy PCG64 generator), phase-shifted a quarter turn from the host and scaled under the masking budget.
- Surrogate evaluation - a small, explicitly labelled band-energy proxy for an audio-analysis system. Divergence against it is suggestive, not conclusive (see Caveats).
- Robustness refinement - the rendered probe is round-tripped through real lossy codecs (MP3 / LAME and AAC) via FFmpeg and re-measured, so robustness is observed rather than modelled.
- Perceptual quality scoring - residual RMS / peak in dBFS, a log-spectral distance proxy, and a transparency gate that the refinement loop must satisfy.
- Export - the transformed WAV plus a +60 dB amplified residual WAV for spectrogram inspection.
- Human-listening verification - the audit lists exactly which regions were modified and why they were judged psychoacoustically safe, for blind ABX listening.
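The analysis, perturbation, and resynthesis stages above can be illustrated with a few lines of NumPy. This is a simplified sketch rather than umbr's actual code: the function names are hypothetical, the masking map and band limits are omitted so every bin is perturbed, and -56 dB is assumed to be the "medium" preset.

```python
import numpy as np

N_FFT, HOP = 2048, 512           # defaults from the options table
LEVEL_DB = -56.0                 # assumed "medium" level, relative to host magnitude
PHASE_OFFSET = np.pi / 2         # quadrature offset from the host phase


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT; returns (frames x bins spectrum, window)."""
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), window


def istft(spec, window, hop=HOP):
    """Weighted overlap-add, normalized by the summed squared window."""
    n_frames, n_fft = spec.shape[0], window.size
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-12)


def perturb(spec, level_db=LEVEL_DB, seed=0):
    """Add a seeded +/- lattice, a quarter turn off the host phase."""
    rng = np.random.Generator(np.random.PCG64(seed))
    signs = rng.choice(np.array([-1.0, 1.0]), size=spec.shape)
    gain = 10.0 ** (level_db / 20.0)             # dB to linear amplitude
    return spec + signs * gain * spec * np.exp(1j * PHASE_OFFSET)


def dbfs(x):
    """RMS level in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(max(float(np.sqrt(np.mean(x ** 2))), 1e-12))


def demo(seed=0, n=44100):
    """Return (host, clean round trip, perturbed render) on the interior."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / 44100.0
    host = 0.3 * np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(n)
    spec, window = stft(host)
    clean = istft(spec, window)
    marked = istft(perturb(spec), window)
    core = slice(N_FFT, len(clean) - N_FFT)      # skip partially covered edges
    return host[:len(clean)][core], clean[core], marked[core]


if __name__ == "__main__":
    host, clean, marked = demo()
    print("round-trip error:", np.max(np.abs(clean - host)))
    print("residual relative to host (dB):", dbfs(marked - clean) - dbfs(host))
```

In this sketch the resynthesized residual tends to land somewhat below the nominal level, because independent per-bin signs do not form a consistent STFT and overlap-add averages part of the perturbation away.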
Documented constants
All tunables live as documented module-level constants near the top of
umbr.py. The most important:
| Constant | Value | Reference |
|---|---|---|
| STRENGTH presets (dB) | -62 / -56 / -50 | dB relative to local host magnitude; see dBFS. |
| TRANSPARENCY_GATE_DBFS | -85.0 | Residual RMS ceiling the refinement loop must meet. |
| NEAR_MASKER_DB | -38.0 | Masking threshold below the per-frame peak (auditory masking). |
| PERTURB_BAND_LOW/HIGH_HZ | 120 / 15500 | Eligible band; avoids fragile sub-bass and codec-stripped air. |
| ATH_PENALTY_* | 0.40 / 0.70 / 0.55 | Absolute threshold of hearing weighting. |
| PERTURB_PHASE_OFFSET | pi / 2 | Quadrature offset from the host phase. |
| SURROGATE_BAND_EDGES_HZ | 80 .. 15500 | Log-band edges of the surrogate embedding. |
| ROBUSTNESS_CODECS | MP3 128k, AAC 128k | Codecs exercised by the robustness round-trip. |
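To make the band and near-masker constants concrete, here is a rough sketch of how they could gate STFT bins. It is an assumption about how the constants combine, not umbr's actual masking map: it reads NEAR_MASKER_DB as "a bin is eligible if it lies within 38 dB of its frame's peak" and ignores the ATH penalty, flux, flatness, and loudness terms. The function name `eligibility_mask` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44100                 # default working rate
N_FFT = 2048                        # default STFT window size
PERTURB_BAND_LOW_HZ = 120.0
PERTURB_BAND_HIGH_HZ = 15500.0
NEAR_MASKER_DB = -38.0


def eligibility_mask(magnitude):
    """Boolean mask over a (frames, bins) array of STFT magnitudes.

    A bin is kept when its centre frequency is inside the eligible band
    and its level is within |NEAR_MASKER_DB| of the frame's peak level.
    """
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / SAMPLE_RATE)
    in_band = (freqs >= PERTURB_BAND_LOW_HZ) & (freqs <= PERTURB_BAND_HIGH_HZ)
    mag_db = 20.0 * np.log10(np.maximum(magnitude, 1e-12))
    peak_db = mag_db.max(axis=1, keepdims=True)
    near_masker = mag_db >= peak_db + NEAR_MASKER_DB
    return in_band[np.newaxis, :] & near_masker
```

With the defaults, bin spacing is 44100 / 2048, about 21.5 Hz, so the first eligible bin is index 6 (about 129 Hz).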
Strength and CoverType are enum.Enum types rather than bare strings, so the
CLI choices, the dB levels, and the audit phrasing stay in sync.
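One way such an enum could tie the CLI name to its dB level is sketched below. The member layout, the `from_cli` helper, and the pairing of presets with levels (in the order both are listed above) are assumptions for illustration; the real Strength type in umbr.py may be shaped differently.

```python
from enum import Enum


class Strength(Enum):
    """Hypothetical shape: each member carries its CLI name and dB level."""

    CONSERVATIVE = ("conservative", -62.0)
    MEDIUM = ("medium", -56.0)
    RESEARCH = ("research", -50.0)

    def __init__(self, cli_name, level_db):
        self.cli_name = cli_name
        self.level_db = level_db

    @property
    def linear_gain(self):
        """Amplitude ratio relative to the host magnitude."""
        return 10.0 ** (self.level_db / 20.0)

    @classmethod
    def from_cli(cls, name):
        for member in cls:
            if member.cli_name == name:
                return member
        raise ValueError(f"unknown strength preset: {name!r}")


# The CLI choices can then be derived from the enum instead of repeated.
CLI_CHOICES = [member.cli_name for member in Strength]
```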
Audit output
Written to the --artifacts directory:
- umbr_audit.md - human-readable report: transparency gates, surrogate readout, codec robustness, and the top modified regions with their psychoacoustic rationale.
- umbr_metrics.json - the full Metrics record.
- umbr_regions.csv - every modified STFT frame.
- umbr_spectrogram_bins.csv - the loudest modified bins for spectrogram overlay.
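The JSON and CSV artifacts are straightforward to load for further analysis. The sketch below assumes only the file names listed above and that the CSV starts with a header row; it makes no assumption about which fields the Metrics record or the region rows contain. The helper name `load_audit` is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_audit(artifacts_dir="artifacts"):
    """Return (metrics, regions) from an umbr artifacts directory.

    metrics is whatever JSON value umbr_metrics.json holds;
    regions is a list of dicts keyed by the CSV header row.
    """
    root = Path(artifacts_dir)
    metrics = json.loads((root / "umbr_metrics.json").read_text(encoding="utf-8"))
    with (root / "umbr_regions.csv").open(newline="", encoding="utf-8") as fh:
        regions = list(csv.DictReader(fh))
    return metrics, regions


if __name__ == "__main__":
    metrics, regions = load_audit()
    print("metric keys:", sorted(metrics) if isinstance(metrics, dict) else type(metrics))
    print("modified frames listed:", len(regions))
```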
Caveats
- The internal surrogate is self-defined; divergence against it does not prove divergence against a real retrieval/fingerprinting system. Install Chromaprint to enable the independent cross-check.
- Sub-audible energy is, by construction, the first thing lossy codecs discard, so low surviving divergence is the expected result and is reported honestly.
- Exposed vocals, fades, and sparse instruments overrule every numeric score and must be confirmed by blind listening.
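To see the codec erasure for yourself outside the tool, a lossy round trip can be reproduced by hand with FFmpeg. This sketch only builds the commands; it is not how umbr invokes FFmpeg internally. It assumes an FFmpeg build that includes the libmp3lame encoder and uses FFmpeg's native aac encoder; the helper name and file naming are hypothetical.

```python
from pathlib import Path

# Encoder name and container suffix per codec (128k matches ROBUSTNESS_CODECS).
CODECS = {"mp3": ("libmp3lame", ".mp3"), "aac": ("aac", ".m4a")}


def roundtrip_commands(wav_in, workdir, codec, bitrate="128k", sample_rate=44100):
    """Return (encode_cmd, decode_cmd) argv lists for one lossy round trip."""
    encoder, suffix = CODECS[codec]
    stem = Path(wav_in).stem
    lossy = Path(workdir) / f"{stem}{suffix}"
    decoded = Path(workdir) / f"{stem}.{codec}.wav"
    encode = ["ffmpeg", "-y", "-i", str(wav_in),
              "-c:a", encoder, "-b:a", bitrate, str(lossy)]
    decode = ["ffmpeg", "-y", "-i", str(lossy),
              "-ar", str(sample_rate), "-c:a", "pcm_s16le", str(decoded)]
    return encode, decode
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Lossy encoders add delay and padding, so the decoded file generally needs time alignment before it is compared sample by sample with the original.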
License
MIT.