Strip disfluencies (um, uh, er, ah, hmm) from spoken audio.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

erm

Local CLI that strips disfluencies (um, uh, er, erm, ah, hmm, mhm, mm, uh-huh, plus any-length elongations like ummmm / uhhhhh) from recordings of English speech.

It uses faster-whisper (running the medium.en Whisper model by default — override with --model) for word-level timestamps, three audio-domain detectors that catch fillers Whisper hides, and ffmpeg for the cuts. Each splice is snapped to a local energy minimum and zero-crossing, optionally crossfaded with a length that scales with the cut size, and laid over a constant looped sample of the recording's own room tone so the noise floor stays uniform across edits.

Install

Requires Python 3.11+ and ffmpeg / ffprobe on PATH.

python3.13 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'

Transcription device (GPU vs CPU)

Transcription runs on CPU by default and needs no extra setup. If you have an NVIDIA GPU, faster-whisper can use it — but only when the CUDA runtime libraries (libcublas, libcudnn) are installed. A machine with an NVIDIA GPU and driver but no CUDA runtime is the common case that produces:

RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

erm handles this automatically: with the default --device auto, if the GPU can't be loaded it prints a warning and falls back to CPU, so transcription still completes. You have two ways to make it explicit:

Force CPU (no warning, skips the GPU probe): erm input.wav --device cpu
Enable the GPU by installing the CUDA wheels into the same environment:
```
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```
faster-whisper's CUDA backend needs CUDA 12 / cuDNN 9. See the faster-whisper GPU notes for details.

Usage

# Remove fillers; output and cut-list paths are auto-generated next to the input.
erm input.wav

# Specify output explicitly.
erm input.wav -o cleaned.wav

# Inspect what would be cut without rendering.
erm input.wav --dry-run

# Validate a rendered output against its source.
erm validate input.wav cleaned.wav --cuts cuts.json

When -o / --json are omitted, output paths are written next to the input as {stem}-cleaned-{YYYYMMDD-HHMMSS}.wav and {stem}-cuts-{YYYYMMDD-HHMMSS}.json.

How it works

Transcribe. faster-whisper runs with word_timestamps=True and a verbatim-bias initial_prompt so it emits filler tokens instead of silently cleaning them up.
Detect. Four passes produce candidate cut ranges:
- Word-list match — words whose normalized text is in --fillers, including arbitrary-length elongations (e.g. ummmm matches the um stem).
- Gap fillers — voiced regions in inter-word gaps longer than --gap-min-ms. Catches fillers Whisper drops entirely.
- Intra-word fillers — long words whose interior splits across a silence dip into multiple voiced runs. The non-vowel run whose duration best matches the word's expected duration is treated as the real word; siblings become cuts. Catches "in, uhhhhh" that Whisper rolls into one 'in' token.
- Overlong words — words much longer than expected_max_word_duration for their text. The trailing portion is scanned for voiced runs. Optionally pitch-confirmed (--confirm-pitch) by checking the cut region looks like a sustained filler vowel (stable spectral centroid, voiced ZCR), so we don't trim slow-but-real speech.
Refine. Each cut endpoint snaps to a local RMS-energy minimum within ±--search-ms, then to the nearest zero-crossing. Refinement is clamped so it never crosses a neighboring word's timestamp.
Merge. Cuts whose surviving fragment would be shorter than --merge-gap-ms are collapsed into one — a 40ms surviving fragment between two cuts gets eaten by the surrounding crossfades and would otherwise blurp.
Render. ffmpeg atrim + acrossfade renders the kept segments. Each splice's crossfade length scales with that splice's cut size: clamp(min, cut_ms * factor, max). Crossfades are also clamped so they never reach back across a real word boundary.
Room tone (optional, on by default). A quiet region of the original recording is sampled and looped under the output at --room-tone-level-db. This keeps the noise floor identical everywhere, masking the residual noise-floor mismatch at each splice.

Denoising

--denoise picks how ffmpeg's afftdn denoiser is used:

Mode	Detection sees	ffmpeg cuts from	Notes
`none`	original	original	No denoising.
`pre`	denoised	denoised	Cleanest splices, but detection less sensitive (denoising flattens energy/pitch signals).
`post`	original	original; output denoised at end	Full detection sensitivity; splice noise-floor mismatch smoothed afterward.
`hybrid` (default)	original	denoised	Full detection sensitivity and clean splices. Recommended.

Tune with --denoise-nr (reduction strength dB) and --denoise-nf (noise floor dB).

Flags

Detection

Flag	Default	Notes
`--model`	`medium.en`	Any faster-whisper model. `small.en` faster; `large-v3` more accurate.
`--device`	`auto`	`auto` / `cpu` / `cuda`. `auto` uses the GPU when available and falls back to CPU if the CUDA runtime can't be loaded (see Transcription device).
`--compute-type`	`auto`	faster-whisper compute type (e.g. `int8`, `float16`). `auto` lets the backend choose.
`--fillers`	`ah,er,erm,hmm,mhm,mm,uh,uh-huh,um`	Comma-separated stems. Elongations matched dynamically.
`--detect-gaps` / `--no-detect-gaps`	on	Run gap + intra-word + overlong detectors.
`--gap-min-ms`	`350`	Minimum inter-word gap to scan for fillers.
`--gap-min-voiced-ms` / `--gap-max-voiced-ms`	`100` / `1500`	Voiced-run length bounds.
`--intraword-min-ms`	`550`	Minimum word length to scan internally.
`--confirm-pitch` / `--no-confirm-pitch`	on	Drop overlong/intra candidates that don't look like sustained filler vowels.

Cuts and splices

Flag	Default	Notes
`--search-ms`	`60`	How far each endpoint may slide to find a local energy minimum.
`--crossfade-ms`	(unset)	Force a fixed crossfade length for every splice. When unset, per-splice scaling is used.
`--min-crossfade-ms` / `--max-crossfade-ms`	`50` / `120`	Floor and ceiling for the per-splice crossfade scaling.
`--crossfade-factor`	`0.15`	`cut_ms * factor`, clamped to `[min, max]`. Higher = smoother but blurrier.
`--merge-gap-ms`	`120`	Merge two cuts whose surviving fragment would be shorter than this.

Audio cleanup

Flag	Default	Notes
`--denoise`	`hybrid`	`none` / `pre` / `post` / `hybrid` (see table above).
`--denoise-nr`	`12.0`	`afftdn` noise reduction (dB).
`--denoise-nf`	`-25.0`	`afftdn` noise floor (dB).
`--room-tone` / `--no-room-tone`	on	Loop a quiet sample of the original under the output.
`--room-tone-level-db`	`-12.0`	Attenuation applied to the looped tone. `-12` to `-20` is usually right.
`--room-tone-source`	`auto`	`auto` finds a quiet region; otherwise `START-END` in seconds (e.g. `0.05-1.4`).

Output

Flag	Default	Notes
`-o`, `--output`	auto-named next to input	Output `.wav` path.
`--json PATH`	auto-named next to input	Cut list JSON.
`--dry-run`	off	Print the cut list and exit; no audio rendered.

`validate` subcommand

erm validate input.wav cleaned.wav --cuts cuts.json

Runs three deterministic checks:

Container sanity — ffprobe reads the output without errors.
Duration math — output_duration ≈ input_duration - sum(cut lengths), within 50ms.
No-filler invariant — re-transcribe the output; assert no token in the filler set survives.

Writes a JSON report to --report PATH (or auto-named next to the output) and exits non-zero if any check fails.

Tests

pytest

The pure helpers (find_fillers, invert_to_keep_ranges, refine_boundaries, merge_close_cuts, expected_max_word_duration, _voiced_runs_in_region, …) run without faster-whisper or librosa imported. Heavy deps are imported lazily inside transcribe, render, load_audio_mono, and is_sustained_vowel.

The suite is split into:

test_pure.py — pure logic, no heavy imports: filler matching, range inversion, boundary refinement, close-cut merging (merge_close_cuts), the per-word duration bound (expected_max_word_duration), room-tone region selection (find_quiet_region), and the per-splice crossfade clamp (_splice_crossfade_s).
test_asr_fallback.py — the CUDA → CPU fallback in transcribe, with faster-whisper mocked.
test_cli.py — argument parsing, defaults, and main() subcommand routing (remove / validate / bare-input). The pipeline handlers are monkeypatched, so nothing heavy runs.
test_integration.py — a golden-path --dry-run over a synthesized WAV with a stubbed transcriber, wiring transcription → filler detection → refinement → range inversion → JSON. Gated on librosa (the audio loader); skipped automatically if it isn't installed.

Out of scope

Removing like, you know, I mean — too risky for meaning.
Languages other than English.
Real-time / streaming.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dougcalobrisi

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.4.0

Jun 14, 2026

0.3.0

Jun 13, 2026

This version

0.2.0

Jun 12, 2026

0.1.1

Apr 28, 2026

0.1.0

Apr 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

erm-0.2.0.tar.gz (38.8 kB view details)

Uploaded Jun 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

erm-0.2.0-py3-none-any.whl (30.0 kB view details)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file erm-0.2.0.tar.gz.

File metadata

Download URL: erm-0.2.0.tar.gz
Upload date: Jun 12, 2026
Size: 38.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for erm-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a21e663a58c40964517680da2826bb0fbdeae45aef0e1471f5e0d2c0c16f9885`
MD5	`9067607d295f80c36675a5791d987ca3`
BLAKE2b-256	`1075634223945fd676628173e1b9cdd1a4de0a7024ecce7f1f649ae07da36180`

See more details on using hashes here.

Provenance

The following attestation bundles were made for erm-0.2.0.tar.gz:

Publisher: release.yml on dougcalobrisi/erm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: erm-0.2.0.tar.gz
- Subject digest: a21e663a58c40964517680da2826bb0fbdeae45aef0e1471f5e0d2c0c16f9885
- Sigstore transparency entry: 1804211211
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: dougcalobrisi/erm@edec34a66704c8365d6a0856fd65d9dba08d5066
- Branch / Tag: refs/heads/main
- Owner: https://github.com/dougcalobrisi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@edec34a66704c8365d6a0856fd65d9dba08d5066
- Trigger Event: workflow_dispatch

File details

Details for the file erm-0.2.0-py3-none-any.whl.

File metadata

Download URL: erm-0.2.0-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 30.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for erm-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`21db692f54b16f80f889e116c3cb2364a2dfabd0ef7f9c2c3577e2976dc94c40`
MD5	`4dca5d58d4d76e295aa85cccc87023e0`
BLAKE2b-256	`c72ec239bc2af7659561a1c875e59b65e9a7b68deea7992605773e860d8fb546`

See more details on using hashes here.

Provenance

The following attestation bundles were made for erm-0.2.0-py3-none-any.whl:

Publisher: release.yml on dougcalobrisi/erm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: erm-0.2.0-py3-none-any.whl
- Subject digest: 21db692f54b16f80f889e116c3cb2364a2dfabd0ef7f9c2c3577e2976dc94c40
- Sigstore transparency entry: 1804211233
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: dougcalobrisi/erm@edec34a66704c8365d6a0856fd65d9dba08d5066
- Branch / Tag: refs/heads/main
- Owner: https://github.com/dougcalobrisi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@edec34a66704c8365d6a0856fd65d9dba08d5066
- Trigger Event: workflow_dispatch

erm 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

erm

Install

Transcription device (GPU vs CPU)

Usage

How it works

Denoising

Flags

Detection

Cuts and splices

Audio cleanup

Output

validate subcommand

Tests

Out of scope

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`validate` subcommand