Remove background music from videos - for accessibility and personal use

HUSHER

As a Muslim, I've experienced this friction many times: the documentary I want to watch, the lecture I want to learn from, the long-form video I want to understand almost always comes with a score running underneath. One day I sat down to watch a documentary and found the soundtrack laid over every scene, so I went looking for a tool that would just take the video and hand me back the same video with the music gone. Audio-only stem splitters, GUIs aimed at music producers, cloud services that wanted my upload: none of them did the simple thing.

HUSHER is the simple thing. You give it a video file, it gives you back a video file, the music is gone, the dialogue and sound effects stay, and the video stream is copied through untouched so the picture is bit-for-bit unchanged.

From the terminal:

husher documentary.mp4
# → ~/Documents/HUSHER/documentary_hushed.mp4

From Python, the same engine behind a small stable API, so you can wire HUSHER into something larger (a content pipeline, a batch job, your own project):

import husher

husher.hush("documentary.mp4")

Under the hood it's an AI source-separation model (Bandit v2) running locally on your machine. It splits the audio into speech, music, and sfx, discards the music stem, mixes the other two back together, and remuxes the new audio into a copy of the video. The video stream is never re-encoded.
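The drop-and-remix step can be pictured in a few lines. This is a minimal sketch, not HUSHER's actual code: it assumes each stem is a plain list of float samples in [-1.0, 1.0] of equal length, and the function name `mix_without_music` and the hard clipping are illustrative choices.

```python
def mix_without_music(stems):
    """Sum every stem except "music", sample by sample.

    `stems` maps a stem name to a list of float samples in [-1.0, 1.0].
    Hypothetical helper for illustration; HUSHER's real mixing may differ.
    """
    kept = [samples for name, samples in stems.items() if name != "music"]
    if not kept:
        return []
    length = len(kept[0])
    if any(len(samples) != length for samples in kept):
        raise ValueError("all kept stems must have the same length")
    mixed = [sum(frame) for frame in zip(*kept)]
    # Hard-clip so the sum of the kept stems stays inside the valid range.
    return [max(-1.0, min(1.0, value)) for value in mixed]


stems = {
    "speech": [0.5, 0.25, -0.75],
    "music": [0.5, 0.5, 0.5],
    "sfx": [0.25, 0.5, -0.5],
}
print(mix_without_music(stems))  # [0.75, 0.75, -1.0]
```

The music samples never enter the sum, which is all "discarding the stem" means here.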

I built it for Muslims who run into the same friction trying to learn, research, or produce content. The CLI is for one-off cleanup; the API is for building larger things on top. Other people have related reasons: hearing loss where the mix fights the narration, or focus or auditory-processing conditions where the score becomes another competing stream you have to filter out. If that's you too, you're welcome here. The tool is generic, and anyone can use it.

It's a one-maintainer project, and I'd rather you treat it as a useful tool you can fork than as infrastructure. If a fellow Muslim dev takes what I've started here and makes it better than I could, that's honestly the outcome I'd be happiest with.

Why this and not something else

Most OSS in this space comes from music production, where the goal is to split a song into vocals, drums, bass, and other. That's the four-stem split you'll find in Demucs, in UVR5, in python-audio-separator. For cinematic content (documentaries, lectures, sermons, podcasts) that split is the wrong shape. What you actually want is speech, music, and sound effects. HUSHER is built around that three-stem split because that's what fits the use case.

The model behind it is Bandit v2, a research-grade separator from ICASSP 2025. It isn't shipped by default in UVR5, python-audio-separator, or Demucs, so as far as I know this is the only place you'll find it pre-wired into a video-in/video-out tool. HUSHER fetches the checkpoint and the config for you on first run; you don't have to chase them down.

The other thing I cared about was long files. Most separation tools load the whole audio track into RAM, which is fine for a song and rough for a feature-length documentary. HUSHER uses a fixed-size ring buffer instead, processing the audio in a moving window without ever holding the full track in memory. On an 84-minute documentary on my M4 with 24 GB, that meant 19 GB of peak memory for HUSHER versus 41 GB for the upstream reference, which only finished because macOS swapped about 20 GB to disk. About 2.15× less in practice. Numbers and methodology in docs/bench/.
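A back-of-the-envelope estimate shows why whole-track loading gets expensive on long files. The sketch below assumes stereo audio held as 32-bit floats at 48 kHz. Those are assumptions for illustration, not measurements. The benchmark figures above come from docs/bench/, not from this arithmetic.

```python
def raw_audio_bytes(minutes, sample_rate=48_000, channels=2, bytes_per_sample=4):
    """Size of one uncompressed audio buffer (float32 stereo by default)."""
    return int(minutes * 60 * sample_rate * channels * bytes_per_sample)


one_copy = raw_audio_bytes(84)
print(f"one copy of the track:  {one_copy / 1e9:.2f} GB")      # 1.94 GB
print(f"track plus three stems: {4 * one_copy / 1e9:.2f} GB")  # 7.74 GB
```

Any further full-length copy a separator keeps around (padded input, intermediate tensors, accumulators) multiplies that baseline, which is the cost a bounded window avoids.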

What HUSHER isn't

Before you install it, a few things it doesn't do, so you don't find out the wrong way.

It's terminal only. There's no GUI and no drag-and-drop installer. If that's a dealbreaker, UVR5 is probably what you want for audio, or one of the cloud tools for one-shot video.

It isn't faster than Demucs. I optimised the memory path for long files, not raw throughput, and you'll feel that on short inputs where Demucs finishes first.

It doesn't take YouTube URLs, doesn't process anything in the cloud, and doesn't parallelise across machines. One local file at a time, or a folder processed sequentially.

It also doesn't give you the stems. HUSHER's job is to drop the music and hand back a video; if you want an isolated-stem export, that's a different tool.

It isn't easier to install than a signed binary. You'll need a terminal, brew, and git. If that's a barrier, the alternatives section at the bottom lists tools that are.

Install

macOS on Apple Silicon is the only platform I actually develop and benchmark on. CUDA and CPU paths exist in the code but I haven't tested them; treat them as experimental. If you're on Linux or Windows, the alternatives section at the bottom of this README will get you there faster than waiting for me.

The first run downloads a ~450 MB model checkpoint from Zenodo into ~/.husher/models/. It only happens once.

The quickest path on macOS:

git clone --recursive https://github.com/borderedprominent/HUSHER.git
cd HUSHER
./scripts/install.sh

This creates a venv at ~/.husher/venv, installs HUSHER in editable mode, downloads the default model, and (by default) appends export PATH="$HOME/.husher:$PATH" to your ~/.zshrc or ~/.bashrc so you can run husher from any terminal.

If you'd rather manage your own PATH:

./scripts/install.sh --no-rc-edit

Open a new terminal after installing (so the PATH change takes effect) and run:

husher info   # verify device, FFmpeg, and model are ready

Manual install

For contributors or anyone who doesn't want a shell installer:

git clone --recursive https://github.com/borderedprominent/HUSHER.git
cd HUSHER
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
brew install ffmpeg       # if not already installed
husher --version

The --recursive flag is required: HUSHER's separator calls into a pinned copy of ZFTurbo's Music-Source-Separation-Training framework (https://github.com/ZFTurbo/Music-Source-Separation-Training), included as a git submodule at vendor/mss-training/.

Usage

husher                                # interactive mode (prompts for file)
husher video.mp4                      # single file → ~/Documents/HUSHER/
husher video.mp4 -o custom.mp4        # custom output path
husher --checkpoint eng video.mp4     # use the English-optimised model
husher --batch files.txt              # process paths from a list file
husher --folder /path/to/videos       # process every video in a folder
husher --folder ./videos --batch-size 10   # folder in batches of 10
husher info                           # system / model status
husher video.mp4 --device cpu         # force CPU, skip MPS
husher video.mp4 --force              # skip duration / disk-space checks

Supported input containers: mp4, mkv, avi, mov, webm, flv, wmv, m4v.
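For a sense of what folder mode has to do, here is a minimal sketch that collects supported videos and groups them into fixed-size batches. The helper names (`find_videos`, `batched`) are hypothetical, and the real CLI may order or filter files differently. Only the extension list is taken from the line above.

```python
from pathlib import Path

# Extension list copied from the supported input containers above.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv", ".m4v"}


def find_videos(folder):
    """Return supported video files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # List the current directory in batches of 10, as --batch-size 10 might.
    for number, batch in enumerate(batched(find_videos("."), 10), start=1):
        print(f"batch {number}: {[p.name for p in batch]}")
```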

Model

HUSHER ships a single model: Bandit v2, CC-BY-SA 4.0, 48 kHz. CC-BY-SA permits commercial use under share-alike terms; see Licensing for the caveats before shipping anything commercial.

Bandit v2 ships per-language checkpoints (multi, eng, deu, fra, spa, cmn, fao). The default is multi. Swap with --checkpoint eng etc.

Python API

For batch loops or wiring HUSHER into your own code, instantiate a Session so the model loads once and is reused:

import husher

with husher.Session() as session:
    for path in ["a.mp4", "b.mp4", "c.mp4"]:
        session.hush(path)

husher.hush(), husher.Session, husher.Result, and the typed exception hierarchy under husher.HusherError form the entire v1.x public surface; semver applies to those names only. Full reference, including progress callbacks and stem-only outputs, in docs/api.md.
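In a long batch you usually don't want one bad file to abort the rest. A minimal sketch of that pattern follows. `hush_all` is a hypothetical helper, not part of HUSHER's API. It relies only on what is documented above: a session has a `hush(path)` method, and HUSHER's errors derive from `husher.HusherError`. A stub session stands in for the real one so the snippet runs without the model.

```python
def hush_all(session, paths, error_type=Exception):
    """Call session.hush() on each path, collecting failures instead of raising.

    Pass error_type=husher.HusherError in real use so that only HUSHER's own
    errors are caught and unexpected exceptions still surface.
    """
    succeeded, failed = [], []
    for path in paths:
        try:
            session.hush(path)
        except error_type as exc:
            failed.append((path, str(exc)))
        else:
            succeeded.append(path)
    return succeeded, failed


class StubSession:
    """Stand-in for husher.Session, used only to demonstrate the control flow."""

    def hush(self, path):
        if not path.endswith(".mp4"):
            raise ValueError(f"unsupported input: {path}")


ok, bad = hush_all(StubSession(), ["a.mp4", "b.txt", "c.mp4"], ValueError)
print(ok)   # ['a.mp4', 'c.mp4']
print(bad)  # [('b.txt', 'unsupported input: b.txt')]
```

With the real library the call would be `hush_all(session, paths, husher.HusherError)` inside the `with husher.Session() as session:` block.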

How it works

 input.mp4 ─┬─▶ ffmpeg audio extract ─▶ 48 kHz WAV
            │
            │                            Bandit v2 (streaming demix)
            │                                  │
            │                    ┌─────────────┼─────────────┐
            │                    ▼             ▼             ▼
            │                  speech        music          sfx
            │                    │           drop            │
            │                    └────────── mix ────────────┘
            │                                │
            └─▶ ffmpeg mux  ◀─────  new audio track (speech + sfx)
                   │
                   ▼
           output_hushed.mp4   (video stream copied as-is)
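The two ffmpeg boxes in the diagram map onto ordinary ffmpeg invocations. The sketch below only builds the argument lists; the function names are hypothetical and the exact flags HUSHER passes are not documented here. In particular, the AAC audio codec is an assumption. The flags themselves (`-vn`, `-ar`, `-map`, `-c:v copy`) are standard ffmpeg options.

```python
def extract_audio_cmd(video, wav, sample_rate=48_000):
    """ffmpeg arguments that drop the video (-vn) and resample audio to WAV."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ar", str(sample_rate), wav]


def mux_cmd(video, new_audio, output, audio_codec="aac"):
    """ffmpeg arguments that copy the video stream and swap in the new audio.

    The audio codec is an assumption; the README only states that the video
    stream is copied without re-encoding.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,        # input 0: original file, used for video only
        "-i", new_audio,    # input 1: the speech + sfx mix
        "-map", "0:v:0",    # first video stream from input 0
        "-map", "1:a:0",    # first audio stream from input 1
        "-c:v", "copy",     # copy video packets as-is, no re-encode
        "-c:a", audio_codec,
        output,
    ]


print(" ".join(mux_cmd("input.mp4", "mix.wav", "output_hushed.mp4")))
```

Either list can be handed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.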

The separator uses a streaming demix path: a fixed-size ring buffer and virtual padded-chunk construction so the full audio track is never held in memory. On an 84-minute documentary this uses ~2.15× less peak memory than the vendor's batched demix(), which swaps 20 GB to disk on a 24 GB Mac. See docs/bench/ for the plain-language summary, full methodology in memory-benchmark.md, and a windowing bug this benchmarking uncovered and fixed.
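The ring-buffer idea can be shown on a toy stream. This is a simplified sketch, not HUSHER's streaming demix: it uses `collections.deque` as the ring buffer, works on single samples instead of audio blocks, and omits the padding and overlap handling a real separator needs at chunk boundaries.

```python
from collections import deque


def sliding_windows(samples, window, hop):
    """Yield overlapping windows from an iterable without materialising it.

    Only the most recent `window` samples are held, in a fixed-size ring
    buffer. A trailing remainder shorter than `hop` is dropped in this sketch.
    """
    if window < 1 or not 1 <= hop <= window:
        raise ValueError("need window >= 1 and 1 <= hop <= window")
    ring = deque(maxlen=window)  # oldest samples fall off automatically
    since_last = 0
    for sample in samples:
        ring.append(sample)
        since_last += 1
        if len(ring) == window and since_last >= hop:
            yield list(ring)
            since_last = 0


# The input is an iterator, so the full "track" is never held in memory.
print(list(sliding_windows(iter(range(10)), window=4, hop=2)))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Memory use depends on `window`, not on the length of the input, which is the property that matters for feature-length audio.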

Configuration and data

HUSHER writes data to two directories, both under your home directory: ~/.husher/ and ~/Documents/HUSHER/.

Path                         What lives there
~/.husher/models/            Downloaded model checkpoints (~450 MB each) and configs
~/.husher/venv/              Python virtual env (created by install.sh)
~/.husher/telemetry.jsonl    Local debug log (only if you enable it)
~/Documents/HUSHER/          Processed output videos

Nothing is sent over the network after the initial model download.

Local debug log (off by default)

HUSHER can write a local JSONL log of processing stages (wall time, peak memory, MPS allocation) to help debug performance issues. The log never leaves your machine. It's a plain file you can read, delete, or ignore.

Telemetry is off by default. Enable it when you want a paper trail:

HUSHER_TELEMETRY=1 husher video.mp4

The log contains wall time, peak RSS, and MPS memory at each stage boundary; nothing that identifies you or your files. Delete it any time.
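Because the log is JSON Lines, a few lines of standard-library Python are enough to inspect it. The sketch below is a generic JSONL reader. The keys in the demo (`stage`, `wall_s`) are invented for illustration and are not HUSHER's documented field names, so check a real log before relying on any of them.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; "stage" and "wall_s" are made-up keys.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"stage": "extract", "wall_s": 1.5}\n\n')
    tmp.write('{"stage": "mux", "wall_s": 0.5}\n')

records = read_jsonl(tmp.name)
os.unlink(tmp.name)
print(len(records))                        # 2
print(sum(r["wall_s"] for r in records))   # 2.0
```

For a real log, point it at `os.path.expanduser("~/.husher/telemetry.jsonl")`.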

Security

The Bandit v2 checkpoint format is a Python pickle. torch.load has to be called with weights_only=False to deserialize it, which means loading a checkpoint executes arbitrary pickled code from that file. This is a known caveat of running PyTorch checkpoints from any source.

HUSHER mitigates the risk with two safeguards:

  1. Pinned source. Model downloads only come from the specific Zenodo record encoded in husher/utils/paths.py (12701995 for Bandit v2). The URL is not user-configurable.
  2. SHA256 verification against CHECKPOINT_SHA256 in the same file. A mismatched download is rejected before it's loaded.

The default Bandit v2 multi checkpoint has a verified digest recorded. The other language variants (eng, deu, fra, spa, cmn, fao) currently download without integrity checks; the CLI prints a clear warning when this is the case. Digests will be populated after a known-good download and review.
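The digest check described above follows a common pattern: hash the downloaded file in chunks and compare against a pinned value before anything is deserialized. Here is a minimal sketch using `hashlib`. The function names are hypothetical and this is not the code in husher/utils/paths.py.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True only if the file's SHA256 matches the pinned digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Demo on a throwaway file standing in for a checkpoint.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")

expected = hashlib.sha256(b"abc").hexdigest()
matches = verify_download(tmp.name, expected)
rejects = verify_download(tmp.name, "0" * 64)
os.unlink(tmp.name)
print(matches, rejects)  # True False
```

The important part is the ordering: a file that fails the check should be deleted or rejected before it ever reaches `torch.load`.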

If you don't trust this chain, don't run HUSHER. The threat model here is "Zenodo or the install pipeline is compromised"; against that, only a signed release helps, which we don't have yet.

Development

pip install -e ".[dev]"   # quoted so zsh doesn't treat [dev] as a glob
pytest
ruff check .

The test suite has a 35% coverage floor (enforced in CI on macOS across Python 3.10 / 3.11 / 3.12). Most tests are offline and fast. Tests that exercise the actual model (numerical parity against the real checkpoint) are gated behind HUSHER_RUN_MODEL_TESTS=1 so CI doesn't try to download 450 MB of weights.

HUSHER_RUN_MODEL_TESTS=1 pytest tests/test_core_separator_streaming.py

An end-to-end test that drives HusherPipeline.process() on a tiny synthetic video fixture is at tests/test_pipeline_e2e.py and runs without model weights (it stubs the separator and uses real FFmpeg).

Licensing

HUSHER is a personal tool, not an enterprise product. Read this section before using it for anything beyond personal viewing.

  • HUSHER's own code: MIT (see LICENSE).
  • Bandit v2 model weights: CC-BY-SA 4.0, from Zenodo 12701995. CC-BY-SA permits commercial use under share-alike terms: any derivative of these weights that you distribute must itself be licensed CC-BY-SA. That clause is operationally hostile to a lot of commercial software (it arguably extends to downstream derivative works). Consult a lawyer before shipping a commercial product built on HUSHER. This README can't and doesn't give legal advice.
  • Vendored separator framework at vendor/mss-training/: ZFTurbo's Music-Source-Separation-Training, MIT-licensed.

Project risks

HUSHER carries real single-points-of-failure you should know about before building anything on top of it:

  • Upstream model maintenance. The Bandit v2 research repo (https://github.com/kwatcharasupat/bandit-v2) has a small number of commits and no ongoing release cadence. If the author moves on, the model itself won't get fixes.
  • Vendored separator framework. HUSHER calls into ZFTurbo/Music-Source-Separation-Training via a git submodule pinned to a specific SHA. Upstream inference-API changes would require HUSHER to track or fork.
  • Forked model config. husher/configs/bandit_v2.yaml ships an explicit copy of the Bandit v2 architecture hyperparameters (band count, RNN dim, etc.). If upstream releases a Bandit v2.1 with a different architecture, HUSHER's config will be silently incompatible with the new checkpoint; you'll get a state-dict shape mismatch, not a friendly error. Checkpoint downloads are pinned to the Zenodo record currently compatible with this config, so existing installs keep working; only voluntarily pointing at a new upstream release would trigger this.
  • One-maintainer project. Treat it as a useful personal tool that you can fork if it stops being maintained, not as infrastructure.
  • Niche model. Bandit v2 isn't shipped by default in UVR5, python-audio-separator, or Demucs, so there's no adjacent community to rely on for ecosystem-level fixes.

The MIT license and local-only design mean you can always fork and keep running what works for you. That's intentional.

Alternatives

If HUSHER isn't the right fit, use the right tool instead:

  • asaah18/video-music-remover: closest direct competitor. Video in, video out, CLI, uses Demucs. Use it if you specifically want Demucs or the 4-stem music-production split.
  • UVR5 (Ultimate Vocal Remover GUI): the de-facto GUI. Audio-only (you'd still need to ffmpeg-remux). Signed Win/Mac/Linux bundles, huge community. Best choice for non-technical users on a desktop.
  • python-audio-separator: maintained CLI/library successor for the UVR model collection (MDX-Net, Demucs, BS-RoFormer, etc.). Power-user-friendly.
  • Demucs: the research-grade reference for music source separation. Parent repo is archived; active development is on the adefossez fork.
  • Cloud services (MVSEP, Lalal.ai, vocalremover.org, etc.): if you'd rather not install anything and are comfortable uploading your audio. Usually the fastest path for a non-technical user.

HUSHER is specifically for the case where you want video-in / video-out / local / long-form / cinematic 3-stem. If your needs are different, one of the above is probably better.

Contributing

Open an issue or PR at https://github.com/borderedprominent/HUSHER/issues.

For bug reports, running with the local debug log enabled helps a lot:

HUSHER_TELEMETRY=1 husher your-video.mp4
# then attach ~/.husher/telemetry.jsonl (it never leaves your machine until you share it)

Acknowledgements

Download files

Source Distribution

husher-1.0.0.tar.gz (109.9 kB)

Uploaded Source

Built Distribution

husher-1.0.0-py3-none-any.whl (85.2 kB)

Uploaded Python 3

File details

Details for the file husher-1.0.0.tar.gz.

File metadata

  • Download URL: husher-1.0.0.tar.gz
  • Upload date:
  • Size: 109.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for husher-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       ca25e1f8944c73741534077118493a95866addabc9ac1ca4d2c0b39f2b9ee0ee
MD5          06046d3d75096f955eafd4e36f551598
BLAKE2b-256  16d5d9e0a3a98f4ad19c764657e7bdc7592d5efd6d7a71cfd8d0f471b00a507f

Provenance

The following attestation bundles were made for husher-1.0.0.tar.gz:

Publisher: publish.yml on borderedprominent/HUSHER

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file husher-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: husher-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 85.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for husher-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       7b9c0622760230fc3e8cd3f40b2a4890b61ccdd864996269ae21d83b0076453c
MD5          2167b6df5a104dfd53f0f4162a13b635
BLAKE2b-256  35d455d6883ede8e48f39f4ffb332e75a961c338bc574acab6e242e213e6ee0f

Provenance

The following attestation bundles were made for husher-1.0.0-py3-none-any.whl:

Publisher: publish.yml on borderedprominent/HUSHER

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
