Moises-Light: Resource-efficient Band-split U-Net for Music Source Separation
(WORK IN PROGRESS)
PyTorch implementation of the Moises-Light architecture from "Resource-efficient Separation Transformer" (Rigatelli et al., 2024). The paper does not release code; this is an independent implementation based on the paper's description.
Key Features
- Spectral-only U-Net with internal STFT/iSTFT
- Equal-width band splitting via group convolutions
- Dual-path RoPE transformer bottleneck
- Asymmetric encoder/decoder (3 heavy encoder stages, 1 heavy + 2 light decoder stages)
- Multi-stem output: all stems in a single forward pass (unlike the paper's per-stem approach)
- 6 preset configurations from 2.5M to 8.4M parameters
Installation
```
pip install moises-light
```
Quick Start
```python
import torch
from moises_light import MoisesLight, configs

# Use a preset
model = MoisesLight(**configs['paper_large'])

# Forward pass
x = torch.randn(1, 2, 264600)  # [batch, channels, samples] -- 6 s @ 44.1 kHz
y = model(x)                   # [1, 4, 2, 264600] = [batch, stems, channels, samples]

# Auxiliary-outputs interface (for training framework compatibility)
y, aux = model(x, return_auxiliary_outputs=True)
```
Preset Configurations
All presets use n_fft=6144, hop_size=1024, stereo input, and 4-stem output (vocals, drums, bass, other).
Paper-Faithful (truncated spectrum, 0-14.7 kHz)
Faithful to the paper's architecture. Frequencies above ~14.7 kHz are zeroed.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `paper_large` | 56 | 4 | 14 | 0-14.7 kHz | 4,660,592 |
| `paper_small` | 32 | 4 | 8 | 0-14.7 kHz | 2,520,592 |
Fullband Matched-Param (full spectrum, 0-22 kHz, similar param budget)
Full spectrum via 6 bands of 512 bins (freq_dim=3072). G adjusted to keep param count close to paper variants. Trades per-band capacity for full spectrum coverage.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `fullband_large` | 60 | 6 | 10 | 0-22 kHz | 4,948,612 |
| `fullband_small` | 36 | 6 | 6 | 0-22 kHz | 2,824,244 |
Fullband Wide (full spectrum, 0-22 kHz, matched per-group capacity)
Full spectrum with the same per-group channel capacity as the paper models. More total params but no per-band capacity compromise.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `fullband_large_wide` | 84 | 6 | 14 | 0-22 kHz | 8,354,908 |
| `fullband_small_wide` | 48 | 6 | 8 | 0-22 kHz | 4,102,712 |
Why Three Tiers?
G (base channel width) must be divisible by n_bands for group convolutions. With 6 bands, G cannot be 56 (56 is not divisible by 6). Two strategies:
- **Matched-param:** pick the nearest divisible G that keeps total params similar (G=60 or G=36). Each group conv processes fewer channels per band, but total model size stays close to the paper.
- **Matched per-group (wide):** pick G so that G/n_bands equals the paper's per-group count (84/6 = 14, matching 56/4 = 14). Each band gets identical capacity, but total params increase ~1.8x.
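The trade-off between the two strategies comes down to `G // n_bands`. A quick sketch with the numbers from the tables above (plain Python, independent of the package):

```python
# Per-band capacity for three presets from the tables above.
presets = {
    "paper_large":         {"G": 56, "n_bands": 4},
    "fullband_large":      {"G": 60, "n_bands": 6},  # matched-param: ~same total size
    "fullband_large_wide": {"G": 84, "n_bands": 6},  # matched per-group: same capacity per band
}

for name, cfg in presets.items():
    G, n_bands = cfg["G"], cfg["n_bands"]
    assert G % n_bands == 0, f"{name}: G must be divisible by n_bands"
    print(f"{name}: {G // n_bands} channels per band")
# paper_large: 14, fullband_large: 10, fullband_large_wide: 14
```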
Architecture
```
Input [B, C, L]
      |
      v
STFT -> [B, 4, F, T]                  (stereo * real/imag = 4 channels)
      |
      v
Freq truncation -> [B, 4, freq_dim, T]
      |
      v
Z-score normalize
      |
      v
Band split -> [B, 4*n_bands, freq_dim/n_bands, T]
      |
      v
First conv (group K=1) -> [B, G, F_band, T]
      |
      v
Encoder (3x): SplitAndMerge + TimeDownsample
      |
      v
Bottleneck: SplitAndMerge + N_rope x DualPath(FreqRoPE + TimeRoPE)
      |
      v
Decoder: 1 heavy (upsample + skip * SplitAndMerge) + 2 light (upsample + skip)
      |
      v
Final conv (group K=1) -> [B, 4*n_bands, F_band, T]
      |
      v
Band merge -> [B, 4, freq_dim, T]
      |
      v
Source head -> [B, S*4, freq_dim, T]
      |
      v
Multiplicative mask on original STFT
      |
      v
Freq zero-pad + iSTFT -> [B, S, C, L]
```
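The shapes in the diagram can be sanity-checked with plain arithmetic. A sketch using the paper_large numbers; the frame-count formula assumes a center-padded STFT, which is an assumption of this sketch, not something the source states:

```python
# Trace tensor shapes through the pipeline for the paper_large preset.
n_fft, hop, freq_dim, n_bands, G = 6144, 1024, 2048, 4, 56
L = 264600                      # 6 s @ 44.1 kHz

n_bins = n_fft // 2 + 1         # 3073 one-sided STFT bins
T = L // hop + 1                # frames, assuming center=True padding
F_band = freq_dim // n_bands    # 512 bins per band after splitting

print(f"STFT:       [B, 4, {n_bins}, {T}]")
print(f"truncated:  [B, 4, {freq_dim}, {T}]")
print(f"band split: [B, {4 * n_bands}, {F_band}, {T}]")
print(f"first conv: [B, {G}, {F_band}, {T}]")
```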
Key Parameters
| Parameter | Description | Constraint |
|---|---|---|
| `G` | Base channel width. Channels at encoder stage i = G*(i+1) | Must be divisible by n_bands |
| `n_bands` | Number of equal-width frequency bands for group conv | freq_dim must be divisible by n_bands |
| `freq_dim` | Number of STFT bins to process (rest zero-padded) | Paper: 2048 (~14.7 kHz). Fullband: 3072 (~22 kHz) |
| `n_rope` | Number of dual-path RoPE transformer blocks in bottleneck | Paper large: 5, paper small: 6 |
| `n_enc` / `n_dec` | Encoder stages / heavy decoder stages | Asymmetric: n_dec < n_enc saves params |
| `n_split_enc` / `n_split_dec` | Number of group conv layers per SplitAndMerge block | Controls depth within each stage |
| `bn_factor` | TDF bottleneck factor (freq_dim -> freq_dim/bn_factor -> freq_dim) | Higher = more compression |
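The two divisibility constraints can be checked before constructing a model. A hypothetical helper (not part of the package) might look like:

```python
def check_config(G: int, n_bands: int, freq_dim: int) -> list[str]:
    """Return a list of constraint violations (empty list = valid config)."""
    errors = []
    if G % n_bands:
        errors.append(f"G={G} not divisible by n_bands={n_bands}")
    if freq_dim % n_bands:
        errors.append(f"freq_dim={freq_dim} not divisible by n_bands={n_bands}")
    return errors

print(check_config(G=56, n_bands=4, freq_dim=2048))  # [] -- paper_large is valid
print(check_config(G=56, n_bands=6, freq_dim=3072))  # G=56 fails with 6 bands
```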
Multi-Stem vs Per-Stem
This implementation outputs all stems in one forward pass, while the paper trains a separate model per stem. To mimic the per-stem setup, override the `sources` list:

```python
model = MoisesLight(**{**configs['paper_large'], 'sources': ['vocals']})
y = model(x)  # [B, 1, C, L]
```
Frequency Truncation
Paper presets use freq_dim=2048, keeping 2048 of 3073 STFT bins (~14.7 kHz at 44.1 kHz sample rate). Everything above is zeroed — hi-hats, vocal air, cymbal shimmer above ~15 kHz are not recovered. This saves ~33% compute through the U-Net.
Fullband presets use freq_dim=3072, covering 0-22 kHz (nearly the full spectrum).
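The cutoff frequencies follow directly from the STFT parameters: bin k maps to k * sr / n_fft Hz. A quick check of the arithmetic (plain Python, no package dependencies):

```python
# Frequency covered by freq_dim bins at sample rate sr with FFT size n_fft.
sr, n_fft = 44100, 6144
bin_hz = sr / n_fft               # ~7.18 Hz per bin

cutoff_paper = 2048 * bin_hz      # paper presets: freq_dim=2048
cutoff_full = 3072 * bin_hz       # fullband presets: freq_dim=3072

print(f"paper cutoff:    {cutoff_paper:.1f} Hz")  # 14700.0 Hz
print(f"fullband cutoff: {cutoff_full:.1f} Hz")   # 22050.0 Hz (Nyquist)
```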
Integration
ZFTurbo Music-Source-Separation-Training
```python
from moises_light import MoisesLight

model = MoisesLight(
    sources=['vocals', 'drums', 'bass', 'other'],
    audio_channels=2, n_fft=6144, hop_size=1024, win_size=6144,
    freq_dim=2048, n_bands=4, G=56, n_enc=3, n_dec=1,
    n_split_enc=3, n_split_dec=1, n_rope=5, bn_factor=4,
)
```
Custom Training Loop
```python
model = MoisesLight(**configs['paper_large'])
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
criterion = torch.nn.L1Loss()  # example loss; substitute the loss you train with

for batch in dataloader:
    mix = batch['mix']          # [B, 2, L]
    targets = batch['targets']  # [B, 4, 2, L]

    pred = model(mix)           # [B, 4, 2, L]
    loss = criterion(pred, targets)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
Known Limitations
- **MPS (Apple Silicon):** `torch.istft` does not support MPS. The model automatically falls back to CPU for iSTFT, which adds overhead. This is a PyTorch limitation, not a model issue.
- **Frequency truncation:** paper presets zero frequencies above ~14.7 kHz. Use fullband presets if high-frequency content matters.
Citation
```bibtex
@article{rigatelli2024resource,
  title={Resource-efficient Separation Transformer},
  author={Rigatelli, Luca and Condorelli, Stefano and Scardapane, Simone},
  journal={arXiv preprint arXiv:2412.11132},
  year={2024}
}
```
License
MIT