
Moises-Light: Resource-efficient Band-split U-Net for Music Source Separation


(WORK IN PROGRESS)

Requires Python 3.9+.

PyTorch implementation of the Moises-Light architecture from "Resource-efficient Separation Transformer" (Rigatelli et al., 2024). The paper does not release code; this is an independent implementation based on the paper's description.

Key Features

  • Spectral-only U-Net with internal STFT/iSTFT
  • Equal-width band splitting via group convolutions
  • Dual-path RoPE transformer bottleneck
  • Asymmetric encoder/decoder (3 heavy encoder stages, 1 heavy + 2 light decoder stages)
  • Multi-stem output: all stems in a single forward pass (unlike the paper's per-stem approach)
  • 6 preset configurations from 2.5M to 8.4M parameters

Installation

pip install moises-light

Quick Start

import torch
from moises_light import MoisesLight, configs

# Use a preset
model = MoisesLight(**configs['paper_large'])

# Forward pass
x = torch.randn(1, 2, 264600)  # [batch, channels, samples], 6 s @ 44.1 kHz
y = model(x)                   # [1, 4, 2, 264600] = [batch, stems, channels, samples]

# With auxiliary outputs interface (for training framework compatibility)
y, aux = model(x, return_auxiliary_outputs=True)

Preset Configurations

All presets use n_fft=6144, hop_size=1024, stereo input, and 4-stem output (vocals, drums, bass, other).

Paper-Faithful (truncated spectrum, 0-14.7 kHz)

Faithful to the paper's architecture. Frequencies above ~14.7 kHz are zeroed.

Preset       G   Bands  Per-group ch  Freq coverage  Params
paper_large  56  4      14            0-14.7 kHz     4,660,592
paper_small  32  4      8             0-14.7 kHz     2,520,592

Fullband Matched-Param (full spectrum, 0-22 kHz, similar param budget)

Full spectrum via 6 bands of 512 bins (freq_dim=3072). G adjusted to keep param count close to paper variants. Trades per-band capacity for full spectrum coverage.

Preset          G   Bands  Per-group ch  Freq coverage  Params
fullband_large  60  6      10            0-22 kHz       4,948,612
fullband_small  36  6      6             0-22 kHz       2,824,244

Fullband Wide (full spectrum, 0-22 kHz, matched per-group capacity)

Full spectrum with the same per-group channel capacity as the paper models. More total params but no per-band capacity compromise.

Preset               G   Bands  Per-group ch  Freq coverage  Params
fullband_large_wide  84  6      14            0-22 kHz       8,354,908
fullband_small_wide  48  6      8             0-22 kHz       4,102,712

Why Three Tiers?

G (base channel width) must be divisible by n_bands for group convolutions. With 6 bands, G cannot be 56 (not divisible by 6). Two strategies:

  1. Matched-param: Pick the nearest divisible G that keeps total params similar (G=60 or 36). Each group conv processes fewer channels per band, but total model size stays close to the paper.

  2. Matched per-group (wide): Pick G so that G/n_bands equals the paper's per-group count (84/6=14, matching 56/4=14). Each band gets identical capacity, but total params increase ~1.8x.
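The divisibility rule and the per-group arithmetic above can be checked in a few lines. The helper below is illustrative only, not part of the moises-light API:

```python
def per_group_channels(G: int, n_bands: int) -> int:
    """Channels each band's group conv processes at the first stage."""
    if G % n_bands != 0:
        raise ValueError(f"G={G} is not divisible by n_bands={n_bands}")
    return G // n_bands

print(per_group_channels(56, 4))  # paper_large -> 14
print(per_group_channels(60, 6))  # fullband_large (matched-param) -> 10
print(per_group_channels(84, 6))  # fullband_large_wide -> 14, matching the paper
# per_group_channels(56, 6) raises ValueError: 56 is not divisible by 6
```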

Architecture

Input [B, C, L]
    |
    v
STFT -> [B, 4, F, T]         (stereo * real/imag = 4 channels)
    |
    v
Freq truncation -> [B, 4, freq_dim, T]
    |
    v
Z-score normalize
    |
    v
Band split -> [B, 4*n_bands, freq_dim/n_bands, T]
    |
    v
First conv (group K=1) -> [B, G, F_band, T]
    |
    v
Encoder (3x): SplitAndMerge + TimeDownsample
    |
    v
Bottleneck: SplitAndMerge + N_rope x DualPath(FreqRoPE + TimeRoPE)
    |
    v
Decoder: 1 heavy (upsample + skip * SplitAndMerge) + 2 light (upsample + skip)
    |
    v
Final conv (group K=1) -> [B, 4*n_bands, F_band, T]
    |
    v
Band merge -> [B, 4, freq_dim, T]
    |
    v
Source head -> [B, S*4, freq_dim, T]
    |
    v
Multiplicative mask on original STFT
    |
    v
Freq zero-pad + iSTFT -> [B, S, C, L]
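The tensor shapes in the diagram can be traced numerically for the paper_large preset. This is a sketch of the arithmetic only; the frame count assumes torch.stft's default center=True framing:

```python
# Shape arithmetic for the paper_large preset:
# n_fft=6144, hop_size=1024, freq_dim=2048, n_bands=4, G=56, 4 stems.
n_fft, hop, freq_dim, n_bands, G, n_stems = 6144, 1024, 2048, 4, 56, 4
L = 264600                       # 6 s of audio at 44.1 kHz

F = n_fft // 2 + 1               # onesided STFT bins
T = L // hop + 1                 # STFT frames (center=True convention)
band_ch = 4 * n_bands            # stereo * real/imag, stacked per band
band_bins = freq_dim // n_bands  # bins per band after the split

print(F, T)                      # 3073 259
print(band_ch, band_bins)        # 16 512
print(G, n_stems * 4)            # 56 16: first-conv width, source-head channels
```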

Key Parameters

  • G: Base channel width; channels at encoder stage i = G*(i+1). Must be divisible by n_bands.
  • n_bands: Number of equal-width frequency bands for group convs. freq_dim must be divisible by n_bands.
  • freq_dim: Number of STFT bins to process (the rest are zero-padded). Paper: 2048 (~14.7 kHz); fullband: 3072 (~22 kHz).
  • n_rope: Number of dual-path RoPE transformer blocks in the bottleneck. Paper large: 5; paper small: 6.
  • n_enc / n_dec: Encoder stages / heavy decoder stages. Asymmetric (n_dec < n_enc) saves params.
  • n_split_enc / n_split_dec: Group conv layers per SplitAndMerge block. Controls depth within each stage.
  • bn_factor: TDF bottleneck factor (freq_dim -> freq_dim/bn_factor -> freq_dim). Higher = more compression.
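The channel-width rule for G (channels at encoder stage i = G*(i+1)) works out as follows for paper_large; a quick illustrative check:

```python
# Encoder channel widths per stage under the rule channels_i = G*(i+1),
# shown for paper_large (G=56, n_enc=3). Illustrative only.
G, n_enc = 56, 3
widths = [G * (i + 1) for i in range(n_enc)]
print(widths)  # [56, 112, 168]
```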

Multi-Stem vs Per-Stem

This implementation outputs all stems simultaneously. The paper trains separate per-stem models. To use per-stem:

model = MoisesLight(**{**configs['paper_large'], 'sources': ['vocals']})
y = model(x)  # [B, 1, C, L]

Frequency Truncation

Paper presets use freq_dim=2048, keeping 2048 of 3073 STFT bins (~14.7 kHz at 44.1 kHz sample rate). Everything above is zeroed — hi-hats, vocal air, cymbal shimmer above ~15 kHz are not recovered. This saves ~33% compute through the U-Net.

Fullband presets use freq_dim=3072, covering 0-22 kHz (nearly the full spectrum).
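The cutoffs quoted above follow from basic STFT arithmetic: each onesided bin spans sample_rate / n_fft Hz. A quick illustrative check:

```python
# Bin-to-frequency mapping behind the freq_dim cutoffs.
sample_rate, n_fft = 44100, 6144

def bin_to_hz(k: int) -> float:
    """Center frequency of onesided STFT bin k."""
    return k * sample_rate / n_fft

print(bin_to_hz(2048))  # 14700.0 -> the paper presets' ~14.7 kHz cutoff
print(bin_to_hz(3072))  # 22050.0 -> Nyquist; the fullband presets' coverage
```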

Integration

ZFTurbo Music-Source-Separation-Training

from moises_light import MoisesLight

model = MoisesLight(
    sources=['vocals', 'drums', 'bass', 'other'],
    audio_channels=2, n_fft=6144, hop_size=1024, win_size=6144,
    freq_dim=2048, n_bands=4, G=56, n_enc=3, n_dec=1,
    n_split_enc=3, n_split_dec=1, n_rope=5, bn_factor=4,
)

Custom Training Loop

model = MoisesLight(**configs['paper_large'])
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

for batch in dataloader:
    mix = batch['mix']          # [B, 2, L]
    targets = batch['targets']  # [B, 4, 2, L]
    pred = model(mix)           # [B, 4, 2, L]
    loss = criterion(pred, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Known Limitations

  • MPS (Apple Silicon): torch.istft does not support MPS. The model automatically falls back to CPU for iSTFT, which adds overhead. This is a PyTorch limitation, not a model issue.
  • Frequency truncation: Paper presets zero frequencies above ~14.7 kHz. Use fullband presets if high-frequency content matters.

Citation

@article{rigatelli2024resource,
  title={Resource-efficient Separation Transformer},
  author={Rigatelli, Luca and Condorelli, Stefano and Scardapane, Simone},
  journal={arXiv preprint arXiv:2412.11132},
  year={2024}
}

License

MIT
