Moises-Light: Resource-efficient Band-split U-Net for Music Source Separation
(WORK IN PROGRESS)
PyTorch implementation of the Moises-Light architecture from "Resource-efficient Separation Transformer" (Rigatelli et al., 2024). The paper does not release code; this is an independent implementation based on the paper's description.
Key Features
- Spectral-only U-Net with internal STFT/iSTFT
- Equal-width band splitting via group convolutions
- Dual-path RoPE transformer bottleneck
- Asymmetric encoder/decoder (3 heavy encoder stages, 1 heavy + 2 light decoder stages)
- Multi-stem output: all stems in a single forward pass (unlike the paper's per-stem approach)
- 6 preset configurations from 2.5M to 8.4M parameters
Installation
```
pip install moises-light
```
Quick Start
```python
import torch
from moises_light import MoisesLight, configs

# Use a preset
model = MoisesLight(**configs['paper_large'])

# Forward pass
x = torch.randn(1, 2, 264600)  # [batch, channels, samples] -- 6 s @ 44.1 kHz
y = model(x)                   # [1, 4, 2, 264600] = [batch, stems, channels, samples]

# Auxiliary-outputs interface (for training framework compatibility)
y, aux = model(x, return_auxiliary_outputs=True)
```
Preset Configurations
All presets use n_fft=6144, hop_size=1024, stereo input, and 4-stem output (vocals, drums, bass, other).
Paper-Faithful (truncated spectrum, 0-14.7 kHz)
Faithful to the paper's architecture. Frequencies above ~14.7 kHz are zeroed.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `paper_large` | 56 | 4 | 14 | 0-14.7 kHz | 4,660,592 |
| `paper_small` | 32 | 4 | 8 | 0-14.7 kHz | 2,520,592 |
Fullband Matched-Param (full spectrum, 0-22 kHz, similar param budget)
Full spectrum via 6 bands of 512 bins (freq_dim=3072). G adjusted to keep param count close to paper variants. Trades per-band capacity for full spectrum coverage.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `fullband_large` | 60 | 6 | 10 | 0-22 kHz | 4,948,612 |
| `fullband_small` | 36 | 6 | 6 | 0-22 kHz | 2,824,244 |
Fullband Wide (full spectrum, 0-22 kHz, matched per-group capacity)
Full spectrum with the same per-group channel capacity as the paper models. More total params but no per-band capacity compromise.
| Preset | G | Bands | Per-group ch | Freq coverage | Params |
|---|---|---|---|---|---|
| `fullband_large_wide` | 84 | 6 | 14 | 0-22 kHz | 8,354,908 |
| `fullband_small_wide` | 48 | 6 | 8 | 0-22 kHz | 4,102,712 |
Why Three Tiers?
G (base channel width) must be divisible by n_bands for group convolutions. With 6 bands, G cannot be 56 (56 is not divisible by 6). Two strategies:
- **Matched-param:** pick the nearest divisible G that keeps total params similar (G=60 or G=36). Each group conv processes fewer channels per band, but total model size stays close to the paper.
- **Matched per-group (wide):** pick G so that G/n_bands equals the paper's per-group count (84/6 = 14, matching 56/4 = 14). Each band gets identical capacity, but total params increase ~1.8x.
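The trade-off between the two strategies comes down to `G // n_bands`. A quick sketch with the numbers from the tables above (plain Python, independent of the package):

```python
# Per-band capacity for three presets from the tables above.
presets = {
    "paper_large":         {"G": 56, "n_bands": 4},
    "fullband_large":      {"G": 60, "n_bands": 6},  # matched-param: ~same total size
    "fullband_large_wide": {"G": 84, "n_bands": 6},  # matched per-group: same capacity per band
}

for name, cfg in presets.items():
    G, n_bands = cfg["G"], cfg["n_bands"]
    assert G % n_bands == 0, f"{name}: G must be divisible by n_bands"
    print(f"{name}: {G // n_bands} channels per band")
# paper_large: 14, fullband_large: 10, fullband_large_wide: 14
```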
Architecture
```
Input [B, C, L]
      |
      v
STFT -> [B, 4, F, T]                  (stereo * real/imag = 4 channels)
      |
      v
Freq truncation -> [B, 4, freq_dim, T]
      |
      v
Z-score normalize
      |
      v
Band split -> [B, 4*n_bands, freq_dim/n_bands, T]
      |
      v
First conv (group K=1) -> [B, G, F_band, T]
      |
      v
Encoder (3x): SplitAndMerge + TimeDownsample
      |
      v
Bottleneck: SplitAndMerge + N_rope x DualPath(FreqRoPE + TimeRoPE)
      |
      v
Decoder: 1 heavy (upsample + skip * SplitAndMerge) + 2 light (upsample + skip)
      |
      v
Final conv (group K=1) -> [B, 4*n_bands, F_band, T]
      |
      v
Band merge -> [B, 4, freq_dim, T]
      |
      v
Source head -> [B, S*4, freq_dim, T]
      |
      v
Multiplicative mask on original STFT
      |
      v
Freq zero-pad + iSTFT -> [B, S, C, L]
```
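The shapes in the diagram can be sanity-checked with plain arithmetic. A sketch using the paper_large numbers; the frame-count formula assumes a center-padded STFT, which is an assumption of this sketch, not something the source states:

```python
# Trace tensor shapes through the pipeline for the paper_large preset.
n_fft, hop, freq_dim, n_bands, G = 6144, 1024, 2048, 4, 56
L = 264600                      # 6 s @ 44.1 kHz

n_bins = n_fft // 2 + 1         # 3073 one-sided STFT bins
T = L // hop + 1                # frames, assuming center=True padding
F_band = freq_dim // n_bands    # 512 bins per band after splitting

print(f"STFT:       [B, 4, {n_bins}, {T}]")
print(f"truncated:  [B, 4, {freq_dim}, {T}]")
print(f"band split: [B, {4 * n_bands}, {F_band}, {T}]")
print(f"first conv: [B, {G}, {F_band}, {T}]")
```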
Key Parameters
| Parameter | Description | Constraint |
|---|---|---|
| `G` | Base channel width. Channels at encoder stage i = G*(i+1) | Must be divisible by n_bands |
| `n_bands` | Number of equal-width frequency bands for group conv | freq_dim must be divisible by n_bands |
| `freq_dim` | Number of STFT bins to process (rest zero-padded) | Paper: 2048 (~14.7 kHz). Fullband: 3072 (~22 kHz) |
| `n_rope` | Number of dual-path RoPE transformer blocks in bottleneck | Paper large: 5, paper small: 6 |
| `n_enc` / `n_dec` | Encoder stages / heavy decoder stages | Asymmetric: n_dec < n_enc saves params |
| `n_split_enc` / `n_split_dec` | Number of group conv layers per SplitAndMerge block | Controls depth within each stage |
| `bn_factor` | TDF bottleneck factor (freq_dim -> freq_dim/bn_factor -> freq_dim) | Higher = more compression |
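The two divisibility constraints can be checked before constructing a model. A hypothetical helper (not part of the package) might look like:

```python
def check_config(G: int, n_bands: int, freq_dim: int) -> list[str]:
    """Return a list of constraint violations (empty list = valid config)."""
    errors = []
    if G % n_bands:
        errors.append(f"G={G} not divisible by n_bands={n_bands}")
    if freq_dim % n_bands:
        errors.append(f"freq_dim={freq_dim} not divisible by n_bands={n_bands}")
    return errors

print(check_config(G=56, n_bands=4, freq_dim=2048))  # [] -- paper_large is valid
print(check_config(G=56, n_bands=6, freq_dim=3072))  # G=56 fails with 6 bands
```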
Multi-Stem vs Per-Stem
This implementation outputs all stems in one forward pass, while the paper trains a separate model per stem. To mimic the per-stem setup, override the `sources` list:

```python
model = MoisesLight(**{**configs['paper_large'], 'sources': ['vocals']})
y = model(x)  # [B, 1, C, L]
```
Frequency Truncation
Paper presets use freq_dim=2048, keeping 2048 of 3073 STFT bins (~14.7 kHz at 44.1 kHz sample rate). Everything above is zeroed — hi-hats, vocal air, cymbal shimmer above ~15 kHz are not recovered. This saves ~33% compute through the U-Net.
Fullband presets use freq_dim=3072, covering 0-22 kHz (nearly the full spectrum).
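The cutoff frequencies follow directly from the STFT parameters: bin k maps to k * sr / n_fft Hz. A quick check of the arithmetic (plain Python, no package dependencies):

```python
# Frequency covered by freq_dim bins at sample rate sr with FFT size n_fft.
sr, n_fft = 44100, 6144
bin_hz = sr / n_fft               # ~7.18 Hz per bin

cutoff_paper = 2048 * bin_hz      # paper presets: freq_dim=2048
cutoff_full = 3072 * bin_hz       # fullband presets: freq_dim=3072

print(f"paper cutoff:    {cutoff_paper:.1f} Hz")  # 14700.0 Hz
print(f"fullband cutoff: {cutoff_full:.1f} Hz")   # 22050.0 Hz (Nyquist)
```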
Integration
ZFTurbo Music-Source-Separation-Training
```python
from moises_light import MoisesLight

model = MoisesLight(
    sources=['vocals', 'drums', 'bass', 'other'],
    audio_channels=2, n_fft=6144, hop_size=1024, win_size=6144,
    freq_dim=2048, n_bands=4, G=56, n_enc=3, n_dec=1,
    n_split_enc=3, n_split_dec=1, n_rope=5, bn_factor=4,
)
```
Custom Training Loop
```python
model = MoisesLight(**configs['paper_large'])
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
criterion = torch.nn.L1Loss()  # example loss; substitute the loss you train with

for batch in dataloader:
    mix = batch['mix']          # [B, 2, L]
    targets = batch['targets']  # [B, 4, 2, L]

    pred = model(mix)           # [B, 4, 2, L]
    loss = criterion(pred, targets)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
Known Limitations
- **MPS (Apple Silicon):** `torch.istft` does not support MPS. The model automatically falls back to CPU for iSTFT, which adds overhead. This is a PyTorch limitation, not a model issue.
- **Frequency truncation:** paper presets zero frequencies above ~14.7 kHz. Use fullband presets if high-frequency content matters.
Citation
```bibtex
@article{rigatelli2024resource,
  title={Resource-efficient Separation Transformer},
  author={Rigatelli, Luca and Condorelli, Stefano and Scardapane, Simone},
  journal={arXiv preprint arXiv:2412.11132},
  year={2024}
}
```
License
MIT