pymss
Python package for music source separation.
Install
If you want the CUDA build of PyTorch, install it first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
For CLI and Python API usage, install:
pip install pymss
If you need the HTTP server API or the WebUI, install the server extra instead:
pip install "pymss[server]"
Develop
Development requires Git, Python 3.10 or later, and uv. WebUI development also requires Node.js and npm.
Clone the Python package repository and install development dependencies:
git clone https://github.com/pymss-project/pymss
cd pymss
uv sync --group dev
If you need to develop or locally serve the WebUI, the WebUI source lives in a separate repository and must be built with Node.js:
git clone https://github.com/pymss-project/pymss-webui
cd pymss-webui
npm ci
npm run build
Copy the built WebUI assets into the Python package checkout:
cp -R dist/. ../pymss/server/webui_static/
Build source and wheel distributions from the Python package checkout:
cd ..
uv build
The test suite uses pytest. The migrated integration tests live in test/ and are parameterized through test/test_all.py. They require local model weights, configs, and input audio; missing assets are skipped automatically.
uv run pytest test -q
Usage
CLI inference
Run inference by catalog model name. If the model, config, or auxiliary files are missing locally, the CLI downloads them automatically before inference.
pymss infer bs_roformer_voc_hyperacev2 \
    -i path/to/input_file_or_folder \
    -o results \
    --device auto \
    --format wav
--device auto uses CUDA first when an NVIDIA GPU is available. On Apple Silicon it uses the MLX backend by default. Use --device mlx to force MLX, or --device mps to force PyTorch MPS.
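The priority described above can be pictured as a small decision function. This is only a sketch of the documented order, not pymss's actual code: `resolve_auto_device` and its two boolean arguments are hypothetical names, and the final CPU fallback is an assumption, since the text above does not say what happens when neither accelerator is present.

```python
def resolve_auto_device(has_cuda: bool, is_apple_silicon: bool) -> str:
    """Illustrative only: mirrors the documented priority of --device auto."""
    if has_cuda:
        return "cuda"  # an NVIDIA GPU is preferred when available
    if is_apple_silicon:
        return "mlx"   # documented default on Apple Silicon
    return "cpu"       # assumption: the final fallback is not documented here


print(resolve_auto_device(has_cuda=False, is_apple_silicon=True))
```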
The default download source is ModelScope. You can choose another source or model directory:
pymss --model-dir /path/to/models infer bs_roformer_voc_hyperacev2 \
    --source hf-mirror \
    -i path/to/input_file_or_folder \
    -o results
When running from a source checkout without installation, use python -m pymss.cli instead of pymss.
CLI ensemble
pymss ensemble path/to/model_a_vocals.wav path/to/model_b_vocals.wav \
    --algorithm avg_wave \
    --weights 1 0.8 \
    -o results/ensemble_vocals.wav
Available algorithms are avg_wave, median_wave, min_wave, max_wave, avg_fft, median_fft, min_fft, and max_fft. Input files must use the same sample rate and channel count. Files with different lengths are truncated to the shortest input. If --weights is omitted, every input uses weight 1.
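To make the avg_wave rules concrete, here is a minimal pure-Python sketch of a weighted waveform average that truncates to the shortest input and defaults every weight to 1. It is not the pymss implementation: the function name `avg_wave`, the mono list-of-floats representation, and the normalization by the sum of the weights are assumptions made for illustration.

```python
from typing import Optional, Sequence


def avg_wave(inputs: Sequence[Sequence[float]],
             weights: Optional[Sequence[float]] = None) -> list[float]:
    """Weighted per-sample average of mono waveforms (illustrative sketch)."""
    if not inputs:
        raise ValueError("at least one input is required")
    if weights is None:
        weights = [1.0] * len(inputs)  # omitted weights default to 1
    if len(weights) != len(inputs):
        raise ValueError("need exactly one weight per input")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Inputs of different lengths are truncated to the shortest one.
    shortest = min(len(wave) for wave in inputs)
    return [
        sum(w * wave[i] for w, wave in zip(weights, inputs)) / total
        for i in range(shortest)
    ]


# Two "files" of different lengths, weighted 1 and 0.8 as in the CLI example.
mixed = avg_wave([[1.0, 1.0, 1.0], [0.0, 0.0]], weights=[1, 0.8])
print(len(mixed))
```

The same skeleton would extend to the median, min, and max variants by swapping the reduction applied at each sample position.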
Server and WebUI
Install the optional server dependencies to run an HTTP server with dynamic model loading, catalog browsing, model downloads, and an optional browser WebUI:
pip install "pymss[server]"
pymss serve --webui
See server CLI docs, server API docs, and server error docs for details.
Python API
Use a catalog model name directly. You do not need to pass model_type, model_path, or config_path.
from pymss import MSSeparator
separator = MSSeparator.from_model_name(
    "bs_roformer_voc_hyperacev2",
    download=True,
    device="auto",
    output_format="wav",
    store_dirs="results",
)
separator.process_folder("path/to/input_file_or_folder")
download=True downloads missing model files before loading. Omit it for strict local-only loading.
MSSeparator can also be used as a context manager. Leaving the with block automatically calls separator.close(), which releases model references and clears backend caches where possible.
from pymss import MSSeparator
with MSSeparator.from_model_name(
    "bs_roformer_voc_hyperacev2",
    download=True,
    device="auto",
    output_format="wav",
    store_dirs="results",
) as separator:
    # The body of the with block must be indented; close() runs on exit.
    separator.process_folder("path/to/input_file_or_folder")
Manual model paths
Use the full constructor for custom weights that are not in the model catalog.
from pymss import MSSeparator, get_separation_logger
# Initialize the separator with explicit model and config paths.
separator = MSSeparator(
    model_type='htdemucs',
    model_path='path/to/model',
    config_path='path/to/config',
    device='cuda',
    device_ids=[0],
    output_format='wav',
    use_tta=True,
    # Keys must match the stem names in the config file. A stem that is set to
    # None or left out produces no output file. Here only the vocals stem is
    # written (to ./output/vocals); the other (instrumental) stem is skipped.
    store_dirs={
        "vocals": "./output/vocals",
        "other": None,
    },
    audio_params={"wav_bit_depth": "FLOAT", "flac_bit_depth": "PCM_24", "mp3_bit_rate": "320k", "m4a_bit_rate": "192k", "m4a_aac_at_quality": 2},  # Can be omitted
    logger=get_separation_logger(),  # Can be omitted
    debug=False,  # Can be omitted
    inference_params={
        "batch_size": 4,
        "overlap_size": 512,
        "chunk_size": 1024,
        "standardize": True,
        "normalize": False,
    },  # Can be omitted
)
# Process all audio files in the folder.
separator.process_folder('path/to/input_folder')
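The store_dirs behavior in the example above (a single path, or a per-stem dictionary where None or a missing key means "write nothing") can be sketched as a small mapping step. This is an illustration of the described semantics only; `plan_outputs` and `StoreDirs` are hypothetical names, and the assumption that a single path receives every stem is inferred from the description rather than taken from pymss's source.

```python
from typing import Mapping, Optional, Sequence, Union

StoreDirs = Union[str, Mapping[str, Optional[str]]]


def plan_outputs(store_dirs: StoreDirs,
                 stems: Sequence[str]) -> dict[str, Optional[str]]:
    """Map each stem to its output directory, or None when it is skipped."""
    if isinstance(store_dirs, str):
        # Assumption: a single folder path receives every stem.
        return {stem: store_dirs for stem in stems}
    # A missing key and an explicit None both mean "no output file".
    return {stem: store_dirs.get(stem) for stem in stems}


plan = plan_outputs({"vocals": "./output/vocals", "other": None},
                    ["vocals", "other"])
for stem, target in plan.items():
    print(stem, "->", target if target is not None else "(skipped)")
```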
Manual Constructor Parameters
For a detailed explanation of every MSSeparator argument, see the MSSeparator parameter guide.
- model_type: The type of model, e.g., 'htdemucs'. Must be one of ['bs_roformer', 'mel_band_roformer', 'htdemucs', 'mdx23c', 'bandit', 'bandit_v2', 'scnet', 'apollo', 'vr']
- model_path: The path to the model file.
- config_path: The path to the configuration file.
- device: The type of device, default is 'auto'. Must be one of ['auto', 'cuda', 'mps', 'cpu']; 'mlx' is also accepted on Apple Silicon (see Apple Silicon MLX Backend).
- device_ids: List of device IDs, default is [0].
- output_format: The output audio format, default is 'wav'. Must be one of ['wav', 'flac', 'mp3', 'm4a']
- use_tta: Whether to use TTA (test-time augmentation), default is False. Enabling TTA roughly triples processing time in exchange for a small improvement in quality.
- store_dirs: Storage directories, can be a single folder path or a dictionary with instrument keys.
- audio_params: Audio parameters including wav_bit_depth, flac_bit_depth, mp3_bit_rate, m4a_bit_rate, and m4a_aac_at_quality. Default is {"wav_bit_depth": "FLOAT", "flac_bit_depth": "PCM_24", "mp3_bit_rate": "320k", "m4a_bit_rate": "192k", "m4a_aac_at_quality": 2}.
- logger: Logger instance. Default is pymss.get_separation_logger()
- debug: Whether to enable debug mode, default is False.
- inference_params: Inference parameters including batch_size, overlap_size, chunk_size, standardize, normalize, and cuda_attention_backend. standardize controls model input standardization and defaults to the model config's inference.normalize value, or False when missing. normalize controls linked output peak normalization for all returned stems. For model_type='vr', supported keys are batch_size, window_size, aggression, enable_tta, enable_post_process, post_process_threshold, high_end_process, and output normalize.
CUDA Attention Backend
RoFormer-family models default to cuDNN attention on CUDA when the installed PyTorch build exposes it, otherwise they use PyTorch's default SDPA path. Override with inference_params={"cuda_attention_backend": "auto"} if you want fallback probing. Valid values are auto, default, flash, cudnn, efficient, math, and xformers. auto tries cuDNN attention first, then PyTorch memory-efficient SDPA, then PyTorch default SDPA. xformers is optional and only used if installed locally; it is not a required dependency.
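The selection rules above can be summarized in a few lines. The sketch below is not pymss's code: `choose_backend`, `VALID_BACKENDS`, and `AUTO_ORDER` are hypothetical names, and the `available` set stands in for whatever runtime probing the library actually performs. It only encodes the documented valid values and the documented auto order (cuDNN, then memory-efficient SDPA, then default SDPA).

```python
VALID_BACKENDS = ("auto", "default", "flash", "cudnn", "efficient", "math", "xformers")
AUTO_ORDER = ("cudnn", "efficient", "default")  # probing order described above


def choose_backend(requested: str, available: set[str]) -> str:
    """Return the backend to use for a requested cuda_attention_backend value."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown cuda_attention_backend: {requested!r}")
    if requested != "auto":
        return requested  # explicit choices are taken as-is in this sketch
    for name in AUTO_ORDER:
        if name in available:
            return name
    return "default"  # PyTorch's default SDPA path is the last resort


# Hypothetical machine where cuDNN attention is not exposed by the PyTorch build.
print(choose_backend("auto", {"efficient", "default"}))
```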
Apple Silicon MLX Backend
Use device='mlx' to run the Apple Silicon MLX backend:
from pymss import MSSeparator
separator = MSSeparator.from_model_name(
    "bs_roformer_voc_hyperacev2",
    download=True,
    device="mlx",
    output_format="wav",
    store_dirs="results",
)
On Apple Silicon, pyproject.toml installs mlx>=0.31.0 for this backend. If MLX is missing or a non-VR backend fails, the model records _pymss_mlx_full_backend_error and falls back to Torch MPS. Advanced users can still override mps_model_backend and mps_model_compute_dtype through inference_params.
Model Compatibility
HTDemucs checkpoints whose config uses model: htdemucs and htdemucs.cac: true are supported through model_type='htdemucs'.
Legacy Demucs/TasNet .th weights can use model_type='legacy_demucs' or model_type='legacy_tasnet' without an MSST YAML config. The dependency-free legacy loader supports classic Demucs, v3 time-domain Demucs, ConvTasNet, CaC HDemucs, package-style HTDemucs, multi-frequency CaC HDemucs, and simple Demucs bag YAML files. DiffQ-quantized checkpoints and non-CaC/Wiener HDemucs still need a dedicated legacy loader.
UVR VR support is available for the supported UVR/VR series .pth weights. Use the catalog model name in the same CLI/API paths as other models. The output stems are read from the built-in VR model list, for example Vocals, Instrumental, No Echo, or Echo.
pymss infer 1_HP-UVR \
    -i path/to/input_folder \
    -o results \
    --device auto \
    --param batch_size=2 \
    --param window_size=512 \
    --param aggression=5
from pymss import MSSeparator
separator = MSSeparator.from_model_name(
    "1_HP-UVR",
    download=True,
    device="auto",
    output_format="wav",
    store_dirs="results",
    inference_params={
        "batch_size": 2,
        "window_size": 512,
        "aggression": 5,
    },
)
separator.process_folder("path/to/input_folder")
Hugging Face Configs
Some model configs downloaded from Hugging Face or MSST-WebUI use inference.num_overlap. This optimized pymss path uses inference.overlap_size instead. If the config only has num_overlap, add an explicit overlap_size or pass it through inference_params; otherwise pymss falls back to 50% overlap and inference will be much slower.
Recommended fast setting:
audio:
  chunk_size: 480000
inference:
  batch_size: 2
  overlap_size: 24000 # 5% of chunk_size
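A rough count of model passes shows why the 50% fallback is so much slower. The sketch below assumes a plain sliding window whose hop is chunk_size minus overlap_size, with both values in samples, and a 44.1 kHz input; `count_chunks` is a hypothetical helper, and the real windowing inside pymss may differ in its details.

```python
def count_chunks(num_samples: int, chunk_size: int, overlap_size: int) -> int:
    """Number of fixed-hop windows needed to cover num_samples."""
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must satisfy 0 <= overlap_size < chunk_size")
    if num_samples <= chunk_size:
        return 1
    hop = chunk_size - overlap_size
    # Integer ceiling of the remaining samples divided by the hop, plus the first window.
    return (num_samples - chunk_size + hop - 1) // hop + 1


one_hour = 44_100 * 3_600  # samples, assuming 44.1 kHz audio
fast = count_chunks(one_hour, chunk_size=480_000, overlap_size=24_000)   # 5% overlap
slow = count_chunks(one_hour, chunk_size=480_000, overlap_size=240_000)  # 50% overlap
print(fast, slow, round(slow / fast, 2))
```

Under these assumptions the 50% setting needs nearly twice as many windows for the same audio, which is consistent with the slowdown described above.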
RTX 5090 Benchmark
Measured on an NVIDIA GeForce RTX 5090 with PyTorch 2.9.1+cu128, CUDA 12.8, no TTA, one warmup and three measured runs.
| model | type | RTFx | 1-hour audio |
|---|---|---|---|
| BS-Roformer-HyperACE_v2_voc | bs_roformer | 231.83x | 15.5s |
| model_bs_roformer_ep_368_sdr_12.9628 | bs_roformer | 109.06x | 33.0s |
| logic_bs_roformer | bs_roformer | 159.71x | 22.5s |
| mel-band-roformer-deux | mel_band_roformer | 169.93x | 21.2s |
| Mel-Band-Roformer-big | mel_band_roformer | 194.05x | 18.6s |
| model_vocals_mdx23c_sdr_10.17 | mdx23c | 209.41x | 17.2s |
| HTDemucs4 | htdemucs | 200.52x | 18.0s |
| scnet_checkpoint_musdb18 | scnet | 356.85x | 10.1s |
| model_bandit_plus_dnr_sdr_11.47 | bandit | 122.76x | 29.3s |
| checkpoint-multi_state_dict | bandit_v2 | 112.33x | 32.0s |
| Apollo_LQ_MP3_restoration | apollo | 100.62x | 35.8s |
VR models were measured with batch_size=2, window_size=512, aggression=5, TTA off, post-processing off.
| VR model | RTFx | 1-hour audio |
|---|---|---|
| UVR-DeNoise-Lite | 243.62x | 14.8s |
| Harmonic_Noise_Separation_yxlllc | 221.22x | 16.3s |
| MGM_HIGHEND_v4 | 217.39x | 16.6s |
| MGM_LOWEND_A_v4 | 133.67x | 26.9s |
| MGM_MAIN_v4 | 118.56x | 30.4s |
| 11_SP-UVR-2B-32000-2 | 109.73x | 32.8s |
| 10_SP-UVR-2B-32000-1 | 109.03x | 33.0s |
| 12_SP-UVR-3B-44100 | 104.67x | 34.4s |
| MGM_LOWEND_B_v4 | 100.64x | 35.8s |
| 15_SP-UVR-MID-44100-1 | 99.00x | 36.4s |
| 16_SP-UVR-MID-44100-2 | 98.76x | 36.5s |
| 13_SP-UVR-4B-44100-1 | 97.78x | 36.8s |
| 14_SP-UVR-4B-44100-2 | 94.97x | 37.9s |
| 5_HP-Karaoke-UVR | 94.72x | 38.0s |
| 2_HP-UVR | 93.94x | 38.3s |
| UVR-De-Echo-Aggressive | 90.99x | 39.6s |
| UVR-DeNoise | 90.39x | 39.8s |
| UVR-De-Echo-Normal | 87.25x | 41.3s |
| UVR-DeReverb-aufr33-jarredou_4band_v4_ms_fullband | 86.70x | 41.5s |
| UVR-DeEcho-DeReverb | 86.58x | 41.6s |
| 3_HP-Vocal-UVR | 85.15x | 42.3s |
| 4_HP-Vocal-UVR | 84.23x | 42.7s |
| 1_HP-UVR | 84.06x | 42.8s |
| 17_HP-Wind_Inst-UVR | 82.92x | 43.4s |
| 6_HP-Karaoke-UVR | 81.81x | 44.0s |
| UVR-BVE-4B_SN-44100-1 | 81.54x | 44.2s |
| 9_HP2-UVR | 58.48x | 61.6s |
| 8_HP2-UVR | 57.23x | 62.9s |
| 7_HP2-UVR | 56.10x | 64.2s |
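The "1-hour audio" column follows from RTFx, read here in its usual sense as audio duration divided by processing time. A small sketch of the conversion, with `seconds_for_audio` as a hypothetical helper name:

```python
def seconds_for_audio(rtfx: float, audio_seconds: float = 3600.0) -> float:
    """Processing time implied by an RTFx value for a given audio length."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# Values taken from the tables above.
for name, rtfx in [("BS-Roformer-HyperACE_v2_voc", 231.83),
                   ("scnet_checkpoint_musdb18", 356.85),
                   ("7_HP2-UVR", 56.10)]:
    print(f"{name}: {seconds_for_audio(rtfx):.1f}s per hour of audio")
```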