MLX implementation of RMVPE (Robust Model for Vocal Pitch Estimation) for Apple Silicon

MLX-RMVPE

This is the F0 extraction component for RVC-MLX, a native Apple Silicon implementation of Retrieval-based Voice Conversion.

What is RMVPE?

RMVPE estimates the fundamental frequency (F0) of the voice in an audio signal. Voice conversion needs F0 to preserve the original pitch and melody:

Input Audio (16kHz) → RMVPE → F0 Contour (Hz) → RVC Decoder → Converted Voice

Unlike general-purpose pitch trackers such as CREPE and pYIN, which are designed mainly for monophonic signals, RMVPE is designed to extract vocal pitch from polyphonic music. This makes it well suited to singing voice conversion, where background music may be present.

Installation

uv pip install mlx-rmvpe

For development:

git clone https://github.com/lexandstuff/mlx-rmvpe.git
cd mlx-rmvpe
uv pip install -e .

Quick Start

import librosa
import numpy as np
from mlx_rmvpe import RMVPE

# Load model (auto-downloads weights from HuggingFace)
model = RMVPE.from_pretrained()

# Load audio at 16kHz
audio, sr = librosa.load("singing.wav", sr=16000, mono=True)

# Extract F0 (np.asarray also handles an mx.array return value)
f0 = np.asarray(model.infer_from_audio(audio))
voiced = f0[f0 > 0]

print(f"Audio: {len(audio) / sr:.2f}s -> F0: {f0.shape[0]} frames at 100fps")
if voiced.size > 0:
    print(f"Pitch range: {voiced.min():.1f} - {voiced.max():.1f} Hz")
else:
    print("No voiced frames detected")

Manual Weight Loading

If you prefer to manage weights yourself:

from huggingface_hub import hf_hub_download
from mlx_rmvpe import RMVPE

# Download weights
weights_path = hf_hub_download(
    repo_id="lexandstuff/mlx-rmvpe",
    filename="rmvpe.safetensors"
)

# Load model manually
model = RMVPE()
model.load_weights(weights_path)
model.eval()

Converting Weights from PyTorch (Advanced)

If you need to convert the original PyTorch checkpoint yourself:

# Download original PyTorch weights
mkdir -p weights
wget -O weights/rmvpe.pt \
  "https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt"

# Convert to MLX format
python scripts/convert_weights.py \
  --pytorch_ckpt weights/rmvpe.pt \
  --mlx_ckpt weights/rmvpe.safetensors

See IMPLEMENTATION_NOTES.md for details.
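The internals of the conversion script are not described here. As a hedged illustration of one step that PyTorch-to-MLX converters commonly need, the sketch below reorders a 2-D convolution kernel from PyTorch's (out, in, kH, kW) layout to the channels-last (out, kH, kW, in) layout that MLX's `nn.Conv2d` expects. The helper name `conv2d_weight_torch_to_mlx` is hypothetical, and `scripts/convert_weights.py` may handle this differently.

```python
import numpy as np


def conv2d_weight_torch_to_mlx(weight: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reorder a Conv2d kernel for MLX.

    PyTorch nn.Conv2d stores weights as (out_channels, in_channels, kH, kW);
    MLX nn.Conv2d expects (out_channels, kH, kW, in_channels).
    """
    if weight.ndim != 4:
        raise ValueError(f"expected a 4-D conv kernel, got shape {weight.shape}")
    # Move the input-channel axis to the end; copy so the result is contiguous
    return np.ascontiguousarray(weight.transpose(0, 2, 3, 1))


torch_kernel = np.random.default_rng(0).standard_normal((16, 1, 3, 3)).astype(np.float32)
mlx_kernel = conv2d_weight_torch_to_mlx(torch_kernel)
print(torch_kernel.shape, "->", mlx_kernel.shape)  # (16, 1, 3, 3) -> (16, 3, 3, 1)
```

Only regular convolutions are covered. If the decoder upsamples with transposed convolutions, as the original PyTorch RMVPE does, those kernels use a different PyTorch layout, (in, out, kH, kW), and a real converter has to treat them separately.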

API Reference

RMVPE

RMVPE(hop_length: int = 160)

Methods:

Method                                              Description
from_pretrained(repo_id, filename, weights_path)    Load pretrained model from HuggingFace
load_weights(path)                                  Load weights from SafeTensors file
infer_from_audio(audio, sample_rate, threshold)     Extract F0 from audio
mel_spectrogram(audio, ...)                         Compute mel spectrogram
decode(hidden, threshold)                           Decode pitch probabilities to Hz

Parameters:

Parameter     Default   Description
hop_length    160       Hop length for mel spectrogram (160 = 100fps at 16kHz)
threshold     0.03      Voicing threshold (frames below this are marked unvoiced)

Input:

  • Audio waveform at 16kHz, shape (samples,) or (batch, samples)

Output:

  • F0 in Hz, shape (frames,) where frames ≈ samples / hop_length (the exact count depends on padding)
  • Unvoiced frames have F0 = 0
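The exact relation between sample count and frame count depends on how the STFT is framed. The sketch below assumes a centered STFT (the default in librosa and torch.stft, and assumed here for RMVPE's mel front end), where the signal is padded so that frame i is centered on sample i * hop_length. Under that assumption there are samples // hop_length + 1 frames. `num_frames` and `frame_times` are hypothetical helpers, not part of mlx_rmvpe.

```python
def num_frames(num_samples: int, hop_length: int = 160) -> int:
    """Frame count assuming a centered STFT (hypothetical helper).

    Centering pads the signal so that there is one frame per hop plus one.
    """
    return num_samples // hop_length + 1


def frame_times(n_frames: int, hop_length: int = 160, sample_rate: int = 16_000) -> list[float]:
    """Center time in seconds of each frame, under the same assumption."""
    return [i * hop_length / sample_rate for i in range(n_frames)]


print(num_frames(16_000))  # 101
print(frame_times(3))      # [0.0, 0.01, 0.02]
```

Comparing `num_frames(len(audio))` with the length of the returned F0 array is a quick way to check which framing convention the installed version uses.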

RVC Integration

In the RVC voice conversion pipeline:

# contentvec_model, interpolate_f0, rvc_synthesizer and speaker_id are placeholders
# for the corresponding RVC-MLX components; they are not part of mlx-rmvpe.

# 1. Extract content features with ContentVec
features = contentvec_model(audio)["x"]  # (1, T, 768) at 50fps

# 2. Extract pitch with RMVPE
f0 = rmvpe_model.infer_from_audio(audio)  # (~2T,) at 100fps

# 3. Interpolate F0 to match ContentVec frame rate
f0_interp = interpolate_f0(f0, target_len=features.shape[1])

# 4. Generate converted audio with RVC synthesizer
output = rvc_synthesizer(features, f0_interp, speaker_id)
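`interpolate_f0` is not defined by this package. Below is a minimal sketch of one way to write it, assuming NumPy input. It linearly interpolates through voiced frames only, so unvoiced zeros do not pull neighboring pitch toward 0 Hz, and it takes each output frame's voicing decision from the nearest source frame. The name and behavior are assumptions; RVC-MLX may resample differently.

```python
import numpy as np


def interpolate_f0(f0, target_len: int) -> np.ndarray:
    """Hypothetical helper: resample an F0 contour (Hz, 0 = unvoiced) to target_len frames."""
    f0 = np.asarray(f0, dtype=np.float32)
    if f0.size == 0 or target_len <= 0:
        return np.zeros(max(target_len, 0), dtype=np.float32)
    src = np.arange(f0.size)
    # Positions of the target frames on the source frame axis
    pos = np.linspace(0, f0.size - 1, target_len)
    voiced = f0 > 0
    if voiced.any():
        # Interpolate through voiced frames only, so unvoiced zeros don't drag pitch down
        out = np.interp(pos, src[voiced], f0[voiced])
    else:
        out = np.zeros(target_len)
    # Voicing decision follows the nearest source frame
    nearest = np.rint(pos).astype(int)
    out[~voiced[nearest]] = 0.0
    return out.astype(np.float32)


f0_100fps = np.array([100.0, 100.0, 100.0, 0.0, 200.0, 200.0, 200.0])
print(interpolate_f0(f0_100fps, target_len=3).tolist())  # [100.0, 0.0, 200.0]
```

Interpolating only over voiced frames avoids the octave-like dips that plain linear interpolation produces at voiced/unvoiced boundaries.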

Technical Details

Architecture

RMVPE uses a Deep U-Net with BiGRU layers:

Mel Spectrogram (128 mels)
    ↓
Encoder (5 layers, 16→32→64→128→256 channels)
    ↓
Intermediate (4 layers, 256→512 channels)
    ↓
Decoder (5 layers with skip connections)
    ↓
CNN (16→3 channels)
    ↓
BiGRU (384→512)
    ↓
Linear (512→360 pitch bins)
    ↓
Sigmoid → Pitch Probabilities
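To make the shapes at the end of the diagram concrete, the NumPy sketch below follows one clip through the head: the CNN's 3 channels × 128 mel bins are flattened into the 384 features per frame that the BiGRU consumes, and a 512→360 linear layer plus a sigmoid turns each frame into pitch probabilities. The weights are random and a tanh projection stands in for the BiGRU, so this illustrates tensor shapes only, not the trained model or this package's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, cnn_channels, n_mels = 200, 3, 128
gru_width, n_bins = 512, 360  # BiGRU output width and pitch bins, from the diagram

# Output of the CNN stage for one clip: (frames, channels, mel bins)
cnn_out = rng.standard_normal((n_frames, cnn_channels, n_mels)).astype(np.float32)

# Flatten channels x mels per frame: 3 * 128 = 384 BiGRU input features
gru_in = cnn_out.reshape(n_frames, cnn_channels * n_mels)

# Stand-in for the BiGRU (NOT a recurrent layer): a random projection to 512 features
w_gru = rng.standard_normal((gru_in.shape[1], gru_width)).astype(np.float32) * 0.05
gru_out = np.tanh(gru_in @ w_gru)

# Linear 512 -> 360, then sigmoid to get per-bin pitch probabilities
w_out = rng.standard_normal((gru_width, n_bins)).astype(np.float32) * 0.05
b_out = np.zeros(n_bins, dtype=np.float32)
probs = 1.0 / (1.0 + np.exp(-(gru_out @ w_out + b_out)))

print(gru_in.shape, gru_out.shape, probs.shape)  # (200, 384) (200, 512) (200, 360)
```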

Pitch Decoding

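The model outputs 360 pitch bins (covering ~32 Hz to ~2006 Hz). Decoding:

  1. Find the peak bin via argmax
  2. Average the bin pitches (in cents) over a 9-bin window around the peak, weighted by probability
  3. Convert cents to Hz: f0 = 10 * (2 ** (cents / 1200))
  4. Apply the voicing threshold: frames whose peak probability falls below it are marked unvoiced (F0 = 0)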
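A NumPy sketch of these four steps, assuming the cents mapping of the original RVC implementation of RMVPE: bin i sits at 1997.38 + 20 × i cents, i.e. 20-cent steps starting at 31.7 Hz, which puts the top bin near 2006 Hz. The function name `decode_salience` is hypothetical, and `RMVPE.decode` in this package may differ in details such as padding or the exact threshold comparison.

```python
import numpy as np

# Assumed cents value of each of the 360 bins (RVC convention: 20-cent steps,
# bin 0 at 1200 * log2(3.17) cents, i.e. 31.7 Hz)
CENTS_MAPPING = 20.0 * np.arange(360) + 1997.3794084376191


def decode_salience(salience: np.ndarray, threshold: float = 0.03) -> np.ndarray:
    """Hypothetical decoder: (frames, 360) pitch probabilities -> F0 in Hz (0 = unvoiced)."""
    salience = np.asarray(salience, dtype=np.float64)
    n_frames = salience.shape[0]
    # 1. Peak bin per frame
    center = np.argmax(salience, axis=1)
    # 2. Probability-weighted average of cents over bins center-4 .. center+4;
    #    zero padding keeps the window inside the array at both edges
    padded_sal = np.pad(salience, ((0, 0), (4, 4)))
    padded_cents = np.pad(CENTS_MAPPING, (4, 4))
    idx = center[:, None] + np.arange(9)[None, :]  # window indices into the padded arrays
    window_sal = np.take_along_axis(padded_sal, idx, axis=1)
    window_cents = padded_cents[idx]
    weight = window_sal.sum(axis=1)
    cents = np.divide((window_sal * window_cents).sum(axis=1), weight,
                      out=np.zeros(n_frames), where=weight > 0)
    # 3. Cents -> Hz
    f0 = 10.0 * 2.0 ** (cents / 1200.0)
    # 4. Voicing: peak probability at or below the threshold -> unvoiced (RVC convention, assumed)
    f0[salience.max(axis=1) <= threshold] = 0.0
    return f0


salience = np.zeros((3, 360))
salience[0, 0] = 0.9     # peak at the lowest bin
salience[1, 359] = 0.9   # peak at the highest bin
salience[2, 100] = 0.01  # below the 0.03 threshold -> unvoiced
print(decode_salience(salience).round(1).tolist())  # [31.7, 2005.5, 0.0]
```

The weighted average in step 2 is what gives sub-bin resolution: two equal peaks on adjacent bins decode to the pitch halfway between them rather than snapping to either bin.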

Frame Rate

  • Sample rate: 16,000 Hz
  • Hop length: 160 samples
  • Frame rate: 100 fps

This is twice ContentVec's frame rate (50 fps), giving pitch a finer temporal resolution.
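The frame-rate arithmetic, as a short sketch. ContentVec's hop of 320 samples is an assumption (a 20 ms stride, which is what gives 50 fps at 16 kHz), and `contentvec_index` is a hypothetical helper that ignores any small offset between the two models' frame centers.

```python
SAMPLE_RATE = 16_000
RMVPE_HOP = 160       # from the list above
CONTENTVEC_HOP = 320  # assumption: 20 ms stride, i.e. 50 fps at 16 kHz

rmvpe_fps = SAMPLE_RATE / RMVPE_HOP               # 100.0
contentvec_fps = SAMPLE_RATE / CONTENTVEC_HOP     # 50.0
frames_per_feature = CONTENTVEC_HOP // RMVPE_HOP  # 2 pitch frames per content frame


def contentvec_index(rmvpe_frame: int) -> int:
    """ContentVec frame an RMVPE frame maps to (hypothetical helper)."""
    return rmvpe_frame // frames_per_feature


print(rmvpe_fps, contentvec_fps, [contentvec_index(i) for i in range(6)])
# 100.0 50.0 [0, 0, 1, 1, 2, 2]
```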

Validation

This implementation produces numerically similar outputs to the PyTorch reference:

Metric                Value
Mean F0 difference    1.29 Hz
Correlation           >0.99

See IMPLEMENTATION_NOTES.md for validation methodology.
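As one plausible way to compute metrics like those in the table (the actual methodology is the one in IMPLEMENTATION_NOTES.md), the sketch below compares two F0 contours on frames that both mark as voiced, reporting the mean absolute difference in Hz and the Pearson correlation. The helper `compare_f0` and the voiced-in-both rule are assumptions.

```python
import numpy as np


def compare_f0(f0_ref, f0_test):
    """Hypothetical helper: (mean |difference| in Hz, Pearson r) over frames voiced in both."""
    ref = np.asarray(f0_ref, dtype=np.float64)
    test = np.asarray(f0_test, dtype=np.float64)
    n = min(ref.size, test.size)  # tolerate an off-by-one in frame counts
    ref, test = ref[:n], test[:n]
    both_voiced = (ref > 0) & (test > 0)
    if both_voiced.sum() < 2:
        raise ValueError("need at least two frames voiced in both contours")
    mean_diff = float(np.mean(np.abs(ref[both_voiced] - test[both_voiced])))
    corr = float(np.corrcoef(ref[both_voiced], test[both_voiced])[0, 1])
    return mean_diff, corr


ref = [0.0, 100.0, 200.0, 300.0, 0.0]
mlx = [0.0, 101.0, 201.0, 301.0, 50.0]
print(compare_f0(ref, mlx))
```

Restricting both metrics to frames voiced in both contours keeps voicing disagreements (like the last frame above) from dominating the pitch error; a full evaluation would usually report voicing agreement separately.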

Development

Project Structure

mlx-rmvpe/
├── mlx_rmvpe/
│   ├── __init__.py
│   ├── rmvpe.py          # Main RMVPE class with from_pretrained()
│   └── model.py          # DeepUnet, BiGRU, E2E model architecture
├── scripts/
│   └── convert_weights.py # PyTorch → SafeTensors conversion
├── tests/
│   └── test_rmvpe.py
├── IMPLEMENTATION_NOTES.md
└── README.md

Running Tests

pytest tests/

Publishing to PyPI

# Build
uv build

# Upload
uv publish

License

MIT

Acknowledgments

  • RMVPE - Original implementation
  • RVC - Voice conversion pipeline
  • MLX - Apple's machine learning framework

Download files

Source Distribution

mlx_rmvpe-0.1.0.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

mlx_rmvpe-0.1.0-py3-none-any.whl (10.5 kB)

Uploaded Python 3

File details

Details for the file mlx_rmvpe-0.1.0.tar.gz.

File metadata

  • Download URL: mlx_rmvpe-0.1.0.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.0

File hashes

Hashes for mlx_rmvpe-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5f82e961b5652c3adf3ddc0d6f449101fcdf4f8d8d906da61cfd9c6887228f90
MD5 2c1dfe1e713f8eafd8a84db53aaedfbf
BLAKE2b-256 d1af849eaaeabad94f5a3acbca257b1ab47aa4be0fb106a3e91ae9013ae53a58

File details

Details for the file mlx_rmvpe-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mlx_rmvpe-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.0

File hashes

Hashes for mlx_rmvpe-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8816b6e9634b11e1e6252a0b683b9152445072048f4eca6f34808d0cfe2fbc4b
MD5 a81a023f094207d46445815c01c30639
BLAKE2b-256 faeec7ab4e6c23525ef47061dd5a6d1686ab2f9693f1a4e1bc81bea35088a4c3
