MLX BigVGAN

An MLX-adapted implementation of BigVGAN.

Features

  • BigVGAN Integration: Ports the original BigVGAN model to MLX.
  • Flexible Conversion: Includes tools to convert the original BigVGAN PyTorch weights to MLX format.
  • Customizable Configurations: Supports various configurations for kernel sizes, dilation rates, and activation functions (e.g., snake, snakebeta).
  • Pretrained Models: Easily load pretrained BigVGAN models from the Hugging Face Hub.
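
For readers unfamiliar with the activation names in the list above, here is a minimal scalar sketch of Snake and SnakeBeta as they are defined for BigVGAN. It is illustrative only: the function names and the `eps` guard are assumptions made for this example, not the package's API, and the actual MLX layers presumably operate on arrays with learnable per-channel parameters.

```python
import math


def snake(x: float, alpha: float = 1.0, eps: float = 1e-9) -> float:
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x)."""
    # eps is a hypothetical guard against division by zero when alpha is 0
    return x + (1.0 / (alpha + eps)) * math.sin(alpha * x) ** 2


def snakebeta(x: float, alpha: float = 1.0, beta: float = 1.0,
              eps: float = 1e-9) -> float:
    """SnakeBeta activation: x + (1 / beta) * sin^2(alpha * x)."""
    # alpha controls the frequency of the periodic term, beta its magnitude
    return x + (1.0 / (beta + eps)) * math.sin(alpha * x) ** 2
```

With `beta` equal to `alpha`, `snakebeta` reduces to `snake`; separating the two lets the frequency and the magnitude of the periodic term be set independently.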

Installation

pip install mlx-bigvgan

Usage

1. Load Pretrained Model

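import mlx.core as mx
from mlx_bigvgan import BigVGAN

# Download and load the pretrained weights from the Hugging Face Hub
model = BigVGAN.from_pretrained("wyrom/mlx-bigvgan_v2_24khz_100band_256x")
# Switch to inference mode
model.eval()
# MLX is lazy: force the parameters to be materialized now
mx.eval(model.parameters())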

2. Generate Audio

import numpy as np
import mlx.core as mx
from mlx_bigvgan import log_mel_spectrogram, load_audio
# Load audio file
audio = load_audio("path/to/audio.wav")
h = model.config
# Compute log-mel spectrogram
mel_spec = log_mel_spectrogram(audio,
    n_fft=h.n_fft,
    n_mels=h.num_mels,
    sample_rate=h.sampling_rate,
    hop_length=h.hop_size,
    fmin=h.fmin,
    fmax=h.fmax,
    padding=(h.n_fft - h.hop_size) // 2,
    mel_norm="slaney",
    mel_scale="slaney",
    power=1.0,
)
# reshape to [B(1), T, C_mels]
mel_spec = mx.expand_dims(mel_spec, 0)
# Generate waveform
waveform = model(mel_spec) # [B(1), T, 1]
# Reshape to [T, 1]
waveform_float = waveform.squeeze(0)

# Convert to int16 PCM
waveform_int16 = mx.clip(waveform_float * 32767, -32768, 32767).astype(mx.int16)

# Save to WAV; convert the mx.array to a NumPy array before handing it to soundfile
import soundfile as sf

sf.write("output.wav", np.array(waveform_int16), h.sampling_rate, "PCM_16")
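
The `padding=(h.n_fft - h.hop_size) // 2` argument above is worth a short worked example. The sketch below is plain Python arithmetic and does not call the package. It assumes the signal is padded by that amount on both sides, that frames are then taken without additional centering, and that the vocoder upsamples each mel frame by `hop_size` samples. The helper names and the constants `N_FFT = 1024`, `HOP_SIZE = 256`, and `SAMPLING_RATE = 24_000` are illustrative assumptions; read the real values from `model.config`.

```python
def padded_frame_count(num_samples: int, n_fft: int, hop_size: int) -> int:
    """Number of STFT frames after padding (n_fft - hop_size) // 2 per side."""
    padding = (n_fft - hop_size) // 2
    padded_length = num_samples + 2 * padding
    if padded_length < n_fft:
        return 0  # signal too short for a single window
    return (padded_length - n_fft) // hop_size + 1


def expected_output_samples(num_frames: int, hop_size: int) -> int:
    """Waveform length if every mel frame is upsampled by hop_size."""
    return num_frames * hop_size


# Assumed values for illustration; use model.config in real code
N_FFT = 1024
HOP_SIZE = 256
SAMPLING_RATE = 24_000

frames = padded_frame_count(SAMPLING_RATE, N_FFT, HOP_SIZE)  # one second of audio
samples = expected_output_samples(frames, HOP_SIZE)
print(frames, samples)  # 93 23808
```

Under these assumptions the frame count equals `num_samples // hop_size`, so the regenerated waveform is at most one hop shorter than the input.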

3. Convert Original BigVGAN Weights to MLX Format

You can convert the original BigVGAN weights to MLX format using the provided script.

repo_id is the Hugging Face model ID of the original BigVGAN model you want to convert.

See nvidia/BigVGAN for more pretrained models.

python -m mlx_bigvgan.convert --repo_id nvidia/bigvgan_v2_xxx  --output_dir mlx_models
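
To convert several checkpoints in one go, the command above can be wrapped in a small script. This is a sketch, not part of the package: `build_convert_command` and `convert_all` are hypothetical helpers, and they use only the `--repo_id` and `--output_dir` flags shown above.

```python
import subprocess
import sys


def build_convert_command(repo_id: str, output_dir: str = "mlx_models") -> list[str]:
    """Build the argument list for one run of the conversion script."""
    return [
        sys.executable, "-m", "mlx_bigvgan.convert",
        "--repo_id", repo_id,
        "--output_dir", output_dir,
    ]


def convert_all(repo_ids: list[str], output_dir: str = "mlx_models") -> None:
    """Convert each repository in turn, stopping at the first failure."""
    for repo_id in repo_ids:
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(build_convert_command(repo_id, output_dir), check=True)
```

Call `convert_all([...])` with the model IDs you want, replacing the `nvidia/bigvgan_v2_xxx` placeholder with real IDs from the nvidia/BigVGAN page.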

License

This project is licensed under the MIT License. See the LICENSE file for details.
