A fork of so-vits-svc.

SoftVC VITS Singing Voice Conversion Fork

A fork of so-vits-svc with realtime support and a greatly improved interface. It is based on branch 4.0 (v1), and the models are compatible.

Features not available in the original repo

  • Realtime voice conversion (enhanced in v1.1.0)
  • Integrates QuickVC
  • Fixed misuse of ContentVec in the original repository.^c
  • More accurate pitch estimation using CREPE.
  • GUI and unified CLI available
  • ~2x faster training
  • Ready to use just by installing with pip.
  • Automatically downloads pretrained models.
  • Code completely formatted with black, isort, autoflake etc.

Installation

One click easy installation

Download .bat

Manual installation

Creating a virtual environment

Windows:

py -3.10 -m venv venv
venv\Scripts\activate

Linux/MacOS:

python3.10 -m venv venv
source venv/bin/activate

Anaconda:

conda create -n so-vits-svc-fork python=3.10 pip
conda activate so-vits-svc-fork

Installing without creating a virtual environment may cause a PermissionError if Python is installed in Program Files, etc.
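
As a quick sanity check before installing, the following minimal sketch reports whether the current interpreter is running inside a venv-style environment. The function name in_virtualenv is made up for this example and is not part of so-vits-svc-fork. The check compares sys.prefix with sys.base_prefix, so it covers environments created with venv as shown above but does not detect Conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```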

Install this via pip (or your favourite package manager that uses pip):

python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U so-vits-svc-fork

Notes

  • If no GPU is available, simply omit the command pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118.
  • If you are using an AMD GPU on Linux, replace --index-url https://download.pytorch.org/whl/cu118 with --index-url https://download.pytorch.org/whl/rocm5.4.2. AMD GPUs are not supported on Windows (#120).
  • If fairseq raises an error:
    • If it reports that Microsoft C++ Build Tools is not installed, please install it.
    • If it prompts that some dll is missing, reinstalling Microsoft Visual C++ 2022 and Windows SDK may help.
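
The backend-specific notes above can be summarized as a small lookup. The sketch below only assembles the command strings already listed in this section; the helper name install_commands and the backend labels "cuda", "rocm" and "cpu" are assumptions made for illustration, not options of pip or of this package.

```python
CUDA_INDEX = "https://download.pytorch.org/whl/cu118"
ROCM_INDEX = "https://download.pytorch.org/whl/rocm5.4.2"


def install_commands(backend: str, platform: str = "linux") -> list[str]:
    """Return the install commands from this section for a given backend.

    backend is one of "cuda", "rocm" or "cpu" (labels invented for this sketch).
    """
    commands = ["python -m pip install -U pip setuptools wheel"]
    if backend == "cuda":
        commands.append(f"pip install -U torch torchaudio --index-url {CUDA_INDEX}")
    elif backend == "rocm":
        # The notes above state that AMD GPUs are not supported on Windows.
        if platform == "windows":
            raise ValueError("AMD GPUs are not supported on Windows")
        commands.append(f"pip install -U torch torchaudio --index-url {ROCM_INDEX}")
    elif backend != "cpu":
        raise ValueError(f"unknown backend: {backend!r}")
    # Without a GPU, the explicit torch line is simply left out.
    commands.append("pip install -U so-vits-svc-fork")
    return commands


if __name__ == "__main__":
    for line in install_commands("cuda"):
        print(line)
```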

Update

Please update this package regularly to get the latest features and bug fixes.

pip install -U so-vits-svc-fork
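
To see which version is currently installed before updating, a minimal sketch using the standard library's importlib.metadata is shown below. The helper name installed_version is made up for this example; it only reads local package metadata and does not contact PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "so-vits-svc-fork") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("so-vits-svc-fork:", installed_version() or "not installed")
```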

Usage

Inference

GUI

GUI launches with the following command:

svcg

CLI

  • Realtime (from microphone)
svc vc
  • File
svc infer source.wav

Pretrained models are available on HuggingFace.
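
To convert several files in one go, the file command above can be repeated per input. The following is a minimal sketch, assuming only the svc infer <file> form shown above; the helper names infer_commands and run_all are hypothetical, and no additional svc options are assumed.

```python
import subprocess
from pathlib import Path


def infer_commands(folder, pattern: str = "*.wav") -> list[list[str]]:
    """Build one `svc infer <file>` command per matching file, sorted by path."""
    return [["svc", "infer", str(path)] for path in sorted(Path(folder).glob(pattern))]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the commands; call run_all(...) to actually execute them.
    for command in infer_commands("."):
        print(" ".join(command))
```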

Notes

  • If using WSL, please note that WSL requires additional setup to handle audio and the GUI will not work without finding an audio device.
  • In real-time inference, if there is noise on the inputs, the HuBERT model will react to those as well. Consider using realtime noise reduction applications such as RTX Voice in this case.

Training

Before training

  • If your dataset has BGM, please remove the BGM using software such as Ultimate Vocal Remover. 3_HP-Vocal-UVR.pth or UVR-MDX-NET Main is recommended. ^1
  • If your dataset is a long audio file with a single speaker, use svc pre-split to split the dataset into multiple files (using librosa).
  • If your dataset is a long audio file with multiple speakers, use svc pre-sd to split the dataset into multiple files (using pyannote.audio). Further manual classification may be necessary due to accuracy issues. If speakers speak with a variety of speech styles, set --min-speakers larger than the actual number of speakers. Due to unresolved dependencies, please install pyannote.audio manually: pip install pyannote-audio.

Cloud

Open In Colab Open In Paperspace Paperspace Referral[^p]

If you do not have access to a GPU with more than 10 GB of VRAM, the free plan of Google Colab is recommended for light users and the Pro/Growth plan of Paperspace is recommended for heavy users. Conversely, if you have access to a high-end GPU, the use of cloud services is not recommended.

[^p]: If you register with the referral code and then add a payment method, you may save about $5 on your first month's bill. Note that both referral rewards are Paperspace credits, not cash. Including the referral link was a tough decision, but it is here because debugging and training the initial model requires a large amount of computing power and the developer is a student.

Local

Place your dataset like dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders and non-ASCII filenames are acceptable) and run:

svc pre-resample
svc pre-config
svc pre-hubert
svc train -t
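
Before running the commands above, it can help to confirm that the dataset follows the dataset_raw/{speaker_id}/**/ layout. The sketch below counts files per speaker folder using only the standard library. The function name files_per_speaker is an assumption made for this example, and files placed directly in dataset_raw (outside any speaker folder) are ignored because they do not match the layout.

```python
from collections import Counter
from pathlib import Path


def files_per_speaker(root="dataset_raw") -> dict[str, int]:
    """Count files under each top-level speaker folder, including subfolders."""
    root = Path(root)
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # parts[0] is the speaker folder; a lone file in root has no speaker.
        if len(parts) >= 2:
            counts[parts[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    for speaker, count in sorted(files_per_speaker().items()):
        print(f"{speaker}: {count} file(s)")
```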

Notes

  • Each audio file in the dataset should be no longer than roughly 10 seconds.
  • It is recommended to increase the batch_size as much as possible in config.json before the train command to match the VRAM capacity. Setting batch_size to auto-{init_batch_size}-{max_n_trials} (or simply auto) will automatically increase batch_size until OOM error occurs, but may not be useful in some cases.
  • To use CREPE, replace svc pre-hubert with svc pre-hubert -fm crepe.
  • To use QuickVC, replace svc pre-config with svc pre-config -t quickvc.
  • Silence removal and volume normalization are performed automatically (as in the upstream repo), so you do not need to do them yourself.
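
The per-file duration guideline above can be checked with a short script. This is a minimal sketch under two assumptions: it uses the standard library's wave module, which reads only uncompressed PCM .wav files (other formats in the dataset are not inspected), and the helper names wav_duration_seconds and files_longer_than are made up for this example.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def files_longer_than(root, limit_seconds: float = 10.0) -> list[Path]:
    """List .wav files under root whose duration exceeds limit_seconds."""
    return sorted(
        path
        for path in Path(root).rglob("*.wav")
        if wav_duration_seconds(path) > limit_seconds
    )


if __name__ == "__main__":
    dataset = Path("dataset_raw")
    if dataset.is_dir():
        for path in files_longer_than(dataset):
            print(f"{path}: {wav_duration_seconds(path):.1f} s")
```

Files reported by this check are candidates for splitting with svc pre-split.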

Further help

For more details, run svc -h or svc <subcommand> -h.

> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...

  so-vits-svc allows any folder structure for training data.
  However, the following folder structure is recommended.
      When training: dataset_raw/{speaker_name}/**/{wav_name}.{any_format}
      When inference: configs/44k/config.json, logs/44k/G_XXXX.pth
  If the folder structure is followed, you DO NOT NEED TO SPECIFY model path, config path, etc.
  (The latest model will be automatically loaded.)
  To train a model, run pre-resample, pre-config, pre-hubert, train.
  To infer a model, run infer.

Options:
  -h, --help  Show this message and exit.

Commands:
  clean          Clean up files, only useful if you are using the default file structure
  infer          Inference
  onnx           Export model to onnx
  pre-config     Preprocessing part 2: config
  pre-hubert     Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
  pre-resample   Preprocessing part 1: resample
  pre-sd         Speech diarization using pyannote.audio
  pre-split      Split audio files into multiple files
  train          Train model If D_0.pth or G_0.pth not found, automatically download from hub.
  train-cluster  Train k-means clustering
  vc             Realtime inference from microphone
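
The training sequence named in the help text (pre-resample, pre-config, pre-hubert, train) can be scripted. The sketch below builds the commands exactly as they appear in the Local training section, with the documented -fm crepe and -t quickvc variants as switches. The function names training_commands and run_pipeline and the keyword arguments use_crepe and use_quickvc are hypothetical and are not part of the svc CLI.

```python
import subprocess


def training_commands(use_crepe: bool = False, use_quickvc: bool = False) -> list[list[str]]:
    """Return the preprocessing and training commands in the documented order."""
    pre_config = ["svc", "pre-config"] + (["-t", "quickvc"] if use_quickvc else [])
    pre_hubert = ["svc", "pre-hubert"] + (["-fm", "crepe"] if use_crepe else [])
    return [
        ["svc", "pre-resample"],
        pre_config,
        pre_hubert,
        ["svc", "train", "-t"],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each step in order; check=True stops the pipeline on the first error."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the steps; call run_pipeline(...) to actually execute them.
    for command in training_commands(use_crepe=True):
        print(" ".join(command))
```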

External Links

Video Tutorial

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • 34j: 💻 🤔 📖 💡 🚇 🚧 👀 ⚠️ 📣 🐛
  • GarrettConway: 💻 🐛 📖
  • BlueAmulet: 🤔 💬 💻
  • ThrowawayAccount01: 🐛
  • 緋: 📖 🐛
  • Lordmau5: 🐛 💻
  • DL909: 🐛
  • Satisfy256: 🐛
  • Pierluigi Zagaria: 📓
  • ruckusmattster: 🐛
  • Desuka-art: 🐛
  • heyfixit: 📖
  • Nerdy Rodent: 📹
  • 谢宇: 📖
  • ColdCawfee: 🐛
  • sbersier: 🤔 📓

This project follows the all-contributors specification. Contributions of any kind welcome!
