A fork of so-vits-svc.
SoftVC VITS Singing Voice Conversion Fork
A fork of so-vits-svc with realtime support and a greatly improved interface. It is based on branch 4.0 (v1), and the models are compatible.
Features not available in the original repo
- Realtime voice conversion (enhanced in v1.1.0)
- More accurate pitch estimation using CREPE
- GUI available
- Unified command-line interface (no need to run Python scripts)
- Ready to use just by installing with pip
- Automatically download pretrained base model and HuBERT model
- Code completely formatted with black, isort, autoflake etc.
- Volume normalization in preprocessing
- Other minor differences
Installation
One click easy installation
Creating a Virtual Environment
Install
Install this via pip (or your favourite package manager that uses pip):
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install -U so-vits-svc-fork
Update
Please update this package regularly to get the latest features and bug fixes.
pip install -U so-vits-svc-fork
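Before updating, it can help to know which version is currently installed. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "so-vits-svc-fork") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("so-vits-svc-fork is not installed in this environment")
    else:
        print(f"so-vits-svc-fork {version} is installed")
```

Run it inside the same virtual environment that you use for svc, otherwise it reports on a different installation.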
Usage
Inference
GUI
Launch the GUI with the following command:
svcg
CLI
- Realtime (from microphone)
svc vc --model-path <model-path>
- File
svc infer --model-path <model-path> source.wav
Pretrained models are available on HuggingFace.
Notes
- If using WSL, please note that WSL requires additional setup to handle audio and the GUI will not work without finding an audio device.
- In real-time inference, if there is noise on the inputs, the HuBERT model will react to those as well. Consider using realtime noise reduction applications such as RTX Voice in this case.
Training
Google Colab
Local
Place your dataset under dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders are acceptable) and run:
svc pre-resample
svc pre-config
svc pre-hubert -fm dio
svc train
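To double-check the dataset layout before preprocessing, a short script can list how many files each speaker folder contains. This is only a sketch under the layout described above (one top-level folder per speaker_id, any depth of subfolders); the function name files_per_speaker is hypothetical and the script does not call into so-vits-svc-fork.

```python
from pathlib import Path
from typing import Dict, List


def files_per_speaker(dataset_root: Path) -> Dict[str, List[Path]]:
    """Map each speaker_id (top-level folder) to the files found beneath it."""
    speakers: Dict[str, List[Path]] = {}
    for speaker_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        # rglob("*") descends into subfolders, matching the ** in the layout.
        speakers[speaker_dir.name] = sorted(
            path for path in speaker_dir.rglob("*") if path.is_file()
        )
    return speakers


if __name__ == "__main__":
    root = Path("dataset_raw")
    if root.is_dir():
        for speaker, files in files_per_speaker(root).items():
            print(f"{speaker}: {len(files)} file(s)")
    else:
        print("dataset_raw not found in the current directory")
```

Files placed directly in dataset_raw (outside any speaker folder) are ignored by this sketch, and a speaker folder with zero files shows up with a count of 0, which usually indicates a misplaced dataset.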
Notes
- Each audio file in the dataset should be no longer than about 10 seconds; otherwise VRAM may run out.
- To change the f0 inference method to CREPE, replace svc pre-hubert -fm dio with svc pre-hubert -fm crepe. You may need to reduce --n-jobs due to performance issues.
- It is recommended to change the batch_size in config.json before the train command to match the VRAM capacity. As tested, the default requires about 14 GB.
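Two of the notes above (file duration and batch_size) can be checked or applied with a small script. This is a sketch, not part of the package: find_long_wavs and set_batch_size are hypothetical helper names, the standard-library wave module only reads PCM .wav files, and the location of batch_size under a "train" key in config.json is an assumption that should be verified against your own config file.

```python
import json
import wave
from pathlib import Path
from typing import List, Tuple


def find_long_wavs(root: Path, max_seconds: float = 10.0) -> List[Tuple[Path, float]]:
    """Return (path, duration) for every PCM .wav under root longer than max_seconds."""
    too_long = []
    for path in sorted(root.rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if duration > max_seconds:
            too_long.append((path, duration))
    return too_long


def set_batch_size(config_path: Path, batch_size: int) -> None:
    """Rewrite config.json with a new batch size.

    ASSUMPTION: the value lives at config["train"]["batch_size"].
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["train"]["batch_size"] = batch_size
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")


if __name__ == "__main__":
    dataset = Path("dataset_raw")
    if dataset.is_dir():
        for path, seconds in find_long_wavs(dataset):
            print(f"{path}: {seconds:.1f}s, consider splitting")
```

Files in other formats are skipped by the duration check, so a clean result only covers the .wav files. Lower the batch size after svc pre-config has written the config file and before running svc train.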
Further help
For more details, run svc -h or svc <subcommand> -h.
> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...
so-vits-svc allows any folder structure for training data.
However, the following folder structure is recommended.
When training: dataset_raw/{speaker_name}/{wav_name}.wav
When inference: configs/44k/config.json, logs/44k/G_XXXX.pth
If the folder structure is followed, you DO NOT NEED TO SPECIFY model path, config path, etc.
(The latest model will be automatically loaded.)
To train a model, run pre-resample, pre-config, pre-hubert, train.
To infer a model, run infer.
Options:
-h, --help Show this message and exit.
Commands:
clean Clean up files, only useful if you are using the default file structure
infer Inference
onnx Export model to onnx
pre-config Preprocessing part 2: config
pre-hubert Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
pre-resample Preprocessing part 1: resample
train Train model If D_0.pth or G_0.pth not found, automatically download from hub.
train-cluster Train k-means clustering
vc Realtime inference from microphone
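The help text above says that the latest model is loaded automatically when the recommended folder structure (logs/44k/G_XXXX.pth) is followed. As an illustration of what "latest" could mean, the sketch below picks the generator checkpoint with the highest number in its file name. This is an assumption made for the example; the selection logic used by svc itself is not shown in this document and may differ. The function name latest_generator_checkpoint is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as G_0.pth or G_1600.pth and captures the number.
_CHECKPOINT = re.compile(r"G_(\d+)\.pth")


def latest_generator_checkpoint(log_dir: Path) -> Optional[Path]:
    """Return the G_<number>.pth file with the largest number, or None if there is none."""
    best_step, best_path = -1, None
    for path in log_dir.glob("G_*.pth"):
        match = _CHECKPOINT.fullmatch(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    log_dir = Path("logs/44k")
    if log_dir.is_dir():
        print(latest_generator_checkpoint(log_dir))
    else:
        print("logs/44k not found in the current directory")
```

Comparing the numbers as integers avoids the pitfall of a plain string sort, which would place G_800.pth after G_1600.pth.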
Contributors ✨
Thanks goes to these wonderful people (emoji key):
- 34j 💻 🤔 📖 💡 🚇 🚧 👀 ⚠️ ✅ 📣 🐛
- GarrettConway 💻 🐛 📖
- BlueAmulet 🤔 💬
- ThrowawayAccount01 🐛
- 緋 📖 🐛
- Lordmau5 🐛 💻
This project follows the all-contributors specification. Contributions of any kind welcome!
Built Distribution
Hashes for so_vits_svc_fork-1.3.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e7537076abd125bd10a22a1c8bdf9e3ea161665c39c096fd4d44ae413c556b10
MD5 | 5ca44447ea88275fd0b15d873909cd36
BLAKE2b-256 | 65f9f37e04a333e95b3d50706b62d9db60f63f50b9d3e0efd5edcb19c81cd57e
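A downloaded wheel can be compared against the SHA256 digest listed above. The following is a minimal sketch using hashlib from the standard library; the helper name sha256_of is hypothetical, and the wheel is assumed to sit in the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the hash table for so_vits_svc_fork-1.3.4-py3-none-any.whl.
EXPECTED_SHA256 = "e7537076abd125bd10a22a1c8bdf9e3ea161665c39c096fd4d44ae413c556b10"

if __name__ == "__main__":
    # ASSUMPTION: the wheel was downloaded to the current directory.
    wheel = Path("so_vits_svc_fork-1.3.4-py3-none-any.whl")
    if wheel.is_file():
        print("match" if sha256_of(wheel) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

Reading in chunks keeps memory use flat regardless of the file size.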