TTS with RVC pipeline (ONNX Version)
TTS-with-RVC-ONNX 0.1.9
TTS-with-RVC-ONNX (Text-to-Speech with RVC using ONNX) is a package that extends text-to-speech (TTS) systems with an RVC (Retrieval-based Voice Conversion) module running on ONNX Runtime. It lets you convert text into speech and then replace the voice with your own RVC model, with support for several hardware backends (DirectML, CUDA, CPU).
ONNX Runtime is used for RVC inference, potentially leveraging hardware acceleration (DirectML on Windows/AMD, CUDA on NVIDIA). PyTorch is required only for specific F0 predictors (rmvpe).
The package may contain bugs. Please report an issue if you run into an error.
Release notes
0.1.9 - April 10, 2025: Current ONNX Branch Sync
- Synced RVC parameters with main branch 0.1.9 (rms_mix_rate, protect, filter_radius, resample_sr, file_index2, verbose).
- Added support for F0 predictors: rmvpe (using ONNX), pm, dio, harvest.
- Fixed F0 length mismatch issue and implemented correct audio padding.
- Added set_device method to switch ONNX Runtime providers.
- Updated dependencies and ONNX Runtime selection.
(Based on main branch 0.1.9)
0.1.6 - Initial ONNX support.
Prerequisites
You must have Python >= 3.8 and <= 3.12 installed (3.12 is recommended). For GPU acceleration you also need hardware and drivers compatible with ONNX Runtime (DirectML for AMD on Windows, CUDA for NVIDIA). The CPU provider works on most systems.
- PyTorch is required only if using f0_method='rmvpe'.
- FFmpeg must be installed and accessible in your system's PATH or placed in the script's directory. Download from ffmpeg.org.
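The prerequisites above can be checked from Python before installing. The following is a minimal sketch using only the standard library. The function name check_prerequisites and the way the script directory is searched for an FFmpeg binary are assumptions made for illustration; they are not part of the package.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(script_dir="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Supported interpreter range stated above: Python >= 3.8 and <= 3.12.
    if not ((3, 8) <= sys.version_info[:2] <= (3, 12)):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} is outside the supported range 3.8-3.12")

    # FFmpeg may be on PATH or sit next to the script (hypothetical lookup).
    on_path = shutil.which("ffmpeg") is not None
    local = any((Path(script_dir) / name).is_file() for name in ("ffmpeg", "ffmpeg.exe"))
    if not (on_path or local):
        problems.append("FFmpeg was not found on PATH or in the script directory")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

GPU drivers and the PyTorch requirement for rmvpe are not covered by this sketch.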
Installation
- Install the package using pip:
  CPU version:
  pip install tts-with-rvc-onnx
  CUDA version:
  pip install tts-with-rvc-onnx[cuda]
  DML version (recommended for AMD):
  pip install tts-with-rvc-onnx[dml]
- Ensure FFmpeg is installed and accessible (see Prerequisites).
How it Works
- Text-to-Speech (TTS): Uses edge-tts to convert input text into speech, saved as a temporary audio file in the tmp_directory.
- RVC (ONNX): With the .onnx file provided, the RVC module (via ONNX Runtime) reads the temporary audio file, processes it (feature extraction, F0, conversion, index lookup), and generates a new audio file saved in output_directory with the voice replaced.
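The two stages hand audio to each other through files, which can be sketched as follows. This is only an illustration of the data flow: run_pipeline, synthesize, and convert are hypothetical names, the two callables stand in for the edge-tts and ONNX RVC stages, and the file naming scheme is an assumption rather than the package's actual behavior.

```python
import uuid
from pathlib import Path


def run_pipeline(text, synthesize, convert, tmp_directory, output_directory):
    """Run TTS, then voice conversion, passing audio between stages via files.

    synthesize(text, wav_path) and convert(src_path, dst_path) are hypothetical
    callables standing in for the edge-tts and ONNX RVC stages.
    """
    tmp_dir = Path(tmp_directory)
    out_dir = Path(output_directory)
    tmp_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    # A unique name keeps repeated calls from overwriting each other's files.
    name = uuid.uuid4().hex + ".wav"
    tts_path = tmp_dir / name
    out_path = out_dir / name

    synthesize(text, tts_path)   # stage 1: text -> temporary speech file
    convert(tts_path, out_path)  # stage 2: temporary file -> converted voice
    return out_path
```

Keeping the stages behind plain callables makes it easy to test the file handling with stand-in functions before wiring in the real TTS and RVC steps.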
Usage
TTS-with-RVC-ONNX has a class called TTS_RVC.
Constructor Parameters:
- model_path (str): Required. Path to your .onnx RVC model file.
And optional parameters:
- voice (str): Voice from the edge-tts list (default: "ru-RU-DmitryNeural").
- device (str): ONNX Runtime provider ("dml", "cuda:0", "cpu", etc.). Defaults to "dml".
- tmp_directory (str): Path to directory for temporary TTS files (default: system temp folder).
- output_directory (str): Directory for saving final voiced audio (default: "temp/").
- index_path (str): Path to the Faiss .index file for voice adjustments (default: "").
- f0_method (str): Method for calculating pitch. Available: 'rmvpe', 'pm', 'harvest', 'dio', 'crepe'. Defaults to "rmvpe".
- sampling_rate (int): Target sample rate of the RVC model (default: 40000).
- hop_size (int): Hop size of the RVC model (default: 512).
Deprecated:
- input_directory: Use tmp_directory instead.
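Because device defaults to "dml", it can help to pick the value from what the installed ONNX Runtime build actually offers. The sketch below assumes a mapping between ONNX Runtime execution provider names and the device strings used in this README. The provider names are the ones ONNX Runtime reports, but the mapping, the preference order, and the pick_device helper are illustrative assumptions.

```python
# ONNX Runtime execution provider names, mapped to the device strings
# used in this README. The mapping and its order are assumptions.
PROVIDER_TO_DEVICE = [
    ("CUDAExecutionProvider", "cuda:0"),
    ("DmlExecutionProvider", "dml"),
    ("CPUExecutionProvider", "cpu"),
]


def pick_device(available_providers):
    """Return the first device string whose provider is available, else "cpu"."""
    for provider, device in PROVIDER_TO_DEVICE:
        if provider in available_providers:
            return device
    return "cpu"


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        providers = ort.get_available_providers()
    except ImportError:
        providers = []  # ONNX Runtime is not installed; fall back to CPU
    print(pick_device(providers))
```

The returned string could then be passed as the device argument of TTS_RVC.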
Initialization Example:
from tts_with_rvc_onnx import TTS_RVC

tts = TTS_RVC(model_path="models/YourModel.onnx",
              index_path="logs/YourIndex.index",
              f0_method="rmvpe",
              device="dml")  # Or "cuda:0", "cpu"
Note: tts.get_voices() is disabled indefinitely because of problems with it.
Next, set the TTS voice with the tts.set_voice() method:
tts.set_voice("ru-RU-DmitryNeural")
Choose a voice that matches the language of your text. This is required when voicing text in other languages!
The final step is calling tts (the __call__ method) to generate speech and replace the voice:
path = tts(text="Привет, мир!", pitch=6, index_rate=0.50)  # "Hello, world!"
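The pitch argument is a transpose in semitones. In twelve-tone equal temperament each semitone multiplies frequency by 2^(1/12), so pitch=6 raises the fundamental frequency by a factor of about 1.414 and pitch=12 doubles it. The helper below is a hypothetical name and only illustrates this arithmetic; it is not taken from the package.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a transpose of `semitones` (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


# Print the multiplier for a few common transpose values.
for shift in (-12, 0, 6, 12):
    print(f"{shift:+d} semitones -> x{semitone_ratio(shift):.3f}")
```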
__call__ Parameters:
- text (str): Required. Text for TTS.
- pitch (int, optional): Pitch change (transpose) for RVC in semitones. Negative values are supported. Default: 0.
- tts_rate (int, optional): Extra rate of speech for Edge TTS in percentage (+/-). Default: 0.
- tts_volume (int, optional): Extra volume of speech for Edge TTS in percentage (+/-). Default: 0.
- tts_pitch (int, optional): Extra pitch of TTS-generated audio in Hz (+/-). Not recommended. Default: 0.
- output_filename (str, optional): Name for the output file. If None, a unique name is generated. Default: None.
- index_rate (float, optional): Blending rate between original and indexed voice conversion (0 to 1). Default: 0.75.
- f0method (str, optional): F0 extraction method for this specific call, overrides the instance default: 'rmvpe', 'pm', 'harvest', 'dio'. Default uses instance setting.
- file_index2 (str, optional): Path to secondary index file for RVC. Default: "".
- filter_radius (int, optional): Median filter radius for pitch results. Values >= 3 reduce breathiness. Default: 3.
- resample_sr (int, optional): Sample rate to resample final audio to. 0 means use model's sample rate. Default: 0.
- rms_mix_rate (float, optional): Volume envelope scaling (0-1). Lower values mimic original volume more closely. Default: 0.25.
- protect (float, optional): Protection for voiceless consonants and breaths (0-0.5). Lower values increase protection. 0.5 disables. Default: 0.33.
- verbose (bool, optional): Enable verbose logging for RVC conversion. Default: False.
(Note: is_half parameter is removed as precision is handled by ONNX Runtime.)
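Several of these parameters have documented ranges, so it can be useful to check values before starting a conversion. The following is a small sketch: the ranges come from the list above, while validate_call_params is a hypothetical helper and not part of the package, which may handle out-of-range values differently.

```python
# Ranges taken from the parameter list above.
RANGES = {
    "index_rate": (0.0, 1.0),
    "rms_mix_rate": (0.0, 1.0),
    "protect": (0.0, 0.5),
}


def validate_call_params(**params):
    """Raise ValueError if a ranged parameter is out of bounds; return params unchanged."""
    for name, value in params.items():
        if name not in RANGES:
            continue  # parameters without a documented range are passed through
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return params
```

A possible use is tts(text="...", **validate_call_params(pitch=6, index_rate=0.9)).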
Example of usage
A simple example for voicing text:
import os
from tts_with_rvc_onnx import TTS_RVC
# from playsound import playsound  # Optional

# --- Configuration ---
model_file = "models/DenVot.onnx"
index_file = "logs/added_IVF1749_Flat_nprobe_1.index"  # Optional
temp_dir = "audio_temp"
output_dir = "audio_output"
os.makedirs(temp_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

# --- Initialize ---
try:
    tts = TTS_RVC(
        model_path=model_file,
        index_path=index_file,
        tmp_directory=temp_dir,
        output_directory=output_dir,
        device="dml",  # Or 'cuda:0', 'cpu'
        f0_method="rmvpe"
    )
    tts.set_voice("ru-RU-DmitryNeural")

    # --- Generate ---
    path = tts(text="Привет, мир!", pitch=6, index_rate=0.9)  # "Hello, world!"
    print(f"Audio saved to: {path}")

    # --- Play (Optional) ---
    # playsound(path)
except Exception as e:
    print(f"An error occurred: {e}")
Text parameters
The package includes a processor for text parameters, which are options embedded directly in the message text. This helps with integrations such as adding a GPT module.
You can process them with the process_args method of the TTS_RVC class:
- --tts-rate (value): TTS parameter to edit the speech rate.
- --tts-volume (value): TTS parameter to edit the speech volume. May have limited effect due to RVC volume normalization.
- --tts-pitch (value): TTS parameter to edit the pitch of TTS generated audio. Not recommended.
- --rvc-pitch (value): RVC parameter to edit the pitch of the output audio (semitones).
Here is how it works:
from tts_with_rvc_onnx import TTS_RVC

tts = TTS_RVC(model_path="models/YourModel.onnx", device="dml", tmp_directory="temp/")
message_with_args = "This is a test --rvc-pitch -2 and slower --tts-rate -10"

# This method returns the arguments and the original text without these text parameters
args, clean_message = tts.process_args(message_with_args)
# args = [-10, 0, 0, -2]  # [tts_rate, tts_volume, tts_pitch, rvc_pitch]
# clean_message = "This is a test and slower"

# Use extracted arguments for generation:
path = tts(clean_message, tts_rate=args[0],
           tts_volume=args[1],
           tts_pitch=args[2],
           pitch=args[3])
The args variable contains a list with the following structure:
args[0] - TTS Rate
args[1] - TTS Volume
args[2] - TTS Pitch
args[3] - RVC pitch
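To make the behavior concrete, here is a stand-alone sketch of how such text parameters could be extracted with a regular expression. It is not the package's process_args implementation: parse_text_params is a hypothetical function, and the assumption that every value is a signed integer comes only from the example above.

```python
import re

# Flag order mirrors the args list described above:
# [tts_rate, tts_volume, tts_pitch, rvc_pitch]
FLAGS = ("--tts-rate", "--tts-volume", "--tts-pitch", "--rvc-pitch")
_PATTERN = re.compile(r"(--(?:tts-rate|tts-volume|tts-pitch|rvc-pitch))\s+([+-]?\d+)")


def parse_text_params(message):
    """Return ([tts_rate, tts_volume, tts_pitch, rvc_pitch], text without the flags)."""
    values = dict.fromkeys(FLAGS, 0)  # every parameter defaults to 0
    for flag, number in _PATTERN.findall(message):
        values[flag] = int(number)
    # Remove the flags, then collapse the whitespace they leave behind.
    stripped = _PATTERN.sub(" ", message)
    clean = " ".join(stripped.split())
    return [values[flag] for flag in FLAGS], clean


args, clean_message = parse_text_params(
    "This is a test --rvc-pitch -2 and slower --tts-rate -10"
)
print(args)           # [-10, 0, 0, -2]
print(clean_message)  # This is a test and slower
```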
Methods
- set_voice(voice): Changes the Edge TTS voice.
- set_index_path(index_path): Updates the path to the Faiss .index file.
- set_device(device): Changes the ONNX Runtime provider (e.g., 'dml', 'cuda:0', 'cpu') and reinitializes the backend.
- set_output_directory(directory_path): Sets the default directory for saving output files.
- process_args(text): Extracts text parameters (see above).
- voiceover_file(input_path, ...): Applies RVC voice conversion directly to an existing audio file (accepts same RVC parameters as __call__).
Exceptions
- RuntimeError: Failed to load ONNX model...: Check .onnx model path and integrity. Ensure correct onnxruntime-* package is installed.
- RuntimeError: Failed to initialize ONNX backend...: Check ONNX Runtime installation, drivers (CUDA/DirectML), or model compatibility.
- FileNotFoundError: Input audio, .onnx model, .index file, or required predictor models (rmvpe.onnx) not found.
- ValueError: Dimension mismatch...: Faiss .index file dimension doesn't match ContentVec output dimension (e.g., 256 vs 768). Use a compatible index.
- RuntimeError: Failed to load audio...: Ensure FFmpeg is installed and accessible in PATH.
- Errors during F0 computation: Check if required libraries (parselmouth, pyworld, torch for rmvpe) are installed correctly.
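Since backend initialization failures are reported as RuntimeError, one way to cope with a missing GPU provider is to try several devices in order. The sketch below assumes this behavior and uses a hypothetical create_with_fallback helper with a caller-supplied factory; neither is part of the package.

```python
def create_with_fallback(factory, devices=("dml", "cuda:0", "cpu")):
    """Call factory(device) for each device in order; return (instance, device).

    `factory` is a hypothetical callable, for example:
    lambda d: TTS_RVC(model_path="models/YourModel.onnx", device=d)
    """
    errors = []
    for device in devices:
        try:
            return factory(device), device
        except RuntimeError as exc:
            # Model loading and backend initialization failures are listed
            # above as RuntimeError, so only that type triggers a retry.
            errors.append(f"{device}: {exc}")
    raise RuntimeError("No device could be initialized: " + "; ".join(errors))
```

Other exceptions, such as FileNotFoundError for a missing model, are deliberately not caught, because switching devices would not fix them.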
Acknowledgements
- RVC Project - For the original RVC model and concepts.
License
MIT License
Authors
- Atm4x (Artem Dikarev)
File details
Details for the file tts_with_rvc_onnx-0.1.9.2.tar.gz.
File metadata
- Download URL: tts_with_rvc_onnx-0.1.9.2.tar.gz
- Upload date:
- Size: 27.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | abfd230abdf6180986c6383365d640cfc8b390733968aba6939f465771ad8edf |
| MD5 | b245785a429b64cf24e333b0d5618959 |
| BLAKE2b-256 | 4bcf532a5bcb60e44aec9136904140b08a46dc3f035bcf94e13e5cdccc62a670 |
File details
Details for the file tts_with_rvc_onnx-0.1.9.2-py3-none-any.whl.
File metadata
- Download URL: tts_with_rvc_onnx-0.1.9.2-py3-none-any.whl
- Upload date:
- Size: 26.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4375bc30f901ad0ae981741b14300338b6decaee155c17a7d19c4a4ec71cc3f |
| MD5 | 13549fa68c855b1d7a65252f4cf7a5eb |
| BLAKE2b-256 | 24a7da94150d38965283b5a1ad662540d677cbc240f01641f877ea03338d2f2b |