VoiceStar: Robust, Duration-Controllable TTS that can Extrapolate
VoiceStar is a robust, duration-controllable text-to-speech (TTS) model that supports test-time extrapolation: it can generate speech longer than the durations it saw during training.
Features
- Duration control: Specify the duration of the generated speech.
- Zero-shot voice cloning: Clone any voice with a short reference audio clip (demo video).
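The reference clip only needs to be short, and this page does not say how short. A minimal pre-flight sketch uses Python's standard-library `wave` module to measure a WAV file before you pass it to `--reference-speech`. The helper names and the 10-second cap are hypothetical, not VoiceStar limits, and `wave` reads only uncompressed PCM WAV files.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def check_reference_clip(path: str, max_seconds: float = 10.0) -> float:
    """Return the clip length, raising ValueError if it exceeds max_seconds.

    max_seconds is a hypothetical cap for illustration, not a VoiceStar limit.
    """
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        raise ValueError(f"{path} is {duration:.2f} s long; expected at most {max_seconds} s")
    return duration


# Example: check_reference_clip("./demo/5895_34622_000026_000002.wav")
```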
Coming soon: research paper (ETA: 7–14 April 2025)
Quick Start
Install
pip install voicestar
Make sure you also have espeak-ng installed.
Note: If you run into issues installing VoiceStar with uv, try installing it with pip instead.
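Because espeak-ng is installed separately from the pip package, a quick check that its executable is on `PATH` can catch a missing dependency early. A minimal sketch using `shutil.which`; the helper name is hypothetical, and it assumes the binary is named `espeak-ng`, as in the standard espeak-ng distribution.

```python
import shutil


def executable_available(name: str = "espeak-ng") -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if executable_available():
        print("espeak-ng found at", shutil.which("espeak-ng"))
    else:
        print("espeak-ng not found; install it with your system package manager.")
```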
Usage
Basic usage:
voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too! And you know what? this audio is 8 seconds long." --target-duration 8
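To script the CLI from Python, a minimal sketch can assemble the same command as an argument list and run it with `subprocess.run`, which sidesteps shell-quoting problems with punctuation such as `!` or `?` in the target text. It uses only the flags shown on this page; the helper names are hypothetical, and since this page does not say where the CLI writes its output, check `voicestar --help` for that.

```python
import subprocess
from typing import List, Optional


def build_voicestar_command(
    reference_speech: str,
    target_text: str,
    target_duration: Optional[float] = None,
) -> List[str]:
    """Build argv for the voicestar CLI from the flags documented in this README."""
    cmd = [
        "voicestar",
        "--reference-speech", reference_speech,
        "--target-text", target_text,
    ]
    if target_duration is not None:
        # Format 8 or 8.0 as "8" (and 8.5 as "8.5"), matching the README example.
        cmd += ["--target-duration", f"{target_duration:g}"]
    return cmd


def run_voicestar(reference_speech: str, target_text: str,
                  target_duration: Optional[float] = None) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(
        build_voicestar_command(reference_speech, target_text, target_duration),
        check=True,
    )


# Example: run_voicestar("./demo/5895_34622_000026_000002.wav", "Hello there!", target_duration=8)
```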
Please refer to the CLI and Python API documentation below for more advanced usage.
Training
Please refer to the training docs for more information.
Inference
CLI
voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too!"
View all available options:
voicestar --help
Python API
from voicestar import VoiceStar
# Initialize the model
model = VoiceStar()
# Generate speech from text
audio = model.generate("I cannot believe that the same model can also do text to speech synthesis too!")
audio.save("output.wav")
License
The code in this repo is licensed under the MIT license. The pretrained model weights available on Hugging Face are licensed under the CC-BY-4.0 license.
This repository may contain third-party software which may be licensed under different licenses.
Download files
Source Distribution
File details
Details for the file voicestar-0.1.0.tar.gz.
File metadata
- Download URL: voicestar-0.1.0.tar.gz
- Upload date:
- Size: 121.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd |
| MD5 | bc05cc62788f9b78c60d82fc32047d33 |
| BLAKE2b-256 | d798a558d70a7871cd32973c1cf4add34eed87d4c614dfd49c5234e1dd33710c |
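To check a downloaded voicestar-0.1.0.tar.gz against the digests above, a minimal sketch with Python's `hashlib` streams the file in chunks and compares the result to the published SHA256 value. The helper names are hypothetical; "blake2b-256" is computed as BLAKE2b with a 32-byte digest, which is what the 256 suffix denotes.

```python
import hashlib

# SHA256 digest published above for voicestar-0.1.0.tar.gz.
EXPECTED_SHA256 = "799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd"


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected.strip().lower()


# Example: verify_download("voicestar-0.1.0.tar.gz")
```

pip can enforce the same check during installation: a requirements line such as `voicestar==0.1.0 --hash=sha256:799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd` switches pip into hash-checking mode, which then also requires every other requirement to be pinned with a hash.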