VoiceStar: Robust, Duration-Controllable TTS that can Extrapolate

VoiceStar is a robust, duration-controllable TTS model that supports test-time extrapolation: it can generate speech longer than the durations it was trained on.

Features

Duration control: Specify the duration of the generated speech.
Zero-shot voice cloning: Clone any voice with a short reference audio clip (demo video).

Coming soon: research paper (ETA: 7 April 2025 - 14 April 2025)

Quick Start

Install

pip install voicestar

Make sure you also have espeak-ng installed.
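A quick way to confirm the espeak-ng requirement is to look for the executable on your PATH. The snippet below is a minimal sketch, not part of VoiceStar: the helper name `find_espeak_ng` is hypothetical, and it assumes the binary is installed under its usual name `espeak-ng`.

```python
import shutil


def find_espeak_ng(executable: str = "espeak-ng"):
    """Return the full path to the executable if it is on PATH, else None.

    Hypothetical helper for illustration; VoiceStar does not ship it.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_espeak_ng()
    if path is None:
        print("espeak-ng was not found on PATH; install it with your system package manager.")
    else:
        print(f"espeak-ng found at {path}")
```

This only checks that the executable is reachable; it does not verify the version or that VoiceStar can use it.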

Note: If you run into issues installing VoiceStar with uv, try installing it with pip instead.

Usage

Basic usage:

voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too! And you know what? this audio is 8 seconds long." --target-duration 8
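The command above can also be assembled and launched from a Python script, which helps when generating many clips. This is a minimal sketch that only wraps the CLI flags shown above (`--reference-speech`, `--target-text`, `--target-duration`); the function names `build_voicestar_command` and `synthesize` are hypothetical, and the duration is assumed to be in seconds, as the example text suggests.

```python
import shlex
import subprocess


def build_voicestar_command(reference_speech, target_text, target_duration=None):
    """Build the argument list for the voicestar CLI.

    Only the flags shown in this README are used. target_duration is
    assumed to be in seconds and is omitted when None.
    """
    cmd = [
        "voicestar",
        "--reference-speech", reference_speech,
        "--target-text", target_text,
    ]
    if target_duration is not None:
        cmd += ["--target-duration", str(target_duration)]
    return cmd


def synthesize(reference_speech, target_text, target_duration=None):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    cmd = build_voicestar_command(reference_speech, target_text, target_duration)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where VoiceStar is not installed.
    example = build_voicestar_command(
        "./demo/5895_34622_000026_000002.wav",
        "And you know what? this audio is 8 seconds long.",
        target_duration=8,
    )
    print(shlex.join(example))
```

Passing the arguments as a list avoids shell quoting problems with punctuation in the target text.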

Please refer to the CLI and Python API documentation below for more advanced usage.

Training

Please refer to the training docs for more information.

Inference

CLI

voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too!"

View all available options:

voicestar --help

Python API

from voicestar import VoiceStar

# Initialize the model
model = VoiceStar()

# Generate speech from text
audio = model.generate("I cannot believe that the same model can also do text to speech synthesis too!")
audio.save("output.wav")

License

The code in this repo is licensed under the MIT license. The pretrained model weights available on Hugging Face are licensed under the CC-BY-4.0 license.

This repository may contain third-party software which may be licensed under different licenses.

Project details

Release history

0.1.0 (this version), released Apr 8, 2025

Download files


Source Distribution

voicestar-0.1.0.tar.gz (121.8 kB)

Uploaded Apr 8, 2025 Source

File details

Details for the file voicestar-0.1.0.tar.gz.

File metadata

Download URL: voicestar-0.1.0.tar.gz
Upload date: Apr 8, 2025
Size: 121.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for voicestar-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd`
MD5	`bc05cc62788f9b78c60d82fc32047d33`
BLAKE2b-256	`d798a558d70a7871cd32973c1cf4add34eed87d4c614dfd49c5234e1dd33710c`
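If you download the source distribution manually, you can compare its SHA256 digest with the value listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path is assumed to point at your local copy of `voicestar-0.1.0.tar.gz`.

```python
import hashlib

# SHA256 digest listed above for voicestar-0.1.0.tar.gz.
EXPECTED_SHA256 = "799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("voicestar-0.1.0.tar.gz")` should return `True` for an unmodified download of this release.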

