
VoiceStar: Robust, Duration-Controllable TTS that can Extrapolate

VoiceStar is a robust, duration-controllable TTS model with support for test-time extrapolation, meaning it can generate speech longer than the duration it was trained on.

Features

  • Duration control: Specify the duration of the generated speech.
  • Zero-shot voice cloning: Clone any voice with a short reference audio clip (demo video).

Coming soon: research paper (ETA: 7 to 14 April 2025)

Quick Start

Install

pip install voicestar

Make sure you also have espeak-ng installed.
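The README asks for espeak-ng to be installed separately from the Python package. A quick way to confirm it is visible before running VoiceStar is to look it up on PATH. This is a minimal sketch that assumes VoiceStar finds espeak-ng through the usual PATH lookup; `espeak_ng_available` is a hypothetical helper, not part of VoiceStar.

```python
import shutil


def espeak_ng_available() -> bool:
    """Return True if an `espeak-ng` executable can be found on PATH."""
    return shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    if espeak_ng_available():
        print("espeak-ng found at", shutil.which("espeak-ng"))
    else:
        print("espeak-ng not found on PATH; install it with your system package manager")
```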

Note: If you run into issues installing VoiceStar with uv, try installing it with pip instead.

Usage

Basic usage:

voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too! And you know what? This audio is 8 seconds long." --target-duration 8
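The basic example requests 8 seconds with `--target-duration 8`. If you have no particular duration in mind, a rough starting value can be estimated from the word count. This sketch assumes a constant speaking rate of 2.5 words per second (about 150 words per minute). Both that rate and the helper name `estimate_duration_seconds` are illustrative assumptions, not VoiceStar defaults.

```python
def estimate_duration_seconds(
    text: str, words_per_second: float = 2.5, minimum: float = 1.0
) -> float:
    """Rough target duration for `text`, assuming a constant speaking rate.

    words_per_second=2.5 is an assumed conversational rate, not a VoiceStar default.
    The result is rounded to 0.1 s and never drops below `minimum`.
    """
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    n_words = len(text.split())  # whitespace-separated tokens, punctuation included
    return max(minimum, round(n_words / words_per_second, 1))
```

The 25-word sentence in the basic example gives 10.0 at this rate, while the example asks for 8. Treat the estimate only as a starting point and adjust it by ear.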

Please refer to the CLI and Python API documentation below for more advanced usage.

Training

Please refer to the training docs for more information.

Inference

CLI

voicestar --reference-speech "./demo/5895_34622_000026_000002.wav" --target-text "I cannot believe that the same model can also do text to speech synthesis too!"

View all available options:

voicestar --help
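To drive the CLI from a Python script (for example, to process several sentences in a batch), one option is to call it through `subprocess`. This minimal sketch uses only the flags shown in this README. `build_voicestar_command` and `run_voicestar` are hypothetical helper names, and the output file goes wherever the CLI writes by default (check `voicestar --help`).

```python
import shutil
import subprocess
from typing import List, Optional


def build_voicestar_command(
    reference_speech: str,
    target_text: str,
    target_duration: Optional[float] = None,
) -> List[str]:
    """Build an argv list for the `voicestar` CLI using only the documented flags."""
    cmd = ["voicestar", "--reference-speech", reference_speech, "--target-text", target_text]
    if target_duration is not None:
        # Optional; the basic example passes 8. ":g" turns 8.0 into "8".
        cmd += ["--target-duration", f"{target_duration:g}"]
    return cmd


def run_voicestar(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the CLI, raising if the executable is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} not found on PATH; run `pip install voicestar` first")
    return subprocess.run(cmd, check=True)
```

For example: `run_voicestar(build_voicestar_command("./demo/5895_34622_000026_000002.wav", "Hello there.", 8))`. Passing an argument list instead of a shell string avoids quoting problems when `--target-text` contains punctuation such as `!` or `?`.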

Python API

from voicestar import VoiceStar

# Initialize the model
model = VoiceStar()

# Generate speech from text
audio = model.generate("I cannot believe that the same model can also do text to speech synthesis too!")
audio.save("output.wav")
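Duration control is the main feature, so it can be worth checking how long the saved file actually is. This is a minimal standard-library sketch. It assumes `audio.save` writes an uncompressed PCM WAV that Python's `wave` module can open. `wav_duration_seconds`, `check_duration`, and the 0.25 s tolerance are illustrative choices, not part of the VoiceStar API.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_duration(path: str, target: float, tolerance: float = 0.25) -> bool:
    """True if the file's duration is within `tolerance` seconds of `target`.

    The default tolerance of 0.25 s is an arbitrary illustrative value.
    """
    return abs(wav_duration_seconds(path) - target) <= tolerance
```

For example, `check_duration("output.wav", 8.0)` after generating with a target of 8 seconds.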

License

The code in this repo is licensed under the MIT license. The pretrained model weights available on Hugging Face are licensed under the CC-BY-4.0 license.

This repository may contain third-party software which may be licensed under different licenses.


Download files

Source Distribution

voicestar-0.1.0.tar.gz (121.8 kB)

File details

Details for the file voicestar-0.1.0.tar.gz.

File metadata

  • Download URL: voicestar-0.1.0.tar.gz
  • Size: 121.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for voicestar-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd
MD5          bc05cc62788f9b78c60d82fc32047d33
BLAKE2b-256  d798a558d70a7871cd32973c1cf4add34eed87d4c614dfd49c5234e1dd33710c
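To confirm that a downloaded copy of the source distribution matches the SHA256 digest listed for voicestar-0.1.0.tar.gz, here is a short sketch using `hashlib` from the standard library. The helper names `sha256_of_file` and `verify_sdist` are illustrative.

```python
import hashlib

# SHA256 digest published for voicestar-0.1.0.tar.gz
EXPECTED_SHA256 = "799d716958b5bb301b1275673560e25590eb318f504ba1e5d203ed82a49776bd"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sdist(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file at `path` matches `expected` (compared case-insensitively)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_sdist("voicestar-0.1.0.tar.gz")` returns `True` only for an unmodified copy of that file.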
