An Open Source text-to-speech system built by inverting Whisper (fork of WhisperSpeech)
🎙️ WhisperSpeech2
An Open Source text-to-speech system built by inverting Whisper. This is a fork of WhisperSpeech optimized for inference that introduces ⚡"cuda graphs"⚡ for faster inference.
🚀 Installation
pip install whisperspeech2
If you are using an Nvidia GPU, you also need the CUDA libraries (tested with CUDA 12.8).
✨ Available Models
You can mix and match the models for different quality and compute requirements. See the WhisperSpeech Hugging Face repository.
S2A Models (Semantic to Acoustic)
| Model | Reference |
|---|---|
| Tiny | WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model |
| Base | WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model |
| Small | WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model |
| HQ Fast | WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model |
| v1.1 Small | WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model |
T2S Models (Text to Semantic)
| Model | Reference |
|---|---|
| Tiny | WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model |
| Base | WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model |
| Small | WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model |
| Fast Small | WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model |
| Fast Medium | WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model |
| HQ Fast | WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model |
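To make the mix-and-match idea concrete, here is a minimal sketch that pairs one T2S reference with one S2A reference from the tables above. The helper names `split_ref` and `pick_models` and the dictionary keys `t2s` and `s2a` are hypothetical and not part of the package. The sketch only assumes that each reference has the form `repo:filename`, as the tables suggest.

```python
# Model references copied verbatim from the tables above.
S2A_MODELS = {
    "Tiny": "WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model",
    "Base": "WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model",
    "Small": "WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model",
    "HQ Fast": "WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model",
    "v1.1 Small": "WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model",
}
T2S_MODELS = {
    "Tiny": "WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model",
    "Base": "WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model",
    "Small": "WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model",
    "Fast Small": "WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model",
    "Fast Medium": "WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model",
    "HQ Fast": "WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model",
}


def split_ref(ref: str) -> tuple[str, str]:
    """Split a 'repo:filename' reference into its two parts (hypothetical helper)."""
    repo_id, _, filename = ref.partition(":")
    if not repo_id or not filename:
        raise ValueError(f"expected 'repo:filename', got {ref!r}")
    return repo_id, filename


def pick_models(t2s: str = "Tiny", s2a: str = "Tiny") -> dict[str, str]:
    """Return one reference per stage; any T2S row can be paired with any S2A row."""
    return {"t2s": T2S_MODELS[t2s], "s2a": S2A_MODELS[s2a]}


if __name__ == "__main__":
    # A fast text-to-semantic model combined with the higher-quality acoustic model.
    choice = pick_models(t2s="Fast Small", s2a="HQ Fast")
    for stage, ref in choice.items():
        print(stage, *split_ref(ref))
```

How the chosen references are handed to the library is not shown here, because this README does not document that call. See the examples/ directory for the actual usage.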
[Figure: Benchmark (no cuda graph)]
[Figure: Benchmark (with cuda graph)]
People with Nvidia GPUs can set the "use_cuda_graph" parameter to "true" for faster processing and lower VRAM usage.
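Since the flag only helps on Nvidia hardware, one option is to decide its value at runtime. The sketch below is a rough illustration using only the standard library. The helper names are hypothetical, the `nvidia-smi` lookup is only a heuristic for "an Nvidia driver is installed", and where exactly the package accepts `use_cuda_graph` is an assumption to check against the examples/ directory.

```python
import shutil
from typing import Optional


def nvidia_gpu_likely_present() -> bool:
    """Heuristic only: nvidia-smi on PATH suggests an Nvidia driver is installed."""
    return shutil.which("nvidia-smi") is not None


def cuda_graph_options(force: Optional[bool] = None) -> dict:
    """Build the option dict; `force` overrides the heuristic when given."""
    enabled = nvidia_gpu_likely_present() if force is None else force
    return {"use_cuda_graph": enabled}


if __name__ == "__main__":
    options = cuda_graph_options()
    # Assumption: these options are then passed wherever your code configures
    # the pipeline; that call is not documented in this README.
    print(options)
```

Passing `force=False` is a simple way to fall back to the plain path when comparing against the benchmark without cuda graph.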
Examples
See the examples/ directory for more usage examples including GUI applications and streaming playback.
Thanks/Shout Outs
Thanks to Jakub for the inspiration.
Thanks to the fine folks at Dia2 for forcing me to learn about cuda graphs. Go check out their project too.
BIG FUCK YOU
And finally a big "fuck you" to Microsoft for installing shit on my computer that I don't need and/or want for years.
Your pathetic VibeVoice project just got pwned!