An Open Source text-to-speech system built by inverting Whisper (fork of WhisperSpeech)
WhisperSpeech2
An open-source text-to-speech system built by inverting Whisper. This is a fork of WhisperSpeech optimized for inference, created because the original project is no longer maintained.
Installation
```bash
pip install whisperspeech2
```
Note: You must also have PyTorch installed. Visit pytorch.org for installation instructions.
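Since PyTorch is not installed automatically, it can be useful to confirm that it is importable before building a pipeline. The following is a minimal sketch, not part of whisperspeech2: the helper name `check_torch` is hypothetical, and it relies only on the standard library plus `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def check_torch() -> dict:
    """Report whether PyTorch is importable and whether CUDA is usable."""
    info = {"installed": False, "version": None, "cuda": False}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is missing; see pytorch.org for installation instructions.
        return info

    import torch  # imported lazily so this function works without PyTorch

    info["installed"] = True
    info["version"] = torch.__version__
    info["cuda"] = torch.cuda.is_available()
    return info
```

Calling `print(check_torch())` before constructing a `Pipeline` shows at a glance whether the GPU will be available.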
Quick Start
```python
from whisperspeech2.pipeline import Pipeline

# Initialize the pipeline
pipe = Pipeline(
    s2a_ref='WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model',
    t2s_ref='WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model'
)

# Generate audio and save to file
pipe.generate_to_file('output.wav', "Hello, world!")

# Or get the audio tensor directly
audio = pipe.generate("Hello, world!")
```
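To synthesize several sentences in one run, the `generate_to_file` call from the Quick Start can be wrapped in a small loop. This is a sketch under assumptions: `synthesize_lines`, the `tts_out` directory, and the `line_NNN.wav` naming scheme are hypothetical and not part of the library. The only pipeline method used is `generate_to_file(path, text)` as shown above.

```python
from pathlib import Path


def synthesize_lines(pipe, lines, out_dir="tts_out"):
    """Write one WAV file per non-empty line and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, text in enumerate(lines):
        text = text.strip()
        if not text:
            continue  # skip blank lines rather than synthesizing empty input
        path = out / f"line_{index:03d}.wav"
        pipe.generate_to_file(str(path), text)  # same call as in Quick Start
        written.append(path)
    return written
```

With the `pipe` object from the Quick Start, `synthesize_lines(pipe, ["Hello, world!", "Second sentence."])` would produce `line_000.wav` and `line_001.wav`. Reusing one pipeline object avoids reloading the models for every sentence.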
Available Models
For more details about each model, visit the WhisperSpeech Hugging Face repository.
S2A Models (Semantic to Acoustic)
| Model | Reference |
|---|---|
| Tiny (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model |
| Base (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model |
| Small (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model |
| HQ Fast (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model |
| v1.1 Small | WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model |
| v1.95 Small Fast | WhisperSpeech/WhisperSpeech:s2a-v1.95-small-fast-en.model |
T2S Models (Text to Semantic)
| Model | Reference |
|---|---|
| Tiny | WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model |
| Base | WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model |
| Small | WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model |
| Fast Small | WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model |
| Fast Medium | WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model |
| HQ Fast | WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model |
| v1.1 Small | WhisperSpeech/WhisperSpeech:t2s-v1.1-small-en+pl.model |
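The references in both tables appear to follow a `<repository>:<file name>` pattern. Assuming that pattern holds, a reference can be split into its two parts, for example to log which file will be fetched. The names `ModelRef` and `split_model_ref` below are hypothetical helpers, not part of whisperspeech2.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    repo_id: str   # e.g. "WhisperSpeech/WhisperSpeech"
    filename: str  # e.g. "s2a-q4-tiny-en+pl.model"


def split_model_ref(ref: str) -> ModelRef:
    """Split '<repo>:<file>' at the first colon, rejecting malformed input."""
    repo_id, sep, filename = ref.partition(":")
    if not sep or not repo_id or not filename:
        raise ValueError(f"expected '<repo>:<file>', got {ref!r}")
    return ModelRef(repo_id, filename)
```

For example, `split_model_ref('WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model').filename` yields the model file name on its own.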
Model Recommendations
| Use Case | S2A Model | T2S Model | VRAM | Processing Time |
|---|---|---|---|---|
| Lowest Resources | s2a-q4-tiny | t2s-tiny | ~450 MB | ~16s |
| Best Speed | s2a-v1.95-small-fast | t2s-tiny | ~1.7 GB | ~15s |
| Balanced | s2a-q4-hq-fast | t2s-tiny | ~1.7 GB | ~15s |
| Higher Quality | s2a-q4-hq-fast | t2s-hq-fast | ~2.1 GB | ~16s |
Avoid: Combinations using s2a-q4-small or s2a-v1.1-small with t2s-fast-medium result in high VRAM (~4GB) and slow processing (~42s).
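The recommendations above can be captured as a small lookup table so that a script picks a model pair by name or by VRAM budget. This is a sketch under assumptions: the preset names and both helper functions are hypothetical, the short model names in the table are assumed to map to the full references listed under Available Models, and the VRAM figures are the approximate values from the table converted to MB.

```python
REPO = "WhisperSpeech/WhisperSpeech"

# (S2A file, T2S file, approximate VRAM in MB), copied from the tables above
PRESETS = {
    "lowest_resources": ("s2a-q4-tiny-en+pl.model", "t2s-tiny-en+pl.model", 450),
    "best_speed": ("s2a-v1.95-small-fast-en.model", "t2s-tiny-en+pl.model", 1700),
    "balanced": ("s2a-q4-hq-fast-en+pl.model", "t2s-tiny-en+pl.model", 1700),
    "higher_quality": ("s2a-q4-hq-fast-en+pl.model", "t2s-hq-fast-en+pl.model", 2100),
}


def pipeline_kwargs(preset: str) -> dict:
    """Return the s2a_ref/t2s_ref keyword arguments for a named preset."""
    s2a, t2s, _vram_mb = PRESETS[preset]
    return {"s2a_ref": f"{REPO}:{s2a}", "t2s_ref": f"{REPO}:{t2s}"}


def presets_within(vram_mb: int) -> list:
    """List presets whose approximate VRAM need fits the given budget."""
    return [name for name, (_, _, need) in PRESETS.items() if need <= vram_mb]
```

A pipeline could then be created with `Pipeline(**pipeline_kwargs("balanced"))`, using the same `s2a_ref` and `t2s_ref` arguments as the Quick Start.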
Speaker Embedding (Optional)
To use custom speaker embeddings, install the optional dependency:
```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "whisperspeech2[speaker]"
```
Then pass an audio file path to clone a voice:
```python
pipe.generate_to_file('output.wav', "Hello!", speaker='reference.wav')
```
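When the reference path comes from user input, checking it before calling the pipeline gives a clearer error than one raised deep inside model code. A minimal sketch, assuming only the `speaker=` keyword shown above; the wrapper name `clone_to_file` is hypothetical.

```python
from pathlib import Path


def clone_to_file(pipe, out_path, text, reference):
    """Generate speech in the reference voice after validating the path."""
    ref = Path(reference)
    if not ref.is_file():
        raise FileNotFoundError(f"speaker reference not found: {ref}")
    # Same call as in the example above, with the validated path
    pipe.generate_to_file(out_path, text, speaker=str(ref))
    return out_path
```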
Examples
See the examples/ directory for more usage examples including GUI applications and streaming playback.
License
MIT License
File details
Details for the file whisperspeech2-0.9.1.tar.gz.
File metadata
- Download URL: whisperspeech2-0.9.1.tar.gz
- Upload date:
- Size: 21.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a43e830f077a883f7643a8617e9b2cb682d10490684dba1b292f28d364f5355 |
| MD5 | f8c6c116423e5c615836445f79d9d17e |
| BLAKE2b-256 | 61e87ef886e368a8bbe363b30a8be628c6b6e108c918b5ef53d29ac99e67f5e8 |
File details
Details for the file whisperspeech2-0.9.1-py3-none-any.whl.
File metadata
- Download URL: whisperspeech2-0.9.1-py3-none-any.whl
- Upload date:
- Size: 26.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f24ff83718a607b485dfce392059868d2b6771e9259562bccf9c95f2d096dd38 |
| MD5 | 42fae4c61c9e294176e9492a0be3ba70 |
| BLAKE2b-256 | c42df3cd4cd531dcd504b0b2a8c8b502cb6eaeb14e75d660444f7c37d78f0264 |
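A downloaded archive can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. This is a minimal sketch: the function names `sha256_of` and `matches_published` are hypothetical, and the file is read in chunks so large downloads do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("whisperspeech2-0.9.1.tar.gz", "<SHA256 from the table>")` returns `True` only if the local file is byte-for-byte identical to the published one.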