
An Open Source text-to-speech system built by inverting Whisper (fork of WhisperSpeech)


🎙️ WhisperSpeech2

An Open Source text-to-speech system built by inverting Whisper. This fork of WhisperSpeech is optimized for inference and introduces ⚡"CUDA graphs"⚡ to speed up generation.

🚀 Installation

pip install whisperspeech2

Requirements:

  • PyTorch
  • CUDA libraries if using an NVIDIA GPU (tested with CUDA 12.8)
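A quick way to confirm that the install step worked is to ask Python's packaging metadata which versions are present. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration, and the distribution name `torch` for PyTorch is the usual one on PyPI.

```python
from importlib import metadata


def installed_version(dist_name="whisperspeech2"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report on the package itself and on PyTorch, which it needs.
    for name in ("whisperspeech2", "torch"):
        print(name, installed_version(name) or "not installed")
```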

✨ Available Models

You can mix and match the models for different quality and compute requirements. See the WhisperSpeech Hugging Face repository.

S2A Models (Semantic to Acoustic)

Model        Reference
Tiny         WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model
Base         WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model
Small        WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model
HQ Fast      WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model
v1.1 Small   WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model

T2S Models (Text to Semantic)

Model        Reference
Tiny         WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model
Base         WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model
Small        WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model
Fast Small   WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model
Fast Medium  WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model
HQ Fast      WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model
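Mixing and matching amounts to picking one T2S reference and one S2A reference from the tables above. The sketch below keeps the references in two dictionaries and returns the chosen pair. The helper names are illustrative only. The `build_pipeline` function rests on an assumption that this page does not confirm: that the fork keeps the upstream WhisperSpeech layout, with a `Pipeline` class in a `pipeline` module that accepts `t2s_ref` and `s2a_ref`. Check the package's examples/ directory before relying on it.

```python
# Model references copied from the tables above.
S2A_MODELS = {
    "Tiny": "WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model",
    "Base": "WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model",
    "Small": "WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model",
    "HQ Fast": "WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model",
    "v1.1 Small": "WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model",
}

T2S_MODELS = {
    "Tiny": "WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model",
    "Base": "WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model",
    "Small": "WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model",
    "Fast Small": "WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model",
    "Fast Medium": "WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model",
    "HQ Fast": "WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model",
}


def model_refs(t2s="Tiny", s2a="Tiny"):
    """Return the reference strings for one T2S model and one S2A model."""
    if t2s not in T2S_MODELS:
        raise KeyError(f"unknown T2S model: {t2s!r}")
    if s2a not in S2A_MODELS:
        raise KeyError(f"unknown S2A model: {s2a!r}")
    return {"t2s_ref": T2S_MODELS[t2s], "s2a_ref": S2A_MODELS[s2a]}


def build_pipeline(t2s="Tiny", s2a="Tiny"):
    """Hypothetical constructor call; see the assumption noted below."""
    # ASSUMPTION: module path, class name and keyword names mirror upstream
    # WhisperSpeech. None of them are confirmed by this page.
    from whisperspeech2.pipeline import Pipeline
    return Pipeline(**model_refs(t2s, s2a))


if __name__ == "__main__":
    # A small, fast pairing; swap the names for higher quality.
    print(model_refs(t2s="Fast Small", s2a="Tiny"))
```

Smaller models should need less compute, so a Tiny/Tiny pairing is a reasonable first test before moving to the larger variants.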

Benchmark (no cuda graph)

image

Benchmark (with cuda graph)

People with NVIDIA GPUs can set the "use_cuda_graph" parameter to "true" for faster processing and lower VRAM usage.

image
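Since "use_cuda_graph" only makes sense on an NVIDIA GPU, one option is to decide its value at runtime. The sketch below returns the flag as a keyword-argument dictionary. The helper name `cuda_graph_kwargs` is made up, and where the flag is actually passed (a constructor or a generate call) is not stated on this page, so treat that part as an assumption.

```python
def cuda_graph_kwargs():
    """Return {"use_cuda_graph": bool}, true only with a usable CUDA build."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return {"use_cuda_graph": False}
    # torch.version.cuda is None on CPU-only and ROCm builds, so this check
    # limits the flag to CUDA builds with a visible GPU.
    has_cuda = torch.cuda.is_available() and torch.version.cuda is not None
    return {"use_cuda_graph": bool(has_cuda)}


if __name__ == "__main__":
    print(cuda_graph_kwargs())
    # ASSUMPTION: the resulting dict would be unpacked into whichever
    # whisperspeech2 call accepts use_cuda_graph, e.g. SomeCall(**kwargs).
```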

Examples

See the examples/ directory for more usage examples including GUI applications and streaming playback.

Thanks/Shout Outs

Thanks to Jakub for the inspiration.

Thanks to the fine folks at Dia2 for forcing me to learn about CUDA graphs. Go check out their project too.

BIG FUCK YOU

And finally a big "fuck you" to Microsoft for installing shit on my computer that I don't need and/or want for years.

Your pathetic VibeVoice project just got pwned!


Download files


Source Distribution

whisperspeech2-0.9.3.tar.gz (24.7 kB)

Uploaded Source

Built Distribution


whisperspeech2-0.9.3-py3-none-any.whl (33.7 kB)

Uploaded Python 3

File details

Details for the file whisperspeech2-0.9.3.tar.gz.

File metadata

  • Download URL: whisperspeech2-0.9.3.tar.gz
  • Upload date:
  • Size: 24.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for whisperspeech2-0.9.3.tar.gz
Algorithm    Hash digest
SHA256       e0c3498aa98b6270bd6575ce19583754c4652ecccba6766da4587939a535ac18
MD5          4d19dd9977281775b75961961b4a3763
BLAKE2b-256  6ad0556f79f6926aaea091a1ac08762860d1493aac333998c7d441699ecf06f5


File details

Details for the file whisperspeech2-0.9.3-py3-none-any.whl.

File metadata

  • Download URL: whisperspeech2-0.9.3-py3-none-any.whl
  • Upload date:
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for whisperspeech2-0.9.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       30911af164a79c9dc3cef43148afeb8811889b1c2812fad2f788744bd761efbb
MD5          d8aaefbfce6b8756a8fb7d5f5acc82ab
BLAKE2b-256  bf4f7840afdb20ad825cefb32b3d9189fdd23e1a2a63523749a7e4d13bc29696
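The SHA256 digests listed for the two files can be used to check a manual download. The sketch below uses only the standard library; the function names are illustrative, and the expected values are copied from the hash listings above.

```python
import hashlib

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "whisperspeech2-0.9.3.tar.gz":
        "e0c3498aa98b6270bd6575ce19583754c4652ecccba6766da4587939a535ac18",
    "whisperspeech2-0.9.3-py3-none-any.whl":
        "30911af164a79c9dc3cef43148afeb8811889b1c2812fad2f788744bd761efbb",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Compare a local file against the published digest for filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]


if __name__ == "__main__":
    name = "whisperspeech2-0.9.3.tar.gz"
    # Assumes the archive was downloaded into the current directory.
    print(name, "OK" if matches_published_hash(name, name) else "MISMATCH")
```

pip can enforce the same check itself through its hash-checking mode, which compares downloads against digests pinned in a requirements file.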

