
An Open Source text-to-speech system built by inverting Whisper (fork of WhisperSpeech)


🎙️ WhisperSpeech2

An Open Source text-to-speech system built by inverting Whisper. This fork of WhisperSpeech is optimized for inference and adds ⚡CUDA graphs⚡ to make it faster.

🚀 Installation

pip install whisperspeech2

Requirements:

  • PyTorch
  • CUDA libraries if you are using an Nvidia GPU (tested with CUDA 12.8)
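Before running anything, it can help to confirm that PyTorch is installed and can see your GPU. The script below is a minimal standalone sketch, not part of whisperspeech2: it uses only the standard library plus PyTorch's public `torch.version.cuda` and `torch.cuda.is_available()`, and the report format is our own.

```python
import importlib.metadata
import importlib.util

# Standalone diagnostic sketch; not part of whisperspeech2.


def environment_report() -> dict:
    """Summarize the requirements listed above for the current environment."""
    report = {
        "whisperspeech2": None,   # installed package version, if any
        "torch": None,            # installed PyTorch version, if any
        "torch_cuda": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether PyTorch can see an Nvidia GPU
    }
    try:
        report["whisperspeech2"] = importlib.metadata.version("whisperspeech2")
    except importlib.metadata.PackageNotFoundError:
        pass  # package not installed yet
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["torch"] = str(torch.__version__)
        report["torch_cuda"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:>16}: {value}")
```

On a CPU-only machine `torch_cuda` is `None` and `cuda_available` is `False`; on a working Nvidia setup you would expect a version string such as the 12.8 mentioned above.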

✨ Available Models

You can mix and match the models for different quality and compute requirements. See the WhisperSpeech Hugging Face repository.

S2A Models (Semantic to Acoustic)

Model        Reference
Tiny         WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model
Base         WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model
Small        WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model
HQ Fast      WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model
v1.1 Small   WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model

T2S Models (Text to Semantic)

Model        Reference
Tiny         WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model
Base         WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model
Small        WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model
Fast Small   WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model
Fast Medium  WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model
HQ Fast      WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model
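Each reference appears to pair a Hugging Face repository ID with a checkpoint file name, separated by a colon. Assuming that reading, the sketch below keeps the two tables as a Python catalog and splits a reference into its parts, which makes mixing and matching an S2A model with a T2S model straightforward. The short catalog keys and the `ModelRef`, `parse_ref`, and `pick_pair` names are hypothetical, not part of whisperspeech2.

```python
from __future__ import annotations

from dataclasses import dataclass

# References copied from the S2A and T2S tables above; the short keys are our own.
S2A_MODELS = {
    "tiny": "WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model",
    "base": "WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model",
    "small": "WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model",
    "hq-fast": "WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model",
    "v1.1-small": "WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model",
}
T2S_MODELS = {
    "tiny": "WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model",
    "base": "WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model",
    "small": "WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model",
    "fast-small": "WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model",
    "fast-medium": "WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model",
    "hq-fast": "WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model",
}


@dataclass(frozen=True)
class ModelRef:
    repo_id: str   # Hugging Face repository, e.g. "WhisperSpeech/WhisperSpeech"
    filename: str  # checkpoint file inside that repository


def parse_ref(ref: str) -> ModelRef:
    """Split 'repo_id:filename' at the first colon.

    Assumption: the part before the colon is the Hugging Face repo ID and the
    part after it is the checkpoint file name.
    """
    repo_id, sep, filename = ref.partition(":")
    if not sep or not repo_id or not filename:
        raise ValueError(f"expected 'repo_id:filename', got {ref!r}")
    return ModelRef(repo_id, filename)


def pick_pair(s2a: str, t2s: str) -> tuple[ModelRef, ModelRef]:
    """Mix and match: look up one S2A and one T2S model by their short keys."""
    return parse_ref(S2A_MODELS[s2a]), parse_ref(T2S_MODELS[t2s])
```

For example, `pick_pair("hq-fast", "fast-medium")` pairs the HQ Fast S2A model with the Fast Medium T2S model. The resulting `repo_id` and `filename` match the arguments of `huggingface_hub.hf_hub_download`, should you want to fetch a checkpoint manually.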

Benchmark (without CUDA graphs)

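Benchmark results depend heavily on the GPU, driver, and model pair. To measure the difference on your own machine, a minimal timing harness like the one below may help. It is a hypothetical sketch, not necessarily how the published benchmarks were produced, and the function names are our own.

```python
import statistics
import time
from typing import Callable

# Hypothetical timing harness; not part of whisperspeech2.


def time_calls(fn: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Return the median wall-clock duration of fn() in seconds.

    Warm-up calls run first and are discarded, so one-off costs such as
    loading weights or capturing CUDA graphs do not skew the measurement.
    If fn launches GPU work, it should block until that work finishes
    (for PyTorch, by calling torch.cuda.synchronize() before returning),
    because CUDA kernels run asynchronously.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speed_vs_realtime(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio produced per second of compute (above 1 is faster than real time)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds
```

You would call `time_calls(lambda: synthesize(text))` once with CUDA graphs disabled and once with them enabled, where `synthesize` is a hypothetical stand-in for your generation call. The median is used because it is less sensitive than the mean to occasional slow runs.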

Benchmark (with CUDA graphs)

If you have an Nvidia GPU, set the use_cuda_graph parameter to True for faster processing.

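Since CUDA graphs need an Nvidia GPU, one option is to turn the parameter on only when PyTorch can actually see one. The helper below is a hypothetical sketch: it assumes `use_cuda_graph` is a boolean keyword argument, as the sentence above suggests, and deliberately leaves open which whisperspeech2 function or class receives it.

```python
import importlib.util

# Hypothetical helper; assumes whisperspeech2 accepts a boolean keyword
# argument named use_cuda_graph somewhere in its API.


def cuda_graph_kwargs() -> dict:
    """Return {"use_cuda_graph": True} only when PyTorch reports a usable CUDA GPU."""
    use_graphs = False
    if importlib.util.find_spec("torch") is not None:
        import torch
        use_graphs = bool(torch.cuda.is_available())
    return {"use_cuda_graph": use_graphs}
```

You would spread the result with `**cuda_graph_kwargs()` into whichever call accepts `use_cuda_graph`, so the same script runs unchanged on CPU-only machines.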

Examples

See the examples/ directory for more usage examples, including GUI applications and streaming playback.
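Streaming playback generally means handing audio to the output in small blocks as soon as each one is ready, instead of waiting for the whole utterance. The standard-library sketch below illustrates only that block-wise idea: it converts float samples to 16-bit PCM chunks and writes them to a WAV file as they arrive. The 24 kHz sample rate, the chunk size, and the function names are assumptions, not whisperspeech2's API; a real player would hand the same chunks to an audio output library instead of a file.

```python
import array
import io
import sys
import wave
from typing import BinaryIO, Iterable, Iterator, Sequence

# Assumption: 24 kHz mono output; change this to your model's actual sample rate.
SAMPLE_RATE = 24_000


def pcm16_chunks(samples: Sequence[float], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield 16-bit little-endian PCM chunks from float samples in [-1.0, 1.0]."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(samples), chunk_size):
        block = array.array(
            "h",
            # Clamp to [-1, 1] before scaling so out-of-range values cannot overflow.
            (int(max(-1.0, min(1.0, s)) * 32767) for s in samples[start:start + chunk_size]),
        )
        if sys.byteorder == "big":
            block.byteswap()  # WAV stores samples little-endian
        yield block.tobytes()


def write_wav_stream(chunks: Iterable[bytes], out: BinaryIO,
                     sample_rate: int = SAMPLE_RATE) -> int:
    """Append each PCM chunk to a mono WAV file as it arrives; return frames written."""
    frames = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            frames += len(chunk) // 2  # mono, 2 bytes per frame
    return frames


if __name__ == "__main__":
    # One second of a quiet rising ramp stands in for generated audio.
    fake_audio = [0.1 * i / SAMPLE_RATE for i in range(SAMPLE_RATE)]
    buffer = io.BytesIO()
    written = write_wav_stream(pcm16_chunks(fake_audio), buffer)
    print(f"wrote {written} frames, {len(buffer.getvalue())} bytes")
```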

Thanks/Shout Outs

Thanks to Jakub for the inspiration.

Thanks also to the fine folks behind Dia2 for forcing me to learn about CUDA graphs. Go check out their project too.

BIG FUCK YOU

And finally, a big "fuck you" to Microsoft for years of installing shit on my computer that I don't need or want.

Your pathetic VibeVoice project just got pwned!

You can end support for Windows 10, yet still find ways to surreptitiously install Microsoft Edge, the shittest browser ever, on my computer without any notification or approval, and consistently block every way to simply install Windows without registering an online account. Again, a big FUCK YOU to Microsoft!



Download files


Source Distribution

whisperspeech2-1.0.0.tar.gz (27.9 kB)

Built Distribution


whisperspeech2-1.0.0-py3-none-any.whl (36.6 kB)

File details

Details for the file whisperspeech2-1.0.0.tar.gz.

File metadata

  • Download URL: whisperspeech2-1.0.0.tar.gz
  • Upload date:
  • Size: 27.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for whisperspeech2-1.0.0.tar.gz
Algorithm Hash digest
SHA256 985758cac36e4d8f77a5fab0e2d4992b515dfac043b7189dab83939c8ff83daf
MD5 1f3f8870e31fc1aaf59a99b8ff9cda8a
BLAKE2b-256 f4527546830b77c9072ddcc1acba9c11823158d57c06d2cfe6430cf61100487f


File details

Details for the file whisperspeech2-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: whisperspeech2-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 36.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for whisperspeech2-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c781876372f0e07e8ccce05f420d2c108af36c70be8d85d96864c20ff1dad046
MD5 09bdb2424e672a9000077e5e72dd5520
BLAKE2b-256 53eb8a419a8d5ef00251890d3e0cd97fe873fed7f55e6d459af0dc8eb6f3495a
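The published digests let you confirm that a downloaded file was not corrupted or altered. The script below is a minimal sketch using Python's `hashlib`; the `EXPECTED_SHA256` mapping copies the SHA256 values listed above, and the helper names are our own.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the hash tables on this page.
EXPECTED_SHA256 = {
    "whisperspeech2-1.0.0.tar.gz":
        "985758cac36e4d8f77a5fab0e2d4992b515dfac043b7189dab83939c8ff83daf",
    "whisperspeech2-1.0.0-py3-none-any.whl":
        "c781876372f0e07e8ccce05f420d2c108af36c70be8d85d96864c20ff1dad046",
}


def sha256_of(path: Path, block_size: int = 1 << 16) -> str:
    """Hash the file in blocks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    for name in sys.argv[1:]:
        path = Path(name)
        expected = EXPECTED_SHA256.get(path.name)
        if expected is None:
            print(f"{name}: no published digest")
        else:
            print(f"{name}: {'OK' if verify(path, expected) else 'MISMATCH'}")
```

Pass one or more downloaded file names on the command line. pip can also enforce the check itself in hash-checking mode: pin `whisperspeech2==1.0.0` in a requirements file with a `--hash=sha256:<digest>` option for each file, then install with `pip install --require-hashes -r requirements.txt`. In that mode every dependency must be pinned and hashed as well.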

