An Open Source text-to-speech system built by inverting Whisper (fork of WhisperSpeech)

WhisperSpeech2

An open-source text-to-speech system built by inverting Whisper. This fork of WhisperSpeech is optimized for inference and exists because the creators of the original project abandoned it.

Installation

pip install whisperspeech2

Note: You must also have PyTorch installed. Visit pytorch.org for installation instructions.
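Because PyTorch is not installed for you, it can help to check for it before loading any models. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, not part of whisperspeech2.

```python
import importlib.util


def missing_packages(names=("torch", "whisperspeech2")):
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyTorch and whisperspeech2 are both importable.")
```

`find_spec` only locates the package and does not import it, so the check is quick even when PyTorch is present.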

Quick Start

from whisperspeech2.pipeline import Pipeline

# Initialize the pipeline
pipe = Pipeline(
    s2a_ref='WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model',
    t2s_ref='WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model'
)

# Generate audio and save to file
pipe.generate_to_file('output.wav', "Hello, world!")

# Or get the audio tensor directly
audio = pipe.generate("Hello, world!")
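To synthesize several sentences in one run, the `generate_to_file` call shown above can be wrapped in a loop. This is a minimal sketch: `synthesize_all`, the `out` directory, and the `utt_NNN.wav` naming scheme are illustrative assumptions, and `pipe` is expected to be an already initialized `Pipeline`.

```python
from pathlib import Path


def synthesize_all(pipe, texts, out_dir="out"):
    """Write one WAV file per non-empty text; return the paths written.

    Hypothetical helper: it relies only on pipe.generate_to_file(path, text)
    as shown in the Quick Start.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Drop blank entries so no empty utterance is sent to the model.
    cleaned = [t.strip() for t in texts if t.strip()]

    written = []
    for index, text in enumerate(cleaned, start=1):
        path = out / f"utt_{index:03d}.wav"
        pipe.generate_to_file(str(path), text)
        written.append(path)
    return written
```

For example, `synthesize_all(pipe, ["Hello, world!", "Second sentence."])` would write `out/utt_001.wav` and `out/utt_002.wav`.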

Available Models

For more details about each model, visit the WhisperSpeech Hugging Face repository.

S2A Models (Semantic to Acoustic)

| Model | Reference |
| --- | --- |
| Tiny (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-tiny-en+pl.model |
| Base (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-base-en+pl.model |
| Small (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-small-en+pl.model |
| HQ Fast (Q4) | WhisperSpeech/WhisperSpeech:s2a-q4-hq-fast-en+pl.model |
| v1.1 Small | WhisperSpeech/WhisperSpeech:s2a-v1.1-small-en+pl.model |
| v1.95 Small Fast | WhisperSpeech/WhisperSpeech:s2a-v1.95-small-fast-en.model |

T2S Models (Text to Semantic)

| Model | Reference |
| --- | --- |
| Tiny | WhisperSpeech/WhisperSpeech:t2s-tiny-en+pl.model |
| Base | WhisperSpeech/WhisperSpeech:t2s-base-en+pl.model |
| Small | WhisperSpeech/WhisperSpeech:t2s-small-en+pl.model |
| Fast Small | WhisperSpeech/WhisperSpeech:t2s-fast-small-en+pl.model |
| Fast Medium | WhisperSpeech/WhisperSpeech:t2s-fast-medium-en+pl+yt.model |
| HQ Fast | WhisperSpeech/WhisperSpeech:t2s-hq-fast-en+pl.model |
| v1.1 Small | WhisperSpeech/WhisperSpeech:t2s-v1.1-small-en+pl.model |

Model Recommendations

| Use Case | S2A Model | T2S Model | VRAM | Speed |
| --- | --- | --- | --- | --- |
| Lowest Resources | s2a-q4-tiny | t2s-tiny | ~450 MB | ~16s |
| Best Speed | s2a-v1.95-small-fast | t2s-tiny | ~1.7 GB | ~15s |
| Balanced | s2a-q4-hq-fast | t2s-tiny | ~1.7 GB | ~15s |
| Higher Quality | s2a-q4-hq-fast | t2s-hq-fast | ~2.1 GB | ~16s |

Avoid: Combining s2a-q4-small or s2a-v1.1-small with t2s-fast-medium results in high VRAM use (~4 GB) and slow processing (~42s).
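The recommended pairs can be kept in a small lookup so that a use case maps directly onto the two `Pipeline` arguments. This is a sketch under stated assumptions: the `RECOMMENDED` table, its keys, and `pipeline_kwargs` are hypothetical names, and the short model names from the recommendations are expanded to the full references listed under Available Models.

```python
MODEL_REPO = "WhisperSpeech/WhisperSpeech"

# Use case -> (S2A model file stem, T2S model file stem), taken from the
# recommendations above. The dictionary keys are illustrative names.
RECOMMENDED = {
    "lowest_resources": ("s2a-q4-tiny-en+pl", "t2s-tiny-en+pl"),
    "best_speed": ("s2a-v1.95-small-fast-en", "t2s-tiny-en+pl"),
    "balanced": ("s2a-q4-hq-fast-en+pl", "t2s-tiny-en+pl"),
    "higher_quality": ("s2a-q4-hq-fast-en+pl", "t2s-hq-fast-en+pl"),
}


def pipeline_kwargs(use_case):
    """Return the s2a_ref/t2s_ref keyword arguments for a use case (hypothetical helper)."""
    try:
        s2a, t2s = RECOMMENDED[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None
    return {
        "s2a_ref": f"{MODEL_REPO}:{s2a}.model",
        "t2s_ref": f"{MODEL_REPO}:{t2s}.model",
    }


def build_pipeline(use_case="balanced"):
    """Create a Pipeline for a use case; whisperspeech2 is imported only when called."""
    from whisperspeech2.pipeline import Pipeline

    return Pipeline(**pipeline_kwargs(use_case))
```

Keeping the pairs in one place makes it harder to end up with one of the slow, high-VRAM combinations by accident.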

Speaker Embedding (Optional)

To use custom speaker embeddings, install the optional dependency:

pip install "whisperspeech2[speaker]"

Then pass an audio file path to clone a voice:

pipe.generate_to_file('output.wav', "Hello!", speaker='reference.wav')
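A wrapper can verify that the reference recording exists before synthesis starts, so a mistyped path fails early. This is a sketch rather than library code: `clone_to_file` is a hypothetical helper, and it assumes only the `speaker=` keyword shown above.

```python
from pathlib import Path


def clone_to_file(pipe, text, out_path, reference=None):
    """Generate speech, optionally in the voice of a reference recording.

    Hypothetical helper around pipe.generate_to_file; `reference` is a path
    to an audio file, or None to use the default voice.
    """
    kwargs = {}
    if reference is not None:
        ref = Path(reference)
        if not ref.is_file():
            raise FileNotFoundError(f"speaker reference not found: {ref}")
        kwargs["speaker"] = str(ref)

    # Pass speaker= only when a reference was supplied.
    pipe.generate_to_file(str(out_path), text, **kwargs)
    return Path(out_path)
```

Calling `clone_to_file(pipe, "Hello!", "output.wav", reference="reference.wav")` corresponds to the call shown above, with the existence check added.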

Examples

See the examples/ directory for more usage examples including GUI applications and streaming playback.

License

MIT License



