Wiz TTS
A simple command-line tool for text-to-speech using OpenAI's API, featuring real-time FFT visualization.
Installation
uv tool install -U wiz-tts
# or if you prefer pip
pip install wiz-tts
Usage
After installation, you can run the tool with:
# Recommended: run with uv for best performance
uv run -- wiz-tts "Your text to convert to speech"
# Alternatively, run directly
wiz-tts "Your text to convert to speech"
Or pipe text from another command:
echo "Your text" | uv run -- wiz-tts
cat file.txt | uv run -- wiz-tts
Fun Zsh Function
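# Summarize a web page with a local LLM, then speak the result in a persona.
genz() {
  # Bail out early when no URL is given.
  [[ -z "$1" ]] && echo "Error: URL is required" >&2 && return 1
  # Dump the page as plain text.
  local content=$(w3m -dump "$1")
  local persona="gen-z podcaster, cringe but cool and on fire"
  # Feed the prompt to ollama, echo its output to the terminal, and speak it.
  ollama run llama3.2 <<PROMPT | tee /dev/tty | wiz-tts -i "$persona"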
Analyze the contents of this webpage:
$content
---
Act as a ${persona}, and compose a succinct monologue.
[no intros or closing, just the meat!]
PROMPT
}
Example as a macOS Shortcuts.app Script
exec /opt/homebrew/bin/zsh -i -c "$(cat <<SHELL
INSTRUCTIONS="Voice: Warm, conversational, and authentic, with natural vocal variety and a relaxed pace that invites listeners to settle in for extended listening. Occasional subtle emphasis on key points without sounding rehearsed.
Punctuation: Thoughtful pauses that allow ideas to breathe, varied sentence lengths to maintain interest, and natural breaks that mimic genuine conversation rather than reading.
Delivery: Balanced and rhythmic with a dynamic range that prevents monotony, incorporating strategic shifts in pace and volume to highlight important information or create narrative tension.
Phrasing: Accessible and relatable, using conversational language with occasional colorful expressions or personal anecdotes to build connection. Ideas flow naturally from one to the next with smooth transitions.
Tone: Engaging and inclusive, striking a balance between informative and entertaining, with moments of gentle humor, curiosity, and genuine enthusiasm, like speaking with a knowledgeable friend.
"
pbpaste | wiz-tts --voice verse --data-dir "${HOME}/Downloads/wiz-tts-data" --instructions "\$INSTRUCTIONS"
SHELL
)"
Options
usage: wiz-tts [-h] [--voice {alloy,echo,fable,onyx,nova,shimmer,coral}] [--instructions INSTRUCTIONS]
               [--model {tts-1,tts-1-hd,gpt-4o-mini-tts}] [--data-dir DATA_DIR] [--split {period,paragraph}]
               [text]

Convert text to speech with visualization

positional arguments:
  text                  Text to convert to speech (default: reads from stdin or uses a sample text)

options:
  -h, --help            show this help message and exit
  --voice {alloy,echo,fable,onyx,nova,shimmer,coral}, -v {alloy,echo,fable,onyx,nova,shimmer,coral}
                        Voice to use for speech (default: coral)
  --instructions INSTRUCTIONS, -i INSTRUCTIONS
                        Instructions for the speech style
  --model {tts-1,tts-1-hd,gpt-4o-mini-tts}, -m {tts-1,tts-1-hd,gpt-4o-mini-tts}
                        TTS model to use (default: tts-1)
  --split {period,paragraph}
                        Split input text by periods or paragraphs and add natural pauses
  --data-dir DATA_DIR, -d DATA_DIR
                        Directory to save audio files and metadata (default: $WIZ_TTS_DATA_DIR if set)
  --bitrate BITRATE, -b BITRATE
                        Audio bitrate for saved files (default: 24k)
Examples
Basic usage:
uv run -- wiz-tts "Hello, world!"
Using stdin:
echo "Hello from stdin" | uv run -- wiz-tts
Using a different voice:
uv run -- wiz-tts --voice nova "Welcome to the future of text to speech!"
Adding speech instructions:
uv run -- wiz-tts --voice shimmer --instructions "Speak slowly and clearly" "This is important information."
Using a different model:
uv run -- wiz-tts --model tts-1-hd "This will be rendered in high definition."
Processing a text file:
cat story.txt | uv run -- wiz-tts --voice echo
Splitting text by sentences with natural pauses:
uv run -- wiz-tts --split period "First sentence. Second sentence. And a third one!"
Splitting text by paragraphs with longer pauses:
cat essay.txt | uv run -- wiz-tts --split paragraph --voice verse
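Before sending a long file through --split paragraph, it can be useful to preview roughly how it might be segmented. The sketch below approximates paragraph splitting with awk's paragraph mode (records separated by blank lines). count_paragraphs and preview_paragraphs are hypothetical helpers, not part of wiz-tts, and the tool's own splitting rules may differ.

```shell
# Hypothetical helpers, not part of wiz-tts: approximate paragraph splitting
# with awk's paragraph mode (RS = "" treats blank lines as record separators).

# Print the number of blank-line-separated paragraphs read from stdin.
count_paragraphs() {
  awk 'BEGIN { RS = "" } END { print NR }'
}

# Print each paragraph on a single line, prefixed with its index.
preview_paragraphs() {
  awk 'BEGIN { RS = "" } { gsub(/\n/, " "); print NR ": " $0 }'
}
```

For example, `cat essay.txt | preview_paragraphs` lists the paragraphs as numbered one-line entries, which gives a rough idea of how many segments to expect.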
Saving audio to a directory (files are saved as they start generating):
uv run -- wiz-tts "Save this speech to a file" --data-dir ./saved_audio
Using environment variable for audio saving:
# Set the environment variable
export WIZ_TTS_DATA_DIR=./saved_speeches
# Run without --data-dir, audio will be saved to ./saved_speeches
uv run -- wiz-tts "This will be saved using the environment variable"
Saving with custom audio compression:
# Higher bitrate for better quality but larger file size
uv run -- wiz-tts "High quality audio" --data-dir ./saved_audio --bitrate 64k
# Lower bitrate for smaller file size
uv run -- wiz-tts "Compressed audio" --data-dir ./saved_audio --bitrate 16k
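The examples above can be combined into a small batch helper. This is a minimal sketch for bash or POSIX sh, assuming each .txt file should be read from stdin with --split paragraph and saved via --data-dir. speak_dir and the WIZ_TTS_CMD override are illustrative names introduced here, not features of wiz-tts.

```shell
# Hypothetical helper: speak every .txt file in a directory, one file at a time.
#   $1 = directory containing .txt files
#   $2 = directory passed to --data-dir
# WIZ_TTS_CMD is an illustrative override (not a wiz-tts feature) that lets
# you substitute another command, e.g. echo, for a dry run.
speak_dir() {
  for f in "$1"/*.txt; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    ${WIZ_TTS_CMD:-wiz-tts} --split paragraph --data-dir "$2" < "$f"
  done
}
```

Running `WIZ_TTS_CMD=echo; speak_dir ./texts ./saved_audio` prints the arguments that would be passed for each file without calling the API.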
Features
- Converts text to speech using OpenAI's TTS API
- Real-time FFT (Fast Fourier Transform) visualization during playback
- Multiple voice options
- Custom speech style instructions
- Reads text from command line arguments or stdin
- Supports multiple TTS models
- Text splitting with natural pauses between segments
- Option to save generated audio as WebM files with metadata and configurable compression
- Saves metadata and audio files immediately as they're being generated (no need to wait for completion)
Requirements
- Python 3.12 or higher
- An OpenAI API key set in your environment variables as OPENAI_API_KEY
Environment Variables
- OPENAI_API_KEY: Your OpenAI API key (required)
- WIZ_TTS_DATA_DIR: Default directory for saving audio files (optional, can be overridden with --data-dir)
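A wrapper script might check these variables before invoking the tool. The following is a minimal sketch under that assumption: check_openai_key and effective_data_dir are hypothetical helpers, and the precedence shown (explicit argument first, then WIZ_TTS_DATA_DIR) simply mirrors the override behavior described above rather than the tool's internal code.

```shell
# Hypothetical helper: fail early if the API key is missing.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
}

# Hypothetical helper: print the directory audio would be saved to,
# preferring an explicit argument over WIZ_TTS_DATA_DIR.
# Prints an empty line when neither is set.
effective_data_dir() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "${WIZ_TTS_DATA_DIR:-}"
  fi
}
```

For instance, `check_openai_key && wiz-tts "Hello"` only calls the tool when the key is present.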
License
MIT