qwen-tts-cli

Whisper-style CLI for Qwen3-TTS text-to-speech. One command, instant speech.

Install

# Apple Silicon (recommended for Mac — 6x faster)
pip install "qwen-tts-cli[mlx]"

# CUDA / CPU
pip install "qwen-tts-cli[transformers]"
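
The choice between the two extras can also be made programmatically. The sketch below is only an illustration of the rule stated in the install notes (Apple Silicon gets `mlx`, everything else gets `transformers`); the function names `recommended_extra` and `install_command` are hypothetical and not part of qwen-tts-cli.

```python
import platform


def recommended_extra(system: str, machine: str) -> str:
    """Return the pip extra suggested by the install notes above."""
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    # Everything else (CUDA or CPU) uses the transformers extra.
    return "transformers"


def install_command(system: str, machine: str) -> str:
    """Build the matching pip command as a string."""
    extra = recommended_extra(system, machine)
    return f'pip install "qwen-tts-cli[{extra}]"'


# Print the suggested command for the machine running this script.
print(install_command(platform.system(), platform.machine()))
```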

Usage

# Just speak
qwen-tts "Hello, world!"

# Choose a speaker and style
qwen-tts "I can't believe it!" --speaker Aiden --instruct "Speak with excitement"

# Save to a specific file
qwen-tts "Good morning." -o greeting.wav

# Use the larger model
qwen-tts "Higher quality voice." --model 1.7B

# Force a specific backend (auto-detected by default)
qwen-tts "Fast on Mac!" --backend mlx

# Clone a voice from a 3-second sample
qwen-tts "Now I sound like someone else." --clone reference.wav --ref-text "Transcript of the reference audio."

# Design a voice from a description
qwen-tts "Hi there!" --design --instruct "A warm, deep male voice with a calm tone"

# Read from stdin
echo "Pipe text in" | qwen-tts -

# List available speakers
qwen-tts --list-speakers
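
When calling the CLI from a Python script, passing the arguments as a list avoids shell quoting problems with text such as "I can't believe it!". This is a minimal sketch that uses only flags documented in this README; `build_command` and `speak` are hypothetical helper names, and the package itself may offer a better integration point.

```python
import subprocess
from typing import List, Optional


def build_command(
    text: str,
    output: Optional[str] = None,
    speaker: Optional[str] = None,
    instruct: Optional[str] = None,
    model: Optional[str] = None,
    backend: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list from the flags shown in the usage examples."""
    cmd = ["qwen-tts", text]
    for flag, value in (
        ("--output", output),
        ("--speaker", speaker),
        ("--instruct", instruct),
        ("--model", model),
        ("--backend", backend),
    ):
        # Only add flags the caller actually set, so CLI defaults apply.
        if value is not None:
            cmd += [flag, value]
    return cmd


def speak(text: str, **options: Optional[str]) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(text, **options), check=True)
```

For example, `speak("Good morning.", output="greeting.wav")` would run the same command as the "Save to a specific file" example above.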

Options

positional arguments:
  text                    Text to speak. Use "-" to read from stdin.

options:
  -o, --output FILE       Output audio file (default: output.wav)
  -m, --model SIZE        Model: 0.6B, 1.7B, or full HF ID (default: 0.6B)
  -b, --backend BACKEND   Inference backend: transformers, mlx (default: auto)
  -s, --speaker NAME      Speaker voice (default: Ryan)
  -l, --language LANG     Language (default: Auto)
  -i, --instruct TEXT     Style/emotion instruction
  --device DEVICE         Force device: cuda:0, mps, cpu (default: auto, transformers only)
  --play / --no-play      Play audio after generation (default: on for macOS)
  --list-speakers         List available speakers and exit

voice cloning:
  --clone AUDIO           Reference audio for voice cloning
  --ref-text TEXT         Transcript of reference audio

voice design:
  --design                Design a voice using --instruct description
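
The stdin mode (`-`) together with `-o` and `--no-play` lends itself to batch jobs. The following is a sketch under assumptions: the file naming scheme (`line_001.wav`, ...) and the helper names `plan_batch` and `synthesize_batch` are made up for illustration, and each line is synthesized by a separate CLI invocation.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_batch(lines: Iterable[str], out_dir: str) -> List[Tuple[str, Path]]:
    """Pair each non-empty line with a numbered output path."""
    texts = [line.strip() for line in lines if line.strip()]
    return [
        (text, Path(out_dir) / f"line_{i:03d}.wav")
        for i, text in enumerate(texts, start=1)
    ]


def synthesize_batch(lines: Iterable[str], out_dir: str) -> None:
    """Synthesize one WAV file per non-empty line of text."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for text, path in plan_batch(lines, out_dir):
        # "-" makes qwen-tts read the text from stdin;
        # --no-play skips playback so the loop does not block on audio.
        subprocess.run(
            ["qwen-tts", "-", "-o", str(path), "--no-play"],
            input=text,
            text=True,
            check=True,
        )
```

Because every line starts a new process, the model load time is presumably paid once per line, so this approach suits short scripts more than large corpora.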

Speakers

Speaker    Description            Language
Ryan       Dynamic rhythmic male  English
Aiden      Sunny clear male       English
Vivian     Bright young female    Chinese
Serena     Warm gentle female     Chinese
Uncle_Fu   Seasoned mellow male   Chinese
Dylan      Clear natural male     Chinese (Beijing)
Eric       Lively bright male     Chinese (Sichuan)
Ono_Anna   Playful light female   Japanese
Sohee      Warm emotional female  Korean
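
For scripts that pick a `--speaker` by language, the table above can be mirrored as a small lookup. The data is copied from the table; the `SPEAKERS` dict and `speakers_for` function are hypothetical names, and `qwen-tts --list-speakers` remains the authoritative source.

```python
from typing import Dict, List, Tuple

# (description, language) per speaker, copied from the table above.
SPEAKERS: Dict[str, Tuple[str, str]] = {
    "Ryan": ("Dynamic rhythmic male", "English"),
    "Aiden": ("Sunny clear male", "English"),
    "Vivian": ("Bright young female", "Chinese"),
    "Serena": ("Warm gentle female", "Chinese"),
    "Uncle_Fu": ("Seasoned mellow male", "Chinese"),
    "Dylan": ("Clear natural male", "Chinese (Beijing)"),
    "Eric": ("Lively bright male", "Chinese (Sichuan)"),
    "Ono_Anna": ("Playful light female", "Japanese"),
    "Sohee": ("Warm emotional female", "Korean"),
}


def speakers_for(language: str) -> List[str]:
    """Names whose language starts with `language`, so dialects match too."""
    return [
        name
        for name, (_, lang) in SPEAKERS.items()
        if lang.startswith(language)
    ]
```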

Supported languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian.

Backends

Transformers (default)

Uses PyTorch + HuggingFace Transformers. Works on all platforms.

Platform       Device  Precision
NVIDIA GPU     cuda    bfloat16
Apple Silicon  mps     float32
CPU            cpu     float32
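
The table above implies a simple priority order. The sketch below encodes it as a pure function; it is an assumption about how such a selection could work, not the CLI's actual auto-detection code, and `pick_device` is a hypothetical name.

```python
from typing import Tuple


def pick_device(cuda_available: bool, mps_available: bool) -> Tuple[str, str]:
    """Return (device, precision) following the platform table above."""
    if cuda_available:
        return ("cuda", "bfloat16")
    if mps_available:
        return ("mps", "float32")
    return ("cpu", "float32")


# With PyTorch installed, the two flags could come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
```

The result corresponds to what `--device` would force manually, for example `--device mps` on Apple Silicon.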

MLX (Apple Silicon)

Uses mlx-audio with quantized models from mlx-community for native Apple Silicon acceleration. All modes (speak, clone, design) are supported.

qwen-tts "Hello!" --backend mlx
qwen-tts "Hello!" --backend mlx --model 0.6B              # smaller, faster
qwen-tts "Hi!" --backend mlx --design --instruct "warm"   # voice design

Size  Mode    Quant  HuggingFace ID
0.6B  speak   6-bit  mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-6bit
0.6B  clone   4-bit  mlx-community/Qwen3-TTS-12Hz-0.6B-Base-4bit
1.7B  speak   4-bit  mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-4bit
1.7B  clone   8-bit  mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit
1.7B  design  8-bit  mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit
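
Since `--model` also accepts a full HF ID, the table above can serve as a lookup for choosing one explicitly. The mapping below copies the table rows; `MLX_MODELS` and `mlx_model_id` are hypothetical names, and how the CLI resolves sizes internally is not documented here.

```python
from typing import Dict, Optional, Tuple

# (size, mode) -> HuggingFace ID, copied from the table above.
MLX_MODELS: Dict[Tuple[str, str], str] = {
    ("0.6B", "speak"): "mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-6bit",
    ("0.6B", "clone"): "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-4bit",
    ("1.7B", "speak"): "mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-4bit",
    ("1.7B", "clone"): "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit",
    ("1.7B", "design"): "mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit",
}


def mlx_model_id(size: str, mode: str) -> Optional[str]:
    """Look up the listed model ID, or None if the table has no such row."""
    return MLX_MODELS.get((size, mode))
```

Note that the table lists no 0.6B model for design mode, so `mlx_model_id("0.6B", "design")` returns `None`.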

Additional quantization variants (4bit, 5bit, 6bit, 8bit, bf16) are available on HuggingFace for all families. Use benchmark_quant.py to find the best variant for your hardware.

Benchmark (Apple Silicon)

Tested on a 16GB M1 MacBook Pro with the same input text (~14s of audio output):

Model                    Load   Avg Gen  RTF
Transformers 0.6B (mps)  10.6s  61.4s    4.36
Transformers 1.7B (mps)  85.0s  117.7s   8.08
MLX 1.7B 8-bit           2.3s   10.2s    1.00

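In this benchmark, MLX 1.7B 8-bit generates about 6x faster than Transformers 0.6B (10.2s vs. 61.4s) and about 11.5x faster than Transformers 1.7B (10.2s vs. 117.7s), while using less memory. RTF (real-time factor) is generation time divided by audio duration: 1.0 means generation runs at real-time speed, and lower is faster.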
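
The speedups can be checked directly from the "Avg Gen" column. This is a small worked example, assuming the usual definition of RTF as generation time divided by audio duration; the function names are illustrative only.

```python
def real_time_factor(gen_seconds: float, audio_seconds: float) -> float:
    """RTF: seconds of compute spent per second of audio produced."""
    return gen_seconds / audio_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Average generation times in seconds, from the benchmark table above.
AVG_GEN = {
    "Transformers 0.6B (mps)": 61.4,
    "Transformers 1.7B (mps)": 117.7,
    "MLX 1.7B 8-bit": 10.2,
}

mlx = AVG_GEN["MLX 1.7B 8-bit"]
for name, seconds in AVG_GEN.items():
    if name != "MLX 1.7B 8-bit":
        print(f"MLX 1.7B 8-bit vs {name}: {speedup(seconds, mlx):.1f}x faster")
```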

License

Apache-2.0 (same as Qwen3-TTS)
