:speaking_head: aspeak
A simple text-to-speech client using the Azure TTS API (trial). :laughing:
TL;DR: This program uses the trial auth token of Azure Cognitive Services to do speech synthesis for you.
You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
Installation
$ pip install --upgrade aspeak
Usage
usage: aspeak [-h] [-V | -L | -Q | [-t [TEXT] | -s [SSML]]] [-p PITCH] [-r RATE] [-S STYLE] [-f FILE] [-e ENCODING] [-o OUTPUT_PATH] [--mp3 | --ogg | --webm | --wav | -F FORMAT]
[-l LOCALE] [-v VOICE] [-q QUALITY]
This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-L, --list-voices list available voices, you can combine this argument with -v and -l
-Q, --list-qualities-and-formats
list available qualities and formats
-t [TEXT], --text [TEXT]
Text to speak. Left blank when reading from file/stdin
-s [SSML], --ssml [SSML]
SSML to speak. Left blank when reading from file/stdin
-f FILE, --file FILE Text/SSML file to speak, default to `-`(stdin)
-e ENCODING, --encoding ENCODING
Text/SSML file encoding, default to "utf-8"(Not for stdin!)
-o OUTPUT_PATH, --output OUTPUT_PATH
Output file path, wav format by default
--mp3 Use mp3 format for output. (Only works when outputting to a file)
--ogg Use ogg format for output. (Only works when outputting to a file)
--webm Use webm format for output. (Only works when outputting to a file)
--wav Use wav format for output
-F FORMAT, --format FORMAT
Set output audio format (experts only)
-l LOCALE, --locale LOCALE
Locale to use, default to en-US
-v VOICE, --voice VOICE
Voice to use
-q QUALITY, --quality QUALITY
Output quality, default to 0
Options for --text:
-p PITCH, --pitch PITCH
Set pitch, default to 0
-r RATE, --rate RATE Set speech rate, default to 0.04
-S STYLE, --style STYLE
Set speech style, default to "general"
- If you don't specify -o, we will use your default speaker.
- If you don't specify -t or -s, we will assume -t is provided.
- You must specify a voice if you want to use the -p or -r option.
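The rules above can be sketched as a small Python helper that assembles an aspeak command line and rejects the combinations the notes rule out. This is only an illustration: `build_aspeak_args` is a hypothetical name, not part of aspeak, and treating -p/-r as invalid together with -s is my reading of the "Options for --text" heading in the help output.

```python
from typing import List, Optional


def build_aspeak_args(
    text: Optional[str] = None,
    ssml: Optional[str] = None,
    voice: Optional[str] = None,
    pitch: Optional[float] = None,
    rate: Optional[float] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Return an argv list for aspeak (hypothetical helper, not part of aspeak)."""
    uses_prosody = pitch is not None or rate is not None
    if text is not None and ssml is not None:
        raise ValueError("-t and -s cannot be combined")
    if uses_prosody and ssml is not None:
        # Assumption: -p and -r are listed under "Options for --text".
        raise ValueError("-p and -r are options for --text, not --ssml")
    if uses_prosody and voice is None:
        raise ValueError("-p and -r require a voice (-v)")

    args = ["aspeak"]
    if ssml is not None:
        args += ["-s", ssml]
    elif text is not None:
        args += ["-t", text]
    # With neither -t nor -s, aspeak assumes -t and reads the text from stdin.
    if voice is not None:
        args += ["-v", voice]
    if pitch is not None:
        args += ["-p", str(pitch)]
    if rate is not None:
        args += ["-r", str(rate)]
    # Without -o, aspeak plays the result on the default speaker.
    if output is not None:
        args += ["-o", output]
    return args


print(build_aspeak_args(text="Hello, world", output="output.wav"))
```

The returned list can be passed to `subprocess.run` once aspeak is installed; the helper itself never starts a process.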
Examples
Speak "Hello, world!" to default speaker.
$ aspeak -t "Hello, world"
List all available voices.
$ aspeak -L
List all available voices for Chinese.
$ aspeak -L -l zh-CN
Get information about a voice.
$ aspeak -L -v en-US-SaraNeural
Output
Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display Name: Sara
Local Name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Styles: ['cheerful', 'angry', 'sad']
Voice Type: Neural
Status: GA
Save synthesized speech to a file.
$ aspeak -t "Hello, world" -o output.wav
If you prefer mp3/ogg/webm, you can use the --mp3/--ogg/--webm option.
$ aspeak -t "Hello, world" -o output.mp3 --mp3
$ aspeak -t "Hello, world" -o output.ogg --ogg
$ aspeak -t "Hello, world" -o output.webm --webm
List available quality levels and formats
$ aspeak -Q
Output
Available qualities:
Qualities for wav:
-2: Riff8Khz16BitMonoPcm
-1: Riff16Khz16BitMonoPcm
0: Riff24Khz16BitMonoPcm
1: Riff24Khz16BitMonoPcm
Qualities for mp3:
-3: Audio16Khz32KBitRateMonoMp3
-2: Audio16Khz64KBitRateMonoMp3
-1: Audio16Khz128KBitRateMonoMp3
0: Audio24Khz48KBitRateMonoMp3
1: Audio24Khz96KBitRateMonoMp3
2: Audio24Khz160KBitRateMonoMp3
3: Audio48Khz96KBitRateMonoMp3
4: Audio48Khz192KBitRateMonoMp3
Qualities for ogg:
-1: Ogg16Khz16BitMonoOpus
0: Ogg24Khz16BitMonoOpus
1: Ogg48Khz16BitMonoOpus
Qualities for webm:
-1: Webm16Khz16BitMonoOpus
0: Webm24Khz16BitMonoOpus
1: Webm24Khz16Bit24KbpsMonoOpus
Available formats:
- Riff8Khz16BitMonoPcm
- Riff16Khz16BitMonoPcm
- Audio16Khz128KBitRateMonoMp3
- Raw24Khz16BitMonoPcm
- Raw48Khz16BitMonoPcm
- Raw16Khz16BitMonoPcm
- Audio24Khz160KBitRateMonoMp3
- Ogg24Khz16BitMonoOpus
- Audio16Khz64KBitRateMonoMp3
- Raw8Khz8BitMonoALaw
- Audio24Khz16Bit48KbpsMonoOpus
- Ogg16Khz16BitMonoOpus
- Riff8Khz8BitMonoALaw
- Riff8Khz8BitMonoMULaw
- Audio48Khz192KBitRateMonoMp3
- Raw8Khz16BitMonoPcm
- Audio24Khz48KBitRateMonoMp3
- Raw24Khz16BitMonoTrueSilk
- Audio24Khz16Bit24KbpsMonoOpus
- Audio24Khz96KBitRateMonoMp3
- Webm24Khz16BitMonoOpus
- Ogg48Khz16BitMonoOpus
- Riff48Khz16BitMonoPcm
- Webm24Khz16Bit24KbpsMonoOpus
- Raw8Khz8BitMonoMULaw
- Audio16Khz16Bit32KbpsMonoOpus
- Audio16Khz32KBitRateMonoMp3
- Riff24Khz16BitMonoPcm
- Raw16Khz16BitMonoTrueSilk
- Audio48Khz96KBitRateMonoMp3
- Webm16Khz16BitMonoOpus
Increase/decrease audio quality
# Less than default quality.
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=-1
# Higher than default quality for mp3 (run aspeak -Q to see the highest level available).
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=3
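To make the -q levels concrete, here is a minimal Python sketch of the mp3 quality table. The mapping is copied from the `aspeak -Q` output shown earlier; the names `MP3_QUALITIES`, `format_for_quality` and `highest_quality` are made up for this illustration, and other aspeak versions may list different levels.

```python
# mp3 quality levels, copied from the `aspeak -Q` output in this README.
# Assumption: this table matches the installed version; check `aspeak -Q`.
MP3_QUALITIES = {
    -3: "Audio16Khz32KBitRateMonoMp3",
    -2: "Audio16Khz64KBitRateMonoMp3",
    -1: "Audio16Khz128KBitRateMonoMp3",
    0: "Audio24Khz48KBitRateMonoMp3",
    1: "Audio24Khz96KBitRateMonoMp3",
    2: "Audio24Khz160KBitRateMonoMp3",
    3: "Audio48Khz96KBitRateMonoMp3",
    4: "Audio48Khz192KBitRateMonoMp3",
}


def format_for_quality(level: int) -> str:
    """Return the format name listed for an mp3 quality level."""
    if level not in MP3_QUALITIES:
        raise ValueError(f"no mp3 quality level {level}")
    return MP3_QUALITIES[level]


def highest_quality() -> int:
    """Return the largest mp3 quality level in the table."""
    return max(MP3_QUALITIES)


print(format_for_quality(0))   # format used by the default quality
print(highest_quality())
```

According to this table, `-q=-1` with `--mp3` selects Audio16Khz128KBitRateMonoMp3, and the largest listed level is 4.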
Read text from file and speak it.
$ cat input.txt | aspeak
or
$ aspeak -f input.txt
with custom encoding:
$ aspeak -f input.txt -e gbk
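Why the -e option matters can be shown with a short Python sketch: bytes written in GBK are not valid UTF-8, so a GBK file read with the default "utf-8" encoding cannot be decoded. The helper name `decodes_as` is made up for this illustration, and how aspeak reports such a failure is not covered here.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid text in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The bytes as they would sit in a GBK-encoded input.txt.
sample = "你好,世界!".encode("gbk")

print(decodes_as(sample, "gbk"))    # True
print(decodes_as(sample, "utf-8"))  # False
```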
Read from stdin and speak it.
$ aspeak
or (more verbose)
$ aspeak -f -
or, if you prefer a here-document:
$ aspeak -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF
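The same stdin workflow can be driven from a script. Below is a minimal Python sketch, assuming aspeak is installed and on the PATH; `stdin_command` and `speak_via_stdin` are hypothetical names, and the text is encoded with Python's default text encoding because the -e option does not apply to stdin.

```python
import shutil
import subprocess
from typing import List


def stdin_command(locale: str = "en-US") -> List[str]:
    """Build the command that makes aspeak read its text from stdin."""
    return ["aspeak", "-f", "-", "-l", locale]


def speak_via_stdin(text: str, locale: str = "en-US") -> bool:
    """Pipe `text` to aspeak; return False if aspeak is not installed."""
    if shutil.which("aspeak") is None:
        return False
    # text=True lets Python encode the string before writing it to the pipe.
    subprocess.run(stdin_command(locale), input=text, text=True, check=True)
    return True
```

Calling `speak_via_stdin("我能吞下玻璃而不伤身体。", locale="zh-CN")` would correspond to the here-document example above.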
Speak Chinese.
$ aspeak -t "你好,世界!" -l zh-CN
Use a custom voice.
$ aspeak -t "你好,世界!" -v zh-CN-YunjianNeural
Custom pitch, rate and style
$ aspeak -t "你好,世界!" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad
Examples for Advanced Users
Use a custom audio format for output
Note: When outputting to the default speaker, using a non-wav format may produce white noise.
$ python -m aspeak -t "Hello World" -F Riff48Khz16BitMonoPcm -o high-quality.wav
About This Application
- I found that Azure TTS can synthesize a nearly authentic human voice, which is very interesting :laughing:.
- I wrote this program to learn Azure Cognitive Services.
- And I use this program daily, because espeak and festival output terrible :fearful: audio.
  - But I respect :raised_hands: their maintainers' work; both are good open source software and they can be used offline.
- I hope you like it :heart:.
Alternative Applications
Download files
Source Distribution
aspeak-1.3.0.tar.gz (12.5 kB)
Built Distribution
aspeak-1.3.0-py3-none-any.whl (12.2 kB)