🗣️ aspeak

A simple text-to-speech client using the Azure TTS API (trial). 😆

TL;DR: This program uses the trial auth token of Azure Cognitive Services to synthesize speech for you.

You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech

Installation

$ pip install --upgrade aspeak

Limitations

Since we are using Azure Cognitive Services, there are some limitations:

Quota	Free (F0)
Max number of transactions per certain time period per Speech service resource
Real-time API. Prebuilt neural voices and custom neural voices.	20 transactions per 60 seconds
Adjustable	No
HTTP-specific quotas
Max audio length produced per request	10 min
Max total number of distinct `<voice>` and `<audio>` tags in SSML	50
WebSocket-specific quotas
Max audio length produced per turn	10 min
Max total number of distinct `<voice>` and `<audio>` tags in SSML	50
Max SSML message size per turn	64 KB

This table is copied from the Azure Cognitive Services documentation.

These limits are subject to change, so the table above may become outdated. Please refer to the latest Azure Cognitive Services documentation for current values.
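
If you call the service many times from a script, one way to stay under the 20 transactions per 60 seconds quota is a sliding-window limiter. The sketch below is not part of aspeak: `SlidingWindowLimiter` and its parameters are hypothetical names, and the defaults simply mirror the table above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds.

    Hypothetical helper; the defaults (20 calls / 60 s) come from the quota table.
    """

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()   # times of recent calls, oldest first

    def wait_time(self):
        """Return how many seconds to wait before the next call is allowed."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._stamps.append(self._clock())
```

Call `limiter.acquire()` right before each synthesis request. The limiter only tracks calls made by the current process, so it cannot account for requests sent from elsewhere with the same token.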

Usage

usage: aspeak [-h] [-V | -L | -Q | [-t [TEXT] | -s [SSML]]] [-p PITCH] [-r RATE] [-S STYLE] [-f FILE] [-e ENCODING] [-o OUTPUT_PATH] [--mp3 | --ogg | --webm | --wav | -F FORMAT]
              [-l LOCALE] [-v VOICE] [-q QUALITY]

This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -L, --list-voices     list available voices, you can combine this argument with -v and -l
  -Q, --list-qualities-and-formats
                        list available qualities and formats
  -t [TEXT], --text [TEXT]
                        Text to speak. Left blank when reading from file/stdin
  -s [SSML], --ssml [SSML]
                        SSML to speak. Left blank when reading from file/stdin
  -f FILE, --file FILE  Text/SSML file to speak, default to `-`(stdin)
  -e ENCODING, --encoding ENCODING
                        Text/SSML file encoding, default to "utf-8"(Not for stdin!)
  -o OUTPUT_PATH, --output OUTPUT_PATH
                        Output file path, wav format by default
  --mp3                 Use mp3 format for output. (Only works when outputting to a file)
  --ogg                 Use ogg format for output. (Only works when outputting to a file)
  --webm                Use webm format for output. (Only works when outputting to a file)
  --wav                 Use wav format for output
  -F FORMAT, --format FORMAT
                        Set output audio format (experts only)
  -l LOCALE, --locale LOCALE
                        Locale to use, default to en-US
  -v VOICE, --voice VOICE
                        Voice to use
  -q QUALITY, --quality QUALITY
                        Output quality, default to 0

Options for --text:
  -p PITCH, --pitch PITCH
                        Set pitch, default to 0
  -r RATE, --rate RATE  Set speech rate, default to 0.04
  -S STYLE, --style STYLE
                        Set speech style, default to "general"

If you don't specify -o, we will use your default speaker.
If you don't specify -t or -s, we will assume -t is provided.
You must specify voice if you want to use -p or -r option.
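
When driving aspeak from a Python script, it can help to assemble the command line in one place and check the constraints from the help text before running anything. This is only a sketch: `build_aspeak_command` and its keyword names are made up for illustration, and it covers only the flags listed above.

```python
import shlex


def build_aspeak_command(text, voice=None, locale=None, pitch=None, rate=None,
                         style=None, output=None, container=None, quality=None):
    """Build an argv list for the aspeak CLI (hypothetical helper)."""
    if (pitch is not None or rate is not None) and voice is None:
        # Mirrors the note in the help text: -p and -r require a voice.
        raise ValueError("a voice (-v) is required when pitch (-p) or rate (-r) is set")
    if container not in (None, "mp3", "ogg", "webm", "wav"):
        raise ValueError(f"unsupported container: {container!r}")
    if container in ("mp3", "ogg", "webm") and output is None:
        # The help text says these formats only work when outputting to a file.
        raise ValueError(f"--{container} requires an output path (-o)")

    argv = ["aspeak", "-t", text]
    for flag, value in (("-l", locale), ("-v", voice), ("-p", pitch),
                        ("-r", rate), ("-S", style), ("-o", output)):
        if value is not None:
            argv += [flag, str(value)]
    if container is not None:
        argv.append(f"--{container}")
    if quality is not None:
        # The "-q=-1" form is the one used in this README for negative values.
        argv.append(f"-q={quality}")
    return argv


command = build_aspeak_command("Hello, world", output="output.mp3",
                               container="mp3", quality=-1)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where aspeak is installed.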

Examples

Speak "Hello, world" through the default speaker.

$ aspeak -t "Hello, world"

List all available voices.

$ aspeak -L

List all available voices for Chinese.

$ aspeak -L -l zh-CN

Get information about a voice.

$ aspeak -L -v en-US-SaraNeural

Output

Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display Name: Sara
Local Name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Styles: ['cheerful', 'angry', 'sad']
Voice Type: Neural
Status: GA

Save synthesized speech to a file.

$ aspeak -t "Hello, world" -o output.wav

If you prefer mp3, ogg, or webm, use the --mp3, --ogg, or --webm option.

$ aspeak -t "Hello, world" -o output.mp3 --mp3
$ aspeak -t "Hello, world" -o output.ogg --ogg
$ aspeak -t "Hello, world" -o output.webm --webm

List available quality levels and formats

$ aspeak -Q

Output

Available qualities:
Qualities for wav:
-2: Riff8Khz16BitMonoPcm
-1: Riff16Khz16BitMonoPcm
 0: Riff24Khz16BitMonoPcm
 1: Riff24Khz16BitMonoPcm
Qualities for mp3:
-3: Audio16Khz32KBitRateMonoMp3
-2: Audio16Khz64KBitRateMonoMp3
-1: Audio16Khz128KBitRateMonoMp3
 0: Audio24Khz48KBitRateMonoMp3
 1: Audio24Khz96KBitRateMonoMp3
 2: Audio24Khz160KBitRateMonoMp3
 3: Audio48Khz96KBitRateMonoMp3
 4: Audio48Khz192KBitRateMonoMp3
Qualities for ogg:
-1: Ogg16Khz16BitMonoOpus
 0: Ogg24Khz16BitMonoOpus
 1: Ogg48Khz16BitMonoOpus
Qualities for webm:
-1: Webm16Khz16BitMonoOpus
 0: Webm24Khz16BitMonoOpus
 1: Webm24Khz16Bit24KbpsMonoOpus

Available formats:
- Riff8Khz16BitMonoPcm
- Riff16Khz16BitMonoPcm
- Audio16Khz128KBitRateMonoMp3
- Raw24Khz16BitMonoPcm
- Raw48Khz16BitMonoPcm
- Raw16Khz16BitMonoPcm
- Audio24Khz160KBitRateMonoMp3
- Ogg24Khz16BitMonoOpus
- Audio16Khz64KBitRateMonoMp3
- Raw8Khz8BitMonoALaw
- Audio24Khz16Bit48KbpsMonoOpus
- Ogg16Khz16BitMonoOpus
- Riff8Khz8BitMonoALaw
- Riff8Khz8BitMonoMULaw
- Audio48Khz192KBitRateMonoMp3
- Raw8Khz16BitMonoPcm
- Audio24Khz48KBitRateMonoMp3
- Raw24Khz16BitMonoTrueSilk
- Audio24Khz16Bit24KbpsMonoOpus
- Audio24Khz96KBitRateMonoMp3
- Webm24Khz16BitMonoOpus
- Ogg48Khz16BitMonoOpus
- Riff48Khz16BitMonoPcm
- Webm24Khz16Bit24KbpsMonoOpus
- Raw8Khz8BitMonoMULaw
- Audio16Khz16Bit32KbpsMonoOpus
- Audio16Khz32KBitRateMonoMp3
- Riff24Khz16BitMonoPcm
- Raw16Khz16BitMonoTrueSilk
- Audio48Khz96KBitRateMonoMp3
- Webm16Khz16BitMonoOpus
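
Because the listing above is line-oriented, it can be turned into a lookup table with a few lines of Python. The sketch below assumes the output keeps the layout shown here; `parse_qualities` is a hypothetical helper, and `SAMPLE` is a shortened copy of the listing used only for demonstration.

```python
import re

# Shortened copy of the `aspeak -Q` listing shown above.
SAMPLE = """\
Available qualities:
Qualities for wav:
-2: Riff8Khz16BitMonoPcm
-1: Riff16Khz16BitMonoPcm
 0: Riff24Khz16BitMonoPcm
 1: Riff24Khz16BitMonoPcm
Qualities for ogg:
-1: Ogg16Khz16BitMonoOpus
 0: Ogg24Khz16BitMonoOpus
 1: Ogg48Khz16BitMonoOpus

Available formats:
- Riff8Khz16BitMonoPcm
"""


def parse_qualities(listing):
    """Return {container: {quality_level: format_name}} from the listing text."""
    table = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Qualities for (\w+):", line)
        if header:
            current = table.setdefault(header.group(1), {})
            continue
        entry = re.fullmatch(r"(-?\d+): (\w+)", line)
        if entry and current is not None:
            current[int(entry.group(1))] = entry.group(2)
        elif line.startswith("Available formats"):
            current = None  # the flat format list follows; stop collecting
    return table


qualities = parse_qualities(SAMPLE)
```

With the table in hand, `qualities["ogg"][1]` gives the format name behind `--ogg -q=1`, and `max(qualities["wav"])` gives the highest level available for wav.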

Increase/decrease audio quality

# Less than default quality.
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=-1
# Best quality for mp3
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=3

Read text from file and speak it.

$ cat input.txt | aspeak

$ aspeak -f input.txt

With a custom encoding:

$ aspeak -f input.txt -e gbk
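
The help text notes that `-e` does not apply to stdin, so text in another encoding has to be converted before it is piped in. A minimal sketch of that conversion follows; `read_as_utf8` is a hypothetical helper, and it assumes aspeak reads stdin as UTF-8, which depends on your locale settings.

```python
import tempfile
from pathlib import Path


def read_as_utf8(path, encoding):
    """Decode a text file with the given encoding and return UTF-8 bytes."""
    text = Path(path).read_text(encoding=encoding)
    return text.encode("utf-8")


# Demonstration with a temporary GBK-encoded file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.txt"
    sample.write_text("你好，世界！", encoding="gbk")
    payload = read_as_utf8(sample, "gbk")
```

The bytes in `payload` could then be fed to the program with something like `subprocess.run(["aspeak", "-l", "zh-CN"], input=payload, check=True)`. When reading from a file directly, `-f input.txt -e gbk` is simpler.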

Read from stdin and speak it.

$ aspeak

Or, more explicitly:

$ aspeak -f -

Or, if you prefer a here-document:

$ aspeak -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF

Speak Chinese.

$ aspeak -t "你好，世界！" -l zh-CN

Use a custom voice.

$ aspeak -t "你好，世界！" -v zh-CN-YunjianNeural

Custom pitch, rate and style

$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad

Examples for Advanced Users

Use a custom audio format for output

Note: When outputting to the default speaker, using a non-wav format may produce white noise.

$ python -m aspeak -t "Hello World" -F Riff48Khz16BitMonoPcm -o high-quality.wav

About This Application

I found that Azure TTS can synthesize nearly authentic human voices, which is very interesting 😆.
I wrote this program to learn Azure Cognitive Services.
I also use it daily, because espeak and festival output terrible 😨 audio.
- But I respect 🙌 their maintainers' work; both are good open source software and can be used offline.
I hope you like it ❤️.

Alternative Applications

Download files

Source Distribution

aspeak-1.3.1.tar.gz (14.2 kB)

Uploaded May 8, 2022 Source

Built Distribution

aspeak-1.3.1-py3-none-any.whl (12.6 kB)

Uploaded May 8, 2022 Python 3

File details

Details for the file aspeak-1.3.1.tar.gz.

File metadata

Download URL: aspeak-1.3.1.tar.gz
Upload date: May 8, 2022
Size: 14.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.0

File hashes

Hashes for aspeak-1.3.1.tar.gz
Algorithm	Hash digest
SHA256	`cc24d32a351898c70b1a8671c2da5f7b07754dfab3297105de970bb4c58651f1`
MD5	`10ddb18cf96c8dbc913c8905af180fe6`
BLAKE2b-256	`d25e88371d78546b8b0e6e0682ee5751bce5d70faa5321524112f407e317ae44`

File details

Details for the file aspeak-1.3.1-py3-none-any.whl.

File metadata

Download URL: aspeak-1.3.1-py3-none-any.whl
Upload date: May 8, 2022
Size: 12.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.0

File hashes

Hashes for aspeak-1.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`550f0dbfa28139764f21a037ea58fa0bc6dec7dfde8b152e7ddc35fcda2f1548`
MD5	`d380b2f985ffb9aa39ffea0814df4b5f`
BLAKE2b-256	`c62512ededbe21647c8909fa1f787da56c2e3a254abdabf0ef9ad692a03ce9f7`
