A simple text-to-speech client based on Azure's speech synthesis API


Project description

:speaking_head: aspeak

A simple text-to-speech client for the Azure TTS API. :laughing:

You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech

Note

Starting with version 4.0.0, aspeak has been rewritten in Rust. The old Python version is available on the python branch.

Please note that the Rust rewrite is experimental and might have bugs!

Installation

Download from GitHub Releases

Download the latest release from the project's GitHub Releases page.

After downloading, extract the archive and you will get a binary executable file.

You can put it in a directory that is in your PATH environment variable so that you can run it from anywhere.
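
To confirm that the extracted binary is reachable through PATH, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the function name `find_aspeak` is made up for illustration and is not part of aspeak.

```python
import shutil


def find_aspeak(name: str = "aspeak"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_aspeak()
    if path is None:
        print("aspeak was not found on PATH")
    else:
        print(f"aspeak found at {path}")
```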

Install from PyPI

Installing from PyPI also installs the Python binding of aspeak. See the Python section under Library Usage for more information on using the binding.

pip install -U aspeak

Currently, prebuilt wheels are only available for the x86_64 architecture. Due to some technical issues, I haven't uploaded the source distribution to PyPI yet, so to build a wheel from source you need to follow the instructions in Install from Source.

Because of manylinux compatibility issues, wheels for Linux are not available on PyPI. (But you can still build them from source.)

Install from Source

CLI Only

The easiest way to install aspeak is to use cargo:

cargo install aspeak

Python Wheel

To build the Python wheel, you need to install maturin first:

pip install maturin

After cloning the repository and changing into its directory, you can build the wheel by running:

maturin build --release --strip -F python --bindings pyo3 --interpreter python --manifest-path Cargo.toml --out dist-pyo3
maturin build --release --strip --bindings bin --interpreter python --manifest-path Cargo.toml --out dist-bin
bash merge-wheel.bash

If everything goes well, you will get a wheel file in the dist directory.

Usage

Run aspeak help to see the help message.

Run aspeak help <subcommand> to see the help message of a subcommand.

Configuration

You can configure aspeak by creating a profile. Run the following command to create a profile:

$ aspeak config init

To edit the profile, run:

$ aspeak config edit

If you have trouble running the above command, you can edit the profile manually:

First, get the path of the profile by running:

$ aspeak config where

Then edit the file with your favorite text editor.

The profile is a TOML file. Check the comments in the config file for more information about the available options. The default profile looks like this:

# Profile for aspeak
# GitHub: https://github.com/kxxt/aspeak

# Output verbosity
# 0   - Default
# 1   - Verbose
# The following output verbosity levels are only supported on debug build
# 2   - Debug
# >=3 - Trace
verbosity = 0

#
# Authentication configuration
#

[auth]
# Endpoint for TTS
# endpoint = "wss://eastus.api.speech.microsoft.com/cognitiveservices/websocket/v1"

# Alternatively, you can specify the region if you are using official endpoints
# region = "eastus"

# Azure Subscription Key
# key = "YOUR_KEY"

# Authentication Token
# token = "Your Authentication Token"

# Extra http headers (for experts)
# headers = [["X-My-Header", "My-Value"], ["X-My-Header2", "My-Value2"]]

#
# Configuration for text subcommand
#

[text]
# Voice to use. Note that it takes precedence over the locale
# voice = "en-US-JennyNeural"
# Locale to use
locale = "en-US"
# Rate
rate = 0
# Pitch
pitch = 0
# Role
role = "Boy"
# Style, "general" by default
style = "general"
# Style degree, a floating-point number between 0.1 and 2.0
# style_degree = 1.0

#
# Output Configuration
#

[output]
# Container Format, Only wav/mp3/ogg/webm is supported.
container = "wav"
# Audio Quality. Run `aspeak list-qualities` to see available qualities.
#
# If you choose a container format that does not support the quality level you specified here, 
# we will automatically select the closest level for you.
quality = 0
# Audio Format(for experts). Run `aspeak list-formats` to see available formats.
# Note that it takes precedence over container and quality!
# format = "audio-16khz-128kbitrate-mono-mp3"
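
The comments in the profile describe two precedence rules: a voice overrides the locale, and an explicit format overrides the container and quality pair. The sketch below models those two rules on a plain dictionary standing in for the parsed profile (which could be obtained, for example, with `tomllib` on Python 3.11 or later). The function names are hypothetical, and this is an illustration of the documented rules, not aspeak's actual implementation.

```python
def effective_voice_selector(text_cfg: dict) -> tuple:
    """Return ("voice", name) if a voice is set, else ("locale", name)."""
    if text_cfg.get("voice"):
        return ("voice", text_cfg["voice"])
    return ("locale", text_cfg.get("locale", "en-US"))


def effective_output(output_cfg: dict) -> dict:
    """An explicit format wins over the container/quality pair."""
    if output_cfg.get("format"):
        return {"format": output_cfg["format"]}
    return {
        "container": output_cfg.get("container", "wav"),
        "quality": output_cfg.get("quality", 0),
    }


# A dictionary shaped like the uncommented parts of the default profile.
profile = {
    "verbosity": 0,
    "text": {"locale": "en-US", "rate": 0, "pitch": 0, "style": "general"},
    "output": {"container": "wav", "quality": 0},
}

print(effective_voice_selector(profile["text"]))
print(effective_output(profile["output"]))
```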

Examples

Speak "Hello, world!" to default speaker.

$ aspeak text "Hello, world"

SSML to Speech

$ aspeak ssml << EOF 
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-JennyNeural'>Hello, world!</voice></speak>
EOF
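
When the text to speak comes from elsewhere, it may be convenient to generate the SSML rather than type it by hand. The following is a minimal sketch that reproduces the document shape used above and escapes XML metacharacters; the helper name `build_ssml` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document, escaping XML metacharacters."""
    return (
        "<speak version='1.0' "
        "xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Fish & chips")
# Parsing the result confirms that the document is well-formed XML.
root = ET.fromstring(ssml)
print(ssml)
```

The resulting string can be fed to aspeak ssml on standard input, in the same way as the here-document above.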

List all available voices.

$ aspeak list-voices

List all available voices for Chinese.

$ aspeak list-voices -l zh-CN

Get information about a voice.

$ aspeak list-voices -v en-US-SaraNeural

Output

Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display name: Sara
Local name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Voice type: Neural
Status: GA
Sample rate: 48000Hz
Words per minute: 157
Styles: ["angry", "cheerful", "excited", "friendly", "hopeful", "sad", "shouting", "terrified", "unfriendly", "whispering"]
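
If this output needs to be consumed by a script, the key and value lines can be split apart fairly easily. The sketch below assumes the output keeps exactly the layout shown above, which may differ between aspeak versions; `parse_voice_info` is a hypothetical helper, not something aspeak provides.

```python
import json

SAMPLE = """\
Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display name: Sara
Local name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Voice type: Neural
Status: GA
Sample rate: 48000Hz
Words per minute: 157
Styles: ["angry", "cheerful", "excited", "friendly", "hopeful", "sad", "shouting", "terrified", "unfriendly", "whispering"]
"""


def parse_voice_info(block: str) -> dict:
    """Turn 'Key: value' lines into a dict; the first line is kept as 'name'."""
    lines = [line for line in block.splitlines() if line.strip()]
    info = {"name": lines[0]}
    for line in lines[1:]:
        key, _, value = line.partition(": ")
        info[key] = value
    # The style list in the sample happens to be valid JSON.
    if "Styles" in info:
        info["Styles"] = json.loads(info["Styles"])
    return info


voice = parse_voice_info(SAMPLE)
print(voice["ID"], voice["Styles"][:3])
```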

Save synthesized speech to a file.

$ aspeak text "Hello, world" -o output.wav

If you prefer mp3, ogg, or webm, use the -c mp3, -c ogg, or -c webm option respectively.

$ aspeak text "Hello, world" -o output.mp3 -c mp3
$ aspeak text "Hello, world" -o output.ogg -c ogg
$ aspeak text "Hello, world" -o output.webm -c webm
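
In a script, the -c value can be derived from the output file name so the two never disagree. The following sketch only builds the argument list and does not run aspeak; `build_text_command` is a hypothetical helper, and omitting -c for .wav files relies on wav being the configured container, as it is in the default profile.

```python
from pathlib import Path

# Containers named in this document; anything else is rejected.
SUPPORTED_CONTAINERS = {"wav", "mp3", "ogg", "webm"}


def build_text_command(text: str, output: str) -> list:
    """Build an argv list for `aspeak text`, choosing -c from the file suffix."""
    container = Path(output).suffix.lstrip(".").lower()
    if container not in SUPPORTED_CONTAINERS:
        raise ValueError(f"unsupported container: {container!r}")
    argv = ["aspeak", "text", text, "-o", output]
    if container != "wav":
        argv += ["-c", container]
    return argv


print(build_text_command("Hello, world", "output.ogg"))
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a machine where aspeak is installed.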

List available quality levels

$ aspeak list-qualities

Output

Qualities for MP3:
  3: audio-48khz-192kbitrate-mono-mp3
  2: audio-48khz-96kbitrate-mono-mp3
 -3: audio-16khz-64kbitrate-mono-mp3
  1: audio-24khz-160kbitrate-mono-mp3
 -2: audio-16khz-128kbitrate-mono-mp3
 -4: audio-16khz-32kbitrate-mono-mp3
 -1: audio-24khz-48kbitrate-mono-mp3
  0: audio-24khz-96kbitrate-mono-mp3

Qualities for WAV:
 -2: riff-8khz-16bit-mono-pcm
  1: riff-24khz-16bit-mono-pcm
  0: riff-24khz-16bit-mono-pcm
 -1: riff-16khz-16bit-mono-pcm

Qualities for OGG:
  0: ogg-24khz-16bit-mono-opus
 -1: ogg-16khz-16bit-mono-opus
  1: ogg-48khz-16bit-mono-opus

Qualities for WEBM:
  0: webm-24khz-16bit-mono-opus
 -1: webm-16khz-16bit-mono-opus
  1: webm-24khz-16bit-24kbps-mono-opus
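
The profile comments note that when a container does not support the requested quality level, the closest level is selected automatically. One plausible reading of "closest" is the numerically nearest level, which for these contiguous ranges amounts to clamping. The sketch below illustrates that reading only; the real selection logic inside aspeak may differ.

```python
# Quality levels per container, copied from the `aspeak list-qualities` output above.
QUALITY_LEVELS = {
    "mp3": range(-4, 4),   # -4 .. 3
    "wav": range(-2, 2),   # -2 .. 1
    "ogg": range(-1, 2),   # -1 .. 1
    "webm": range(-1, 2),  # -1 .. 1
}


def closest_quality(container: str, requested: int) -> int:
    """Clamp the requested level into the range the container supports."""
    levels = QUALITY_LEVELS[container]
    return max(levels[0], min(requested, levels[-1]))


print(closest_quality("wav", 3))
print(closest_quality("mp3", 3))
```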

List available audio formats (For expert users)

$ aspeak list-formats

Output

amr-wb-16000hz
audio-16khz-128kbitrate-mono-mp3
audio-16khz-16bit-32kbps-mono-opus
audio-16khz-32kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-24khz-160kbitrate-mono-mp3
audio-24khz-16bit-24kbps-mono-opus
audio-24khz-16bit-48kbps-mono-opus
audio-24khz-48kbitrate-mono-mp3
audio-24khz-96kbitrate-mono-mp3
audio-48khz-192kbitrate-mono-mp3
audio-48khz-96kbitrate-mono-mp3
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
raw-16khz-16bit-mono-pcm
raw-16khz-16bit-mono-truesilk
raw-22050hz-16bit-mono-pcm
raw-24khz-16bit-mono-pcm
raw-24khz-16bit-mono-truesilk
raw-44100hz-16bit-mono-pcm
raw-48khz-16bit-mono-pcm
raw-8khz-16bit-mono-pcm
raw-8khz-8bit-mono-alaw
raw-8khz-8bit-mono-mulaw
riff-16khz-16bit-mono-pcm
riff-22050hz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
riff-44100hz-16bit-mono-pcm
riff-48khz-16bit-mono-pcm
riff-8khz-16bit-mono-pcm
riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
webm-16khz-16bit-mono-opus
webm-24khz-16bit-24kbps-mono-opus
webm-24khz-16bit-mono-opus
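
The format names encode their parameters as dash-separated tokens, so a script can pull out, for example, the sample rate. This is a small sketch based purely on the naming pattern visible in the list above; `sample_rate_hz` is a hypothetical helper.

```python
import re

# Matches sample-rate tokens such as "48khz", "22050hz" or "16000hz".
_RATE = re.compile(r"^(\d+)(k?hz)$")


def sample_rate_hz(format_name: str):
    """Return the sample rate encoded in a format name, or None if absent."""
    for token in format_name.split("-"):
        match = _RATE.match(token)
        if match:
            value, unit = int(match.group(1)), match.group(2)
            return value * 1000 if unit == "khz" else value
    return None


for name in ("riff-48khz-16bit-mono-pcm", "raw-22050hz-16bit-mono-pcm", "amr-wb-16000hz"):
    print(name, sample_rate_hz(name))
```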

Increase/decrease audio quality

# Less than default quality.
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=-1
# Best quality for mp3
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=3

Read text from file and speak it.

$ cat input.txt | aspeak text

$ aspeak text -f input.txt

With a custom encoding:

$ aspeak text -f input.txt -e gbk
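
The encoding matters because the same bytes decode differently, or not at all, under another codec. The short standard-library demonstration below is independent of aspeak: it writes GBK-encoded text to a temporary file, then reads it back with both gbk and utf-8.

```python
import os
import tempfile

text = "你好，世界！"

# Write the sample text as GBK bytes, as a legacy editor might.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(text.encode("gbk"))
    path = handle.name

try:
    # Decoding with the right codec recovers the original text.
    with open(path, encoding="gbk") as source:
        decoded = source.read()
    # Decoding the same bytes as UTF-8 fails instead of producing that text.
    try:
        with open(path, encoding="utf-8") as source:
            source.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
finally:
    os.remove(path)

print(decoded == text, utf8_ok)
```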

Read from stdin and speak it.

$ aspeak text

Or, if you prefer, use a here-document:

$ aspeak text -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF

Speak Chinese.

$ aspeak text "你好，世界！" -l zh-CN

Use a custom voice.

$ aspeak text "你好，世界！" -v zh-CN-YunjianNeural

Custom pitch, rate and style

$ aspeak text "你好，世界！" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad
$ aspeak text "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=-10% -r=+5% -S cheerful
$ aspeak text "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=+40Hz -r=1.2f -S fearful
$ aspeak text "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=high -r=x-slow -S calm
$ aspeak text "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=+1st -r=-7% -S lyrical
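
The examples above pass pitch and rate in several different shapes: a bare number, a percentage, a value in Hz, a number with an f suffix, a value in semitones (st), and a keyword. The sketch below only names the syntactic shape of such an argument. The labels are invented for this example, and it makes no claim about which shapes aspeak accepts for which option or how it interprets them.

```python
import re

# Patterns for the value shapes used in the examples above.
_FORMS = [
    ("percent", re.compile(r"^[+-]?\d+(\.\d+)?%$")),
    ("hertz", re.compile(r"^[+-]?\d+(\.\d+)?Hz$")),
    ("semitone", re.compile(r"^[+-]?\d+(\.\d+)?st$")),
    ("factor", re.compile(r"^\d+(\.\d+)?f$")),
    ("number", re.compile(r"^[+-]?\d+(\.\d+)?$")),
]


def classify(value: str) -> str:
    """Name the syntactic form of a pitch or rate argument."""
    for name, pattern in _FORMS:
        if pattern.match(value):
            return name
    # Anything else, such as "high" or "x-slow", is treated as a keyword.
    return "keyword"


for sample in ("1.5", "-10%", "+40Hz", "1.2f", "high", "+1st", "x-slow"):
    print(sample, classify(sample))
```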

Advanced Usage

Use a custom audio format for output

Note: Some audio formats are not supported when outputting to the speaker.

$ aspeak text "Hello World" -F riff-48khz-16bit-mono-pcm -o high-quality.wav

Library Usage

Python

The new version of aspeak is written in Rust, and the Python binding is provided by PyO3.

Here is a simple example:

from aspeak import SpeechService, AudioFormat

service = SpeechService()
service.connect()
service.speak_text("Hello, world")

Rust

Add aspeak to your Cargo.toml:

$ cargo add aspeak

Then follow the documentation of the aspeak crate.


Release history

Version	Release date
6.0.1	Oct 3, 2023
6.0.0	Jun 29, 2023
6.0.0rc1 (pre-release)	Jun 28, 2023
6.0.0b3 (pre-release)	Jun 27, 2023
6.0.0b2 (pre-release)	Jun 26, 2023
6.0.0b1 (pre-release)	Jun 23, 2023
6.0.0a3 (pre-release)	Jun 20, 2023
6.0.0a2 (pre-release)	Jun 12, 2023
5.2.0	May 5, 2023
5.1.0	Apr 20, 2023
5.0.1a2 (pre-release)	Apr 20, 2023
5.0.0	Apr 18, 2023
4.3.1	Apr 5, 2023
4.3.0	Apr 4, 2023
4.3.0b2 (pre-release)	Mar 31, 2023
4.3.0b1 (pre-release)	Mar 30, 2023
4.2.0	Mar 25, 2023
4.1.0	Mar 9, 2023
4.0.0	Mar 4, 2023
4.0.0rc1 (pre-release)	Mar 3, 2023
4.0.0b4 (pre-release, this version)	Mar 3, 2023
4.0.0b3 (pre-release)	Mar 3, 2023
4.0.0b2 (pre-release)	Mar 2, 2023
3.2.0	Feb 3, 2023
3.1.0	Nov 8, 2022
3.0.2	Sep 5, 2022
3.0.1	Sep 5, 2022
3.0.0	Sep 4, 2022
3.0.0b2 (pre-release)	Sep 2, 2022
3.0.0b1 (pre-release)	Sep 2, 2022
3.0.0.dev2 (pre-release)	Sep 2, 2022
3.0.0.dev1 (pre-release)	Sep 1, 2022
2.1.0	Jul 1, 2022
2.0.1	Jun 26, 2022
2.0.0	May 16, 2022
2.0.0rc2 (pre-release)	May 16, 2022
2.0.0rc1 (pre-release)	May 16, 2022
2.0.0b2 (pre-release)	May 15, 2022
2.0.0b1 (pre-release)	May 15, 2022
2.0.0.dev3 (pre-release)	May 15, 2022
2.0.0.dev2 (pre-release)	May 14, 2022
2.0.0.dev1 (pre-release)	May 14, 2022
2.0.0.dev0 (pre-release)	May 14, 2022
1.4.2	May 12, 2022
1.4.1	May 11, 2022
1.4.0	May 11, 2022
1.3.1	May 8, 2022
1.3.0	May 7, 2022
1.2.0	May 5, 2022
1.1.4	May 5, 2022
1.1.3	May 5, 2022
1.1.2	May 5, 2022
1.1.1	May 3, 2022
1.1.0	May 3, 2022
1.0.0	May 2, 2022
0.3.2	May 2, 2022
0.3.1	May 2, 2022
0.3.0	May 2, 2022
0.2.1	May 1, 2022
0.2.0	May 1, 2022
0.1.1	May 1, 2022
0.1	May 1, 2022

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

aspeak-4.0.0b4-cp311-none-win_amd64.whl (3.1 MB)

Uploaded Mar 3, 2023 CPython 3.11 Windows x86-64

aspeak-4.0.0b4-cp311-cp311-macosx_10_7_x86_64.whl (3.0 MB)

Uploaded Mar 3, 2023 CPython 3.11 macOS 10.7+ x86-64

aspeak-4.0.0b4-cp310-none-win_amd64.whl (3.1 MB)

Uploaded Mar 3, 2023 CPython 3.10 Windows x86-64

aspeak-4.0.0b4-cp310-cp310-macosx_10_7_x86_64.whl (3.0 MB)

Uploaded Mar 3, 2023 CPython 3.10 macOS 10.7+ x86-64

aspeak-4.0.0b4-cp39-none-win_amd64.whl (3.1 MB)

Uploaded Mar 3, 2023 CPython 3.9 Windows x86-64

aspeak-4.0.0b4-cp39-cp39-macosx_10_7_x86_64.whl (3.0 MB)

Uploaded Mar 3, 2023 CPython 3.9 macOS 10.7+ x86-64

aspeak-4.0.0b4-cp38-none-win_amd64.whl (3.1 MB)

Uploaded Mar 3, 2023 CPython 3.8 Windows x86-64

aspeak-4.0.0b4-cp38-cp38-macosx_10_7_x86_64.whl (3.0 MB)

Uploaded Mar 3, 2023 CPython 3.8 macOS 10.7+ x86-64
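
The wheel file names follow the standard wheel naming scheme, in which the distribution name, version, Python tag, ABI tag, and platform tag are separated by dashes. The sketch below splits a name into those parts to help pick the file matching a given interpreter and platform. It assumes the five-part form without the optional build tag, which is what all the files listed here use; `parse_wheel_filename` is a hypothetical helper.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated tags.

    Assumes the five-part form without the optional build tag.
    """
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("aspeak-4.0.0b4-cp311-none-win_amd64.whl"))
```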

Hashes for aspeak-4.0.0b4-cp311-none-win_amd64.whl

Algorithm	Hash digest
SHA256	`d970d41a7f28827e45df041708a08fdcceceb42ebeae1817868a4ea894590200`
MD5	`ac993a587a8bc1280971781edf37daf0`
BLAKE2b-256	`b2bedbb6f4a8d9119efec9e2fa5e56de9742bf322245c872734476c2be23f2a1`

Hashes for aspeak-4.0.0b4-cp311-cp311-macosx_10_7_x86_64.whl

Algorithm	Hash digest
SHA256	`accb782f64febd33a345a9de1134818f42170e1215be58fbaba7229e6242f32f`
MD5	`587ad1ad767a56281fb4073b6e7dc6bb`
BLAKE2b-256	`9ca31cffab703012dcde85ef761225fb4328034173cc1ed6ecff5ec6b8cf6dee`

Hashes for aspeak-4.0.0b4-cp310-none-win_amd64.whl

Algorithm	Hash digest
SHA256	`6c0efdc874070566140831b214833bdf7e9f9645d3debdc8081e43fdfd9ba547`
MD5	`280d4849f8d631faca186013a39ac470`
BLAKE2b-256	`523edd7e1951e2448e5f91fadcfde0470b0cdcdc66f9d4265f65fa624daa1521`

Hashes for aspeak-4.0.0b4-cp310-cp310-macosx_10_7_x86_64.whl

Algorithm	Hash digest
SHA256	`43d66f4a88e73414c416b898c930a5d78dc81c58228195101f4ddabfab8b0e18`
MD5	`72f08a0d609d8bc781e220d4157f2952`
BLAKE2b-256	`5a0bd133822a2e0e2fed5026a6afb7efb6167fe075b5ebc74c4a23663d0a0525`

Hashes for aspeak-4.0.0b4-cp39-none-win_amd64.whl

Algorithm	Hash digest
SHA256	`1a68f91593df7c73e8232f6177af48dcd3a587380e3845eeb73607bcdcbd996a`
MD5	`36fb6364daa4416ab7782f572f9a7bb5`
BLAKE2b-256	`320c6817a12fe2709c2df7cc1c87f922e90e342a9c6763ee4414c64d27db085c`

Hashes for aspeak-4.0.0b4-cp39-cp39-macosx_10_7_x86_64.whl

Algorithm	Hash digest
SHA256	`c6a0338015d0cf0f360f3cfba8be0f7e8ad2bbdcff4b7ecd0486419b5e01e7ae`
MD5	`712e4acc9e2dc4ac3580bb603fa66b48`
BLAKE2b-256	`81e765b8fd0751081e59274ed67f8655e4dfca5428a891eab55544ffec43bba9`

Hashes for aspeak-4.0.0b4-cp38-none-win_amd64.whl

Algorithm	Hash digest
SHA256	`1e80a30bf87648787866bff2153da160d68cd480decfe6f4a386accf781d37a2`
MD5	`18f744a100044c43c176678416bba0b5`
BLAKE2b-256	`cbdf6cdaf957dee299825c212570497e6788b18da45ace1f65b1a8328422edfe`

Hashes for aspeak-4.0.0b4-cp38-cp38-macosx_10_7_x86_64.whl

Algorithm	Hash digest
SHA256	`e2d86609fd7ce455cf8d3a352216484c448d62921db7f8f8e449340286b2b6c1`
MD5	`e76da268321a5cba23ced7b8c873fb96`
BLAKE2b-256	`f5310b5318f08b51eade98386503818c8694f9957a75f3e74e554f7019e61e1e`
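
A downloaded wheel can be checked against the digests listed above before installing it. The following is a minimal sketch using `hashlib` from the standard library; the helper names `file_digest` and `matches` are assumptions for this example, and the self-check at the end runs on a throwaway file because no wheel is downloaded here.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Self-check on a temporary file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"abc")
    sample_path = handle.name
try:
    sample_digest = file_digest(sample_path)
    sample_ok = matches(sample_path, hashlib.sha256(b"abc").hexdigest())
finally:
    os.remove(sample_path)

print(sample_ok)
```

For a real download, call `matches` with the path of the wheel and the SHA256 value from the corresponding table above.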

aspeak 4.0.0b4
