A simple text-to-speech client based on Azure's speech synthesis API
:speaking_head: aspeak
A simple text-to-speech client for the Azure TTS API. :laughing:
You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
Note
Starting from version 4.0.0, aspeak is rewritten in Rust. The old Python version is available on the python branch.
Please note that the Rust rewrite is experimental and might have bugs!
Installation
Download from GitHub Releases
Download the latest release from here.
After downloading, extract the archive to get the binary executable.
Put it in a directory listed in your PATH environment variable so that you can run it from anywhere.
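To confirm that the extracted binary is reachable from PATH, a minimal Python sketch could look like the following. It assumes the executable is named `aspeak`; `find_executable` is a hypothetical helper written for this example, not part of aspeak.

```python
import shutil
from typing import Optional


def find_executable(name: str = "aspeak") -> Optional[str]:
    """Return the full path of `name` if it is found on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable (and honours PATHEXT on Windows, so "aspeak" can match "aspeak.exe").
    return shutil.which(name)


path = find_executable()
if path is None:
    print("aspeak was not found on PATH; check where you placed the binary.")
else:
    print(f"aspeak found at: {path}")
```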
Install from PyPI
Installing from PyPI also installs the Python binding of aspeak. Check Library Usage#Python for more information on using the Python binding.
pip install -U aspeak
Currently, prebuilt wheels are only available for the x86_64 architecture. Due to some technical issues, I haven't uploaded the source distribution to PyPI yet, so to build a wheel from source you need to follow the instructions in Install from Source.
Because of manylinux compatibility issues, wheels for Linux are not available on PyPI, but you can still build them from source.
Install from Source
CLI Only
The easiest way to install aspeak is to use cargo:
cargo install aspeak
Python Wheel
To build the Python wheel, you need to install maturin first:
pip install maturin
After cloning the repository and changing into its directory, you can build the wheel by running:
maturin build --release --strip -F python --bindings pyo3 --interpreter python --manifest-path Cargo.toml --out dist-pyo3
maturin build --release --strip --bindings bin --interpreter python --manifest-path Cargo.toml --out dist-bin
bash merge-wheel.bash
If everything goes well, you will get a wheel file in the dist directory.
Usage
Run aspeak help to see the help message.
Run aspeak help <subcommand> to see the help message of a subcommand.
Configuration
You can configure aspeak by creating a profile. Run the following command to create a profile:
$ aspeak config init
To edit the profile, run:
$ aspeak config edit
If you have trouble running the above command, you can edit the profile manually:
First, get the path of the profile by running:
$ aspeak config where
Then edit the file with your favorite text editor.
The profile is a TOML file. Check the comments in the config file for more information about available options. The default profile looks like this:
# Profile for aspeak
# GitHub: https://github.com/kxxt/aspeak
# Output verbosity
# 0 - Default
# 1 - Verbose
# The following output verbosity levels are only supported on debug build
# 2 - Debug
# >=3 - Trace
verbosity = 0
#
# Authentication configuration
#
[auth]
# Endpoint for TTS
# endpoint = "wss://eastus.api.speech.microsoft.com/cognitiveservices/websocket/v1"
# Alternatively, you can specify the region if you are using official endpoints
# region = "eastus"
# Azure Subscription Key
# key = "YOUR_KEY"
# Authentication Token
# token = "Your Authentication Token"
# Extra http headers (for experts)
# headers = [["X-My-Header", "My-Value"], ["X-My-Header2", "My-Value2"]]
#
# Configuration for text subcommand
#
[text]
# Voice to use. Note that it takes precedence over the locale
# voice = "en-US-JennyNeural"
# Locale to use
locale = "en-US"
# Rate
rate = 0
# Pitch
pitch = 0
# Role
role = "Boy"
# Style, "general" by default
style = "general"
# Style degree, a floating-point number between 0.1 and 2.0
# style_degree = 1.0
#
# Output Configuration
#
[output]
# Container Format. Only wav/mp3/ogg/webm are supported.
container = "wav"
# Audio Quality. Run `aspeak list-qualities` to see available qualities.
#
# If you choose a container format that does not support the quality level you specified here,
# we will automatically select the closest level for you.
quality = 0
# Audio Format(for experts). Run `aspeak list-formats` to see available formats.
# Note that it takes precedence over container and quality!
# format = "audio-16khz-128kbitrate-mono-mp4"
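Because the profile is plain TOML, it can be inspected with a few lines of Python. The sketch below parses a trimmed-down profile and applies the two precedence rules stated in the comments above (voice over locale, format over container and quality). `describe_profile` and its result keys are hypothetical names chosen for this example; this is not how aspeak itself loads its configuration. It assumes Python 3.11+ for `tomllib`, falling back to the third-party `tomli` package on older versions.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Python: assumes `pip install tomli`
    import tomli as tomllib

# A trimmed-down profile that only uses keys shown in the default profile.
PROFILE = """
verbosity = 0

[auth]
region = "eastus"

[text]
voice = "en-US-JennyNeural"
locale = "en-US"

[output]
container = "wav"
quality = 0
"""


def describe_profile(text: str) -> dict:
    """Parse a profile and resolve the documented precedence rules."""
    cfg = tomllib.loads(text)
    text_cfg = cfg.get("text", {})
    out_cfg = cfg.get("output", {})
    return {
        # `voice` takes precedence over `locale` when both are set.
        "voice_or_locale": text_cfg.get("voice") or text_cfg.get("locale"),
        # `format` takes precedence over `container` and `quality`.
        "audio": out_cfg.get("format")
        or (out_cfg.get("container"), out_cfg.get("quality")),
    }


print(describe_profile(PROFILE))
```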
Examples
Speak "Hello, world" through the default speaker.
$ aspeak text "Hello, world"
SSML to Speech
$ aspeak ssml << EOF
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-JennyNeural'>Hello, world!</voice></speak>
EOF
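When the SSML is generated from arbitrary text, characters such as `&` and `<` have to be escaped or the document will not be well-formed. A minimal sketch that builds a document of the same shape as the one above, using only the Python standard library; `build_ssml` is a hypothetical helper, not part of aspeak:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document with one <voice> element."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        # escape() replaces &, < and > so the text cannot break the markup.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry say: 1 < 2")
print(ssml)
```

Since the example above feeds `aspeak ssml` from standard input, the resulting string could be passed the same way, for instance with `subprocess.run(["aspeak", "ssml"], input=ssml, text=True, check=True)`, assuming aspeak is on your PATH.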
List all available voices.
$ aspeak list-voices
List all available voices for Chinese.
$ aspeak list-voices -l zh-CN
Get information about a voice.
$ aspeak list-voices -v en-US-SaraNeural
Output
Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display name: Sara
Local name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Voice type: Neural
Status: GA
Sample rate: 48000Hz
Words per minute: 157
Styles: ["angry", "cheerful", "excited", "friendly", "hopeful", "sad", "shouting", "terrified", "unfriendly", "whispering"]
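If you want to use this information in a script, the output can be parsed line by line. The sketch below assumes the layout is exactly as shown above (a title line followed by `Key: value` lines, with the styles printed as a bracketed list of quoted strings); that layout is not a documented, stable format, and `parse_voice_info` is a hypothetical helper.

```python
import json

SAMPLE = """\
Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display name: Sara
Local name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Voice type: Neural
Status: GA
Sample rate: 48000Hz
Words per minute: 157
Styles: ["angry", "cheerful", "excited", "friendly", "hopeful", "sad", "shouting", "terrified", "unfriendly", "whispering"]
"""


def parse_voice_info(block: str) -> dict:
    """Turn one voice description into a dictionary."""
    lines = [line.strip() for line in block.strip().splitlines()]
    info = {"title": lines[0]}
    for line in lines[1:]:
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not "Key: value"
            info[key] = value
    if "Styles" in info:
        # The style list as printed happens to be valid JSON.
        info["Styles"] = json.loads(info["Styles"])
    return info


voice = parse_voice_info(SAMPLE)
print(voice["ID"], voice["Styles"])
```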
Save synthesized speech to a file.
$ aspeak text "Hello, world" -o output.wav
If you prefer mp3/ogg/webm, you can use the -c mp3/-c ogg/-c webm option.
$ aspeak text "Hello, world" -o output.mp3 -c mp3
$ aspeak text "Hello, world" -o output.ogg -c ogg
$ aspeak text "Hello, world" -o output.webm -c webm
List available quality levels
$ aspeak list-qualities
Output
Qualities for MP3:
3: audio-48khz-192kbitrate-mono-mp3
2: audio-48khz-96kbitrate-mono-mp3
-3: audio-16khz-64kbitrate-mono-mp3
1: audio-24khz-160kbitrate-mono-mp3
-2: audio-16khz-128kbitrate-mono-mp3
-4: audio-16khz-32kbitrate-mono-mp3
-1: audio-24khz-48kbitrate-mono-mp3
0: audio-24khz-96kbitrate-mono-mp3
Qualities for WAV:
-2: riff-8khz-16bit-mono-pcm
1: riff-24khz-16bit-mono-pcm
0: riff-24khz-16bit-mono-pcm
-1: riff-16khz-16bit-mono-pcm
Qualities for OGG:
0: ogg-24khz-16bit-mono-opus
-1: ogg-16khz-16bit-mono-opus
1: ogg-48khz-16bit-mono-opus
Qualities for WEBM:
0: webm-24khz-16bit-mono-opus
-1: webm-16khz-16bit-mono-opus
1: webm-24khz-16bit-24kbps-mono-opus
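The profile comments state that when a container does not support the requested quality level, the closest level is selected. One plausible reading of "closest" is the smallest absolute distance between levels. The sketch below illustrates that reading with the OGG and MP3 tables above; the tie-breaking rule (prefer the lower level) is an assumption, and this is not aspeak's actual implementation.

```python
from typing import Dict

OGG_QUALITIES: Dict[int, str] = {
    0: "ogg-24khz-16bit-mono-opus",
    -1: "ogg-16khz-16bit-mono-opus",
    1: "ogg-48khz-16bit-mono-opus",
}

MP3_QUALITIES: Dict[int, str] = {
    3: "audio-48khz-192kbitrate-mono-mp3",
    2: "audio-48khz-96kbitrate-mono-mp3",
    1: "audio-24khz-160kbitrate-mono-mp3",
    0: "audio-24khz-96kbitrate-mono-mp3",
    -1: "audio-24khz-48kbitrate-mono-mp3",
    -2: "audio-16khz-128kbitrate-mono-mp3",
    -3: "audio-16khz-64kbitrate-mono-mp3",
    -4: "audio-16khz-32kbitrate-mono-mp3",
}


def closest_quality(requested: int, available: Dict[int, str]) -> int:
    """Return the available level nearest to `requested`.

    Ties are broken in favour of the lower level (an assumption).
    """
    return min(available, key=lambda level: (abs(level - requested), level))


# Level 3 exists for MP3 but not for OGG, so OGG falls back to its highest level.
print(OGG_QUALITIES[closest_quality(3, OGG_QUALITIES)])
print(MP3_QUALITIES[closest_quality(3, MP3_QUALITIES)])
```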
List available audio formats (For expert users)
$ aspeak list-formats
Output
amr-wb-16000hz
audio-16khz-128kbitrate-mono-mp3
audio-16khz-16bit-32kbps-mono-opus
audio-16khz-32kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-24khz-160kbitrate-mono-mp3
audio-24khz-16bit-24kbps-mono-opus
audio-24khz-16bit-48kbps-mono-opus
audio-24khz-48kbitrate-mono-mp3
audio-24khz-96kbitrate-mono-mp3
audio-48khz-192kbitrate-mono-mp3
audio-48khz-96kbitrate-mono-mp3
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
raw-16khz-16bit-mono-pcm
raw-16khz-16bit-mono-truesilk
raw-22050hz-16bit-mono-pcm
raw-24khz-16bit-mono-pcm
raw-24khz-16bit-mono-truesilk
raw-44100hz-16bit-mono-pcm
raw-48khz-16bit-mono-pcm
raw-8khz-16bit-mono-pcm
raw-8khz-8bit-mono-alaw
raw-8khz-8bit-mono-mulaw
riff-16khz-16bit-mono-pcm
riff-22050hz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
riff-44100hz-16bit-mono-pcm
riff-48khz-16bit-mono-pcm
riff-8khz-16bit-mono-pcm
riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
webm-16khz-16bit-mono-opus
webm-24khz-16bit-24kbps-mono-opus
webm-24khz-16bit-mono-opus
Increase/Decrease audio qualities
# Less than default quality.
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=-1
# Best quality for mp3
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=3
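To render the same text into several containers from a script, the command lines can be assembled programmatically. This sketch only uses the `-o`, `-c` and `-q` options shown above; `text_command` is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def text_command(text: str, output: str, container: str, quality: int = 0) -> List[str]:
    """Build an argv list for `aspeak text` with output file, container and quality."""
    # -q=<n> keeps negative levels from being mistaken for another option.
    return ["aspeak", "text", text, "-o", output, "-c", container, f"-q={quality}"]


# One command per container, each one step below the default quality.
commands = [
    text_command("Hello, world", f"output.{ext}", ext, quality=-1)
    for ext in ("mp3", "ogg", "webm")
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each list can then be handed to `subprocess.run(cmd, check=True)`, assuming aspeak is installed and configured.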
Read text from file and speak it.
$ cat input.txt | aspeak text
or
$ aspeak text -f input.txt
with custom encoding:
$ aspeak text -f input.txt -e gbk
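The encoding matters because the bytes of a GBK-encoded file are generally not valid UTF-8. The following sketch demonstrates this with the Python standard library only; `decode_input` is a hypothetical helper and says nothing about how aspeak reads files internally.

```python
def decode_input(data: bytes, encoding: str) -> str:
    """Decode raw file bytes with an explicit encoding, failing loudly on mismatch."""
    return data.decode(encoding)


original = "你好,世界!"
gbk_bytes = original.encode("gbk")  # what a GBK-encoded input.txt would contain

# Decoding with the matching encoding round-trips the text.
print(decode_input(gbk_bytes, "gbk") == original)

# Decoding the same bytes as UTF-8 raises an error instead of returning text.
try:
    decode_input(gbk_bytes, "utf-8")
except UnicodeDecodeError as err:
    print(f"Decoding as UTF-8 fails: {err.reason}")
```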
Read from stdin and speak it.
$ aspeak text
maybe you prefer:
$ aspeak text -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF
Speak Chinese.
$ aspeak text "你好,世界!" -l zh-CN
Use a custom voice.
$ aspeak text "你好,世界!" -v zh-CN-YunjianNeural
Custom pitch, rate and style
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=-10% -r=+5% -S cheerful
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=+40Hz -r=1.2f -S fearful
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=high -r=x-slow -S calm
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=+1st -r=-7% -S lyrical
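The examples above pass pitch and rate in several different notations. As a reading aid, the sketch below sorts those notations into syntactic groups with regular expressions. The group names (`percent`, `hertz`, `semitone`, `factor`, `number`, `keyword`) are labels invented for this example, and the code is not aspeak's argument parser; it says nothing about which values aspeak accepts or how it interprets them.

```python
import re

_NUMBER = r"[+-]?\d+(?:\.\d+)?"

# Checked in order; the first matching pattern wins.
_FORMS = [
    ("percent", re.compile(rf"^{_NUMBER}%$")),
    ("hertz", re.compile(rf"^{_NUMBER}Hz$")),
    ("semitone", re.compile(rf"^{_NUMBER}st$")),
    ("factor", re.compile(rf"^{_NUMBER}f$")),
    ("number", re.compile(rf"^{_NUMBER}$")),
    ("keyword", re.compile(r"^[A-Za-z][A-Za-z-]*$")),
]


def classify(value: str) -> str:
    """Return the syntactic group of a pitch or rate value, or "unknown"."""
    for name, pattern in _FORMS:
        if pattern.match(value):
            return name
    return "unknown"


for sample in ["1.5", "-10%", "+40Hz", "1.2f", "high", "x-slow", "+1st"]:
    print(f"{sample!r}: {classify(sample)}")
```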
Advanced Usage
Use a custom audio format for output
Note: Some audio formats are not supported when you are outputting to speaker.
$ aspeak text "Hello World" -F riff-48khz-16bit-mono-pcm -o high-quality.wav
Library Usage
Python
The new version of aspeak is written in Rust, and the Python binding is provided by PyO3.
Here is a simple example:
from aspeak import SpeechService, AudioFormat
service = SpeechService()
service.connect()
service.speak_text("Hello, world")
Rust
Add aspeak to your Cargo.toml:
$ cargo add aspeak
Then follow the documentation of the aspeak crate.