PySpeechService Developer Documentation
This documentation covers using the PySpeechService application for text-to-speech (TTS) and speech recognition from another application.
Step 1: Generate gRPC Files
Use the PySpeechService gRPC proto file to generate the files needed to utilize the service.
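For a Python client, one way to do this is with the `grpcio-tools` package, which exposes the protocol buffer compiler as the `grpc_tools.protoc` module. The sketch below only builds the command line. The proto file name `speech_service.proto` and the `generated` output directory are placeholders, not names confirmed by PySpeechService.

```python
import sys
from pathlib import Path


def build_protoc_command(proto_file: str, out_dir: str) -> list[str]:
    """Build the command line that generates Python gRPC code from a proto file."""
    proto_path = Path(proto_file)
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_path.parent}",        # directory searched for the proto file
        f"--python_out={out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # client stub classes (*_pb2_grpc.py)
        str(proto_path),
    ]


# Placeholder paths: substitute the real PySpeechService proto file.
command = build_protoc_command("protos/speech_service.proto", "generated")
print(" ".join(command))
# To run it (requires `pip install grpcio-tools` and an existing output directory):
#   import subprocess; subprocess.run(command, check=True)
```

Other languages have their own gRPC code generators, so treat this only as an illustration of the step.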
Step 2: Launch the PySpeechService Application
Launch the PySpeechService application, keeping in mind that the application can be set up in multiple ways.
- Executable in path - Simply execute PySpeechService
- Python module - python -m py-speech-service or python3 -m py-speech-service or py -m py-speech-service
- Local app folder - ~/.local/share/py_speech_service/py-speech-service or %localappdata%/py_speech_service/py-speech-service.exe
Once it's launched, read the application output. The first line is a JSON message that gives information about PySpeechService as it waits for a connection.
{
"version": "0.1.0",
"port": 12345
}
The version is the current version of the PySpeechService, which can be used to verify compatibility. The port is the random port used by the PySpeechService application for gRPC.
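A minimal sketch of reading that startup message from Python follows. It assumes the JSON arrives on a single line of standard output, as described above. The helper names are illustrative, and the command passed to `launch_and_read_startup` depends on how PySpeechService is installed on your system.

```python
import json
import subprocess


def parse_startup_line(line: str) -> tuple[str, int]:
    """Extract the version string and gRPC port from the startup JSON line."""
    info = json.loads(line)
    return str(info["version"]), int(info["port"])


def launch_and_read_startup(command: list[str]) -> tuple[subprocess.Popen, str, int]:
    """Start the service and block until its first output line has been parsed."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    first_line = process.stdout.readline()
    version, port = parse_startup_line(first_line)
    return process, version, port


# Parsing the sample message from above, collapsed onto one line.
version, port = parse_startup_line('{"version": "0.1.0", "port": 12345}')
print(version, port)
```

Keeping the parsing separate from the process launch makes it easy to check the version for compatibility before opening the gRPC channel.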
Step 3: Send Requests
Connect to the PySpeechService gRPC Channel
Using gRPC generated code and the standards for gRPC usage for the language of your application, connect to the PySpeechService channel and client, then call StartSpeechService. StartSpeechService is a two-way stream of SpeechServiceRequests and SpeechServiceResponses.
Use the stream to send SpeechServiceRequests to PySpeechService to initialize and use TTS and speech recognition. You'll then listen to the stream's SpeechServiceResponses to receive updates on when initialization is complete, when TTS starts and stops, and when speech has been recognized.
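In Python, a bidirectional streaming call on a generated stub takes an iterator of requests and returns an iterator of responses. A common pattern is to feed the request iterator from a queue, so that other parts of the application can send requests at any time. The sketch below shows only that pattern, using plain dictionaries in place of generated SpeechServiceRequest messages. The module name, stub name, and insecure localhost channel in the comments are assumptions.

```python
import queue
from typing import Any, Iterator

_STOP = object()  # sentinel that ends the outgoing stream


def request_stream(outgoing: "queue.Queue[Any]") -> Iterator[Any]:
    """Yield queued requests until the stop sentinel is seen."""
    while True:
        item = outgoing.get()
        if item is _STOP:
            return
        yield item


outgoing: "queue.Queue[Any]" = queue.Queue()
outgoing.put({"ping": {"time": "2024-03-14 21:34:06"}})
outgoing.put({"stop_speaking": {}})
outgoing.put(_STOP)

# With generated code, the wiring would look roughly like this
# (module and stub names are hypothetical):
#   channel = grpc.insecure_channel(f"localhost:{port}")
#   stub = speech_service_pb2_grpc.SpeechServiceStub(channel)
#   for response in stub.StartSpeechService(request_stream(outgoing)):
#       handle(response)
sent = list(request_stream(outgoing))
print(sent)
```

Because the generator blocks on the queue, the stream stays open between requests and closes only when the sentinel is queued.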
Initialize TTS
Before you use TTS, you must send a request that tells PySpeechService which defaults to use. This does two things. First, it prompts PySpeechService to download any necessary files. Second, it provides the default settings to apply when text is sent for TTS.
The following is an example of the request you can send:
{
"set_speech_settings": {
"speech_settings": {
"model_name": "hfc_female",
"alt_model_name": "hfc_male"
}
}
}
The model name is the name of a Piper TTS model. If you have an onnx file and a config file for a Piper voice, you can pass those in as onnx_path and config_path instead. The alt details define the alternate voice, which is used when you use the SSML voice tag.
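The request above can be assembled with a small helper like the following sketch. It assumes that onnx_path and config_path sit inside speech_settings next to model_name, which the text implies but does not show. The helper name is hypothetical.

```python
from typing import Optional


def build_speech_settings_request(
    model_name: Optional[str] = None,
    alt_model_name: Optional[str] = None,
    onnx_path: Optional[str] = None,
    config_path: Optional[str] = None,
) -> dict:
    """Build a set_speech_settings request, omitting fields that were not given."""
    if model_name is None and (onnx_path is None or config_path is None):
        raise ValueError("Provide model_name, or both onnx_path and config_path.")
    candidates = {
        "model_name": model_name,
        "alt_model_name": alt_model_name,
        "onnx_path": onnx_path,      # assumed location of this field
        "config_path": config_path,  # assumed location of this field
    }
    settings = {key: value for key, value in candidates.items() if value is not None}
    return {"set_speech_settings": {"speech_settings": settings}}


request = build_speech_settings_request("hfc_female", alt_model_name="hfc_male")
print(request)
```

If you work with generated protobuf classes, `google.protobuf.json_format.ParseDict` can populate a message from a dictionary shaped like this one, provided the field names match the proto file.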
Speak via TTS
To request PySpeechService to speak a message, send the following request:
{
"speak": {
"message": "This is a new message.\nThis is the second line.",
"speech_settings": {
"model_name": "hfc_male",
"pitch": 1.1
}
}
}
The message is either basic text, or it can include basic SSML for changing pitch, speed, or voice. You can include speech settings to modify the pitch, voice, speed, and other settings. Any speak requests sent while a message is being spoken are added to a queue.
Set Speech Volume
You can update the default text-to-speech volume by making a set volume request:
{
"set_volume": {
"volume": 0.8
}
}
The volume is a number from 0 to 2, 0 being muted, 1 being the default volume, and 2 being twice as loud as default.
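Since the valid range is 0 to 2, a client may want to clamp user input before sending it. The sketch below does that on the client side. Whether PySpeechService itself clamps or rejects out-of-range values is not stated here, so treat the clamping as a defensive assumption.

```python
def build_set_volume_request(volume: float) -> dict:
    """Clamp the volume to the documented 0 to 2 range and build the request."""
    clamped = max(0.0, min(2.0, float(volume)))
    return {"set_volume": {"volume": clamped}}


print(build_set_volume_request(0.8))
print(build_set_volume_request(3))   # clamped to the maximum of 2.0
print(build_set_volume_request(-1))  # clamped to the minimum of 0.0
```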
Stop Speaking
If you need to have the PySpeechService stop speaking, you can send the stop_speaking request:
{
"stop_speaking": {}
}
Initialize Speech Recognition
To initialize speech recognition, you first need a grammar. Write a JSON file with the grammar details. The following is a very basic example:
{
"Rules": [
{
"Type": 0,
"Key": "Launch calculator rule",
"Data": [
{
"Type": 1,
"Key": null,
"Data": "Hey computer, launch the calculator."
}
]
}
],
"Replacements": {},
"Prefix": "Hey computer"
}
Rules is an array of different things for speech recognition to listen for. Because VOSK is unable to work with non-standard words, replacements can be used to have VOSK listen for particular words and replace them with non-standard and fantasy words. If all of the phrases it listens for start with the same words, you can use prefix to make sure the first word(s) match the prefix before trying to determine the rest of the spoken phrase.
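A grammar file with this shape can also be generated from code. The sketch below builds one single-phrase rule per entry and writes the result to a temporary file. The Type values 0 and 1 are copied from the example above because their meaning is not described here, and the helper name is hypothetical.

```python
import json
import os
import tempfile


def build_grammar(rules: dict[str, str], prefix: str = "") -> dict:
    """Build a grammar with one single-phrase rule per (rule key, phrase) pair."""
    return {
        "Rules": [
            {
                "Type": 0,  # value copied from the example grammar
                "Key": key,
                "Data": [{"Type": 1, "Key": None, "Data": phrase}],
            }
            for key, phrase in rules.items()
        ],
        "Replacements": {},
        "Prefix": prefix,
    }


grammar = build_grammar(
    {"Launch calculator rule": "Hey computer, launch the calculator."},
    prefix="Hey computer",
)
grammar_path = os.path.join(tempfile.gettempdir(), "grammar.json")
with open(grammar_path, "w", encoding="utf-8") as handle:
    json.dump(grammar, handle, indent=2)
print(grammar_path)
```

Python's `None` is written as JSON `null`, which matches the `"Key": null` entry in the example.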
Once you have the grammar JSON file written, you can send the initialize speech recognition request.
{
"start_speech_recognition": {
"vosk_model": "vosk-model-small-en-us-0.15",
"grammar_file": "/tmp/grammar.json",
"required_confidence": 80
}
}
The VOSK model is the name of the VOSK model to use. If it is not provided, the small English US model is used by default. Grammar file is the path to the generated grammar JSON file, and required confidence is the minimum percent confidence that the phrase matches what the user said. Note that VOSK does not return a confidence in what it hears, so this is just the confidence that what VOSK thinks you said matches one of the phrases in the grammar file.
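To make the idea of a match confidence concrete, here is an illustration that scores heard text against candidate phrases with Python's `difflib.SequenceMatcher`. This is a stand-in chosen for the example. The algorithm PySpeechService actually uses is not described here and may differ, so the scores will not necessarily agree with the confidence values the service reports.

```python
from difflib import SequenceMatcher
from typing import Optional


def match_confidence(heard: str, phrase: str) -> float:
    """Return a 0 to 100 similarity score between heard text and a phrase."""
    return 100.0 * SequenceMatcher(None, heard.lower(), phrase.lower()).ratio()


def best_match(
    heard: str, phrases: list[str], required_confidence: float
) -> Optional[tuple[str, float]]:
    """Return (phrase, score) for the closest phrase, or None if below the threshold."""
    scored = [(phrase, match_confidence(heard, phrase)) for phrase in phrases]
    phrase, score = max(scored, key=lambda pair: pair[1])
    return (phrase, score) if score >= required_confidence else None


phrases = ["hey computer launch the calculator", "hey computer open the browser"]
result = best_match("hey computer lunch eh calculator", phrases, required_confidence=80)
print(result)
```

A higher required confidence reduces false matches at the cost of rejecting more misheard phrases.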
Stop Speech Recognition
If you want speech recognition to be stopped, then you can send the following request. Note that if you want to restart speech recognition, you will need to send another start_speech_recognition request.
{
"stop_speech_recognition": {}
}
Ping
In order to make sure that the PySpeechService application isn't running in the background indefinitely for no reason, it will shut down if it hasn't received any requests in 5 minutes. To avoid this, it is recommended to send the ping request every 60 seconds.
{
"ping": {
"time": "2024-03-14 21:34:06"
}
}
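The recommended 60 second ping can be sketched as a small loop that runs next to the gRPC stream. The timestamp format below mirrors the example above. Whether the field is a plain string in the proto file is an assumption, and the `send` callable stands in for whatever function queues a request on your stream.

```python
import threading
from datetime import datetime
from typing import Callable, Optional


def build_ping_request(now: Optional[datetime] = None) -> dict:
    """Build a ping request with a timestamp in the format shown above."""
    moment = now or datetime.now()
    return {"ping": {"time": moment.strftime("%Y-%m-%d %H:%M:%S")}}


def ping_loop(
    send: Callable[[dict], None],
    stop_event: threading.Event,
    interval_seconds: float = 60.0,
) -> None:
    """Send a ping immediately, then once per interval until stop_event is set."""
    while not stop_event.is_set():
        send(build_ping_request())
        stop_event.wait(interval_seconds)


# Typical use, with send_request being your own (hypothetical) sending function:
#   stop = threading.Event()
#   threading.Thread(target=ping_loop, args=(send_request, stop), daemon=True).start()
example = build_ping_request(datetime(2024, 3, 14, 21, 34, 6))
print(example)
```

Waiting on the event instead of sleeping lets the loop exit promptly when the application shuts down.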
Shutdown PySpeechService
In many cases, the PySpeechService application shouldn't need to be shut down manually. If it doesn't receive a message, it will automatically shut down after around 5 minutes. If the connected application disconnects, it will also shut down. However, if for any reason your application needs to shut down PySpeechService and wants to do so gracefully, it can send the shutdown request.
{
"shutdown": {}
}
Step 4: Receive PySpeechService Responses
PySpeechService will send various responses to the gRPC stream when certain events occur.
Speech Settings Set
This is returned when you initialize the default speech settings.
{
"speech_settings_set": {
"successful": true
}
}
Speech Update Response
You will receive updates based on PySpeechService's progress on speech requests. These will happen when PySpeechService starts or ends a line.
{
"speak_update": {
"message": "This is a new message.\nThis is the second line.",
"chunk": "This is the second line.",
"is_start_of_message": false,
"is_start_of_chunk": true,
"is_end_of_message": false,
"is_end_of_chunk": false,
"has_another_request": false
}
}
The message is the full text of the original request, while the chunk is the part of the message that is currently being spoken. To make responses faster, paragraphs and multiline messages are broken into smaller chunks. The four boolean values tell you the current status of the TTS request, while the has_another_request value tells you whether another request is pending in the queue.
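One way to consume these flags is to translate each update into named events. The sketch below works on the dictionary form of a speak_update. The event names are made up for the example, and the "queue_empty" event reflects one interpretation: a message has ended and no other request is pending.

```python
def summarize_speak_update(update: dict) -> list[str]:
    """Translate the boolean flags of a speak_update into readable event names."""
    events = []
    if update.get("is_start_of_message"):
        events.append("message_started")
    if update.get("is_start_of_chunk"):
        events.append("chunk_started")
    if update.get("is_end_of_chunk"):
        events.append("chunk_ended")
    if update.get("is_end_of_message"):
        events.append("message_ended")
        if not update.get("has_another_request"):
            events.append("queue_empty")  # nothing left to speak
    return events


update = {
    "message": "This is a new message.\nThis is the second line.",
    "chunk": "This is the second line.",
    "is_start_of_message": False,
    "is_start_of_chunk": True,
    "is_end_of_message": False,
    "is_end_of_chunk": False,
    "has_another_request": False,
}
print(summarize_speak_update(update))
```

An application could use the "queue_empty" event to re-enable a speak button or to resume other audio.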
Set Volume Response
This is returned when you attempt to set the default volume used by text to speech.
{
"set_volume": {
"successful": true
}
}
Speech Recognition Initialized Response
This is returned when you attempt to start speech recognition.
{
"speech_recognition_started": {
"successful": true
}
}
Speech Recognition Response
Whenever PySpeechService is able to recognize text, it sends a response with various details about the text that it recognized.
{
"speech_recognized": {
"heard_text": "hey computer lunch eh calculator",
"recognized_text": "hey computer launch the calculator",
"recognized_rule": "Launch calculator rule",
"confidence": 83,
"semantics": []
}
}
Heard text is the text recognized by VOSK, whereas recognized text is the phrase from the grammar that PySpeechService believes the heard text matches. The confidence is how confident PySpeechService is that the heard text matches the recognized text. The recognized rule is the rule that matches the recognized text, and semantics are the matched key-value pairs in the recognized text.
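A client will typically map the recognized rule to an action. The sketch below dispatches on recognized_rule, using the dictionary form of the response. The extra client-side confidence check is optional and purely illustrative, since PySpeechService already applies required_confidence before responding.

```python
from typing import Callable


def dispatch_recognition(
    response: dict,
    handlers: dict[str, Callable[[dict], None]],
    min_confidence: float = 80,
) -> bool:
    """Call the handler registered for the recognized rule; return True if one ran."""
    result = response["speech_recognized"]
    if result["confidence"] < min_confidence:
        return False
    handler = handlers.get(result["recognized_rule"])
    if handler is None:
        return False
    handler(result)
    return True


launched = []
handlers = {
    "Launch calculator rule": lambda result: launched.append(result["recognized_text"]),
}
response = {
    "speech_recognized": {
        "heard_text": "hey computer lunch eh calculator",
        "recognized_text": "hey computer launch the calculator",
        "recognized_rule": "Launch calculator rule",
        "confidence": 83,
        "semantics": [],
    }
}
handled = dispatch_recognition(response, handlers)
print(handled, launched)
```

Keying handlers on the rule name instead of the recognized text keeps the client independent of the exact phrasing in the grammar file.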
Ping
Each time you send a ping request, you'll get a ping response. This way you can confirm you're also receiving responses.
{
"ping": {
"time": "2025-03-15 05:12:32.137872"
}
}
Error
When the PySpeechService runs into an error, it will send a message that can be used for troubleshooting.
{
"error": {
"error_message": "Speech settings have not been initialized. Call set_speech_settings first.",
"exception": ""
}
}
Note that these error messages and exception details are not useful messages to display to the user, but are meant for developer messaging.
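Putting the responses together, a client can route each response by its top-level key and send errors to a developer log instead of the user interface. The sketch below assumes each response has been converted to a dictionary with exactly one top-level key, as in the examples above. With generated protobuf classes, `google.protobuf.json_format.MessageToDict` with `preserving_proto_field_name=True` is one way to get that form. The logger name and handler table are illustrative.

```python
import logging
from typing import Callable

logger = logging.getLogger("speech_client")  # hypothetical logger name


def response_type(response: dict) -> str:
    """Return the single top-level key that identifies the response type."""
    if len(response) != 1:
        raise ValueError("Expected exactly one top-level key in the response.")
    return next(iter(response))


def handle_response(response: dict, handlers: dict[str, Callable[[dict], None]]) -> bool:
    """Log errors for developers and pass other responses to their handler."""
    kind = response_type(response)
    payload = response[kind]
    if kind == "error":
        logger.error(
            "PySpeechService error: %s (exception: %s)",
            payload.get("error_message"),
            payload.get("exception"),
        )
        return True
    handler = handlers.get(kind)
    if handler is None:
        return False
    handler(payload)
    return True


seen = []
handlers = {"ping": lambda payload: seen.append(payload["time"])}
handle_response({"ping": {"time": "2025-03-15 05:12:32.137872"}}, handlers)
print(seen)
```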