
PySpeechService Developer Documentation

This documentation covers how to use the PySpeechService application for text-to-speech (TTS) and speech recognition from another application.

Step 1: Generate gRPC Files

Use the PySpeechService gRPC proto file to generate the gRPC code for your application's language.
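For a Python client, one option is the grpcio-tools package. The following is a minimal sketch that assumes the proto file has been saved locally. The file name speech_service.proto and the helpers build_protoc_args and generate are hypothetical, so substitute the actual proto file shipped with PySpeechService.

```python
# Sketch: build_protoc_args and generate are hypothetical helpers, not part of PySpeechService.
from pathlib import Path
from typing import List


def build_protoc_args(proto_file: str, out_dir: str = ".") -> List[str]:
    """Build the argument list for grpc_tools.protoc; the first item is the program name."""
    proto_path = Path(proto_file)
    return [
        "grpc_tools.protoc",
        f"-I{proto_path.parent}",        # directory searched for the proto and its imports
        f"--python_out={out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # client stub and service classes (*_pb2_grpc.py)
        str(proto_path),
    ]


def generate(proto_file: str, out_dir: str = ".") -> None:
    """Run protoc from grpcio-tools (imported here so the builder works without it)."""
    from grpc_tools import protoc

    if protoc.main(build_protoc_args(proto_file, out_dir)) != 0:
        raise RuntimeError(f"protoc failed for {proto_file}")
```

Calling generate("speech_service.proto") should produce speech_service_pb2.py and speech_service_pb2_grpc.py, because grpcio-tools names its output after the proto file.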

Step 2: Launch the PySpeechService Application

Launch the PySpeechService application. Depending on how it was installed, it can be launched in several ways:

  • Executable on your PATH - run PySpeechService
  • Python module - python -m py-speech-service, python3 -m py-speech-service, or py -m py-speech-service
  • Local app folder - ~/.local/share/py_speech_service/py-speech-service or %localappdata%/py_speech_service/py-speech-service.exe

Once it's launched, read the application's output. The first line is a JSON message with information about PySpeechService, printed while it waits for a connection.

{
    "version": "0.1.0",
    "port": 12345
}

The version is the current version of PySpeechService and can be used to verify compatibility. The port is the randomly chosen port that PySpeechService uses for gRPC.
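A Python application could launch the service and read this message with the subprocess module. This sketch assumes the startup message is the first thing written to standard output. It accumulates lines until they parse as JSON, so it works whether the message is printed on one line or pretty-printed as above. ServiceInfo, read_startup_info, and launch_service are hypothetical names.

```python
# Sketch: ServiceInfo, read_startup_info, and launch_service are hypothetical, not part of PySpeechService.
import json
import subprocess
from typing import Iterable, List, NamedTuple, Tuple


class ServiceInfo(NamedTuple):
    version: str
    port: int


def read_startup_info(lines: Iterable[str]) -> ServiceInfo:
    """Read lines until they form a complete JSON object, then extract version and port."""
    buffer = ""
    for line in lines:
        buffer += line
        try:
            data = json.loads(buffer)
        except json.JSONDecodeError:
            continue  # message may be pretty-printed over several lines; keep reading
        return ServiceInfo(version=str(data["version"]), port=int(data["port"]))
    raise RuntimeError("output ended before a complete startup message was read")


def launch_service(command: List[str]) -> Tuple[subprocess.Popen, ServiceInfo]:
    """Start PySpeechService and wait for its startup message."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        return process, read_startup_info(process.stdout)
    except Exception:
        process.kill()  # don't leave a half-started service running
        raise
```

For example, process, info = launch_service(["PySpeechService"]) starts the executable from your PATH and returns the port to connect to. If the service keeps writing to standard output, keep reading process.stdout (for example on a background thread) so the pipe buffer doesn't fill up and block the service.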

Step 3: Send Requests

Connect to the PySpeechService gRPC Channel

Using the generated gRPC code and the usual gRPC conventions for your application's language, open a channel to PySpeechService on the reported port, create a client, and call StartSpeechService. StartSpeechService is a bidirectional stream: you send SpeechServiceRequests and receive SpeechServiceResponses.

Use the stream to send SpeechServiceRequests to PySpeechService to initialize and use TTS and speech recognition. You'll then listen to the stream's SpeechServiceResponses to receive updates on when initialization is complete, when TTS starts and stops, and when speech has been recognized.
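In Python's grpc package, a bidirectional-streaming method is called with an iterator of requests and returns an iterator of responses. A queue-backed iterator, as in this sketch, lets the application send requests whenever it needs to. The generated module name speech_service_pb2_grpc, the stub class SpeechServiceStub, and connecting to localhost are assumptions; check the names in your generated code.

```python
# Sketch: RequestStream and open_stream are hypothetical helpers, not part of PySpeechService.
import queue
from typing import Any, Iterator


class RequestStream:
    """Outgoing request iterator for a bidirectional gRPC stream.

    send() may be called from any thread; gRPC consumes the requests by iterating.
    """

    _CLOSE = object()  # sentinel that ends iteration, closing the client side of the stream

    def __init__(self) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def send(self, request: Any) -> None:
        self._queue.put(request)

    def close(self) -> None:
        self._queue.put(self._CLOSE)

    def __iter__(self) -> Iterator[Any]:
        while True:
            item = self._queue.get()  # blocks until a request or the sentinel arrives
            if item is self._CLOSE:
                return
            yield item


def open_stream(port: int):
    """Connect to PySpeechService and start the StartSpeechService stream."""
    import grpc
    import speech_service_pb2_grpc  # hypothetical generated module name

    channel = grpc.insecure_channel(f"localhost:{port}")  # assumes the service is local
    stub = speech_service_pb2_grpc.SpeechServiceStub(channel)  # hypothetical stub name
    requests = RequestStream()
    responses = stub.StartSpeechService(iter(requests))
    return channel, requests, responses
```

Responses are typically read on a background thread with a for response in responses: loop, and calling requests.close() ends the client side of the stream.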

Initialize TTS

Before using TTS, first send a request that tells PySpeechService which defaults to use. This does two things: it makes PySpeechService download any files it needs, and it sets the default settings used for text sent to TTS.

The following is an example of the request you can send:

{
    "set_speech_settings": {
        "speech_settings": {
            "model_name": "hfc_female",
            "alt_model_name": "hfc_male"
        }
    }
}

The model_name is the name of a Piper TTS model. If you have an ONNX model file and config file for a Piper voice, you can pass their paths as onnx_path and config_path instead. The alt_ settings are used for any voice you select with the SSML voice tag.
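Because the examples in this documentation are written as JSON, a Python client can build requests as dictionaries and convert them to protobuf messages with google.protobuf.json_format.ParseDict, which accepts proto field names such as set_speech_settings. This sketch assumes those JSON keys match the proto field names and that the generated module is named speech_service_pb2 (hypothetical). set_speech_settings_request and to_request are illustrative helpers.

```python
# Sketch: set_speech_settings_request and to_request are hypothetical helpers.
from typing import Any, Dict, Optional


def set_speech_settings_request(
    model_name: Optional[str] = None,
    alt_model_name: Optional[str] = None,
    onnx_path: Optional[str] = None,
    config_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the set_speech_settings request, leaving out settings that are not given."""
    settings = {
        "model_name": model_name,
        "alt_model_name": alt_model_name,
        "onnx_path": onnx_path,
        "config_path": config_path,
    }
    chosen = {key: value for key, value in settings.items() if value is not None}
    return {"set_speech_settings": {"speech_settings": chosen}}


def to_request(request: Dict[str, Any]):
    """Convert a request dict into the generated SpeechServiceRequest message."""
    from google.protobuf import json_format
    import speech_service_pb2  # hypothetical generated module name

    return json_format.ParseDict(request, speech_service_pb2.SpeechServiceRequest())
```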

Speak via TTS

To request PySpeechService to speak a message, send the following request:

{
    "speak": {
        "message": "This is a new message.\nThis is the second line.",
        "speech_settings": {
            "model_name": "hfc_male",
            "pitch": 1.1
        }
    }
}

The message can be plain text, or it can include basic SSML to change the pitch, speed, or voice. You can also include speech_settings to override the pitch, voice, speed, and other settings for that message. Speak requests sent while a message is being spoken are added to a queue.
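A small helper can build speak requests and add speech_settings only when a message needs to override the defaults. This is a sketch with a hypothetical function name. The resulting dict mirrors the JSON above and still has to be converted to a SpeechServiceRequest message before it is sent.

```python
# Sketch: speak_request is a hypothetical helper, not part of PySpeechService.
from typing import Any, Dict


def speak_request(message: str, **speech_settings: Any) -> Dict[str, Any]:
    """Build a speak request; keyword arguments (e.g. model_name, pitch) become
    per-message speech_settings and are omitted entirely when none are given."""
    speak: Dict[str, Any] = {"message": message}
    if speech_settings:
        speak["speech_settings"] = dict(speech_settings)
    return {"speak": speak}


# The two-line message from the example; multiline messages may be spoken in chunks.
example = speak_request(
    "This is a new message.\nThis is the second line.",
    model_name="hfc_male",
    pitch=1.1,
)
```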

Set Speech Volume

You can update the default text-to-speech volume by sending a set_volume request:

{
    "set_volume": {
        "volume": 0.8
    }
}

The volume is a number from 0 to 2: 0 is muted, 1 is the default volume, and 2 is twice as loud as the default.
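Because the volume is limited to the 0 to 2 range, a client may want to validate it before sending. The sketch below does that. It also includes a helper that maps a 0 to 200 percent slider onto the 0 to 2 range. Both function names are hypothetical.

```python
# Sketch: set_volume_request and percent_to_volume are hypothetical helpers.
from typing import Any, Dict


def set_volume_request(volume: float) -> Dict[str, Any]:
    """Build a set_volume request: 0 is muted, 1 is the default, 2 is twice as loud."""
    if not 0.0 <= volume <= 2.0:
        raise ValueError(f"volume must be between 0 and 2, got {volume}")
    return {"set_volume": {"volume": float(volume)}}


def percent_to_volume(percent: float) -> float:
    """Map a 0-200% slider value onto the 0-2 range, clamping out-of-range input."""
    return max(0.0, min(2.0, percent / 100.0))
```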

Stop Speaking

To make PySpeechService stop speaking, send the stop_speaking request:

{
    "stop_speaking": {}
}

Initialize Speech Recognition

Speech recognition needs a grammar. First, write a JSON file with the grammar details. The following is a very basic example:

{
    "Rules": [
        {
            "Type": 0,
            "Key": "Launch calculator rule",
            "Data": [
                {
                    "Type": 1,
                    "Key": null,
                    "Data": "Hey computer, launch the calculator."
                }
            ]
        }
    ],
    "Replacements": {},
    "Prefix": "Hey computer"
}

Rules is an array of the rules that speech recognition listens for. Because VOSK can't recognize non-standard words, Replacements can be used to have VOSK listen for particular words and replace them with non-standard or fantasy words. If all of the phrases start with the same words, you can set Prefix so that the first word(s) spoken must match it before PySpeechService tries to determine the rest of the phrase.
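The grammar file can also be generated. This sketch builds one rule per phrase using the same structure as the example above. The Type values 0 and 1 are copied from that example; other Type values aren't described here, so the sketch doesn't use any. simple_grammar and write_grammar are hypothetical helpers. write_grammar returns an absolute path, which is a safer value for grammar_file in case PySpeechService runs with a different working directory than your application.

```python
# Sketch: simple_grammar and write_grammar are hypothetical helpers.
import json
from pathlib import Path
from typing import Any, Dict


def simple_grammar(phrases: Dict[str, str], prefix: str) -> Dict[str, Any]:
    """Build a grammar with one rule per phrase, keyed by rule name.

    Structure and Type values (0 on the rule, 1 on its phrase entry) are
    copied from the example grammar; no other Type values are used here.
    """
    rules = [
        {"Type": 0, "Key": rule_key, "Data": [{"Type": 1, "Key": None, "Data": phrase}]}
        for rule_key, phrase in phrases.items()
    ]
    return {"Rules": rules, "Replacements": {}, "Prefix": prefix}


def write_grammar(grammar: Dict[str, Any], path: str) -> str:
    """Write the grammar as JSON and return its absolute path for grammar_file."""
    target = Path(path).resolve()
    target.write_text(json.dumps(grammar, indent=4), encoding="utf-8")
    return str(target)
```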

Once the grammar JSON file is written, send the start_speech_recognition request to initialize speech recognition:

{
    "start_speech_recognition": {
        "vosk_model": "vosk-model-small-en-us-0.15",
        "grammar_file": "/tmp/grammar.json",
        "required_confidence": 80
    }
}

The vosk_model is the name of the VOSK model to use; if it isn't provided, the small US English model is used. The grammar_file is the path to the grammar JSON file, and required_confidence is the minimum percent confidence that the spoken phrase matches a phrase in the grammar. Note that VOSK doesn't report a confidence for what it hears, so this is only the confidence that what VOSK thinks you said matches one of the phrases in the grammar file.
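A request builder can enforce the documented shape. It leaves vosk_model out when it isn't set, so the default model is used, and it checks required_confidence as a percentage. The default of 80 in this sketch mirrors the example above and isn't a documented PySpeechService default. The function name is hypothetical.

```python
# Sketch: start_speech_recognition_request is a hypothetical helper.
from typing import Any, Dict, Optional


def start_speech_recognition_request(
    grammar_file: str,
    required_confidence: int = 80,  # value taken from the example, not a documented default
    vosk_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Build start_speech_recognition; vosk_model is omitted so the service picks its default."""
    if not 0 <= required_confidence <= 100:
        raise ValueError("required_confidence is a percentage (0-100)")
    request: Dict[str, Any] = {
        "grammar_file": grammar_file,
        "required_confidence": required_confidence,
    }
    if vosk_model is not None:
        request["vosk_model"] = vosk_model
    return {"start_speech_recognition": request}
```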

Stop Speech Recognition

To stop speech recognition, send the following request. To restart speech recognition afterward, you'll need to send another start_speech_recognition request.

{
    "stop_speech_recognition": {}
}

Ping

So that it doesn't keep running in the background indefinitely, PySpeechService shuts down if it hasn't received any requests in 5 minutes. To keep it running, it's recommended to send a ping request every 60 seconds.
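One way to send the ping every 60 seconds is a daemon thread driven by threading.Event, which also makes it easy to stop. This sketch assumes the time field accepts a timestamp in the format shown in the example. The send parameter stands for whatever function your application uses to put a request on the stream, and the helper names are hypothetical.

```python
# Sketch: ping_request and start_keepalive are hypothetical helpers.
import threading
from datetime import datetime
from typing import Any, Callable, Dict


def ping_request(now: datetime) -> Dict[str, Any]:
    """Build a ping request with a timestamp formatted like the example."""
    return {"ping": {"time": now.strftime("%Y-%m-%d %H:%M:%S")}}


def start_keepalive(send: Callable[[Dict[str, Any]], None], interval: float = 60.0) -> threading.Event:
    """Send a ping every `interval` seconds on a daemon thread until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # wait() returns False when the timeout elapses and True once stop.set() is called.
        while not stop.wait(interval):
            send(ping_request(datetime.now()))

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Because send is called from the background thread, it should be thread-safe; putting requests on a queue that feeds the gRPC stream satisfies that.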

{
    "ping": {
        "time": "2024-03-14 21:34:06"
    }
}

Shutdown PySpeechService

In most cases, the PySpeechService application shouldn't need to be shut down manually. It shuts down automatically if it doesn't receive any requests for around 5 minutes, and it also shuts down when the connected application disconnects. However, if your application needs to shut it down gracefully for any reason, it can send the shutdown request.

{
    "shutdown": {}
}
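If you do shut the service down yourself, a graceful sequence is to send the shutdown request, close your side of the stream, and then wait for the process to exit. In this sketch, send and close are placeholders for your stream's functions, and process is the subprocess.Popen handle from launching the service. Killing the process after a timeout is a design choice of the sketch, not documented PySpeechService behavior. shutdown_service is a hypothetical name.

```python
# Sketch: shutdown_service is a hypothetical helper, not part of PySpeechService.
import subprocess
from typing import Any, Callable, Dict


def shutdown_service(
    send: Callable[[Dict[str, Any]], None],
    close: Callable[[], None],
    process: subprocess.Popen,
    timeout: float = 10.0,
) -> int:
    """Ask PySpeechService to exit, close our side of the stream, and reap the process."""
    send({"shutdown": {}})
    close()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()  # last resort if the service did not exit in time
        return process.wait()
```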

Step 4: Receive PySpeechService Responses

PySpeechService will send various responses to the gRPC stream when certain events occur.
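Each response appears to carry one of the fields shown below (presumably a protobuf oneof), so a client can dispatch on that field. This sketch assumes responses are converted to dictionaries with google.protobuf.json_format.MessageToDict and preserving_proto_field_name=True, so the keys match the examples. MessageToDict omits fields that hold default values, such as a successful value of false, so handlers should read fields with a default. dispatch and response_to_dict are hypothetical helpers.

```python
# Sketch: response_to_dict and dispatch are hypothetical helpers.
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def response_to_dict(message: Any) -> Dict[str, Any]:
    """Convert a generated SpeechServiceResponse into a dict keyed like the examples."""
    from google.protobuf import json_format

    return json_format.MessageToDict(message, preserving_proto_field_name=True)


def dispatch(response: Dict[str, Any], handlers: Dict[str, Handler]) -> bool:
    """Call the handler for the response's top-level key; return False if none matches."""
    for kind, payload in response.items():
        handler = handlers.get(kind)
        if handler is not None:
            handler(payload)
            return True
    return False
```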

Speech Settings Set

This is returned when you initialize the default speech settings.

{
    "speech_settings_set": {
        "successful": true
    }
}

Speech Update Response

You'll receive updates as PySpeechService works through speak requests. These are sent when it starts or finishes a line.

{
    "speak_update": {
        "message": "This is a new message.\nThis is the second line.",
        "chunk": "This is the second line.",
        "is_start_of_message": false,
        "is_start_of_chunk": true,
        "is_end_of_message": false,
        "is_end_of_chunk": false,
        "has_another_request": false
    }
}

The message is the full text of the original request, and the chunk is the part of the message currently being spoken. To make responses faster, paragraphs and multiline messages are split into smaller chunks. The four is_start_of_* and is_end_of_* values report the current status of the TTS request, and has_another_request tells you whether another request is waiting in the queue.
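These flags can be used to track what is currently being spoken and when the queue has drained. This sketch assumes that the last chunk of a message is reported with is_end_of_message set, and that TTS is idle once that happens with has_another_request false. SpeechTracker is a hypothetical class. Missing flags default to false, because MessageToDict drops false values.

```python
# Sketch: SpeechTracker is a hypothetical helper, not part of PySpeechService.
from typing import Any, Dict, Optional


class SpeechTracker:
    """Follows speak_update payloads to know what is being spoken and when TTS is idle."""

    def __init__(self) -> None:
        self.current_chunk: Optional[str] = None
        self.idle = True

    def update(self, payload: Dict[str, Any]) -> None:
        # Missing flags are treated as False.
        if payload.get("is_start_of_chunk", False):
            self.current_chunk = payload.get("chunk")
            self.idle = False
        if payload.get("is_end_of_chunk", False):
            self.current_chunk = None
        # Assumption: the final chunk of a message carries is_end_of_message.
        if payload.get("is_end_of_message", False) and not payload.get("has_another_request", False):
            self.idle = True
```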

Set Volume Response

This is returned when you attempt to set the default volume used by text to speech.

{
    "set_volume": {
        "successful": true
    }
}

Speech Recognition Initialized Response

This is returned when you attempt to start speech recognition.

{
    "speech_recognition_started": {
        "successful": true
    }
}

Speech Recognition Response

Whenever PySpeechService recognizes speech, it sends a response with details about what it recognized.

{
    "speech_recognized": {
        "heard_text": "hey computer lunch eh calculator",
        "recognized_text": "hey computer launch the calculator",
        "recognized_rule": "Launch calculator rule",
        "confidence": 83,
        "semantics": []
    }
}

The heard_text is the text recognized by VOSK, whereas the recognized_text is the grammar phrase that PySpeechService matched the heard text to. The confidence is the confidence that the heard text matches the recognized text. The recognized_rule is the rule containing the recognized text, and semantics are the matched key-value pairs in the recognized text.
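A common pattern is to map each rule key from the grammar to an action. In this hypothetical sketch, make_recognition_handler returns a handler for speech_recognized payloads. The optional min_confidence check is an extra client-side filter on top of required_confidence, and semantics is passed through untouched because its structure isn't described here.

```python
# Sketch: make_recognition_handler is a hypothetical helper, not part of PySpeechService.
from typing import Any, Callable, Dict

Action = Callable[[Dict[str, Any]], None]


def make_recognition_handler(actions: Dict[str, Action], min_confidence: int = 0) -> Action:
    """Return a speech_recognized handler that runs the action for recognized_rule."""

    def handle(payload: Dict[str, Any]) -> None:
        # Optional client-side filter on top of the service's required_confidence.
        if payload.get("confidence", 0) < min_confidence:
            return
        action = actions.get(payload.get("recognized_rule", ""))
        if action is not None:
            action(payload)  # semantics and other fields are passed through unchanged

    return handle
```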

Ping

Each time you send a ping request, you'll get a ping response, which confirms that you're also receiving responses.

{
    "ping": {
        "time": "2025-03-15 05:12:32.137872"
    }
}

Error

When PySpeechService runs into an error, it sends an error response that can be used for troubleshooting.

{
    "error": {
        "error_message": "Speech settings have not been initialized. Call set_speech_settings first.",
        "exception": ""
    }
}

These error messages and exception details are meant for developers and aren't suitable for displaying to end users.



Download files


Source Distribution

py_speech_service-0.1.1.tar.gz (24.9 kB)

Uploaded Source

Built Distribution


py_speech_service-0.1.1-py3-none-any.whl (26.1 kB)

Uploaded Python 3

File details

Details for the file py_speech_service-0.1.1.tar.gz.

File metadata

  • Download URL: py_speech_service-0.1.1.tar.gz
  • Upload date:
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.8.0-54-generic

File hashes

Hashes for py_speech_service-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c8b620c6ae6a65c88b650bc57cd1ec88b42d85348c141b8f54612d8a4575bbf2
MD5 9a5dbe0460f8acee311f9f5f084e879e
BLAKE2b-256 769f4d74efe5358b6143675135027f416b1a48f19f3476c52f6c76a8fa481063


File details

Details for the file py_speech_service-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: py_speech_service-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.8.0-54-generic

File hashes

Hashes for py_speech_service-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1d5e4264a7ef5f1f949b4834c8b02870c98f3913c0b92c0fcf29ef3004ffcc44
MD5 aa6e457458f0d91cde2ef40e274d9e76
BLAKE2b-256 eda57469155eface2ec8243099e6410db0f5f1e34dd97bd73a593ed3f19cce8f

