Interface to speech services for image-guided surgery.
Author: Kim-Celine Kahl
scikit-surgeryspeech is part of the SNAPPY software project, developed at the Wellcome EPSRC Centre for Interventional and Surgical Sciences, part of University College London (UCL).
scikit-surgeryspeech supports Python 3.6.
scikit-surgeryspeech runs the Python Speech Recognition API in the background, listening for a specific keyword. After saying the keyword, you can say different commands, which are converted to Qt signals.
The speech recognition is done by the Google Cloud API. You need credentials to use it, or you can change the recognition service.
Keyword detection is done by the Porcupine API, which should have been installed automatically via the pvporcupine dependency.
Please explore the project structure, and implement your own functionality.
To run an example, just start
sksurgeryspeech.py -c example_config.json
The config file should define the paths to the Porcupine library files and, if you are using the Google Cloud API, to its credentials file.
You can then say the keyword (which depends on the Porcupine keyword file you chose), followed by a command. The command “quit” exits the application.
Note: after each command, you need to say the keyword again before the application listens for the next command.
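The keyword-then-command flow described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the project's implementation: the class KeywordGate and its method names are hypothetical, and plain Python callbacks stand in for the Qt signals the project actually emits.

```python
from typing import Callable, Dict, List


class KeywordGate:
    """Accept exactly one command after each keyword detection (hypothetical helper)."""

    def __init__(self) -> None:
        # True only between a keyword detection and the next command.
        self._armed = False
        self._handlers: Dict[str, List[Callable[[], None]]] = {}

    def connect(self, command: str, handler: Callable[[], None]) -> None:
        # Plain callbacks stand in for Qt signal/slot connections here.
        self._handlers.setdefault(command, []).append(handler)

    def on_keyword(self) -> None:
        # Called when the keyword detector fires; arms the gate for one command.
        self._armed = True

    def on_command(self, command: str) -> bool:
        # Anything said before the keyword is ignored.
        if not self._armed:
            return False
        # Disarm first, so the keyword is needed again before the next command.
        self._armed = False
        for handler in self._handlers.get(command, []):
            handler()
        return True
```

With this sketch, a "quit" spoken before the keyword is ignored, one spoken right after the keyword runs its handlers, and a second one needs the keyword again.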
You can clone the repository using the following command:
git clone https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/scikit-surgeryspeech
If you have problems running the application, you might need to install portaudio (the first command below is for macOS with Homebrew, the second for Debian/Ubuntu):
brew install portaudio
sudo apt-get install libasound-dev portaudio19-dev
If you’re going to try Sphinx, you might also need to install the PulseAudio development package:
sudo apt-get install swig libpulse-dev
Set up the Porcupine keyword detection
Then, set the following variables in the configuration file:
"porcupine dynamic library path" : ".tox/py37/lib/python3.7/site-packages/pvporcupine/lib/linux/x86_64/libpv_porcupine.so",
"porcupine model file path" : ".tox/py37/lib/python3.7/site-packages/pvporcupine/lib/common/porcupine_params.pv",
"porcupine keyword file" : [".tox/py37/lib/python3.7/site-packages/pvporcupine/resources/keyword_files/linux/jarvis_linux.ppn"],
You can also generate your own keyword files
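Because the Porcupine paths depend on your Python environment, it can help to check the configuration file before starting the application. The following is a minimal sketch, assuming the configuration is a flat JSON object that uses the three key names shown above. The function load_speech_config is hypothetical and not part of scikit-surgeryspeech.

```python
import json
from pathlib import Path

# Key names copied from the configuration entries above.
PORCUPINE_KEYS = (
    "porcupine dynamic library path",
    "porcupine model file path",
    "porcupine keyword file",
)


def load_speech_config(path):
    """Load a JSON config and return (config, list of problems found)."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = []
    for key in PORCUPINE_KEYS:
        if key not in config:
            problems.append(f"missing key: {key!r}")
            continue
        # The keyword entry is a list of paths; the other two are single paths.
        value = config[key]
        paths = value if isinstance(value, list) else [value]
        for entry in paths:
            if not Path(entry).exists():
                problems.append(f"file not found for {key!r}: {entry}")
    return config, problems
```

Calling load_speech_config("example_config.json") returns the parsed configuration together with a list of human-readable problems. An empty list means all three entries are present and point to existing files.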
If you are using the speech recognition service within your own application, you have to start a background thread which repeatedly calls the method that listens for the keyword.
You can find an example of how to create such a thread in sksurgeryspeech_demo.py.
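The general pattern of such a background thread can be sketched with the standard library alone. This is a hedged illustration rather than the code in the demo: start_keyword_thread is a hypothetical helper, and listen_once is a placeholder for whichever method of the service listens for the keyword.

```python
import threading


def start_keyword_thread(listen_once, stop_event):
    """Call listen_once() repeatedly on a daemon thread until stop_event is set."""

    def worker():
        # Keep listening for the keyword until the application asks us to stop.
        while not stop_event.is_set():
            listen_once()

    # A daemon thread does not keep the process alive after the main thread exits.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

To shut down cleanly, set the event (for example when the “quit” command arrives) and then join the returned thread. Note that the loop only checks the event between calls, so it stops after the current call to listen_once returns.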
Use the Google Cloud speech recognition service
To use the Google Cloud speech recognition service, you need to get the credentials first. After signing up, you should get a JSON file with your credentials. Download this file and set the following entry in the configuration file to the path of your JSON file:
"google credentials file" : "snappy-speech-6ff24bf3e262.json",
You should then be able to run the application.
Change speech recognition service
If you don’t want to use the Google Cloud API, change the following call (file “voice_recognition_service.py”, method “listen_to_command(self)”) to the recognition service of your choice:
words = recognizer.recognize_google_cloud(audio, credentials_json=self.credentials)
Currently available services are:
recognizer.recognize_sphinx(audio)
recognizer.recognize_google(audio)
recognizer.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
recognizer.recognize_wit(audio, key=WIT_AI_KEY)
recognizer.recognize_bing(audio, key=BING_KEY)
recognizer.recognize_azure(audio, key=AZURE_SPEECH_KEY)
recognizer.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY)
recognizer.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD)
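If you expect to switch between services more than once, one option is to select the method by name instead of editing the call each time. The sketch below assumes only that the recognizer object exposes the recognize_* methods listed above. The helper recognize_with and its service argument are hypothetical and not part of scikit-surgeryspeech or the Speech Recognition API.

```python
def recognize_with(recognizer, audio, service="google_cloud", **kwargs):
    """Call recognizer.recognize_<service>(audio, **kwargs) (hypothetical helper)."""
    # Look up the method by name, e.g. "sphinx" -> recognizer.recognize_sphinx.
    method = getattr(recognizer, f"recognize_{service}", None)
    if not callable(method):
        raise ValueError(f"unknown recognition service: {service!r}")
    # Service-specific arguments (keys, credentials) are passed through unchanged.
    return method(audio, **kwargs)
```

For example, recognize_with(recognizer, audio, "sphinx") would call recognizer.recognize_sphinx(audio), and recognize_with(recognizer, audio, "wit", key=WIT_AI_KEY) would call recognizer.recognize_wit(audio, key=WIT_AI_KEY).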
This project uses tox. Start with a clean python environment, then do:
pip install tox
tox
and the commands that are run can be found in tox.ini.
You can pip install directly from the repository as follows:
pip install git+https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/scikit-surgeryspeech
Please see the contributing guidelines.
Licensing and copyright
Copyright 2019 University College London. scikit-surgeryspeech is released under the BSD-3 license. Please see the license file for details.