
Interface to speech services for image-guided surgery.


Author: Kim-Celine Kahl

scikit-surgeryspeech is part of the SNAPPY software project, developed at the Wellcome EPSRC Centre for Interventional and Surgical Sciences, part of University College London (UCL).

scikit-surgeryspeech supports Python 3.6.

scikit-surgeryspeech runs the Python Speech Recognition API in the background and listens for a specific keyword (for now, “start”). After saying the keyword, you can say different commands, which are converted to Qt signals.

Speech recognition is done by the Google Cloud API. You have to obtain credentials to use it, or change the recognition service.

Please explore the project structure, and implement your own functionality.

Example usage

To run an example, run:

sksurgeryspeech.py

Make sure the Google Cloud API is set up correctly, as described in the section below.

You can then say “start” as the keyword, followed by a command. The command “quit” exits the application.

Note: after each command, you need to say “start” again before the application listens for the next command.
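The keyword-then-command behaviour described above can be sketched as a small state machine. This is a minimal illustration, not the project's implementation: the class name KeywordGate and its feed method are hypothetical, and the sketch returns plain strings where the application would emit Qt signals.

```python
class KeywordGate:
    """Hypothetical helper: forwards a command only after the keyword was heard."""

    def __init__(self, keyword="start"):
        self.keyword = keyword
        self.armed = False  # True once the keyword has been heard

    def feed(self, phrase):
        """Return the command to act on, or None if the phrase is ignored."""
        text = phrase.strip().lower()
        if not self.armed:
            # Only the keyword can arm the gate; anything else is dropped.
            self.armed = (text == self.keyword)
            return None
        # The gate is armed: treat this phrase as the command, then disarm,
        # so the keyword must be spoken again before the next command.
        self.armed = False
        return text


gate = KeywordGate()
heard = ["quit", "start", "next", "previous", "start", "quit"]
commands = [c for c in (gate.feed(p) for p in heard) if c is not None]
print(commands)
```

The first “quit” and the phrase “previous” are ignored because the keyword was not spoken immediately before them.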

Developing

Cloning

You can clone the repository using the following command:

git clone https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/scikit-surgeryspeech

If you have problems running the application, you might need to install portaudio, for example with Homebrew on macOS:

brew install portaudio

Use the Google Cloud speech recognition service

To use the Google Cloud speech recognition service, you need to get credentials first. After signing up, you should get a JSON file with your credentials. Download this file and set the environment variable

GOOGLE_APPLICATION_CREDENTIALS

to the path of your JSON file. You should then be able to run the application.
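Before starting the application, it can help to check that the variable is set and points to a readable JSON file. The following is a minimal sketch of such a check. The function name load_google_credentials is hypothetical and is not part of scikit-surgeryspeech.

```python
import json
import os


def load_google_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Hypothetical helper: return the text of the credentials file named by env_var."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    json.loads(text)  # raises ValueError if the file is not valid JSON
    return text
```

Calling load_google_credentials() raises a RuntimeError with a descriptive message if the variable is unset or the file is missing, which is easier to diagnose than a failure inside the recognition call.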

Change speech recognition service

To use a different speech recognition service instead of the Google Cloud API, change the call

words = recognizer.recognize_google_cloud(audio, credentials_json=self.credentials)

(in the file “voice_recognition_service.py”, methods “callback(self, recognizer, audio)” and “listen_to_command(self)”) to the recognition service of your choice. Currently available services are:

recognizer.recognize_sphinx(audio)
recognizer.recognize_google(audio)
recognizer.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
recognizer.recognize_wit(audio, key=WIT_AI_KEY)
recognizer.recognize_bing(audio, key=BING_KEY)
recognizer.recognize_azure(audio, key=AZURE_SPEECH_KEY)
recognizer.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY)
recognizer.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD)
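Because all of the calls listed above follow the pattern recognize_<service>(audio, ...), one way to avoid editing two methods by hand is to select the service by name. This is only a sketch under assumptions: recognize_with is a hypothetical helper, and FakeRecognizer is a stand-in used here so the example runs without a microphone or the speech_recognition package.

```python
def recognize_with(recognizer, audio, service="sphinx", **options):
    """Hypothetical helper: call recognizer.recognize_<service>(audio, **options)."""
    method = getattr(recognizer, f"recognize_{service}", None)
    if method is None:
        raise ValueError(f"recognizer has no service named {service!r}")
    return method(audio, **options)


class FakeRecognizer:
    """Stand-in for a real recognizer object so the sketch runs offline."""

    def recognize_sphinx(self, audio):
        return "start"

    def recognize_wit(self, audio, key):
        return f"quit (key={key})"


print(recognize_with(FakeRecognizer(), b"", "sphinx"))
print(recognize_with(FakeRecognizer(), b"", "wit", key="WIT_AI_KEY"))
```

With a real recognizer, service-specific arguments such as key or credentials_json would be passed through the keyword options in the same way.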

Python development

This project uses tox. Start with a clean Python environment, then run:

pip install tox
tox

The commands that tox runs can be found in tox.ini.

Installing

You can pip install directly from the repository as follows:

pip install git+https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/scikit-surgeryspeech

Contributing

Please see the contributing guidelines.

Acknowledgements

Supported by Wellcome and EPSRC.
