
Somnus is keyword detection made easy.


Somnus


Somnus offers easy keyword detection for everyone. It listens for a specific keyword in a continuous stream of audio data and detects each instance of it. Detection relies on small-footprint keyword detection models developed by Google and Baidu, which keeps memory usage and latency to a minimum.

Getting started

Prerequisites

Linux

sudo apt-get install portaudio19-dev python-pyaudio python3-pyaudio

Windows 10

You need to install Microsoft C++ Build Tools before you can install Somnus.

Installation

Use the package manager pip to install the Somnus package and the CLI:

pip install somnus

Quickstart

Somnus makes it simple to go from raw audio recordings to a working keyword detection model. To get started, create a few recordings of yourself saying the keyword and download the datasets listed in the Recommended datasets section. Run somnus configure to specify a raw audio directory, then move the files into it.

Now that you have your raw audio files set up, you can use our default configurations to create a highly effective keyword detection model.

  1. Run somnus augment_audio to augment the audio files with background noise and create your audio dataset.
  2. Run somnus preprocess to normalize the data stored in the augmented audio files and create a dataset prepared for our keyword detection models.
  3. Run somnus train --epochs 10 to train a keyword detection model on the dataset you just created. The resulting model is saved to saved_model.h5 in your current working directory.
  4. Run somnus test to measure the accuracy of the model you just trained, using the test dataset generated by the preprocess command.
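
The four steps above can also be chained from a script. The following is a minimal sketch, not part of Somnus: it assumes the somnus CLI is on your PATH and that somnus configure has already been run. The names PIPELINE and run_pipeline are hypothetical helpers introduced for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# The four CLI steps from the list above, in the order they must run.
PIPELINE: List[List[str]] = [
    ["somnus", "augment_audio"],
    ["somnus", "preprocess"],
    ["somnus", "train", "--epochs", "10"],
    ["somnus", "test"],
]


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run each step in order and stop at the first failure.

    `runner` is injectable so the sequence can be dry-run without Somnus installed.
    """
    completed = []
    for step in steps:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so later steps never run after a failed one.
        runner(list(step), check=True)
        completed.append(" ".join(step))
    return completed


# Usage (runs the real commands): run_pipeline()
```

Stopping at the first failure matters here because each step consumes the output of the previous one.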

Now that you have a trained model, you can use the Somnus client to detect a keyword using your microphone. First, run somnus list_microphones to find the device index of your microphone. Then run the following test script with your microphone's device index to verify that keyword detection is working.

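from somnus.somnus import Somnus

# Load the trained model and listen on the microphone at device index 1.
# Replace 1 with the index reported by `somnus list_microphones`.
s = Somnus(model='./saved_model.h5', device_index=1)

# listen() keeps listening until the keyword is detected.
activated = s.listen()

if activated:
    print('You did it!')
else:
    print('Something went wrong!')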

Usage

Somnus

Somnus can be used to listen for an instance of a selected keyword in a continuous stream of audio data from a single channel from a microphone. To find the device index of your microphone run somnus list_microphones.

Somnus can handle all the audio interfacing for you: initialize Somnus and call listen(), and it will listen to your microphone until it detects the keyword. Somnus also offers a nonblocking method, detect_keyword(), for users who want to process the audio themselves and use Somnus only to detect a keyword in an audio time series passed to detect_keyword() as an argument.
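
If you process the audio yourself, one common pattern is to keep a rolling window of the most recent samples and pass each full window to the detector. The sketch below illustrates that pattern only. The function first_detection, the 16 kHz sample rate, and the stand-in detector are assumptions, and the real argument type and return value of detect_keyword() should be checked against the Somnus source.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Sequence

SAMPLE_RATE = 16000   # assumption: samples per second, not stated in this README
WINDOW_SECONDS = 1    # matches the documented sample_duration default


def first_detection(
    chunks: Iterable[Sequence[float]],
    detect: Callable[[List[float]], bool],
    window_size: int = SAMPLE_RATE * WINDOW_SECONDS,
) -> Optional[int]:
    """Return the index of the first chunk at which `detect` fires, else None.

    `detect` stands in for a call such as `somnus.detect_keyword(window)`.
    """
    window: deque = deque(maxlen=window_size)  # keeps only the newest samples
    for index, chunk in enumerate(chunks):
        window.extend(chunk)
        if len(window) < window_size:
            continue  # not enough audio buffered for a full window yet
        if detect(list(window)):
            return index
    return None


# Stand-in detector for demonstration only: fires when the window is loud.
def loud(window: List[float]) -> bool:
    return sum(abs(x) for x in window) / len(window) > 0.5


silence, noise = [0.0] * 4, [1.0] * 4
print(first_detection([silence, silence, noise, noise], loud, window_size=8))  # 3
```

The detector fires on the fourth chunk (index 3) because only then is the whole eight-sample window filled with loud samples.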

Parameters

  • model (default: ''): The relative or absolute path to a Keras model file for the keyword model.
  • device_index (default: 0): The device index of the microphone that Somnus should listen to.
  • threshold (default: 0.5): A threshold for how confident Somnus has to be for it to detect the keyword.
  • audio_config: A dictionary containing the configuration specific to the audio time series. It contains the following:
    • data_shape (default: (101, 40, 1)): The input shape for the keyword model
    • sample_duration (default: 1): How long the input of the keyword model should be in seconds
    • n_filters (default: 40): The number of filters in each frame
    • win_length (default: 400): The length of each window in frames
    • win_hop (default: 160): The number of frames between the starting frame of each consecutive window.
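
The audio_config defaults are related to each other, which is worth checking before changing any one of them. The sketch below shows the arithmetic under two assumptions that this README does not state: a 16 kHz sample rate, and centered (padded) framing with win_hop counted in audio samples. The helper expected_shape is hypothetical.

```python
# Documented defaults from the parameter list above.
audio_config = {
    "data_shape": (101, 40, 1),
    "sample_duration": 1,
    "n_filters": 40,
    "win_length": 400,
    "win_hop": 160,
}

SAMPLE_RATE = 16000  # assumption: not stated in this README


def expected_shape(config: dict, sample_rate: int = SAMPLE_RATE) -> tuple:
    """Feature shape implied by the config, assuming centered (padded) framing."""
    n_samples = int(config["sample_duration"] * sample_rate)
    # With centered framing, a window is centered every win_hop samples,
    # from the start of the clip through its end.
    n_windows = 1 + n_samples // config["win_hop"]
    # The trailing 1 is a single channel dimension.
    return (n_windows, config["n_filters"], 1)


print(expected_shape(audio_config))                                # (101, 40, 1)
print(expected_shape(audio_config) == audio_config["data_shape"])  # True
```

Under these assumptions, changing sample_duration or win_hop would also require a matching data_shape, and a model trained on that shape.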

CLI

See the Somnus CLI documentation.

Models

Currently Somnus offers the choice between the following models:

| Name | Original paper | Description | Total parameters | Size |
|---|---|---|---|---|
| cnn-one-stride | Convolutional Neural Networks for Small-footprint Keyword Spotting | A frequency strided convolutional model with a stride of 4 and no pooling | 381k | 1.5MB |
| cnn-trad-pool | Convolutional Neural Networks for Small-footprint Keyword Spotting | A keyword detection model with two convolutional layers followed by max pooling | 649k | 2.5MB |
| crnn-time-stride | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting | A convolutional recurrent network with time striding | 88k | 380KB |

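
The listed sizes are roughly what the parameter counts predict if each weight is stored as a 32-bit float. That storage format is an assumption, not something this README states, and the helper estimated_megabytes below is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats

# Parameter counts copied from the table above.
PARAMETERS = {
    "cnn-one-stride": 381_000,
    "cnn-trad-pool": 649_000,
    "crnn-time-stride": 88_000,
}


def estimated_megabytes(n_parameters: int) -> float:
    """Rough weight size in MB (10**6 bytes), ignoring file-format overhead."""
    return n_parameters * BYTES_PER_FLOAT32 / 1_000_000


for name, count in PARAMETERS.items():
    print(f"{name}: ~{estimated_megabytes(count):.2f} MB")
```

The estimates land close to the sizes in the table. Small differences are expected, since a saved model file also holds metadata beyond the raw weights.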
Recommended datasets

Before you start, we highly recommend downloading pre-made datasets for both the negative examples and the background noise. For negative examples we recommend the Librispeech dataset. You can pick any of the dev, test, or train datasets. To start with, we recommend the train-clean-100.tar.gz dataset, moving on to the larger datasets if needed. For background noise we recommend the DEMAND dataset, which you can download from Kaggle.

Extract the data, then move the Librispeech dataset into the negatives/ sub-directory of the raw audio directory and the DEMAND dataset into the backgrounds/ sub-directory.

The positives/ sub-directory should contain utterances of your keyword in various conditions, using multiple different voices and dialects. Additionally, you can add custom negative examples to the negatives/ sub-directory. We recommend that a majority of these utterances be recorded with a microphone similar to the one you will use in the final product. Data gathered from different types of microphones can look completely different, e.g. a model trained on utterances recorded with a headset microphone will probably not work well with a far-field microphone array.

If your model is intended to be used with many different types of microphones, we recommend gathering positive and negative recordings using as many different microphones as you can.
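
To check the layout described in this section before running the pipeline, a short script can create the three sub-directories and count the files in each. This is a sketch that is independent of Somnus: the helper prepare_raw_audio_dir is hypothetical, and the root path is whatever you chose with somnus configure.

```python
from pathlib import Path
from typing import Dict

# Sub-directory names used in this section.
SUBDIRECTORIES = ("positives", "negatives", "backgrounds")


def prepare_raw_audio_dir(root: Path) -> Dict[str, int]:
    """Create the three sub-directories and count the files in each."""
    counts = {}
    for name in SUBDIRECTORIES:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Count files at any depth, since extracted datasets are often nested.
        counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
    return counts


# Usage, with a hypothetical path: prepare_raw_audio_dir(Path("raw_audio"))
```

A count of zero for any sub-directory is a quick sign that a dataset was extracted to the wrong place.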

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

