Somnus is keyword detection made easy.
Somnus
Somnus offers easy keyword detection for everyone. It lets you listen for and detect a specific keyword in a continuous stream of audio data. It detects instances of the keyword using keyword detection models developed by Google and Baidu, and because these models have a small footprint, Somnus keeps memory usage and latency to a minimum.
Getting started
Prerequisites
Linux
sudo apt-get install portaudio19-dev python-pyaudio python3-pyaudio
Windows 10
You need to install Microsoft C++ Build Tools before you can install Somnus.
Installation
Use the package manager pip to install the Somnus package and the CLI:
pip install somnus
Quickstart
Somnus makes it simple to go from raw audio recordings to a working keyword detection model. To get started, create a few recordings of yourself saying the keyword and download the datasets listed in the Recommended datasets section. Move the files to the raw audio directory you specify by running somnus configure.
Now that you have your raw audio files set up, you can use our default configurations to create a highly effective keyword detection model.
- Run somnus augment_audio to augment the audio files with background noise and create your audio dataset.
- Run somnus preprocess to normalize the data stored in the augmented audio files and create a dataset that's been prepared for our keyword detection models.
- Run somnus train --epochs 10 to train a keyword detection model using the dataset you just created. The resulting model will be saved to saved_model.h5 in your current working directory.
- Run somnus test to test the accuracy of the model you just trained using a test dataset that was generated by the preprocess command.
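The four steps above can also be chained from a script. The following is a minimal sketch that only assumes the somnus CLI is on your PATH; build_pipeline and run_pipeline are hypothetical helper names, not part of Somnus. By default it only prints the commands so it can be inspected without running anything.

```python
import subprocess


def build_pipeline(epochs=10):
    """Return the four Quickstart commands, in order, as argument lists."""
    return [
        ["somnus", "augment_audio"],
        ["somnus", "preprocess"],
        ["somnus", "train", "--epochs", str(epochs)],
        ["somnus", "test"],
    ]


def run_pipeline(commands, dry_run=True):
    """Run each command in turn, stopping at the first failure.

    With dry_run=True the commands are only printed.
    """
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on the output of a failed step.
            subprocess.run(argv, check=True)


# Dry run: prints the commands without executing them.
run_pipeline(build_pipeline(epochs=10))
```

Call run_pipeline(build_pipeline(), dry_run=False) to actually execute the steps once your raw audio directory is set up.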
Now that you have a trained model, you can use the Somnus client to detect a keyword using your microphone. First run somnus list_microphones to find the device index of your microphone. Then run the following test script using your microphone's device index and verify that the keyword detection is working.
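from somnus.somnus import Somnus

# Replace device_index with the index reported by somnus list_microphones
s = Somnus(model='./saved_model.h5', device_index=1)

# listen() blocks until the keyword is detected
activated = s.listen()
if activated:
    print('You did it!')
else:
    print('Something went wrong!')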
Usage
Somnus
Somnus can be used to listen for an instance of a selected keyword in a continuous stream of audio data from a single microphone channel. To find the device index of your microphone, run somnus list_microphones.
Somnus can handle all the audio interfacing for you, so you only need to initialize Somnus and call listen(); it will then listen to your microphone until it detects the keyword. Somnus also offers a nonblocking method, detect_keyword(), that lets you process the audio yourself and use Somnus only to detect a keyword in an audio time series passed to detect_keyword() as an argument.
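If you handle the audio yourself, one way to use the nonblocking method is to slide a fixed-size window over your own buffer and hand each window to the detector. The sketch below shows only that windowing logic. It assumes the detector is a callable that takes one window of samples and returns a truthy value on detection; the exact input format and return value of detect_keyword() are not documented here, so a stand-in detector is used, and sliding_windows and first_detection are hypothetical helpers.

```python
def sliding_windows(samples, window_size, hop_size):
    """Yield consecutive, possibly overlapping windows of the sample buffer."""
    for start in range(0, len(samples) - window_size + 1, hop_size):
        yield samples[start:start + window_size]


def first_detection(samples, detect, window_size, hop_size):
    """Return the start index of the first window that triggers `detect`.

    `detect` is any callable taking one window; returns None if no
    window triggers it.
    """
    windows = sliding_windows(samples, window_size, hop_size)
    for i, window in enumerate(windows):
        if detect(window):
            return i * hop_size
    return None


# Stand-in detector for illustration only: fires on a loud sample.
def loud_sample_detector(window):
    return max(abs(x) for x in window) > 0.5


audio = [0.0] * 10
audio[6] = 1.0
print(first_detection(audio, loud_sample_detector, window_size=4, hop_size=2))
```

With a real model you would pass something like s.detect_keyword in place of the stand-in, after checking which sample rate and array type it expects.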
Parameters
- model (default: ''): The relative or absolute path to a Keras model file for the keyword model.
- device_index (default: 0): The device index of the microphone that Somnus should listen to.
- threshold (default: 0.5): A threshold for how confident Somnus has to be for it to detect the keyword.
- audio_config: A dictionary containing the configuration specific to the audio time series. It contains the following:
  - data_shape (default: (101, 40, 1)): The input shape for the keyword model.
  - sample_duration (default: 1): How long the input of the keyword model should be in seconds.
  - n_filters (default: 40): The number of filters in each frame.
  - win_length (default: 400): The length of each window in frames.
  - win_hop (default: 160): The number of frames between the starting frame of each consecutive window.
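As a rough illustration of how these values fit together, the sketch below builds an audio_config dictionary from the defaults listed above and checks that data_shape agrees with the other settings. It assumes a 16 kHz sample rate and that win_hop is counted in audio samples with centre-padded framing; neither assumption is stated by Somnus. expected_frames and config_is_consistent are hypothetical helpers, not part of the package.

```python
# Default values copied from the parameter list above.
audio_config = {
    "data_shape": (101, 40, 1),
    "sample_duration": 1,
    "n_filters": 40,
    "win_length": 400,
    "win_hop": 160,
}

SAMPLE_RATE = 16000  # ASSUMPTION: not documented by Somnus


def expected_frames(config, sample_rate=SAMPLE_RATE):
    """Frames per input clip, assuming a new window starts every win_hop
    samples and the signal is padded so the last partial window counts."""
    n_samples = int(config["sample_duration"] * sample_rate)
    return 1 + n_samples // config["win_hop"]


def config_is_consistent(config, sample_rate=SAMPLE_RATE):
    """Check data_shape against the frame count and the filter count."""
    n_frames, n_filters, _channels = config["data_shape"]
    return (
        n_frames == expected_frames(config, sample_rate)
        and n_filters == config["n_filters"]
    )


print(expected_frames(audio_config))
print(config_is_consistent(audio_config))
```

Under these assumptions, one second of audio yields 1 + 16000 // 160 = 101 frames of 40 filters each, which matches the default data_shape. If you change sample_duration or win_hop, data_shape likely needs to change with them.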
CLI
Models
Currently Somnus offers the choice between the following models:
Name | Original paper | Description | Total parameters | Size |
---|---|---|---|---|
cnn-one-stride | Convolutional Neural Networks for Small-footprint Keyword Spotting | A frequency strided convolutional model with a stride of 4 and no pooling | 381k | 1.5MB |
cnn-trad-pool | Convolutional Neural Networks for Small-footprint Keyword Spotting | A keyword detection model with two convolutional layers followed by max pooling | 649k | 2.5MB |
crnn-time-stride | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting | A convolutional recurrent network with time striding | 88k | 380KB |
Recommended datasets
Before you start, we highly recommend downloading pre-made datasets for both the negative examples and the background noise. For negative examples we recommend the Librispeech dataset. You can pick any of the dev, test, or train datasets. To start with, we recommend using the train-clean-100.tar.gz dataset and moving on to the larger datasets if needed. For background noise we recommend the DEMAND dataset, which can be downloaded from Kaggle.
Extract the data, then move the Librispeech dataset into the negatives/ sub-directory of the raw audio directory and the DEMAND dataset into the backgrounds/ sub-directory.
The positives/ sub-directory should contain utterances of your keyword recorded in various conditions by multiple different voices and dialects. Additionally, you can add custom negative examples to the negatives/ sub-directory. We recommend that a majority of these utterances be recorded with a microphone similar to the one you will be using in the final product. This is because data gathered from different types of microphones can look completely different, e.g. a model trained on utterances recorded using a headset microphone will probably not work well with a far-field microphone array.
If your model is intended to be used with many different types of microphones, we recommend gathering positive and negative recordings using as many different microphones as you can.
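The directory layout described in this section can be prepared and sanity-checked with a short script. This is a minimal sketch: the three sub-directory names come from the text above, while the raw audio directory location (a temporary directory here), the example file name, and the helpers create_layout and count_files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Sub-directories of the raw audio directory named in this section.
SUBDIRS = ("positives", "negatives", "backgrounds")


def create_layout(raw_audio_dir):
    """Create the three sub-directories if they do not already exist."""
    root = Path(raw_audio_dir)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def count_files(raw_audio_dir):
    """Count files (recursively) in each sub-directory."""
    root = Path(raw_audio_dir)
    return {
        name: sum(1 for p in (root / name).rglob("*") if p.is_file())
        for name in SUBDIRS
    }


# Demo in a throwaway directory; point this at your own raw audio directory.
with tempfile.TemporaryDirectory() as tmp:
    root = create_layout(Path(tmp) / "raw_audio")
    (root / "positives" / "keyword_001.wav").touch()  # hypothetical file name
    print(count_files(root))
```

A count of zero for any sub-directory is a quick sign that a dataset was extracted to the wrong place.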
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License