Somnus

Somnus listens for a specific keyword in a continuous stream of audio data. It detects instances of the keyword with models written in TensorFlow 2.0, and its small-footprint models keep memory usage and latency low.

Getting started

Prerequisites

sudo apt-get install portaudio19-dev python-pyaudio python3-pyaudio

Installation

Use the package manager pip to install the Somnus package and the CLI:

pip install somnus

Usage

Somnus

Somnus can be used to listen for an instance of a selected keyword in a continuous stream of audio data from a single microphone channel. To find the device index of your microphone, run somnus list_microphones.

Somnus can handle all the audio interfacing for you: initialize Somnus and call listen(), and it will listen to your microphone until it detects the keyword. Somnus also offers a non-blocking method, detect_keyword(), for users who want to process the audio themselves; it only checks whether the audio time series passed to it as an argument contains the keyword.

Somnus has the following parameters:

keyword_file_path: The relative or absolute path to a weights file for the keyword model.
model (default: 'cnn-one-stride'): The name of the model you wish to use.
device_index (default: 0): The device index of the microphone that Somnus should listen to.
threshold (default: 0.5): A threshold for how confident Somnus has to be for it to detect the keyword.
audio_config: A dictionary containing the configuration specific to the audio time series. It contains the following:
- data_shape (default: (101, 40, 1)): The input shape for the keyword model
- sample_duration (default: 1): How long the input of the keyword model should be in seconds
- n_filters (default: 40): The number of filters in each frame
- win_length (default: 400): The length of each window in frames
- win_hop (default: 160): The number of frames between the starting frame of each consecutive window.
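The default data_shape can be sanity-checked against the other audio_config defaults. The sketch below is not taken from Somnus itself: it assumes a 16 kHz sampling rate and centred (padded) framing, neither of which is stated above, and it reads win_length and win_hop as sample counts. Under those assumptions, one second of audio yields 101 frames of 40 filters each, which matches the default shape.

```python
# Assumptions (not stated in the README): 16 kHz audio and centred/padded
# framing, so that one window starts at every hop position.
SAMPLE_RATE = 16_000

audio_config = {
    "data_shape": (101, 40, 1),
    "sample_duration": 1,
    "n_filters": 40,
    "win_length": 400,
    "win_hop": 160,
}


def expected_shape(config, sample_rate=SAMPLE_RATE):
    """Feature shape implied by the config under the assumptions above."""
    n_samples = int(config["sample_duration"] * sample_rate)
    n_frames = 1 + n_samples // config["win_hop"]
    return (n_frames, config["n_filters"], 1)


print(expected_shape(audio_config) == audio_config["data_shape"])
```

If you change sample_duration or win_hop, a check like this is a quick way to see whether data_shape needs to change with them.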

Example

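# Import path is an assumption; adjust it to match the installed package.
from somnus.somnus import Somnus

# Load the keyword weights and listen on the microphone with device index 1.
s = Somnus('./model_weights.hdf5', device_index=1)
activated = s.listen()  # blocks until the keyword is detected

if activated:
    do_stuff()  # placeholder for your own handler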
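If you process the audio yourself and use the non-blocking method, you need to keep a rolling window of the most recent samples to pass to the detector. The following is a minimal sketch of that bookkeeping, not Somnus code. The detector argument is a hypothetical stand-in callable that returns a confidence score; with Somnus you would call detect_keyword() at that point instead, after checking which input format and return value it expects.

```python
from collections import deque


class RollingWindow:
    """Keep the most recent `window_size` samples of an audio stream."""

    def __init__(self, window_size):
        self.samples = deque(maxlen=window_size)

    def push(self, chunk):
        """Append a chunk; return the full window, or None while still filling."""
        self.samples.extend(chunk)
        if len(self.samples) < self.samples.maxlen:
            return None
        return list(self.samples)


def first_detection(chunks, detector, window_size, threshold=0.5):
    """Return the index of the first chunk after which `detector` scores the
    current window at or above `threshold`, or None if that never happens."""
    window = RollingWindow(window_size)
    for i, chunk in enumerate(chunks):
        audio = window.push(chunk)
        if audio is not None and detector(audio) >= threshold:
            return i
    return None
```

The deque discards the oldest samples automatically, so each call to the detector sees a window of constant length regardless of the chunk size delivered by the audio source.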

CLI

Somnus comes with a CLI that allows you to generate audio data and train your own keyword detection model. The CLI is implemented using Python-Fire. For each command you can use -h or --help to get a description of the command and a list of the possible arguments for the command.

To start using the CLI, run somnus configure to create the configuration for the Somnus CLI. The raw audio data directory you configure must then contain the following sub-directories:

positives/ for audio files containing utterances of the keyword. Must contain at least 1 audio file.
negatives/ for audio files containing speech that does not contain utterances of the keyword. Must contain at least 1 audio file.
backgrounds/ for audio files that contain background noise. This directory is optional but we recommend adding noise to the training data so that the keyword detector also works in noisy conditions.

The CLI currently supports the following audio types: wav, mp3, flac, ogg, flv, wma, aac.
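Before running the CLI, it can be useful to check that the raw audio directory follows the layout described above. The helper below is a sketch and not part of Somnus; the function names are made up for illustration, and searching sub-directories recursively is an assumption about how the CLI finds files.

```python
from pathlib import Path

# Extensions listed as supported by the CLI.
SUPPORTED = {".wav", ".mp3", ".flac", ".ogg", ".flv", ".wma", ".aac"}


def audio_files(directory):
    """Files with a supported extension, searched recursively (an assumption)."""
    return [
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    ]


def check_raw_dir(raw_dir):
    """Return a list of problems found in the raw audio directory layout."""
    raw_dir = Path(raw_dir)
    problems = []
    for name in ("positives", "negatives"):
        sub = raw_dir / name
        if not sub.is_dir():
            problems.append(f"missing required directory: {name}/")
        elif not audio_files(sub):
            problems.append(f"{name}/ contains no supported audio files")
    if not (raw_dir / "backgrounds").is_dir():
        problems.append("backgrounds/ not found (optional, but recommended)")
    return problems
```

An empty list means the layout satisfies the requirements listed above.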

Configure

somnus configure

Create a configuration file with the absolute paths to the:

Raw audio data directory
Directory that should contain the augmented audio files
Directory that should contain the preprocessed data files

Note that the augmented audio files and preprocessed data files can take up a lot of disk space, so put them on a drive with plenty of free space.

Augmenting audio

somnus augment_audio

The command to generate an audio dataset takes the raw audio in your raw audio directory as input and generates positive, negative, and silent audio files with varying amounts of background noise. These audio files are written to the augmented audio directory.

The command has the following options:

--duration: The duration of the audio clips in seconds
--positive: The number of positive examples
--negative: The number of negative examples
--silent: The number of examples containing only background noise
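How Somnus mixes in background noise is not described here. As an illustration of the general idea, the sketch below uses a common technique, scaling a noise clip to reach a chosen signal-to-noise ratio (SNR) before adding it to the speech clip. The function names and the use of SNR are assumptions made for this example, not the Somnus implementation.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_noise(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise_rms = rms(noise[:len(speech)])
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR (dB) = 20 * log10(rms(speech) / rms(scaled noise))
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Example: a 440 Hz tone mixed with uniform noise at a random SNR.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16_000) for t in range(16_000)]
noise = [rng.uniform(-1, 1) for _ in range(16_000)]
noisy = mix_with_noise(speech, noise, snr_db=rng.uniform(0, 20))
```

Drawing the SNR at random for each generated clip is one way to obtain the "varying amounts of background noise" mentioned above.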

Preprocessing and creating the dataset

somnus preprocess

This command preprocesses the augmented audio files. It takes the files stored in the augmented audio directory, normalizes them, and stores the output array in the preprocessed data directory.

The command has the following options:

--filters: The number of filters in each frame
--show_progress: Boolean option to decide whether to show a progress bar
--split: The split between train, validation, and test data. The total should add up to 1. E.g. (0.9, 0.05, 0.05)
--win_length: The length of each window in seconds
--win_hop: The time between the start of each consecutive window.
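To make the --split option concrete, here is a small sketch of how three fractions can partition a dataset into train, validation, and test sets. It illustrates the idea only; the function name is hypothetical and the way Somnus rounds or shuffles may differ.

```python
def split_dataset(items, split=(0.9, 0.05, 0.05)):
    """Partition `items` into train, validation, and test lists by fraction.

    The caller is expected to shuffle `items` beforehand.
    """
    if len(split) != 3 or abs(sum(split) - 1.0) > 1e-9:
        raise ValueError("split must be three fractions that add up to 1")
    n = len(items)
    n_train = int(round(n * split[0]))
    n_val = int(round(n * split[1]))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))
```

Assigning the remainder to the test set guarantees that every example lands in exactly one partition, even when the fractions do not divide the dataset evenly.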

Training

somnus train

This command loads the data in ./preprocessed_data/ and uses it to train a small-footprint keyword model.

The command has the following options:

--model_name: The name of the model we want to train
--epochs: The number of epochs
--weights_file: The name of the file the final weights should be saved to
--save_best: Whether or not the model should save the best model throughout the training process
--batch_size: The size of each mini batch
--lr: The initial learning rate

Testing

somnus test

This command tests a trained model on a withheld test dataset.

The command has the following options:

--model_name: The name of the model we want to test
--weights_file: The path to the weights file

List microphones

somnus list_microphones

Prints out a list of microphones connected to your device along with their device IDs.
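If you want the same information from Python, PyAudio (installed as a prerequisite above) can enumerate audio devices. The sketch below is an illustration and not the code behind somnus list_microphones; the helper names are made up. It filters for devices that have at least one input channel.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio():
    """Collect the device info dictionaries reported by PyAudio."""
    import pyaudio  # imported lazily so input_devices() works without it

    pa = pyaudio.PyAudio()
    try:
        return [
            pa.get_device_info_by_index(i)
            for i in range(pa.get_device_count())
        ]
    finally:
        pa.terminate()


if __name__ == "__main__":
    for index, name in input_devices(query_pyaudio()):
        print(index, name)
```

The printed index is the kind of value you would pass as device_index, assuming Somnus uses the same PyAudio device numbering.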

Models

Currently Somnus offers the choice between the following models:

Name	Original paper	Description	Total parameters	Size
cnn-one-stride	Convolutional Neural Networks for Small-footprint Keyword Spotting	A frequency strided convolutional model with a stride of 4 and no pooling	381k	1.5MB
cnn-trad-pool	Convolutional Neural Networks for Small-footprint Keyword Spotting	A keyword detection model with two convolutional layers followed by max pooling	649k	2.5MB
crnn-time-stride	Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting	A convolutional recurrent network with time striding	88k	380KB
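The sizes in the table are roughly what you would expect if each parameter is stored as a 32-bit float, which is an assumption and is not stated above. The short calculation below shows the arithmetic; differences from the listed sizes could come from rounding of the parameter counts, the unit convention, or file-format overhead.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats

# Parameter counts as listed in the table above.
models = {
    "cnn-one-stride": 381_000,
    "cnn-trad-pool": 649_000,
    "crnn-time-stride": 88_000,
}


def weights_size_mb(n_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


for name, n_params in models.items():
    print(f"{name}: ~{weights_size_mb(n_params):.2f} MB")
```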

Recommended datasets

Before you start, we highly recommend downloading pre-made datasets for both the negative examples and the background noise. For negative examples we recommend the LibriSpeech dataset; you can pick any of the dev, test, or train sets. We recommend starting with train-clean-100.tar.gz and moving on to the larger sets if needed. For background noise we recommend the DEMAND dataset, which you can download from Kaggle.

Extract the data, then move the LibriSpeech dataset into the negatives/ sub-directory of the raw audio directory and the DEMAND dataset into the backgrounds/ sub-directory.

The positives/ sub-directory should then contain utterances of your keyword in various conditions, spoken by multiple different voices and in different dialects. Additionally, you can add custom negative examples to the negatives/ sub-directory. We recommend that a majority of these utterances be recorded with a microphone similar to the one you will use in the final product, because data gathered from different types of microphones can look completely different; e.g. a model trained on utterances recorded using a headset microphone will probably not work well with a far-field microphone array.

If your model is intended to be used with many different types of microphones then we recommend gathering positive and negative recordings using as many different microphones as you can.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
