
Somnus allows you to listen for and detect a specific keyword in a continuous stream of audio data.


Somnus


Somnus allows you to listen for and detect a specific keyword in a continuous stream of audio data. It uses keyword detection models written in TensorFlow 2.0 to detect instances of the keyword, and its small-footprint models keep memory usage low and latency to a minimum.

Getting started

Use the package manager pip to install the Somnus package and its CLI:

pip install somnus

Usage

Somnus

Somnus can listen for a selected keyword in a continuous stream of single-channel audio from a microphone. To find the device index of your microphone, run somnus list_microphones.

Somnus can handle all of the audio interfacing for you: initialize Somnus and call listen(), and it will listen to your microphone until it detects the keyword. Somnus also offers a non-blocking method, detect_keyword(), which lets you process the audio yourself and use Somnus only to detect the keyword in an audio time series passed to detect_keyword() as an argument.
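When you use detect_keyword() and handle the audio yourself, you need to keep a window of the most recent audio to score. The sketch below shows one way to do that with a fixed-size buffer. Here scorer is a hypothetical stand-in for the actual detection call, whose exact signature and return value are not assumed; the one-second window at 16 kHz and the 0.9 threshold (mirroring the threshold default) are illustrative assumptions.

```python
from collections import deque
from typing import Callable, Iterable, Sequence

# Hypothetical scorer: takes one window of samples and returns a confidence
# in [0, 1]. A call such as detect_keyword() would go here in practice.
Scorer = Callable[[Sequence[float]], float]


def detect_in_stream(
    chunks: Iterable[Sequence[float]],
    scorer: Scorer,
    window_size: int = 16000,  # assumed: 1 s of audio at 16 kHz
    threshold: float = 0.9,
) -> int:
    """Return the index of the chunk that triggered a detection, or -1."""
    window: deque = deque(maxlen=window_size)  # keeps only the newest samples
    for i, chunk in enumerate(chunks):
        window.extend(chunk)
        # Only score once a full window of audio has been collected.
        if len(window) == window_size and scorer(list(window)) >= threshold:
            return i
    return -1
```

Because the buffer has a fixed maximum length, memory use stays constant no matter how long the stream runs.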

Somnus has the following parameters:

  • keyword_file_path: The relative or absolute path to a weights file for the keyword model.
  • model (default: 'cnn-one-stride'): The name of the model you wish to use.
  • device_index (default: 0): The device index of the microphone that Somnus should listen to.
  • threshold (default: 0.9): How confident Somnus must be before it reports the keyword.
  • audio_config: A dictionary containing the configuration of the audio time series. It contains the following:
    • data_shape (default: (101, 40, 1)): The input shape of the keyword model.
    • sample_duration (default: 1): The length of the keyword model's input, in seconds.
    • n_filters (default: 40): The number of filters in each frame.
    • win_length (default: 400): The length of each window, in samples.
    • win_hop (default: 160): The number of samples between the starts of consecutive windows.
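To see how the audio_config defaults fit together, the sketch below computes the number of analysis windows in a clip. It assumes a 16 kHz sample rate and centered framing (the signal padded by half a window on each side, as librosa does by default); neither is stated above, but together they reproduce the 101 rows of the default data_shape, which we read as (frames, n_filters, channels).

```python
def num_frames(sample_rate: int, duration_s: float, win_length: int,
               win_hop: int, center: bool = True) -> int:
    """Number of analysis windows in a clip of duration_s seconds.

    With center=True the signal is assumed to be padded by win_length // 2
    on each side, so every hop starts a new frame.
    """
    n_samples = int(sample_rate * duration_s)
    if center:
        return 1 + n_samples // win_hop
    # Without padding, the last window must fit entirely inside the clip.
    return 1 + (n_samples - win_length) // win_hop
```

At 16 kHz, a win_length of 400 samples is 25 ms and a win_hop of 160 samples is 10 ms, which may help when choosing the values given in seconds to somnus preprocess.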

Example

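from somnus.somnus import Somnus  # import path assumed; check your installed version

def do_stuff():  # placeholder for your own handler
    print('Keyword detected')

s = Somnus('./model_weights.hdf5', device_index=1)
activated = s.listen()  # blocks until the keyword is detected

if activated:
    do_stuff()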

CLI

Somnus comes with a CLI that allows you to generate audio data and train your own keyword detection model. The CLI is implemented using Python-Fire. For each command you can use -h or --help to get a description of the command and a list of the possible arguments for the command.

To start using the CLI, run somnus configure to create the CLI configuration. The raw data directory must then contain the following sub-directories:

  • positives/ for audio files containing utterances of the keyword. Must contain at least 1 audio file.
  • negatives/ for audio files containing speech that does not contain utterances of the keyword. Must contain at least 1 audio file.
  • backgrounds/ for audio files that contain background noise. This directory is optional, but we recommend adding noise to the training data so that the keyword detector also works in noisy conditions.

The CLI currently supports the following audio formats: wav, mp3, flac, ogg, flv, wma, and aac.
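Before running the CLI, a small check like the following can catch layout mistakes early. check_raw_data_dir() and audio_files() are hypothetical helpers, not part of Somnus, and the recursive search (handy for nested datasets such as Librispeech) is an assumption about how files are found rather than something stated above.

```python
from pathlib import Path

# Extensions taken from the list of audio formats the CLI supports.
SUPPORTED = {".wav", ".mp3", ".flac", ".ogg", ".flv", ".wma", ".aac"}


def audio_files(directory: Path) -> list:
    """All supported audio files below `directory`, searched recursively."""
    return [p for p in directory.rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED]


def check_raw_data_dir(raw_dir: str) -> list:
    """Return a list of problems with the raw data layout (empty if it looks OK)."""
    root = Path(raw_dir)
    problems = []
    for required in ("positives", "negatives"):
        sub = root / required
        if not sub.is_dir():
            problems.append(f"missing {required}/")
        elif not audio_files(sub):
            problems.append(f"{required}/ contains no supported audio files")
    # backgrounds/ is optional, so its absence is only a warning.
    if not (root / "backgrounds").is_dir():
        problems.append("warning: no backgrounds/ (optional, but recommended)")
    return problems
```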

Configure

somnus configure

Creates a configuration file containing the absolute paths to:

  • Raw audio data directory
  • Directory that should contain the augmented audio files
  • Directory that should contain the preprocessed data files

Note that the augmented audio files and preprocessed data files can take up a lot of disk space, so put them on a drive with plenty of free space.

Augmenting audio

somnus augment_audio

Generates an audio dataset. The command takes the raw audio in your raw audio directory as input and generates positive, negative, and silent audio files with varying amounts of background noise. These audio files are written to the augmented audio directory.

The command has the following options:

  • duration: The duration of the audio clips in seconds
  • positive: The number of positive examples
  • negative: The number of negative examples
  • silent: The number of examples containing only background noise
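Mixing background noise into a clip at a chosen signal-to-noise ratio (SNR) is a common way to produce varying amounts of background noise. The sketch below shows that general technique in plain Python; it is not necessarily how somnus augment_audio does it, and mix_at_snr() is a hypothetical name.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the result has the requested SNR in dB.

    Both inputs are equal-length lists of float samples; noise must not be silent.
    """
    # SNR_dB = 20 * log10(rms(signal) / rms(scaled_noise))
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(signal, noise)]
```

Drawing snr_db at random for each clip, for example with random.uniform(0, 20), yields varying noise levels; the 0 to 20 dB range is only an illustrative choice.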

Preprocessing and creating the dataset

somnus preprocess

Preprocesses the augmented audio files. The command takes the files stored in the augmented audio directory, normalizes them, and stores the output arrays in the preprocessed data directory.

The command has the following options:

  • filters: The number of filters in each frame
  • show_progress: Whether to show a progress bar (NOTE: showing the progress bar may slow down processing)
  • split: The split between train, validation, and test data. The fractions must add up to 1, e.g. (0.9, 0.05, 0.05)
  • win_length: The length of each window in seconds
  • win_hop: The time between the starts of consecutive windows
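The split option can be read as: shuffle the examples, then cut them into train, validation, and test sets in the given proportions. This sketch of that idea uses a hypothetical split_dataset() function and is not the CLI's code; giving the remainder to the test set is our own choice, so that rounding never drops examples.

```python
import math
import random


def split_dataset(items, split=(0.9, 0.05, 0.05), seed=0):
    """Shuffle `items` and split them into (train, validation, test) lists."""
    if len(split) != 3 or not math.isclose(sum(split), 1.0):
        raise ValueError("split must have three fractions that add up to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_train = round(split[0] * len(shuffled))
    n_val = round(split[1] * len(shuffled))
    # The test set takes whatever is left after train and validation.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```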

Training

somnus train

Trains a small-footprint keyword model. The command loads the data in ./preprocessed_data/ and uses it to train the keyword model.

The command has the following options:

  • model_name: The name of the model to train
  • epochs: The number of epochs
  • weights_file: The name of the file the final weights should be saved to
  • save_best: Whether to save the best model found during training
  • batch_size: The size of each mini-batch
  • lr: The initial learning rate

Testing

somnus test

Tests a trained model on a withheld test dataset.

The command has the following options:

  • model_name: The name of the model to test
  • weights_file: The path to the weights file
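The README does not say which metrics somnus test reports. For keyword spotting, accuracy together with the false rejection rate (missed keywords) and the false acceptance rate (false alarms) at the detection threshold are common choices; the hypothetical keyword_metrics() below computes them from per-clip scores and labels.

```python
def keyword_metrics(scores, labels, threshold=0.9):
    """Accuracy, false rejection rate and false acceptance rate at a threshold.

    scores: model confidences that the keyword is present.
    labels: True for clips that contain the keyword.
    """
    predicted = [s >= threshold for s in scores]
    positives = sum(labels)
    negatives = len(labels) - positives
    misses = sum(1 for p, y in zip(predicted, labels) if y and not p)
    false_alarms = sum(1 for p, y in zip(predicted, labels) if p and not y)
    correct = len(labels) - misses - false_alarms
    return {
        "accuracy": correct / len(labels),
        "false_rejection_rate": misses / positives if positives else 0.0,
        "false_acceptance_rate": false_alarms / negatives if negatives else 0.0,
    }
```

Raising the threshold typically trades more false rejections for fewer false acceptances, so it is worth computing these at a few thresholds.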

List microphones

somnus list_microphones

Prints out a list of microphones connected to your device along with their device IDs.
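If you want the same information from your own code, the sketch below lists input-capable devices with PyAudio. We do not know how somnus list_microphones is implemented, and that PyAudio's device indices match the device_index Somnus expects is an assumption. The filtering helper input_devices() is hypothetical and works on plain dictionaries, so it can be tried without a microphone.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs.

    device_infos: dicts shaped like PyAudio's get_device_info_by_index() result.
    """
    return [(info["index"], info["name"])
            for info in device_infos
            if info.get("maxInputChannels", 0) > 0]


def list_microphones():
    import pyaudio  # imported lazily so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i)
                 for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # always release PortAudio resources
    for index, name in input_devices(infos):
        print(index, name)
```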

Models

Currently Somnus offers the choice between the following models:

| Name | Original paper | Description | Total parameters | Size |
|---|---|---|---|---|
| cnn-one-stride | Convolutional Neural Networks for Small-footprint Keyword Spotting | A frequency-strided convolutional model with a stride of 4 and no pooling | 381k | 1.5 MB |
| cnn-trad-pool | Convolutional Neural Networks for Small-footprint Keyword Spotting | A keyword detection model with two convolutional layers followed by max pooling | 649k | 2.5 MB |
| crnn-time-stride | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting | A convolutional recurrent network with time striding | 88k | 380 KB |
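The Size column is close to what you would expect from storing every parameter as a 32-bit float (4 bytes). The arithmetic below assumes float32 weights. The remaining difference, most visible for crnn-time-stride (about 344 KiB of raw weights versus 380 KB on disk), is presumably file-format overhead, and the parameter counts in the table are themselves rounded.

```python
def float32_size_mib(n_params: int) -> float:
    """Approximate size of n_params float32 weights, in MiB."""
    return n_params * 4 / 2**20  # 4 bytes per float32 parameter


# Parameter counts copied from the table above (rounded to the nearest thousand).
MODELS = {"cnn-one-stride": 381_000, "cnn-trad-pool": 649_000,
          "crnn-time-stride": 88_000}

for name, params in MODELS.items():
    print(f"{name}: ~{float32_size_mib(params):.2f} MiB of raw weights")
```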

Recommended datasets

Before you start, we highly recommend downloading pre-made datasets for both the negative examples and the background noise. For negative examples we recommend the Librispeech dataset; any of the dev, test, or train subsets will work. To start with, we recommend train-clean-100.tar.gz, moving on to the larger subsets if needed. For background noise we recommend the DEMAND dataset, which you can download from Kaggle.

Extract the archives, then place the Librispeech data in the negatives/ sub-directory and the DEMAND data in the backgrounds/ sub-directory of your raw audio directory.

The positives/ sub-directory should contain utterances of your keyword recorded in various conditions by multiple different voices and dialects. You can also add custom negative examples to the negatives/ sub-directory. We recommend recording a majority of these utterances with a microphone similar to the one you will use in the final product, because data gathered from different types of microphones can look completely different. For example, a model trained on utterances recorded with a headset microphone will probably not work well with a far-field microphone array.

If your model is intended to be used with many different types of microphones, we recommend gathering positive and negative recordings with as many different microphones as you can.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Download files

Source Distribution

somnus-0.1.0.tar.gz (15.3 kB)

Uploaded Source

Built Distribution

somnus-0.1.0-py3-none-any.whl (14.8 kB)

Uploaded Python 3

File details

Details for the file somnus-0.1.0.tar.gz.

File metadata

  • Download URL: somnus-0.1.0.tar.gz
  • Upload date:
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for somnus-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9c9a0dabbef383872a0ee74dc4c5f32971055dfa57a9244135756e26e5b11a7d
MD5 4687973be65974a4be0d22b56f94594f
BLAKE2b-256 cc270de435a4cfed52802ff04b99f9b9180e26de898a366357fb32e0151c20f0

File details

Details for the file somnus-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: somnus-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for somnus-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a21a02fb5a8c486d3767e45ddfe65274a0450bf04ca02048beabdd7079d67543
MD5 072afe4384bdadcc307324002be1d735
BLAKE2b-256 ce71db569327f49aa161e7e611187c0e0a61cf713aa744875485892eb57f88d6
