Somnus
Somnus allows you to listen for and detect a specific keyword in a continuous stream of audio data. It detects instances of the keyword using keyword detection models written in TensorFlow 2.0. Because these are small-footprint models, Somnus keeps memory usage low and latency to a minimum.
Getting started
Use the package manager pip to install the Somnus package and the CLI:
pip install somnus
Usage
Somnus
Somnus can be used to listen for an instance of a selected keyword in a continuous stream of single-channel audio data from a microphone. To find the device index of your microphone, run somnus list_microphones.
Somnus can handle all the audio interfacing for you, so you only need to initialize Somnus and call listen(); it then listens to your microphone until it detects the keyword. Somnus also offers a nonblocking method, detect_keyword(), which lets you process the audio yourself and use Somnus only to detect the keyword in an audio time series passed to detect_keyword() as an argument.
Somnus has the following parameters:
- keyword_file_path: The relative or absolute path to a weights file for the keyword model.
- model (default: 'cnn-one-stride'): The name of the model you wish to use.
- device_index (default: 0): The device index of the microphone that Somnus should listen to.
- threshold (default: 0.9): A threshold for how confident Somnus has to be for it to detect the keyword.
- audio_config: A dictionary containing the configuration specific to the audio time series. It contains the following:
  - data_shape (default: (101, 40, 1)): The input shape for the keyword model
  - sample_duration (default: 1): How long the input of the keyword model should be in seconds
  - n_filters (default: 40): The number of filters in each frame
  - win_length (default: 400): The length of each window in frames
  - win_hop (default: 160): The number of frames between the starting frame of each consecutive window
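The default data_shape of (101, 40, 1) can be related to the other defaults with a little arithmetic. The sketch below assumes a 16 kHz sample rate and centre-padded framing; neither is stated in this document, but together they reproduce the 101 frames. n_frames is a hypothetical helper, not part of Somnus.

```python
def n_frames(n_samples, win_length, win_hop, centered=True):
    """Number of analysis windows that fit in a signal of n_samples samples."""
    if centered:
        # The signal is padded so that window i is centred on sample i * win_hop.
        return 1 + n_samples // win_hop
    # Only windows that lie fully inside the signal are counted.
    return 1 + (n_samples - win_length) // win_hop


SAMPLE_RATE = 16_000         # assumption: not stated in this document
n_samples = SAMPLE_RATE * 1  # sample_duration = 1 second

frames_centered = n_frames(n_samples, win_length=400, win_hop=160)
frames_strict = n_frames(n_samples, win_length=400, win_hop=160, centered=False)

# With n_filters = 40 this matches the documented default input shape.
shape = (frames_centered, 40, 1)
print(shape, frames_strict)
```

If you change sample_duration, win_length, or win_hop, data_shape most likely has to change with them so that the model input still matches the extracted features.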
Example
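# Assumes the Somnus class has been imported and do_stuff() is your own callback.
s = Somnus('./model_weights.hdf5', device_index=1)
# listen() returns once the keyword has been detected in the microphone stream.
activated = s.listen()
if activated:
    do_stuff()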
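If you handle the audio yourself, one possible pattern is to keep a rolling one-second window of samples and pass it to a detector each time a new chunk arrives. The sketch below illustrates that pattern only: the 16 kHz sample rate, the first_detection helper, and the loud_detector stand-in are all assumptions made for the example, and no Somnus call is used so that it runs on its own.

```python
from collections import deque

SAMPLE_RATE = 16_000      # assumption: not stated in this document
WINDOW = SAMPLE_RATE * 1  # one second, matching sample_duration = 1


def first_detection(chunks, detect):
    """Return the index of the first chunk after which detect() fires, else None."""
    window = deque(maxlen=WINDOW)  # keeps only the most recent second of audio
    for i, chunk in enumerate(chunks):
        window.extend(chunk)
        if len(window) < WINDOW:
            continue  # not enough audio buffered yet
        if detect(list(window)):
            return i
    return None


def loud_detector(samples):
    # Stand-in for a keyword model: "detects" when the window is mostly loud.
    return sum(abs(x) for x in samples) / len(samples) > 0.5


quiet = [[0.0] * 4000 for _ in range(4)]
loud = [[1.0] * 4000 for _ in range(4)]
hit = first_detection(quiet + loud, loud_detector)
print(hit)
```

With Somnus, detect would wrap a call to detect_keyword(). The expected array type and the return value are not documented here, so check them before swapping it in.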
CLI
Somnus comes with a CLI that allows you to generate audio data and train your own keyword detection model. The CLI is implemented using Python-Fire. For each command you can use -h or --help to get a description of the command and a list of its possible arguments.
To start using the CLI, run somnus configure to create the configuration for the Somnus CLI. The raw data directory must then contain three sub-directories:
- positives/ for audio files containing utterances of the keyword. Must contain at least 1 audio file.
- negatives/ for audio files containing speech that does not contain utterances of the keyword. Must contain at least 1 audio file.
- backgrounds/ for audio files that contain background noise. This directory is optional, but we recommend adding noise to the training data so that the keyword detector also works in noisy conditions.
The CLI currently supports the following audio types: wav, mp3, flac, ogg, flv, wma, aac
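Before running the CLI it can help to check that the raw data directory follows the layout above. The following is a minimal sketch and not part of Somnus: check_raw_dir and count_audio_files are hypothetical helpers, and searching sub-folders recursively is an assumption (it suits nested datasets, but the CLI may behave differently).

```python
import tempfile
from pathlib import Path

# Audio types listed as supported by the CLI.
SUPPORTED = {".wav", ".mp3", ".flac", ".ogg", ".flv", ".wma", ".aac"}


def count_audio_files(directory):
    """Count files with a supported extension below directory (0 if missing)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(
        1 for p in directory.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def check_raw_dir(raw_dir):
    """Return a list of human-readable problems with the raw data layout."""
    raw_dir = Path(raw_dir)
    problems = []
    for name in ("positives", "negatives"):
        if count_audio_files(raw_dir / name) < 1:
            problems.append(f"{name}/ must contain at least 1 audio file")
    if count_audio_files(raw_dir / "backgrounds") < 1:
        problems.append("backgrounds/ has no audio (optional, but recommended)")
    return problems


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "positives").mkdir()
    (root / "negatives").mkdir()
    (root / "positives" / "hey.wav").touch()
    (root / "negatives" / "speech.flac").touch()
    demo_problems = check_raw_dir(root)

print(demo_problems)
```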
Configure
somnus configure
Create a configuration file with the absolute paths to the:
- Raw audio data directory
- Directory that should contain the augmented audio files
- Directory that should contain the preprocessed data files
Note that the augmented audio files and preprocessed data files can take up a lot of space, so make sure to put them on a drive with plenty of free space.
Augmenting audio
somnus augment_audio
The command to generate an audio dataset takes the raw audio in your raw audio directory as input and generates positive, negative, and silent audio files with varying amounts of background noise. These audio files are written to the augmented audio directory.
The command has the following options:
- duration: The duration of the audio clips in seconds
- positive: The number of positive examples
- negative: The number of negative examples
- silent: The number of examples containing only background noise
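How Somnus mixes in background noise is not described here. As an illustration of the general idea of "varying amounts of background noise", the sketch below scales a noise clip so that the mix has a chosen signal-to-noise ratio. mix_at_snr is a hypothetical helper, and the sine wave and random noise are stand-ins for real recordings.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio is snr_db."""
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16_000) for t in range(16_000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(16_000)]

mixed = mix_at_snr(speech, noise, snr_db=10)

# Recover the scaled noise and confirm the ratio that was actually applied.
added_noise = [m - s for m, s in zip(mixed, speech)]
measured_snr = 20 * math.log10(rms(speech) / rms(added_noise))
print(round(measured_snr, 3))
```

Drawing snr_db from a range for each generated clip is one way to get a dataset that covers both quiet and noisy conditions.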
Preprocessing and creating the dataset
somnus preprocess
This command preprocesses the augmented audio files. It takes the files stored in the augmented audio directory, normalizes them, and stores the output array in the preprocessed data directory.
The command has the following options:
- filters: The number of filters in each frame
- show_progress: Boolean option to decide whether to show a progress bar (NOTE: showing progress bar may slow down processing)
- split: The split between train, validation, and test data. The total should add up to 1, e.g. (0.9, 0.05, 0.05)
- win_length: The length of each window in seconds
- win_hop: The time between the start of each consecutive window
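To see what a split such as (0.9, 0.05, 0.05) means in terms of example counts, the sketch below turns the fractions into dataset sizes. split_counts is a hypothetical helper; how Somnus itself rounds or shuffles the data is not stated here.

```python
import math


def split_counts(n_examples, split=(0.9, 0.05, 0.05)):
    """Turn (train, validation, test) fractions into example counts."""
    if len(split) != 3 or not math.isclose(sum(split), 1.0):
        raise ValueError("split must have three fractions that add up to 1")
    n_train = round(n_examples * split[0])
    n_val = round(n_examples * split[1])
    # Give the remainder to the test set so the counts always sum to n_examples.
    n_test = n_examples - n_train - n_val
    return n_train, n_val, n_test


counts = split_counts(1000)
print(counts)
```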
Training
somnus train
The command to train a small-footprint keyword model loads the data in ./preprocessed_data/ and uses it to train the keyword model.
The command has the following options:
- model_name: The name of the model we want to train
- epochs: The number of epochs
- weights_file: The name of the file the final weights should be saved to
- save_best: Whether or not the model should save the best model throughout the training process
- batch_size: The size of each mini batch
- lr: The initial learning rate
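If you drive the CLI from a script, the options above can be assembled into a command line. The sketch below assumes they are passed as Python-Fire style --name=value flags using the option names listed here; somnus_command is a hypothetical helper, and the values are illustrative rather than recommended settings.

```python
def somnus_command(command, **options):
    """Build an argument list such as ['somnus', 'train', '--epochs=20']."""
    args = ["somnus", command]
    for name, value in options.items():
        args.append(f"--{name}={value}")
    return args


train_cmd = somnus_command(
    "train",
    model_name="crnn-time-stride",
    epochs=20,                       # illustrative value
    weights_file="my_keyword.hdf5",  # hypothetical file name
    batch_size=32,                   # illustrative value
)
print(" ".join(train_cmd))
# To actually run it: subprocess.run(train_cmd, check=True)
```

Run somnus train --help first to confirm the option names and defaults for your installed version.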
Testing
somnus test
This command tests a trained model on a withheld test dataset.
The command has the following options:
- model_name: The name of the model we want to test
- weights_file: The path to the weights file
List microphones
somnus list_microphones
Prints out a list of microphones connected to your device along with their device IDs.
Models
Currently Somnus offers the choice between the following models:
Name | Original paper | Description | Total parameters | Size |
---|---|---|---|---|
cnn-one-stride | Convolutional Neural Networks for Small-footprint Keyword Spotting | A frequency strided convolutional model with a stride of 4 and no pooling | 381k | 1.5MB |
cnn-trad-pool | Convolutional Neural Networks for Small-footprint Keyword Spotting | A keyword detection model with two convolutional layers followed by max pooling | 649k | 2.5MB |
crnn-time-stride | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting | A convolutional recurrent network with time striding | 88k | 380KB |
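The sizes in the table are roughly what you would expect if each parameter is stored as a 32-bit float. That storage format is an assumption, not something stated above; the sketch below simply multiplies the parameter counts from the table by 4 bytes.

```python
BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats

# Parameter counts taken from the table above.
models = {
    "cnn-one-stride": 381_000,
    "cnn-trad-pool": 649_000,
    "crnn-time-stride": 88_000,
}

estimated_mb = {
    name: n_params * BYTES_PER_PARAM / 1_000_000
    for name, n_params in models.items()
}

for name, size in estimated_mb.items():
    print(f"{name}: ~{size:.2f} MB")
```

The estimates land close to the listed sizes. The remaining differences could come from rounding of the parameter counts, file-format overhead, or MB versus MiB units.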
Recommended datasets
Before you start, we highly recommend downloading pre-made datasets for both the negative examples and the background noise. For negative examples we recommend the LibriSpeech dataset. You can pick any of the dev, test, or train datasets. To start with, we recommend using the train-clean-100.tar.gz dataset and moving on to the larger datasets if needed. For background noise we recommend the DEMAND dataset, which can be downloaded from Kaggle.
Extract the data, then move the LibriSpeech dataset into the negatives/ sub-directory of the raw audio directory and the DEMAND dataset into the backgrounds/ sub-directory.
The positives/ sub-directory should then contain utterances of your keyword in various conditions using multiple different voices and dialects. Additionally, you can add custom negative examples to the negatives/ sub-directory. We recommend that a majority of these utterances use a microphone similar to the one you will be using in the final product. This is because data gathered from different types of microphones can look completely different, e.g. a model trained on utterances recorded using a headset microphone will probably not work well with a far-field microphone array.
If your model is intended to be used with many different types of microphones then we recommend gathering positive and negative recordings using as many different microphones as you can.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License