Named after a spell in the Harry Potter universe that amplifies the sound of a speaker's voice. In Muggles' terminology, this is a repository of modules for audio and speech processing, built for and on top of machine-learning-based tasks such as speech-to-text.
The repository has the following dependencies:
- libsndfile1, which needs to be installed before the pip-installable modules.
- kenlm with Python bindings; refer to the kenlm GitHub repository.
- pyflashlight with Python bindings; refer to the installation instructions. NOTE that the C++ build itself is not necessarily required for building the Python bindings. FURTHERMORE, pyflashlight will soon be made
- fairseq; refer to the requirements and installation instructions in the fairseq GitHub repository. NOTE that the current PyPI module is of version < 1.0, hence installation from source is currently required. Once the PyPI index is updated with the latest fairseq package, it can be installed using
soundfile depends on libsndfile1. If you are not using conda, make sure the required system libraries are installed. On Ubuntu, they can be installed by executing:
sudo apt install portaudio19-dev libsndfile1
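Before installing the Python requirements, it can help to confirm which of these dependencies the current environment already sees. The sketch below uses only the standard library; the helper check_dependencies is hypothetical (not part of sonorus), and the import names, in particular flashlight for the pyflashlight bindings, are assumptions.

```python
import ctypes.util
import importlib.util

# Hypothetical helper (not part of sonorus): reports which dependencies
# listed in this README are visible from the current Python environment.
# ASSUMPTION: these top-level import names; pyflashlight's bindings are
# assumed to be importable as the package "flashlight".
PYTHON_MODULES = ["kenlm", "flashlight", "fairseq", "soundfile"]

# Shared-library names as understood by ctypes.util.find_library:
# "sndfile" for libsndfile1 and "portaudio" for portaudio19-dev.
SYSTEM_LIBRARIES = ["sndfile", "portaudio"]


def check_dependencies():
    """Return a dict mapping each dependency name to True (found) or False."""
    report = {}
    for name in PYTHON_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for lib in SYSTEM_LIBRARIES:
        # find_library returns a library name/path string, or None if not found.
        report["lib" + lib] = ctypes.util.find_library(lib) is not None
    return report


if __name__ == "__main__":
    for dep, found in check_dependencies().items():
        print(f"{dep:14s} {'found' if found else 'MISSING'}")
```

A missing entry only means the module or library is not on the default search path; a source build installed elsewhere may still work.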
Next, install the Python requirements by executing:
pip install -r requirements.txt
or, within a conda environment, install them using conda.
Finally, install the package using:
pip install sonorus
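To confirm which version of the package the active environment actually picks up, the standard library's importlib.metadata (Python 3.8+) can be queried. A minimal sketch; installed_version is a hypothetical helper, not part of sonorus:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="sonorus"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"sonorus {v}" if v else "sonorus is not installed in this environment")
```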
Environment setup:
Note: Environment setup is required when using Google Cloud's speech-to-text API. For this, the Google Application Credentials must be set as an environment variable (GOOGLE_APPLICATION_CREDENTIALS) pointing to your credentials JSON file, e.g. by exporting it in your shell.
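Google Cloud client libraries read the location of the credentials file from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. A minimal pre-flight check for it is sketched below; `credentials_path` is a hypothetical helper, not part of sonorus or the Google client libraries, and it only checks that the variable points to an existing file, not that the credentials are valid.

```python
import os
from pathlib import Path

# Environment variable read by Google Cloud client libraries to locate
# the credentials (e.g. service-account key) JSON file.
CREDENTIALS_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def credentials_path():
    """Return the credentials file Path if the variable is set and points to a file.

    Raises RuntimeError with a descriptive message otherwise.
    """
    value = os.environ.get(CREDENTIALS_VAR)
    if not value:
        raise RuntimeError(
            f"{CREDENTIALS_VAR} is not set; export it before using "
            "Google Cloud speech-to-text"
        )
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"{CREDENTIALS_VAR} points to {path}, which is not a file")
    return path
```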
Sample usage:
- Receives speech input from the microphone and prints the transcription on the console, using Facebook's Wav2Vec2 model (made available by Hugging Face) on-device.
To modify the execution parameters of the on-device model, such as providing a GPU device index when a GPU is available, run the program as:
python3 examples/streaming-stt.py --gpu_idx 0
- To use Google Cloud's speech-to-text API, execute:
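For the on-device example, the `--gpu_idx` flag selects which GPU the model runs on. The sketch below shows one way such a flag could be parsed and mapped to a PyTorch-style device string; the parser, its default of `None` (meaning CPU), and `device_string` are hypothetical and are not taken from `examples/streaming-stt.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical subset of the streaming example's options."""
    parser = argparse.ArgumentParser(description="Streaming speech-to-text (sketch)")
    # --gpu_idx is the flag shown in this README; the None default (CPU) is an assumption.
    parser.add_argument(
        "--gpu_idx",
        type=int,
        default=None,
        help="index of the GPU to run the model on; omit to use the CPU",
    )
    return parser.parse_args(argv)


def device_string(gpu_idx):
    """Map an optional GPU index to a PyTorch-style device string."""
    return "cpu" if gpu_idx is None else f"cuda:{gpu_idx}"
```

For example, `device_string(parse_args(["--gpu_idx", "0"]).gpu_idx)` yields `"cuda:0"`, while omitting the flag falls back to `"cpu"`.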