sonorus

sonorus is named after a spell in the Harry Potter universe that amplifies the sound of a speaker. In Muggle terms, it is a repository of modules for audio and speech processing, built for and on top of machine-learning tasks such as speech-to-text.

Getting Started:

Installation:

Install dependencies

The repository depends on kenlm, pyflashlight, fairseq, portaudio and libsndfile1, which need to be installed before the pip-installable modules.

To install kenlm with Python bindings, refer to the kenlm GitHub repository.

To install pyflashlight with Python bindings, refer to its installation instructions. Note that the C++ build itself is not necessarily required for building the Python bindings. Furthermore, pyflashlight will soon be made pip-installable via PyPI.

To install fairseq, refer to the requirements and installation instructions in the fairseq GitHub repository. Note that the current pip-installable PyPI module is of version < 1.0, so installation from source is currently required. Once PyPI is updated with the latest fairseq package, it can be installed using pip.

pyaudio and librosa/soundfile depend on portaudio and libsndfile1. If you are not using conda, make sure these are installed. On Ubuntu, they can be installed by executing:

sudo apt install portaudio19-dev libsndfile1

Next, install the Python requirements by executing:

pip install -r requirements.txt

Alternatively, install the requirements with conda inside a conda environment.

Finally, install the package itself using:

pip install sonorus
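Because several of the dependencies above are installed outside of pip, it can help to confirm that they are importable before running the examples. The following is a minimal sketch of such a check, not part of sonorus itself. The import names in `ASSUMED_IMPORT_NAMES` (in particular `flashlight` for pyflashlight) and the helper `check_dependencies` are assumptions made for illustration; consult each project's documentation for the actual module names.

```python
import importlib.util

# Assumed import names for each dependency (not taken from the sonorus docs).
ASSUMED_IMPORT_NAMES = {
    "kenlm": "kenlm",
    "pyflashlight": "flashlight",
    "fairseq": "fairseq",
    "pyaudio": "pyaudio",
    "soundfile": "soundfile",
    "librosa": "librosa",
}


def check_dependencies(import_names=ASSUMED_IMPORT_NAMES):
    """Return {label: bool} telling whether each module can be located."""
    status = {}
    for label, module_name in import_names.items():
        try:
            # find_spec locates the module without importing it.
            status[label] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            status[label] = False
    return status


if __name__ == "__main__":
    for label, found in check_dependencies().items():
        print(f"{label:14s} {'found' if found else 'MISSING'}")
```

A module reported as missing only means Python cannot locate it under the assumed name in the current environment. It says nothing about system libraries such as portaudio or libsndfile1, which are loaded only when the Python wrappers are actually imported.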

Environment set up:

Note: Environment setup is required when using Google Cloud's speech-to-text API. For this, set the Google application credentials as an environment variable, for example:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-cloud-credentials.json
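A quick way to verify this setup before starting the Google example is to check that the variable is set and points to an existing file. The sketch below is only illustrative: the helper name `credentials_problem` is hypothetical, and the check does not validate the contents of the credentials file.

```python
import os


def credentials_problem(env=os.environ):
    """Return a description of what is wrong, or None if the setup looks fine."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


if __name__ == "__main__":
    problem = credentials_problem()
    print(problem or "Google credentials file found")
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.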

Sample running instructions:

  • Receive speech input from the microphone and print the transcript to the console, using Facebook's Wav2Vec2 model (made available by Hugging Face) running on-device:

python3 examples/streaming-stt.py

To change the execution parameters of the on-device model, for example to pass a GPU device index when a GPU is available, run the program as:

python3 examples/streaming-stt.py --gpu_idx 0

  • To use Google Cloud's speech-to-text instead, execute:

python3 examples/google-streaming-stt.py
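For readers wondering how a flag such as `--gpu_idx` in the on-device example might translate into a device choice, here is a minimal sketch. It is not the code of examples/streaming-stt.py: the helpers `build_parser` and `device_string`, the CPU default, and the PyTorch-style `cuda:N` device strings are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser with a single, hypothetical --gpu_idx option."""
    parser = argparse.ArgumentParser(description="Hypothetical streaming STT options")
    parser.add_argument(
        "--gpu_idx",
        type=int,
        default=None,
        help="GPU device index; omit to run on CPU (assumed default)",
    )
    return parser


def device_string(gpu_idx):
    """Map an optional GPU index to a PyTorch-style device string."""
    if gpu_idx is None or gpu_idx < 0:
        return "cpu"
    return f"cuda:{gpu_idx}"


# Parse an explicit argument list, equivalent to passing "--gpu_idx 0".
args = build_parser().parse_args(["--gpu_idx", "0"])
print(device_string(args.gpu_idx))  # prints "cuda:0"
```

Falling back to the CPU when no index is given keeps the script usable on machines without a GPU, which matches the way the plain invocation above takes no device argument.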
