
Framework for training deep automatic speech recognition models.


# Sonosco



Sonosco (from Latin sonus, "sound", and nōscō, "I know, recognize") is a library for training and deploying deep speech recognition models.

The goal of this project is to enable fast, repeatable, and structured training of deep automatic speech recognition (ASR) models, as well as to provide a transcription server (REST API & frontend) for trying out the trained models.
Additionally, we provide interfaces to ROS in order to use it with the anthropomimetic robot Roboy.



Installation

Via pip

The easiest way to use Sonosco's functionality is via pip:

pip install sonosco

Note: Sonosco requires Python 3.6 or higher.

For reliability, we recommend using an environment virtualization tool, like virtualenv or conda.



For developers or for trying out the transcription server

Clone the repository and install dependencies:

# Clone the repo and cd inside it
git clone https://github.com/Roboy/sonosco.git && cd sonosco

# Create a virtual python environment to not pollute the global setup
python -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Install normal requirements
pip install -r requirements.txt

# Link your local sonosco clone into your virtual environment
pip install -e .

Now you can check out some of the Getting Started tutorials to train a model or use the transcription server.



Quick Start

Dockerized inference server

Get hold of our fully trained models from the latest release! Try out the LAS model for the best performance. Then point the runner script to the folder containing the model, as shown below.

You can get the Docker image from Docker Hub under yuriyarabskyy/sonosco-inference:1.0. Just run cd server && ./run.sh yuriyarabskyy/sonosco-inference:1.0 to pull and start the server, or optionally build your own image by executing the following commands.

cd server

# Build the docker image
./build.sh

# Run the built image
./run.sh sonosco_server

You can also specify the path to your own models by writing ./run.sh <image_name> <path/to/models>.

Open http://localhost:5000 in Chrome. You should be able to add models for transcription by clicking the plus button. Once the models are added, record your own voice by clicking the record button. You can replay and transcribe the recording with the corresponding buttons.

You can get pretrained models from the release tab in this repository.
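
For programmatic access, the frontend talks to the server's REST API. The snippet below is a hedged sketch of calling such an endpoint from Python with requests; the /transcribe route and the payload layout are assumptions for illustration, so check the code under server/ for the actual API.

# Hypothetical example: the endpoint name and payload are assumptions,
# not the confirmed Sonosco server API.
import requests

SERVER_URL = "http://localhost:5000"   # server started via run.sh above
AUDIO_PATH = "recording.wav"           # a local microphone recording

with open(AUDIO_PATH, "rb") as audio_file:
    response = requests.post(f"{SERVER_URL}/transcribe",
                             files={"audio": audio_file})

response.raise_for_status()
print(response.json())                 # transcription result(s)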


High-Level Design

The project is split into four parts that build on each other:

For data processing, scripts are provided to download and preprocess several publicly available speech recognition datasets. Additionally, we provide scripts and functions to create manifest files (i.e. catalog files) for your own data and to merge existing manifest files into one.
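
To make the idea of a manifest concrete, the sketch below builds a simple CSV catalog from a folder of paired .wav/.txt files. The folder layout and column order are illustrative assumptions; the exact format expected by Sonosco's scripts may differ.

# Illustrative manifest builder: pairs each audio file with its transcript.
# The dataset path and CSV columns are assumptions, not Sonosco's exact format.
import csv
from pathlib import Path

DATA_DIR = Path("data/my_dataset")        # hypothetical dataset folder
MANIFEST_PATH = Path("data/manifest.csv")

with MANIFEST_PATH.open("w", newline="") as manifest:
    writer = csv.writer(manifest)
    for wav_path in sorted(DATA_DIR.glob("**/*.wav")):
        txt_path = wav_path.with_suffix(".txt")   # transcript next to the audio
        if txt_path.exists():
            writer.writerow([str(wav_path), str(txt_path)])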

These manifest files can then be used to easily train and evaluate an ASR model. We provide several ASR model architectures, such as LAS, TDS, and DeepSpeech 2, but custom PyTorch models can also be designed and trained.
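
As a rough illustration of what a custom PyTorch model can look like, here is a minimal recurrent acoustic model. It is not one of the architectures shipped with Sonosco and omits the Sonosco-specific training wiring; the input and output sizes are placeholder assumptions.

# Minimal, hypothetical PyTorch acoustic model (not a Sonosco architecture).
import torch
import torch.nn as nn

class TinyASRModel(nn.Module):
    def __init__(self, n_features: int = 161, n_classes: int = 29):
        super().__init__()
        self.rnn = nn.GRU(n_features, 256, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(256, n_classes)  # e.g. characters + blank

    def forward(self, spectrograms: torch.Tensor) -> torch.Tensor:
        # spectrograms: (batch, time, n_features)
        hidden, _ = self.rnn(spectrograms)
        return self.classifier(hidden)               # per-frame class logits

model = TinyASRModel()
dummy_batch = torch.randn(4, 100, 161)               # 4 utterances, 100 frames each
print(model(dummy_batch).shape)                       # torch.Size([4, 100, 29])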

The trained model can then be used in a transcription server, which consists of a REST API as well as a simple Vue.js frontend to transcribe voice recorded with a microphone and compare the transcription results across models (which can be downloaded from our GitHub repository).

Furthermore, we provide example code showing how to use different ASR models with ROS, in particular with the Roboy ROS interfaces (i.e. topics & messages).
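
As a sketch of how transcriptions could be consumed on the ROS side, the following node subscribes to a transcription topic with rospy. The topic name and message type are hypothetical placeholders, not the actual Roboy ROS interfaces.

# Hedged ROS example: topic name and message type are placeholders,
# not the real Roboy interfaces provided by Sonosco.
import rospy
from std_msgs.msg import String

def on_transcription(msg):
    # React to a finished transcription published by the ASR pipeline
    rospy.loginfo("Heard transcription: %s", msg.data)

if __name__ == "__main__":
    rospy.init_node("sonosco_transcription_listener")
    rospy.Subscriber("/sonosco/transcription", String, on_transcription)
    rospy.spin()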

