
DeepSpeech as a (Docker) Service for IBus


Listener (v2) Voice Dictation as a (Docker) Service for IBus

Listener is a voice dictation service for Linux desktops. It uses the Mozilla DeepSpeech engine for basic speech recognition and aims to provide enough accuracy and supporting services to allow coding in common programming languages.

My goal with this project is to create an input method for those who have difficulty typing with their hands (such as myself), with a focus on allowing coding by voice. Hands-free operation of the machine is not my personal focus.

Current Status of the Project

The project is currently a proof of concept. What works:

  • typing content into Visual Studio Code, Kate, and Google Chrome
  • the beginnings of basic punctuation, capitalization, and so on, driven by user-editable rules files
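As a rough illustration of rule-driven punctuation and capitalization, here is a minimal sketch. The rule table, the `cap` keyword handling, and the function name `apply_rules` are all hypothetical; the actual rules file format used by Listener is not shown here and may differ considerably.

```python
# Hypothetical rule table: spoken phrase (as a tuple of words) -> punctuation.
# In a real setup this would presumably be loaded from a user-editable file.
PUNCTUATION_RULES = {
    ("full", "stop"): ".",
    ("comma",): ",",
    ("question", "mark"): "?",
}


def apply_rules(transcript, rules=PUNCTUATION_RULES):
    """Turn a raw lowercase transcript into punctuated text.

    Assumed conventions (not taken from the project):
    - "cap" capitalizes the next ordinary word
    - punctuation attaches to the preceding word without a space
    """
    words = transcript.split()
    text = ""
    capitalize_next = False
    i = 0
    while i < len(words):
        # "cap X" style shortcut: remember to capitalize the following word.
        if words[i] == "cap" and i + 1 < len(words):
            capitalize_next = True
            i += 1
            continue
        # Try each spoken phrase against the words at the current position.
        for phrase, symbol in rules.items():
            if tuple(words[i:i + len(phrase)]) == phrase:
                text += symbol
                i += len(phrase)
                break
        else:
            # No rule matched: emit the word itself.
            word = words[i]
            if capitalize_next:
                word = word.capitalize()
                capitalize_next = False
            text += (" " if text else "") + word
            i += 1
    return text
```

For example, `apply_rules("cap hello comma world full stop")` produces `Hello, world.` under these assumed rules.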

Roadmap

  • create a docker container with a working deepspeech release [done]
  • get basic working dictation into arbitrary applications working [done]
  • create a control-panel application [started]
  • create punctuation and control shortcuts and phrases [started]
  • allow for switching language models for different programming contexts and providing current-context hints such as class methods, modules, etc. from the language server
  • create language models which are dictation-aware, so that common dictation shortcuts such as "cap X" have higher priority
  • track interaction and key press events to allow for pauses in dictation without extra spaces; this will have to happen in the IBus component in order to get proper notification
  • send special keys (tab, enter, and modifiers to start with) [proof of concept done]
  • create a "correct that" GUI (with other predictions and free-form editing)
  • create a control panel allowing for one-click toggling of listening
  • cut down the container to a more reasonable size
  • maybe create a DBus service for the core code [started]
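The roadmap item about pauses and extra spaces can be pictured with a small state tracker. This is only a sketch of one possible approach, not the project's design: the class `SpacingTracker` and its method names are hypothetical, and it assumes the IBus component can report key presses that occur between utterances.

```python
class SpacingTracker:
    """Decides whether the next utterance needs a leading space.

    Assumption: a joining space is only wanted when the previous text
    inserted at the cursor also came from dictation, with no key press
    in between.
    """

    def __init__(self):
        self.last_was_dictation = False

    def on_key_press(self):
        # The user typed or moved the cursor, so the text before the
        # cursor is no longer known to be dictated content.
        self.last_was_dictation = False

    def on_utterance(self, text):
        # Prefix a space only when continuing directly after dictation.
        prefix = " " if self.last_was_dictation else ""
        self.last_was_dictation = True
        return prefix + text
```

With this sketch, two consecutive utterances "hello" and "world" are committed as `hello` and ` world`, while an utterance that follows a key press is committed without a leading space.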

Architecture

  • pacat sends audio to a named socket

  • a docker container runs Mozilla DeepSpeech, hardware-accelerated by the host's (NVIDIA) graphics card

    • the container reads the audio from a pipe and reports results to a user-local event-socket
  • an interpreter process listens for the events and attempts to interpret the results according to the user's rules and, eventually, custom language models

  • an IBus Engine that allows the results of the recognition to be treated as regular input to the (Linux) host operating system

  • a UInput mechanism that allows for the introduction of special characters as though they were typed directly into a keyboard
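The audio-to-event part of the pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the audio format (16 kHz, mono, 16-bit PCM), the 20 ms chunk size, the JSON-lines event shape, and the names `iter_chunks` and `run_pipeline` are all assumptions, and the recognizer is a stand-in callable rather than a real DeepSpeech call.

```python
import json

# Assumed audio format: 16 kHz, mono, 16-bit signed PCM.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 20
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000


def iter_chunks(stream, chunk_bytes=CHUNK_BYTES):
    """Yield raw audio chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def run_pipeline(audio_in, events_out, recognize):
    """Feed audio chunks to a recognizer and write one JSON event per result.

    `recognize` is a hypothetical callable taking a bytes chunk and returning
    a transcript string, or None when there is nothing to report yet.
    """
    for chunk in iter_chunks(audio_in):
        transcript = recognize(chunk)
        if transcript:
            event = {"final": True, "text": transcript}
            events_out.write(json.dumps(event) + "\n")
```

In a real deployment `audio_in` would be the pipe fed by pacat and `events_out` the event socket; a real recognizer would also keep streaming state across chunks rather than deciding per chunk, which this sketch glosses over.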

Installation/Setup

See Installation Docs

Reference Docs for Devs



