Transcribe long audio files with STT or use the streaming interface

Listen: STT Services

This program is composed of two parts:

  • A server, meant to run as a background service, that serves STT models over a socket.
  • A client that queries the models to transcribe audio from files or directly from a live microphone stream.

The output WAV files can be saved for later use.

You can then use the data.helper script to verify the transcription of every wav file and update the CSV training register before you start training a model.

Requirements

Installation

Once you have a working PyAudio for your version of Python, install listen.

pip install stt-listen
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/listen.git

Usage

 listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
                   [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Integer between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info
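
The -d option expects a PyAudio device index. The following is a minimal sketch for finding one, assuming PyAudio is installed. The helper name list_input_devices is hypothetical and not part of listen; it only uses the PyAudio calls get_device_count() and get_device_info_by_index().

```python
def list_input_devices(pa):
    """Return (index, name) pairs for devices that can record audio.

    `pa` is expected to be a pyaudio.PyAudio instance.
    """
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        # Output-only devices report zero input channels.
        if info.get("maxInputChannels", 0) > 0:
            devices.append((index, info.get("name", "")))
    return devices


def main():
    # Third-party import kept local so the helper can be imported without PyAudio.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        for index, name in list_input_devices(pa):
            print(f"{index}\t{name}")
    finally:
        pa.terminate()
```

Calling main() prints one line per input device; the number in the first column is the value to pass to listen -d.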

Start the server

To use listen, you need a socket with STT models at the ready.

For example, to enable it as a systemd user service:

cp ./listen.service.example /usr/lib/systemd/user/listen.service
systemctl --user enable --now listen.service

Models for STT and punctuation will be downloaded the first time you run the server.

Or start it manually using Python:

python -m listen.STT.as_service

Get authorization to listen

You need to authorize the system to listen first. Change the service configuration as follows.

# ~/.assistant/stt.toml
...
[stt]
is_allowed = true
...

Then start the server and use listen to start transcribing audio.

Use the client

Transcribe a file

You can quickly transcribe a wav file.

 listen -f savewav_2022-04-11_17-18-08_578756.wav
Filename                       Duration(s)
savewav_2022-04-11_17-18-08_578756.wav 3.580
 cat savewav_2022-04-11_17-18-08_578756.txt
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────
        File: savewav_2022-04-11_17-18-08_578756.txt
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1    Bonjour.
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────
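
The Duration(s) column can be cross-checked independently of listen. This is a small sketch using only the standard wave module; wav_duration_seconds is a hypothetical helper and not necessarily how listen computes the value.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()
```

For example, print(f"{wav_duration_seconds('recording.wav'):.3f}") prints the duration with three decimals, where recording.wav stands in for any WAV file you have.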

Transcribe from a live microphone stream

You can also query the models in real time from a microphone.

 listen
You can speak now.
Bonjour.
^C
Stopped listening.

Supported languages

By default, the server uses the system's language according to the environment variable $LANG.

You can manually specify a supported language for the server to use.

LANG="en_US.UTF-8" python -m listen.STT.as_service

Have a look at stt-models-locals to see the complete list.

If the provided $LANG is not supported by any STT model, English is used as a fallback.
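
The fallback rule can be pictured with the sketch below. It is an illustration under assumptions, not the server's actual code: the SUPPORTED set and the resolve_language helper are hypothetical, and the real list of languages is the one published in stt-models-locals.

```python
import os

# Hypothetical set of supported language codes, for illustration only.
SUPPORTED = {"en", "fr"}


def resolve_language(lang_env, supported=SUPPORTED, fallback="en"):
    """Map a $LANG value such as 'fr_FR.UTF-8' to a supported language code."""
    if not lang_env:
        return fallback
    # Drop the encoding ('.UTF-8') and modifier ('@euro'), then the territory ('_FR').
    locale_name = lang_env.split(".", 1)[0].split("@", 1)[0]
    code = locale_name.split("_", 1)[0].lower()
    return code if code in supported else fallback


# Resolve the language for the current environment.
language = resolve_language(os.environ.get("LANG"))
```

With this sketch, an unset $LANG or a locale outside the supported set both resolve to the English fallback.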
