Transcribe long audio files with ASR or use the streaming interface
Listen: STT Services
This program is composed of two parts:
- A server meant to run as a background service, serving ASR models over a socket.
- A client to query the models to transcribe audio from files or directly from a live microphone stream.
The resulting WAV file can be stored for later use.
You can then use the data.helper script to verify the transcription of every wav file and update the CSV training register before you start training a model.
Requirements
Installation
Once you have a working pyaudio for your version of Python, install listen.
pip install stt-listen
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/listen.git
Usage
❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
[-w SAVE_WAV]
Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone
options:
-h, --help show this help message and exit
-f FILE, --file FILE Path to the audio file to run (WAV format)
--aggressive {0,1,2,3}
Determines how aggressive filtering out non-speech is.
(Integer between 0-3)
-d MIC_DEVICE, --mic_device MIC_DEVICE
Device input index (Int) as listed by
pyaudio.PyAudio.get_device_info_by_index(). If not
provided, falls back to PyAudio.get_default_device().
-w SAVE_WAV, --save_wav SAVE_WAV
Path to directory where to save recorded sentences
--debug Show debug info
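The `-d` option expects a PyAudio device index. As a minimal sketch of how one might find that index, the helper below filters PyAudio device info dictionaries down to the devices that can record. The function names `input_devices` and `query_pyaudio_devices` are hypothetical and not part of listen; only `get_device_count()`, `get_device_info_by_index()` and `terminate()` are PyAudio calls.

```python
from typing import Iterable


def input_devices(device_infos: Iterable[dict]) -> list[tuple[int, str]]:
    """Return (index, name) for every device that has at least one input channel."""
    return [
        (int(info["index"]), str(info["name"]))
        for info in device_infos
        if int(info.get("maxInputChannels", 0)) > 0
    ]


def query_pyaudio_devices() -> list[dict]:
    """Collect the device info dicts from PyAudio (requires the pyaudio package)."""
    import pyaudio  # imported lazily so input_devices() works without pyaudio

    audio = pyaudio.PyAudio()
    try:
        return [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        # Always release PortAudio resources, even if a query fails.
        audio.terminate()
```

Calling `input_devices(query_pyaudio_devices())` should give the candidate values for `listen -d`.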
Start the server
To use listen, you need a socket with STT models at the ready.
For example, to enable it as a systemd user service:
cp ./listen.service.example /usr/lib/systemd/user/listen.service
systemctl --user enable --now listen.service
Models for faster-whisper will be downloaded the first time you run the server.
Or start the server manually with uvicorn.
uvicorn listen.Whisper.as_service:app --reload --port 5063
Get authorization to listen
You need to authorize the system to listen first. Change the service configuration as follows.
# ~/.assistant/stt.toml
...
[stt]
is_allowed = true
...
Then start the server and use listen to start transcribing audio.
Use the client
Transcribe a file
You can quickly transcribe a wav file.
❯ listen -f audio.wav
Filename Duration(s)
audio.wav 3.580
❯ cat audio.txt
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ File: audio.txt
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Bonjour.
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────
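The same command can be driven from Python to transcribe a whole directory. The sketch below only wraps the `listen -f` invocation shown above. It assumes, based on that single example, that the transcript is written next to the audio file with a `.txt` extension; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def listen_command(wav_path: Path) -> list[str]:
    """Build the client invocation for one WAV file."""
    return ["listen", "-f", str(wav_path)]


def transcript_path(wav_path: Path) -> Path:
    """Assumed transcript location: same name as the audio, with .txt."""
    return wav_path.with_suffix(".txt")


def transcribe_all(directory: Path) -> dict[str, str]:
    """Run the client on every WAV file and collect the transcripts found."""
    results: dict[str, str] = {}
    for wav_path in sorted(directory.glob("*.wav")):
        # check=True raises CalledProcessError if the client exits with an error.
        subprocess.run(listen_command(wav_path), check=True)
        txt = transcript_path(wav_path)
        if txt.exists():
            results[wav_path.name] = txt.read_text(encoding="utf-8").strip()
    return results
```

The server must already be running and authorized for `transcribe_all` to return anything.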
Transcribe from a live microphone stream
You can also query the models in real time from a microphone.
❯ listen
You can speak now.
Bonjour.
^C
Stopped listening.
Supported languages
By default, the server uses the system's language according to the environment variable $LANG.
You can manually specify a supported language for the server to use.
LANG="en_US.UTF-8" uvicorn listen.Whisper.as_service:app --reload --port 5063
The server will then look for a suitable model for that language.
You can also directly specify a model to load.
ASR_MODEL_ID="bofenghuang/whisper-large-v3-french" uvicorn listen.Whisper.as_service:app --reload --port 5063
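When the server is launched from a script rather than a shell, the two environment variables above can be set per process. The following is a minimal sketch under that assumption; `server_env`, `server_command` and `start_server` are hypothetical helpers, and `--reload` is left out because it is a development option.

```python
import os
import subprocess
from typing import Mapping, Optional

APP = "listen.Whisper.as_service:app"  # application path from the commands above
PORT = 5063


def server_env(
    base: Mapping[str, str],
    lang: Optional[str] = None,
    model_id: Optional[str] = None,
) -> dict[str, str]:
    """Copy the base environment and override LANG and/or ASR_MODEL_ID."""
    env = dict(base)
    if lang is not None:
        env["LANG"] = lang
    if model_id is not None:
        env["ASR_MODEL_ID"] = model_id
    return env


def server_command(port: int = PORT) -> list[str]:
    """Build the uvicorn invocation without the --reload development flag."""
    return ["uvicorn", APP, "--port", str(port)]


def start_server(
    lang: Optional[str] = None, model_id: Optional[str] = None
) -> subprocess.Popen:
    """Start the server as a child process with the chosen language or model."""
    return subprocess.Popen(
        server_command(), env=server_env(os.environ, lang, model_id)
    )
```

For instance, `start_server(model_id="bofenghuang/whisper-large-v3-french")` mirrors the last command above.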