Easy-to-use implementation of Vosk local speech recognition
Noah's Local Speech Recognition
This project provides a local, privacy-preserving speech recognition tool using the Vosk speech recognition toolkit and microphone input. It supports keyword detection, timestamped transcription, logging, pause/resume functionality, and model auto-downloading.
Features
- Offline speech recognition (no internet needed)
- Auto-download of Vosk models if not present
- Realtime transcription with timestamps
- Automatic transcript logging to disk
- Support for pause/resume listening
Requirements
- Python 3.9+
- vosk, pyaudio, tqdm, requests
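The requirements above can be checked before running anything else. The following is a minimal sketch, not part of the package: the helper name check_requirements is hypothetical, and it assumes the import names match the package names listed above.

```python
import importlib.util
import sys

# Import names assumed to match the requirements listed in this README.
REQUIRED_MODULES = ("vosk", "pyaudio", "tqdm", "requests")
MIN_PYTHON = (3, 9)


def check_requirements(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```

This only confirms that each module can be located. It does not prove that pyaudio can actually talk to the audio hardware.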
Setup:
pip install noahs_local_speech_recognition
On Linux (including Raspberry Pi), pyaudio still depends on the system-level PortAudio library. To avoid errors, install portaudio19-dev and confirm that pyaudio is working before running pip install noahs_local_speech_recognition, which relies on pyaudio.
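One way to confirm that pyaudio is working is to import it and let it initialise PortAudio. This is a rough sketch rather than anything shipped with the package: the function name pyaudio_is_working is hypothetical, and it uses only the PyAudio() constructor, get_device_count() and terminate().

```python
def pyaudio_is_working():
    """Return (ok, message) describing whether PyAudio can initialise PortAudio."""
    try:
        # Imported inside the function so a missing install is reported, not raised.
        import pyaudio
    except ImportError as exc:
        return False, f"pyaudio could not be imported: {exc}"

    try:
        audio = pyaudio.PyAudio()
    except Exception as exc:
        return False, f"PortAudio failed to initialise: {exc}"

    try:
        device_count = audio.get_device_count()
    finally:
        # Always release PortAudio, even if the device query fails.
        audio.terminate()
    return True, f"pyaudio is working; {device_count} audio device(s) found"


if __name__ == "__main__":
    ok, message = pyaudio_is_working()
    print(message)
```

If the check reports an import failure on Linux, the usual cause is a missing PortAudio development package.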
Linux Setup Instructions
If you're using Linux, the script attempts to install the required PortAudio libraries automatically. However, you may still need to:
- Run the script that imports noahs_local_speech_recognition with sudo.
- Manually install dependencies if issues arise:
    sudo apt-get update
    sudo apt-get install -y portaudio19-dev python3-pyaudio
    pip uninstall pyaudio
    pip install pyaudio
These steps ensure that pyaudio is compiled correctly against the system's PortAudio library.
Demo
To test the speech recognizer locally, simply run this script:
from noahs_local_speech_recognition import (
    get_text_after_keyword,
    list_microphones,
    start_speech_listening,
    stop_speech_listening,
    get_speech_log,
    get_speech_log_entry,
    set_speech_log_response,
    remove_speech_log_entry,
    pause_speech_listening,
    resume_speech_listening,
)
import time

print("Listing available microphone devices:")
list_microphones()
print("\nStarting speech recognition with default input device...")
start_speech_listening(name="robot", stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")
try:
    while True:
        time.sleep(2)
        print("_________________________")
        entry = get_speech_log_entry()
        # Only react to entries that have not been responded to yet.
        if entry and entry["response"] is None:
            heard = entry["content"]
            print(f"I HEARD: {heard}")
            speech_log1 = get_speech_log()
            print(f"SPEECH LOG BEFORE SET RESPONSE: {speech_log1}")
            set_speech_log_response("I have heard what you said")
            speech_log2 = get_speech_log()
            print(f"SPEECH LOG AFTER SET RESPONSE: {speech_log2}")
        # Leave the loop when the speaker says an exit phrase.
        if entry and entry["content"] in ["goodbye", "good bye", "bye", "quit", "end", "exit"]:
            break
except KeyboardInterrupt:
    pass
finally:
    # Always stop the listener, even after Ctrl+C.
    stop_speech_listening()
This will:
- Auto-download the Vosk "vosk-model-small-en-us-0.15" model if missing.
- Show available audio devices.
- Look for the default microphone, since device_index=None.
- Begin listening and transcribing speech in real time, triggered by hearing the name="robot" keyword.
- Log results with timestamps to a file named like transcript_YYYY-MM-DD_HH-MM-SS.txt.
- Display speech_log in the console before and after the response is set.
- Use set_speech_log_response() to flag entries as having been responded to.
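The transcript file name pattern mentioned above can be reproduced with the standard library, which is handy for locating or sorting transcripts afterwards. This is a sketch under the assumption that the pattern is exactly transcript_YYYY-MM-DD_HH-MM-SS.txt. The helper names are hypothetical and this is not the package's own naming code.

```python
from datetime import datetime

# strftime/strptime form of transcript_YYYY-MM-DD_HH-MM-SS.txt (assumed pattern).
TRANSCRIPT_PATTERN = "transcript_%Y-%m-%d_%H-%M-%S.txt"


def transcript_filename(moment=None):
    """Build a transcript file name for the given datetime (default: now)."""
    moment = moment or datetime.now()
    return moment.strftime(TRANSCRIPT_PATTERN)


def parse_transcript_filename(filename):
    """Recover the datetime encoded in a transcript file name."""
    return datetime.strptime(filename, TRANSCRIPT_PATTERN)


if __name__ == "__main__":
    name = transcript_filename(datetime(2024, 1, 2, 3, 4, 5))
    print(name)
    print(parse_transcript_filename(name))
```

Because the timestamp fields run from most to least significant, sorting these file names alphabetically also sorts them chronologically.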
Changing Models
You can modify the model by altering the model_name passed to start_speech_listening():
start_speech_listening(name="robot", stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")
Available models:
- vosk-model-small-en-us-0.15
- vosk-model-en-us-0.22
- vosk-model-en-us-0.22-lgraph
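Since only these three model names are listed, it may help to validate a name before passing it to start_speech_listening(). The sketch below is illustrative only: validate_model_name is a hypothetical helper, and how the package itself reacts to an unknown name is not documented here.

```python
# The three model names listed in this README.
KNOWN_MODELS = (
    "vosk-model-small-en-us-0.15",
    "vosk-model-en-us-0.22",
    "vosk-model-en-us-0.22-lgraph",
)


def validate_model_name(model_name):
    """Return model_name unchanged if it is a listed model, else raise ValueError."""
    if model_name not in KNOWN_MODELS:
        options = ", ".join(KNOWN_MODELS)
        raise ValueError(f"Unknown model {model_name!r}; expected one of: {options}")
    return model_name


if __name__ == "__main__":
    print(validate_model_name("vosk-model-en-us-0.22"))
```

The result could then be passed as model_name=validate_model_name(...), so a typo fails early with a clear message.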
Keyword Mode vs Always Listen Mode
To use Always Listen Mode, pass name=None to start_speech_listening():
start_speech_listening(name=None, stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")
This will transcribe everything, not just things said after the name keyword.
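The difference between the two modes can be illustrated on plain strings. This is a simplified sketch of the idea, not the package's implementation: text_after_keyword is a hypothetical helper, and it assumes the keyword is a single word matched case-insensitively.

```python
def text_after_keyword(transcript, keyword):
    """Return the words spoken after the first keyword, or None if it is absent.

    With keyword=None (Always Listen Mode) the whole transcript is kept.
    """
    if keyword is None:
        # Always Listen Mode: nothing to filter on, keep everything.
        return transcript.strip()

    words = transcript.lower().split()
    key = keyword.lower()
    if key not in words:
        # Keyword Mode: speech without the keyword is ignored.
        return None
    index = words.index(key)
    return " ".join(words[index + 1:])


if __name__ == "__main__":
    print(text_after_keyword("hey robot turn on the light", "robot"))
    print(text_after_keyword("turn on the light", "robot"))
    print(text_after_keyword("turn on the light", None))
```

In this sketch, Keyword Mode discards anything said before the keyword, while Always Listen Mode keeps the full text.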
Known Issues
- Use on Debian Linux may require manually setting up pyaudio prior to import.
- Automatic model download links are hardcoded and limited to 3 options.
License
MIT License
Acknowledgements