
Easy-to-use implementation of Vosk local speech recognition


Noah's Local Speech Recognition

This project provides a local, privacy-preserving speech recognition tool using the Vosk speech recognition toolkit and microphone input. It supports keyword detection, timestamped transcription, logging, pause/resume functionality, and model auto-downloading.


Features

  • Offline speech recognition (no internet needed)
  • Auto-download of Vosk models if not present
  • Real-time transcription with timestamps
  • Automatic transcript logging to disk
  • Support for pause/resume listening

Requirements

  • Python 3.9+
  • vosk
  • pyaudio
  • tqdm
  • requests

Setup

pip install noahs_local_speech_recognition

On Linux (including Raspberry Pi), pyaudio depends on the system-level PortAudio library. Install portaudio19-dev and confirm that pyaudio imports correctly before pip-installing noahs_local_speech_recognition, which relies on pyaudio; otherwise the install may fail or the package may error at import time.
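One quick way to confirm pyaudio is importable before installing this package is to check for it with importlib (a small self-contained sketch; the helper name check_module is just for illustration, and "json" is used in the demo call only because it is guaranteed to exist in any standard Python install):

```python
import importlib.util

def check_module(name):
    """Return True if the named module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

# Substitute "pyaudio" for "json" to test your own setup.
print(check_module("json"))  # → True
```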


Linux Setup Instructions

If you're using Linux, the script attempts to install the required PortAudio libraries automatically. However, you may still need to:

  1. Run the script that imports noahs_local_speech_recognition with sudo, so the automatic setup can install system packages.

  2. Manually install dependencies if issues arise:

    sudo apt-get update
    sudo apt-get install -y portaudio19-dev python3-pyaudio
    pip uninstall pyaudio
    pip install pyaudio
    

These steps ensure that pyaudio is compiled correctly against the system's PortAudio library.


Demo

To test the speech recognizer locally, simply run this script:

from noahs_local_speech_recognition import (
    get_text_after_keyword,
    list_microphones,
    start_speech_listening,
    stop_speech_listening,
    get_speech_log,
    get_speech_log_entry,
    set_speech_log_response,
    remove_speech_log_entry,
    pause_speech_listening,
    resume_speech_listening,
)

import time

print("Listing available microphone devices:")
list_microphones()

print("\nStarting speech recognition with default input device...")
start_speech_listening(name="robot", stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")

try:
    while True:
        time.sleep(2)
        print("_________________________")
        entry = get_speech_log_entry()
        if entry and entry["response"] is None:
            heard = entry["content"]
            print(f"I HEARD: {heard}")
            speech_log1 = get_speech_log()
            print(f"SPEECH LOG BEFORE SET RESPONSE: {speech_log1}")
            set_speech_log_response("I have heard what you said")
            speech_log2 = get_speech_log()
            print(f"SPEECH LOG AFTER SET RESPONSE: {speech_log2}")
        if entry and entry["content"] in ["goodbye", "good bye", "bye", "quit", "end", "exit"]:
            break
except KeyboardInterrupt:
    pass
finally:
    stop_speech_listening()

This will:

  • Auto-download the Vosk "vosk-model-small-en-us-0.15" model if missing.
  • Show available audio devices.
  • Use the default microphone, since device_index=None.
  • Begin listening and transcribing speech in real time after hearing the name="robot" keyword.
  • Log results with timestamps to a file named like transcript_YYYY-MM-DD_HH-MM-SS.txt.
  • Display speech_log before and after having response set in the console.
  • Use set_speech_log_response() to flag entries as having been responded to.
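Keyword mode works by keeping only what is said after the trigger name. The package's get_text_after_keyword is not documented above, but a plausible pure-Python sketch of that behavior (a local function for illustration, not the library's actual implementation) looks like:

```python
def text_after_keyword(transcript, keyword):
    """Return the words following the first occurrence of keyword, or None if absent."""
    words = transcript.lower().split()
    if keyword in words:
        return " ".join(words[words.index(keyword) + 1:])
    return None

print(text_after_keyword("hey robot turn on the lights", "robot"))  # → turn on the lights
print(text_after_keyword("no trigger word here", "robot"))          # → None
```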

Changing Models

You can modify the model by altering the model_name passed to start_speech_listening():

start_speech_listening(name="robot", stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")

Other available models:

  • vosk-model-small-en-us-0.15
  • vosk-model-en-us-0.22
  • vosk-model-en-us-0.22-lgraph
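The auto-downloader's hardcoded links presumably follow Vosk's standard model hosting scheme, where each model is published as a zip archive under the official model index. A hedged sketch of how such a URL could be built (the base URL is Vosk's public model host, not a value read from this package):

```python
# Vosk's public model archive; each model is published as <name>.zip
VOSK_BASE_URL = "https://alphacephei.com/vosk/models"

def model_url(model_name):
    """Build the download URL for a named Vosk model."""
    return f"{VOSK_BASE_URL}/{model_name}.zip"

print(model_url("vosk-model-small-en-us-0.15"))
# → https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
```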

Keyword Mode vs Always Listen Mode

To use Always Listen Mode, pass name=None to start_speech_listening():

start_speech_listening(name=None, stop_talking_delay=2, device_index=None, model_name="vosk-model-small-en-us-0.15")

This will transcribe everything, not just speech following the name keyword.
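The pause/resume feature (pause_speech_listening / resume_speech_listening) can be thought of as gating recognition results rather than tearing down the audio stream. A self-contained sketch of that idea using threading.Event (illustrative only, not the package's internals):

```python
import threading

class PausableListener:
    def __init__(self):
        self._active = threading.Event()
        self._active.set()  # start unpaused
        self.transcript = []

    def pause(self):
        self._active.clear()

    def resume(self):
        self._active.set()

    def on_audio(self, text):
        # Drop results while paused, mimicking pause_speech_listening()
        if self._active.is_set():
            self.transcript.append(text)

listener = PausableListener()
listener.on_audio("hello")
listener.pause()
listener.on_audio("ignored while paused")
listener.resume()
listener.on_audio("world")
print(listener.transcript)  # → ['hello', 'world']
```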


Known Issues

  • Use on Debian Linux may require manually setting up pyaudio prior to import.
  • Automatic model download links are hardcoded and limited to three options.

License

MIT License


Acknowledgements

  • Vosk Speech Recognition Toolkit
  • TQDM for progress bars
