
Real time speech to text

Project description

PyListener

PyListener is a tool for near real-time voice processing and speech-to-text conversion. It can be pretty fast or slightly sluggish depending on the compute and memory available in the environment. I suggest using it in situations where a delay of ~1 second is acceptable, e.g. AI assistants, voice command processing etc.

Watch a demo

Installation

Use the package manager pip to install py-listener.

pip install py-listener

Basic Usage

from listener import Listener

# prints what the speaker is saying; see the constructor
# parameters in the documentation below for more features
listener = Listener(speech_handler=print)

# start listening
listener.listen()

# NOTE: listening is done from a separate thread, so the main thread
# must have other operations to keep the interpreter going, or it will
# quit. If your code has no other operations, just run a loop like
# the one below.

# --------------------
# import time

# while True:
#     time.sleep(1)
# --------------------

# stops listening
listener.stop()

# starts listening again
# listener.listen()
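
The note above about keeping the interpreter alive can also be handled with a small helper instead of a bare loop. The following is only a sketch: run_until_stopped, stop_event and poll_seconds are hypothetical names that are not part of the package, and the only things assumed about the listener are the listen() and stop() methods shown above.

```python
import threading


def run_until_stopped(listener, stop_event, poll_seconds=0.5):
    """Start `listener`, block until `stop_event` is set or Ctrl+C, then stop it.

    `listener` is any object with listen() and stop() methods, as in the
    basic usage example. This helper is illustrative, not part of the package.
    """
    listener.listen()
    try:
        # Event.wait() returns True once the event is set, which ends the loop.
        while not stop_event.wait(poll_seconds):
            pass
    except KeyboardInterrupt:
        # Ctrl+C in the main thread is treated as a normal shutdown request.
        pass
    finally:
        listener.stop()
```

With the package installed, this could be called as run_until_stopped(Listener(speech_handler=print), threading.Event()), with any other thread setting the event to end the session.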

Documentation

There is only one class in the package, the Listener.

After instantiation, it starts collecting audio data in n-second chunks, where n is a number passed as an argument (time_window). It checks whether each audio chunk contains any human voice: if it does, the chunk is kept for later processing (conversion to text or any other user-defined processing); otherwise the chunk is discarded.
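
The chunk-and-filter behaviour described above can be pictured with a short sketch. This is not the package's implementation: split_into_chunks and group_utterances are hypothetical helpers, audio is represented as a plain Python list of samples, and the rule that an utterance ends at the first chunk without voice is an assumption made for illustration.

```python
def split_into_chunks(samples, sample_rate, time_window=2):
    """Split a flat sequence of samples into chunks of `time_window` seconds."""
    chunk_len = sample_rate * time_window
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def group_utterances(chunks, has_voice):
    """Keep voiced chunks, grouped into runs of consecutive voiced chunks.

    Chunks for which has_voice(chunk) is falsy are discarded. Treating the
    first unvoiced chunk as the end of an utterance is an assumption.
    """
    utterances, current = [], []
    for chunk in chunks:
        if has_voice(chunk):
            current.append(chunk)
        elif current:
            utterances.append(current)
            current = []
    if current:
        utterances.append(current)
    return utterances
```

Each inner list returned by group_utterances has the same shape as the list of chunks that the documentation below says is passed to voice_handler.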

Constructor parameters

  • speech_handler: a function that is called with the text transcribed from the human voice in the recorded audio as its only argument, speech_handler(string speech).

  • on_listening_start: a parameterless function that is called right after the Listener object starts collecting audio.

  • time_window: an integer that specifies the chunk size of the collected audio in seconds, 2 is the default.

  • no_channels: the number of audio channels to be used for recording, 1 is the default.

  • has_voice: a function that is called on each recorded audio chunk to determine whether it contains human voice. It gets the audio chunk as a numpy.ndarray object as the only argument, has_voice(numpy.ndarray chunk). Silero is used by default to do this.

  • voice_handler: a function that is used to process an utterance, a continuous segment of speech, it gets a list of audio chunks as the only argument, voice_handler(list<numpy.ndarray>).

  • voice_to_speech: a function used to convert human voice to text, whisper is used by default to do this, voice_to_speech(list<numpy.ndarray>).

  • use_fp16: a boolean flag indicating whether the voice detection and speech-to-text models should use half-precision arithmetic to save memory and reduce latency. The default is True if CUDA is available. It has no effect on CPUs at the time of this writing, so it is set to False by default in CPU environments.

  • en_only: a flag indicating that only English will be spoken in the collected audio; this is used to determine the best whisper model to use to convert speech to text.

  • show_model_download: a flag specifying if a progress bar should be displayed when downloading models.

  • device: the device on which the speech detection and speech-to-text conversion models run; the default is cuda if available, else cpu.
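
As a rough illustration of how the callback parameters above might be combined, the sketch below builds a custom has_voice based on a simple RMS energy threshold and a speech_handler that pushes text onto a queue. All helper names (make_energy_has_voice, make_queue_handler, build_listener) and the threshold value are hypothetical. It also assumes that chunks are one-dimensional (mono) arrays of float samples, which the documentation does not state, and an energy threshold is a much cruder detector than the default Silero model.

```python
import math
import queue


def make_energy_has_voice(threshold=0.02):
    """Return a has_voice-style callable based on RMS energy.

    Assumes a 1-D chunk of float samples; the threshold is arbitrary.
    """
    def has_voice(chunk):
        samples = [float(s) for s in chunk]
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return rms >= threshold
    return has_voice


def make_queue_handler(out_queue):
    """Return a speech_handler-style callable that queues non-empty text."""
    def speech_handler(speech):
        text = speech.strip()
        if text:
            out_queue.put(text)
    return speech_handler


def build_listener():
    """Wire the callbacks into a Listener (requires py-listener installed)."""
    from listener import Listener  # imported here so the helpers work without it

    transcripts = queue.Queue()
    listener = Listener(
        speech_handler=make_queue_handler(transcripts),
        has_voice=make_energy_has_voice(threshold=0.02),
        time_window=2,
        device="cpu",
    )
    return listener, transcripts
```

Queuing the text rather than acting on it inside the handler keeps the callback short, which seems sensible given that listening happens on a separate thread; the main thread can then read from the queue at its own pace.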

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
