Execute scripts with Whisper for your microphone

Project description

Whisper Mic

This repo is based on OpenAI's Whisper. It lets you use a mic to run scripts, and it copies some of the README from the original project.

Video Tutorial

See the video tutorial here. This repo is a fork, so the video may not be fully relevant.

Professional Assistance

If you need paid professional help, it is available through this email.

Setup

Now a pip package!

  1. Create a venv of your choice.
  2. Run pip install whisper-voice-commands

Example usage

whisper-voice-commands --model tiny --script_path ~youruser/scripts/ --english --ambient --dynamic_energy

Check whisper-voice-commands --help for more details.
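The same invocation can be assembled from Python, for example to launch the tool from another program. This is a minimal sketch that assumes only the flags shown in the example above. The helper name build_command is hypothetical and not part of the package, and whisper-voice-commands --help remains the authoritative list of options.

```python
import os
import shlex


def build_command(model="tiny", script_path=None, english=False,
                  ambient=False, dynamic_energy=False):
    """Build an argv list for whisper-voice-commands (hypothetical helper).

    Only the flags shown in the README example are supported here.
    """
    cmd = ["whisper-voice-commands", "--model", model]
    if script_path is not None:
        # No shell is involved when a list is passed to subprocess,
        # so expand "~" here instead of relying on the shell.
        cmd += ["--script_path", os.path.expanduser(script_path)]
    # Boolean switches are appended only when enabled.
    if english:
        cmd.append("--english")
    if ambient:
        cmd.append("--ambient")
    if dynamic_energy:
        cmd.append("--dynamic_energy")
    return cmd


if __name__ == "__main__":
    cmd = build_command(model="tiny", script_path="~/scripts/",
                        english=True, ambient=True, dynamic_energy=True)
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To launch it (requires the package to be installed in this environment):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids shell quoting problems when the script path contains spaces.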

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large               ~10 GB         1x

For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
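The table and the advice above can be condensed into a small lookup. This minimal sketch assumes a simple policy made up for illustration: pick the largest model whose approximate VRAM requirement fits the available memory, and use the .en checkpoint when only English is needed and one exists. The helper names are hypothetical, and the figures are the approximate ones from the table.

```python
# Approximate figures copied from the model table above.
MODELS = {
    # size: (parameters in millions, required VRAM in GB, relative speed)
    "tiny": (39, 1, 32),
    "base": (74, 1, 16),
    "small": (244, 2, 6),
    "medium": (769, 5, 2),
    "large": (1550, 10, 1),
}


def model_name(size, english=False):
    """Return the checkpoint name for a size, preferring the .en variant."""
    if size not in MODELS:
        raise ValueError(f"unknown model size: {size!r}")
    # "large" has no English-only version, so it always maps to the multilingual model.
    if english and size != "large":
        return f"{size}.en"
    return size


def largest_model_for_vram(vram_gb, english=False):
    """Pick the biggest model (by parameter count) whose VRAM need fits in vram_gb."""
    fitting = [size for size, (_, need, _) in MODELS.items() if need <= vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB of VRAM")
    best = max(fitting, key=lambda size: MODELS[size][0])
    return model_name(best, english)
```

For example, largest_model_for_vram(6, english=True) returns "medium.en", while largest_model_for_vram(12, english=True) falls back to "large" because there is no large.en.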

Microphone Demo

You can use the model with a microphone using the whisper-voice-commands program. Use -h to see flag options.

Some of the more important flags are the --model and --english flags.

Troubleshooting

If you have trouble getting cli.py to run, try the following:

sudo apt install portaudio19-dev python3-pyaudio
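Before digging further, it can help to confirm that PyAudio is importable from the interpreter you actually run the tool with. This is a minimal sketch using only the standard library; the helper name missing_modules is made up for this example. Note that python3-pyaudio installs PyAudio for the system Python, so inside a venv created without --system-site-packages you may also need to run pip install pyaudio, which builds against the portaudio19-dev headers.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "pyaudio" is the import name of the PyAudio package installed above.
    missing = missing_modules(["pyaudio"])
    if missing:
        print(f"Not importable from {sys.executable}: {', '.join(missing)}")
    else:
        print(f"PyAudio is importable from {sys.executable}")
```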

Contributing

Currently, this is just a CLI demo. I foresee that this pip package could become more than that, for example:

from whisper_mic.mic import WhisperMic
mic = WhisperMic(timeout=5)
command = mic.listen()
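The snippet above is a proposed API rather than something this package is documented to provide. To illustrate how a returned command could then be turned into a script run, here is a minimal sketch built on a naming convention invented for this example: the transcript is lowercased, reduced to its words, joined with underscores, and matched against file stems in the script directory (so "Open browser." would select open_browser.sh). The helper names are hypothetical, and this is not necessarily how whisper-voice-commands matches scripts.

```python
import re
import subprocess
from pathlib import Path


def command_to_stem(text):
    """Normalize a transcript such as " Open browser." to "open_browser"."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)


def find_script(text, script_dir):
    """Return the file in script_dir whose stem matches the command, or None."""
    stem = command_to_stem(text)
    if not stem:
        return None
    # Sorted so the choice is deterministic if several extensions share a stem.
    for path in sorted(Path(script_dir).expanduser().iterdir()):
        if path.is_file() and path.stem.lower() == stem:
            return path
    return None


def run_command(text, script_dir):
    """Run the matching script and return its exit code, or None if no match.

    The script must be executable and start with a shebang line.
    """
    script = find_script(text, script_dir)
    if script is None:
        return None
    return subprocess.run([str(script)], check=False).returncode


# Hypothetical usage with the proposed API:
# mic = WhisperMic(timeout=5)
# run_command(mic.listen(), "~/scripts/")
```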

License

The model weights of Whisper are released under the MIT License. See their repo for more information.

The code in this repo is under the MIT license. See LICENSE for further details.

Thanks

Until recently, access to high-performing speech-to-text models was only available through paid services. With this release, I am excited for the many applications that will come.


Download files

Source Distribution

whisper_voice_commands-0.0.6.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

whisper_voice_commands-0.0.6-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file whisper_voice_commands-0.0.6.tar.gz.

File metadata

  • Download URL: whisper_voice_commands-0.0.6.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for whisper_voice_commands-0.0.6.tar.gz
Algorithm Hash digest
SHA256 25baa7fd03f670cafc72744660763b40e9981b381aea7aad7b486dd21e724392
MD5 a1f62299f717cfcb136f5dd11cb5fc31
BLAKE2b-256 a02ad9ee20ce7310bcc7271e8ff3760814a0f1361b792f983c64416e82d37edb
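The digests above let you check a downloaded file before installing it. This is a minimal sketch using Python's hashlib, assuming you pass the path of the downloaded sdist on the command line; the helper names are illustrative. pip's hash-checking mode (--require-hashes, with --hash entries in a requirements file) performs the same check automatically during installation.

```python
import hashlib
import sys

# SHA256 digest published above for whisper_voice_commands-0.0.6.tar.gz.
SDIST_SHA256 = "25baa7fd03f670cafc72744660763b40e9981b381aea7aad7b486dd21e724392"


def sha256_of(path, chunk_size=1 << 16):
    """Return the SHA256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python verify.py whisper_voice_commands-0.0.6.tar.gz
    if len(sys.argv) == 2:
        ok = verify(sys.argv[1], SDIST_SHA256)
        print("OK" if ok else "MISMATCH")
        sys.exit(0 if ok else 1)
```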

File details

Details for the file whisper_voice_commands-0.0.6-py3-none-any.whl.

File hashes

Hashes for whisper_voice_commands-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 da8daef5af45ead33ed95589068d49a4dfd206fa4a49d4d40e4039a1fe5dcfb8
MD5 a434c7ed90961fa91e46e773c94c90d3
BLAKE2b-256 4c211d3b931b855deafb8dc002bf85d90c40f377f875c1732223ea93f52667a6