Execute scripts with Whisper for your microphone
Whisper Mic
This repo is based on the Whisper work done here by OpenAI. It lets you use a microphone to run scripts. Parts of this README are copied from the original project.
Video Tutorial
See the video tutorial for this repo here. This project is a fork of the repo linked here, so the video may not be fully relevant.
Professional Assistance
If you need paid professional help, it is available through this email.
Setup
Now a pip package!
- Create a venv of your choice.
- Run
pip install whisper-voice-commands
Example usage
whisper-voice-commands --model tiny --script_path ~youruser/scripts/ --english --ambient --dynamic_energy
Check whisper-voice-commands --help for more details
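If you would rather launch the tool from Python than from a shell, the example command can be assembled as an argument list. This is only a sketch: `build_command` is a hypothetical helper that is not part of the package, and it uses only the flags shown in the example above, assuming the last three are plain on/off switches as they appear there.

```python
import os
import shlex


def build_command(model="tiny", script_path="~/scripts/", english=True,
                  ambient=True, dynamic_energy=True):
    """Assemble the whisper-voice-commands argument list (hypothetical helper)."""
    cmd = [
        "whisper-voice-commands",
        "--model", model,
        # Expand "~" ourselves because no shell is involved.
        "--script_path", os.path.expanduser(script_path),
    ]
    # These flags appear without a value in the example command.
    if english:
        cmd.append("--english")
    if ambient:
        cmd.append("--ambient")
    if dynamic_energy:
        cmd.append("--dynamic_energy")
    return cmd


if __name__ == "__main__":
    # Print the command for inspection before running it.
    print(shlex.join(build_command()))
```

The returned list can be passed to `subprocess.run` once the package is installed in the active environment.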
Available models and languages
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
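As a rough illustration of the tradeoff, the helper below picks the largest model that fits a given amount of VRAM, using the approximate figures from the table above. `pick_model` and `MODELS` are hypothetical names introduced for this sketch. It only returns a model name from the table and makes no claim about how the CLI combines `--model` with `--english`.

```python
# (size, has English-only variant, approximate required VRAM in GB),
# listed from smallest to largest.
MODELS = [
    ("tiny", True, 1),
    ("base", True, 1),
    ("small", True, 2),
    ("medium", True, 5),
    ("large", False, 10),
]


def pick_model(vram_gb, english_only=False):
    """Return the largest model name that fits in vram_gb, or None if none fits."""
    best = None
    for size, has_en, required in MODELS:
        if required <= vram_gb:
            best = (size, has_en)  # later entries are larger, so keep overwriting
    if best is None:
        return None
    size, has_en = best
    # Use the ".en" name only when an English-only variant exists.
    return f"{size}.en" if english_only and has_en else size
```

Larger models are slower, so on a machine with plenty of VRAM you may still prefer a smaller model for responsiveness.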
Microphone Demo
You can use the model with a microphone using the whisper-voice-commands program. Use -h to see flag options.
Some of the more important flags are the --model and --english flags.
Troubleshooting
If cli.py fails to run, try installing the PortAudio and PyAudio system packages (Debian/Ubuntu):
sudo apt install portaudio19-dev python3-pyaudio
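To check whether the install helped, you can test if the audio module is visible to your interpreter. This is a small diagnostic sketch, not part of the package; `module_available` is a hypothetical helper, and it assumes the tool relies on the `pyaudio` module, as the package names in the command above suggest.

```python
import importlib.util


def module_available(name):
    """Return True if the named top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "pyaudio" is the import name of the PyAudio package.
    if module_available("pyaudio"):
        print("pyaudio is importable")
    else:
        print("pyaudio not found in this environment")
```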
Contributing
Currently, this is just a CLI demo. I foresee that this pip package could become more than that, for example:
from whisper_mic.mic import WhisperMic
mic = WhisperMic(timeout=5)
command = mic.listen()
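Building on that idea, a transcribed phrase could be mapped to a file in a scripts directory. The sketch below is purely illustrative: the matching rule (normalized phrase equals normalized file stem) and the helper names `normalize` and `find_script` are assumptions, not a description of how this package resolves commands.

```python
import re
from pathlib import Path


def normalize(text):
    """Lowercase, drop punctuation, and join the remaining words with underscores."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)


def find_script(command_text, script_dir):
    """Return the file in script_dir whose stem matches the spoken phrase, else None."""
    wanted = normalize(command_text)
    if not wanted:
        return None
    # Sort for a deterministic result if several files normalize to the same name.
    for path in sorted(Path(script_dir).iterdir()):
        if path.is_file() and normalize(path.stem) == wanted:
            return path
    return None
```

With a file named `open_browser.sh` in the directory, the phrase "Open browser." would resolve to that file, which could then be handed to `subprocess.run`.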
License
The model weights of Whisper are released under the MIT License. See their repo for more information.
The code in this repo is released under the MIT License. See LICENSE for further details.
Thanks
Until recently, high-performing speech-to-text models were only available through paid services. With this release, I am excited for the many applications that will come.
File details
Details for the file whisper_voice_commands-0.0.6.tar.gz.
File metadata
- Download URL: whisper_voice_commands-0.0.6.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25baa7fd03f670cafc72744660763b40e9981b381aea7aad7b486dd21e724392 |
| MD5 | a1f62299f717cfcb136f5dd11cb5fc31 |
| BLAKE2b-256 | a02ad9ee20ce7310bcc7271e8ff3760814a0f1361b792f983c64416e82d37edb |
File details
Details for the file whisper_voice_commands-0.0.6-py3-none-any.whl.
File metadata
- Download URL: whisper_voice_commands-0.0.6-py3-none-any.whl
- Upload date:
- Size: 6.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da8daef5af45ead33ed95589068d49a4dfd206fa4a49d4d40e4039a1fe5dcfb8 |
| MD5 | a434c7ed90961fa91e46e773c94c90d3 |
| BLAKE2b-256 | 4c211d3b931b855deafb8dc002bf85d90c40f377f875c1732223ea93f52667a6 |