A simple dictation program using Whisper, pynput, and pystray. After installing, run with 'whisptray'. A tray icon will appear in the system tray. Click it to toggle dictation. Double click to exit.
whisptray App
A simple dictation program that uses OpenAI's Whisper for speech-to-text,
pynput for simulating keyboard input, and pystray for a system tray icon.
Features
- Real-time dictation using Whisper.
- Types recognized text into the currently active application.
- System tray icon to toggle dictation and exit the application.
- Configurable Whisper model and audio parameters via command-line arguments.
Installation
- Clone this repository:
      git clone https://github.com/yourusername/whisptray_app.git  # Replace with your repo URL
      cd whisptray_app
- It is recommended to use a virtual environment:
      python -m venv .venv
      source .venv/bin/activate  # On Windows, run `.venv\Scripts\activate` instead
- Install the package. This installs the `whisptray` command and its dependencies:
      pip install .
  Or, for development (allows editing the code without reinstalling):
      pip install -e .[dev]
- Linux System Dependency (PortAudio for PyAudio): `PyAudio` is a dependency for microphone access and requires the PortAudio library. If installation in the previous step fails or `PyAudio` has issues, you may need to install the development headers.
  - Debian/Ubuntu-based systems:
        sudo apt-get update && sudo apt-get install portaudio19-dev
  - For other distributions, please consult their package manager for the appropriate PortAudio development package.
- System Dependency (ffmpeg for Whisper): Ensure `ffmpeg` is installed on your system, as Whisper requires it for audio processing.
  - Ubuntu/Debian:
        sudo apt update && sudo apt install ffmpeg
- System Dependency (AppIndicator & PyGObject for Tray Icon on Linux): For the system tray icon to function reliably on many Linux desktop environments (especially those using GNOME Shell), `pystray` works best with the `AppIndicator` backend. This requires `PyGObject` (Python bindings for GObject) and the `AppIndicator` GObject introspection bindings.
  - Debian/Ubuntu-based systems (e.g., Ubuntu 22.04 LTS): You'll need to install `gir1.2-appindicator3-0.1` and the PyGObject development files. The specific PyGObject package might depend on your distribution version.
        sudo apt-get update && sudo apt-get install gir1.2-appindicator3-0.1 python3-gi python3-gi-cairo gir1.2-gtk-3.0
    If you encounter issues related to `libgirepository`, you might also need:
        sudo apt-get install libgirepository1.0-dev
    Or, for newer systems, potentially `libgirepository2.0-dev`.
  - Other Linux Distributions: Please search your distribution's package manager for the equivalents of:
    - `appindicator3` or `libappindicator3` (e.g., `libappindicator-gtk3` on Fedora)
    - `PyGObject` or `python-gobject` (e.g., `python3-gobject` on Fedora)
    - The GObject Introspection development files (`gobject-introspection` or similar).
  After installing these system packages, you might need to reinstall the Python dependencies if you are using a virtual environment, to ensure they pick up the new system libraries:
      pip install pystray --force-reinstall
  (Note: The specific Python packages to reinstall might vary. `pystray` itself does not link against these system libraries at install time, so reinstalling it is not always necessary. What matters is that Python picks up `PyGObject` correctly. Often, activating the virtual environment after installing the system packages is sufficient.)
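To check whether the prerequisites listed above are visible from the current Python environment, a small script along these lines may help. It is only a sketch and not part of whisptray: the function name `check_environment` is made up for illustration, and it assumes PyGObject is importable as `gi` and PyAudio as `pyaudio`.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # True when the interpreter is running inside a virtual environment.
        "virtualenv": sys.prefix != sys.base_prefix,
        # Whisper needs the ffmpeg executable to be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # PyGObject is imported as the `gi` module (needed for AppIndicator).
        "PyGObject (gi)": importlib.util.find_spec("gi") is not None,
        # PyAudio is imported as the `pyaudio` module (needs PortAudio).
        "PyAudio (pyaudio)": importlib.util.find_spec("pyaudio") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the same interpreter (and the same virtual environment) that will run `whisptray`, since a package visible to the system Python is not necessarily visible inside a virtual environment.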
Usage
Once installed, you can run the application using the whisptray command:
whisptray
A tray icon will appear. Click the icon to see options:
- Toggle Dictation: Starts or stops the dictation.
- Exit: Closes the application.
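For readers curious how a tray menu like this can be wired up with pystray, here is a minimal sketch. It is not whisptray's actual source: `DictationState` and `build_icon` are hypothetical names, the icon is a plain black square, and the real application would start and stop audio capture where this sketch only flips a flag. It assumes `pystray` and Pillow are installed.

```python
import threading


class DictationState:
    """Thread-safe on/off flag standing in for the real dictation engine."""

    def __init__(self):
        self._active = threading.Event()

    @property
    def active(self):
        return self._active.is_set()

    def toggle(self):
        # Flip the flag; a real app would start/stop recording here.
        if self._active.is_set():
            self._active.clear()
        else:
            self._active.set()


def build_icon(state):
    """Create a pystray icon with 'Toggle Dictation' and 'Exit' entries."""
    # Imported lazily so the state class is usable without a GUI backend.
    import pystray
    from PIL import Image

    image = Image.new("RGB", (64, 64), "black")  # placeholder icon image

    def on_toggle(icon, item):
        state.toggle()

    def on_exit(icon, item):
        icon.stop()  # ends the blocking icon.run() loop

    menu = pystray.Menu(
        pystray.MenuItem(
            "Toggle Dictation",
            on_toggle,
            checked=lambda item: state.active,  # show a check mark when on
            default=True,  # also triggered by activating the icon itself
        ),
        pystray.MenuItem("Exit", on_exit),
    )
    return pystray.Icon("whisptray", image, "whisptray", menu)


if __name__ == "__main__":
    build_icon(DictationState()).run()
```

Marking the toggle entry as `default=True` is one way to let a plain click on the icon toggle dictation on backends that support a default action; whether whisptray does this is an assumption.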
Command-line Arguments
You can customize the behavior using command-line arguments:
whisptray --model small --energy_threshold 1200
Available arguments:
- `--model`: Whisper model to use (choices: "tiny", "base", "small", "medium", "large", "turbo"; default: "turbo").
- `--non_english`: Use the multilingual model variant (if applicable for the chosen size).
- `--energy_threshold`: Energy level for mic to detect (default: 1000).
- `--record_timeout`: How real-time the recording is in seconds (default: 2.0).
- `--phrase_timeout`: Silence duration before a new phrase is considered (default: 3.0).
- `--default_microphone` (Linux only): Name or part of the name of the microphone to use (default: 'pulse'). Use `whisptray --default_microphone list` to see available microphones.
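The documented flags and defaults could be modelled with `argparse` roughly as follows. This is a sketch reconstructed from the argument list, not whisptray's actual parser: the function name `build_parser`, the argument types, and the use of `store_true` for `--non_english` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented whisptray options."""
    parser = argparse.ArgumentParser(prog="whisptray")
    parser.add_argument(
        "--model",
        default="turbo",
        choices=["tiny", "base", "small", "medium", "large", "turbo"],
        help="Whisper model to use.",
    )
    # Assumed to be a boolean switch that takes no value.
    parser.add_argument("--non_english", action="store_true")
    parser.add_argument("--energy_threshold", type=int, default=1000)
    parser.add_argument("--record_timeout", type=float, default=2.0)
    parser.add_argument("--phrase_timeout", type=float, default=3.0)
    # Documented as Linux only; "list" is described as printing microphones.
    parser.add_argument("--default_microphone", default="pulse")
    return parser


# Equivalent to: whisptray --model small --energy_threshold 1200
args = build_parser().parse_args(["--model", "small", "--energy_threshold", "1200"])
print(args.model, args.energy_threshold, args.phrase_timeout)
```

Options not given on the command line keep their documented defaults, so in this example `args.record_timeout` stays at 2.0 and `args.phrase_timeout` at 3.0.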
Development
To set up for development:
- Clone the repository (if you haven't already).
- Create and activate a virtual environment.
- Install in editable mode: `pip install -e .`
- (Optional) Install development tools: `pip install -e .[dev]` (if you add a `dev` extra in `pyproject.toml` for linters, formatters, etc.)
License
This project is licensed under the MIT License, as specified in `pyproject.toml`. A separate LICENSE file has not been added yet.
Source Distribution
File details
Details for the file whisptray-0.1.0.tar.gz.
File metadata
- Download URL: whisptray-0.1.0.tar.gz
- Size: 34.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a560838fcfe4ea361d17e9331eea1aa1ebcbbdd1aff98a516d09373c39e4b7c8 |
| MD5 | 5a1d9754b714ef7f834a7deec8e8f5ce |
| BLAKE2b-256 | 687a7e77b0716aa0b788b7a4e540475a0500c2348dab362402205612229e1072 |
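A downloaded copy of `whisptray-0.1.0.tar.gz` can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches` are illustrative, and the file path in the usage section assumes the archive sits in the current directory.

```python
import hashlib
import hmac

# SHA256 digest published for whisptray-0.1.0.tar.gz (copied from the table).
EXPECTED_SHA256 = "a560838fcfe4ea361d17e9331eea1aa1ebcbbdd1aff98a516d09373c39e4b7c8"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals `expected`."""
    return hmac.compare_digest(sha256_of(path), expected.lower())


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    print(matches("whisptray-0.1.0.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of archive size, which matters little for a 34.4 kB file but makes the helper reusable for larger downloads.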