Vaani - Private, Offline, Universal AI Speech-to-Text Desktop App 🎤
Vaani (वाणी), meaning "speech" or "voice" in Sanskrit, is an open-source, AI-powered desktop application that provides private, offline, real-time speech-to-text transcription. Use your voice to type into any application on your desktop – web browsers, text editors, email clients, chat apps, and more. Your voice data is processed entirely on your local machine, ensuring your conversations and dictations remain confidential.
Vaani leverages the efficiency of faster-whisper and the flexibility of the PySide6 (Qt) framework to create a seamless, secure, and universal dictation experience.
📽️ Demo
✨ Features
- 🌐 Universal Input: Dictate directly into virtually any application or text field that accepts keyboard input on your desktop. Works seamlessly across browsers, documents, code editors, chat clients, etc.
- 🔒 Privacy First - Offline Processing: All transcription happens locally on your computer. Your voice data is never sent to the cloud.
- Real-time Transcription: Speak and watch your words appear in almost any active application.
- High-Quality AI Model: Powered by `faster-whisper`, offering various model sizes (tiny to large) for a balance between speed and accuracy. Models are downloaded once and run locally.
- GPU Acceleration: Supports CUDA (NVIDIA GPUs) for significantly faster local transcription (CPU fallback available).
- System Tray Control: Runs conveniently in the system tray for easy access.
- Configurable Global Hotkeys: Start/stop listening, open settings, test microphone, and more with customizable keyboard shortcuts (using the `keyboard` library).
- Visual Recording Indicator: An optional, movable, always-on-top window shows when Vaani is actively listening, including a real-time audio energy meter.
- Robust Text Insertion (Windows): Uses multiple techniques (`pywin32` APIs, fallback methods) for reliable text input across different Windows applications.
- Comprehensive Settings:
  - Select audio input device.
  - Choose Whisper model size and processing device (CPU/CUDA).
  - Adjust audio parameters (silence thresholds, padding).
  - Configure hotkeys.
  - Toggle noise reduction.
- Microphone Testing Utility: Includes a tool to visually test your microphone input and perform a sample transcription.
- Optional Noise Reduction: Helps filter out background noise for clearer transcriptions (requires `noisereduce`).
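The silence thresholds and the real-time energy meter mentioned above both come down to measuring the energy of each audio chunk. Vaani's actual implementation isn't shown here; the following is a minimal illustrative sketch of RMS-based silence detection (the function names and the default threshold are assumptions, not Vaani's real API):

```python
import math

def rms_energy(samples):
    """Root-mean-square energy of one chunk of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silence(samples, threshold=500.0):
    """Treat a chunk as silence when its RMS energy is below the threshold.

    The threshold is an arbitrary illustrative value for 16-bit audio;
    a real app would expose it as a tunable setting.
    """
    return rms_energy(samples) < threshold
```

A dictation loop would typically buffer chunks while `is_silence` returns false, then hand the buffered audio (plus a little padding) to the transcriber once a run of silent chunks is seen.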
🔒 Privacy & Offline Operation
A core design principle of Vaani is user privacy.
- Local Processing: Speech recognition is performed entirely on your device using the `faster-whisper` library and downloaded models.
- No Cloud Dependency: Unlike many commercial speech-to-text services, Vaani does not require an internet connection for its core transcription functionality (only for the one-time initial model download) and never sends your audio data to external servers.
- Confidentiality: Your dictations, conversations, or sensitive information spoken while Vaani is active remain on your computer.
🎯 Target Audience & Platform
This project is currently aimed primarily at Python developers and technical users comfortable with installing Python packages and potentially troubleshooting environment issues.
Vaani has been primarily developed and tested on Windows. While the core components use cross-platform libraries (PySide6, faster-whisper), features like global hotkeys (keyboard) and the primary text insertion method (pywin32) have platform-specific behaviors or requirements:
- Windows: Best supported platform currently. Text insertion is most robust.
- Linux/macOS: May require additional setup (especially for `pyaudio`). Global hotkeys via the `keyboard` library require root/administrator privileges and might interfere with system settings. Text insertion relies on the `pyperclip`/`pyautogui` fallback, which may have limitations. Contributions to improve cross-platform support are welcome!
🛠️ Prerequisites
- Python: Version 3.10 or higher.
- pip: Python's package installer (usually comes with Python).
- Audio Backend (PyAudio): `pyaudio` installation can sometimes be tricky. You might need:
  - Windows: Usually works out of the box when installing from wheels; may require Microsoft Visual C++ Build Tools if building from source.
  - Linux: `portaudio19-dev` (Debian/Ubuntu) or `portaudio-devel` (Fedora).
  - macOS: `portaudio` (via Homebrew: `brew install portaudio`).
- (Optional) NVIDIA GPU & CUDA: For GPU acceleration:
  - A CUDA-compatible NVIDIA GPU.
  - NVIDIA CUDA Toolkit installed and configured correctly (Vaani attempts to find it via the path specified in Settings, falling back to the PATH environment variable on Windows).
  - `faster-whisper` requires specific CUDA versions; check its documentation for details.
- (Linux/macOS) Root/Admin Privileges: Required for the `keyboard` library to capture global hotkeys.
🚀 Installation
Install from PyPI:

    pip install vaani-speech-to-text

or install from GitHub:

    git clone https://github.com/webstruck/vaani-speech-to-text.git
    cd vaani-speech-to-text
    uv pip install -r pyproject.toml
Note: The first time you run Vaani or select a new model size, the faster-whisper library will download the required model files (this requires an internet connection). Subsequent uses of that model will be fully offline.
▶️ Usage
Once installed, run the application from your terminal:

    vaani
The application icon will appear in your system tray.
Default Hotkeys:
- Toggle Listening: `Ctrl+Alt+Z`
- Open Settings: `Ctrl+Alt+Q`
- Test Microphone: `Ctrl+Alt+T`
- Toggle Debug Mode: `Ctrl+Alt+D` (saves audio chunks locally if enabled)
- Exit Application: `Ctrl+Alt+X`
Right-click the system tray icon for menu options (Start/Stop, Settings, Test Mic, Exit).
Updates
LLM-Based Text Processing
The latest update introduces LLM-based text processing capabilities using Ollama:
- Enhanced Transcription Quality: Uses local language models to improve grammar, spelling, and overall text quality of transcriptions.
- Easy Integration: Works with locally running Ollama models such as Mistral, Llama, etc. Works best with quantization-aware trained (QAT) Gemma 3 models such as `gemma3:1b-it-qat`, which is only about 1 GB.
- Fault-Tolerant Design: Falls back to basic text processing if the LLM is not available or times out.
To use this feature:
- Install Ollama on your system
- Run a model of your choice (e.g., `ollama run gemma3:1b-it-qat`).
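The fault-tolerant design described above can be sketched with the standard library alone. This is an illustrative sketch, not Vaani's actual code: it posts to Ollama's `/api/generate` endpoint and returns the raw transcript untouched if the server is unreachable, times out, or replies with something unexpected (the function name and prompt wording are assumptions):

```python
import json
import urllib.request

def polish_transcript(text, model="gemma3:1b-it-qat",
                      url="http://localhost:11434/api/generate", timeout=10.0):
    """Ask a local Ollama model to clean up a transcript; fall back to the raw text."""
    prompt = ("Correct the grammar and spelling of this dictated text. "
              "Return only the corrected text.\n\n" + text)
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())["response"].strip()
    except (OSError, ValueError, KeyError):
        # Ollama not running, request timed out, or malformed response:
        # degrade gracefully to the unprocessed transcript.
        return text
```

Because every failure path returns the original text, dictation keeps working even when no model is loaded.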
⚙️ Configuration
Settings are stored in a `settings.json` file located in:
- Windows: `%USERPROFILE%\.speech_to_text_app` (e.g., `C:\Users\YourName\.speech_to_text_app`)
- Linux/macOS: `~/.speech_to_text_app` (e.g., `/home/yourname/.speech_to_text_app`)
You can configure various options through the Settings dialog (accessible via hotkey or tray menu), including:
- Audio input device
- Whisper model size (`tiny`, `base`, `small`, `medium`, `large`)
- Processing device (`cpu` or `cuda`)
- CUDA Toolkit path (if needed)
- Silence detection parameters
- Hotkeys
- UI options (visual indicator)
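Reading a per-user settings file like this, with graceful fallback to defaults, takes only a few lines. The sketch below is illustrative: the key names in `DEFAULTS` are hypothetical, not Vaani's actual schema.

```python
import json
from pathlib import Path

# Hypothetical keys for illustration; Vaani's real schema may differ.
DEFAULTS = {"model_size": "base", "device": "cpu", "toggle_hotkey": "ctrl+alt+z"}

def load_settings():
    """Merge settings.json from the per-user config dir over built-in defaults."""
    path = Path.home() / ".speech_to_text_app" / "settings.json"
    settings = dict(DEFAULTS)
    try:
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    except (OSError, json.JSONDecodeError):
        pass  # Missing or corrupt file: run with defaults.
    return settings
```

Writing settings back is the mirror image: ensure the directory exists, then `path.write_text(json.dumps(settings, indent=2))`.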
📦 Dependencies
Key dependencies include:
- `PySide6`: For the graphical user interface.
- `faster-whisper`: The core local AI transcription engine.
- `keyboard`: For global hotkey management.
- `pyaudio`: For microphone access.
- `numpy`, `scipy`: For audio data manipulation and processing.
- `matplotlib`: For the microphone test waveform display.
- `pywin32`: (Windows only) For robust text insertion.
- `pyperclip`, `pyautogui`: Fallback text insertion.
- `noisereduce`: Optional background noise reduction.

(See `pyproject.toml` for specific version requirements.)
🩺 Troubleshooting
- Hotkeys Not Working:
  - Linux/macOS: Ensure you are running the application with `sudo` or as root (required by the `keyboard` library). This has security implications; be aware!
  - All Platforms: Check for conflicts with other applications using global hotkeys. Ensure the key names in the settings match those expected by the `keyboard` library.
- Audio Device Issues:
  - Ensure the correct microphone is selected in Settings -> Audio.
  - Verify `pyaudio` installed correctly for your OS (see Prerequisites).
  - Check OS-level microphone permissions for Python or the terminal.
  - Try the "Test Microphone" utility.
- Text Not Inserting:
  - On Windows, Vaani tries multiple methods; some applications (games, remote desktops, apps running with elevated privileges) might still resist input simulation.
  - On Linux/macOS, insertion relies on `pyperclip`/`pyautogui`, which might not work in all environments (e.g., Wayland without specific setup).
- Slow Transcription:
  - If using CPU, select a smaller model size (e.g., `tiny`, `base`, `small`).
  - If you have a compatible NVIDIA GPU, ensure CUDA is set up correctly and selected as the device in Settings. Remember, all processing is local, so performance depends entirely on your hardware.
- Model Download Failed: Ensure you have an internet connection the first time you select a specific model size. Check for firewall issues blocking the download.
🗺️ Future Plans / Roadmap
- Create user-friendly installers for Windows (and potentially other platforms).
- Language selection for transcription (requires corresponding Whisper models).
- Add more post-processing text options.
- Improve cross-platform compatibility (text insertion, hotkey alternatives).
🙌 Contributing
Contributions are welcome! If you'd like to help improve Vaani, please feel free to:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature` or `bugfix/YourBugfix`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a Pull Request.
Please report bugs or suggest features using the GitHub Issues tab.
📄 License
Distributed under the Apache-2.0 License. See the `LICENSE` file for more information.
🙏 Acknowledgements
- The team behind Whisper and faster-whisper for enabling high-quality, local transcription.
- The developers of Qt and the PySide6 project.
- All the creators of the dependent Python libraries.
File details
Details for the file vaani_speech_to_text-0.1.3.tar.gz.
File metadata
- Download URL: vaani_speech_to_text-0.1.3.tar.gz
- Upload date:
- Size: 57.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b9295269c58dc572c314adc0c8265c310b13831d5891418faafaa6af0d770348` |
| MD5 | `82ab210afb0f0d1667978755674a4354` |
| BLAKE2b-256 | `d05c476b525302ceb0a4132344f0051cf7159eb4eb26020d031a2ef2a4151f4e` |
File details
Details for the file vaani_speech_to_text-0.1.3-py3-none-any.whl.
File metadata
- Download URL: vaani_speech_to_text-0.1.3-py3-none-any.whl
- Upload date:
- Size: 60.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3a18f0d3684fb1a0b8e15db070ab1242a9898fc55f8fc63d1106240eece6da1f` |
| MD5 | `fc4ed0441e4637350f554d385d160083` |
| BLAKE2b-256 | `e25508b4962cb1c67b9aa9e4605db7726781ed6cfb527e7d6a60304b4135f8ff` |