Maivi - My AI Voice Input: Real-time voice-to-text with hotkey support
Maivi - My AI Voice Input
Real-time voice-to-text transcription with hotkey support
Maivi (My AI Voice Input) is a cross-platform desktop application that turns your voice into text using the NVIDIA Parakeet TDT 0.6B speech recognition model. Press Alt+Q (Option+Q on macOS) to start recording, and press it again to stop. Your transcription appears in real time and is automatically copied to your clipboard.
Features
- Hotkey Recording - Toggle recording with Alt+Q (Option+Q on macOS)
- Real-time Transcription - See text appear as you speak
- Clipboard Integration - Automatic copy to clipboard
- Floating Overlay - Live transcription in a sleek overlay window
- Smart Chunk Merging - Overlap-based merging removes words duplicated between chunks
- CPU-Only - No GPU required (though GPU acceleration is supported)
- High Accuracy - Powered by the NVIDIA Parakeet TDT 0.6B model (~6-9% effective WER)
- Fast - ~0.36x real-time factor (RTF): processes 7s of audio in about 2.5s on CPU
Quick Start
Installation
CPU-only (Recommended - much faster install, ~100MB download vs 2GB+):
pip install maivi --extra-index-url https://download.pytorch.org/whl/cpu
Or with GPU support (if you have an NVIDIA GPU):
pip install maivi --extra-index-url https://download.pytorch.org/whl/cu121
Standard install (may download large CUDA files):
pip install maivi
System Requirements
Linux:
sudo apt-get install portaudio19-dev python3-pyaudio
macOS: Grant Maivi microphone, Accessibility, and Input Monitoring permissions the first time you run it (System Settings → Privacy & Security). No additional Homebrew packages are required for audio capture.
Windows:
- PortAudio is usually included with PyAudio
Usage
GUI Mode (Recommended):
maivi
Press Alt+Q (Option+Q on macOS) to start recording, press Alt+Q again to stop. The transcription will appear in a floating overlay and be copied to your clipboard.
CLI Mode:
# Basic CLI
maivi-cli
# With live terminal UI
maivi-cli --show-ui
# Custom parameters
maivi-cli --window 10 --slide 5 --show-ui
Controls:
- Alt+Q (Option+Q on macOS) - Start/stop recording (toggle mode)
- Esc - Exit application
How It Works
Maivi uses a streaming architecture with four stages:
- Sliding Window Recording - Captures audio in overlapping 7-second chunks every 3 seconds
- Real-time Transcription - Each chunk is transcribed by the NVIDIA Parakeet model
- Smart Merging - Chunks are merged using overlap detection (4-second overlap)
- Live Updates - The UI updates in real-time as transcription progresses
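To make the timing concrete, here is a minimal sketch of how a 7-second window that advances every 3 seconds covers a recording. The function name `chunk_windows` is hypothetical and only illustrates the schedule; it is not Maivi's actual recorder code.

```python
def chunk_windows(total_seconds, window=7.0, slide=3.0):
    """Return (start, end) times, in seconds, of each sliding-window chunk."""
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    windows = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is clipped to the end of the recording.
        end = min(start + window, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start += slide
    return windows


# Consecutive chunks share window - slide = 4 seconds of audio.
for start, end in chunk_windows(16.0):
    print(f"{start:4.1f}s -> {end:4.1f}s")
```

With the defaults, a 16-second recording yields four chunks, and each chunk overlaps the previous one by 4 seconds.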
Why Overlapping Chunks?
Chunk 1: "hello world how are you"
Chunk 2:             "how are you doing today"
                      ^^^^^^^^^^^
                      Overlap detected → merge!
Result:  "hello world how are you doing today"
This approach ensures:
- No words cut mid-syllable
- Context preserved for better accuracy
- Seamless merging without duplicates
- Fast processing (no queue buildup)
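The merge step can be sketched as a search for the longest run of words that ends one chunk and begins the next. This is a simplified illustration, not Maivi's `chunk_merger.py`: the name `merge_chunks`, the `min_overlap` parameter, and the exact-match comparison are assumptions, and a real merger may need fuzzy matching because two transcriptions of the same audio can differ slightly.

```python
def merge_chunks(previous, current, min_overlap=2, gap_marker="..."):
    """Merge two transcripts by removing the words they share at the seam."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    prev_words = previous.split()
    cur_words = current.split()
    if not prev_words:
        return " ".join(cur_words)

    # Try the longest candidate overlap first, then shrink it.
    longest = min(len(prev_words), len(cur_words))
    for size in range(longest, min_overlap - 1, -1):
        if prev_words[-size:] == cur_words[:size]:
            return " ".join(prev_words + cur_words[size:])

    # No shared words: keep both chunks and mark the gap (assumed behavior).
    return " ".join(prev_words + [gap_marker] + cur_words)


print(merge_chunks("hello world how are you", "how are you doing today"))
```

Searching from the longest overlap down avoids merging on a short coincidental match when a longer one exists.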
Configuration
Chunk Parameters
maivi-cli --window 7.0 --slide 3.0 --delay 2.0
- --window: Chunk size in seconds (default: 7.0)
  - Larger = better quality, slower processing
- --slide: Slide interval in seconds (default: 3.0)
  - Smaller = more overlap, higher CPU usage
  - Rule: Must be > window × 0.36 to avoid queue buildup
- --delay: Processing start delay in seconds (default: 2.0)
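Before changing these values, it can help to check the rule numerically. The sketch below is a hypothetical helper (`check_chunk_params` is not part of Maivi) that assumes the ~0.36 RTF quoted above; your own hardware may be faster or slower.

```python
def check_chunk_params(window, slide, rtf=0.36):
    """Estimate whether transcription keeps up with the chosen chunk settings."""
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    processing = window * rtf  # estimated seconds to transcribe one chunk
    return {
        "processing_seconds": processing,
        "overlap_seconds": window - slide,
        "keeps_up": processing < slide,
    }


for window, slide in [(7.0, 3.0), (10.0, 5.0), (10.0, 3.0)]:
    result = check_chunk_params(window, slide)
    status = "ok" if result["keeps_up"] else "queue will build up"
    print(f"window={window} slide={slide}: {status}")
```

A 10-second window with a 3-second slide fails the check because each chunk would take about 3.6 seconds to process.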
Advanced Options
# Speed adjustment (experimental)
maivi-cli --speed 1.5
# Custom UI width
maivi-cli --show-ui --ui-width 50
# Disable pause detection
maivi-cli --no-pause-breaks
# Stream to file (for voice commands)
maivi-cli --output-file transcription.txt
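A script that reacts to the streamed file could poll it for newly written text. The sketch below assumes the file is plain UTF-8 text that grows by appending, which this README does not confirm; `read_new_text` is a hypothetical helper, and a temporary file stands in for transcription.txt.

```python
import os
import tempfile


def read_new_text(path, offset=0):
    """Return (new_text, new_offset) for bytes written to path after offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


# Demo: a temporary file plays the role of the transcription output.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "transcription.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("open browser\n")
    text, offset = read_new_text(path)
    print(repr(text))

    with open(path, "a", encoding="utf-8") as handle:
        handle.write("close tab\n")
    text, offset = read_new_text(path, offset)
    print(repr(text))  # only the text added since the previous read
```

If the file is rewritten in place instead of appended, compare the whole content against the previous read rather than tracking an offset.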
Building Executables
Maivi can be packaged as standalone executables for easy distribution:
# Install build dependencies (quotes stop shells such as zsh from expanding the brackets)
pip install "maivi[build]"
# Build executable
pyinstaller --onefile --windowed \
  --name maivi \
  --add-data "src/maia:maia" \
  src/maia/__main__.py
Pre-built executables are available in Releases.
Development
Setup Development Environment
# Clone repository
git clone https://github.com/MaximeRivest/maivi.git
cd maivi
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
Project Structure
maia/
├── src/maia/
│   ├── __init__.py
│   ├── __main__.py          # GUI entry point
│   ├── core/
│   │   ├── streaming_recorder.py
│   │   ├── chunk_merger.py
│   │   └── pause_detector.py
│   ├── gui/
│   │   └── qt_gui.py
│   ├── cli/
│   │   ├── cli.py
│   │   ├── server.py
│   │   └── terminal_ui.py
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── LICENSE
Troubleshooting
"No overlap found" warnings
This is expected behavior when there are long pauses (5+ seconds of silence). The system adds "..." gap markers to indicate the pause.
Queue buildup (transcription continues after stopping)
Check that the processing time per chunk is shorter than the slide interval:
- Processing: window_seconds × 0.36 (RTF)
- Should be < slide_seconds
- Default: 7 × 0.36 = 2.52s < 3s ✓
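The 0.36 figure depends on hardware, so it may be worth measuring the RTF on your own machine. This is a rough sketch: `measure_rtf` is a hypothetical helper, and `fake_transcribe` is a stand-in that only sleeps; replace it with a call into whatever transcription function you are timing.

```python
import time


def measure_rtf(transcribe, audio, audio_seconds):
    """Return processing time divided by audio duration (the real-time factor)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def fake_transcribe(audio):
    """Placeholder for a real model call; it just waits 50 ms."""
    time.sleep(0.05)
    return "placeholder text"


rtf = measure_rtf(fake_transcribe, audio=None, audio_seconds=1.0)
window = 7.0
print(f"measured RTF: {rtf:.2f}")
print(f"slide must exceed {window * rtf:.2f}s for a {window:.0f}s window")
```

Measure over several chunks and use the slowest result, since the first call often includes model warm-up.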
Model download issues
The first run downloads the NVIDIA Parakeet model (~600MB) from Hugging Face. If the download fails:
- Check your internet connection
- Verify that Hugging Face is accessible
- Clear the cache (this also deletes any other cached Hugging Face models): rm -rf ~/.cache/huggingface/
Qt/GUI crashes
If the GUI crashes on Linux:
# Check Qt installation
python -c "from PySide6 import QtWidgets; print('Qt OK')"
# Fall back to CLI mode
maivi-cli --show-ui
Performance
Memory:
- Model: ~2GB RAM
- Audio buffer: ~1MB
- Total: ~2.5GB RAM
CPU:
- Idle: <5% CPU
- Recording: 30-40% of 1 core
- Transcription: 100% of 1 core (during processing)
Latency:
- First transcription: 2s (start delay)
- Updates: Every 3s (slide interval)
- Completion: 1-3s after recording stops
Accuracy:
- Model WER: ~5-8%
- Overlap merging: <1% word loss
- Total effective WER: ~6-9%
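For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below is a generic textbook computation using word-level edit distance; it is not the evaluation script behind the figures above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate(
    "hello world how are you doing today",
    "hello word how are you today",
)
print(f"WER: {wer:.1%}")
```

In this example one word is substituted and one is dropped, so two errors over seven reference words give a WER of roughly 29%.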
Roadmap
v0.2 - Platform Support:
- Test and verify macOS support
- Test and verify Windows support
- Platform-specific installers (.app, .exe)
v0.3 - Features:
- Configurable hotkeys via GUI
- Multi-language support
- Custom model selection
- Voice commands support
v0.4 - Optimization:
- GPU acceleration (CUDA)
- Export formats (JSON, SRT)
- Text editor integration
- Plugin system
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with NVIDIA NeMo ASR toolkit
- Uses Parakeet TDT 0.6B model
- GUI powered by PySide6
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Support
- Create an issue
- Feature requests
- Bug reports
Made with ❤️ by Maxime Rivest
File details
Details for the file maivi-0.4.0.tar.gz.
File metadata
- Download URL: maivi-0.4.0.tar.gz
- Upload date:
- Size: 31.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2da3986f26414317d4f24565d459e5b2d381942f2b1e98779a909276fb60454 |
| MD5 | 90a0b73112aac8145d5bec7cd117747d |
| BLAKE2b-256 | ca05401319b89ff32d0f8b7ca297791c9bdd0fdfebf58736ecd3dd3296e96821 |
File details
Details for the file maivi-0.4.0-py3-none-any.whl.
File metadata
- Download URL: maivi-0.4.0-py3-none-any.whl
- Upload date:
- Size: 32.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 020ebc25bc19a59e1aa6c5efc90b4083cb23eb9101b0fe5fc69f255029e79929 |
| MD5 | 7e09b9cd21cc9145a6ed38a86f749502 |
| BLAKE2b-256 | 0857cf822554b8cbc9b7bbeedc4a833ba7750033a322490c4709fddc3dab3b88 |