Maivi - My AI Voice Input: Real-time voice-to-text with hotkey support

Maivi - My AI Voice Input 🎤

Real-time voice-to-text transcription with hotkey support

Maivi (My AI Voice Input) is a cross-platform desktop application that turns your voice into text using state-of-the-art AI models. Press Alt+Q to start recording and press it again to stop. Your transcription appears in real time and is automatically copied to your clipboard.

✨ Features

  • 🎤 Hotkey Recording - Toggle recording with Alt+Q
  • ⚡ Real-time Transcription - See text appear as you speak
  • 📋 Clipboard Integration - Automatic copy to clipboard
  • 🪟 Floating Overlay - Live transcription in a sleek overlay window
  • 🔄 Smart Chunk Merging - Advanced overlap-based merging eliminates duplicates
  • 💻 CPU-Only - No GPU required (though GPU acceleration is supported)
  • High Accuracy - Powered by NVIDIA Parakeet TDT 0.6B model (~6-9% WER)
  • 🚀 Fast - ~0.36x RTF (processes 7s audio in 2.5s on CPU)

🚀 Quick Start

Installation

pip install maivi

System Requirements

Linux:

sudo apt-get install portaudio19-dev python3-pyaudio

macOS:

brew install portaudio

Windows:

  • PortAudio is usually included with PyAudio

Usage

GUI Mode (Recommended):

maivi

Press Alt+Q to start recording and press it again to stop. The transcription appears in a floating overlay and is copied to your clipboard.

CLI Mode:

# Basic CLI
maivi-cli

# With live terminal UI
maivi-cli --show-ui

# Custom parameters
maivi-cli --window 10 --slide 5 --show-ui

Controls:

  • Alt+Q - Start/stop recording (toggle mode)
  • Esc - Exit application

📖 How It Works

Maivi uses a sliding-window streaming architecture:

  1. Sliding Window Recording - Captures audio in overlapping 7-second chunks every 3 seconds
  2. Real-time Transcription - Each chunk is transcribed by the NVIDIA Parakeet model
  3. Smart Merging - Chunks are merged using overlap detection (4-second overlap)
  4. Live Updates - The UI updates in real-time as transcription progresses

Why Overlapping Chunks?

Chunk 1: "hello world how are you"
Chunk 2: "how are you doing today"
          ^^^^^^^^^^^
          Overlap detected → merge!

Result: "hello world how are you doing today"

This approach ensures:

  • ✅ No words cut mid-syllable
  • ✅ Context preserved for better accuracy
  • ✅ Seamless merging without duplicates
  • ✅ Fast processing (no queue buildup)
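The merge shown above can be sketched as a word-level overlap search. This is a minimal illustration, not Maivi's actual merging code: `merge_chunks` is a hypothetical function, and it assumes the overlapping words are transcribed identically in both chunks, which real transcripts do not always guarantee (casing and punctuation can differ).

```python
def merge_chunks(previous, current, min_overlap=1):
    """Merge two transcripts by dropping the words they share.

    Hypothetical sketch: looks for the longest run of words that ends
    `previous` and also starts `current`, then appends only the new words.
    """
    prev_words = previous.split()
    cur_words = current.split()
    if not prev_words:
        return current

    longest = min(len(prev_words), len(cur_words))
    for size in range(longest, min_overlap - 1, -1):
        if prev_words[-size:] == cur_words[:size]:
            return " ".join(prev_words + cur_words[size:])

    # No shared words: join with a "..." gap marker (assumed format).
    return " ".join(prev_words + ["..."] + cur_words)


print(merge_chunks("hello world how are you", "how are you doing today"))
```

Searching from the longest candidate down to the shortest means a repeated phrase inside the overlap is matched as a whole rather than at its first word only.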

⚙️ Configuration

Chunk Parameters

maivi-cli --window 7.0 --slide 3.0 --delay 2.0

  • --window: Chunk size in seconds (default: 7.0)
    • Larger = better quality, slower processing
  • --slide: Slide interval in seconds (default: 3.0)
    • Smaller = more overlap, higher CPU usage
    • Rule: Must be > window × 0.36 (the per-chunk processing time at ~0.36x RTF) to avoid queue buildup
  • --delay: Processing start delay in seconds (default: 2.0)
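To check a window/slide pair before running, the rule above can be written as a few lines of arithmetic. `check_chunk_params` is a hypothetical helper, and the 0.36 real-time factor is the CPU figure quoted in this README; your machine may be faster or slower.

```python
def check_chunk_params(window, slide, rtf=0.36):
    """Report overlap and whether transcription keeps up with recording.

    Hypothetical helper: `rtf` is the real-time factor (processing time
    divided by audio length), assumed here to be ~0.36 on CPU.
    """
    processing = window * rtf
    return {
        "overlap": window - slide,        # seconds shared by consecutive chunks
        "processing": processing,         # estimated seconds to transcribe one chunk
        "keeps_up": processing < slide,   # False means chunks will queue up
    }


print(check_chunk_params(7.0, 3.0))   # the defaults
print(check_chunk_params(10.0, 3.0))  # a window too large for a 3-second slide
```

With the defaults, a chunk takes about 2.52 seconds to process and a new one arrives every 3 seconds, so the queue stays empty. A 10-second window would need about 3.6 seconds per chunk and would fall behind unless the slide is raised.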

Advanced Options

# Speed adjustment (experimental)
maivi-cli --speed 1.5

# Custom UI width
maivi-cli --show-ui --ui-width 50

# Disable pause detection
maivi-cli --no-pause-breaks

# Stream to file (for voice commands)
maivi-cli --output-file transcription.txt
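Another program can pick up the streamed transcription by polling the output file. The sketch below assumes the file contains plain UTF-8 text that grows as transcription progresses; that format is an assumption, not something this README specifies, and `read_new_text` is a hypothetical helper.

```python
import os


def read_new_text(path, offset=0):
    """Return (new_text, new_offset) for data written to `path` since `offset`.

    Hypothetical helper: assumes the output file is plain UTF-8 text.
    If the file shrinks (for example because it was rewritten), reading
    restarts from the beginning.
    """
    if not os.path.exists(path):
        return "", offset
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


# Example: call this in a loop, sleeping between polls.
text, position = read_new_text("transcription.txt")
if text:
    print(text, end="")
```

Tracking a byte offset keeps each poll cheap, since only the newly written part of the file is read.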

📦 Building Executables

Maivi can be packaged as standalone executables for easy distribution:

# Install build dependencies
pip install "maivi[build]"

# Build executable
pyinstaller --onefile --windowed \
  --name maivi \
  --add-data "src/maia:maia" \
  src/maia/__main__.py

Pre-built executables are available in Releases.

🏗️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/MaximeRivest/maivi.git
cd maivi

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

Project Structure

maia/
├── src/maia/
│   ├── __init__.py
│   ├── __main__.py           # GUI entry point
│   ├── core/
│   │   ├── streaming_recorder.py
│   │   ├── chunk_merger.py
│   │   └── pause_detector.py
│   ├── gui/
│   │   └── qt_gui.py
│   ├── cli/
│   │   ├── cli.py
│   │   ├── server.py
│   │   └── terminal_ui.py
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── LICENSE

🐛 Troubleshooting

"No overlap found" warnings

This is expected behavior when there are long pauses (5+ seconds of silence). The system adds "..." gap markers to indicate the pause.

Queue buildup (transcription continues after stopping)

Check that processing time < slide interval:

  • Processing: window_seconds × 0.36 (RTF)
  • Should be < slide_seconds
  • Default: 7 × 0.36 = 2.52s < 3s ✅

Model download issues

The first run downloads the NVIDIA Parakeet model (~600MB) from HuggingFace. If download fails:

  • Check internet connection
  • Verify HuggingFace is accessible
  • Clear cache: rm -rf ~/.cache/huggingface/

Qt/GUI crashes

If the GUI crashes on Linux:

# Check Qt installation
python -c "from PySide6 import QtWidgets; print('Qt OK')"

# Fall back to CLI mode
maivi-cli --show-ui

📊 Performance

Memory:

  • Model: ~2GB RAM
  • Audio buffer: ~1MB
  • Total: ~2.5GB RAM

CPU:

  • Idle: <5% CPU
  • Recording: 30-40% of 1 core
  • Transcription: 100% of 1 core (during processing)

Latency:

  • First transcription: 2s (start delay)
  • Updates: Every 3s (slide interval)
  • Completion: 1-3s after recording stops

Accuracy:

  • Model WER: ~5-8%
  • Overlap merging: <1% word loss
  • Total effective WER: ~6-9%

🗺️ Roadmap

v0.2 - Platform Support:

  • Test and verify macOS support
  • Test and verify Windows support
  • Platform-specific installers (.app, .exe)

v0.3 - Features:

  • Configurable hotkeys via GUI
  • Multi-language support
  • Custom model selection
  • Voice commands support

v0.4 - Optimization:

  • GPU acceleration (CUDA)
  • Export formats (JSON, SRT)
  • Text editor integration
  • Plugin system

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

💬 Support

Made with ❤️ by Maxime Rivest
