Termivox
Voice Recognition Bridge for Linux - Speak naturally, control your system, type hands-free.
Overview
Termivox is a Linux-based voice recognition system that transforms your speech into text and system commands. Using offline voice recognition (Vosk), it provides:
- Hands-free dictation - Speak and watch your words appear
- Voice-controlled system commands - Copy, paste, click, scroll by voice
- Multi-language support - English and French recognition
- Toggle control - Pause/resume recognition instantly like a guitar pedal
- Privacy-first - All processing happens locally, no cloud required
Features
Voice Recognition
- Offline speech-to-text powered by Vosk
- Bilingual support: English (en) and French (fr)
- Punctuation by voice - Say "comma", "period", "question mark"
- Edit commands - "new line", "tab", "new paragraph"
- System commands - "copy", "paste", "click", "scroll up/down"
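A spoken system command has to end up as a keyboard or mouse event. Since xdotool is listed among the dependencies, one plausible route is to translate each phrase into an xdotool invocation. The sketch below uses the phrase-to-shortcut pairs this README lists under Voice Commands; `XDOTOOL_ACTIONS` and `build_command` are hypothetical names, not the contents of `xdotool_bridge.py`, and the function only builds the argument list without running anything.

```python
import shlex
from typing import List, Optional

# Phrase -> xdotool arguments. Button numbers follow xdotool's convention:
# 1 = left click, 4 = wheel up, 5 = wheel down.
XDOTOOL_ACTIONS = {
    "copy": ["key", "ctrl+c"],
    "paste": ["key", "ctrl+v"],
    "select all": ["key", "ctrl+a"],
    "click": ["click", "1"],
    "scroll up": ["click", "4"],
    "scroll down": ["click", "5"],
}


def build_command(phrase: str) -> Optional[List[str]]:
    """Return the xdotool argv for a recognized phrase, or None for plain dictation."""
    args = XDOTOOL_ACTIONS.get(phrase.strip().lower())
    return ["xdotool", *args] if args else None


if __name__ == "__main__":
    # Print the command line each phrase would produce.
    for phrase in XDOTOOL_ACTIONS:
        print(f"{phrase!r:14} -> {shlex.join(build_command(phrase))}")
```

To actually send the event, the returned list could be handed to `subprocess.run(cmd, check=True)` on a machine where xdotool is installed.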
AI Enhancement (NEW!)
Transform raw speech into natural, fluent text with AI-powered refinement:
- Multi-provider support - Google Gemini or OpenAI GPT
- Intelligent understanding - Handles natural speech patterns, hesitations, mixed languages
- Multilingual mastery - Perfect French/English detection and grammar
- Smart punctuation - Voice commands applied intelligently
- Context preservation - Maintains your intent and style
- Buffering modes - Realtime, sentence, or paragraph-based refinement
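The three buffering modes differ only in when accumulated text is handed to the AI provider. As a rough illustration, the sketch below flushes on every phrase in realtime mode, on sentence-ending punctuation in sentence mode, and on a trailing newline in paragraph mode, with a character limit that forces a flush in any mode. `RefinementBuffer` and its flush rules are assumptions made for this example, not Termivox's actual implementation.

```python
from typing import List, Optional

SENTENCE_END = (".", "?", "!")


class RefinementBuffer:
    """Hypothetical buffer that decides when text is sent for AI refinement."""

    def __init__(self, mode: str = "sentence", buffer_size: int = 50) -> None:
        if mode not in ("realtime", "sentence", "paragraph"):
            raise ValueError(f"unknown buffer mode: {mode}")
        self.mode = mode
        self.buffer_size = buffer_size  # max characters before a forced flush
        self._parts: List[str] = []

    def add(self, phrase: str) -> Optional[str]:
        """Add a phrase; return the text to refine if a flush is due, else None."""
        self._parts.append(phrase)
        text = " ".join(self._parts)
        if self.mode == "realtime":
            due = True
        elif self.mode == "sentence":
            due = phrase.rstrip().endswith(SENTENCE_END)
        else:  # paragraph: assume a trailing newline marks the break
            due = phrase.endswith("\n")
        if due or len(text) >= self.buffer_size:
            self._parts.clear()
            return text.strip()
        return None
```

Fewer flushes mean fewer API calls, which is the trade-off behind choosing sentence or paragraph mode over realtime.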
How it works:
Your speech → Vosk transcription → AI refinement → Perfect text output
The AI understands:
- Natural speaking rhythm (pauses, "euh", "um")
- Mixed French/English in same sentence
- Technical terms preservation (Termivox, toggle, etc.)
- Voice punctuation commands ("comma", "virgule", "period")
Example transformations:
Input: "ok là j'suis dans le métro euh attends... oui bref fais un paragraphe pour dire que Termivox fonctionne parfaitement virgule et que je vais l'utiliser pour écrire mes notes"
Output: "Termivox fonctionne parfaitement, et je vais l'utiliser pour écrire mes notes."
Toggle Control
Control voice recognition ON/OFF with multiple interfaces:
Global Hotkey
- Press Ctrl+Alt+V from anywhere to toggle
- Customizable key combination
- Works across all applications
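The dependency list names pynput for global hotkey support. pynput's `GlobalHotKeys` expects modifiers in angle brackets (for example `<ctrl>+<alt>+v`), whereas the configuration file in this README writes the combination as `ctrl+alt+v`. If Termivox does pass the configured string to pynput, a conversion along these lines would be needed; `to_pynput_hotkey` is a hypothetical helper written for illustration.

```python
# Modifier names as written in settings.json -> pynput key names.
# "super" maps to pynput's "cmd" key (the Super/Windows key on Linux).
MODIFIERS = {"ctrl": "ctrl", "alt": "alt", "shift": "shift", "super": "cmd"}


def to_pynput_hotkey(combo: str) -> str:
    """Convert 'ctrl+alt+v' style strings to pynput's '<ctrl>+<alt>+v' syntax."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    converted = []
    for part in parts:
        if part in MODIFIERS:
            converted.append(f"<{MODIFIERS[part]}>")
        elif len(part) == 1:
            converted.append(part)          # single characters stay bare
        else:
            converted.append(f"<{part}>")   # named keys such as f9 need brackets
    return "+".join(converted)
```

The result can be used as a key in the mapping passed to `pynput.keyboard.GlobalHotKeys`, for example `GlobalHotKeys({to_pynput_hotkey("ctrl+alt+v"): on_toggle})`, where `on_toggle` is your own callback.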
Desktop Widget
- Minimal floating window (160×70px)
- One-click toggle button
- Visual status: "LISTENING" (green) / "MUTED" (gray)
- Draggable, always-on-top
- Never steals cursor focus
System Tray Icon
- Green/red status indicator
- Click to toggle
- Right-click menu
Hardware Support (Coming Soon)
- USB foot pedal support
- MIDI controller integration
- Custom button devices
Installation
Prerequisites
System Requirements:
- Linux (tested on Ubuntu 24.04)
- Python 3.8+
- Microphone input
System Dependencies:
sudo apt install python3-pyaudio xdotool sox portaudio19-dev -y
Quick Install (Recommended)
Using pipx (isolated installation):
# Install pipx if needed
sudo apt install pipx
pipx ensurepath
# Install Termivox (includes AI support)
pipx install termivox
# Run first-time setup
termivox init
Using pip (global/venv installation):
# Install Termivox (includes AI support)
pip install termivox
# Run first-time setup
termivox init
From Source (Development)
- Clone the repository:
git clone https://github.com/Gerico1007/termivox.git
cd termivox
- Create virtual environment:
python3 -m venv termivox-env
source termivox-env/bin/activate
- Install in development mode:
# Install with all dependencies (includes AI support)
pip install -e .
- Run first-time setup:
termivox init
First-Time Setup Wizard
The termivox init command provides an interactive setup wizard that:
- Checks system dependencies
- Lets you choose language (English/French)
- Downloads voice recognition model
- Optionally configures AI enhancement
  - Choose provider (Gemini/OpenAI)
  - Add API key
- Creates configuration files
Example:
$ termivox init
============================================================
Termivox - First-Time Setup Wizard
============================================================
Welcome to Termivox!
This wizard will help you set up voice recognition on your system.
Checking dependencies...
✓ All dependencies found
Choose voice recognition language:
  1. English (en)
  2. French (fr)
Choice [1-2] (default: 1): 1
Downloading voice model (en)...
✓ Voice model downloaded successfully
AI Enhancement Setup
AI enhancement refines your voice transcription:
• Corrects grammar naturally
• Handles bilingual input (French/English)
• Removes filler words
• Processes voice commands
Enable AI enhancement? [Y/n]: y
Choose AI provider:
  1. Google Gemini (recommended, free tier available)
  2. OpenAI GPT (requires paid account)
  3. Skip for now
Choice [1-3] (default: 1): 1
GEMINI API Key
Get your API key at: https://makersuite.google.com/app/apikey
Enter your GEMINI API key: AIza...
✓ Created .env file
✓ Created config file
============================================================
Setup Complete!
============================================================
Next steps:
1. Run: termivox
2. Press Ctrl+Alt+V to toggle voice recognition
3. Speak naturally - your words will be typed!
AI Enhancement: GEMINI (enabled)
For help: termivox --help
Usage
Quick Start
After installation, simply run:
termivox
CLI Commands
First-time setup:
termivox init # Interactive setup wizard
Normal operation:
termivox # Run with default settings
termivox --lang fr # Use French
termivox --no-toggle # Disable toggle (always-on mode)
AI configuration:
termivox --ai # Configure AI enhancement
Help and version:
termivox --help # Show help
termivox --version # Show version
From source (development):
source termivox-env/bin/activate
python src/cli.py # Main entry point
python src/cli.py init # Run setup wizard
Toggle Control
Once Termivox is running, control it using:
Hotkey:
- Press Ctrl+Alt+V → Pauses/resumes voice recognition
- Works from any window, keeps cursor position
Widget:
- Click the floating "LISTENING" or "MUTED" button
- Drag the title bar to reposition
- Right-click to close widget
Indicator:
- Green = Voice recognition ACTIVE (listening)
- Gray/Red = Voice recognition MUTED (paused)
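The hotkey, widget, and indicator all reflect the same on/off state, so it helps to picture one shared object that flips the state and notifies every interface. The following is a minimal sketch of that idea; `ToggleState`, `subscribe`, and `label` are hypothetical names chosen for the example and do not describe the real `toggle_controller.py`.

```python
import threading
from typing import Callable, List


class ToggleState:
    """Hypothetical shared listening state with change notifications."""

    def __init__(self, listening: bool = True) -> None:
        self._listening = listening
        # Hotkey and UI callbacks may arrive on different threads.
        self._lock = threading.Lock()
        self._listeners: List[Callable[[bool], None]] = []

    @property
    def listening(self) -> bool:
        return self._listening

    def subscribe(self, callback: Callable[[bool], None]) -> None:
        """Register an interface (widget, tray, ...) to be told about changes."""
        self._listeners.append(callback)

    def toggle(self) -> bool:
        """Flip the state, notify subscribers, and return the new state."""
        with self._lock:
            self._listening = not self._listening
            state = self._listening
        for callback in list(self._listeners):
            callback(state)
        return state


def label(listening: bool) -> str:
    """Text shown on the widget button for a given state."""
    return "LISTENING" if listening else "MUTED"
```

With this shape, each interface only needs to call `toggle()` on input and update its own display inside the subscribed callback.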
Voice Commands
Dictation:
"Hello world" → types: Hello world
Punctuation:
"Hello comma world period" → types: Hello, world.
Available punctuation:
- comma, period, question mark, exclamation mark
- colon, semicolon, dash, quote, apostrophe
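Turning spoken punctuation into symbols amounts to a lookup over the transcript, including two-word names such as "question mark". The sketch below shows one way to do it for the names listed above; the symbols chosen for "dash" and "quote" and the spacing rules are assumptions, and `apply_punctuation` is not taken from the Termivox source.

```python
# Spoken name -> symbol, from the "Available punctuation" list.
# The exact symbols for "dash" and "quote" are assumptions.
PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "exclamation mark": "!",
    "colon": ":",
    "semicolon": ";",
    "dash": "-",
    "quote": '"',
    "apostrophe": "'",
}

# Symbols that attach to the preceding word without a space.
ATTACH_LEFT = set(",.?!:;")


def apply_punctuation(transcript: str) -> str:
    """Replace spoken punctuation names in a transcript with symbols."""
    words = transcript.split()
    out = []
    i = 0
    while i < len(words):
        two = " ".join(words[i:i + 2]).lower()
        one = words[i].lower()
        if two in PUNCTUATION:        # two-word names take priority
            symbol, step = PUNCTUATION[two], 2
        elif one in PUNCTUATION:
            symbol, step = PUNCTUATION[one], 1
        else:
            out.append(words[i])
            i += 1
            continue
        if symbol in ATTACH_LEFT and out:
            out[-1] += symbol
        else:
            out.append(symbol)
        i += step
    return " ".join(out)


print(apply_punctuation("Hello comma world period"))
```

A plain lookup like this cannot tell whether "period" is meant as a word or a command, which is the kind of ambiguity the AI refinement step is described as handling.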
Editing:
"new line" → ↵
"new paragraph" → ↵↵
"tab" → ⇥
System Commands:
"copy" → Ctrl+C
"paste" → Ctrl+V
"select all" → Ctrl+A
"click" → Mouse click
"scroll up" → Scroll wheel up
"scroll down" → Scroll wheel down
Language Selection
English (default):
./run.sh
# or
python src/main.py --lang en
French:
python src/main.py --lang fr
Configuration
Edit config/settings.json to customize behavior:
{
"interfaces": {
"hotkey": {
"enabled": true,
"key": "ctrl+alt+v" // Change hotkey here
},
"tray": {
"enabled": false // Enable system tray icon
},
"widget": {
"enabled": true, // Desktop widget
"position": {"x": 100, "y": 100},
"size": {"width": 160, "height": 70},
"always_on_top": true
}
},
"voice": {
"language": "en", // Default language
"auto_space": true // Auto-add spaces
},
"ai": {
"enabled": true, // Enable AI enhancement
"provider": "gemini", // "gemini" or "openai"
"model": null, // null = use default model
"buffer_mode": "sentence", // "realtime", "sentence", "paragraph"
"buffer_size": 50 // Max characters before forcing refinement
}
}
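Note that standard JSON has no comment syntax, and Python's `json` module rejects the `//` annotations shown in the example. Whether Termivox's own loader tolerates them is not stated here, so the safe course is to leave them out of the real file. If you want to keep annotated copies, a small pre-processing step such as the following sketch can strip them first; `strip_line_comments` and `load_settings` are hypothetical helpers, not part of Termivox.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False            # character after a backslash
            elif ch == "\\":
                escaped = in_string        # escapes only matter inside strings
            elif ch == '"':
                in_string = not in_string
            elif ch == "/" and not in_string and line[i:i + 2] == "//":
                cut = i                    # comment starts here
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


def load_settings(text: str) -> dict:
    """Parse settings text that may contain // annotations."""
    return json.loads(strip_line_comments(text))


SAMPLE = """
{
  "voice": {
    "language": "en",   // Default language
    "auto_space": true  // Auto-add spaces
  },
  "ai": {"provider": "gemini", "model": null}
}
"""

settings = load_settings(SAMPLE)
print(settings["voice"]["language"], settings["ai"]["model"])
```

Tracking whether the scanner is inside a string keeps values that contain `//`, such as URLs, intact.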
AI Configuration Options
Providers:
- "gemini" - Google Gemini (default: gemini-2.0-flash-exp)
- "openai" - OpenAI GPT (default: gpt-4o-mini)
Buffer Modes:
- "realtime" - Refine every phrase immediately (slower, most accurate)
- "sentence" - Wait for sentence completion (balanced)
- "paragraph" - Wait for paragraph breaks (faster, less frequent)
Environment Variables:
# In .env file
GEMINI_API_KEY=your_gemini_key_here
OPENAI_API_KEY=your_openai_key_here
Custom Hotkey Examples:
- "ctrl+shift+v"
- "ctrl+alt+t"
- "super+v"
Project Structure
termivox/
├── src/
│   ├── main.py                   # Main entry point with toggle support
│   ├── test_voice_script.py      # Standalone testing utility
│   ├── voice/
│   │   ├── recognizer.py         # Vosk voice recognition engine
│   │   └── __init__.py
│   ├── ai/                       # AI enhancement layer (NEW!)
│   │   ├── ai_service.py         # Multi-provider AI abstraction
│   │   └── __init__.py
│   ├── bridge/
│   │   ├── xdotool_bridge.py     # System command executor
│   │   └── __init__.py
│   ├── ui/                       # Toggle control interfaces
│   │   ├── toggle_controller.py  # Central state management
│   │   ├── hotkey_interface.py   # Global hotkey listener
│   │   ├── tray_interface.py     # System tray icon
│   │   ├── widget_interface.py   # Desktop widget
│   │   ├── hardware_interface.py # Hardware button stub
│   │   ├── config_loader.py      # Configuration system
│   │   └── __init__.py
│   └── utils/
│       └── __init__.py
├── config/
│   └── settings.json             # User configuration
├── voice_models/                 # Vosk language models
│   └── vosk-model-small-en-us-0.15/
├── .env.example                  # API key template (NEW!)
├── requirements.txt              # Python dependencies
├── run.sh                        # Launch script
├── download_model.py             # Model downloader
└── README.md
Dependencies
Python Packages:
- Vosk - Offline speech recognition
- pyaudio - Microphone input
- numpy - Audio processing
- pynput - Global hotkey support
- pystray - System tray icon
- Pillow - Icon generation
- xdotool - System command execution
- google-generativeai - Gemini AI (optional)
- openai - OpenAI GPT (optional)
System Packages:
- python3-pyaudio - PyAudio bindings
- xdotool - Keyboard/mouse automation
- sox - Audio utilities
- portaudio19-dev - Audio development headers
Toggle Widget Design
Minimal Professional Aesthetic:
┌─────────────────────┐
│ TERMIVOX            │  ← Dark title bar (draggable)
├─────────────────────┤
│                     │
│      LISTENING      │  ← Green button (active state)
│                     │
└─────────────────────┘
Features:
- Compact: 160×70 pixels
- Unfocusable: Never steals cursor
- Draggable: Reposition anywhere
- Color-coded: Green (ON) / Gray (OFF)
- Always-on-top: Stays visible
Testing
Test voice recognition without typing:
source termivox-env/bin/activate
python src/test_voice_script.py --lang en
Test with toggle control:
./run.sh
# Then try:
# 1. Speak something
# 2. Press Ctrl+Alt+V
# 3. Speak again (should not type)
# 4. Press Ctrl+Alt+V
# 5. Speak (should type again)
Test different languages:
python src/test_voice_script.py --lang fr # French
python src/test_voice_script.py --lang en # English
Troubleshooting
Hotkey doesn't work:
- Check terminal for errors
- Try different hotkey in config/settings.json
- Ensure pynput is installed: pip list | grep pynput
No voice recognition:
- Check microphone: arecord -l
- Test PyAudio: python -c "import pyaudio; print('OK')"
- Verify Vosk model downloaded in voice_models/
Widget not visible:
- Enable in config: "widget": {"enabled": true}
- Check if tkinter available: python -c "import tkinter"
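The individual checks above can be bundled into one script that reports which Python modules and command-line tools are present. This is a convenience sketch, not a tool shipped with Termivox; the module and tool names are the ones mentioned in this README, and `check_modules` and `check_tools` are hypothetical helpers.

```python
import importlib.util
import shutil
from typing import Dict, Iterable

PYTHON_MODULES = ("vosk", "pyaudio", "pynput", "tkinter")
SYSTEM_TOOLS = ("xdotool", "arecord")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level Python module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_tools(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each executable is available on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    results = {**check_modules(PYTHON_MODULES), **check_tools(SYSTEM_TOOLS)}
    for name, ok in results.items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Run it inside the same environment Termivox uses (the pipx or virtual environment), since a module installed elsewhere will not be found.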
System tray icon missing:
- Desktop environment may not support system tray
- Use widget or hotkey instead
- Try enabling: "tray": {"enabled": true}
Contributing
Contributions welcome! Areas for enhancement:
- Additional language models
- Custom wake word detection
- Audio feedback on toggle
- Hardware button integration
- Voice command macros
- GUI configuration tool
To contribute:
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open Pull Request
License
MIT License - See LICENSE file for details
Acknowledgments
- Vosk - Offline speech recognition engine
- pynput - Cross-platform input control
- pystray - System tray integration
- xdotool - X11 automation
Roadmap
- AI-powered transcription enhancement (Gemini, OpenAI)
- Multilingual AI understanding (French/English)
- Voice command macros
- Custom wake word support
- GUI settings editor
- Hardware button integration (foot pedal, MIDI)
- Audio feedback options
- Additional language models (Spanish, German, etc.)
- Plugin system for custom commands
- Cloud sync for settings (optional)
- Real-time AI streaming (word-by-word refinement)
- Nyro - Structural foundation, modular architecture
- Aureon - Flow preservation, accessibility focus
- JamAI - Musical encoding, harmonic design
Built with recursive intention. Speak, toggle, flow.
File details
Details for the file termivox-0.1.3.tar.gz.
File metadata
- Download URL: termivox-0.1.3.tar.gz
- Upload date:
- Size: 40.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fead412bf33b072ae9a576326aadea9976d5dc5485875240ef80c18082b0e03 |
| MD5 | 8ba33eaf6cb84ac8ca16dfff622300e9 |
| BLAKE2b-256 | f24358c03c55720e129edea51725a173403fc8d1d01c5952edd57781f9751ec8 |
File details
Details for the file termivox-0.1.3-py3-none-any.whl.
File metadata
- Download URL: termivox-0.1.3-py3-none-any.whl
- Upload date:
- Size: 40.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0921d90f623f1e7f32d592b23a71b869c45409efb193b1e3c09633eef6b20422 |
| MD5 | 90e88c969c25fa48cc9b59aa1d5601a9 |
| BLAKE2b-256 | 0423b627382fb045f381362aea247e447015c26391f9e1dcab42e8fc259c53b6 |