Termivox

Voice Recognition Bridge for Linux - Speak naturally, control your system, type hands-free.


🎯 Overview

Termivox is a Linux-based voice recognition system that transforms your speech into text and system commands. Using offline voice recognition (Vosk), it provides:

  • Hands-free dictation - Speak and watch your words appear
  • Voice-controlled system commands - Copy, paste, click, scroll by voice
  • Multi-language support - English and French recognition
  • Toggle control - Pause/resume recognition instantly like a guitar pedal
  • Privacy-first - Speech recognition runs locally; no cloud service is needed unless you enable the optional AI enhancement

✨ Features

🎤 Voice Recognition

  • Offline speech-to-text powered by Vosk
  • Bilingual support: English (en) and French (fr)
  • Punctuation by voice - Say "comma", "period", "question mark"
  • Edit commands - "new line", "tab", "new paragraph"
  • System commands - "copy", "paste", "click", "scroll up/down"

🤖 AI Enhancement (NEW!)

Transform raw speech into natural, fluent text with AI-powered refinement:

  • Multi-provider support - Google Gemini or OpenAI GPT
  • Intelligent understanding - Handles natural speech patterns, hesitations, mixed languages
  • Multilingual mastery - Perfect French/English detection and grammar
  • Smart punctuation - Voice commands applied intelligently
  • Context preservation - Maintains your intent and style
  • Buffering modes - Realtime, sentence, or paragraph-based refinement

How it works:

Your speech → Vosk transcription → AI refinement → Perfect text output

The AI understands:

  • Natural speaking rhythm (pauses, "euh", "um")
  • Mixed French/English in same sentence
  • Technical terms preservation (Termivox, toggle, etc.)
  • Voice punctuation commands ("comma", "virgule", "period")

Example transformations:

🎤 "ok là j'suis dans le métro euh attends... oui bref fais un paragraphe pour dire que Termivox fonctionne parfaitement virgule et que je vais l'utiliser pour écrire mes notes"

✨ "Termivox fonctionne parfaitement, et je vais l'utiliser pour écrire mes notes."

🎛️ Toggle Control

Control voice recognition ON/OFF with multiple interfaces:

⌨️ Global Hotkey

  • Press Ctrl+Alt+V from anywhere to toggle
  • Customizable key combination
  • Works across all applications

🖱️ Desktop Widget

  • Minimal floating window (160×70px)
  • One-click toggle button
  • Visual status: "LISTENING" (green) / "MUTED" (gray)
  • Draggable, always-on-top
  • Never steals cursor focus

🎛️ System Tray Icon

  • Green/red status indicator
  • Click to toggle
  • Right-click menu

🎮 Hardware Support (Coming Soon)

  • USB foot pedal support
  • MIDI controller integration
  • Custom button devices

📦 Installation

Prerequisites

System Requirements:

  • Linux (tested on Ubuntu 24.04)
  • Python 3.8+
  • Microphone input

System Dependencies:

sudo apt install python3-pyaudio xdotool sox portaudio19-dev -y

Quick Install (Recommended)

Using pipx (isolated installation):

# Install pipx if needed
sudo apt install pipx
pipx ensurepath

# Install Termivox (includes AI support)
pipx install termivox

# Run first-time setup
termivox init

Using pip (global/venv installation):

# Install Termivox (includes AI support)
pip install termivox

# Run first-time setup
termivox init

From Source (Development)

  1. Clone the repository:

    git clone https://github.com/Gerico1007/termivox.git
    cd termivox
    
  2. Create virtual environment:

    python3 -m venv termivox-env
    source termivox-env/bin/activate
    
  3. Install in development mode:

    # Install with all dependencies (includes AI support)
    pip install -e .
    
  4. Run first-time setup:

    termivox init
    

First-Time Setup Wizard

The termivox init command provides an interactive setup wizard that:

  1. ✅ Checks system dependencies
  2. 🌍 Lets you choose language (English/French)
  3. 📥 Downloads voice recognition model
  4. 🤖 Optionally configures AI enhancement
    • Choose provider (Gemini/OpenAI)
    • Add API key
  5. 📝 Creates configuration files

Example:

$ termivox init

============================================================
🎤 Termivox - First-Time Setup Wizard
============================================================

Welcome to Termivox!
This wizard will help you set up voice recognition on your system.

📦 Checking dependencies...
✓ All dependencies found

🌍 Choose voice recognition language:
  → 1. English (en)
    2. French (fr)

Choice [1-2] (default: 1): 1

📥 Downloading voice model (en)...
✓ Voice model downloaded successfully

🤖 AI Enhancement Setup

AI enhancement refines your voice transcription:
  • Corrects grammar naturally
  • Handles bilingual input (French/English)
  • Removes filler words
  • Processes voice commands

Enable AI enhancement? [Y/n]: y

Choose AI provider:
  → 1. Google Gemini (recommended, free tier available)
    2. OpenAI GPT (requires paid account)
    3. Skip for now

Choice [1-3] (default: 1): 1

📝 GEMINI API Key
Get your API key at: https://makersuite.google.com/app/apikey

Enter your GEMINI API key: AIza...

✓ Created .env file
✓ Created config file

============================================================
✅ Setup Complete!
============================================================

Next steps:
  1. Run: termivox
  2. Press Ctrl+Alt+V to toggle voice recognition
  3. Speak naturally - your words will be typed!

🤖 AI Enhancement: GEMINI (enabled)

For help: termivox --help

🚀 Usage

Quick Start

After installation, simply run:

termivox

CLI Commands

First-time setup:

termivox init                    # Interactive setup wizard

Normal operation:

termivox                         # Run with default settings
termivox --lang fr               # Use French
termivox --no-toggle             # Disable toggle (always-on mode)

AI configuration:

termivox --ai                    # Configure AI enhancement

Help and version:

termivox --help                  # Show help
termivox --version               # Show version
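
The flags listed above could be parsed along these lines. This is a hypothetical sketch using Python's standard argparse module, not Termivox's actual CLI code; the function name build_parser and the defaults are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(prog="termivox")
    # Optional positional subcommand: "init" runs the setup wizard.
    parser.add_argument("command", nargs="?", choices=["init"])
    # Recognition language; the README documents "en" and "fr".
    parser.add_argument("--lang", choices=["en", "fr"], default="en")
    # Always-on mode: disables the toggle interfaces.
    parser.add_argument("--no-toggle", action="store_true")
    # Open the AI enhancement configuration.
    parser.add_argument("--ai", action="store_true")
    return parser


args = build_parser().parse_args(["--lang", "fr", "--no-toggle"])
print(args.command, args.lang, args.no_toggle)  # None fr True
```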

From source (development):

source termivox-env/bin/activate
python src/cli.py               # Main entry point
python src/cli.py init          # Run setup wizard

Toggle Control

Once Termivox is running, control it using:

Hotkey:

  • Press Ctrl+Alt+V → Pauses/resumes voice recognition
  • Works from any window, keeps cursor position

Widget:

  • Click the floating "LISTENING" or "MUTED" button
  • Drag the title bar to reposition
  • Right-click to close widget

Indicator:

  • Green = Voice recognition ACTIVE (listening)
  • Gray/Red = Voice recognition MUTED (paused)
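
Since the hotkey, widget, and tray icon all flip the same ON/OFF state, one way to picture the design is a small shared state object that notifies every interface when it changes. The sketch below is an assumption about how such a controller might look; the class and method names are hypothetical and are not taken from Termivox's source.

```python
import threading
from typing import Callable, List


class ToggleState:
    """Hypothetical shared listening state that several interfaces can flip."""

    def __init__(self, listening: bool = True) -> None:
        self.listening = listening
        self._listeners: List[Callable[[bool], None]] = []
        # A hotkey listener usually runs on its own thread, so guard the flip.
        self._lock = threading.Lock()

    def subscribe(self, callback: Callable[[bool], None]) -> None:
        """Register an interface that wants to be told about state changes."""
        self._listeners.append(callback)

    def toggle(self) -> bool:
        """Flip the state, notify subscribers, and return the new state."""
        with self._lock:
            self.listening = not self.listening
            current = self.listening
        for callback in self._listeners:
            callback(current)
        return current


def label(listening: bool) -> str:
    """Text shown on the widget button for a given state."""
    return "LISTENING" if listening else "MUTED"


state = ToggleState()
seen: List[str] = []
state.subscribe(lambda on: seen.append(label(on)))
state.toggle()  # e.g. hotkey pressed
state.toggle()  # e.g. widget clicked
print(seen)  # ['MUTED', 'LISTENING']
```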

Voice Commands

Dictation:

"Hello world" → types: Hello world

Punctuation:

"Hello comma world period" → types: Hello, world.

Available punctuation:

  • comma, period, question mark, exclamation mark
  • colon, semicolon, dash, quote, apostrophe
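
As a rough illustration of spoken punctuation, the sketch below replaces a subset of the command words with their symbols and attaches each symbol to the preceding word. It is not Termivox's implementation; the table and function name are assumptions, and "dash", "quote", and "apostrophe" are left out because their spacing rules are not described here.

```python
import re

# Spoken command -> symbol (subset of the documented list).
PUNCTUATION = {
    "question mark": "?",
    "exclamation mark": "!",
    "comma": ",",
    "period": ".",
    "colon": ":",
    "semicolon": ";",
}

# Longest phrases first so two-word commands win over shorter ones.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(sorted(map(re.escape, PUNCTUATION), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_punctuation(transcript: str) -> str:
    """Replace spoken punctuation words, removing the space before them."""
    return _PATTERN.sub(lambda m: PUNCTUATION[m.group(1).lower()], transcript).strip()


print(apply_punctuation("Hello comma world period"))  # Hello, world.
```

A plain word match like this also rewrites a literal "period" in ordinary speech ("a period of time"), which is one reason a context-aware pass can be useful.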

Editing:

"new line"       → ↵
"new paragraph"  → ↵↵
"tab"            → ⇥

System Commands:

"copy"           → Ctrl+C
"paste"          → Ctrl+V
"select all"     → Ctrl+A
"click"          → Mouse click
"scroll up"      → Scroll wheel up
"scroll down"    → Scroll wheel down
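
The table above maps naturally onto xdotool invocations. The sketch below only builds the command lines and does not run them; the mapping and the function name are assumptions for illustration, not the contents of xdotool_bridge.py. The xdotool subcommands used (key, click) and the X11 button numbers (1 = left button, 4 = wheel up, 5 = wheel down) are standard.

```python
import shlex
from typing import List, Optional

# Spoken phrase -> xdotool arguments (hypothetical mapping).
XDOTOOL_ARGS = {
    "copy": ["key", "ctrl+c"],
    "paste": ["key", "ctrl+v"],
    "select all": ["key", "ctrl+a"],
    "click": ["click", "1"],        # button 1 = left mouse button
    "scroll up": ["click", "4"],    # button 4 = scroll wheel up
    "scroll down": ["click", "5"],  # button 5 = scroll wheel down
}


def command_for(phrase: str) -> Optional[List[str]]:
    """Return the xdotool command for a spoken phrase, or None if unknown."""
    args = XDOTOOL_ARGS.get(phrase.strip().lower())
    return ["xdotool", *args] if args else None


print(shlex.join(command_for("scroll up")))  # xdotool click 4
```

To actually execute a command, the list can be passed to subprocess.run(cmd, check=True) on a machine with xdotool installed and an X11 session.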

Language Selection

English (default):

./run.sh
# or
python src/main.py --lang en

French:

python src/main.py --lang fr

⚙️ Configuration

Edit config/settings.json to customize behavior:

{
  "interfaces": {
    "hotkey": {
      "enabled": true,
      "key": "ctrl+alt+v"        // Change hotkey here
    },
    "tray": {
      "enabled": false            // Enable system tray icon
    },
    "widget": {
      "enabled": true,            // Desktop widget
      "position": {"x": 100, "y": 100},
      "size": {"width": 160, "height": 70},
      "always_on_top": true
    }
  },
  "voice": {
    "language": "en",             // Default language
    "auto_space": true            // Auto-add spaces
  },
  "ai": {
    "enabled": true,              // Enable AI enhancement
    "provider": "gemini",         // "gemini" or "openai"
    "model": null,                // null = use default model
    "buffer_mode": "sentence",    // "realtime", "sentence", "paragraph"
    "buffer_size": 50             // Max characters before forcing refinement
  }
}
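
Note that the // annotations in the listing above are not valid in standard JSON, and Python's json.loads rejects them. Whether Termivox's own loader tolerates them is not stated here, so the safe choice is to leave them out of the real file. If you want to keep annotated copies, a small pre-processing step like the following sketch (function names are hypothetical) strips // comments that sit outside string values before parsing.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that are not inside a JSON string."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_settings_text(text: str) -> dict:
    """Parse annotated settings text into a dictionary."""
    return json.loads(strip_line_comments(text))


example = """{
  "voice": {
    "language": "en",   // Default language
    "auto_space": true  // Auto-add spaces
  }
}"""
print(load_settings_text(example)["voice"]["language"])  # en
```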

AI Configuration Options

Providers:

  • "gemini" - Google Gemini (default: gemini-2.0-flash-exp)
  • "openai" - OpenAI GPT (default: gpt-4o-mini)

Buffer Modes:

  • "realtime" - Refine every phrase immediately (slower, most accurate)
  • "sentence" - Wait for sentence completion (balanced)
  • "paragraph" - Wait for paragraph breaks (faster, less frequent)
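
The three buffer modes and the buffer_size limit can be pictured as a small accumulator that decides when text is ready to be sent for refinement. This is a sketch under assumptions: the class name, the sentence-ending characters, and the use of a blank line as the paragraph break are illustrative choices, not Termivox's documented behavior.

```python
from typing import List, Optional

SENTENCE_END = (".", "?", "!")


class RefinementBuffer:
    """Hypothetical accumulator deciding when text goes to the AI."""

    def __init__(self, mode: str = "sentence", buffer_size: int = 50) -> None:
        if mode not in ("realtime", "sentence", "paragraph"):
            raise ValueError(f"unknown buffer mode: {mode}")
        self.mode = mode
        self.buffer_size = buffer_size
        self._parts: List[str] = []

    def add(self, phrase: str) -> Optional[str]:
        """Add a recognized phrase; return buffered text when it is ready."""
        self._parts.append(phrase)
        text = " ".join(self._parts)
        if self.mode == "realtime":
            ready = True  # every phrase is refined immediately
        elif self.mode == "sentence":
            ready = text.rstrip().endswith(SENTENCE_END)
        else:  # "paragraph": wait for a blank line
            ready = text.endswith("\n\n")
        # buffer_size forces a flush once the buffer grows too long.
        if ready or len(text) >= self.buffer_size:
            self._parts.clear()
            return text
        return None


buf = RefinementBuffer("sentence")
print(buf.add("hello"))   # None
print(buf.add("world."))  # hello world.
```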

Environment Variables:

# In .env file
GEMINI_API_KEY=your_gemini_key_here
OPENAI_API_KEY=your_openai_key_here
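
For illustration, the sketch below parses .env-style text and picks the key that matches the configured provider. The helper names are hypothetical, and how Termivox itself loads the .env file is not described here.

```python
import os
from typing import Dict, Mapping, Optional

# Provider name in settings.json -> environment variable from the README.
ENV_VAR_BY_PROVIDER = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_for(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key for a provider, or None if it is unset or empty."""
    value = env.get(ENV_VAR_BY_PROVIDER[provider], "").strip()
    return value or None


env = parse_env("# In .env file\nGEMINI_API_KEY=abc123\nOPENAI_API_KEY=\n")
print(api_key_for("gemini", env), api_key_for("openai", env))  # abc123 None
```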

Custom Hotkey Examples:

  • "ctrl+shift+v"
  • "ctrl+alt+t"
  • "super+v"
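
pynput writes hotkeys with angle brackets around modifier names (for example "<ctrl>+<alt>+v") and calls the Super key "cmd". How Termivox translates the strings above internally is not shown here; the conversion below is a sketch of one plausible approach, with a hypothetical function name.

```python
# Config-style modifier name -> pynput hotkey token.
MODIFIERS = {
    "ctrl": "<ctrl>",
    "alt": "<alt>",
    "shift": "<shift>",
    "super": "<cmd>",  # pynput names the Super/Windows key "cmd"
}


def to_pynput_hotkey(combo: str) -> str:
    """Convert 'ctrl+alt+v' style strings to pynput's '<ctrl>+<alt>+v'."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    return "+".join(MODIFIERS.get(p, p) for p in parts)


print(to_pynput_hotkey("ctrl+alt+v"))  # <ctrl>+<alt>+v

# With pynput installed, the result could be registered like this:
#   from pynput import keyboard
#   keyboard.GlobalHotKeys({to_pynput_hotkey("ctrl+alt+v"): on_toggle}).start()
# where on_toggle is your own callback.
```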

📁 Project Structure

termivox/
├── src/
│   ├── main.py                    # Main entry point with toggle support
│   ├── test_voice_script.py       # Standalone testing utility
│   ├── voice/
│   │   ├── recognizer.py          # Vosk voice recognition engine
│   │   └── __init__.py
│   ├── ai/                        # AI enhancement layer (NEW!)
│   │   ├── ai_service.py          # Multi-provider AI abstraction
│   │   └── __init__.py
│   ├── bridge/
│   │   ├── xdotool_bridge.py      # System command executor
│   │   └── __init__.py
│   ├── ui/                        # Toggle control interfaces
│   │   ├── toggle_controller.py   # Central state management
│   │   ├── hotkey_interface.py    # Global hotkey listener
│   │   ├── tray_interface.py      # System tray icon
│   │   ├── widget_interface.py    # Desktop widget
│   │   ├── hardware_interface.py  # Hardware button stub
│   │   ├── config_loader.py       # Configuration system
│   │   └── __init__.py
│   └── utils/
│       └── __init__.py
├── config/
│   └── settings.json              # User configuration
├── voice_models/                  # Vosk language models
│   └── vosk-model-small-en-us-0.15/
├── .env.example                   # API key template (NEW!)
├── requirements.txt               # Python dependencies
├── run.sh                         # Launch script
├── download_model.py              # Model downloader
└── README.md

🛠️ Dependencies

Python Packages:

  • Vosk - Offline speech recognition
  • pyaudio - Microphone input
  • numpy - Audio processing
  • pynput - Global hotkey support
  • pystray - System tray icon
  • Pillow - Icon generation
  • xdotool - System command execution
  • google-generativeai - Gemini AI (optional)
  • openai - OpenAI GPT (optional)

System Packages:

  • python3-pyaudio - PyAudio bindings
  • xdotool - Keyboard/mouse automation
  • sox - Audio utilities
  • portaudio19-dev - Audio development headers

🎨 Toggle Widget Design

Minimal Professional Aesthetic:

┌─────────────────────┐
│ TERMIVOX          ● │  ← Dark title bar (draggable)
├─────────────────────┤
│                     │
│    LISTENING        │  ← Green button (active state)
│                     │
└─────────────────────┘

Features:

  • Compact: 160×70 pixels
  • Unfocusable: Never steals cursor
  • Draggable: Reposition anywhere
  • Color-coded: Green (ON) / Gray (OFF)
  • Always-on-top: Stays visible

🧪 Testing

Test voice recognition without typing:

source termivox-env/bin/activate
python src/test_voice_script.py --lang en

Test with toggle control:

./run.sh
# Then try:
# 1. Speak something
# 2. Press Ctrl+Alt+V
# 3. Speak again (should not type)
# 4. Press Ctrl+Alt+V
# 5. Speak (should type again)

Test different languages:

python src/test_voice_script.py --lang fr  # French
python src/test_voice_script.py --lang en  # English

🐛 Troubleshooting

Hotkey doesn't work:

  • Check terminal for errors
  • Try different hotkey in config/settings.json
  • Ensure pynput is installed: pip list | grep pynput

No voice recognition:

  • Check microphone: arecord -l
  • Test PyAudio: python -c "import pyaudio; print('OK')"
  • Verify Vosk model downloaded in voice_models/

Widget not visible:

  • Enable in config: "widget": {"enabled": true}
  • Check if tkinter available: python -c "import tkinter"

System tray icon missing:

  • Desktop environment may not support system tray
  • Use widget or hotkey instead
  • Try enabling: "tray": {"enabled": true}
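
The individual checks above can be bundled into one quick diagnostic. The sketch below uses only the standard library to report which Python modules and command-line tools are present; the function name and the default lists are assumptions chosen to match the checks in this section.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_environment(
    modules: Iterable[str] = ("vosk", "pyaudio", "pynput", "tkinter"),
    binaries: Iterable[str] = ("xdotool", "arecord"),
) -> Dict[str, bool]:
    """Report which Python modules and executables can be found."""
    report: Dict[str, bool] = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the program is not on PATH.
        report[f"binary:{name}"] = shutil.which(name) is not None
    return report


for item, found in check_environment().items():
    print(f"{'OK     ' if found else 'MISSING'} {item}")
```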

🤝 Contributing

Contributions welcome! Areas for enhancement:

  • Additional language models
  • Custom wake word detection
  • Audio feedback on toggle
  • Hardware button integration
  • Voice command macros
  • GUI configuration tool

To contribute:

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open Pull Request

📄 License

MIT License - See LICENSE file for details


🙏 Acknowledgments

  • Vosk - Offline speech recognition engine
  • pynput - Cross-platform input control
  • pystray - System tray integration
  • xdotool - X11 automation

🔮 Roadmap

  • AI-powered transcription enhancement (Gemini, OpenAI)
  • Multilingual AI understanding (French/English)
  • Voice command macros
  • Custom wake word support
  • GUI settings editor
  • Hardware button integration (foot pedal, MIDI)
  • Audio feedback options
  • Additional language models (Spanish, German, etc.)
  • Plugin system for custom commands
  • Cloud sync for settings (optional)
  • Real-time AI streaming (word-by-word refinement)

♠️ Nyro - Structural foundation, modular architecture
🌿 Aureon - Flow preservation, accessibility focus
🎸 JamAI - Musical encoding, harmonic design

Built with recursive intention. Speak, toggle, flow.
