Voice-driven AI assistant with modern UI, system audio capture, and ChatGPT-like streaming responses.
Project description
VoxAI
🚀 Features
- 🎧 System Audio Capture
Captures clean audio from your computer (meetings, browser, apps) - no external microphone noise or room interference. - 🎮 Manual Control
Start/Stop buttons let you define exactly the boundaries of your prompt—perfect for long, multi-sentence queries. - 🔄 One-Shot Transcription
Whisper processes the entire recording in a single call—no fragmented sentences. - ⚡ Live AI Streaming
ChatGPT-like streaming responses with structured formatting, headings, and bullet points. - 🎯 Principal Engineer Level Responses
Technical explanations at senior engineering level with architecture insights and business implications. - 🔧 Configurable Model
Swap between Gemini variants via environment (no code changes). - 🚀 Zero-Install UI Bootstrapping
On first runvoxaiauto-installs Electron dependencies—included in the PyPI package—so you only ever needpip install voxaiandvoxai. - 🎨 Modern UI
Beautiful gradient interface with real-time status indicators, typing animations, and professional styling.
🎯 Quickstart
⚙️ Prerequisites
- Python ≥3.7
- Node.js ≥14 (for Electron UI)
That's it! VoxAI automatically installs Electron dependencies on first run.
Important: VoxAI captures system audio only (meetings, browser, apps) - no external microphone.
Install & Run
1) Install from PyPI
pip install voxai
2) Configure environment
cp .env.example .env
Edit .env:
GENAI_API_KEY=sk-…
GENAI_MODEL=gemini-1.5-flash
3) Launch the app
voxai
On first launch, VoxAI will automatically run npm install inside its bundled electron/ folder, then open a desktop window.
📖 Usage
Once the UI opens:
Start: Click Start to begin recording.
Speak: Listen to question`s from zoom, teams or any other tool aloud—no length limit.
Stop: Click Stop. Whisper transcribes your entire clip to the Transcript pane.
Ask AI: Click Ask AI. Gemini’s answer streams live into the Answer pane.
Copy & Share: Everything is plain text—copy it, paste it, or feed it into your own RAG/finetuning pipeline.
🎧 How It Works
-
🎤 Audio Source Detection
VoxAI automatically detects your system audio setup and displays it in the beautiful UI with real-time status. -
🎬 Start Recording
Click Start Recording to capture clean audio from your computer (meetings, browser, apps). -
✋ Stop & Transcribe
Click Stop Recording. Whisper transcribes the entire audio clip and shows it under Transcript. -
🤖 Ask AI
Click Ask AI to send the transcript to Gemini. Get ChatGPT-like streaming responses with structured formatting. -
📋 Copy & Share
Professional responses with headings, bullet points, and technical depth—perfect for documentation or sharing.
🎨 Modern UI Features
- 🌈 Beautiful Design: Gradient backgrounds with glassmorphism effects
- 💫 Real-time Animations: Typing indicators, loading states, and smooth transitions
- 📊 Status Indicators: Color-coded status dots (🟢 ready, 🔴 recording, 🟡 thinking)
- 📱 Responsive Layout: Works perfectly on different screen sizes
- 🎯 Professional Formatting: AI responses with headings, bullets, and technical structure
🛠 Configuration
Set in .env:
| Variable | Description | Example |
|---|---|---|
GENAI_API_KEY |
Google Generative AI (Gemini) API key | sk-… |
GENAI_MODEL |
Gemini model to use | gemini-1.5-flash |
See .env.example for reference.
🛠️ Developer Guide
Install from source
git clone https://github.com/rtiwariops/voxai.git
cd voxai
pip install -e .
npm install -g electron
voxai
🔧 System Audio Setup
VoxAI captures computer audio only (no microphone) - perfect for meeting recordings, browser audio, and app sounds without room noise.
macOS Setup
# 1. Install BlackHole
brew install blackhole-2ch
# 2. Install Ladiocast
# Download from: https://existential.audio/ladiocast/
Configure Ladiocast:
- Input 1: Set to your audio source (Built-in Input or system audio)
- Main: Route Input 1 to Main (for speakers)
- Aux 1: Set to BlackHole 2ch
- Enable: Route Input 1 to Aux 1
System Settings:
- Input: BlackHole 2ch (VoxAI reads from here)
- Output: Built-in Output (you hear from here)
Windows (Built-in)
WASAPI loopback support is built into Windows 10+ - VoxAI will auto-detect.
Linux (Built-in)
PulseAudio monitor devices are auto-detected - no setup needed.
Result: You hear audio through speakers + VoxAI captures clean computer audio.
⚙️ Features
- Smart Cross-Platform Audio Detection
- Manual Control for long-form Q&A
- One-Shot, Full-Clip Transcription
- Live Token-by-Token AI Streaming
- Configurable Gemini Model (
.env) - Electron-Based Desktop UI
📜 License
MIT License © 2025 Ravi (Robbie) Tiwari
Released under the MIT License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file voxai-0.3.0.tar.gz.
File metadata
- Download URL: voxai-0.3.0.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
50799eaecb96fdc86da3e83c793ab4f3698d3f310f4b4d8387c0aa4997df4dc4
|
|
| MD5 |
f4f2b5c28dc76141d8e1ab972c9f4cfe
|
|
| BLAKE2b-256 |
a7d8275fd45eeb9d4a9fe44d8ea0ae7f23776faedfc4568b064839dc17be783c
|
File details
Details for the file voxai-0.3.0-py3-none-any.whl.
File metadata
- Download URL: voxai-0.3.0-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
63207aa630e51f23e80e5b220c742424b95b027005e5a7b144a09e458a7005bf
|
|
| MD5 |
c52dfece94303941810ddf5c8c1f1a3e
|
|
| BLAKE2b-256 |
a10cce2a84ffac87af149d491462bd58e90ecd33316e877c42d2a79523d2b41b
|