Voice-driven AI assistant for real-time transcription and Gemini integration.
VoxAI
🚀 Features
- Universal Audio Capture
  Record mic, system audio, or any combination via BlackHole (macOS) or equivalent loopback drivers.
- Manual Control
  Start/Stop buttons let you define the exact boundaries of your prompt, which suits long, multi-sentence queries.
- One-Shot Transcription
  Whisper processes the entire recording in a single call, so sentences are not fragmented.
- Live AI Streaming
  Gemini’s response appears token by token, just like in ChatGPT’s streaming interface.
- Configurable Model
  Swap between Gemini variants via environment variables (no code changes).
- Zero-Install UI Bootstrapping
  On first run, voxai auto-installs the minimal Electron UI source (included in the PyPI package), so you only ever need pip install voxai and voxai.
🎯 Quickstart
⚙️ Prerequisites
- Python ≥3.7
- Node.js ≥14
- Electron (global)
  npm install -g electron
- macOS Audio Loopback (BlackHole 2ch + Ladiocast):
  - Install BlackHole 2ch:
    brew install blackhole-2ch
  - Create a Multi-Output Device
    - Open Audio MIDI Setup
    - Click “+” → Create Multi-Output Device
    - Check both BlackHole 2ch and MacBook Pro Speakers
  - Select Multi-Output Device
    System Preferences → Sound → Output → Multi-Output Device
  - Configure Ladiocast
    - Input 1: BlackHole 2ch → route to Main
    - Main Output: MacBook Pro Speakers
    - Mute other channels
  This will route all system and microphone audio into VoxAI.
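After the loopback setup, it can help to confirm that a BlackHole input is actually visible before launching. The sketch below is only an illustration: `find_loopback_inputs` is a hypothetical helper, not part of VoxAI, and the sample records imitate the dict shape (`name`, `max_input_channels`) returned by the third-party `sounddevice` package's `query_devices()`.

```python
from typing import Iterable, List, Mapping


def find_loopback_inputs(devices: Iterable[Mapping], needle: str = "BlackHole") -> List[str]:
    """Return names of input-capable devices whose name contains `needle`."""
    matches = []
    for dev in devices:
        name = str(dev.get("name", ""))
        has_input = int(dev.get("max_input_channels", 0)) > 0
        if has_input and needle.lower() in name.lower():
            matches.append(name)
    return matches


# Hand-written sample records for illustration only. On a real machine you
# could pass sounddevice.query_devices() instead (assumes sounddevice is installed).
sample_devices = [
    {"name": "MacBook Pro Microphone", "max_input_channels": 1},
    {"name": "BlackHole 2ch", "max_input_channels": 2},
    {"name": "MacBook Pro Speakers", "max_input_channels": 0},
]
print(find_loopback_inputs(sample_devices))
```

An empty result would suggest the driver is not installed or not exposed as an input device.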
Install & Run
1) Install from PyPI
pip install voxai
2) Configure environment
cp .env.example .env
Edit .env:
GENAI_API_KEY=sk-…
GENAI_MODEL=gemini-1.5-flash
3) Launch the app
voxai
On first launch, VoxAI will automatically run npm install inside its bundled electron/ folder, then open a desktop window.
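The first-launch behavior described above could be sketched roughly as follows. This is not VoxAI's actual code: the function names are hypothetical, and treating a missing `node_modules/` folder as the signal that `npm install` has not yet run is an assumption.

```python
import shutil
import subprocess
from pathlib import Path


def needs_npm_install(electron_dir: Path) -> bool:
    """Assume dependencies are missing when node_modules/ is absent."""
    return not (electron_dir / "node_modules").is_dir()


def bootstrap_ui(electron_dir: Path) -> None:
    """Install UI dependencies once, then launch Electron on the folder."""
    if shutil.which("npm") is None or shutil.which("electron") is None:
        raise RuntimeError("npm and electron must be on PATH (see Prerequisites)")
    if needs_npm_install(electron_dir):
        # Only the first launch pays the install cost.
        subprocess.run(["npm", "install"], cwd=electron_dir, check=True)
    # `electron .` starts the app found in the current working directory.
    subprocess.run(["electron", "."], cwd=electron_dir, check=True)
```

Checking for the tools up front keeps the failure message readable when a prerequisite is missing.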
📖 Usage
Once the UI opens:
Start: Click Start to begin recording.
Speak: Ask your question aloud, or let VoxAI capture questions from Zoom, Teams, or any other tool. There is no length limit.
Stop: Click Stop. Whisper transcribes your entire clip to the Transcript pane.
Ask AI: Click Ask AI. Gemini’s answer streams live into the Answer pane.
Copy & Share: Everything is plain text—copy it, paste it, or feed it into your own RAG/finetuning pipeline.
🎧 How It Works
- Start Recording
  Click Start to capture audio from your input device (e.g., BlackHole).
- Stop & Transcribe
  Click Stop. Whisper transcribes the full audio clip and shows the result under Transcript.
- Ask AI
  Click Ask AI to send the transcript to Gemini. Answers stream live in the Answer panel.
- Copy & Share
  Output is plain text, so you can reuse it in RAG/finetune workflows or anywhere else.
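The Start/Stop/Ask flow above can be modeled as a small state holder. The sketch below is a simplified illustration with stand-in backends: the `Session` class, `fake_transcribe`, and `fake_stream` are hypothetical names, and no real Whisper or Gemini call is made.

```python
from typing import Callable, Iterable, Iterator, List


class Session:
    """Start/Stop/Ask flow with pluggable speech-to-text and LLM backends."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 stream_answer: Callable[[str], Iterable[str]]) -> None:
        self._transcribe = transcribe
        self._stream_answer = stream_answer
        self._chunks: List[bytes] = []
        self._recording = False
        self.transcript = ""

    def start(self) -> None:
        self._chunks.clear()
        self._recording = True

    def feed(self, chunk: bytes) -> None:
        # An audio callback would call this; chunks are ignored unless recording.
        if self._recording:
            self._chunks.append(chunk)

    def stop(self) -> str:
        # One-shot: the whole clip goes to the transcriber in a single call.
        self._recording = False
        self.transcript = self._transcribe(b"".join(self._chunks))
        return self.transcript

    def ask(self) -> Iterator[str]:
        # Yield answer pieces as they arrive so a UI can render them live.
        yield from self._stream_answer(self.transcript)


# Stand-in backends so the sketch runs without Whisper or Gemini.
def fake_transcribe(audio: bytes) -> str:
    return f"{len(audio)} bytes of audio"


def fake_stream(prompt: str) -> Iterator[str]:
    for piece in ("You", " sent: ", prompt):
        yield piece


session = Session(fake_transcribe, fake_stream)
session.start()
session.feed(b"\x00" * 4)
session.feed(b"\x00" * 6)
print(session.stop())
print("".join(session.ask()))
```

Buffering chunks until Stop is what makes the single transcription call possible, and returning a generator from `ask` lets the caller display each piece as soon as it is produced.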
🛠 Configuration
Set in .env:
| Variable | Description | Example |
|---|---|---|
| GENAI_API_KEY | Google Generative AI (Gemini) API key | sk-… |
| GENAI_MODEL | Gemini model to use | gemini-1.5-flash |
See .env.example for reference.
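For readers wiring these variables into their own scripts, here is one way the two settings might be read. It is a sketch under assumptions, not VoxAI's loader: `parse_env_text` and `load_settings` are hypothetical helpers, letting process environment variables override the `.env` file is an assumed precedence, and the fallback model simply reuses the example value from the table.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path: Path = Path(".env"),
                  environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge .env values with the process environment; the environment wins."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_text(env_path.read_text()) if env_path.is_file() else {}
    env_values = {k: v for k, v in environ.items() if k.startswith("GENAI_")}
    merged = {**file_values, **env_values}
    if not merged.get("GENAI_API_KEY"):
        raise RuntimeError("GENAI_API_KEY is not set")
    # Assumed fallback, taken from the example value in the table.
    merged.setdefault("GENAI_MODEL", "gemini-1.5-flash")
    return merged
```

Failing fast on a missing key surfaces configuration mistakes before any audio is recorded.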
🛠️ Developer Guide
Install from source
git clone https://github.com/rtiwariops/voxai.git
cd voxai
pip install -e .
npm install -g electron
voxai
⚙️ Features
- Universal Audio Capture (BlackHole, etc.)
- Manual Control for long-form Q&A
- One-Shot, Full-Clip Transcription
- Live Token-by-Token AI Streaming
- Configurable Gemini Model (.env)
- Electron-Based Desktop UI
📜 License
MIT License © 2025 Ravi (Robbie) Tiwari
Released under the MIT License.