Skip to main content

Voice-driven AI assistant for real-time transcription and Gemini integration.

Project description

VoxAI

PyPI version
License MIT

VoxAI Logo

🚀 Features

  • Universal Audio Capture
    Record mic, system audio, or any combination via BlackHole (macOS) or equivalent loopback drivers.
  • Manual Control
    Start/Stop buttons let you define exactly the boundaries of your prompt—perfect for long, multi-sentence queries.
  • One-Shot Transcription
    Whisper processes the entire recording in a single call—no fragmented sentences.
  • Live AI Streaming
    Gemini’s response appears token by token, just like in ChatGPT’s streaming interface.
  • Configurable Model
    Swap between Gemini variants via environment (no code changes).
  • Zero-Install UI Bootstrapping
    On first run voxai auto-installs the minimal Electron UI source—included in the PyPI package—so you only ever need pip install voxai and voxai.

🎯 Quickstart

⚙️ Prerequisites

  • Python ≥3.7

  • Node.js ≥14

  • Electron (global)

    npm install -g electron
    
  • macOS Audio Loopback (BlackHole 2ch + Ladiocast):

    1. Install BlackHole 2ch::

      brew install blackhole-2ch

    2. Create a Multi-Output Device

      • Open Audio MIDI Setup
      • Click “+” → Create Multi-Output Device
      • Check both BlackHole 2ch and MacBook Pro Speakers
    3. Select Multi-Output Device
      System Preferences → Sound → Output → Multi-Output Device

    4. Configure Ladiocast

      • Input 1: BlackHole 2ch → route to Main
      • Main Output: MacBook Pro Speakers
      • Mute other channels

    This will route all system and microphone audio into VoxAI.

Install & Run

1) Install from PyPI

pip install voxai

2) Configure environment

cp .env.example .env

Edit .env:

GENAI_API_KEY=sk-…

GENAI_MODEL=gemini-1.5-flash

3) Launch the app

voxai

On first launch, VoxAI will automatically run npm install inside its bundled electron/ folder, then open a desktop window.


📖 Usage

Once the UI opens:

Start: Click Start to begin recording.

Speak: Listen to question`s from zoom, teams or any other tool aloud—no length limit.

Stop: Click Stop. Whisper transcribes your entire clip to the Transcript pane.

Ask AI: Click Ask AI. Gemini’s answer streams live into the Answer pane.

Copy & Share: Everything is plain text—copy it, paste it, or feed it into your own RAG/finetuning pipeline.


🎧 How It Works

  • Start Recording
    Click Start to capture audio from your input device (e.g., BlackHole).

  • Stop & Transcribe
    Click Stop. Whisper transcribes and shows the full audio clip under Transcript.

  • Ask AI
    Click Ask AI to send the transcript to Gemini. Answers stream live in the Answer panel.

  • Copy & Share
    Output is plain text—reuse it in RAG/finetune workflows or anywhere else.


🛠 Configuration

Set in .env:

Variable Description Example
GENAI_API_KEY Google Generative AI (Gemini) API key sk-…
GENAI_MODEL Gemini model to use gemini-1.5-flash

See .env.example for reference.


🛠️ Developer Guide

Install from source

git clone https://github.com/rtiwariops/voxai.git

cd voxai

pip install -e .

npm install -g electron

voxai

⚙️ Features

  • Universal Audio Capture (BlackHole, etc.)
  • Manual Control for long-form Q&A
  • One-Shot, Full-Clip Transcription
  • Live Token-by-Token AI Streaming
  • Configurable Gemini Model (.env)
  • Electron-Based Desktop UI

📜 License

MIT License © 2025 Ravi (Robbie) Tiwari

Released under the MIT License.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxai-0.2.4.tar.gz (12.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voxai-0.2.4-py3-none-any.whl (10.9 kB view details)

Uploaded Python 3

File details

Details for the file voxai-0.2.4.tar.gz.

File metadata

  • Download URL: voxai-0.2.4.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for voxai-0.2.4.tar.gz
Algorithm Hash digest
SHA256 e6974aa9b4b334071f6544bf5786e898a7b7014e69b20f1d863e06c920aff9bd
MD5 2d55b285e423d51b14e76296f36f8fd6
BLAKE2b-256 40e2aec4762a08fc67e215559b0b7e1655522bbc2a46bd591a0a8a15ea34d903

See more details on using hashes here.

File details

Details for the file voxai-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: voxai-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for voxai-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 538cfa8de63b17059bca6dd7baf942f69d0ee068b3af99e4d45a3c0f558c6424
MD5 8dd8ab5910b31babc3191f8e3d125834
BLAKE2b-256 d55dbf856b503ff806690fb3f4436a556f470df2bbd1bb6fcd136b36f2e95348

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page