
macOS menu bar app that captures voice, transcribes, and enhances text with AI


Vaani · वाणी

वाणी — Sanskrit for speech, voice; the goddess of language and learning.

Voice to polished text, right at your cursor — anywhere on macOS.

Vaani is a macOS menu bar app that listens when you hold a hotkey, transcribes your speech with OpenAI Whisper, enhances it with Claude AI, and pastes the result directly at your cursor. No switching apps, no copy-pasting — just speak and the text appears.


The Problem

Typing is slow. Dictation on macOS gives you raw, unedited speech dumps: filler words, broken sentences, no punctuation. Third-party dictation tools are often subscription-priced or cloud-locked, or they produce output that still needs manual cleanup.

Professionals who write a lot — engineers, writers, product managers, support teams — spend significant time translating their thoughts into polished prose. The gap between what you think and what ends up on screen is real friction.

Why We Built Vaani

We wanted voice to be a first-class input method, not an afterthought. The goal: speak naturally, get back something you'd actually send or commit. Vaani sits silently in your menu bar and activates on a global hotkey — no window to focus, no app to switch to. It works in any text field, terminal, IDE, browser, Slack, email client, or document editor.

The name "Vaani" (वाणी) comes from Sanskrit, where it means speech or voice and also names the goddess of language and learning.


Quick Start

curl -sSL https://raw.githubusercontent.com/ankushbhardwxj/vaani/main/install.sh | sh
vaani start

The install script installs Vaani via pip, sets up the vaani command on your PATH, and downloads required language models.

On first launch, Vaani walks you through entering your API keys and granting macOS permissions. Then:

  1. Hold Alt (or your configured hotkey) and speak
  2. Release — Vaani transcribes and enhances your speech
  3. Polished text appears at your cursor in ~2–4 seconds

API keys needed: OpenAI (transcription) · Anthropic (enhancement). Keys are stored in macOS Keychain — never written to disk in plaintext.


How It Works

Hold hotkey → Mic capture → VAD trims silence → Gain normalization
   → OpenAI Whisper (transcribe) → Claude Haiku (enhance) → Paste at cursor

Every step runs in the background. The menu bar icon shows your current state: idle, recording, or processing.
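The three states shown by the menu bar icon can be modeled as a small thread-safe state machine. The following is a minimal sketch, not Vaani's actual code: the class name, the transition table, and the choice to let a recording fall back to idle (for example, when nothing was said) are all assumptions made for illustration.

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


# Allowed transitions (assumed). RECORDING -> IDLE covers a cancelled
# or empty recording that never reaches the processing stage.
_TRANSITIONS = {
    State.IDLE: {State.RECORDING},
    State.RECORDING: {State.PROCESSING, State.IDLE},
    State.PROCESSING: {State.IDLE},
}


class StateMachine:
    """Hypothetical thread-safe holder for the current app state."""

    def __init__(self):
        self._state = State.IDLE
        self._lock = threading.Lock()

    @property
    def state(self):
        with self._lock:
            return self._state

    def transition(self, new_state):
        """Move to new_state if the transition is allowed; return True on success."""
        with self._lock:
            if new_state not in _TRANSITIONS[self._state]:
                return False
            self._state = new_state
            return True
```

With a table like this, a hotkey press that arrives while a previous recording is still being processed is rejected instead of starting an overlapping recording.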

Pipeline Detail

| Step | Technology | What it does |
|---|---|---|
| Audio capture | sounddevice / PortAudio | Streams mic input at 16 kHz mono |
| Voice activity detection | Silero VAD (PyTorch) | Strips silence; handles whisper-level audio |
| Gain normalization | RMS-based | Amplifies quiet audio so VAD works on whispers |
| Transcription | OpenAI Whisper API | Accurate STT across accents and background noise |
| Enhancement | Anthropic Claude Haiku | Polishes grammar, tone, and structure |
| Output | pynput + pbcopy/pbpaste | Saves clipboard → pastes → restores clipboard |
| Name formatting | spaCy NER (en_core_web_sm) | Detects person names, optionally prefixes with @ |
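RMS-based gain normalization can be illustrated with a few lines of standard-library Python. This is a sketch under stated assumptions rather than Vaani's implementation: the function name, the `target_rms` and `max_gain` values, and the use of plain float lists (instead of the arrays an audio library would return) are all illustrative.

```python
import math


def normalize_gain(samples, target_rms=0.1, max_gain=20.0):
    """Amplify quiet float samples (range -1.0..1.0) toward target_rms.

    target_rms and max_gain are illustrative values, not Vaani's settings.
    Audio that is already at or above target_rms is returned unchanged.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0 or rms >= target_rms:
        return list(samples)  # silence or already loud enough
    # Cap the gain so near-silent noise is not blown up without limit.
    gain = min(target_rms / rms, max_gain)
    # Clip so amplified peaks stay inside the valid sample range.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Capping the gain is a design choice in this sketch: without it, a recording that contains only faint background hiss would be amplified until it looked like speech to the detector.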

Enhancement Modes

Switch modes from the menu bar dropdown:

| Mode | What it does |
|---|---|
| Minimal | Fix grammar and remove filler words with minimal rewrites |
| Professional | Formal rewrite for business communication, emails, and docs |
| Casual | Friendly, conversational tone for chats and informal writing |
| Code | Code-aware formatting, shell commands, and code generation from speech |
| Funny | Witty, humorous rewrite that still delivers your message |

All modes automatically detect list-like speech and format as bullet points.


Requirements

  • macOS 12+ (uses native menu bar, Keychain, clipboard APIs)
  • Python 3.10+
  • API Keys: OpenAI · Anthropic

macOS Permissions

Grant these in System Settings → Privacy & Security on first run (on macOS 12, the same panes are under System Preferences → Security & Privacy):

| Permission | Why |
|---|---|
| Microphone | Audio recording (auto-prompted on first use) |
| Accessibility | Simulating Cmd+V to paste text |
| Input Monitoring | Detecting the global hotkey from any app |

Configuration

Config file: ~/.vaani/config.yaml

hotkey: "alt"                   # Global hotkey to hold while speaking
active_mode: professional       # Default enhancement mode
sounds_enabled: true            # Audio feedback on start/stop
vad_threshold: 0.05             # Lower = more sensitive (good for whispers)
sample_rate: 16000              # Audio sample rate (Hz)
max_recording_seconds: 600      # Auto-stop after 10 minutes
stt_model: whisper-1            # OpenAI transcription model
llm_model: claude-haiku-4-5-20251001  # Anthropic enhancement model
microphone_device: null         # null = system default, or set device index
paste_restore_delay_ms: 100     # How long to wait before restoring clipboard
launch_at_login: false          # Start Vaani automatically on login

Configuration reloads automatically when the file changes — no restart needed.
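One common way to get reload-on-change behavior is to poll the file's modification time. The sketch below shows that approach; whether Vaani polls or uses file-system events is not stated here, so treat the `ConfigWatcher` class and its `load` callback as hypothetical. In practice the callback would parse YAML (for example with PyYAML's `yaml.safe_load`); it is injected so the sketch needs only the standard library.

```python
import os


class ConfigWatcher:
    """Hypothetical helper: reloads a config file when its mtime changes."""

    def __init__(self, path, load):
        self._path = path
        self._load = load      # callable(path) -> parsed config
        self._mtime = None     # mtime of the version currently loaded
        self.config = None

    def poll(self):
        """Reload if the file changed since the last poll; return True if reloaded."""
        try:
            mtime = os.stat(self._path).st_mtime_ns
        except FileNotFoundError:
            return False       # keep the last known config
        if mtime == self._mtime:
            return False
        self._mtime = mtime
        self.config = self._load(self._path)
        return True
```

Calling `poll()` from a timer every second or so is usually enough for a config file that is edited by hand.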


Custom Prompts

Override any prompt by creating files in ~/.vaani/prompts/:

~/.vaani/prompts/
├── system.txt          # Override the base system prompt
├── context.txt         # Add personal context (your writing style, name, role)
└── modes/
    ├── minimal.txt     # Override the minimal mode prompt
    ├── professional.txt
    ├── casual.txt
    ├── code.txt
    └── funny.txt

User prompts take priority over built-in defaults. Use context.txt to tell Vaani about you — your name, your company, common terms you use — so the output matches your voice.
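The override rule can be sketched as a file lookup that falls back to a built-in string. The function names below are hypothetical, and appending context.txt after the system prompt is an assumption about how the personal context is combined; only the directory layout comes from the tree above.

```python
from pathlib import Path

USER_PROMPT_DIR = Path.home() / ".vaani" / "prompts"


def resolve_prompt(relative_name, builtin_default, user_dir=USER_PROMPT_DIR):
    """Return the user's override for a prompt if the file exists, else the default.

    relative_name is a path under the prompts directory, such as
    "system.txt" or "modes/casual.txt".
    """
    candidate = Path(user_dir) / relative_name
    if candidate.is_file():
        return candidate.read_text(encoding="utf-8")
    return builtin_default


def build_system_prompt(builtin_system, user_dir=USER_PROMPT_DIR):
    """Base prompt (possibly overridden) plus optional personal context."""
    parts = [resolve_prompt("system.txt", builtin_system, user_dir)]
    context = resolve_prompt("context.txt", "", user_dir)
    if context.strip():
        # Assumed behavior: context is appended, not substituted.
        parts.append(context.strip())
    return "\n\n".join(parts)
```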


Privacy

| Data | Where it goes | How it's stored |
|---|---|---|
| Audio | Sent to OpenAI for transcription | Not stored by Vaani |
| Transcribed text | Sent to Anthropic for enhancement | Not stored by Vaani |
| API keys | macOS Keychain only | Never written to disk in plaintext |
| Transcription history | Local SQLite database | Encrypted with Fernet (AES-128-CBC with HMAC-SHA256) |

No data is retained on Vaani's servers because there are no Vaani servers. All cloud calls go directly from your machine to OpenAI and Anthropic under your API account, subject to their data retention policies.


Architecture

┌─────────────────────────────────────────────────────────────┐
│  Main Thread (macOS requirement)                            │
│  ┌────────────────┐  ┌───────────────────────────────────┐  │
│  │ HotkeyListener │  │ VaaniMenuBar (rumps event loop)   │  │
│  │ (pynput)       │  │ Status icon · Mode selector       │  │
│  └──────┬─────────┘  └───────────────────────────────────┘  │
│         │ on_press / on_release                             │
└─────────┼───────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│  StateMachine: IDLE → RECORDING → PROCESSING → IDLE         │
└─────────────────────────────────────────────────────────────┘
          │
          ▼  (daemon threads)
┌─────────────────────────────────────────────────────────────────────────────┐
│  AudioRecorder → process_audio() → transcribe() → enhance() → paste_text()  │
│  sounddevice     Silero VAD        Whisper API    Claude      pynput        │
│                  + gain norm       + WAV encode   Haiku       + pbcopy      │
└─────────────────────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  HistoryStore (SQLite)      │
│  Fernet-encrypted records   │
└─────────────────────────────┘
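The final paste_text() stage in the diagram follows a save, paste, restore pattern on the clipboard. Below is a rough sketch of that pattern, assuming text-only clipboard contents. The function signatures are illustrative rather than Vaani's own, and the Cmd+V keystroke is passed in as a `send_paste` callable (which could be built with pynput) so that the example depends only on the standard library and the macOS `pbcopy` and `pbpaste` tools.

```python
import subprocess
import time


def read_clipboard():
    """Return the current clipboard text via pbpaste (macOS only)."""
    result = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True)
    return result.stdout


def write_clipboard(text):
    """Replace the clipboard text via pbcopy (macOS only)."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


def paste_text(text, send_paste, restore_delay_ms=100,
               read=read_clipboard, write=write_clipboard, sleep=time.sleep):
    """Paste text at the cursor, then put the previous clipboard text back.

    send_paste is a caller-supplied function that simulates Cmd+V.
    read, write, and sleep are injectable so the flow can be tested
    without touching the real clipboard.
    """
    previous = read()
    write(text)
    try:
        send_paste()
        # Give the target app time to read the clipboard before restoring it.
        sleep(restore_delay_ms / 1000)
    finally:
        write(previous)
```

The delay matters because the paste keystroke is handled asynchronously by the target app; restoring too early can cause the old clipboard contents to be pasted instead. Note that this sketch preserves only text, so an image on the clipboard would not survive the round trip.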

Uninstall

pip uninstall vaani
rm -rf ~/.vaani
sudo rm -f /usr/local/bin/vaani

Development

git clone https://github.com/ankushbhardwxj/vaani
cd vaani
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"
python -m spacy download en_core_web_sm

# Run tests
pytest

# Run in foreground (with live logs)
vaani start --foreground

Running Tests

pytest                  # Run all tests
pytest -v               # Verbose output
pytest tests/test_audio.py  # Single module

License

MIT — see LICENSE for details.
