macOS menu bar app that captures voice, transcribes, and enhances text with AI

Vaani · वाणी

वाणी — Sanskrit for speech, voice; the goddess of language and learning.

Voice to polished text, right at your cursor — anywhere on macOS.

Vaani is a macOS menu bar app that listens when you hold a hotkey, transcribes your speech with OpenAI Whisper, enhances it with Claude AI, and pastes the result directly at your cursor. No switching apps, no copy-pasting — just speak and the text appears.


The Problem

Typing is slow. Dictation on macOS gives you raw, unedited speech dumps: filler words, broken sentences, no punctuation. Third-party dictation tools tend to be expensive subscriptions, locked to a vendor's cloud, or still produce output that needs manual cleanup.

Professionals who write a lot — engineers, writers, product managers, support teams — spend significant time translating their thoughts into polished prose. The gap between what you think and what ends up on screen is real friction.

Why We Built Vaani

We wanted voice to be a first-class input method, not an afterthought. The goal: speak naturally, get back something you'd actually send or commit. Vaani sits silently in your menu bar and activates on a global hotkey — no window to focus, no app to switch to. It works in any text field, terminal, IDE, browser, Slack, email client, or document editor.

The name "Vaani" (वाणी) is Sanskrit for speech or voice, and also names the goddess of language and learning.


Quick Start

pip install vaani
python -m spacy download en_core_web_sm
vaani start

On first launch, Vaani walks you through entering your API keys and granting macOS permissions. Then:

  1. Hold Alt (or your configured hotkey) and speak
  2. Release — Vaani transcribes and enhances your speech
  3. Polished text appears at your cursor in ~2–4 seconds

API keys needed: OpenAI (transcription) · Anthropic (enhancement). Keys are stored in macOS Keychain — never written to disk in plaintext.


How It Works

Hold hotkey → Mic capture → VAD trims silence → Gain normalization
   → OpenAI Whisper (transcribe) → Claude Haiku (enhance) → Paste at cursor

Every step runs in the background. The menu bar icon shows your current state: idle, recording, or processing.
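
The gain normalization stage in the flow above is described as RMS-based. A minimal sketch of that idea in plain Python follows; the function names, `target_rms`, and `max_gain` are illustrative assumptions, not Vaani's actual code or settings.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_gain(samples, target_rms=0.1, max_gain=20.0):
    """Amplify quiet audio toward target_rms; leave louder audio untouched.

    target_rms and max_gain are hypothetical values chosen for illustration.
    """
    level = rms(samples)
    if level == 0.0 or level >= target_rms:
        return list(samples)  # silence or already loud enough
    # Cap the gain so near-silent noise is not blown up without limit.
    gain = min(target_rms / level, max_gain)
    # Clip to the valid range in case a peak exceeds 1.0 after scaling.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Capping the gain is a design choice in this sketch: without it, a nearly silent buffer would be amplified into loud noise.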

Pipeline Detail

| Step | Technology | What it does |
| --- | --- | --- |
| Audio capture | sounddevice / PortAudio | Streams mic input at 16 kHz mono |
| Voice activity detection | Silero VAD (PyTorch) | Strips silence; handles whisper-level audio |
| Gain normalization | RMS-based | Amplifies quiet audio so VAD works on whispers |
| Transcription | OpenAI Whisper API | Accurate STT across accents and background noise |
| Enhancement | Anthropic Claude Haiku | Polishes grammar, tone, and structure |
| Output | pynput + pbcopy/pbpaste | Saves clipboard → pastes → restores clipboard |
| Name formatting | spaCy NER (en_core_web_sm) | Detects person names, optionally prefixes with @ |
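
The Output step (save clipboard, paste, restore clipboard) can be sketched roughly as follows. This is an illustration under assumptions, not Vaani's implementation: `paste_with_restore` and its parameters are hypothetical names, and the Cmd+V keystroke is passed in as a callable (for example one built with pynput) so the flow can be exercised without touching the real clipboard. `pbcopy` and `pbpaste` handle text only, so non-text clipboard contents are not preserved by this sketch.

```python
import subprocess
import time


def read_clipboard():
    """Return the current clipboard text via macOS pbpaste."""
    result = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True)
    return result.stdout


def write_clipboard(text):
    """Replace the clipboard contents via macOS pbcopy."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


def paste_with_restore(text, send_paste, restore_delay_ms=100,
                       read=read_clipboard, write=write_clipboard):
    """Paste `text` at the cursor, then put the previous clipboard text back.

    send_paste is a caller-supplied callable that simulates Cmd+V.
    """
    saved = read()
    write(text)
    try:
        send_paste()
        # Give the target app time to read the clipboard before restoring it.
        time.sleep(restore_delay_ms / 1000)
    finally:
        write(saved)
```

The delay corresponds to the idea behind `paste_restore_delay_ms` in the configuration: restoring too early risks the target app pasting the old clipboard contents.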

Enhancement Modes

Switch modes from the menu bar dropdown:

| Mode | What it does |
| --- | --- |
| Cleanup | Fix grammar and remove filler words with minimal rewrites |
| Professional | Formal rewrite for business communication, emails, and docs |
| Casual | Friendly, conversational tone for chats and informal writing |
| Bullets | Convert your speech into organized bullet points |

Requirements

  • macOS 12+ (uses native menu bar, Keychain, clipboard APIs)
  • Python 3.10+
  • API Keys: OpenAI · Anthropic

macOS Permissions

Grant these in System Settings → Privacy & Security (System Preferences → Security & Privacy on macOS 12) on first run:

| Permission | Why |
| --- | --- |
| Microphone | Audio recording (auto-prompted on first use) |
| Accessibility | Simulating Cmd+V to paste text |
| Input Monitoring | Detecting the global hotkey from any app |

Configuration

Config file: ~/.vaani/config.yaml

hotkey: "alt"                   # Global hotkey to hold while speaking
active_mode: professional       # Default enhancement mode
sounds_enabled: true            # Audio feedback on start/stop
vad_threshold: 0.05             # Lower = more sensitive (good for whispers)
sample_rate: 16000              # Audio sample rate (Hz)
max_recording_seconds: 600      # Auto-stop after 10 minutes
stt_model: whisper-1            # OpenAI transcription model
llm_model: claude-haiku-4-5-20251001  # Anthropic enhancement model
microphone_device: null         # null = system default, or set device index
paste_restore_delay_ms: 100     # How long to wait before restoring clipboard
launch_at_login: false          # Start Vaani automatically on login

Configuration reloads automatically when the file changes — no restart needed.
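
One simple way to get this kind of automatic reload is to compare the file's modification time on each read. The sketch below shows that approach using only the standard library; it is an assumption about how such a feature could work, not a description of Vaani's mechanism (which may use a file watcher instead). `ReloadingFile` is a hypothetical helper, and `parse` would be something like `yaml.safe_load` if PyYAML is installed.

```python
import os


class ReloadingFile:
    """Re-read and re-parse a file whenever its modification time changes."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse      # callable that turns file text into a config object
        self._mtime = None      # mtime of the version currently cached
        self._value = None

    def get(self):
        """Return the parsed contents, reloading only if the file changed."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path, encoding="utf-8") as fh:
                self._value = self.parse(fh.read())
            self._mtime = mtime
        return self._value
```

Calling `get()` before each recording would pick up edits to `~/.vaani/config.yaml` without a restart, at the cost of one `stat` call per read.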


Custom Prompts

Override any prompt by creating files in ~/.vaani/prompts/:

~/.vaani/prompts/
├── system.txt          # Override the base system prompt
├── context.txt         # Add personal context (your writing style, name, role)
└── modes/
    ├── cleanup.txt     # Override the cleanup mode prompt
    ├── professional.txt
    ├── casual.txt
    └── bullets.txt

User prompts take priority over built-in defaults. Use context.txt to tell Vaani about you — your name, your company, common terms you use — so the output matches your voice.
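
The priority rule can be pictured as a lookup that checks the user's directory first and falls back to the built-in text. The sketch below is illustrative only: `resolve_prompt`, `build_system_prompt`, the `builtin_prompts` mapping, and the way the pieces are joined are all assumptions, since the README does not say how Vaani assembles the final prompt.

```python
from pathlib import Path


def resolve_prompt(relative_name, user_dir, builtin_prompts):
    """Return the user's override file if it exists, else the built-in text."""
    override = Path(user_dir) / relative_name
    if override.is_file():
        return override.read_text(encoding="utf-8")
    return builtin_prompts[relative_name]


def build_system_prompt(mode, user_dir, builtin_prompts):
    """Combine base prompt, mode prompt, and optional personal context.

    Joining with blank lines is a hypothetical choice for this sketch.
    """
    parts = [
        resolve_prompt("system.txt", user_dir, builtin_prompts),
        resolve_prompt(f"modes/{mode}.txt", user_dir, builtin_prompts),
    ]
    context = Path(user_dir) / "context.txt"
    if context.is_file():
        parts.append(context.read_text(encoding="utf-8"))
    return "\n\n".join(part.strip() for part in parts)
```

With this layout, creating only `modes/cleanup.txt` changes the cleanup mode while every other prompt keeps its default.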


Privacy

| Data | Where it goes | How it's stored |
| --- | --- | --- |
| Audio | Sent to OpenAI for transcription | Not stored by Vaani |
| Transcribed text | Sent to Anthropic for enhancement | Not stored by Vaani |
| API keys | macOS Keychain only | Never written to disk in plaintext |
| Transcription history | Local SQLite database | Encrypted with Fernet (AES-128-CBC with HMAC-SHA256) |

No data is retained on Vaani's servers because there are no Vaani servers. All cloud calls go directly from your machine to OpenAI and Anthropic under your API account, subject to their data retention policies.


Architecture

┌─────────────────────────────────────────────────────────────┐
│  Main Thread (macOS requirement)                            │
│  ┌────────────────┐  ┌───────────────────────────────────┐  │
│  │ HotkeyListener │  │ VaaniMenuBar (rumps event loop)   │  │
│  │ (pynput)       │  │ Status icon · Mode selector       │  │
│  └──────┬─────────┘  └───────────────────────────────────┘  │
│         │ on_press / on_release                             │
└─────────┼───────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│  StateMachine: IDLE → RECORDING → PROCESSING → IDLE         │
└─────────────────────────────────────────────────────────────┘
          │
          ▼  (daemon threads)
┌─────────────────────────────────────────────────────────────────────────────┐
│  AudioRecorder → process_audio() → transcribe() → enhance() → paste_text()  │
│  sounddevice     Silero VAD        Whisper API    Claude      pynput        │
│                  + gain norm       + WAV encode   Haiku       + pbcopy      │
└─────────────────────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  HistoryStore (SQLite)      │
│  Fernet-encrypted records   │
└─────────────────────────────┘
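
The IDLE → RECORDING → PROCESSING → IDLE cycle in the diagram can be sketched as a small lock-protected state machine. This is a minimal illustration under assumptions: the class shape and the `advance` method are hypothetical, and only the three state names and their order come from the diagram. The lock matters because hotkey callbacks and daemon worker threads would touch the state concurrently.

```python
import threading
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()


# Allowed transitions, following IDLE → RECORDING → PROCESSING → IDLE.
_NEXT = {
    State.IDLE: State.RECORDING,
    State.RECORDING: State.PROCESSING,
    State.PROCESSING: State.IDLE,
}


class StateMachine:
    def __init__(self):
        self._state = State.IDLE
        self._lock = threading.Lock()

    @property
    def state(self):
        with self._lock:
            return self._state

    def advance(self, expected):
        """Move to the next state only if currently in `expected`.

        Returns True on success, False if the transition was rejected.
        """
        with self._lock:
            if self._state is not expected:
                return False
            self._state = _NEXT[expected]
            return True
```

In this sketch a hotkey press would call `advance(State.IDLE)`, a release would call `advance(State.RECORDING)`, and the worker thread would call `advance(State.PROCESSING)` after pasting. A press that arrives while a previous recording is still processing is simply rejected.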

Development

git clone https://github.com/ankushbhardwxj/vaani
cd vaani
python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"
python -m spacy download en_core_web_sm

# Run tests
pytest

# Run in foreground (with live logs)
vaani start --foreground

Running Tests

pytest                  # Run all tests
pytest -v               # Verbose output
pytest tests/test_audio.py  # Single module

License

MIT — see LICENSE for details.
