Skip to main content

WhOSSpr Flow - Open Source Speech-to-Text for macOS

Project description

WhOSSpr Flow

Open Source Speech-to-Text for macOS - A clone of Whispr Flow.

Features

Feature Description
Local Whisper transcription Uses OpenAI Whisper models locally for privacy
Configurable model sizes Choose from tiny, base, small, medium, or large models
Keyboard shortcuts Hold-to-dictate or toggle dictation modes
Universal text injection Works with any application (browsers, terminals, editors)
Optional LLM enhancement Improve transcribed text with OpenAI-compatible APIs
JSON configuration Easy setup via config files or command-line parameters

Quick Start

# 1. Install WhOSSpr (no system dependencies needed)
uv sync

# 2. Create default configuration
uv run whosspr config --init

# 3. Check permissions (grant when prompted)
uv run whosspr check

# 4. Start dictation service
uv run whosspr start

Default Shortcuts

Shortcut Action
Ctrl+Cmd+1 (hold) Hold to record, release to transcribe
Ctrl+Cmd+2 Toggle dictation on/off

Configuration

Copy config.example.json to whosspr.json and customize:

{
  "whisper": {
    "model_size": "base",
    "language": "en",
    "device": "auto"
  },
  "shortcuts": {
    "hold_to_dictate": "ctrl+cmd+1",
    "toggle_dictation": "ctrl+cmd+2"
  },
  "enhancement": {
    "enabled": false,
    "api_base_url": "https://api.openai.com/v1",
    "api_key": "",
    "model": "gpt-4o-mini"
  }
}

Whisper Model Sizes

Model Size Speed Accuracy VRAM
tiny 39M Fastest Basic ~1GB
base 74M Fast Good ~1GB
small 244M Medium Better ~2GB
medium 769M Slow Great ~5GB
large 1.5B Slowest Best ~10GB
turbo 809M Fast High ~6GB

Recommendation: Start with base for a balance of speed and accuracy.

Requirements

Requirement Details
OS macOS 10.14+ (optimized for Apple Silicon)
Python 3.12+
Permissions Microphone access, Accessibility access
RAM 2GB+ (more for larger models)

Installation

Using uv (recommended)

uv sync

Using pip

pip install -e .

Permissions Setup

WhOSSpr requires two macOS permissions:

Permission Purpose How to Grant
Microphone Record audio System Preferences → Privacy → Microphone → Enable Terminal
Accessibility Insert text System Preferences → Privacy → Accessibility → Add Terminal

Verify with:

uv run whosspr check

Usage

Starting the Service

uv run whosspr start

Command-line Options

Option Description
--model Whisper model size (tiny/base/small/medium/large/turbo)
--language Language code (e.g., en, es, fr)
--device Device for inference (auto/cpu/mps/cuda)
--enhancement Enable LLM text enhancement
--api-key API key for enhancement

Examples

# Use small model with Spanish
uv run whosspr start --model small --language es

# Use MPS (Apple Silicon GPU)
uv run whosspr start --device mps

# Enable enhancement
uv run whosspr start --enhancement --api-key sk-xxx

Text Enhancement

WhOSSpr can improve transcribed text using an OpenAI-compatible API:

# Using OpenAI
export OPENAI_API_KEY=sk-your-api-key
uv run whosspr start --enhancement

# Using local LLM (Ollama)
uv run whosspr start --enhancement \
  --api-key ollama \
  --api-base-url http://localhost:11434/v1

Troubleshooting

Problem Solution
Permission denied Run whosspr check, grant permissions, restart terminal
No audio input Check microphone connection and permissions
Text not appearing Verify Accessibility permission, try different app
Model download fails Check internet, try --model tiny
High CPU/memory Use smaller model, try --device mps on Apple Silicon

Development

Running Tests

# All automated tests
uv run pytest

# With coverage
uv run pytest --cov=whosspr

# Manual E2E tests (interactive)
WHOSSPR_MANUAL_TESTS=1 uv run pytest tests/test_e2e_manual.py -v -s

License

Apache 2.0

Architecture

This document describes the architecture of WhOSSpr Flow, an open-source speech-to-text application for macOS.

Overview

WhOSSpr Flow captures audio from the microphone, transcribes it using OpenAI Whisper, optionally enhances the text with an LLM, and inserts the result into the active application.

flowchart LR
    A[User presses shortcut] --> B[Record audio]
    B --> C[Transcribe with Whisper]
    C --> D{Enhancement enabled?}
    D -->|Yes| E[Enhance with LLM]
    D -->|No| F[Insert text]
    E --> F

Module Structure

Module Lines Description
cli.py 284 Command-line interface (Typer)
controller.py 255 Main orchestration logic
enhancer.py 206 LLM text enhancement (OpenAI API)
config.py 180 Configuration schema and loading
keyboard.py 173 Global keyboard shortcuts (pynput)
transcriber.py 116 Speech-to-text (Whisper)
recorder.py 115 Audio recording (sounddevice)
permissions.py 58 macOS permission checks
inserter.py 53 Text insertion via clipboard
__init__.py 4 Package version
Total 1444

Module Responsibilities

Module Responsibility
cli.py Argument parsing, config loading, permission checks, service lifecycle, Rich console feedback
config.py Type-safe Pydantic schema, JSON loading/saving, defaults, config file discovery
controller.py State management (IDLE→RECORDING→PROCESSING), shortcut-to-recording connection, transcription pipeline
recorder.py Callback-based non-blocking recording, 16kHz float32 audio, start/stop/cancel, duration tracking
transcriber.py Lazy model loading, device auto-detection (CUDA/MPS/CPU), model size support, memory management
keyboard.py Shortcut parsing ("ctrl+cmd+1"), hold/toggle modes, modifier normalization, callback invocation
inserter.py Copy to clipboard, paste with Cmd+V, universal application support
enhancer.py OpenAI-compatible API, API key resolution, custom prompts, grammar/punctuation improvement
permissions.py Microphone access check, accessibility access check, pass/fail status

Data Flow

flowchart TD
    subgraph Input
        KB[Keyboard Shortcuts]
    end
    
    subgraph Controller
        CTRL[Controller]
    end
    
    subgraph Recording
        REC[Recorder]
        AUDIO[(Audio numpy)]
    end
    
    subgraph Processing
        TRANS[Transcriber]
        ENH[Enhancer]
        TEXT[(Text)]
    end
    
    subgraph Output
        INS[Inserter]
    end
    
    KB --> CTRL
    CTRL --> REC
    REC --> AUDIO
    AUDIO --> TRANS
    TRANS --> TEXT
    TEXT --> ENH
    ENH --> INS
    TEXT --> INS

State Machine

stateDiagram-v2
    [*] --> IDLE
    IDLE --> RECORDING: shortcut pressed
    RECORDING --> IDLE: cancelled / too short
    RECORDING --> PROCESSING: shortcut released
    PROCESSING --> IDLE: complete / error

Design Principles

Principle Description
Simple Modules Each module has single responsibility, none exceeds ~300 lines
Sequential Processing Record→transcribe→enhance→insert runs sequentially (user waits anyway)
Loose Coupling Controller imports others; other modules don't import each other (except config)
Direct Initialization Components created when needed, no lazy patterns or factories
Callbacks for UI Controller uses on_state/on_text/on_error callbacks to separate UI from logic

Threading Model

Component Threading
sounddevice Handles audio callback internally
pynput Runs keyboard listener in separate thread
Processing Sequential - no background threads for transcription

This simplifies debugging and reduces race conditions.

Configuration Schema

Section Field Type Default
whisper model_size ModelSize base
whisper language str en
whisper device DeviceType auto
shortcuts hold_to_dictate str ctrl+cmd+1
shortcuts toggle_dictation str ctrl+cmd+2
enhancement enabled bool false
enhancement api_key str ""
enhancement model str gpt-4o-mini
audio sample_rate int 16000
audio channels int 1

Test Structure

Test File Coverage
test_config.py Config loading/saving
test_recorder.py Audio recording
test_transcriber.py Whisper wrapper
test_keyboard.py Shortcut parsing/handling
test_controller.py Orchestration logic
test_enhancer.py LLM enhancement
test_cli.py CLI commands
test_e2e_manual.py Interactive tests (require user)

Dependencies

Package Purpose
sounddevice Audio recording (no portaudio headers needed)
openai-whisper Local speech-to-text
pynput Global keyboard shortcuts
pyperclip Clipboard operations
typer + rich CLI framework
pydantic Configuration validation
openai LLM API client

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whossper-0.1.0.tar.gz (107.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

whossper-0.1.0-py3-none-any.whl (24.4 kB view details)

Uploaded Python 3

File details

Details for the file whossper-0.1.0.tar.gz.

File metadata

  • Download URL: whossper-0.1.0.tar.gz
  • Upload date:
  • Size: 107.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for whossper-0.1.0.tar.gz
Algorithm Hash digest
SHA256 07eb375ea01583b4a120e05557b06e0a539b3f81ef52de0c057a626e26e15f23
MD5 645de45150e3a7bacd312a83168bb802
BLAKE2b-256 a5accce62e599d19c98eebcc1ba87f6de08b6f482bb466d400c0cd8a35a87f35

See more details on using hashes here.

File details

Details for the file whossper-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: whossper-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for whossper-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 323a57489325b2c68ab2fd3ea296e931402bac1a12b2c829d8c513e5ed73ca69
MD5 7ec4039c67aa491b55667db732e2314f
BLAKE2b-256 52eb011dd5144ac2c383afc2af7364b406e54bfe08fe82e1c4c0e2354e90d999

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page