WhOSSpr Flow - Open Source Speech-to-Text for macOS
WhOSSpr Flow
Open Source Speech-to-Text for macOS - A clone of Wispr Flow.
Features
| Feature | Description |
|---|---|
| Local Whisper transcription | Uses OpenAI Whisper models locally for privacy |
| Configurable model sizes | Choose from tiny, base, small, medium, large, or turbo models |
| Keyboard shortcuts | Hold-to-dictate or toggle dictation modes |
| Universal text injection | Works with any application (browsers, terminals, editors) |
| Optional LLM enhancement | Improve transcribed text with OpenAI-compatible APIs |
| JSON configuration | Easy setup via config files or command-line parameters |
Quick Start
# 1. Install WhOSSpr (no system dependencies needed)
uv sync
# 2. Create default configuration
uv run whosspr config --init
# 3. Check permissions (grant when prompted)
uv run whosspr check
# 4. Start dictation service
uv run whosspr start
Default Shortcuts
| Shortcut | Action |
|---|---|
| `Ctrl+Cmd+1` (hold) | Hold to record, release to transcribe |
| `Ctrl+Cmd+2` | Toggle dictation on/off |
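
Shortcut strings such as `ctrl+cmd+1` split naturally into a set of modifiers plus one trigger key. The following is a minimal sketch of that parsing, not the project's actual code: `parse_shortcut` and the `MODIFIER_ALIASES` table are hypothetical names chosen for illustration.

```python
from typing import FrozenSet, Tuple

# Hypothetical alias table: maps common spellings to one canonical modifier name.
MODIFIER_ALIASES = {
    "ctrl": "ctrl", "control": "ctrl",
    "cmd": "cmd", "command": "cmd",
    "alt": "alt", "option": "alt",
    "shift": "shift",
}


def parse_shortcut(spec: str) -> Tuple[FrozenSet[str], str]:
    """Split a string such as "ctrl+cmd+1" into (modifiers, key)."""
    parts = [part.strip().lower() for part in spec.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"malformed shortcut: {spec!r}")
    modifiers = set()
    keys = []
    for part in parts:
        if part in MODIFIER_ALIASES:
            modifiers.add(MODIFIER_ALIASES[part])  # normalize the spelling
        else:
            keys.append(part)
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {spec!r}")
    return frozenset(modifiers), keys[0]
```

Returning a frozen set makes the comparison order-independent, so `cmd+ctrl+1` and `ctrl+cmd+1` parse to the same value.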
Configuration
Copy config.example.json to whosspr.json and customize:
{
  "whisper": {
    "model_size": "base",
    "language": "en",
    "device": "auto"
  },
  "shortcuts": {
    "hold_to_dictate": "ctrl+cmd+1",
    "toggle_dictation": "ctrl+cmd+2"
  },
  "enhancement": {
    "enabled": false,
    "api_base_url": "https://api.openai.com/v1",
    "api_key": "",
    "model": "gpt-4o-mini"
  }
}
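
How a partial config file is handled is not spelled out here. The sketch below assumes that omitted fields fall back to the defaults shown above; `DEFAULTS`, `merge_config`, and `load_config_text` are hypothetical names for illustration, not the project's API.

```python
import json
from copy import deepcopy

# Defaults mirror the example configuration above.
DEFAULTS = {
    "whisper": {"model_size": "base", "language": "en", "device": "auto"},
    "shortcuts": {
        "hold_to_dictate": "ctrl+cmd+1",
        "toggle_dictation": "ctrl+cmd+2",
    },
    "enhancement": {
        "enabled": False,
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "model": "gpt-4o-mini",
    },
}


def merge_config(user: dict, defaults: dict = DEFAULTS) -> dict:
    """Return a copy of the defaults overlaid with user values, section by section."""
    merged = deepcopy(defaults)
    for section, values in user.items():
        if isinstance(values, dict) and isinstance(merged.get(section), dict):
            merged[section].update(values)  # keep defaults for omitted fields
        else:
            merged[section] = values
    return merged


def load_config_text(text: str) -> dict:
    """Parse JSON text (for example the contents of whosspr.json) and fill in defaults."""
    return merge_config(json.loads(text))
```

With this approach a file containing only `{"whisper": {"model_size": "small"}}` would still yield a complete configuration.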
Whisper Model Sizes
| Model | Parameters | Speed | Accuracy | VRAM |
|---|---|---|---|---|
| tiny | 39M | Fastest | Basic | ~1GB |
| base | 74M | Fast | Good | ~1GB |
| small | 244M | Medium | Better | ~2GB |
| medium | 769M | Slow | Great | ~5GB |
| large | 1.5B | Slowest | Best | ~10GB |
| turbo | 809M | Fast | High | ~6GB |
Recommendation: Start with base for a balance of speed and accuracy.
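
For machines with more headroom, the VRAM column can guide the choice. A small sketch of a memory-only heuristic, using the approximate figures from the table; `largest_model_within` is a hypothetical helper, and it deliberately ignores speed:

```python
# (model, approximate memory need in GB), ordered by memory need as in the table above.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("turbo", 6),
    ("large", 10),
]


def largest_model_within(budget_gb: float) -> str:
    """Return the last model in MODELS whose memory need fits the budget."""
    choice = None
    for name, need_gb in MODELS:
        if need_gb <= budget_gb:
            choice = name  # later entries need at least as much memory
    if choice is None:
        raise ValueError(f"no model fits in {budget_gb} GB")
    return choice
```

When two models need the same memory (tiny and base), the later table entry wins, which matches the recommendation to start with base.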
Requirements
| Requirement | Details |
|---|---|
| OS | macOS 10.14+ (optimized for Apple Silicon) |
| Python | 3.12+ |
| Permissions | Microphone access, Accessibility access |
| RAM | 2GB+ (more for larger models) |
Installation
Using uv (recommended)
uv sync
Using pip
pip install -e .
Permissions Setup
WhOSSpr requires two macOS permissions:
| Permission | Purpose | How to Grant |
|---|---|---|
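| Microphone | Record audio | System Settings → Privacy & Security → Microphone → Enable Terminal (System Preferences → Security & Privacy → Privacy on macOS 12 and earlier) |
| Accessibility | Insert text | System Settings → Privacy & Security → Accessibility → Add Terminal (System Preferences → Security & Privacy → Privacy on macOS 12 and earlier) |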
Verify with:
uv run whosspr check
Usage
Starting the Service
uv run whosspr start
Command-line Options
| Option | Description |
|---|---|
| `--model` | Whisper model size (tiny/base/small/medium/large/turbo) |
| `--language` | Language code (e.g., en, es, fr) |
| `--device` | Device for inference (auto/cpu/mps/cuda) |
| `--enhancement` | Enable LLM text enhancement |
| `--api-key` | API key for enhancement |
Examples
# Use small model with Spanish
uv run whosspr start --model small --language es
# Use MPS (Apple Silicon GPU)
uv run whosspr start --device mps
# Enable enhancement
uv run whosspr start --enhancement --api-key sk-xxx
Text Enhancement
WhOSSpr can improve transcribed text using an OpenAI-compatible API:
# Using OpenAI
export OPENAI_API_KEY=sk-your-api-key
uv run whosspr start --enhancement
# Using local LLM (Ollama)
uv run whosspr start --enhancement \
--api-key ollama \
--api-base-url http://localhost:11434/v1
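
The examples above supply the API key either through `--api-key` or through the `OPENAI_API_KEY` environment variable, and the config file has an `api_key` field as well. The lookup order is not documented here; the sketch below assumes command line first, then config file, then environment. `resolve_api_key` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str] = None,
    config_key: str = "",
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty key: CLI flag, config file, then environment."""
    # Assumed precedence; the real application may order these differently.
    for candidate in (cli_key, config_key, env.get("OPENAI_API_KEY")):
        if candidate:
            return candidate
    return None
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.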
Troubleshooting
| Problem | Solution |
|---|---|
| Permission denied | Run whosspr check, grant permissions, restart terminal |
| No audio input | Check microphone connection and permissions |
| Text not appearing | Verify Accessibility permission, try different app |
| Model download fails | Check internet, try --model tiny |
| High CPU/memory | Use smaller model, try --device mps on Apple Silicon |
Development
Running Tests
# All automated tests
uv run pytest
# With coverage
uv run pytest --cov=whosspr
# Manual E2E tests (interactive)
WHOSSPR_MANUAL_TESTS=1 uv run pytest tests/test_e2e_manual.py -v -s
License
Apache 2.0
Architecture
This document describes the architecture of WhOSSpr Flow, an open-source speech-to-text application for macOS.
Overview
WhOSSpr Flow captures audio from the microphone, transcribes it using OpenAI Whisper, optionally enhances the text with an LLM, and inserts the result into the active application.
flowchart LR
A[User presses shortcut] --> B[Record audio]
B --> C[Transcribe with Whisper]
C --> D{Enhancement enabled?}
D -->|Yes| E[Enhance with LLM]
D -->|No| F[Insert text]
E --> F
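
The flow above can be sketched as one function that takes each stage as a callable. This is an illustration only: `run_pipeline` and its parameters are hypothetical, and skipping insertion for empty transcriptions is an assumption rather than documented behavior.

```python
from typing import Callable, Optional, Sequence

Audio = Sequence[float]


def run_pipeline(
    audio: Audio,
    transcribe: Callable[[Audio], str],
    insert: Callable[[str], None],
    enhance: Optional[Callable[[str], str]] = None,
) -> str:
    """Transcribe audio, optionally enhance the text, then insert it."""
    text = transcribe(audio).strip()
    if not text:
        return ""  # assumption: nothing recognized means nothing to insert
    if enhance is not None:  # the "Enhancement enabled?" branch
        text = enhance(text)
    insert(text)
    return text
```

Because the stages are plain callables, each one can be replaced by a stub in tests, for example a lambda that returns fixed text instead of running Whisper.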
Module Structure
| Module | Lines | Description |
|---|---|---|
| `cli.py` | 284 | Command-line interface (Typer) |
| `controller.py` | 255 | Main orchestration logic |
| `enhancer.py` | 206 | LLM text enhancement (OpenAI API) |
| `config.py` | 180 | Configuration schema and loading |
| `keyboard.py` | 173 | Global keyboard shortcuts (pynput) |
| `transcriber.py` | 116 | Speech-to-text (Whisper) |
| `recorder.py` | 115 | Audio recording (sounddevice) |
| `permissions.py` | 58 | macOS permission checks |
| `inserter.py` | 53 | Text insertion via clipboard |
| `__init__.py` | 4 | Package version |
| Total | 1444 | |
Module Responsibilities
| Module | Responsibility |
|---|---|
| `cli.py` | Argument parsing, config loading, permission checks, service lifecycle, Rich console feedback |
| `config.py` | Type-safe Pydantic schema, JSON loading/saving, defaults, config file discovery |
| `controller.py` | State management (IDLE→RECORDING→PROCESSING), shortcut-to-recording connection, transcription pipeline |
| `recorder.py` | Callback-based non-blocking recording, 16kHz float32 audio, start/stop/cancel, duration tracking |
| `transcriber.py` | Lazy model loading, device auto-detection (CUDA/MPS/CPU), model size support, memory management |
| `keyboard.py` | Shortcut parsing ("ctrl+cmd+1"), hold/toggle modes, modifier normalization, callback invocation |
| `inserter.py` | Copy to clipboard, paste with Cmd+V, universal application support |
| `enhancer.py` | OpenAI-compatible API, API key resolution, custom prompts, grammar/punctuation improvement |
| `permissions.py` | Microphone access check, accessibility access check, pass/fail status |
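
Device auto-detection, listed for the transcriber above, can be expressed as a small pure function. The sketch assumes that `auto` prefers CUDA, then MPS, then CPU, following the order in the table; `resolve_device` is a hypothetical name. In a real application the two availability flags would typically come from PyTorch (`torch.cuda.is_available()` and `torch.backends.mps.is_available()`), but they are plain parameters here so the example runs without it.

```python
VALID_DEVICES = ("auto", "cpu", "mps", "cuda")


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a requested device name to a concrete one."""
    requested = requested.lower()
    if requested not in VALID_DEVICES:
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        return requested  # explicit choices are passed through unchanged
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```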
Data Flow
flowchart TD
subgraph Input
KB[Keyboard Shortcuts]
end
subgraph Controller
CTRL[Controller]
end
subgraph Recording
REC[Recorder]
AUDIO[(Audio numpy)]
end
subgraph Processing
TRANS[Transcriber]
ENH[Enhancer]
TEXT[(Text)]
end
subgraph Output
INS[Inserter]
end
KB --> CTRL
CTRL --> REC
REC --> AUDIO
AUDIO --> TRANS
TRANS --> TEXT
TEXT --> ENH
ENH --> INS
TEXT --> INS
State Machine
stateDiagram-v2
[*] --> IDLE
IDLE --> RECORDING: shortcut pressed
RECORDING --> IDLE: cancelled / too short
RECORDING --> PROCESSING: shortcut released
PROCESSING --> IDLE: complete / error
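
The diagram above maps directly onto a transition table. A minimal sketch follows; the event names are illustrative labels taken from the diagram, and `next_state` is a hypothetical helper rather than the controller's real interface.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.IDLE, "shortcut_pressed"): State.RECORDING,
    (State.RECORDING, "cancelled"): State.IDLE,
    (State.RECORDING, "too_short"): State.IDLE,
    (State.RECORDING, "shortcut_released"): State.PROCESSING,
    (State.PROCESSING, "complete"): State.IDLE,
    (State.PROCESSING, "error"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring inapplicable events means, for example, that a second shortcut press during PROCESSING does not start a new recording.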
Design Principles
| Principle | Description |
|---|---|
| Simple Modules | Each module has a single responsibility; none exceeds ~300 lines |
| Sequential Processing | Record→transcribe→enhance→insert runs sequentially (user waits anyway) |
| Loose Coupling | Controller imports others; other modules don't import each other (except config) |
| Direct Initialization | Components created when needed, no lazy patterns or factories |
| Callbacks for UI | Controller uses on_state/on_text/on_error callbacks to separate UI from logic |
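
The callback principle in the last row can be illustrated with a stripped-down stand-in for the controller. Only the callback names `on_state`, `on_text`, and `on_error` come from the table; `MiniController`, its constructor signature, and the state strings are assumptions made for this sketch.

```python
from typing import Callable, Optional


class MiniController:
    """Illustrative stand-in for the controller; only the callback wiring is shown."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],
        on_state: Optional[Callable[[str], None]] = None,
        on_text: Optional[Callable[[str], None]] = None,
        on_error: Optional[Callable[[Exception], None]] = None,
    ) -> None:
        self._transcribe = transcribe
        # Missing callbacks default to no-ops so the logic never checks for None.
        self._on_state = on_state or (lambda state: None)
        self._on_text = on_text or (lambda text: None)
        self._on_error = on_error or (lambda exc: None)

    def process(self, audio: bytes) -> None:
        """Run one transcription and report progress through the callbacks."""
        self._on_state("processing")
        try:
            self._on_text(self._transcribe(audio))
        except Exception as exc:  # report instead of crashing the service
            self._on_error(exc)
        finally:
            self._on_state("idle")
```

A command-line front end could pass functions that print to the console, while tests pass `list.append` to capture the calls.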
Threading Model
| Component | Threading |
|---|---|
| sounddevice | Handles audio callback internally |
| pynput | Runs keyboard listener in separate thread |
| Processing | Sequential - no background threads for transcription |
Keeping processing sequential simplifies debugging and reduces the risk of race conditions.
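
One way to keep the audio callback cheap is to have it do nothing but copy incoming blocks into a buffer and leave everything else to the main flow. The sketch below assumes that approach; `ChunkBuffer` is a hypothetical class. Its callback mirrors the `(indata, frames, time, status)` shape of sounddevice stream callbacks but works on plain Python sequences, so it runs without sounddevice or NumPy.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Hz, matching the 16kHz audio described above


class ChunkBuffer:
    """Collects audio blocks handed over by an audio callback."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self.sample_rate = sample_rate
        self._chunks: List[List[float]] = []
        self._recording = False

    def start(self) -> None:
        """Discard any old audio and begin accepting blocks."""
        self._chunks = []
        self._recording = True

    def callback(self, indata: Sequence[float], frames: int, time=None, status=None) -> None:
        """Called once per audio block; only copies data so it returns quickly."""
        if self._recording:
            self._chunks.append(list(indata[:frames]))  # copy, the caller may reuse its buffer

    def stop(self) -> List[float]:
        """Stop accepting blocks and return all samples as one flat list."""
        self._recording = False
        return [sample for chunk in self._chunks for sample in chunk]

    @property
    def duration(self) -> float:
        """Seconds of audio collected so far."""
        return sum(len(chunk) for chunk in self._chunks) / self.sample_rate
```

The duration property is what a "too short" check could consult before handing audio to the transcriber.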
Configuration Schema
| Section | Field | Type | Default |
|---|---|---|---|
| `whisper` | `model_size` | ModelSize | base |
| `whisper` | `language` | str | en |
| `whisper` | `device` | DeviceType | auto |
| `shortcuts` | `hold_to_dictate` | str | ctrl+cmd+1 |
| `shortcuts` | `toggle_dictation` | str | ctrl+cmd+2 |
| `enhancement` | `enabled` | bool | false |
| `enhancement` | `api_key` | str | "" |
| `enhancement` | `model` | str | gpt-4o-mini |
| `audio` | `sample_rate` | int | 16000 |
| `audio` | `channels` | int | 1 |
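
To show how such a schema can reject bad values early, here is a sketch of the `whisper` and `audio` sections using standard-library dataclasses as a stand-in for the Pydantic models. The class names are hypothetical, and the allowed values for `ModelSize` and `DeviceType` are assumed to match the command-line options.

```python
from dataclasses import dataclass, field

# Assumed members of ModelSize and DeviceType, taken from the CLI options.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large", "turbo")
DEVICE_TYPES = ("auto", "cpu", "mps", "cuda")


@dataclass
class WhisperSection:
    model_size: str = "base"
    language: str = "en"
    device: str = "auto"

    def __post_init__(self) -> None:
        # Validate on construction so a typo fails at startup, not mid-dictation.
        if self.model_size not in MODEL_SIZES:
            raise ValueError(f"invalid model_size: {self.model_size!r}")
        if self.device not in DEVICE_TYPES:
            raise ValueError(f"invalid device: {self.device!r}")


@dataclass
class AudioSection:
    sample_rate: int = 16000
    channels: int = 1


@dataclass
class SchemaSketch:
    whisper: WhisperSection = field(default_factory=WhisperSection)
    audio: AudioSection = field(default_factory=AudioSection)
```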
Test Structure
| Test File | Coverage |
|---|---|
| `test_config.py` | Config loading/saving |
| `test_recorder.py` | Audio recording |
| `test_transcriber.py` | Whisper wrapper |
| `test_keyboard.py` | Shortcut parsing/handling |
| `test_controller.py` | Orchestration logic |
| `test_enhancer.py` | LLM enhancement |
| `test_cli.py` | CLI commands |
| `test_e2e_manual.py` | Interactive tests (require user) |
Dependencies
| Package | Purpose |
|---|---|
| sounddevice | Audio recording (no portaudio headers needed) |
| openai-whisper | Local speech-to-text |
| pynput | Global keyboard shortcuts |
| pyperclip | Clipboard operations |
| typer + rich | CLI framework |
| pydantic | Configuration validation |
| openai | LLM API client |