WhOSSpr Flow - Open Source Speech-to-Text for macOS

These details have not been verified by PyPI

Project description

WhOSSpr Flow

Open Source Speech-to-Text for macOS - A clone of Whispr Flow.

Features

Feature	Description
Local Whisper transcription	Uses OpenAI Whisper models locally for privacy
Configurable model sizes	Choose from tiny, base, small, medium, or large models
Keyboard shortcuts	Hold-to-dictate or toggle dictation modes
Universal text injection	Works with any application (browsers, terminals, editors)
Optional LLM enhancement	Improve transcribed text with OpenAI-compatible APIs
JSON configuration	Easy setup via config files or command-line parameters

Quick Start

# 1. Install WhOSSpr (no system dependencies needed)
uv sync

# 2. Create default configuration
uv run whosspr config --init

# 3. Check permissions (grant when prompted)
uv run whosspr check

# 4. Start dictation service
uv run whosspr start

Default Shortcuts

Shortcut	Action
`Ctrl+Cmd+1` (hold)	Hold to record, release to transcribe
`Ctrl+Cmd+2`	Toggle dictation on/off

Configuration

Copy config.example.json to whosspr.json and customize:

{
  "whisper": {
    "model_size": "base",
    "language": "en",
    "device": "auto"
  },
  "shortcuts": {
    "hold_to_dictate": "ctrl+cmd+1",
    "toggle_dictation": "ctrl+cmd+2"
  },
  "enhancement": {
    "enabled": false,
    "api_base_url": "https://api.openai.com/v1",
    "api_key": "",
    "model": "gpt-4o-mini"
  }
}

Whisper Model Sizes

Model	Size	Speed	Accuracy	VRAM
tiny	39M	Fastest	Basic	~1GB
base	74M	Fast	Good	~1GB
small	244M	Medium	Better	~2GB
medium	769M	Slow	Great	~5GB
large	1.5B	Slowest	Best	~10GB
turbo	809M	Fast	High	~6GB

Recommendation: Start with base for a balance of speed and accuracy.

Requirements

Requirement	Details
OS	macOS 10.14+ (optimized for Apple Silicon)
Python	3.12+
Permissions	Microphone access, Accessibility access
RAM	2GB+ (more for larger models)

Installation

Using uv (recommended)

uv sync

Using pip

pip install -e .

Permissions Setup

WhOSSpr requires two macOS permissions:

Permission	Purpose	How to Grant
Microphone	Record audio	System Preferences → Privacy → Microphone → Enable Terminal
Accessibility	Insert text	System Preferences → Privacy → Accessibility → Add Terminal

Verify with:

uv run whosspr check

Usage

Starting the Service

uv run whosspr start

Command-line Options

Option	Description
`--model`	Whisper model size (tiny/base/small/medium/large/turbo)
`--language`	Language code (e.g., en, es, fr)
`--device`	Device for inference (auto/cpu/mps/cuda)
`--enhancement`	Enable LLM text enhancement
`--api-key`	API key for enhancement

Examples

# Use small model with Spanish
uv run whosspr start --model small --language es

# Use MPS (Apple Silicon GPU)
uv run whosspr start --device mps

# Enable enhancement
uv run whosspr start --enhancement --api-key sk-xxx

Text Enhancement

WhOSSpr can improve transcribed text using an OpenAI-compatible API:

# Using OpenAI
export OPENAI_API_KEY=sk-your-api-key
uv run whosspr start --enhancement

# Using local LLM (Ollama)
uv run whosspr start --enhancement \
  --api-key ollama \
  --api-base-url http://localhost:11434/v1

Troubleshooting

Problem	Solution
Permission denied	Run `whosspr check`, grant permissions, restart terminal
No audio input	Check microphone connection and permissions
Text not appearing	Verify Accessibility permission, try different app
Model download fails	Check internet, try `--model tiny`
High CPU/memory	Use smaller model, try `--device mps` on Apple Silicon

Development

Running Tests

# All automated tests
uv run pytest

# With coverage
uv run pytest --cov=whosspr

# Manual E2E tests (interactive)
WHOSSPR_MANUAL_TESTS=1 uv run pytest tests/test_e2e_manual.py -v -s

License

Apache 2.0

Architecture

This document describes the architecture of WhOSSpr Flow, an open-source speech-to-text application for macOS.

Overview

WhOSSpr Flow captures audio from the microphone, transcribes it using OpenAI Whisper, optionally enhances the text with an LLM, and inserts the result into the active application.

flowchart LR
    A[User presses shortcut] --> B[Record audio]
    B --> C[Transcribe with Whisper]
    C --> D{Enhancement enabled?}
    D -->|Yes| E[Enhance with LLM]
    D -->|No| F[Insert text]
    E --> F

Module Structure

Module	Lines	Description
`cli.py`	284	Command-line interface (Typer)
`controller.py`	255	Main orchestration logic
`enhancer.py`	206	LLM text enhancement (OpenAI API)
`config.py`	180	Configuration schema and loading
`keyboard.py`	173	Global keyboard shortcuts (pynput)
`transcriber.py`	116	Speech-to-text (Whisper)
`recorder.py`	115	Audio recording (sounddevice)
`permissions.py`	58	macOS permission checks
`inserter.py`	53	Text insertion via clipboard
`__init__.py`	4	Package version
Total	1444

Module Responsibilities

Module	Responsibility
`cli.py`	Argument parsing, config loading, permission checks, service lifecycle, Rich console feedback
`config.py`	Type-safe Pydantic schema, JSON loading/saving, defaults, config file discovery
`controller.py`	State management (IDLE→RECORDING→PROCESSING), shortcut-to-recording connection, transcription pipeline
`recorder.py`	Callback-based non-blocking recording, 16kHz float32 audio, start/stop/cancel, duration tracking
`transcriber.py`	Lazy model loading, device auto-detection (CUDA/MPS/CPU), model size support, memory management
`keyboard.py`	Shortcut parsing ("ctrl+cmd+1"), hold/toggle modes, modifier normalization, callback invocation
`inserter.py`	Copy to clipboard, paste with Cmd+V, universal application support
`enhancer.py`	OpenAI-compatible API, API key resolution, custom prompts, grammar/punctuation improvement
`permissions.py`	Microphone access check, accessibility access check, pass/fail status

Data Flow

flowchart TD
    subgraph Input
        KB[Keyboard Shortcuts]
    end
    
    subgraph Controller
        CTRL[Controller]
    end
    
    subgraph Recording
        REC[Recorder]
        AUDIO[(Audio numpy)]
    end
    
    subgraph Processing
        TRANS[Transcriber]
        ENH[Enhancer]
        TEXT[(Text)]
    end
    
    subgraph Output
        INS[Inserter]
    end
    
    KB --> CTRL
    CTRL --> REC
    REC --> AUDIO
    AUDIO --> TRANS
    TRANS --> TEXT
    TEXT --> ENH
    ENH --> INS
    TEXT --> INS

State Machine

stateDiagram-v2
    [*] --> IDLE
    IDLE --> RECORDING: shortcut pressed
    RECORDING --> IDLE: cancelled / too short
    RECORDING --> PROCESSING: shortcut released
    PROCESSING --> IDLE: complete / error

Design Principles

Principle	Description
Simple Modules	Each module has single responsibility, none exceeds ~300 lines
Sequential Processing	Record→transcribe→enhance→insert runs sequentially (user waits anyway)
Loose Coupling	Controller imports others; other modules don't import each other (except config)
Direct Initialization	Components created when needed, no lazy patterns or factories
Callbacks for UI	Controller uses on_state/on_text/on_error callbacks to separate UI from logic

Threading Model

Component	Threading
sounddevice	Handles audio callback internally
pynput	Runs keyboard listener in separate thread
Processing	Sequential - no background threads for transcription

This simplifies debugging and reduces race conditions.

Configuration Schema

Section	Field	Type	Default
`whisper`	`model_size`	ModelSize	`base`
`whisper`	`language`	str	`en`
`whisper`	`device`	DeviceType	`auto`
`shortcuts`	`hold_to_dictate`	str	`ctrl+cmd+1`
`shortcuts`	`toggle_dictation`	str	`ctrl+cmd+2`
`enhancement`	`enabled`	bool	`false`
`enhancement`	`api_key`	str	`""`
`enhancement`	`model`	str	`gpt-4o-mini`
`audio`	`sample_rate`	int	`16000`
`audio`	`channels`	int	`1`

Test Structure

Test File	Coverage
`test_config.py`	Config loading/saving
`test_recorder.py`	Audio recording
`test_transcriber.py`	Whisper wrapper
`test_keyboard.py`	Shortcut parsing/handling
`test_controller.py`	Orchestration logic
`test_enhancer.py`	LLM enhancement
`test_cli.py`	CLI commands
`test_e2e_manual.py`	Interactive tests (require user)

Dependencies

Package	Purpose
sounddevice	Audio recording (no portaudio headers needed)
openai-whisper	Local speech-to-text
pynput	Global keyboard shortcuts
pyperclip	Clipboard operations
typer + rich	CLI framework
pydantic	Configuration validation
openai	LLM API client

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jan 31, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whossper-0.1.0.tar.gz (107.0 kB view details)

Uploaded Jan 31, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

whossper-0.1.0-py3-none-any.whl (24.4 kB view details)

Uploaded Jan 31, 2026 Python 3

File details

Details for the file whossper-0.1.0.tar.gz.

File metadata

Download URL: whossper-0.1.0.tar.gz
Upload date: Jan 31, 2026
Size: 107.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.13

File hashes

Hashes for whossper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`07eb375ea01583b4a120e05557b06e0a539b3f81ef52de0c057a626e26e15f23`
MD5	`645de45150e3a7bacd312a83168bb802`
BLAKE2b-256	`a5accce62e599d19c98eebcc1ba87f6de08b6f482bb466d400c0cd8a35a87f35`

See more details on using hashes here.

File details

Details for the file whossper-0.1.0-py3-none-any.whl.

File metadata

Download URL: whossper-0.1.0-py3-none-any.whl
Upload date: Jan 31, 2026
Size: 24.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.13

File hashes

Hashes for whossper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`323a57489325b2c68ab2fd3ea296e931402bac1a12b2c829d8c513e5ed73ca69`
MD5	`7ec4039c67aa491b55667db732e2314f`
BLAKE2b-256	`52eb011dd5144ac2c383afc2af7364b406e54bfe08fe82e1c4c0e2354e90d999`

See more details on using hashes here.

whossper 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

WhOSSpr Flow

Features

Quick Start

Default Shortcuts

Configuration

Whisper Model Sizes

Requirements

Installation

Using uv (recommended)

Using pip

Permissions Setup

Usage

Starting the Service

Command-line Options

Examples

Text Enhancement

Troubleshooting

Development

Running Tests

License

Architecture

Overview

Module Structure

Module Responsibilities

Data Flow

State Machine

Design Principles

Threading Model

Configuration Schema

Test Structure

Dependencies

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes