
Local voice-to-text with Whisper + LLM cleanup

voice2text

Local voice-to-text with Whisper + LLM cleanup. Push-to-talk (Right ⌘), pastes at cursor.

Voice-to-text tools like Wispr Flow, MacWhisper, and VoiceInk are becoming increasingly popular. It is a sign of the times that in 2025, roughly 270 lines of Python with local Whisper and a small Ollama language model (Qwen2.5 3B) can deliver a comparable experience on consumer hardware. Such tooling would have been hard to imagine three years ago. This project is a proof of concept that demonstrates just that.

Note: Before anyone suggests splitting this into modules and submodules: keeping everything in a single readable file is an intentional design choice.

Note 2: This is macOS-only by design. We use:

  • mlx-whisper — optimized for Apple Silicon
  • osascript — for simulating Cmd+V paste via System Events
  • pbcopy/pbpaste — macOS clipboard
  • nowplaying-cli — macOS media control
  • System Preferences URLs for permissions

You're welcome to fork this and make it work on Linux or Windows!
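As a rough illustration of how the clipboard and paste tools listed above fit together, here is a minimal sketch. The function names and the injectable `run` parameter are hypothetical and not taken from the project; only `pbcopy`, `osascript`, and the System Events keystroke are named in the list above.

```python
import subprocess

# AppleScript that asks System Events to press Cmd+V in the frontmost app.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def clipboard_command() -> list[str]:
    """Command that reads stdin and stores it on the macOS clipboard."""
    return ["pbcopy"]


def paste_command() -> list[str]:
    """Command that simulates Cmd+V through osascript."""
    return ["osascript", "-e", PASTE_SCRIPT]


def paste_at_cursor(text: str, run=subprocess.run) -> None:
    """Copy text to the clipboard, then paste it into the focused app.

    `run` is injectable (an assumption of this sketch) so the logic can be
    exercised on machines without the macOS tools installed.
    """
    run(clipboard_command(), input=text.encode("utf-8"), check=True)
    run(paste_command(), check=True)
```

On macOS, the paste step only works once the terminal has been granted Accessibility permission, which is why the permissions settings appear in the list.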

Prerequisites

Skip this if using pixi — it handles ollama automatically.

brew install ollama
ollama pull qwen2.5:3b

Install

uvx (easiest)

uvx --from voice2text v2t

Or from GitHub:

uvx --from git+https://github.com/lucharo/voice2text v2t

pip

pip install voice2text
v2t

Development install

git clone https://github.com/lucharo/voice2text.git
cd voice2text
uv sync
uv run v2t

Pixi

Pixi handles the ollama dependency automatically:

git clone https://github.com/lucharo/voice2text.git
cd voice2text
pixi run ollama pull qwen2.5:3b
pixi run v2t

Note: We don't publish to conda-forge/pixi channels yet, but may in the future.

Usage

v2t                      # strict mode (restructures sentences)
v2t --casual             # light cleanup (punctuation, drops "um"/"uh")
v2t --pause-music        # pause media while recording (macOS only, requires nowplaying-cli via brew)

Hold Right Command to record, release to transcribe and paste.
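The hold-to-record behaviour can be thought of as a tiny state machine. The sketch below is an assumption about one way to structure it, not the project's actual code: the class name, the `"cmd_r"` key label, and the callbacks are all hypothetical, and a real version would be driven by a keyboard listener.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PushToTalk:
    """Tracks one hotkey: start recording on press, finish on release."""

    hotkey: str  # hypothetical key label, e.g. "cmd_r" for Right Command
    on_start: Callable[[], None]
    on_stop: Callable[[], None]
    recording: bool = False

    def press(self, key: str) -> None:
        # Holding a key can emit repeated press events; start only once.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.on_start()

    def release(self, key: str) -> None:
        # Ignore releases of other keys and releases with no active recording.
        if key == self.hotkey and self.recording:
            self.recording = False
            self.on_stop()
```

In this sketch `on_stop` is where transcription, cleanup, and pasting would be triggered.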

Strict vs Casual Mode

| Raw transcription | Strict | Casual |
| --- | --- | --- |
| "Hey um I'll see you tomorrow at 9 actually no make it 10" | "Hey, I'll see you tomorrow at 10." | "Hey, I'll see you tomorrow at 9, actually no, make it 10." |
| "So basically I was thinking we could um you know maybe try the other approach" | "I was thinking we could try the other approach." | "So basically, I was thinking we could maybe try the other approach." |

Strict (default): Removes filler words, applies self-corrections, and restructures and condenses for clarity.

Casual: Adds punctuation and removes "um"/"uh" but keeps your phrasing.
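One plausible way to implement the two modes is to send the raw transcript to the local Ollama server with a different instruction per mode. The sketch below assumes Ollama's default local endpoint and its `/api/generate` request shape; the prompt wording and the function names are hypothetical, and the project's real prompts may differ.

```python
import json
import urllib.request

# Hypothetical prompts; the project's actual wording is not shown in this README.
PROMPTS = {
    "strict": (
        "Rewrite this transcript clearly and concisely. Remove filler words "
        "and apply self-corrections. Output only the rewritten text."
    ),
    "casual": (
        "Add punctuation to this transcript and remove 'um' and 'uh'. "
        "Keep the speaker's wording. Output only the cleaned text."
    ),
}

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(transcript: str, casual: bool = False, model: str = "qwen2.5:3b") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    mode = "casual" if casual else "strict"
    return {
        "model": model,
        "prompt": f"{PROMPTS[mode]}\n\nTranscript: {transcript}",
        "stream": False,
    }


def cleanup(transcript: str, casual: bool = False) -> str:
    """Send the transcript to a running Ollama server and return the cleaned text."""
    data = json.dumps(build_payload(transcript, casual)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `cleanup(...)` requires `ollama serve` to be running with the `qwen2.5:3b` model pulled; `build_payload(...)` can be inspected on its own.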

--pause-music (macOS only)

Pauses any playing media while recording and resumes after. Requires:

brew install nowplaying-cli

Not yet available via pixi/conda-forge, though it may be published there later.
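The pause-and-resume behaviour maps naturally onto a context manager. This is a minimal sketch that assumes `nowplaying-cli` accepts `pause` and `play` subcommands; the `paused_media` name and the injectable `run` parameter are hypothetical.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def paused_media(run=subprocess.run):
    """Pause media on entry and resume on exit, even if recording fails."""
    run(["nowplaying-cli", "pause"], check=False)
    try:
        yield
    finally:
        # Resumes unconditionally; this sketch does not check whether
        # anything was actually playing before the pause.
        run(["nowplaying-cli", "play"], check=False)
```

Recording would then happen inside a `with paused_media():` block. A fuller version would probably query the playback state first so that media which was already paused stays paused.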

