Skip to main content

Voice typing for macOS and Linux - hold a hotkey, speak, release. Local & private.

Project description

HoldSpeak

HoldSpeak logo, a held key with rising soundwaves

Hold a key. Speak. It types in any app. 100% local. And it learns how you work.

License: Apache-2.0 Tests Python 3.10+ Platform: macOS | Linux

HoldSpeak is local-first voice input for macOS and Linux. Hold your hotkey, speak, release, and the text appears in whatever app you're in. No cloud, no account, no telemetry. Nothing leaves your machine except the model endpoint you choose to point at. Use it on its own as a voice-typing tool, or grow into meeting intelligence, project-aware dictation, and the AIPI-Lite companion device.

Status: 0.x, early but real. HoldSpeak is on PyPI (pip install holdspeak). The features are mature; APIs, config, and defaults can still change while it is pre-1.0. Upgrades are safe by default (your data is backed up first). Feedback and contributions welcome.

Why it's different

  • 100% local by default. Whisper transcription and your own LLM. Nothing is sent anywhere unless you deliberately point it at a cloud endpoint. See Security & privacy.
  • It gets better at your voice, and shows you the proof. Every dictation is saved: what you said, what it typed, where it routed, how long it took. Fix a wrong one with a single tap and it learns; a "What HoldSpeak learned" digest shows the honest "learned from N similar" count; replay an utterance through the updated pipeline and watch the routing change. Local, off by default for routing, no hidden retraining. See the learning loop.
  • Your voice gets the afterlife your meetings already have. A dictation doesn't vanish the second it's typed. It's saved, searchable, and reviewable, the same way a recorded meeting is.
  • 14 real LLM-backed meeting plugins. Architecture diagrams, ADRs, risk registers, incident timelines, decisions, and stakeholder updates, all pulled out of the transcript. See meeting intelligence.
  • Bring your own model. GGUF in-process, MLX on Apple Silicon, or any OpenAI-compatible endpoint. See Models.
  • Ambient desktop presence, if you want it. A native, focus-safe HUD shows whether it's listening, transcribing, or typing while you dictate into another app. Off by default. See Desktop Presence.
  • AIPI-Lite companion, if you have one. A small device for meeting-capture controls, and for speaking a reply to your coding agent from another room. See the workflow.

What it does, at a glance

Voice typing Meeting intelligence Project-aware typing
Pixel art microphone with hold-to-talk waves Pixel art meeting notebook with action items Pixel art code editor connected to local context
Hold the hotkey, speak, release. The text goes into the active app. Punctuation commands ("period", "comma") and "clipboard" substitution work out of the box. Capture mic and system audio together, get a live transcript with speaker labels, and let the AI pull out topics, actions, and artifacts you can review at /history. Rough speech runs through intent classification, project-KB enrichment, and an LLM rewrite before it lands, tuned for Codex, Claude, the terminal, the browser, or your editor.

See it learn

Animated pixel art operator working at a terminal while companion and task cards update

Speech turns into transcript context, reviewable actions, summaries, and replies for your coding agent, while the local runtime stays in charge. Because every dictation is recorded, you can look back at what it heard, fix a mistake in one tap (which teaches it), and replay the utterance through the updated pipeline. Instead of trusting that it improved, you watch it happen. See the full walkthrough.

The HoldSpeak dictation Journal: a said-to-typed timeline of recent dictations, each card showing the spoken transcript, the typed result, its routing target, and a per-utterance latency strip; one row marked corrected.

The dictation journal. Every utterance, with what you said, what it typed, where it routed, and how long it took.

And it shows you what it learned. The Memory tab opens with a "What HoldSpeak learned" digest: how many corrections you made, how many dictations you corrected, and for each correction a real "learned from N similar" count, computed by the same matcher that nudges routing. No inflated numbers, quiet when nothing matched.

The 'What HoldSpeak learned' digest: a this-week / all-time toggle, headline counts for corrections made, dictations corrected, and utterances nudged, a breakdown by block and target, and per-correction 'learned from N similar' rows.

What HoldSpeak learned. Honest, windowed counts from the same matcher that nudges your routing.

Quickstart

Install from PyPI, check your setup, and launch the web runtime:

pip install holdspeak
holdspeak doctor   # check mic permissions and backends
holdspeak          # launch the web runtime

Prefer uv? uv pip install holdspeak.

Or use the install script (creates an isolated venv and a holdspeak launcher), or work from a clone:

# one-line install
curl -fsSL https://raw.githubusercontent.com/karolswdev/HoldSpeak/main/scripts/install.sh | bash

# or from a clone (for development)
git clone https://github.com/karolswdev/HoldSpeak.git && cd HoldSpeak
uv pip install -e .
holdspeak doctor && holdspeak

Install only the extras you need:

pip install 'holdspeak[meeting]'          # meeting mode and AI intelligence
pip install 'holdspeak[dictation-mlx]'    # intelligent dictation on Apple Silicon (MLX)
pip install 'holdspeak[dictation-llama]'  # intelligent dictation, cross-platform (GGUF)
pip install 'holdspeak[dictation-openai]' # intelligent dictation via an OpenAI-compatible endpoint

(From a clone, use the editable form instead, e.g. uv pip install -e '.[meeting]'.)

The dictation and meeting LLM is yours to choose. See docs/MODELS.md for the contract and current suggestions.

Upgrading and your data

Your whole HoldSpeak database is a single SQLite file. Before a version jump you can snapshot it with holdspeak backup, and put one back with holdspeak restore. Upgrades are safe by default: HoldSpeak backs up an older database before it touches it, and refuses to open a database written by a newer build rather than risk your data. holdspeak doctor reports the schema and config state it found. The full policy is in docs/RELEASING.md.

Platform support

Capability macOS 14+ (Apple Silicon) Linux X11 Linux Wayland
Voice typing
Global hotkey ⚠️ Best effort
Cross-app typing ⚠️ Best effort
Meeting mode
System audio capture ✅ BlackHole ✅ Pulse/PipeWire ✅ Pulse/PipeWire

Wayland often blocks global hooks and synthetic typing, so HoldSpeak falls back to clipboard paste for injection.

Meeting intelligence

Record or save a meeting and HoldSpeak turns the transcript into structured, reviewable artifacts. It scores the transcript for intent (architecture, delivery, product, incident, comms), runs a chain of plugins, and has each one call your LLM to produce a typed artifact. The results render read-only at /history. HoldSpeak ships 14 built-in plugins, all real and backed by an LLM.

Plugins can also propose actions. An actuator proposes an external side effect, like filing a ticket or posting an update, that only runs after you approve it for that specific action. Actuators are off by default. Write your own with the Plugin Authoring guide; for endpoints and routing, see the Meeting Mode Guide.

Then close the loop. After a meeting, the "Your next move" aftercare panel at /history shows what is still open (by owner), what was decided, and what changed since the last meeting. Jump to the transcript moment that justifies any result, file an accepted action as a human-approved issue through that same actuator flow, or draft a copyable follow-up. It is read-only and local: nothing is sent, and nothing runs, without your approval. See the Meeting Mode Guide.

AIPI-Lite companion

Pixel art AIPI-Lite companion device

AIPI-Lite is an optional ESPHome-based device you can carry between rooms. Put it on Wi-Fi (a phone hotspot works), and it gives you meeting-capture controls and status feedback. With Claude/Codex hooks on, it tells you when an agent is waiting so you can speak the reply back into the coding session. Buy the hardware from the official page or the Amazon listing; firmware and bridge setup are in the AIPI-Lite Developer Workflow.

Where to go next

I want to… Read this
Browse all the docs Documentation index
Get it running and verify my setup Getting Started
Choose / configure a model Models (bring your own)
See speech become a project-grounded task The Dictation Copilot
Set up project-aware dictation for Codex / Claude Intelligent Typing Setup
Review, correct, and replay past dictations Dictation journal & replay
Use meeting mode and configure AI intelligence Meeting Mode Guide
Wire up the AIPI-Lite companion AIPI-Lite Developer Workflow
Install Claude / Codex agent hooks Agent Hook Install
Understand what's stored and what can leave my machine Security & Privacy

Configuration

Config lives at ~/.config/holdspeak/config.json, but you rarely edit it by hand. The Settings page in the web runtime exposes the hotkey, model, meeting intel, dictation pipeline, and presence options. The full reference is in Getting Started and the guides above.

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup (uv, the git hooks, the test command) and the commit-contract workflow. Recent changes are in CHANGELOG.md.

License

Licensed under the Apache License 2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

holdspeak-0.2.2.tar.gz (706.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

holdspeak-0.2.2-py3-none-any.whl (2.8 MB view details)

Uploaded Python 3

File details

Details for the file holdspeak-0.2.2.tar.gz.

File metadata

  • Download URL: holdspeak-0.2.2.tar.gz
  • Upload date:
  • Size: 706.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.2.tar.gz
Algorithm Hash digest
SHA256 febfd89a2547e780828314ab85701656fd3eccb9bb0db85506fb1d13319b4ee5
MD5 0098a2774e3d518bcf1a1c9720691c4c
BLAKE2b-256 3f1ac4eefa1a1676e2b2ba80eca3225d8ae5bb1e5bba6d926d751732c1a66e95

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.2.tar.gz:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file holdspeak-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: holdspeak-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 08a872bf64f3da3fb3fe66d25b17997f6ac47a23f86389fc56815e0813be0471
MD5 e4290899daa0e65e81b4e26bc3f5fd57
BLAKE2b-256 3fb00cd6191b7ffa9d24e550de17165eae5a205b3fe46f7419c23ae4758cd60d

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.2-py3-none-any.whl:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page