Voice typing for macOS and Linux - hold a hotkey, speak, release. Local & private.
Project description
HoldSpeak
Hold a key. Speak. It types in any app. 100% local. And it learns how you work.
HoldSpeak is local-first voice input for macOS and Linux. Hold your hotkey, speak, release, and the text appears in whatever app you're in. No cloud, no account, no telemetry. Nothing leaves your machine except the model endpoint you choose to point at. Use it on its own as a voice-typing tool, or grow into meeting intelligence, project-aware dictation, and the AIPI-Lite companion device.
Status: early / pre-release. The features are mature, but it isn't on PyPI yet, so you install from source (below). APIs, config, and defaults can still change. Feedback and contributions welcome.
Why it's different
- 100% local by default. Whisper transcription and your own LLM. Nothing is sent anywhere unless you deliberately point it at a cloud endpoint. See Security & privacy.
- It gets better at your voice, and shows you the proof. Every dictation is saved: what you said, what it typed, where it routed, how long it took. Fix a wrong one with a single tap and it learns; a "What HoldSpeak learned" digest shows the honest "learned from N similar" count; replay an utterance through the updated pipeline and watch the routing change. Local, off by default for routing, no hidden retraining. See the learning loop.
- Your voice gets the afterlife your meetings already have. A dictation doesn't vanish the second it's typed. It's saved, searchable, and reviewable, the same way a recorded meeting is.
- 14 real LLM-backed meeting plugins. Architecture diagrams, ADRs, risk registers, incident timelines, decisions, and stakeholder updates, all pulled out of the transcript. See meeting intelligence.
- Bring your own model. GGUF in-process, MLX on Apple Silicon, or any OpenAI-compatible endpoint. See Models.
- Ambient desktop presence, if you want it. A native, focus-safe HUD shows whether it's listening, transcribing, or typing while you dictate into another app. Off by default. See Desktop Presence.
- AIPI-Lite companion, if you have one. A small device for meeting-capture controls, and for speaking a reply to your coding agent from another room. See the workflow.
What it does, at a glance
| Voice typing | Meeting intelligence | Project-aware typing |
|---|---|---|
Hold the hotkey, speak, release. The text goes into the active app. Punctuation commands ("period", "comma") and "clipboard" substitution work out of the box. |
Capture mic and system audio together, get a live transcript with speaker labels, and let the AI pull out topics, actions, and artifacts you can review at /history. |
Rough speech runs through intent classification, project-KB enrichment, and an LLM rewrite before it lands, tuned for Codex, Claude, the terminal, the browser, or your editor. |
See it learn
Speech turns into transcript context, reviewable actions, summaries, and replies for your coding agent, while the local runtime stays in charge. Because every dictation is recorded, you can look back at what it heard, fix a mistake in one tap (which teaches it), and replay the utterance through the updated pipeline. Instead of trusting that it improved, you watch it happen. See the full walkthrough.
The dictation journal. Every utterance, with what you said, what it typed, where it routed, and how long it took.
And it shows you what it learned. The Memory tab opens with a "What HoldSpeak learned" digest: how many corrections you made, how many dictations you corrected, and for each correction a real "learned from N similar" count, computed by the same matcher that nudges routing. No inflated numbers, quiet when nothing matched.
What HoldSpeak learned. Honest, windowed counts from the same matcher that nudges your routing.
Quickstart
The install script clones the repo, doctor checks your setup, and holdspeak
launches the web runtime:
curl -fsSL https://raw.githubusercontent.com/karolswdev/HoldSpeak/main/scripts/install.sh | bash
holdspeak doctor # check mic permissions and backends
holdspeak # launch the web runtime
Or from a clone, with uv:
git clone https://github.com/karolswdev/HoldSpeak.git && cd HoldSpeak
uv pip install -e .
holdspeak doctor && holdspeak
Install only the extras you need:
uv pip install -e '.[meeting]' # meeting mode and AI intelligence
uv pip install -e '.[dictation-mlx]' # intelligent dictation on Apple Silicon (MLX)
uv pip install -e '.[dictation-llama]' # intelligent dictation, cross-platform (GGUF)
uv pip install -e '.[dictation-openai]'# intelligent dictation via an OpenAI-compatible endpoint
The dictation and meeting LLM is yours to choose. See
docs/MODELS.md for the contract and current suggestions.
Upgrading and your data
Your whole HoldSpeak database is a single SQLite file. Before a version jump you
can snapshot it with holdspeak backup, and put one back with holdspeak restore. Upgrades are safe by default: HoldSpeak backs up an older database
before it touches it, and refuses to open a database written by a newer build
rather than risk your data. holdspeak doctor reports the schema and config
state it found. The full policy is in
docs/RELEASING.md.
Platform support
| Capability | macOS 14+ (Apple Silicon) | Linux X11 | Linux Wayland |
|---|---|---|---|
| Voice typing | ✅ | ✅ | ✅ |
| Global hotkey | ✅ | ✅ | ⚠️ Best effort |
| Cross-app typing | ✅ | ✅ | ⚠️ Best effort |
| Meeting mode | ✅ | ✅ | ✅ |
| System audio capture | ✅ BlackHole | ✅ Pulse/PipeWire | ✅ Pulse/PipeWire |
Wayland often blocks global hooks and synthetic typing, so HoldSpeak falls back to clipboard paste for injection.
Meeting intelligence
Record or save a meeting and HoldSpeak turns the transcript into structured,
reviewable artifacts. It scores the transcript for intent (architecture, delivery,
product, incident, comms), runs a chain of plugins, and has each one call your LLM
to produce a typed artifact. The results render read-only at /history. HoldSpeak
ships 14 built-in plugins, all real and backed by an LLM.
Plugins can also propose actions. An actuator proposes an external side effect, like filing a ticket or posting an update, that only runs after you approve it for that specific action. Actuators are off by default. Write your own with the Plugin Authoring guide; for endpoints and routing, see the Meeting Mode Guide.
Then close the loop. After a meeting, the "Your next move" aftercare panel at
/history shows what is still open (by owner), what was decided, and what changed
since the last meeting. Jump to the transcript moment that justifies any result,
file an accepted action as a human-approved issue through that same actuator flow,
or draft a copyable follow-up. It is read-only and local: nothing is sent, and
nothing runs, without your approval. See the
Meeting Mode Guide.
AIPI-Lite companion
AIPI-Lite is an optional ESPHome-based device you can carry between rooms. Put it on Wi-Fi (a phone hotspot works), and it gives you meeting-capture controls and status feedback. With Claude/Codex hooks on, it tells you when an agent is waiting so you can speak the reply back into the coding session. Buy the hardware from the official page or the Amazon listing; firmware and bridge setup are in the AIPI-Lite Developer Workflow.
Where to go next
| I want to… | Read this |
|---|---|
| Browse all the docs | Documentation index |
| Get it running and verify my setup | Getting Started |
| Choose / configure a model | Models (bring your own) |
| See speech become a project-grounded task | The Dictation Copilot |
| Set up project-aware dictation for Codex / Claude | Intelligent Typing Setup |
| Review, correct, and replay past dictations | Dictation journal & replay |
| Use meeting mode and configure AI intelligence | Meeting Mode Guide |
| Wire up the AIPI-Lite companion | AIPI-Lite Developer Workflow |
| Install Claude / Codex agent hooks | Agent Hook Install |
| Understand what's stored and what can leave my machine | Security & Privacy |
Configuration
Config lives at ~/.config/holdspeak/config.json, but you rarely edit it by hand.
The Settings page in the web runtime exposes the hotkey, model, meeting intel,
dictation pipeline, and presence options. The full reference is in
Getting Started and the guides above.
Contributing
Contributions are welcome. See CONTRIBUTING.md for setup
(uv, the git hooks, the test command) and the commit-contract workflow. Recent
changes are in CHANGELOG.md.
License
Licensed under the Apache License 2.0. See LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file holdspeak-0.2.1.tar.gz.
File metadata
- Download URL: holdspeak-0.2.1.tar.gz
- Upload date:
- Size: 706.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a78bc477f19e7ccfbd3f9642eec235407df21bc970047772724befae7537a8a5
|
|
| MD5 |
be8475bbd3d1f2e4a1434d1f91064822
|
|
| BLAKE2b-256 |
762f196be30785c2a24e37a4886128bb92b22e17edb4200dad76b65806c7f979
|
Provenance
The following attestation bundles were made for holdspeak-0.2.1.tar.gz:
Publisher:
release.yml on karolswdev/HoldSpeak
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
holdspeak-0.2.1.tar.gz -
Subject digest:
a78bc477f19e7ccfbd3f9642eec235407df21bc970047772724befae7537a8a5 - Sigstore transparency entry: 1750671642
- Sigstore integration time:
-
Permalink:
karolswdev/HoldSpeak@24d9762834a6c8258f20c965f6147628bbba56aa -
Branch / Tag:
refs/heads/main - Owner: https://github.com/karolswdev
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@24d9762834a6c8258f20c965f6147628bbba56aa -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file holdspeak-0.2.1-py3-none-any.whl.
File metadata
- Download URL: holdspeak-0.2.1-py3-none-any.whl
- Upload date:
- Size: 2.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5e4ecc860c011c1d0980fac1232f4492f39d4213b592111819fd01b4cb2abfd2
|
|
| MD5 |
6d7d8ada0b1322739482a200ccf47253
|
|
| BLAKE2b-256 |
ffadf700324b4d482b53888d33db5880bf6179f7a0b12f01c7b7f7d97200e79b
|
Provenance
The following attestation bundles were made for holdspeak-0.2.1-py3-none-any.whl:
Publisher:
release.yml on karolswdev/HoldSpeak
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
holdspeak-0.2.1-py3-none-any.whl -
Subject digest:
5e4ecc860c011c1d0980fac1232f4492f39d4213b592111819fd01b4cb2abfd2 - Sigstore transparency entry: 1750671780
- Sigstore integration time:
-
Permalink:
karolswdev/HoldSpeak@24d9762834a6c8258f20c965f6147628bbba56aa -
Branch / Tag:
refs/heads/main - Owner: https://github.com/karolswdev
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@24d9762834a6c8258f20c965f6147628bbba56aa -
Trigger Event:
workflow_dispatch
-
Statement type: