Skip to main content

Voice typing for macOS and Linux - hold a hotkey, speak, release. Local & private.

Project description

HoldSpeak

HoldSpeak logo, a held key with rising soundwaves

Hold a key. Speak. It types in any app. 100% local. And it learns how you work.

License: Apache-2.0 Tests Python 3.10+ Platform: macOS | Linux

HoldSpeak is local-first voice input for macOS and Linux. Hold your hotkey, speak, release, and the text appears in whatever app you're in. No cloud, no account, no telemetry. Nothing leaves your machine except the model endpoint you choose to point at. Use it on its own as a voice-typing tool, or grow into meeting intelligence, project-aware dictation, and the AIPI-Lite companion device.

Status: early / pre-release. The features are mature, but it isn't on PyPI yet, so you install from source (below). APIs, config, and defaults can still change. Feedback and contributions welcome.

Why it's different

  • 100% local by default. Whisper transcription and your own LLM. Nothing is sent anywhere unless you deliberately point it at a cloud endpoint. See Security & privacy.
  • It gets better at your voice, and shows you the proof. Every dictation is saved: what you said, what it typed, where it routed, how long it took. Fix a wrong one with a single tap and it learns; a "What HoldSpeak learned" digest shows the honest "learned from N similar" count; replay an utterance through the updated pipeline and watch the routing change. Local, off by default for routing, no hidden retraining. See the learning loop.
  • Your voice gets the afterlife your meetings already have. A dictation doesn't vanish the second it's typed. It's saved, searchable, and reviewable, the same way a recorded meeting is.
  • 14 real LLM-backed meeting plugins. Architecture diagrams, ADRs, risk registers, incident timelines, decisions, and stakeholder updates, all pulled out of the transcript. See meeting intelligence.
  • Bring your own model. GGUF in-process, MLX on Apple Silicon, or any OpenAI-compatible endpoint. See Models.
  • Ambient desktop presence, if you want it. A native, focus-safe HUD shows whether it's listening, transcribing, or typing while you dictate into another app. Off by default. See Desktop Presence.
  • AIPI-Lite companion, if you have one. A small device for meeting-capture controls, and for speaking a reply to your coding agent from another room. See the workflow.

What it does, at a glance

Voice typing Meeting intelligence Project-aware typing
Pixel art microphone with hold-to-talk waves Pixel art meeting notebook with action items Pixel art code editor connected to local context
Hold the hotkey, speak, release. The text goes into the active app. Punctuation commands ("period", "comma") and "clipboard" substitution work out of the box. Capture mic and system audio together, get a live transcript with speaker labels, and let the AI pull out topics, actions, and artifacts you can review at /history. Rough speech runs through intent classification, project-KB enrichment, and an LLM rewrite before it lands, tuned for Codex, Claude, the terminal, the browser, or your editor.

See it learn

Animated pixel art operator working at a terminal while companion and task cards update

Speech turns into transcript context, reviewable actions, summaries, and replies for your coding agent, while the local runtime stays in charge. Because every dictation is recorded, you can look back at what it heard, fix a mistake in one tap (which teaches it), and replay the utterance through the updated pipeline. Instead of trusting that it improved, you watch it happen. See the full walkthrough.

The HoldSpeak dictation Journal: a said-to-typed timeline of recent dictations, each card showing the spoken transcript, the typed result, its routing target, and a per-utterance latency strip; one row marked corrected.

The dictation journal. Every utterance, with what you said, what it typed, where it routed, and how long it took.

And it shows you what it learned. The Memory tab opens with a "What HoldSpeak learned" digest: how many corrections you made, how many dictations you corrected, and for each correction a real "learned from N similar" count, computed by the same matcher that nudges routing. No inflated numbers, quiet when nothing matched.

The 'What HoldSpeak learned' digest: a this-week / all-time toggle, headline counts for corrections made, dictations corrected, and utterances nudged, a breakdown by block and target, and per-correction 'learned from N similar' rows.

What HoldSpeak learned. Honest, windowed counts from the same matcher that nudges your routing.

Quickstart

The install script clones the repo, doctor checks your setup, and holdspeak launches the web runtime:

curl -fsSL https://raw.githubusercontent.com/karolswdev/HoldSpeak/main/scripts/install.sh | bash
holdspeak doctor   # check mic permissions and backends
holdspeak          # launch the web runtime

Or from a clone, with uv:

git clone https://github.com/karolswdev/HoldSpeak.git && cd HoldSpeak
uv pip install -e .
holdspeak doctor && holdspeak

Install only the extras you need:

uv pip install -e '.[meeting]'         # meeting mode and AI intelligence
uv pip install -e '.[dictation-mlx]'   # intelligent dictation on Apple Silicon (MLX)
uv pip install -e '.[dictation-llama]' # intelligent dictation, cross-platform (GGUF)
uv pip install -e '.[dictation-openai]'# intelligent dictation via an OpenAI-compatible endpoint

The dictation and meeting LLM is yours to choose. See docs/MODELS.md for the contract and current suggestions.

Upgrading and your data

Your whole HoldSpeak database is a single SQLite file. Before a version jump you can snapshot it with holdspeak backup, and put one back with holdspeak restore. Upgrades are safe by default: HoldSpeak backs up an older database before it touches it, and refuses to open a database written by a newer build rather than risk your data. holdspeak doctor reports the schema and config state it found. The full policy is in docs/RELEASING.md.

Platform support

Capability macOS 14+ (Apple Silicon) Linux X11 Linux Wayland
Voice typing
Global hotkey ⚠️ Best effort
Cross-app typing ⚠️ Best effort
Meeting mode
System audio capture ✅ BlackHole ✅ Pulse/PipeWire ✅ Pulse/PipeWire

Wayland often blocks global hooks and synthetic typing, so HoldSpeak falls back to clipboard paste for injection.

Meeting intelligence

Record or save a meeting and HoldSpeak turns the transcript into structured, reviewable artifacts. It scores the transcript for intent (architecture, delivery, product, incident, comms), runs a chain of plugins, and has each one call your LLM to produce a typed artifact. The results render read-only at /history. HoldSpeak ships 14 built-in plugins, all real and backed by an LLM.

Plugins can also propose actions. An actuator proposes an external side effect, like filing a ticket or posting an update, that only runs after you approve it for that specific action. Actuators are off by default. Write your own with the Plugin Authoring guide; for endpoints and routing, see the Meeting Mode Guide.

Then close the loop. After a meeting, the "Your next move" aftercare panel at /history shows what is still open (by owner), what was decided, and what changed since the last meeting. Jump to the transcript moment that justifies any result, file an accepted action as a human-approved issue through that same actuator flow, or draft a copyable follow-up. It is read-only and local: nothing is sent, and nothing runs, without your approval. See the Meeting Mode Guide.

AIPI-Lite companion

Pixel art AIPI-Lite companion device

AIPI-Lite is an optional ESPHome-based device you can carry between rooms. Put it on Wi-Fi (a phone hotspot works), and it gives you meeting-capture controls and status feedback. With Claude/Codex hooks on, it tells you when an agent is waiting so you can speak the reply back into the coding session. Buy the hardware from the official page or the Amazon listing; firmware and bridge setup are in the AIPI-Lite Developer Workflow.

Where to go next

I want to… Read this
Browse all the docs Documentation index
Get it running and verify my setup Getting Started
Choose / configure a model Models (bring your own)
See speech become a project-grounded task The Dictation Copilot
Set up project-aware dictation for Codex / Claude Intelligent Typing Setup
Review, correct, and replay past dictations Dictation journal & replay
Use meeting mode and configure AI intelligence Meeting Mode Guide
Wire up the AIPI-Lite companion AIPI-Lite Developer Workflow
Install Claude / Codex agent hooks Agent Hook Install
Understand what's stored and what can leave my machine Security & Privacy

Configuration

Config lives at ~/.config/holdspeak/config.json, but you rarely edit it by hand. The Settings page in the web runtime exposes the hotkey, model, meeting intel, dictation pipeline, and presence options. The full reference is in Getting Started and the guides above.

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup (uv, the git hooks, the test command) and the commit-contract workflow. Recent changes are in CHANGELOG.md.

License

Licensed under the Apache License 2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

holdspeak-0.2.1.tar.gz (706.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

holdspeak-0.2.1-py3-none-any.whl (2.8 MB view details)

Uploaded Python 3

File details

Details for the file holdspeak-0.2.1.tar.gz.

File metadata

  • Download URL: holdspeak-0.2.1.tar.gz
  • Upload date:
  • Size: 706.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.1.tar.gz
Algorithm Hash digest
SHA256 a78bc477f19e7ccfbd3f9642eec235407df21bc970047772724befae7537a8a5
MD5 be8475bbd3d1f2e4a1434d1f91064822
BLAKE2b-256 762f196be30785c2a24e37a4886128bb92b22e17edb4200dad76b65806c7f979

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.1.tar.gz:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file holdspeak-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: holdspeak-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5e4ecc860c011c1d0980fac1232f4492f39d4213b592111819fd01b4cb2abfd2
MD5 6d7d8ada0b1322739482a200ccf47253
BLAKE2b-256 ffadf700324b4d482b53888d33db5880bf6179f7a0b12f01c7b7f7d97200e79b

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.1-py3-none-any.whl:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page