Voice typing for macOS and Linux - hold a hotkey, speak, release. Local & private.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

karolswdev

These details have not been verified by PyPI

Project description

HoldSpeak

HoldSpeak logo, a held key with rising soundwaves

Hold a key. Speak. It types in any app. 100% local. And it learns how you work.

HoldSpeak is local-first voice input for macOS and Linux. Hold your hotkey, speak, release, and the text appears in whatever app you're in. No cloud, no account, no telemetry. Nothing leaves your machine except the model endpoint you choose to point at. Use it on its own as a voice-typing tool, or grow into meeting intelligence, project-aware dictation, and the AIPI-Lite companion device.

Status: early / pre-release. The features are mature, but it isn't on PyPI yet, so you install from source (below). APIs, config, and defaults can still change. Feedback and contributions welcome.

Why it's different

100% local by default. Whisper transcription and your own LLM. Nothing is sent anywhere unless you deliberately point it at a cloud endpoint. See Security & privacy.
It gets better at your voice, and shows you the proof. Every dictation is saved: what you said, what it typed, where it routed, how long it took. Fix a wrong one with a single tap and it learns; a "What HoldSpeak learned" digest shows the honest "learned from N similar" count; replay an utterance through the updated pipeline and watch the routing change. Local, off by default for routing, no hidden retraining. See the learning loop.
Your voice gets the afterlife your meetings already have. A dictation doesn't vanish the second it's typed. It's saved, searchable, and reviewable, the same way a recorded meeting is.
14 real LLM-backed meeting plugins. Architecture diagrams, ADRs, risk registers, incident timelines, decisions, and stakeholder updates, all pulled out of the transcript. See meeting intelligence.
Bring your own model. GGUF in-process, MLX on Apple Silicon, or any OpenAI-compatible endpoint. See Models.
Ambient desktop presence, if you want it. A native, focus-safe HUD shows whether it's listening, transcribing, or typing while you dictate into another app. Off by default. See Desktop Presence.
AIPI-Lite companion, if you have one. A small device for meeting-capture controls, and for speaking a reply to your coding agent from another room. See the workflow.

What it does, at a glance

Voice typing	Meeting intelligence	Project-aware typing

Hold the hotkey, speak, release. The text goes into the active app. Punctuation commands (`"period"`, `"comma"`) and `"clipboard"` substitution work out of the box.	Capture mic and system audio together, get a live transcript with speaker labels, and let the AI pull out topics, actions, and artifacts you can review at `/history`.	Rough speech runs through intent classification, project-KB enrichment, and an LLM rewrite before it lands, tuned for Codex, Claude, the terminal, the browser, or your editor.

See it learn

Animated pixel art operator working at a terminal while companion and task cards update

Speech turns into transcript context, reviewable actions, summaries, and replies for your coding agent, while the local runtime stays in charge. Because every dictation is recorded, you can look back at what it heard, fix a mistake in one tap (which teaches it), and replay the utterance through the updated pipeline. Instead of trusting that it improved, you watch it happen. See the full walkthrough.

The HoldSpeak dictation Journal: a said-to-typed timeline of recent dictations, each card showing the spoken transcript, the typed result, its routing target, and a per-utterance latency strip; one row marked corrected.

The dictation journal. Every utterance, with what you said, what it typed, where it routed, and how long it took.

And it shows you what it learned. The Memory tab opens with a "What HoldSpeak learned" digest: how many corrections you made, how many dictations you corrected, and for each correction a real "learned from N similar" count, computed by the same matcher that nudges routing. No inflated numbers, quiet when nothing matched.

The 'What HoldSpeak learned' digest: a this-week / all-time toggle, headline counts for corrections made, dictations corrected, and utterances nudged, a breakdown by block and target, and per-correction 'learned from N similar' rows.

What HoldSpeak learned. Honest, windowed counts from the same matcher that nudges your routing.

Quickstart

The install script clones the repo, doctor checks your setup, and holdspeak launches the web runtime:

curl -fsSL https://raw.githubusercontent.com/karolswdev/HoldSpeak/main/scripts/install.sh | bash
holdspeak doctor   # check mic permissions and backends
holdspeak          # launch the web runtime

Or from a clone, with uv:

git clone https://github.com/karolswdev/HoldSpeak.git && cd HoldSpeak
uv pip install -e .
holdspeak doctor && holdspeak

Install only the extras you need:

uv pip install -e '.[meeting]'         # meeting mode and AI intelligence
uv pip install -e '.[dictation-mlx]'   # intelligent dictation on Apple Silicon (MLX)
uv pip install -e '.[dictation-llama]' # intelligent dictation, cross-platform (GGUF)
uv pip install -e '.[dictation-openai]'# intelligent dictation via an OpenAI-compatible endpoint

The dictation and meeting LLM is yours to choose. See docs/MODELS.md for the contract and current suggestions.

Upgrading and your data

Your whole HoldSpeak database is a single SQLite file. Before a version jump you can snapshot it with holdspeak backup, and put one back with holdspeak restore. Upgrades are safe by default: HoldSpeak backs up an older database before it touches it, and refuses to open a database written by a newer build rather than risk your data. holdspeak doctor reports the schema and config state it found. The full policy is in docs/RELEASING.md.

Platform support

Capability	macOS 14+ (Apple Silicon)	Linux X11	Linux Wayland
Voice typing	✅	✅	✅
Global hotkey	✅	✅	⚠️ Best effort
Cross-app typing	✅	✅	⚠️ Best effort
Meeting mode	✅	✅	✅
System audio capture	✅ BlackHole	✅ Pulse/PipeWire	✅ Pulse/PipeWire

Wayland often blocks global hooks and synthetic typing, so HoldSpeak falls back to clipboard paste for injection.

Meeting intelligence

Record or save a meeting and HoldSpeak turns the transcript into structured, reviewable artifacts. It scores the transcript for intent (architecture, delivery, product, incident, comms), runs a chain of plugins, and has each one call your LLM to produce a typed artifact. The results render read-only at /history. HoldSpeak ships 14 built-in plugins, all real and backed by an LLM.

Plugins can also propose actions. An actuator proposes an external side effect, like filing a ticket or posting an update, that only runs after you approve it for that specific action. Actuators are off by default. Write your own with the Plugin Authoring guide; for endpoints and routing, see the Meeting Mode Guide.

Then close the loop. After a meeting, the "Your next move" aftercare panel at /history shows what is still open (by owner), what was decided, and what changed since the last meeting. Jump to the transcript moment that justifies any result, file an accepted action as a human-approved issue through that same actuator flow, or draft a copyable follow-up. It is read-only and local: nothing is sent, and nothing runs, without your approval. See the Meeting Mode Guide.

AIPI-Lite companion

Pixel art AIPI-Lite companion device

AIPI-Lite is an optional ESPHome-based device you can carry between rooms. Put it on Wi-Fi (a phone hotspot works), and it gives you meeting-capture controls and status feedback. With Claude/Codex hooks on, it tells you when an agent is waiting so you can speak the reply back into the coding session. Buy the hardware from the official page or the Amazon listing; firmware and bridge setup are in the AIPI-Lite Developer Workflow.

Where to go next

I want to…	Read this
Browse all the docs	Documentation index
Get it running and verify my setup	Getting Started
Choose / configure a model	Models (bring your own)
See speech become a project-grounded task	The Dictation Copilot
Set up project-aware dictation for Codex / Claude	Intelligent Typing Setup
Review, correct, and replay past dictations	Dictation journal & replay
Use meeting mode and configure AI intelligence	Meeting Mode Guide
Wire up the AIPI-Lite companion	AIPI-Lite Developer Workflow
Install Claude / Codex agent hooks	Agent Hook Install
Understand what's stored and what can leave my machine	Security & Privacy

Configuration

Config lives at ~/.config/holdspeak/config.json, but you rarely edit it by hand. The Settings page in the web runtime exposes the hotkey, model, meeting intel, dictation pipeline, and presence options. The full reference is in Getting Started and the guides above.

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup (uv, the git hooks, the test command) and the commit-contract workflow. Recent changes are in CHANGELOG.md.

License

Licensed under the Apache License 2.0. See LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

karolswdev

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.3.1

Jun 13, 2026

0.3.0

Jun 13, 2026

0.2.2

Jun 7, 2026

This version

0.2.1

Jun 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

holdspeak-0.2.1.tar.gz (706.5 kB view details)

Uploaded Jun 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

holdspeak-0.2.1-py3-none-any.whl (2.8 MB view details)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file holdspeak-0.2.1.tar.gz.

File metadata

Download URL: holdspeak-0.2.1.tar.gz
Upload date: Jun 7, 2026
Size: 706.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`a78bc477f19e7ccfbd3f9642eec235407df21bc970047772724befae7537a8a5`
MD5	`be8475bbd3d1f2e4a1434d1f91064822`
BLAKE2b-256	`762f196be30785c2a24e37a4886128bb92b22e17edb4200dad76b65806c7f979`

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.1.tar.gz:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: holdspeak-0.2.1.tar.gz
- Subject digest: a78bc477f19e7ccfbd3f9642eec235407df21bc970047772724befae7537a8a5
- Sigstore transparency entry: 1750671642
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: karolswdev/HoldSpeak@24d9762834a6c8258f20c965f6147628bbba56aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/karolswdev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@24d9762834a6c8258f20c965f6147628bbba56aa
- Trigger Event: workflow_dispatch

File details

Details for the file holdspeak-0.2.1-py3-none-any.whl.

File metadata

Download URL: holdspeak-0.2.1-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 2.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for holdspeak-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e4ecc860c011c1d0980fac1232f4492f39d4213b592111819fd01b4cb2abfd2`
MD5	`6d7d8ada0b1322739482a200ccf47253`
BLAKE2b-256	`ffadf700324b4d482b53888d33db5880bf6179f7a0b12f01c7b7f7d97200e79b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for holdspeak-0.2.1-py3-none-any.whl:

Publisher: release.yml on karolswdev/HoldSpeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: holdspeak-0.2.1-py3-none-any.whl
- Subject digest: 5e4ecc860c011c1d0980fac1232f4492f39d4213b592111819fd01b4cb2abfd2
- Sigstore transparency entry: 1750671780
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: karolswdev/HoldSpeak@24d9762834a6c8258f20c965f6147628bbba56aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/karolswdev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@24d9762834a6c8258f20c965f6147628bbba56aa
- Trigger Event: workflow_dispatch

holdspeak 0.2.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

HoldSpeak

Why it's different

What it does, at a glance

See it learn

Quickstart

Upgrading and your data

Platform support

Meeting intelligence

AIPI-Lite companion

Where to go next

Configuration

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance