Local, offline voice dictation for Linux, macOS, and Windows — hold a key, speak, release

YazSes

Hold a key → speak → release. On-device voice dictation that types into any app, plus voice commands and macros — entirely offline. No cloud. No API key. No subscription.

YazSes is an open-source, offline voice-dictation daemon for Linux, macOS, and Windows. It transcribes your speech locally with faster-whisper and types the result into whatever window has focus. Use it when you want hands-free dictation and editor/terminal voice commands without sending audio to Google, Apple, or Microsoft.

[Screenshot: yazses doctor, all green, fully offline]

Two versions of YazSes

This repo holds one product with two implementations — not two separate apps, but two generations of the same idea. The one you install and run is Part 1 (Python), on this main branch.

	Part 1 — Python · `main`	Rust HCI exploration · `archive/rust-hci-v1`
What it is	The shipping app — voice dictation, commands, macros	An early-stage rewrite exploring deeper human–computer interaction: an on-device agent (LLM tool-use, personal memory, editor awareness)
Status	✅ Active — current product (v1.4.0, installed & maintained)	⏸️ Paused / archived — not shipped, not installable
Hold-to-talk dictation	✅	✅
Offline STT	✅ faster-whisper (CPU int8)	✅ Whisper + Moonshine v2 (~9 ms)
Voice commands	✅ regex grammar (+ optional SLM router) → key sequences	✅ via LLM tool-calls
Voice macros · Mid-Thought Undo · Punch-In · Prosody Ink · Ghost Ahead	✅	❌
Dysfluency-Friendly Mode · learning corpus + `yazses tune`	✅	❌
Friendly CLI (`-h`, examples, `yazses update`)	✅	❌
On-device LLM agent (OS tools: git commit, media, notes, screenshots…)	❌ (optional offline text cleanup only)	✅
Personal memory (encrypted on-device vector store)	❌	✅
Editor context (Neovim / VS Code)	✅ LSP context, opt-in	✅ 5-tier window detection + bridges
Screen-reader integration (AT-SPI / NVDA)	❌	✅
Packaged & distributed (PyPI, snap, APT)	✅	❌

Bottom line: if you want YazSes, use Part 1 (this branch) — an offline dictation + voice-command daemon. The Rust branch is kept only for reference; nothing on main builds, installs, or depends on it. The Rust effort aimed at a more ambitious agentic HCI layer but was left in early stages — revisiting it is a deliberate future decision, not part of day-to-day work here.

Quick Start

Step 1 — Install (see all install options for every platform)

Platform	Command
Linux (Debian/Ubuntu)	`bash <(curl -fsSL https://raw.githubusercontent.com/MSKazemi/yazses/main/install-apt.sh)`
Linux (any distro)	`sudo snap install yazses`
Any OS (Python ≥ 3.11)	`pipx install yazses`

Step 2 — Provision the system (Linux — one command; the APT install does it automatically)

yazses setup        # installs audio + injection deps, joins the input group, sets up ydotoold
# then log out and back in (the input-group change needs a fresh login)

yazses setup fixes everything dictation needs and is safe to re-run — it only does what's missing:

libportaudio2 — audio capture (without it the daemon crashes on start with OSError: PortAudio library not found).
injection backends — xdotool/xclip (X11) and wtype/ydotool/wl-clipboard (Wayland).
input group — required to read the hold-to-talk hotkey from the kernel.
ydotoold — the virtual-input daemon. On GNOME/KDE Wayland this is the only way to inject keystrokes (wtype is blocked there), so setup installs and enables it.

Prefer to do it by hand? Install the packages and join the input group:

sudo apt install libportaudio2 xdotool ydotool wtype xclip wl-clipboard pipx && sudo usermod -aG input "$USER"

Then enable ydotoold (see install-linux). Verify anytime with yazses doctor: you want [OK] Keyboard capture, [OK] Microphone, and [OK] Injection. macOS and Windows skip this step (grant Accessibility and other permissions when prompted; see below).
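To see what is still missing before running setup, the prerequisites listed above can be probed from Python. This is a minimal sketch, not how yazses setup itself works: the function names are hypothetical, the binary names (for example wl-copy from the wl-clipboard package) are assumptions about what each package installs, and only the standard library is used.

```python
import ctypes.util
import os
import shutil

# Binaries assumed to be provided by the packages named in the README.
X11_TOOLS = ("xdotool", "xclip")
WAYLAND_TOOLS = ("wtype", "ydotool", "ydotoold", "wl-copy")


def missing_prereqs(which=shutil.which, find_library=ctypes.util.find_library):
    """Return a sorted list of prerequisites that could not be found."""
    missing = [tool for tool in X11_TOOLS + WAYLAND_TOOLS if which(tool) is None]
    if find_library("portaudio") is None:
        missing.append("libportaudio2")
    return sorted(missing)


def in_input_group():
    """True if the current process already belongs to the 'input' group."""
    try:
        import grp  # Unix only
        gid = grp.getgrnam("input").gr_gid
    except (ImportError, KeyError):
        return False
    # Group changes only show up after a fresh login, as the README notes.
    return gid in os.getgroups()


if __name__ == "__main__":
    print("missing:", missing_prereqs())
    print("in input group:", in_input_group())
```

The lookups are passed in as parameters so the logic can be exercised without touching the real system.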

Step 3 — Set up

yazses doctor               # check mic, injection backend, permissions (want all [OK])
yazses enroll               # calibrate your microphone (~30 seconds)
yazses start                # start the dictation daemon
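If you want to script the "want all [OK]" check, a small wrapper can scan the doctor output. This sketch assumes that yazses doctor prints one check per line and prefixes passing checks with [OK]; the three check names come from the README, everything else (function names, the line format) is an assumption.

```python
import subprocess

# Check names the README says should be reported as [OK].
EXPECTED = ("Keyboard capture", "Microphone", "Injection")


def failing_checks(doctor_output, expected=EXPECTED):
    """Return the expected checks that do not appear on an [OK] line."""
    passed = [
        line.strip()[len("[OK]"):].strip()
        for line in doctor_output.splitlines()
        if line.strip().startswith("[OK]")
    ]
    return [name for name in expected
            if not any(item.startswith(name) for item in passed)]


def run_doctor():
    """Run the real CLI and return whatever it printed to stdout."""
    result = subprocess.run(["yazses", "doctor"],
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print("not OK yet:", failing_checks(run_doctor()))
```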

Step 4 — Use it — hold the hotkey, speak, release. The text is typed into the focused app.

OS	Hold this key	Say…
Linux	`Space`	"the quick brown fox" (types it) · "go to line 42" · "run the tests"
macOS	`Right Option`	"delete the last word" · "save file" · "new function parse config"
Windows	`Right Ctrl`	"undo that" · "select all" · "comment this line"

Release the key — YazSes transcribes and acts within about a second.

First time on macOS? The app bundle is an unsigned developer preview: right-click the app → Open to get past Gatekeeper, then grant Accessibility and Microphone access when prompted.

First time on Windows? If SmartScreen warns you, click More info → Run anyway.

What you can say

Hold the key and just talk — by default everything you say is typed at the cursor. YazSes also recognises a set of voice commands (a fast regex grammar; an optional ~0.5B SLM router catches phrasings the grammar misses) that map to editor/terminal key sequences instead of being typed:

Say something like…	What happens
"the quick brown fox"	Types the text at the cursor (dictation)
"delete the last three words"	Deletes the last 3 words
"undo that" / "undo five times"	Sends undo
"save file" · "copy" · "paste"	Save / copy / paste
"select all" · "select to end"	Selection commands
"comment this line"	Toggles a comment
"go to line 42"	Jumps to line 42
"go to function parse_config"	Jumps to the symbol (via LSP, opt-in)
"run the tests" / "run the build"	Runs the editor/terminal action
"rename this to user_id"	Renames the symbol

You can also define multi-step macros and a personal vocabulary of mis-heard words — see the CLI reference.
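To make the command table concrete, here is a minimal sketch of how a regex grammar might turn a few of those phrases into an action plus a number. The patterns, action names, and number-word table are illustrative assumptions, not the grammar YazSes ships.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# (pattern, hypothetical action name); group 1 is an optional number.
GRAMMAR = [
    (re.compile(r"^delete the last (?:(\w+) )?words?$"), "delete_words"),
    (re.compile(r"^undo(?: that| (\w+) times?)?$"), "undo"),
    (re.compile(r"^go to line (\d+)$"), "goto_line"),
]


def to_int(token):
    """Accept digits ('42') or a spelled-out number ('three'); else None."""
    return int(token) if token.isdigit() else NUMBER_WORDS.get(token)


def parse(utterance):
    """Return (action, number) for a known command, else ('dictate', text)."""
    text = utterance.lower().strip().rstrip(".!?")
    for pattern, action in GRAMMAR:
        match = pattern.match(text)
        if match:
            value = to_int(match.group(1)) if match.group(1) else 1
            if value is not None:
                return (action, value)
    return ("dictate", utterance)
```

Anything that matches no pattern falls through to dictation, which mirrors the default described above.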

How it works

Hold hotkey → record audio → VAD gate → faster-whisper (CPU) → clean + disfluency filter
            → command grammar (Tier 1 regex, optional Tier 2 SLM router)
            → dictate? type the text   ·   command? send the key sequence

Everything runs on your CPU — no GPU, no network. Transcription uses faster-whisper (int8). A fast regex grammar classifies each utterance as dictation or a command; when its confidence is low, an optional ~0.5B SLM router takes a second look. The result appears in the focused window within about a second on a modern laptop.
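The two-tier routing described above can be sketched as follows. The confidence values, the 0.8 threshold, the phrase table, and the slm_router callable are all assumptions made for illustration; the README only says that a regex grammar runs first and an optional SLM router takes a second look when confidence is low.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Decision:
    kind: str     # "dictate" or "command"
    payload: str  # text to type, or a hypothetical command name
    tier: int     # which tier made the call


EXACT = {"save file": "save", "select all": "select_all", "undo that": "undo"}
COMMAND_VERBS = re.compile(r"^(save|undo|select|delete|run)\b")


def tier1(text: str) -> Tuple[Optional[str], float]:
    """Regex tier: exact phrases score 1.0, command-like openings 0.5."""
    if text in EXACT:
        return EXACT[text], 1.0
    if COMMAND_VERBS.match(text):
        return None, 0.5  # sounds like a command, but the grammar missed it
    return None, 0.0


def route(utterance: str,
          slm_router: Optional[Callable[[str], Optional[str]]] = None,
          threshold: float = 0.8) -> Decision:
    text = utterance.lower().strip()
    command, confidence = tier1(text)
    if command is not None and confidence >= threshold:
        return Decision("command", command, 1)
    # Tier 2 is optional and only consulted for uncertain utterances.
    if slm_router is not None and 0.0 < confidence < threshold:
        guess = slm_router(text)
        if guess is not None:
            return Decision("command", guess, 2)
    return Decision("dictate", utterance, 1)
```

Without a router, uncertain utterances are simply typed, so dictation keeps working when the optional model is not installed.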

Models:

Speech-to-text: faster-whisper — tiny.en (fast) / base.en / small.en (more accurate), int8 on CPU
Command routing (optional): Qwen2.5-0.5B SLM for Tier 2 intent classification — not required for dictation, fetched with yazses model download
Dictation cleanup (optional, off by default): a small offline LLM can tidy grammar/punctuation; length- and token-preservation guards stop it rewriting meaning
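For reference, this is roughly what int8 CPU transcription looks like when faster-whisper is used directly. It is a sketch under assumptions (a WAV file on disk, the faster-whisper package installed, the first call downloading the model), not YazSes's own pipeline; the helper names are made up here.

```python
def join_segments(segments):
    """Concatenate segment texts into one trimmed string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe(path, model_size="tiny.en"):
    """Transcribe an audio file on the CPU with int8 weights."""
    # Imported lazily so the helper above works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(path, language="en")
    return join_segments(segments)


if __name__ == "__main__":
    import sys
    print(transcribe(sys.argv[1]))
```

Swapping model_size for base.en or small.en trades speed for accuracy, as listed above.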

Requirements


OS	Linux (primary) · macOS 11+ · Windows 10 (21H2)+
RAM	4 GB minimum · 8 GB comfortable
Disk	~250 MB–1 GB for the faster-whisper model (downloaded on first run)
CPU	2+ cores · no GPU required
Mic	Any USB or built-in microphone

Key features

Fully offline — no audio, no text, nothing leaves the machine by default; no cloud, API key, or subscription
Hold-to-talk dictation — type into any focused app on Linux, macOS, or Windows
Voice commands — editor/terminal actions (undo, save, go-to-line, run tests, rename…) via regex grammar + an optional SLM router
Macros & personal vocabulary — define multi-step commands and teach YazSes your mis-heard words
Dysfluency-Friendly Mode — opt-in collapse of stutters/repeats (b-b-because → because) for stuttered or dysarthric speech
Self-improving — opt-in, encrypted on-device learning corpus; yazses tune proposes accuracy fixes from your own corrections (nothing leaves the machine)
Editor context — optional Neovim / VS Code LSP context improves accuracy on code identifiers
Accessibility — VAD calibration wizard, mic-level tuning, and EMG (muscle-sensor) trigger support for motor-disability use
Voice-activity overlay — optional sonar rings near the cursor while you speak
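As an illustration of what collapsing stutters and repeats can mean in practice, here is a deliberately naive regex sketch. It is an assumption about the general technique, not the Dysfluency-Friendly Mode implementation, and it has an obvious limitation: legitimate hyphenated words such as "re-read" would also be collapsed, so a real filter would need more context.

```python
import re

# "b-b-because": a fragment repeated with hyphens before the full word.
_PART_WORD = re.compile(r"\b(\w+)(?:-\1)*-(?=\1)", re.IGNORECASE)
# "I I I want": the same whole word repeated, separated by whitespace.
_WHOLE_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def collapse_repeats(text):
    """Drop stuttered fragments, then squeeze repeated whole words."""
    text = _PART_WORD.sub("", text)
    return _WHOLE_WORD.sub(r"\1", text)
```

Calling collapse_repeats("b-b-because") gives "because", matching the example in the feature list.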

Limitations / when not to use YazSes

Not an LLM agent. YazSes dictates text and runs editor/terminal commands. It does not browse, reason over your files, set timers, or hold a conversation — that was the paused Rust exploration (see Two versions above).
CPU faster-whisper, not a cloud service. For the absolute lowest word-error rate on a noisy mic, a cloud STT may still beat it; the trade-off is that nothing leaves your machine.
English-tuned by default. It ships with *.en Whisper models; other languages need a different model.
Desktop only. No mobile or web build.

CLI commands

Command	Description
`yazses start`	Start the YazSes daemon in the background (restarts cleanly if one is already running)
`yazses restart`	Stop all daemons (including detached) and start exactly one
`yazses stop`	Stop the running daemon
`yazses status`	Show daemon status — queries the daemon over IPC when reachable
`yazses doctor`	Check prerequisites (version, daemon, model, mic, injection backend, permissions)
`yazses enroll`	Calibrate your microphone — tunes `vad_threshold` for your voice and room
`yazses mic-level`	Measure mic speech level and recommend (or `--set`) the VAD threshold
`yazses features`	List capabilities and toggle them (`enable`/`disable <name>`)
`yazses vocab`	Personal dictionary of mis-heard words (`add`/`list`/`remove`)
`yazses hotkey`	Show or change the hold-to-talk key (`set`) and the dedicated command key (`command`)
`yazses overlay`	Launch the sonar voice-activity overlay (requires the `overlay` extra)
`yazses inject TEXT`	Type arbitrary text into the focused window — test injection without speaking
`yazses say TEXT`	Speak text aloud (offline TTS)
`yazses test`	End-to-end self-test: focuses a window and types `YazSes OK`
`yazses logs`	Show the daemon diagnostic log (metadata only — no dictated text is stored)
`yazses mark-wrong`	Flag the last dictation as a misrecognition (feeds the learning corpus)
`yazses tune`	Analyse the learning corpus and propose accuracy improvements; `--apply` to write changes
`yazses corpus`	Manage the local learning corpus (`status`, `forget`, `destroy`)
`yazses model`	List or download the optional SLM intent-routing model
`yazses remote HOST`	Forward voice typing to a remote host over SSH

Configuration

Config file location:

OS	Path
Linux	`~/.config/yazses/config.toml`
macOS	`~/Library/Application Support/yazses/config.toml`
Windows	`%APPDATA%\yazses\config.toml`

Prefer yazses features / yazses hotkey / yazses vocab to edit config safely (they preserve comments). Essential settings:

[stt]
model = "small.en"          # tiny.en (fast) | base.en | small.en (accurate); CPU int8
initial_prompt = ""         # vocabulary/context primed into Whisper

[hotkey]
key = "space"               # hold-to-talk key (yazses hotkey set <key>)
command_key = ""            # optional dedicated key that forces command mode
hold_threshold_ms = 500     # how long to hold before recording starts

[audio]
sample_rate = 16000
max_record_seconds = 90

[injection]
backend = "auto"            # auto | xdotool | ydotool | wtype | clipboard

[accessibility]
vad_threshold = 0.0008      # lower for quiet speech, raise if room noise triggers (yazses mic-level --set)

See the CLI reference and examples/config.example.toml for all options.

Microphone not working?

If YazSes does nothing and the log shows Silent audio -- discarding, your speech is below the VAD threshold:

yazses mic-level --set   # measure your voice and set the right threshold
yazses restart
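To build intuition for why quiet speech gets discarded, here is a toy energy gate. It assumes the threshold is compared against the mean-square energy of float samples in the range -1.0 to 1.0; the README does not say which measure YazSes actually uses, so treat the numbers as illustrative only.

```python
def mean_square(samples):
    """Average energy of a block of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_silent(samples, vad_threshold=0.0008):
    """True when the block's energy falls below the threshold."""
    return mean_square(samples) < vad_threshold
```

Under this assumption a steady amplitude of 0.01 (energy 0.0001) would be discarded while 0.1 (energy 0.01) would pass, which is why lowering the threshold helps quiet voices and raising it helps noisy rooms.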

All install options

Linux

# APT script — Debian / Ubuntu (recommended)
bash <(curl -fsSL https://raw.githubusercontent.com/MSKazemi/yazses/main/install-apt.sh)

# Snap — any distro (strict confinement; keystroke injection works on X11.
# On Wayland, prefer pipx below for full input access.)
sudo snap install yazses

# pipx — any distro with Python ≥ 3.11
# Debian/Ubuntu runtime deps. libportaudio2 = audio capture (required);
# xdotool/xclip = X11 injection+clipboard; wtype/ydotool/wl-clipboard = Wayland.
# Installing all of them makes YazSes work on either session type.
sudo apt install libportaudio2 xdotool ydotool wtype xclip wl-clipboard pipx
sudo usermod -aG input "$USER"   # hotkey access — then log out and back in
pipx install yazses
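Which injection tool applies depends on the session type. The sketch below encodes the rules stated in this README (xdotool on X11, ydotool on GNOME/KDE Wayland where wtype is blocked, wtype on other Wayland compositors) using the standard XDG_SESSION_TYPE and XDG_CURRENT_DESKTOP environment variables. It is an assumption about how a backend = "auto" choice could be made, not the actual selection logic, and the clipboard fallback is a guess.

```python
import os


def pick_backend(session_type, desktop=""):
    """Map a session type and desktop name to an injection backend."""
    session = session_type.lower()
    if session == "x11":
        return "xdotool"
    if session == "wayland":
        names = desktop.lower()  # e.g. "ubuntu:GNOME" or "KDE"
        if "gnome" in names or "kde" in names:
            return "ydotool"     # wtype is blocked on these desktops
        return "wtype"
    return "clipboard"           # assumed fallback when nothing is detected


def detect_backend(env=os.environ):
    return pick_backend(env.get("XDG_SESSION_TYPE", ""),
                        env.get("XDG_CURRENT_DESKTOP", ""))
```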

macOS

# pipx (Python ≥ 3.11)
pipx install yazses

# App bundle (.dmg) — unsigned developer preview
# https://github.com/MSKazemi/yazses/releases/latest

Windows

# pipx (Python ≥ 3.11)
pipx install yazses

# Installer (.exe) — unsigned developer preview
# https://github.com/MSKazemi/yazses/releases/latest

Documentation


Install on Linux	Detailed Linux guide — permissions, injection backends, service setup
Install on macOS	Gatekeeper, Accessibility, Microphone permissions
Install on Windows	SmartScreen, antivirus exceptions, privacy settings
CLI reference	All commands and flags (incl. macros & vocabulary for custom voice commands)
Privacy statement	What stays on-device, what is never collected

Development

YazSes (Part 1) is a Python project managed with uv:

git clone https://github.com/MSKazemi/yazses
cd yazses
uv sync
uv run python -m pytest tests/ -v
bash scripts/install-local.sh        # install locally + run as a user service

Rust HCI exploration (archived)

The early-stage Rust rewrite lives on the archive/rust-hci-v1 branch, not on main. It is not built or installed by anything here — see Two versions of YazSes above for what it does and doesn't have. To look at it:

git checkout archive/rust-hci-v1
cargo build && cargo test --workspace   # optional backends: whisper, moonshine, llama-cpp, ollama, silero

License

Apache 2.0 — see LICENSE.

If YazSes is useful to you, a ⭐ on GitHub and a mention in your project, blog, or talk are the best ways to support continued development.

Project details

Maintainers

mskazemi

Release history

2.12.0.dev4 pre-release

Jul 31, 2026

This version

1.4.1

Jul 1, 2026

1.4.0

Jul 1, 2026

1.3.9

Jul 1, 2026

1.3.8

Jul 1, 2026

1.3.7

Jul 1, 2026

1.3.6

Jun 27, 2026

1.3.5

Jun 27, 2026

1.3.4

Jun 27, 2026

1.3.3

Jun 27, 2026

1.3.2

Jun 27, 2026

1.3.1

Jun 27, 2026

1.3.0

Jun 23, 2026

1.2.0

Jun 20, 2026

1.0.0

Jun 19, 2026

0.4.1

May 17, 2026

Download files

Source Distribution

yazses-1.4.1.tar.gz (3.3 MB)

Uploaded Jul 1, 2026 Source

Built Distribution

yazses-1.4.1-py3-none-any.whl (211.5 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file yazses-1.4.1.tar.gz.

File metadata

Download URL: yazses-1.4.1.tar.gz
Upload date: Jul 1, 2026
Size: 3.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yazses-1.4.1.tar.gz
Algorithm	Hash digest
SHA256	`17f6fac0a47bdb3680c665fa33d98cafb792fc913615f3b0e7673b797a3e346f`
MD5	`a096d895e3891758400e784082e63761`
BLAKE2b-256	`171462d6bc05a8c931b42231364c48b9194eb60803bc1fcbe9c4d89791fe45a7`
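A downloaded file can be checked against the SHA256 digest in the table with the standard library. This is a minimal sketch: the function names are hypothetical, and the expected value is the sdist digest copied from the table above.

```python
import hashlib

# SHA256 for yazses-1.4.1.tar.gz, copied from the hash table above.
EXPECTED_SDIST_SHA256 = (
    "17f6fac0a47bdb3680c665fa33d98cafb792fc913615f3b0e7673b797a3e346f"
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    import sys
    print("match" if verify(sys.argv[1]) else "MISMATCH")
```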

Provenance

The following attestation bundles were made for yazses-1.4.1.tar.gz:

Publisher: release.yml on MSKazemi/yazses

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: yazses-1.4.1.tar.gz
- Subject digest: 17f6fac0a47bdb3680c665fa33d98cafb792fc913615f3b0e7673b797a3e346f
- Sigstore transparency entry: 2041664078
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: MSKazemi/yazses@b078f5057abc5063e9413c4162b61257dcb42566
- Branch / Tag: refs/tags/v1.4.1
- Owner: https://github.com/MSKazemi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b078f5057abc5063e9413c4162b61257dcb42566
- Trigger Event: push

File details

Details for the file yazses-1.4.1-py3-none-any.whl.

File metadata

Download URL: yazses-1.4.1-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 211.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yazses-1.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8cea21f6b07d60bd1575ef333b5160929e98881f23a6a8bbb390ccb9525f7f4a`
MD5	`ff9abf50af903da767efadbba04eabe4`
BLAKE2b-256	`e1fef3370893514c84959c053670f583003c5f1c2be90310724286f2d84e23d1`

Provenance

The following attestation bundles were made for yazses-1.4.1-py3-none-any.whl:

Publisher: release.yml on MSKazemi/yazses

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: yazses-1.4.1-py3-none-any.whl
- Subject digest: 8cea21f6b07d60bd1575ef333b5160929e98881f23a6a8bbb390ccb9525f7f4a
- Sigstore transparency entry: 2041664378
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: MSKazemi/yazses@b078f5057abc5063e9413c4162b61257dcb42566
- Branch / Tag: refs/tags/v1.4.1
- Owner: https://github.com/MSKazemi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b078f5057abc5063e9413c4162b61257dcb42566
- Trigger Event: push
