
Local, offline voice dictation for Linux, macOS, and Windows — hold a key, speak, release


YazSes

Local, offline voice dictation for Linux, macOS, and Windows. Hold a key, speak, release — the transcribed text appears in whatever app is focused. No cloud, no GPU.


Hold the dictation key (>0.5s) → speak → release → text appears

Powered by faster-whisper (CPU/int8). Works in browsers, terminals, IDEs, chat apps — anywhere the OS lets keystrokes reach the focused window.
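The hold-versus-tap decision in the flow above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the `HoldDetector` class and its method names are hypothetical, and it decides on release, whereas the real daemon presumably starts recording as soon as the threshold is crossed.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_THRESHOLD_S = 0.5  # mirrors the ">0.5s" in the flow above


@dataclass
class HoldDetector:
    """Hypothetical helper: classify a key press as a dictation hold or a tap."""

    threshold_s: float = HOLD_THRESHOLD_S
    _pressed_at: Optional[float] = None

    def press(self, now: float) -> None:
        # OS key-repeat sends extra press events; only the first starts the timer.
        if self._pressed_at is None:
            self._pressed_at = now

    def release(self, now: float) -> str:
        # "dictate" for a hold longer than the threshold, otherwise "tap".
        if self._pressed_at is None:
            return "tap"
        held = now - self._pressed_at
        self._pressed_at = None
        return "dictate" if held > self.threshold_s else "tap"


detector = HoldDetector()
detector.press(now=10.0)
print(detector.release(now=10.2))  # tap
detector.press(now=20.0)
detector.press(now=20.1)           # key-repeat event, ignored
print(detector.release(now=21.0))  # dictate
```

Treating short presses as ordinary taps matters most where the default hotkey is also a typing key, such as Space on Linux.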


Supported platforms

OS      | Hotkey default | Install                                    | Status
Linux   | Space          | apt / snap / PPA / pipx / .deb / installer | Stable
macOS   | Right Option   | .dmg (Homebrew Cask coming)                | Developer preview (unsigned)
Windows | Right Ctrl     | .exe installer (winget coming)             | Developer preview (unsigned)

Why Right Ctrl on Windows, not Right Alt? On many international layouts Right Alt acts as AltGr, which is used to type @, {}, [], \, ~, etc. Hijacking it would break normal typing. Right Ctrl is rarely used for typing, so it's the safer default. Every platform's hotkey is configurable in config.toml.


Quick install

One-line install on every major OS:

# macOS  — via Homebrew tap
brew tap novafabric/yazses && brew install --cask yazses

# Windows  — via winget (pending PR review at microsoft/winget-pkgs#371427)
winget install NovaFabric.YazSes

# Linux  — via the apt repo
bash <(curl -fsSL https://raw.githubusercontent.com/novafabric/yazses/main/install.sh)

# Cross-platform fallback: pipx
pipx install yazses

After install:

OS      | What's left
macOS   | Right-click → Open the first time (unsigned dev preview); grant Accessibility + Microphone when prompted; hold Right Option to dictate.
Windows | If SmartScreen warns, click More info → Run anyway (unsigned dev preview); hold Right Ctrl to dictate.
Linux   | sudo usermod -aG input "$USER" then re-login; systemctl --user enable --now yazses.service; hold Space to dictate.

Full per-OS guides: docs/macos-install.md, docs/windows-install.md. Status of every distribution channel lives in docs/distribution-status.md.

Other channels

If a one-liner above doesn't fit your environment, pick from the platform sections below.

macOS — alternatives

# Direct .dmg download (no Homebrew needed)
# https://github.com/novafabric/yazses/releases/latest
# Open the .dmg, drag YazSes.app into /Applications, right-click → Open the first time.

Windows — alternatives

# Direct .exe download
# https://github.com/novafabric/yazses/releases/latest
# Click "More info → Run anyway" if SmartScreen warns.

Linux — alternatives

# APT repo (Debian/Ubuntu)
curl -fsSL https://novafabric.github.io/yazses/apt/KEY.gpg \
  | sudo gpg --dearmor --yes -o /usr/share/keyrings/yazses.gpg
echo "deb [signed-by=/usr/share/keyrings/yazses.gpg] https://novafabric.github.io/yazses/apt ./" \
  | sudo tee /etc/apt/sources.list.d/yazses.list
sudo apt update && sudo apt install yazses

# Launchpad PPA (Ubuntu)
sudo add-apt-repository ppa:novafabric/yazses
sudo apt update && sudo apt install yazses

# Snap (works on most distros after `snapd` is installed)
sudo snap install yazses --classic

# AUR (Arch / Manjaro / EndeavourOS)
yay -S yazses          # any AUR helper

# .deb download
# https://github.com/novafabric/yazses/releases/latest
sudo apt install ./yazses_*.deb

# pipx (any Linux; the dependency line below is for Debian/Ubuntu, use your distro's package manager elsewhere)
sudo apt install libportaudio2 xdotool xclip pipx
pipx install yazses

Optional extras

v0.4.0 introduces three opt-in feature groups. Install only what you need:

Extra       | What it enables                                           | Dependencies installed
yazses[slm] | SLM intent routing: natural phrasing for voice commands   | llama-cpp-python + GGUF model
yazses[lsp] | LSP code context injection: better identifier accuracy    | pygls, pynvim
yazses[emg] | EMG silent speech backend: dictate without speaking aloud | pyserial
yazses[all] | All optional extras                                       | all of the above

pip install "yazses[slm]"        # SLM routing only
pip install "yazses[lsp]"        # LSP context only
pip install "yazses[emg]"        # EMG backend only
pip install "yazses[all]"        # everything

Each extra requires additional setup described in the Configuration section below.
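To see which extras are usable in the current environment, one option is to probe for the modules their dependencies provide. The sketch below is illustrative and not part of YazSes: `EXTRA_MODULES`, `missing_modules`, and `installed_extras` are hypothetical names, and the mapping relies only on the import names of the listed packages (`llama_cpp` for llama-cpp-python, `serial` for pyserial).

```python
from importlib.util import find_spec

# Extra name -> top-level modules its dependencies provide.
EXTRA_MODULES = {
    "slm": ["llama_cpp"],        # llama-cpp-python
    "lsp": ["pygls", "pynvim"],
    "emg": ["serial"],           # pyserial
}


def missing_modules(extra: str) -> list:
    """Return the modules for one extra that cannot be found on the import path."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


def installed_extras() -> dict:
    """Map each extra to True when all of its modules can be found."""
    return {extra: not missing_modules(extra) for extra in EXTRA_MODULES}


for extra, ok in installed_extras().items():
    print(f"yazses[{extra}]: {'available' if ok else 'not installed'}")
```

The check only confirms that the Python packages are present; the GGUF model file and any EMG hardware still have to be set up separately.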


Usage

YazSes runs silently in the background. The same CLI works on every platform.

Command                         | What it does
Hold the hotkey, speak, release | Transcribe and inject text into focused app
yazses status                   | Daemon state, model, hotkey, backend, uptime
yazses start / stop             | Manage the daemon
yazses doctor                   | Per-platform prerequisite check
yazses inject "hello"           | Type text without recording (debug)
yazses remote <host>            | Forward voice typing to a remote SSH host
yazses remote --stop            | Disconnect active remote session
yazses enroll                   | Calibration wizard for VAD / silence settings

On macOS and Windows the YazSes tray icon changes color to reflect state (idle / recording / transcribing / remote / error).

Voice commands (v0.4.0)

Speak natural commands while [commands] enabled = true (default). v0.4.0 adds a Tier 2 SLM routing layer (requires yazses[slm]) that handles natural, varied phrasing — you no longer need to say the exact canonical form:

Say (examples)                             | Action
"undo" / "undo 3 times"                    | Ctrl+Z (×N)
"save file" / "save this" / "save it"      | Ctrl+S
"delete 2 words"                           | Ctrl+Backspace ×2
"delete 3 lines"                           | Delete 3 lines
"go to line 42"                            | Ctrl+G → "42" → Enter
"comment selection"                        | Ctrl+/
"copy" / "paste"                           | Ctrl+C / Ctrl+V
"scratch that" / "delete that"             | Remove text back to last sentence
"close this tab" / "close the current tab" | Ctrl+W (SLM Tier 2)
"zoom in" / "make this bigger"             | Ctrl++ (SLM Tier 2)

Without yazses[slm], the Tier 1 regex grammar handles a fixed set of canonical phrases. With it, the SLM layer catches anything the regex misses, at the cost of ~50–200 ms additional latency per utterance.

Everything that does not match a command intent is typed verbatim.
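The Tier 1 idea (match a fixed set of canonical phrases, otherwise type the text verbatim) can be sketched with a small regex table. This is an assumption-laden illustration rather than the shipped grammar: the patterns, the key-name strings such as "ctrl+z", and the `classify` / `route` functions are all hypothetical, and it handles only digit counts, not spelled-out numbers.

```python
import re
from typing import Callable, List, Optional, Tuple

Rule = Tuple["re.Pattern", Callable[["re.Match"], List[str]]]

# Hypothetical canonical phrases, each mapped to a list of key actions.
RULES: List[Rule] = [
    (re.compile(r"undo(?: (\d+) times)?"),
     lambda m: ["ctrl+z"] * int(m.group(1) or 1)),
    (re.compile(r"delete (\d+) words?"),
     lambda m: ["ctrl+backspace"] * int(m.group(1))),
    (re.compile(r"go to line (\d+)"),
     lambda m: ["ctrl+g", m.group(1), "enter"]),
    (re.compile(r"save file"),
     lambda m: ["ctrl+s"]),
]


def classify(utterance: str) -> Optional[List[str]]:
    """Return key actions for a canonical command, or None if nothing matches."""
    # Transcripts tend to arrive capitalized and punctuated, so normalize first.
    text = utterance.strip().lower().rstrip(".!?")
    for pattern, action in RULES:
        match = pattern.fullmatch(text)
        if match:
            return action(match)
    return None


def route(utterance: str) -> Tuple[str, object]:
    """Commands become key actions; everything else is typed verbatim."""
    keys = classify(utterance)
    return ("keys", keys) if keys is not None else ("text", utterance)


print(route("Undo 3 times."))        # ('keys', ['ctrl+z', 'ctrl+z', 'ctrl+z'])
print(route("go to line 42"))        # ('keys', ['ctrl+g', '42', 'enter'])
print(route("I will undo it later")) # ('text', 'I will undo it later')
```

Using `fullmatch` rather than `search` keeps ordinary sentences that merely contain a command word from being swallowed as commands. A Tier 2 router would sit where `classify` returns None.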


Configuration

config.toml lives in the platform's standard config dir:

OS      | Path
Linux   | ~/.config/yazses/config.toml
macOS   | ~/Library/Application Support/yazses/config.toml
Windows | %APPDATA%\yazses\config.toml

[stt]
model = "tiny.en"   # tiny.en (fast) | base.en (more accurate, slower)

[hotkey]
# "auto" → Space (Linux) / right_option (macOS) / right_ctrl (Windows).
key = "auto"
hold_threshold_ms = 500

[audio]
sample_rate = 16000
max_record_seconds = 90

[tray]
enabled = "auto"   # default true on macOS/Windows, false on Linux v0

[general]
log_level = "INFO"

# --- v0.3.0 additions (all optional — defaults shown) ---

[commands]
enabled = true          # voice command grammar (undo, save, go to line N, …)
profile = "auto"        # "auto" | "vscode" | "vim" | "default"

[filters.disfluency]
enabled = true          # remove filler words, repeated phrases, "scratch that"

[accessibility]
vad_threshold = 0.01    # silence threshold — run `yazses enroll` to calibrate
min_silence_ms = 500    # minimum silence to end a recording
pre_speech_padding_ms = 200   # prepend ring-buffer audio to catch voice onset

[streaming]
enabled = true          # emit stable partial transcripts while you speak
partial_interval_ms = 300

[remote]
default_host = ""       # SSH host for `yazses remote`
ssh_port = 22
agent_port = 9875
key_file = ""           # path to SSH private key (optional)

# --- v0.4.0 additions (all optional — defaults shown) ---

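[commands]
# Same table as the v0.3.0 [commands] block above. TOML rejects a table that is
# declared twice, so merge the keys below into that block rather than copying
# this header a second time.
# existing fields above…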

# Tier 2 SLM routing (optional; requires `pip install yazses[slm]`)
# Download a GGUF model separately — TinyLlama (~700 MB) or Phi-3-mini (~2.2 GB).
slm_model_path = ""              # e.g. ~/.cache/yazses/models/tinyllama.gguf
slm_confidence_threshold = 0.75  # fall back to verbatim text below this score

# LSP code context injection (optional; requires `pip install yazses[lsp]`)
# Connects to Neovim or VS Code via LSP and feeds the active file's language,
# scope, and identifier list into Whisper's initial_prompt — significantly
# improves transcription accuracy for code identifiers spoken aloud.
lsp_enabled = false
lsp_editor = "auto"              # auto | neovim | vscode

[emg]
# EMG silent speech backend (optional; requires `pip install yazses[emg]` + device)
# Supported devices: YESP-protocol USB serial EMG headphones/wristbands.
# When active, replaces the hotkey-hold trigger — muscle signals start/stop capture.
device_port = ""                 # e.g. /dev/ttyUSB0, COM3
baud_rate = 115200
mode = "command"                 # command | full_text

# Map EMG gesture labels to voice-command strings (processed by the same
# grammar/SLM pipeline as spoken commands):
# [emg.command_map]
# save = "save file"
# undo = "undo"

How it works

                   ┌─────────────────────┐
                   │  EMG backend        │  ← v0.4.0 (optional)
                   │  (YESP USB serial)  │
                   └──────────┬──────────┘
                              │ (alternative trigger)
┌──────────────┐   ┌──────────▼───────┐   ┌──────────────────────────────┐
│ Hotkey hook  │──▶│ Audio (16kHz     │──▶│ faster-whisper (CPU / int8)  │
│ (per-OS API) │   │  PortAudio)      │   │                              │◀──┐
└──────────────┘   └──────────────────┘   └──────────────┬───────────────┘   │
                                                         │                   │
                                          ┌──────────────▼───────────────┐   │
                                          │  LspContextProvider          │───┘
                                          │  (injects initial_prompt)    │  ← v0.4.0 (optional)
                                          └──────────────────────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  disfluency filter           │  ← v0.3.0
                                          │  clean_text                  │
                                          └──────────────┬───────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  Tier 1: grammar classifier  │  ← v0.3.0
                                          │  (regex, zero latency)       │
                                          └──────────────┬───────────────┘
                                                         │ (unmatched intents)
                                          ┌──────────────▼───────────────┐
                                          │  Tier 2: SLM router          │  ← v0.4.0 (optional)
                                          │  (llama-cpp-python + GGUF)   │
                                          └──────────────┬───────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  Text injector               │
                                          │  (local or SSH remote)       │  ← v0.3.0
                                          └──────────────────────────────┘
       │
       └─────────── daemon process ──────────────────────────────────────
                          ▲
              JSON-RPC over Unix socket / named pipe
                          │
                ┌─────────┴─────────┐
                │     CLI / tray    │
                └───────────────────┘
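The CLI-to-daemon link at the bottom of the diagram is described only as JSON-RPC over a Unix socket or named pipe. A minimal sketch of the message shapes, assuming JSON-RPC 2.0 and newline-delimited framing (the framing, the method names "status" and "inject", and both helper functions are assumptions, not the documented wire format):

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)


def encode_request(method: str, params: Optional[dict] = None) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message).encode("utf-8") + b"\n"


def decode_response(line: bytes) -> object:
    """Return the result of a JSON-RPC 2.0 response, raising on an error reply."""
    message = json.loads(line)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"RPC error {error['code']}: {error['message']}")
    return message["result"]


# A canned reply stands in for the daemon here.
print(encode_request("inject", {"text": "hello"}))
fake_reply = b'{"jsonrpc": "2.0", "id": 1, "result": {"state": "idle"}}\n'
print(decode_response(fake_reply))  # {'state': 'idle'}
```

On Linux and macOS the bytes would be written to a socket opened with `socket.AF_UNIX`; the Windows named-pipe side is not shown.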

Every platform-specific surface (keyboard hook, text injection, autostart, IPC, paths, permissions, tray) lives behind a single Protocol-based abstraction in src/yazses/platform/. Adding another platform is a matter of writing one more sub-package.
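A Protocol-based abstraction of this kind might look like the sketch below. The interface is invented for illustration: `TextInjector`, `type_text`, `RecordingInjector`, and `deliver` are hypothetical names, and the real definitions in src/yazses/platform/ may differ.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class TextInjector(Protocol):
    """Hypothetical per-platform surface: anything that can type text."""

    def type_text(self, text: str) -> None: ...


class RecordingInjector:
    """Stand-in backend that records text instead of sending keystrokes."""

    def __init__(self) -> None:
        self.typed: List[str] = []

    def type_text(self, text: str) -> None:
        self.typed.append(text)


def deliver(injector: TextInjector, transcript: str) -> None:
    """Platform-neutral code depends only on the Protocol, not on a backend."""
    injector.type_text(transcript)


backend = RecordingInjector()
deliver(backend, "hello world")
print(backend.typed)                      # ['hello world']
print(isinstance(backend, TextInjector))  # True
```

Because `typing.Protocol` uses structural typing, a new backend only has to provide the right methods; it does not need to inherit from a shared base class.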

Remote voice forwarding (v0.3.0)

Local machine                             Remote machine
─────────────────────────────────────     ──────────────────────
microphone → daemon → transcript ──SSH──▶ yazses-agent → injector
                                  tunnel       (types into remote app)

Start: yazses remote user@remote-host
Stop: yazses remote --stop

Only the transcript text travels over SSH — audio never leaves the local machine.
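One plausible way to carry that transcript channel is an SSH local port forward built from the [remote] settings. The sketch below only assembles the command line; whether YazSes actually shells out to `ssh` or forwards a port this way is an assumption, and `tunnel_argv` is a hypothetical helper. The flags themselves (`-N`, `-L`, `-p`, `-i`) are standard OpenSSH options.

```python
from typing import List


def tunnel_argv(host: str, ssh_port: int = 22, agent_port: int = 9875,
                key_file: str = "") -> List[str]:
    """Build an ssh command that forwards a local port to the remote agent."""
    argv = [
        "ssh",
        "-N",                                         # no remote command, tunnel only
        "-L", f"{agent_port}:127.0.0.1:{agent_port}",  # local port -> remote agent port
        "-p", str(ssh_port),
    ]
    if key_file:
        argv += ["-i", key_file]                      # optional private key
    argv.append(host)
    return argv


print(" ".join(tunnel_argv("user@remote-host")))
# ssh -N -L 9875:127.0.0.1:9875 -p 22 user@remote-host
```

With a tunnel like this in place, the local daemon would write transcript text to 127.0.0.1 on the agent port, and the audio would never need to cross the network.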

LSP code context injection (v0.4.0)

When lsp_enabled = true, YazSes connects to the running Neovim or VS Code LSP server and queries the active buffer for its language, current scope, and visible symbol names. These are passed to faster-whisper as initial_prompt, biasing the model toward the identifiers actually present in the file. In practice this removes most transcription errors on camelCase and snake_case names spoken aloud.

Requires pip install yazses[lsp] and a running editor with an active LSP session.
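Turning editor context into a prompt can be sketched as plain string assembly. The wording of the prompt, the 800-character cap, and the `build_initial_prompt` function are assumptions made for illustration; the only library fact relied on is that faster-whisper's `WhisperModel.transcribe` accepts an `initial_prompt` argument.

```python
from typing import Iterable


def build_initial_prompt(language: str, scope: str, symbols: Iterable[str],
                         max_chars: int = 800) -> str:
    """Join language, scope, and unique symbol names into one short prompt."""
    prompt = f"{language} code, in {scope}. Identifiers:"
    seen = set()
    for name in symbols:
        if name in seen:
            continue  # repeated names add length without adding information
        seen.add(name)
        candidate = f"{prompt} {name},"
        if len(candidate) > max_chars:
            break     # Whisper only uses a limited prompt window
        prompt = candidate
    return prompt.rstrip(",")


prompt = build_initial_prompt(
    "Python", "class AudioBuffer", ["sample_rate", "readFrames", "sample_rate"]
)
print(prompt)
# Python code, in class AudioBuffer. Identifiers: sample_rate, readFrames

# With faster-whisper installed, the prompt would be passed along like this:
# segments, info = model.transcribe(audio, initial_prompt=prompt)
```

Keeping the prompt short and free of duplicates leaves more of the limited prompt window for names the speaker is likely to say.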

EMG silent speech (v0.4.0)

When an EMG device is configured, muscle-signal onset/offset replaces the hotkey-hold trigger. Audio is captured normally; the user does not need to speak aloud — the EMG envelope alone gates recording. This is useful in open-plan offices or wherever speaking is impractical.

Supported protocol: YESP (USB CDC serial). Hardware examples: YESP-1 EMG headband, compatible wristbands. The mode = "full_text" setting attempts continuous dictation; mode = "command" maps gesture labels via [emg.command_map].

Requires pip install yazses[emg] and a compatible device.
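The onset/offset gating described above can be illustrated with a two-threshold (hysteresis) gate over envelope samples. Everything here is an assumption: the YESP protocol and the real detector are not documented in this README, and `gate_segments` with its threshold values is a hypothetical stand-in.

```python
from typing import List, Sequence, Tuple


def gate_segments(envelope: Sequence[float], on_threshold: float,
                  off_threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs during which the gate is open.

    The gate opens when the envelope reaches on_threshold and closes only
    when it drops below the lower off_threshold, which avoids rapid toggling
    around a single threshold.
    """
    segments = []
    start = None
    for index, value in enumerate(envelope):
        if start is None and value >= on_threshold:
            start = index                      # onset: begin capture
        elif start is not None and value < off_threshold:
            segments.append((start, index))    # offset: end capture
            start = None
    if start is not None:
        segments.append((start, len(envelope)))  # still active at end of data
    return segments


samples = [0.0, 0.1, 0.6, 0.5, 0.35, 0.1, 0.0, 0.7]
print(gate_segments(samples, on_threshold=0.5, off_threshold=0.3))
# [(2, 5), (7, 8)]
```

Each returned pair would play the role that key-down and key-up play for the hotkey trigger.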


Build from source

git clone https://github.com/novafabric/yazses
cd yazses
uv sync
uv run pytest tests/ -v   # 246 tests across all platforms

Platform-specific installers:

# macOS — produces dist/YazSes-<v>.dmg
./scripts/build-macos.sh

# Windows — produces dist/YazSes-<v>-windows-x64.exe
./scripts/build-windows.ps1

# Linux .deb
./scripts/build-deb.sh

CI builds the unsigned .dmg and .exe on every PR that touches the relevant code paths.


Troubleshooting

  • yazses doctor — first stop. Tells you what's missing on the current OS.
  • macOS: see docs/macos-install.md for Gatekeeper / Accessibility / Microphone.
  • Windows: see docs/windows-install.md for SmartScreen / antivirus / privacy.
  • Linux: confirm you're in the input group; check journalctl --user -u yazses.service -f.

License

Apache 2.0 — see LICENSE.
