Skip to main content

Communicate with your favorite AI model by talking to it.

Project description

Spych

PyPI version License: MIT PyPI Downloads

Spych (pronounced "speech"): Talk with your computer like it's your personal assistant without sending your voice to the cloud.

A lightweight, fully offline Python toolkit for wake word detection, audio transcription, spoken AI responses, and AI integrations. Built on faster-whisper, PvRecorder, and Kokoro.

API Docs: https://connor-makowski.github.io/spych/spych.html


Installation

Recommended: pipx

pipx install spych

Alternative: pip

pip install spych

TTS Extras

By default, Spych automatically installs the right TTS backend for your Python version. You can also install explicitly:

pipx install "spych[kokoro]"       # Fast, lightweight (Python < 3.13 recommended)
pipx install "spych[chatterbox]"   # High-quality voice cloning (Python >= 3.13 required)

Quick Start

# Navigate to your project directory first
cd ~/my_project

# Voice-control Claude Code — say "hey claude" to trigger
spych claude

# Use a personality preset — say "hey jarvis" to trigger
spych claude --personality jarvis

# Voice-control a local Ollama model — say "hey llama" to trigger
spych ollama --model llama3.2:latest

💡 Pro tip: Saying "Hey Claude" or "Hey Llama" tends to trigger more reliably than the bare wake word.

Say "terminate" (or press Ctrl+C) to stop any session.


CLI

Available Agents

All agents require their respective CLI tool to be installed and authenticated before use.

Command Alias Description Default wake words
spych claude_code_cli Voice-control Claude Code via the CLI claude, clod, cloud, clawed
spych claude_code_sdk spych claude Voice-control Claude Code via the Agent SDK claude, clod, cloud, clawed
spych codex_cli spych codex Voice-control the OpenAI Codex agent codex
spych gemini_cli spych gemini Voice-control the Google Gemini agent gemini, google
spych opencode_cli spych opencode Voice-control the OpenCode agent opencode, open code
spych ollama Talk to a local Ollama model llama, ollama, lama

Available Utilities

The following utilities are also available as CLI commands. They don't use wake words, but serve various auxiliary functions like live transcription and voice profiling.

Command Description
spych --version Print the version number and exit
spych --help Show detailed usage instructions and exit
spych live Continuous speech-to-text transcription to file
spych multi Run multiple agents simultaneously
spych profile_my_voice Record a voice sample for TTS cloning

Global Flags

These must be placed before the agent name:

spych --theme light claude
Flag Options Default Description
--theme dark, light, solarized, mono dark Terminal colour theme

Common Flags

All agent subcommands accept these flags:

Flag Default Description
--personality NAME Apply a named preset (sets wake words, voice, name, style)
--name NAME (agent default) Custom display name shown in the terminal
--wake-words WORD [...] (agent default) One or more words that trigger the agent
--terminate-words WORD [...] terminate Words that stop the listener
--listen-duration SECONDS 0 (VAD auto) Seconds to record after wake word
--follow-up-listen-duration SECONDS 0 Seconds to listen for a follow-up answer
--inactivity-timeout SECONDS 4.0 Seconds of silence before returning to wake word
--use-speaker BOOL true Speak responses aloud via TTS
--speaker-voice VOICE af_heart Voice name for spoken responses
--speaker-backend BACKEND (auto) chatterbox or kokoro
--response-style STYLE Style preset or custom instruction for spoken output

Coding agents (claude, codex, gemini, opencode) also accept:

Flag Default Description
--continue-conversation BOOL true Resume the most recent session
--show-tool-events BOOL true Print live tool start/end events

Agent-specific flags:

Agent Flag Default Description
ollama --model llama3.2:latest Ollama model name
ollama --history-length 10 Past interactions to include in context
ollama --host http://localhost:11434 Ollama instance URL
opencode_cli --model Model in provider/model format
claude_code_sdk --setting-sources user project local Claude Code settings sources

Personalities

Personalities are named presets that bundle a wake word list, voice, display name, and response style into a single flag. Any explicit flag overrides the preset.

spych claude --personality jarvis
# equivalent to:
spych claude --name "J.A.R.V.I.S." --wake-words jarvis jarves \
             --speaker-voice bm_george --use-speaker true \
             --response-style jarvis
Name Wake words Voice Style
assistant assistant, helper, computer af_heart assistant — helpful, precise, informative
friend friend, buddy, pal af_amy friendly — warm and simple
jarvis jarvis, jarves, jargus, jervis bm_george jarvis — precise, dry wit, "sir"
pirate blackbeard, pirate, ahoy am_michael pirate — pirate speak, colorful
news_anchor bella, news anchor, anchor af_bella news_anchor — professional broadcast tone
robot rob, robot am_adam robot — monotone, literal
caveman er, ur, caveman, cave man am_onyx caveman — very simple, direct

Response Styles

The --response-style flag shapes how the agent formats its spoken output.

Style Description
assistant Helpful and precise, concise and informative
concise Key points only, direct
friendly Warm, approachable, simple language
military Brevity-style, short sentences
five_year_old Simple words, very short
fast As brief as reasonably possible
pirate Pirate speak, colorful
news_anchor Professional broadcast tone
haiku 5-7-5 haiku form
shakespearean Elizabethan English
robot Monotone, literal
caveman Very simple, direct
yoda Inverted sentence structure
jarvis J.A.R.V.I.S. from Iron Man — precise, dry wit, addresses user as "sir"

You can also pass any custom instruction string directly: --response-style "Reply in exactly one sentence.".


Text-to-Speech & Voices

Spoken responses are enabled by default for personality presets and when --use-speaker true is set.

spych claude --use-speaker true --speaker-voice bm_george
spych claude --use-speaker true --speaker-backend kokoro
spych claude --use-speaker false   # disable TTS

When TTS is active, short responses are spoken verbatim; longer ones use the agent's short summary. If the response ends with a question, Spych automatically listens for a follow-up — no wake word required.

TTS Backends

Backend Best for Python support
Chatterbox (default priority) Natural voices, zero-shot voice cloning 3.11+ (required for 3.13+)
Kokoro (lightweight fallback) Fast, low-resource devices (e.g. Raspberry Pi) 3.11–3.12 recommended

Spych tries Chatterbox first, then Kokoro. Use --speaker-backend to force one explicitly.

Available Voices

The same voice names work for both backends.

American English (am_ / af_):

Voice Gender Grade
af_heart F A (default)
af_bella F A-
af_nicole F B-
am_michael M C+
am_fenrir M C+
am_puck M C+

British English (bm_ / bf_):

Voice Gender Grade
bf_emma F B-
bf_isabella F C
bm_george M C

Voice Cloning

Record a 10-second sample of your voice, then use it as the speaker voice. Requires the Chatterbox backend.

# Step 1: record your profile
spych profile_my_voice --name my_voice

# Step 2: use it
spych claude --use-speaker true --speaker-voice my_voice --speaker-backend chatterbox

# Or use any .wav file directly
spych claude --use-speaker true --speaker-voice /path/to/my_voice.wav --speaker-backend chatterbox

Live Transcription

spych live continuously records from the microphone using VAD and writes the transcript to disk in real time. No wake word required — it transcribes everything until stopped.

CLI

spych live                                                 # writes transcript.srt
spych live --output-path meeting --output-format both
spych live --terminate-words "stop recording"
spych live --no-timestamps --whisper-model small.en

Stop by pressing the stop key (default: q + Enter), saying a terminate word, or pressing Ctrl+C.

Parameters

Flag Default Description
--output-path PATH transcript Base output file path without extension
--output-format FORMAT srt txt, srt, or both
--no-timestamps false Omit timestamps from terminal and .txt output
--stop-key KEY q Key (then Enter) to stop the session
--terminate-words WORD [...] Spoken words that stop the session
--device-index N -1 Microphone device index; -1 uses system default
--whisper-model MODEL base.en faster-whisper model name
--whisper-device DEVICE cpu cpu or cuda
--whisper-compute-type TYPE int8 int8, float16, or float32
--no-speech-threshold FLOAT 0.3 Whisper segments above this no_speech_prob are dropped
--speech-threshold FLOAT 0.5 VAD speech onset probability
--silence-threshold FLOAT 0.35 VAD silence probability during speech
--silence-frames N 20 Consecutive silent frames to end a segment (~32ms each)
--speech-pad-frames N 5 Pre-roll frames and onset confirmation count
--max-speech-duration SECONDS 30.0 Hard cap on a single segment
--context-words N 32 Trailing words passed as whisper initial_prompt

Python

from spych.live import SpychLive

SpychLive(
    output_format="srt",          # "txt", "srt", or "both"
    output_path="my_transcript",  # written to my_transcript.srt
    show_timestamps=True,
    stop_key="q",                 # type q + Enter to stop
    terminate_words=["stop recording"],
).start()

SpychLive Parameters

Parameter Default Description
output_format "srt" Output format(s): "txt", "srt", or "both"
output_path "transcript" Base path without extension
show_timestamps True Prepend [HH:MM:SS] timestamps to terminal and .txt output
stop_key "q" Key (then Enter) to stop the session
terminate_words None Spoken words that stop the session
on_terminate None No-argument callback executed when a terminate word fires
device_index -1 Microphone device index; -1 uses system default
whisper_model "base.en" faster-whisper model name
whisper_device "cpu" Device for inference: "cpu" or "cuda"
whisper_compute_type "int8" Compute precision: "int8", "float16", or "float32"
no_speech_threshold 0.4 Whisper segments above this are discarded
speech_threshold 0.5 Silero VAD onset probability
silence_threshold 0.35 Silero VAD silence probability during speech
silence_frames_threshold 20 Consecutive silent frames to close a segment
speech_pad_frames 5 Pre-roll frame count and onset confirmation threshold
max_speech_duration_s 30.0 Hard cap on a single segment in seconds
context_words 32 Trailing transcript words passed as initial_prompt

Multi-agent

Run several agents simultaneously under a single listener, each bound to its own wake words. Say "hey claude" to talk to Claude, "hey llama" to talk to Ollama — all in the same terminal session.

CLI

# Two agents, default wake words
spych multi --agents claude gemini

# Include Ollama with a specific model
spych multi --agents claude ollama --ollama-model llama3.2:latest

# Tune listen duration across all agents
spych multi --agents claude codex --listen-duration 8

Multi-agent CLI Flags

Flag Default Description
--agents AGENT [...] (required) Agents to run: claude (claude_code_cli), claude_sdk (claude_code_sdk), codex (codex_cli), gemini (gemini_cli), opencode (opencode_cli), ollama
--terminate-words WORD [...] terminate Words that stop all agents
--listen-duration SECONDS 5 Seconds to listen after a wake word
--follow-up-listen-duration SECONDS 0 Seconds to listen for follow-up answers
--inactivity-timeout SECONDS 4.0 Seconds of silence before returning to wake word
--continue-conversation BOOL true Resume the most recent session for each coding agent
--show-tool-events BOOL true Print live tool start/end events
--use-speaker BOOL true Speak responses aloud via TTS
--speaker-backend BACKEND (auto) chatterbox or kokoro
--ollama-model MODEL llama3.2:latest Only used when ollama is in --agents
--ollama-host URL http://localhost:11434 Only used when ollama is in --agents
--ollama-history-length N 10 Only used when ollama is in --agents
--opencode-model MODEL provider/model format. Only used when opencode_cli is in --agents
--setting-sources SOURCE [...] user project local Only used when claude_code_sdk is in --agents

Python

from spych.core import Spych
from spych.orchestrator import SpychOrchestrator
from spych.agents.claude import LocalClaudeCodeCLIResponder
from spych.agents.ollama import OllamaResponder

spych_object = Spych(whisper_model="base.en")

SpychOrchestrator(
    entries=[
        {
            "responder": LocalClaudeCodeCLIResponder(spych_object=spych_object),
            "wake_words": ["claude", "clod", "cloud", "clawed"],
            "terminate_words": ["terminate"],
        },
        {
            "responder": OllamaResponder(spych_object=spych_object, model="llama3.2:latest"),
            "wake_words": ["llama", "ollama", "lama"],
        },
    ]
).start()

OrchestratorEntry Keys

Key Required Default Description
responder A BaseResponder instance
wake_words Words that trigger this responder. Must be unique across all entries
terminate_words ["terminate"] Words that stop the entire orchestrator

SpychOrchestrator Parameters

Parameter Default Description
entries (required) List of OrchestratorEntry dicts
spych_wake_kwargs None Extra kwargs forwarded to SpychWake

Python — Built-in Agents

The same agents available from the CLI can be used directly from Python.

Claude Code CLI

from spych.agents import claude_code_cli

# Say "hey claude" to trigger
claude_code_cli()

Claude Code SDK

from spych.agents import claude_code_sdk

# Say "hey claude" to trigger
claude_code_sdk()

Codex CLI

from spych.agents import codex_cli

# Say "hey codex" to trigger
codex_cli()

Gemini CLI

from spych.agents import gemini_cli

# Say "hey gemini" to trigger
gemini_cli()

OpenCode CLI

from spych.agents import opencode_cli

# Say "hey opencode" to trigger
opencode_cli()

Ollama

from spych.agents import ollama

# Pull the model first: ollama pull llama3.2:latest
# Say "hey llama" to trigger
ollama(model="llama3.2:latest")

Coding Agent Parameters

Parameter claude_code_cli claude_code_sdk codex_cli gemini_cli opencode_cli Description
name Claude Claude Codex Gemini OpenCode Custom display name
wake_words ["claude", "clod", "cloud", "clawed"] ["claude", "clod", "cloud", "clawed"] ["codex"] ["gemini", "google"] ["opencode", "open code"] Words that trigger the agent
terminate_words ["terminate"] ["terminate"] ["terminate"] ["terminate"] ["terminate"] Words that stop the listener
model None Model in provider/model format
listen_duration 0 0 0 0 0 Seconds to listen (0 = VAD auto)
continue_conversation True True True True True Resume the most recent session
setting_sources ["user", "project", "local"] Claude Code settings sources
show_tool_events True True True True True Print live tool start/end events
use_speaker False False False False False Speak responses aloud via TTS
speaker_voice "af_heart" "af_heart" "af_heart" "af_heart" "af_heart" Voice name for TTS
response_style "" "" "" "" "" Style preset or custom instruction
spych_kwargs Extra kwargs passed to Spych
spych_wake_kwargs Extra kwargs passed to SpychWake

Ollama Parameters

Parameter Default Description
name "Ollama" Custom display name
wake_words ["llama", "ollama", "lama"] Words that trigger the agent
terminate_words ["terminate"] Words that stop the listener
model "llama3.2:latest" Ollama model name
listen_duration 0 Seconds to listen (0 = VAD auto)
history_length 10 Past interactions to include in context
host "http://localhost:11434" Ollama instance URL
use_speaker False Speak responses aloud via TTS
speaker_voice "af_heart" Voice name for TTS
response_style "" Style preset or custom instruction
spych_kwargs None Extra kwargs passed to Spych
spych_wake_kwargs None Extra kwargs passed to SpychWake

Python: Building Your Own Agent

Subclass BaseResponder, implement respond, and Spych handles the rest: wake word detection, transcription, spinner UI, timing, TTS, error handling.

respond() must return an AgentResponse. Use self.format_prompt() to inject the JSON schema into your prompt and self.parse_output() to parse the result:

from spych.responders import BaseResponder, AgentResponse

class MyResponder(BaseResponder):
    def respond(self, user_input: str) -> AgentResponse:
        raw = call_my_llm(self.format_prompt(user_input))
        return self.parse_output(raw)

A complete working example with a custom wake word:

from spych import Spych, SpychOrchestrator
from spych.responders import BaseResponder, AgentResponse

class EchoResponder(BaseResponder):
    def respond(self, user_input: str) -> AgentResponse:
        return AgentResponse(
            response=f"'{self.name}' heard: {user_input}",
            summary=f"Heard: {user_input}",
            requires_user_feedback=False,
        )

SpychOrchestrator(
    entries=[
        {
            "responder": EchoResponder(
                spych_object=Spych(whisper_model="base.en"),
                listen_duration=5,
                name="TestResponder",
            ),
            "wake_words": ["test"],
            "terminate_words": ["terminate"],
        }
    ]
).start()

You can also subclass a built-in agent. For example, a translation agent that routes to Ollama:

from spych import Spych, SpychOrchestrator
from spych.agents import OllamaResponder
from spych.responders import AgentResponse

class Spanish(OllamaResponder):
    def respond(self, user_input: str) -> AgentResponse:
        user_input = f"Translate the following to Spanish and return only the translated text: '{user_input}'"
        return super().respond(user_input)

class German(OllamaResponder):
    def respond(self, user_input: str) -> AgentResponse:
        user_input = f"Translate the following to German and return only the translated text: '{user_input}'"
        return super().respond(user_input)

spych_object = Spych(whisper_model="base.en")

SpychOrchestrator(
    entries=[
        {
            "responder": Spanish(spych_object=spych_object, name="SpanishTranslator", model="llama3.2:latest"),
            "wake_words": ["spanish"],
            "terminate_words": ["terminate"],
        },
        {
            "responder": German(spych_object=spych_object, name="GermanTranslator", model="llama3.2:latest"),
            "wake_words": ["german"],
            "terminate_words": ["terminate"],
        },
    ]
).start()

Think your agent would be useful to others? Open a PR or file a feature request via a GitHub issue.


Python: Lower-Level API

Need more control? Use Spych and SpychWake directly.

Transcription

from spych import Spych

spych = Spych(
    whisper_model="base.en",  # tiny, small, medium, large — all faster-whisper models work
    whisper_device="cpu",     # use "cuda" for Nvidia GPU
)

print(spych.listen(duration=5))

See: https://connor-makowski.github.io/spych/spych/core.html

Wake Word Detection

from spych import SpychWake, Spych

spych = Spych(whisper_model="base.en", whisper_device="cpu")

def on_wake():
    print("Wake word detected! Listening...")
    print(spych.listen(duration=5))

SpychWake(
    wake_word_map={"speech": on_wake},
    whisper_model="tiny.en",
    whisper_device="cpu",
).start()

See: https://connor-makowski.github.io/spych/spych/wake.html


API Reference

Full docs including all parameters and methods: https://connor-makowski.github.io/spych/spych.html


Support

Found a bug or want a new feature? Open an issue on GitHub.


Contributing

Contributions are welcome!

  1. Fork the repo and clone it locally.
  2. Make your changes.
  3. Run tests and make sure they pass.
  4. Commit atomically with clear messages.
  5. Submit a pull request.

Virtual environment setup:

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./utils/test.sh

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spych-4.0.0.tar.gz (72.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

spych-4.0.0-py3-none-any.whl (88.7 kB view details)

Uploaded Python 3

File details

Details for the file spych-4.0.0.tar.gz.

File metadata

  • Download URL: spych-4.0.0.tar.gz
  • Upload date:
  • Size: 72.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for spych-4.0.0.tar.gz
Algorithm Hash digest
SHA256 aede89e974fd871e6846d0532c133be061fc9dd2a1decd5fbf68200a6a64d35d
MD5 a825f91f198ee16c63d85c0763f44d29
BLAKE2b-256 1d93560482d29b6347cab4dfed0bc2330a7ec65312483e91300adcaa0edbd75f

See more details on using hashes here.

File details

Details for the file spych-4.0.0-py3-none-any.whl.

File metadata

  • Download URL: spych-4.0.0-py3-none-any.whl
  • Upload date:
  • Size: 88.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for spych-4.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5c9731755a4166bf7f080147dfe0f1194451e360f8040765a7e240e4a0539bc0
MD5 2bdd1098de8baf92c3e3b186585705ec
BLAKE2b-256 7c35344ce26489b18b91f1bc04b806c1df8333a74aeb532a1ba30ed23ff99c5c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page