Offline TTS library using Kokoro-82M

stackvox

Offline text-to-speech using Kokoro-82M via kokoro-onnx. The model is Apache 2.0 licensed, about 340 MB, runs in real time on CPU, and plays straight to system audio. stackvox can be imported as a Python library, driven as a CLI, or reached over a unix socket for ~13ms speech requests from shell scripts.

Install

From PyPI — recommended for most users:

pipx install stackvox    # global CLI (`stackvox` and `stackvox-say` on PATH)
# or
pip install stackvox     # use as a library

From git, if you want an unreleased commit:

pipx install git+https://github.com/StackOneHQ/stackvox.git
# upgrade later with: pipx install --force git+https://github.com/StackOneHQ/stackvox.git

Dev install from a clone:

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Model + voice files auto-download to ~/.cache/stackvox/ on first use. Override with STACKVOX_CACHE_DIR.

CLI

stackvox "Hello world"              # synthesize and play in-process
stackvox speak "Hi" --voice bf_emma # same, explicit subcommand
stackvox speak "save" --out a.wav   # write wav instead of playing
stackvox welcome                    # multilingual welcome (6 languages)
stackvox voices                     # list all voice ids

Daemon mode (keeps the model resident, so subsequent calls skip the model load):

stackvox serve         # foreground; run with `nohup stackvox serve &` to background
stackvox status        # is the daemon up?
stackvox say "Hello"   # send text to the daemon (fails if not running)
stackvox stop          # graceful shutdown

`stackvox-say` (bash helper, ~13ms)

When you want minimum latency from shell scripts (hooks, CI steps, etc.), skip the Python client and use the bash helper — it talks directly to the daemon's unix socket via nc:

stackvox-say "back to you in 5"
stackvox-say --voice bf_emma --speed 1.1 "hello"
stackvox-say --fallback-say "text"     # shell out to macOS `say` if daemon is down

Exit codes: 0 ok, 2 daemon unreachable (unless --fallback-say was given).
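
The exit codes above can also be checked from Python when a script shells out to the helper. The following is a minimal sketch, not part of stackvox: `describe_exit` and `say_via_helper` are hypothetical names, and the only facts taken from this README are the `stackvox-say` command, its `--voice` flag, and the two documented exit codes.

```python
import subprocess

# Exit codes as documented above: 0 ok, 2 daemon unreachable.
EXIT_OK = 0
EXIT_UNREACHABLE = 2


def describe_exit(code: int) -> str:
    """Map a stackvox-say exit code to a short human-readable status."""
    if code == EXIT_OK:
        return "ok"
    if code == EXIT_UNREACHABLE:
        return "daemon unreachable"
    # Any other code is not documented in this README.
    return f"unexpected exit code {code}"


def say_via_helper(text, voice=None, run=subprocess.run):
    """Invoke stackvox-say (assumed to be on PATH) and report its status.

    `run` is injectable so the function can be exercised without the helper
    installed; by default it is subprocess.run.
    """
    cmd = ["stackvox-say"]
    if voice is not None:
        cmd += ["--voice", voice]
    cmd.append(text)
    result = run(cmd, check=False)
    return describe_exit(result.returncode)
```

A caller could branch on the returned string, for instance starting `stackvox serve` when the status is "daemon unreachable".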

Python library

from stackvox import Stackvox, speak, synthesize

# One-shot — model loads on first call, reused for subsequent calls.
speak("Hello world")

# Reusable engine.
tts = Stackvox(voice="af_bella")
tts.speak("First line")
tts.speak("Faster", speed=1.2)

# Non-blocking playback.
tts.speak("async", blocking=False)
tts.stop()

# Raw samples for custom processing.
samples, sr = tts.synthesize("give me the array")

# Gapless multi-line playback with concurrent synthesis.
tts.speak_sequence([
    {"text": "Hello", "voice": "af_heart", "lang": "en-us"},
    {"text": "Bonjour", "voice": "ff_siwis", "lang": "fr-fr"},
])
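
As one illustration of custom processing on the raw samples, the sketch below writes them to a 16-bit PCM WAV file using only the standard library. It assumes `samples` is a mono sequence of floats in the range [-1.0, 1.0] and `sr` is the sample rate in Hz; the README does not state the exact array format, so treat both as assumptions. `write_wav` is a hypothetical helper, not a stackvox function.

```python
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)        # mono (assumption)
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(int(sample_rate))
        wf.writeframes(bytes(frames))
```

With the engine from the listing above, this would be called as `write_wav("out.wav", *tts.synthesize("give me the array"))`.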

Daemon client from Python

from stackvox import daemon

ok, resp = daemon.say("queue this via the running daemon")
if daemon.is_running():
    daemon.stop()

Voices

Kokoro ships voices in several languages. The two-letter voice prefix encodes language and gender:

Prefix	Language	Example
`af_`, `am_`	American English	`af_heart`, `am_michael`
`bf_`, `bm_`	British English	`bf_emma`, `bm_fable`
`ff_*`	French	`ff_siwis`
`hf_`, `hm_`	Hindi	`hf_alpha`, `hm_omega`
`if_`, `im_`	Italian	`if_sara`, `im_nicola`
`pf_`, `pm_`	Portuguese	`pf_dora`, `pm_alex`
`ef_`, `em_`	Spanish	`ef_dora`, `em_alex`
`jf_`, `jm_`	Japanese	`jf_alpha`
`zf_`, `zm_`	Mandarin Chinese	`zf_xiaoxiao`

Run `stackvox voices` for the authoritative list.
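
The prefix scheme in the table can be decoded mechanically. The sketch below is an illustration only: the language map is copied from the table, while reading the second letter as `f` = female and `m` = male is an inference from the table rather than something this README states. `describe_voice` is a hypothetical helper.

```python
# First prefix letter -> language, taken from the table above.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Portuguese",
    "e": "Spanish",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}

# Second prefix letter -> gender (assumption inferred from the table).
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> tuple[str, str]:
    """Return (language, gender) for a voice id such as "bf_emma"."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a prefixed voice id: {voice_id!r}")
    lang_key, gender_key = prefix
    if lang_key not in LANGUAGES or gender_key not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return LANGUAGES[lang_key], GENDERS[gender_key]
```

This only interprets the prefix; it does not check that the voice actually exists, which is what `stackvox voices` is for.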

Architecture

┌────────────────────┐      unix socket           ┌─────────────────────────┐
│  stackvox-say      │ ───────────────────────▶   │  stackvox daemon        │
│  (bash, ~13ms)     │   JSON line per request    │  (Python, long-lived)   │
└────────────────────┘                            │                         │
┌────────────────────┐      ~500ms (Py startup)   │  preloaded Kokoro ONNX  │
│  stackvox say      │ ───────────────────────▶   │  worker thread playback │
│  (Python client)   │                            │  → sounddevice → audio  │
└────────────────────┘                            └─────────────────────────┘
┌────────────────────┐
│  stackvox speak    │   loads model in-process, plays, exits
│  (one-shot CLI)    │
└────────────────────┘

Socket lives at ~/.cache/stackvox/daemon.sock (override with STACKVOX_SOCKET for the client, STACKVOX_CACHE_DIR for the daemon). Protocol is one line of JSON per connection: {"text":"...", "voice":"...", "speed":1.0, "lang":"en-us"}; reply is ok / busy / err: <msg>. Plain text (no JSON) is accepted as a fallback and treated as {"text": line}.
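
The protocol described above is small enough to speak directly from the standard library. The following is a minimal sketch under stated assumptions, not the stackvox client: it assumes the request line is newline-terminated, that fields other than `text` may be omitted, and that the reply fits in a single `recv`. The function names are hypothetical; the socket path, the `STACKVOX_SOCKET` override, and the field names come from the paragraph above.

```python
import json
import os
import socket
from pathlib import Path


def default_socket_path() -> Path:
    """Client-side socket path: STACKVOX_SOCKET if set, else the default."""
    override = os.environ.get("STACKVOX_SOCKET")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "stackvox" / "daemon.sock"


def build_request(text, voice=None, speed=None, lang=None) -> bytes:
    """Encode one request as a single newline-terminated JSON line."""
    payload = {"text": text}
    # Omitting unset fields is an assumption; the README only shows all four.
    if voice is not None:
        payload["voice"] = voice
    if speed is not None:
        payload["speed"] = speed
    if lang is not None:
        payload["lang"] = lang
    return (json.dumps(payload) + "\n").encode("utf-8")


def send_request(request: bytes, sock_path=None, timeout=2.0) -> str:
    """Open one connection, send one request line, return the reply text."""
    path = str(sock_path or default_socket_path())
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(request)
        reply = sock.recv(4096)
    # Expected replies per the README: "ok", "busy", or "err: <msg>".
    return reply.decode("utf-8", errors="replace").strip()
```

A call such as `send_request(build_request("Hello", voice="bf_emma"))` raises an `OSError` subclass (for example `FileNotFoundError` or `ConnectionRefusedError`) when no daemon is listening.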

The queue depth is 2: rapid-fire requests beyond that receive a busy reply instead of piling up.
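
Because a full queue answers with busy instead of blocking, a caller that must be heard has to retry on its own. The sketch below shows one simple way to do that; `say_with_retry` is a hypothetical helper, and `send` stands for any callable that submits text to the daemon and returns its reply string.

```python
import time


def say_with_retry(send, text, attempts=3, delay=0.25, sleep=time.sleep):
    """Call send(text) until the reply is not "busy" or attempts run out.

    Returns the last reply seen, so the caller can still tell "busy"
    apart from "ok" or an "err: ..." message.
    """
    reply = "busy"
    for attempt in range(attempts):
        reply = send(text)
        if reply != "busy":
            return reply
        # Wait before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return reply
```

A fixed delay keeps the example short; callers that fire many requests may prefer to drop the message rather than retry, which matches the daemon's own choice not to let speech pile up.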

Before each utterance the daemon resets PortAudio so it picks up the current system default output device. Swap from speakers to Bluetooth headphones mid-session and the next say follows you — no daemon restart needed. The refresh costs ~10–50ms per play, which is invisible next to synthesis time.

Requirements

Python 3.10+
macOS or Linux
nc (BSD netcat — default on macOS, netcat-openbsd on Linux) for the bash helper

Security considerations

stackvox doesn't open any network port. The daemon binds a unix socket under ~/.cache/stackvox/ (default file-mode 0600, i.e. user-only per the OS defaults for files in $HOME). Any process running as the same local user can send text to the daemon — there's no per-message authentication on the socket itself. That's the trust boundary: stackvox assumes anything running as your UID is allowed to speak on your behalf.
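
Since the socket's file permissions are the trust boundary, it can be worth confirming them on a given machine. The following sketch uses only the standard library to inspect the permission bits of a path; `socket_mode` and `is_user_only` are hypothetical helper names, and the path to check is whatever socket location is in effect on your system.

```python
import os
import stat


def socket_mode(path) -> int:
    """Return the permission bits of `path`, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def is_user_only(path) -> bool:
    """True when neither group nor others have any permission bits set."""
    return (socket_mode(path) & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `is_user_only(os.path.expanduser("~/.cache/stackvox/daemon.sock"))` checks the default location while the daemon is running.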

If you're exposing stackvox through a different surface (HTTP server, shared system service, container), authentication and rate-limiting are your responsibility at that layer.

Model weights (kokoro-v1.0.onnx, ~340 MB) and voices are downloaded from the kokoro-onnx GitHub release assets on first use and cached under ~/.cache/stackvox/. If you operate in a restricted environment, pre-seed that directory offline.
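
When pre-seeding the cache offline, a quick check that the expected files are in place can save a failed first run. The sketch below is illustrative only: it assumes the two assets named in this README (`kokoro-v1.0.onnx` and `voices-v1.0.bin`) sit directly inside the cache directory under those names, which the README does not spell out. `cache_dir` and `missing_assets` are hypothetical helpers.

```python
import os
from pathlib import Path

# File names as given in this README; their exact location inside the
# cache directory is an assumption.
ASSET_NAMES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def cache_dir() -> Path:
    """Cache directory: STACKVOX_CACHE_DIR if set, else ~/.cache/stackvox."""
    override = os.environ.get("STACKVOX_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "stackvox"


def missing_assets(directory=None) -> list[str]:
    """List the expected asset files that are not present in the directory."""
    base = Path(directory) if directory is not None else cache_dir()
    return [name for name in ASSET_NAMES if not (base / name).is_file()]
```

An empty list from `missing_assets()` suggests the directory is seeded; it does not verify that the files are complete or uncorrupted.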

Security issues themselves should not be filed as public GitHub issues — see SECURITY.md for the disclosure process.

License & attributions

stackvox itself is licensed under the Apache License, Version 2.0 — see LICENSE. Third-party attributions are collected in NOTICE; the summary below is informational.

Model. Speech is generated by Kokoro-82M (© hexgrad, Apache 2.0). The ONNX-converted weights (kokoro-v1.0.onnx) and voice pack (voices-v1.0.bin) are downloaded from the kokoro-onnx release assets on first use and cached under ~/.cache/stackvox/. stackvox does not modify or redistribute them.

Runtime dependencies. kokoro-onnx (MIT, © thewh1teagle), onnxruntime (MIT, © Microsoft), sounddevice (MIT, © Matthias Geier), soundfile (BSD-3, © Bastian Bechtold), numpy (BSD-3).

GPL note. kokoro-onnx pulls in phonemizer-fork as a transitive runtime dependency; it is licensed under GPL-3.0. stackvox does not bundle, modify, or statically link it — pip installs it alongside stackvox and the two communicate through phonemizer's published Python API at runtime. If you redistribute a combined work (e.g. a frozen binary, container image, or vendored wheel set) that includes phonemizer-fork, review GPL-3.0 obligations for that distribution.

Project details

Maintainers

StuBehan

Release history

0.4.0 (Apr 30, 2026)
0.3.1 (Apr 30, 2026)
0.3.0 (Apr 29, 2026)
0.2.1 (Apr 22, 2026), this version
0.2.0 (Apr 22, 2026)

Download files

Source Distribution

stackvox-0.2.1.tar.gz (23.3 kB)

Uploaded Apr 22, 2026 Source

Built Distribution

stackvox-0.2.1-py3-none-any.whl (19.6 kB)

Uploaded Apr 22, 2026 Python 3

File details

Details for the file stackvox-0.2.1.tar.gz.

File metadata

Download URL: stackvox-0.2.1.tar.gz
Upload date: Apr 22, 2026
Size: 23.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for stackvox-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`99a3287276c6cb4871a13c087f0ba99437739d00d4bb9d00a4c3245377a47eb8`
MD5	`f3f1932b1b3c7aca1d80d36cbcab8513`
BLAKE2b-256	`2d0731bb480913d767af11d2daf9e3d23a5cf01fa5cad2ab3ca9079daabe5b16`
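
A downloaded file can be checked against the SHA256 digest in the table with the standard library. This is a generic sketch, not a stackvox or PyPI tool; `sha256_of` and `matches` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches("stackvox-0.2.1.tar.gz", "99a3287276c6cb4871a13c087f0ba99437739d00d4bb9d00a4c3245377a47eb8")`.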

Provenance

The following attestation bundles were made for stackvox-0.2.1.tar.gz:

Publisher: release-please.yml on StackOneHQ/stackvox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stackvox-0.2.1.tar.gz
- Subject digest: 99a3287276c6cb4871a13c087f0ba99437739d00d4bb9d00a4c3245377a47eb8
- Sigstore transparency entry: 1359960033
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: StackOneHQ/stackvox@584b1bc6f4612bbde840a89f65a1c9c0694f44ef
- Branch / Tag: refs/heads/main
- Owner: https://github.com/StackOneHQ
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@584b1bc6f4612bbde840a89f65a1c9c0694f44ef
- Trigger Event: push

File details

Details for the file stackvox-0.2.1-py3-none-any.whl.

File metadata

Download URL: stackvox-0.2.1-py3-none-any.whl
Upload date: Apr 22, 2026
Size: 19.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for stackvox-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3d5ae5d344df44226754bfbc338b78754df51500669dba7cab9cb6544501e8d2`
MD5	`a2938d0ec8f7d8870c2845ad0cf3d32d`
BLAKE2b-256	`9bb46e8a8c893da9a905158624bda5423f67fff7d03ce9a1dc0736d1ce64a03c`

Provenance

The following attestation bundles were made for stackvox-0.2.1-py3-none-any.whl:

Publisher: release-please.yml on StackOneHQ/stackvox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stackvox-0.2.1-py3-none-any.whl
- Subject digest: 3d5ae5d344df44226754bfbc338b78754df51500669dba7cab9cb6544501e8d2
- Sigstore transparency entry: 1359960035
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: StackOneHQ/stackvox@584b1bc6f4612bbde840a89f65a1c9c0694f44ef
- Branch / Tag: refs/heads/main
- Owner: https://github.com/StackOneHQ
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@584b1bc6f4612bbde840a89f65a1c9c0694f44ef
- Trigger Event: push
