Unified local TTS CLI: kitten | kokoro | piper | coqui | pocket
marmalade-tts
A unified command-line interface for local text-to-speech synthesis. Supports multiple engines with a single consistent interface, with daemon mode for fast synthesis, per-engine text preprocessing, and optional audio effects via sox.
Hear it
A short demo and a few effect samples (download and play with paplay,
aplay, or any audio app):
| Sample | What it is |
|---|---|
| demos/tahlia-voice-sample/tahlia-intro.wav | Capability-demo clip generated to show off marmalade-tts |
| samples/effects/baseline-F.wav | Kitten voice, no effects (reference) |
| samples/effects/cave-01-F.wav | --effect cave (heavy reverb + echo) |
| samples/effects/robot-01-F.wav | --effect robot (overdrive + pitch + reverb) |
| samples/effects/chipmunk-01-F.wav | --effect chipmunk (pitch up + faster) |
| samples/effects/deep-01-F.wav | --effect deep (pitch down + bass) |
| samples/effects/alien-01-classic-F.wav | Custom alien chain |
| samples/effects/ghost-02-echo-F.wav | Custom ghost chain |
See samples/effects/README.md for the exact
commands used to generate each one.
Installation
pipx (recommended for most users)
pipx install marmalade-tts
marmalade-tts init
deb / rpm (system-wide install)
Download the latest .deb or .rpm from the GitHub releases page, then:
# Debian/Ubuntu
sudo dpkg -i marmalade-tts_0.4.4_amd64.deb
# Fedora/RHEL
sudo rpm -i marmalade-tts-0.4.4-1.x86_64.rpm
AUR (Arch Linux): coming soon
yay -S marmalade-tts
# or: paru -S marmalade-tts
The packaging/aur/PKGBUILD is in the repo and Arch users can build
from a clone (makepkg -si). Submission to the official AUR is on the
roadmap.
Manual (git clone)
git clone https://github.com/maxwhipw/marmalade-tts
cd marmalade-tts
./install.sh
marmalade-tts init
See INSTALL.md for per-engine dependencies (pip packages, models).
Engines
| Engine | What it is | Daemon mode |
|---|---|---|
| kitten | Fast lightweight neural TTS (default) | ✓ enabled by default |
| kokoro | High-quality multilingual neural TTS | optional |
| piper | Offline neural TTS, many voices | optional |
| coqui | Open-source neural TTS toolkit | optional |
| pocket | CPU-only 100M-param TTS with voice cloning | n/a (loads in ~200 ms) |
Install the engines you want; marmalade-tts works with whichever are present. (There's no need to install all five; even just one engine is enough to be useful.)
Quick Start
# Interactive setup (arrow keys to pick engines, voices, model sizes)
marmalade-tts init
# Non-interactive setup (for AI agents / scripts)
marmalade-tts init --non-interactive --engines kitten,piper
marmalade-tts init --non-interactive --engines kitten --set kitten.model_size=nano
marmalade-tts init --non-interactive --engines kitten,kokoro \
--set kokoro.voice=am_adam --default-engine kokoro --test
# Speak with the default engine
marmalade-tts "Hello world"
# Specify an engine
marmalade-tts kokoro "Hello world"
marmalade-tts kitten "Hello world"
# Read from a file
marmalade-tts @script.txt
# Save to a file instead of playing
marmalade-tts "Hello" --out hello.wav
# Speed up or slow down
marmalade-tts "Hello" --speed 1.4
# Choose a voice (a positional voice works for engines whose voice names look
# like identifiers: kitten, kokoro, pocket. Use --voice for path-shaped voices
# like piper's .onnx files and coqui's tts_models/... specs.)
marmalade-tts kokoro george "Hello"
marmalade-tts kitten Bella "Hello"
marmalade-tts piper --voice ~/voices/en_US-lessac-medium.onnx "Hello"
Engines & Voices
kokoro
marmalade-tts kokoro "Hello"
marmalade-tts kokoro george "Hello" # British male, positional
marmalade-tts kokoro nicole "Hello" # American female
marmalade-tts kokoro alpha "Hello" --lang a # Japanese voice, English accent
marmalade-tts kokoro --list # show all voices
Voices are referred to by their bare name (e.g. george):
| Language | Voices |
|---|---|
| American English | heart, bella, nicole, adam, michael |
| British English | emma, isabella, george, lewis |
| Japanese | alpha, gongitsune, kumo |
| Mandarin | xiaobei, yunjian |
Each voice has a natural language, but kokoro can speak any voice in any
supported language: pass --lang a/b/j/z (or set engines.kokoro.lang in
config) to override. Useful for accent effects.
The canonical upstream form (bm_george, af_heart, etc.) is also
accepted everywhere for back-compat.
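The relationship between the two naming forms can be sketched in Python. This is only an illustration: the function name `split_voice` is hypothetical, the language letters are inferred from the --lang values and the voice table above, and the gender letters are inferred from examples such as bm_george (British male). It is not marmalade-tts's own lookup code.

```python
from typing import Optional, Tuple

# Assumed mapping, inferred from this README (--lang a/b/j/z and the voice table).
LANGS = {"a": "American English", "b": "British English", "j": "Japanese", "z": "Mandarin"}
# Assumed from canonical names such as bm_george and af_heart.
GENDERS = {"f": "female", "m": "male"}


def split_voice(name: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (language, gender, bare_name); bare names pass through unchanged."""
    prefix, sep, bare = name.partition("_")
    if sep and len(prefix) == 2 and prefix[0] in LANGS and prefix[1] in GENDERS:
        return LANGS[prefix[0]], GENDERS[prefix[1]], bare
    # No recognised two-letter prefix: treat the whole string as a bare name.
    return None, None, name


print(split_voice("bm_george"))
print(split_voice("george"))
```

A bare name carries no language or gender information on its own, which is why the sketch returns None for both in that case.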
kitten
marmalade-tts kitten "Hello"
marmalade-tts kitten Kiki "Hello from Kiki" # specify voice inline
marmalade-tts kitten --list # show all voices
marmalade-tts kitten --fast "Quick response" # nano model
marmalade-tts kitten --quality "Important message" # mini model
piper
marmalade-tts piper "Hello"
marmalade-tts piper --voice ~/voices/en_US-lessac-medium.onnx "Hello"
marmalade-tts piper "Hello" --speaker 2 # multi-speaker models
coqui
marmalade-tts coqui "Hello"
marmalade-tts coqui "Hello" --voice tts_models/en/ljspeech/tacotron2-DDC
marmalade-tts coqui --list
pocket
marmalade-tts pocket "Hello"
marmalade-tts pocket alba "Hello from alba"
marmalade-tts pocket --list # show all built-in voices
marmalade-tts pocket my_recording.wav "Cloned!" # voice cloning from any .wav
Built-in voices: alba, marius, javert, jean, fantine, cosette,
eponine, azelma.
For faster cloning, pre-export the speaker embedding to .safetensors:
pocket-tts export-voice friend.wav --out friend.safetensors
marmalade-tts pocket friend.safetensors "Hi!"
Note on voice cloning: Pocket TTS can clone any voice from a short WAV sample. Only clone voices you have explicit, informed consent to clone. Cloning a real person's voice without permission (to deceive, impersonate, harass, or misrepresent them) is harmful and in many jurisdictions illegal. The built-in voices are fine for any use.
Speed Presets
Choose a quality/speed tradeoff that picks the appropriate model variant:
marmalade-tts --fast "Hello" # fastest, smallest model
marmalade-tts --balanced "Hello" # balanced (default)
marmalade-tts --quality "Hello" # best quality
Text Preprocessing
marmalade-tts normalises text before synthesis so engines hear readable English instead of symbols. This is on by default and tuned per-engine.
# These are handled automatically:
marmalade-tts "$42.50 is 15% off"
# → "forty-two dollars and fifty cents is fifteen percent off"
marmalade-tts "See https://example.com for details"
# → "See example dot com for details"
marmalade-tts "The 3rd place finisher at 9:30am"
# → "The third place finisher at nine thirty a m"
# Turn it off if you've already formatted your text:
marmalade-tts --no-preprocessing "forty two dollars"
# See all available preprocessing rules:
marmalade-tts --list-rules
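To make the idea of a preprocessing rule concrete, here is a minimal Python sketch of a URL rule that reproduces the example.com case above. The function name `speak_urls` and the regular expression are assumptions made for illustration; the actual rule in marmalade-tts may behave differently (for example in how it treats paths or query strings).

```python
import re

# Hypothetical pattern: an http(s) scheme, a host, and an optional path.
_URL = re.compile(r"https?://([^\s/]+)(/\S*)?")


def speak_urls(text: str) -> str:
    """Replace each URL with its host name, reading '.' as 'dot'."""
    def _spoken(match: re.Match) -> str:
        host = match.group(1)
        # The path (group 2) is dropped in this sketch.
        return host.replace(".", " dot ")
    return _URL.sub(_spoken, text)


print(speak_urls("See https://example.com for details"))
```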
Per-engine preprocessing config
You can set per-engine rule lists in ~/.config/marmalade-tts/config.yaml:
engines:
  kokoro:
    preprocessing: [currency, percent, ordinal, time, url]
  piper:
    preprocessing: true    # all rules (default)
  kitten:
    preprocessing: false   # disable entirely
Audio Effects
Effects are applied after synthesis using sox. If sox is not installed, effects are skipped with a note; the speech is still generated.
# Install sox (required for effects):
apt install sox # Debian/Ubuntu
brew install sox # macOS
# Apply a single effect
marmalade-tts "Hello" --effect reverb=50
marmalade-tts "Hello" --effect pitch=200 # shift up 2 semitones
marmalade-tts "Hello" --effect pitch=-300 # shift down 3 semitones
# Chain multiple effects
marmalade-tts "Hello" --effect pitch=200 --effect reverb=30
# Use a built-in preset
marmalade-tts "Hello" --effect robot
marmalade-tts "Hello" --effect cave
marmalade-tts "Hello" --effect telephone
# See all effects and presets
marmalade-tts --list-effects
Built-in effect presets
| Preset | Effects applied |
|---|---|
| robot | overdrive + deep pitch shift + reverb |
| cave | heavy reverb + echo |
| chipmunk | pitch up + slightly faster |
| deep | pitch down + bass boost |
| telephone | bandpass filter + overdrive |
| whisper | quieter + treble boost + reverb |
| stadium | heavy reverb + echo |
| megaphone | bandpass + heavy overdrive + volume boost |
| slow_deep | pitch down + slower tempo |
| fast_high | pitch up + faster tempo |
Available effects
| Effect | Parameter | Example |
|---|---|---|
| reverb | amount 0-100 (default 50) | reverb=30 |
| pitch | cents (100 = 1 semitone) | pitch=200 or pitch=-400 |
| tempo | speed factor, no pitch change | tempo=0.8 |
| echo | gain-in:gain-out:delay-ms:decay | echo=0.8:0.88:60:0.4 |
| overdrive | gain 1-100 | overdrive=20 |
| flanger | (none) | flanger |
| chorus | (none, or 6-part custom) | chorus |
| treble | dB boost/cut | treble=6 |
| bass | dB boost/cut | bass=4 |
| bandpass | low-hz:high-hz | bandpass=300:3400 |
| speed | factor (pitch shifts too) | speed=1.2 |
| vol | volume multiplier | vol=2.0 |
| normalize | (none) | normalize |
| fade | in-seconds:out-seconds | fade=0.1:0.5 |
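Because the pitch effect takes cents rather than semitones, a small helper can be handy when building --effect arguments from a script. The Python sketch below is illustrative only: the helper names are hypothetical, and the frequency-ratio formula is the standard equal-temperament relation (1200 cents per octave), not something taken from marmalade-tts itself.

```python
def semitones_to_cents(semitones: float) -> int:
    """100 cents = 1 semitone, as in the table above."""
    return round(semitones * 100)


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a shift in cents (1200 cents = one octave = 2x)."""
    return 2.0 ** (cents / 1200.0)


def pitch_effect(semitones: float) -> str:
    """Build a value suitable for --effect, e.g. 'pitch=200'."""
    return f"pitch={semitones_to_cents(semitones)}"


print(pitch_effect(2))     # pitch=200
print(pitch_effect(-3))    # pitch=-300
print(cents_to_ratio(1200))  # 2.0
```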
Default effects per engine
You can set default effects that apply automatically for a given engine, without
needing --effect every time. CLI --effect flags override the engine default
entirely.
# ~/.config/marmalade-tts/config.yaml
effects:
  defaults:
    kitten: ["reverb=20"]   # subtle warmth on kitten by default
    kokoro: []              # no default effects (explicit empty = off)
    piper: []
    coqui: []
  # Define your own named presets:
  presets:
    warm: ["reverb=25", "bass=3"]
    dramatic: ["reverb=70", "echo=0.8:0.6:80:0.3"]
    broadcast: ["bandpass=80:15000", "normalize"]
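The override behaviour described above (CLI effects replace the engine default entirely) can be sketched in Python as follows. This is a hedged illustration, not the tool's implementation: `resolve_effects` is a hypothetical name, the config is a plain dict shaped like the YAML above, and it assumes user-defined presets are selected by name through --effect in the same way as built-in ones. Built-in presets are not modelled.

```python
def resolve_effects(engine, cli_effects, config):
    """Pick the effect chain: CLI effects replace engine defaults entirely."""
    effects_cfg = config.get("effects", {})
    if cli_effects:
        chosen = cli_effects
    else:
        chosen = effects_cfg.get("defaults", {}).get(engine, [])
    presets = effects_cfg.get("presets", {})
    chain = []
    for spec in chosen:
        # A user-defined preset name expands to its list; other specs pass through.
        chain.extend(presets.get(spec, [spec]))
    return chain


config = {
    "effects": {
        "defaults": {"kitten": ["reverb=20"], "kokoro": []},
        "presets": {"warm": ["reverb=25", "bass=3"]},
    }
}
print(resolve_effects("kitten", [], config))             # ['reverb=20']
print(resolve_effects("kitten", ["warm"], config))       # ['reverb=25', 'bass=3']
print(resolve_effects("kitten", ["pitch=200"], config))  # ['pitch=200']
```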
Daemon Mode
Daemon mode keeps the engine model loaded in RAM, so a synthesis request can start immediately instead of waiting for the model to load.
# Start / stop individual daemons
marmalade-tts daemon start --engine kitten
marmalade-tts daemon stop --engine kitten
# Start all configured daemons
marmalade-tts daemon start-all
# Check what's running
marmalade-tts daemon status
Enable daemon mode per-engine in config:
engines:
  kitten:
    daemon: true    # start automatically on first use
  kokoro:
    daemon: false
Or use systemd to keep the daemon alive across reboots:
systemctl --user enable marmalade-kitten
systemctl --user start marmalade-kitten
Configuration
marmalade-tts init
The setup wizard configures engines, voices, and defaults. Run it again at any time to change your setup.
Interactive mode (default when stdin is a TTY):
marmalade-tts init
Uses arrow keys + space to multi-select engines, then walks through per-engine options (model size, voice, etc.).
Non-interactive mode (for AI agents, scripts, CI):
marmalade-tts init --non-interactive --engines kitten,piper
marmalade-tts init --non-interactive --engines kitten --set kitten.model_size=nano
marmalade-tts init --non-interactive --engines kitten,kokoro \
--set kokoro.voice=am_adam --default-engine kokoro --test
Flags:
- --non-interactive: skip TUI prompts (auto-enabled when stdin is not a TTY)
- --engines LIST: comma-separated engines to enable
- --set ENGINE.KEY=VALUE: override engine options (repeatable)
- --default-engine NAME: set the default engine
- --test: run a test synthesis after setup
Manual config
Config is stored at ~/.config/marmalade-tts/config.yaml.
A default config is written on first run.
# View current config
marmalade-tts config show
# Get a value
marmalade-tts config get defaults.engine
# Set a value
marmalade-tts config set defaults.engine kitten
marmalade-tts config set defaults.speed 1.2
marmalade-tts config set defaults.play false
Value coercion rules (predictable so AI agents don't get surprised):
- true / false (any case) → bool
- null / ~ / empty → None
- Integer-looking strings → int
- Float-looking strings → float
- Everything else → string, verbatim
yes / no / on / off are kept as strings, not coerced to bools.
This is intentional: YAML 1.1's "Norway problem", where bare words such as
"no" or "yes" silently become booleans, is a common footgun.
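The rules above can be sketched in Python as follows. The function name `coerce` and the exact patterns are assumptions made for illustration: in particular, this sketch treats only plain decimal forms as "float-looking" and matches null case-sensitively, which may not match what marmalade-tts actually accepts.

```python
import re


def coerce(raw: str):
    """Coerce a command-line string following the rules listed above."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if raw in ("", "null", "~"):
        return None
    if re.fullmatch(r"[+-]?\d+", raw):
        return int(raw)
    # Assumption: only simple decimals count as floats (no exponents).
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)", raw):
        return float(raw)
    # Everything else, including yes / no / on / off, stays a string.
    return raw


for value in ["kitten", "1.2", "false", "yes", "~", "42"]:
    print(repr(value), "->", repr(coerce(value)))
```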
Full config reference
defaults:
  engine: kitten        # default engine when none is specified
  device: cpu           # cpu or cuda
  speed: 1.0            # speech speed multiplier
  play: true            # play audio automatically (false = save only)
  preprocessing: true   # normalize text before synthesis
presets:
  fast:
    kitten: nano
    kokoro: heart
    piper: en_US-lessac-medium
    coqui: tts_models/en/ljspeech/tacotron2-DDC
    pocket: alba
  balanced:
    # ...same structure...
  quality:
    # ...same structure...
engines:
  kokoro:
    device: cpu
    voice: heart        # bare name (or canonical "af_heart" for back-compat)
    # lang: a           # optional; defaults to the voice's natural language
    daemon: false
    # preprocessing: [currency, percent]   # or true / false
  kitten:
    device: cpu
    model_size: micro   # nano / micro / mini
    voice: Kiki
    daemon: true
  piper:
    device: cpu
    model: ~/.local/share/piper/voices/en_US-lessac-medium.onnx
    daemon: false
  coqui:
    device: cpu
    model: tts_models/en/ljspeech/tacotron2-DDC
    daemon: false
  pocket:
    device: cpu
    voice: alba         # built-in voice, or path to .wav / .safetensors
    # No daemon needed: Pocket TTS loads fast (~200ms)
effects:
  defaults:
    kitten: []
    kokoro: []
    piper: []
    coqui: []
  presets:
    warm: ["reverb=25", "bass=3"]
Shell Completion
# bash
eval "$(marmalade-tts --completion bash)"
# zsh
eval "$(marmalade-tts --completion zsh)"
# Or add to your shell rc:
echo 'eval "$(marmalade-tts --completion bash)"' >> ~/.bashrc
KDE Global Hotkeys (speak selected text)
The scripts/ directory contains ready-to-use helpers for binding speech
to keyboard shortcuts in KDE.
Install the scripts:
cp scripts/speak-selection scripts/speak-clipboard scripts/marmalade-pipe ~/.local/bin/
chmod +x ~/.local/bin/speak-selection ~/.local/bin/speak-clipboard ~/.local/bin/marmalade-pipe
Dependencies (pick one per display server):
sudo apt install xclip # X11
sudo apt install wl-clipboard # Wayland
Bind in KDE:
- System Settings → Shortcuts → Custom Shortcuts
- New → Script/Command
- Set the trigger (e.g. Meta+Shift+S) and the action path
| Script | What it speaks | Suggested shortcut |
|---|---|---|
| speak-selection | Highlighted text (primary selection) | Meta+Shift+S |
| speak-clipboard | Last copied text (Ctrl+C) | Meta+Shift+C |
See scripts/SCRIPTS.md for full details.
Scripting & Agent Use
marmalade-tts is designed to be used from scripts, agents, and pipelines.
# Read from stdin
echo "Hello world" | marmalade-tts --stdin --no-play --out hello.wav
echo "Hello world" | marmalade-pipe --out hello.wav # convenience wrapper
# Suppress all status output (exit code only)
marmalade-tts --quiet "Hello"
# Print only the output WAV path to stdout
WAV=$(marmalade-tts --print-path --no-play "Hello")
aplay "$WAV"
# JSON result for structured consumption
marmalade-tts --json --no-play "Hello"
# → {"ok": true, "version": "0.4.4", "engine": "kitten", "voice": "Kiki",
# "out": "/tmp/...", "effects": [], "text": "Hello"}
# Never play back, just generate
marmalade-tts --no-play --out result.wav "Generate but don't play"
# Skip engine-default effects from config (e.g. for a dry signal)
marmalade-tts --no-effects "Hello"
# Combine flags for maximum scriptability
cat script.txt | marmalade-tts --stdin --quiet --json --no-play --out speech.wav
Exit codes:
- 0: success
- non-zero: failure. Specific codes are not promised; expect 1 for user-visible errors and 2 from argparse for bad flags.
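For callers written in Python, the JSON output and exit-code behaviour described above might be consumed along these lines. This is a sketch under stated assumptions: the helper names `parse_result` and `synthesize` are hypothetical, the sample payload mirrors the example above with a placeholder path (/tmp/example.wav), and it assumes that a failed run either exits non-zero or reports "ok": false.

```python
import json
import subprocess


def parse_result(stdout: str) -> dict:
    """Parse --json output and raise if the run reported a failure."""
    result = json.loads(stdout)
    if not result.get("ok"):
        raise RuntimeError(f"synthesis failed: {result}")
    return result


def synthesize(text: str) -> dict:
    """Run marmalade-tts without playback and return its parsed JSON result."""
    proc = subprocess.run(
        ["marmalade-tts", "--json", "--no-play", text],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Specific codes are not promised, so only zero vs non-zero is checked.
        raise RuntimeError(
            f"marmalade-tts exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return parse_result(proc.stdout)


# Placeholder payload shaped like the README example; the path is made up.
sample = (
    '{"ok": true, "version": "0.4.4", "engine": "kitten", "voice": "Kiki", '
    '"out": "/tmp/example.wav", "effects": [], "text": "Hello"}'
)
print(parse_result(sample)["out"])
```

Checking only for zero versus non-zero keeps the caller robust, since the specific non-zero codes are not guaranteed.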
Text Input Methods
# Literal text
marmalade-tts "Hello world"
# From a file (@ prefix)
marmalade-tts @speech.txt
# From stdin
echo "Hello world" | marmalade-tts -
# Combine with --out to save a file
marmalade-tts @script.txt --out script.wav
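The three input forms can be summarised with a short Python sketch. It is only an illustration of the convention: `read_text` is a hypothetical name, and details such as whitespace trimming or file encoding in the real CLI are not documented here and may differ.

```python
import io
import sys
from pathlib import Path


def read_text(arg: str, stdin=sys.stdin) -> str:
    """Resolve a text argument: '-' reads stdin, '@path' reads a file, else literal."""
    if arg == "-":
        return stdin.read()
    if arg.startswith("@"):
        # Assumption: files are read as UTF-8.
        return Path(arg[1:]).read_text(encoding="utf-8")
    return arg


print(read_text("Hello world"))
print(read_text("-", stdin=io.StringIO("Hello from a pipe")))
```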
Requirements
- OS: Linux (primary target, tested on Ubuntu 24.04). macOS untested but most engines (piper, kokoro, pocket, coqui) should work. Windows is not supported.
- Python: 3.10 or newer.
- CPU-only by default. All engines run on CPU; no GPU needed. Optional CUDA acceleration for kokoro/coqui on supported NVIDIA cards.
- RAM: ~200 MB for kitten/pocket, ~1.5 GB for kokoro daemon, varies for coqui depending on model.
- Disk (models, downloaded on first use):
  - Kitten: 23-80 MB (nano/micro/mini)
  - Piper voices: 15-75 MB each
  - Pocket: ~200 MB
  - Kokoro: ~500 MB
  - Coqui: 200 MB to 2 GB depending on model
- Audio playback: one of paplay, aplay, or ffplay (already present on most Linux desktops).
- Optional: sox for audio effects, xclip / wl-clipboard for the KDE selection scripts.
The CLI wrapper itself (pipx install marmalade-tts) is tiny; engines live
in their own venvs to keep their dependencies isolated. marmalade-tts init
walks you through installing whichever engines you want.
Contributing
Want to add a new TTS engine? See ENGINE-GUIDE.md for a step-by-step walkthrough of every file that needs to be touched.
Engines are first-class citizens in this repo. There is no plugin / entry-point mechanism for external engines: adding an engine is a PR, not a third-party install. Each engine addition is treated as a feature and ships in the next minor version bump.
Stability & versioning
marmalade-tts is currently in beta (0.4.x). The CLI surface,
config schema, and JSON output are usable today and the project tries
hard not to break working commands, but small changes between minor
versions are still possible until v1.0.0. From 1.0.0 onward this
project follows Semantic Versioning:
- Patch (1.0.x): bug fixes only, no surface changes.
- Minor (1.x.0): new engines, new flags, new config keys. Backwards compatible.
- Major (x.0.0): breaking changes to CLI surface, config keys, or JSON output. Avoided where possible; called out clearly in the changelog when needed.
If you're scripting against marmalade-tts today, expect the surfaces documented in this README to be stable. Anything not documented here (help-text wording, init wizard formatting, internal subprocess invocation, daemon socket protocol) may evolve without notice.
Roadmap
Ideas under consideration. No promises on timing; feedback and PRs welcome.
Language detection
Auto-detect the input text's language and route to an appropriate
engine / voice / model, e.g. Japanese text routes to a kokoro Japanese
voice, Mandarin to a kokoro Mandarin voice, the rest stay on the
configured default. Per-language defaults configurable in config.yaml.
Emoji-driven emotional prosody
Treat emojis as inline prosody directives, e.g. "Hello 😊" reads
warm, "Hello 😢" reads sad, "Hello! ⚡" reads energetic. Requires
upstream model support for emotion conditioning that runs close to
real-time on consumer hardware (CPU or modest GPU), with a FOSS
licence. Will track FOSS expressive-TTS research and integrate when the
stack exists.
Credits & Acknowledgements
marmalade-tts is a unified wrapper; the real work is done by these engines:
- Piper: ONNX neural TTS by Michael Hansen / Rhasspy (MIT)
- Kokoro: high-quality multilingual TTS by Hexgrad (Apache 2.0)
- KittenTTS: fast lightweight neural TTS by KittenML (Apache 2.0)
- Coqui TTS: open-source TTS toolkit by Coqui AI (MPL 2.0)
- Pocket TTS: CPU-only 100M param TTS with voice cloning by Kyutai Labs (MIT)
- sox โ audio effects processing (GPL)
- num2words โ number-to-words conversion (LGPL)
The Docker HTTP API server implements endpoints compatible with the OpenAI TTS API and ElevenLabs TTS API interfaces. While we use their API interface for compatibility, no code from either project is used; the server is written from scratch using Python's standard library. This project is not affiliated with or endorsed by OpenAI or ElevenLabs.
License
MIT; see LICENSE.