Unified local TTS CLI: kitten | kokoro | piper | coqui | pocket

🍊 marmalade-tts

A unified command-line interface for local text-to-speech synthesis. It supports multiple engines through a single consistent interface, with daemon mode for fast synthesis, per-engine text preprocessing, and optional audio effects via sox.

Hear it

A short demo and a few effect samples (download and play with paplay, aplay, or any audio app):

| Sample | What it is |
|---|---|
| demos/tahlia-voice-sample/tahlia-intro.wav | Capability-demo clip generated to show off marmalade-tts |
| samples/effects/baseline-F.wav | Kitten voice, no effects (reference) |
| samples/effects/cave-01-F.wav | --effect cave (heavy reverb + echo) |
| samples/effects/robot-01-F.wav | --effect robot (overdrive + pitch + reverb) |
| samples/effects/chipmunk-01-F.wav | --effect chipmunk (pitch up + faster) |
| samples/effects/deep-01-F.wav | --effect deep (pitch down + bass) |
| samples/effects/alien-01-classic-F.wav | Custom alien chain |
| samples/effects/ghost-02-echo-F.wav | Custom ghost chain |

See samples/effects/README.md for the exact commands used to generate each one.


Installation

pipx (recommended for most users)

pipx install marmalade-tts
marmalade-tts init

deb / rpm (system-wide install)

Download the latest .deb or .rpm from the GitHub releases page, then:

# Debian/Ubuntu
sudo dpkg -i marmalade-tts_0.4.4_amd64.deb

# Fedora/RHEL
sudo rpm -i marmalade-tts-0.4.4-1.x86_64.rpm

AUR (Arch Linux): coming soon

yay -S marmalade-tts
# or: paru -S marmalade-tts

The package is not on the AUR yet, so the yay / paru commands above will only work once it has been submitted (this is on the roadmap). Until then, packaging/aur/PKGBUILD is in the repo and Arch users can build from a clone with makepkg -si.

Manual (git clone)

git clone https://github.com/maxwhipw/marmalade-tts
cd marmalade-tts
./install.sh
marmalade-tts init

See INSTALL.md for per-engine dependencies (pip packages, models).


Engines

| Engine | What it is | Daemon mode |
|---|---|---|
| kitten | Fast lightweight neural TTS (default) | ✔ enabled by default |
| kokoro | High-quality multilingual neural TTS | optional |
| piper | Offline neural TTS, many voices | optional |
| coqui | Open-source neural TTS toolkit | optional |
| pocket | CPU-only 100M-param TTS with voice cloning | n/a (loads in ~200 ms) |

Install the engines you want; marmalade-tts works with whichever are present. There's no need to install all five: even a single engine is enough to be useful.


Quick Start

# Interactive setup (arrow keys to pick engines, voices, model sizes)
marmalade-tts init

# Non-interactive setup (for AI agents / scripts)
marmalade-tts init --non-interactive --engines kitten,piper
marmalade-tts init --non-interactive --engines kitten --set kitten.model_size=nano
marmalade-tts init --non-interactive --engines kitten,kokoro \
  --set kokoro.voice=am_adam --default-engine kokoro --test

# Speak with the default engine
marmalade-tts "Hello world"

# Specify an engine
marmalade-tts kokoro "Hello world"
marmalade-tts kitten "Hello world"

# Read from a file
marmalade-tts @script.txt

# Save to a file instead of playing
marmalade-tts "Hello" --out hello.wav

# Speed up or slow down
marmalade-tts "Hello" --speed 1.4

# Choose a voice (a positional voice works for engines whose voice names look
# like identifiers: kitten, kokoro, pocket. Use --voice for path-shaped voices
# like piper's .onnx files and coqui's tts_models/... specs.)
marmalade-tts kokoro george "Hello"
marmalade-tts kitten Bella "Hello"
marmalade-tts piper --voice ~/voices/en_US-lessac-medium.onnx "Hello"

Engines & Voices

kokoro

marmalade-tts kokoro "Hello"
marmalade-tts kokoro george "Hello"               # British male, positional
marmalade-tts kokoro nicole "Hello"               # American female
marmalade-tts kokoro alpha "Hello" --lang a       # Japanese voice, English accent
marmalade-tts kokoro --list                       # show all voices

Voices are referred to by their bare name (e.g. george):

| Language | Voices |
|---|---|
| American English | heart, bella, nicole, adam, michael |
| British English | emma, isabella, george, lewis |
| Japanese | alpha, gongitsune, kumo |
| Mandarin | xiaobei, yunjian |

Each voice has a natural language, but kokoro can speak any voice in any supported language: pass --lang a/b/j/z (American English, British English, Japanese, Mandarin), or set engines.kokoro.lang in config, to override. This is useful for accent effects.

The canonical upstream form (bm_george, af_heart, etc.) is also accepted everywhere for back-compat.
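
The relationship between the two naming forms can be sketched in Python. This is a hypothetical helper, not marmalade-tts code: it assumes a canonical id is a language letter, then m or f, then an underscore and the bare name, which matches the examples bm_george, af_heart and am_adam used in this README.

```python
import re
from typing import Optional

# Assumed shape of a canonical kokoro voice id, inferred from the examples
# in this README: <language letter><m|f>_<bare name>, e.g. "bm_george".
_CANONICAL_RE = re.compile(r"^(?P<lang>[a-z])(?P<gender>[mf])_(?P<name>\w+)$")


def bare_voice_name(voice: str) -> str:
    """Return the bare name for either form, e.g. 'bm_george' -> 'george'."""
    match = _CANONICAL_RE.match(voice)
    return match.group("name") if match else voice


def natural_lang(voice: str) -> Optional[str]:
    """Return the language letter of a canonical id, or None for a bare name."""
    match = _CANONICAL_RE.match(voice)
    return match.group("lang") if match else None


if __name__ == "__main__":
    for voice in ("george", "bm_george", "af_heart"):
        print(voice, "->", bare_voice_name(voice), natural_lang(voice))
```

A bare name carries no language letter, so a real implementation would need a lookup table like the one above to find a voice's natural language.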

kitten

marmalade-tts kitten "Hello"
marmalade-tts kitten Kiki "Hello from Kiki"       # specify voice inline
marmalade-tts kitten --list                        # show all voices
marmalade-tts kitten --fast "Quick response"       # nano model
marmalade-tts kitten --quality "Important message" # mini model

piper

marmalade-tts piper "Hello"
marmalade-tts piper --voice ~/voices/en_US-lessac-medium.onnx "Hello"
marmalade-tts piper "Hello" --speaker 2           # multi-speaker models

coqui

marmalade-tts coqui "Hello"
marmalade-tts coqui "Hello" --voice tts_models/en/ljspeech/tacotron2-DDC
marmalade-tts coqui --list

pocket

marmalade-tts pocket "Hello"
marmalade-tts pocket alba "Hello from alba"
marmalade-tts pocket --list                       # show all built-in voices
marmalade-tts pocket my_recording.wav "Cloned!"   # voice cloning from any .wav

Built-in voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma.

For faster cloning, pre-export the speaker embedding to .safetensors:

pocket-tts export-voice friend.wav --out friend.safetensors
marmalade-tts pocket friend.safetensors "Hi!"

Note on voice cloning: Pocket TTS can clone any voice from a short WAV sample. Only clone voices you have explicit, informed consent to clone. Cloning a real person's voice without permission (to deceive, impersonate, harass, or misrepresent them) is harmful and in many jurisdictions illegal. The built-in voices are fine for any use.


Speed Presets

Choose a quality/speed tradeoff that picks the appropriate model variant:

marmalade-tts --fast "Hello"       # fastest, smallest model
marmalade-tts --balanced "Hello"   # balanced (default)
marmalade-tts --quality "Hello"    # best quality

Text Preprocessing

marmalade-tts normalises text before synthesis so engines hear readable English instead of symbols. This is on by default and tuned per-engine.

# These are handled automatically:
marmalade-tts "$42.50 is 15% off"
# → "forty-two dollars and fifty cents is fifteen percent off"

marmalade-tts "See https://example.com for details"
# → "See example dot com for details"

marmalade-tts "The 3rd place finisher at 9:30am"
# → "The third place finisher at nine thirty a m"

# Turn it off if you've already formatted your text:
marmalade-tts --no-preprocessing "forty two dollars"

# See all available preprocessing rules:
marmalade-tts --list-rules

Per-engine preprocessing config

You can set per-engine rule lists in ~/.config/marmalade-tts/config.yaml:

engines:
  kokoro:
    preprocessing: [currency, percent, ordinal, time, url]
  piper:
    preprocessing: true    # all rules (default)
  kitten:
    preprocessing: false   # disable entirely
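
How a preprocessing value might be turned into a list of rules can be sketched as follows. This is an illustrative Python sketch, not the project's implementation: the rule names are only the five shown in the config example (the real list comes from marmalade-tts --list-rules), and url_rule is a toy stand-in that keeps the host name and drops any path.

```python
import re

# Assumption: only the rule names that appear in the config example above.
KNOWN_RULES = ["currency", "percent", "ordinal", "time", "url"]


def resolve_rules(setting=None):
    """Map a `preprocessing:` config value to an ordered list of rule names."""
    if setting is None or setting is True:   # key absent or `true`: all rules
        return list(KNOWN_RULES)
    if setting is False:                     # `false`: preprocessing disabled
        return []
    unknown = [name for name in setting if name not in KNOWN_RULES]
    if unknown:
        raise ValueError(f"unknown preprocessing rules: {unknown}")
    return list(setting)


# Toy stand-in for the `url` rule: keep the host and spell out its dots.
_URL_RE = re.compile(r"https?://([^\s/]+)\S*")


def url_rule(text):
    return _URL_RE.sub(lambda m: m.group(1).replace(".", " dot "), text)


def preprocess(text, setting=None):
    """Apply the enabled rules (only `url` is implemented in this sketch)."""
    rules = resolve_rules(setting)
    if "url" in rules:
        text = url_rule(text)
    return text


if __name__ == "__main__":
    print(preprocess("See https://example.com for details"))
    print(preprocess("See https://example.com for details", setting=False))
```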

Audio Effects

Effects are applied after synthesis using sox. If sox is not installed, effects are skipped with a note and the speech is still generated.

# Install sox (required for effects):
apt install sox          # Debian/Ubuntu
brew install sox         # macOS

# Apply a single effect
marmalade-tts "Hello" --effect reverb=50
marmalade-tts "Hello" --effect pitch=200    # shift up 2 semitones
marmalade-tts "Hello" --effect pitch=-300   # shift down 3 semitones

# Chain multiple effects
marmalade-tts "Hello" --effect pitch=200 --effect reverb=30

# Use a built-in preset
marmalade-tts "Hello" --effect robot
marmalade-tts "Hello" --effect cave
marmalade-tts "Hello" --effect telephone

# See all effects and presets
marmalade-tts --list-effects

Built-in effect presets

| Preset | Effects applied |
|---|---|
| robot | overdrive + deep pitch shift + reverb |
| cave | heavy reverb + echo |
| chipmunk | pitch up + slightly faster |
| deep | pitch down + bass boost |
| telephone | bandpass filter + overdrive |
| whisper | quieter + treble boost + reverb |
| stadium | heavy reverb + echo |
| megaphone | bandpass + heavy overdrive + volume boost |
| slow_deep | pitch down + slower tempo |
| fast_high | pitch up + faster tempo |

Available effects

| Effect | Parameter | Example |
|---|---|---|
| reverb | amount 0–100 (default 50) | reverb=30 |
| pitch | cents (100 = 1 semitone) | pitch=200 or pitch=-400 |
| tempo | speed factor, no pitch change | tempo=0.8 |
| echo | gain-in:gain-out:delay-ms:decay | echo=0.8:0.88:60:0.4 |
| overdrive | gain 1–100 | overdrive=20 |
| flanger | (none) | flanger |
| chorus | (none, or 6-part custom) | chorus |
| treble | dB boost/cut | treble=6 |
| bass | dB boost/cut | bass=4 |
| bandpass | low-hz:high-hz | bandpass=300:3400 |
| speed | factor (pitch shifts too) | speed=1.2 |
| vol | volume multiplier | vol=2.0 |
| normalize | (none) | normalize |
| fade | in-seconds:out-seconds | fade=0.1:0.5 |
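
The name=value syntax in the table, and what a pitch value in cents means, can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this example; the only facts they rely on are the colon-separated parameter format shown above and the standard definition of a cent (1200 cents per octave, so 100 per semitone).

```python
def parse_effect(spec: str):
    """Split 'name=a:b:c' into ('name', ['a', 'b', 'c']); bare names get []."""
    name, sep, raw = spec.partition("=")
    if not name:
        raise ValueError(f"empty effect name in {spec!r}")
    params = raw.split(":") if sep else []
    return name, params


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift: 100 cents = 1 semitone, 1200 = 1 octave."""
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    print(parse_effect("echo=0.8:0.88:60:0.4"))
    print(parse_effect("flanger"))
    print(cents_to_ratio(200))   # frequency ratio for a shift of two semitones up
```

Parameters are kept as strings here; how marmalade-tts validates them or maps them to sox arguments is not covered by this sketch.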

Default effects per engine

You can set default effects that apply automatically for a given engine, without needing --effect every time. CLI --effect flags override the engine default entirely.

# ~/.config/marmalade-tts/config.yaml
effects:
  defaults:
    kitten: ["reverb=20"]       # subtle warmth on kitten by default
    kokoro: []                  # no default effects (explicit empty = off)
    piper:  []
    coqui:  []

  # Define your own named presets:
  presets:
    warm:      ["reverb=25", "bass=3"]
    dramatic:  ["reverb=70", "echo=0.8:0.6:80:0.3"]
    broadcast: ["bandpass=80:15000", "normalize"]
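
The override rule described above (CLI flags replace the engine default entirely) can be sketched in Python. This is an assumed model of the behaviour, not the project's code: CONFIG mirrors the YAML example, user-defined preset names are expanded to their lists, and built-in presets such as robot are passed through untouched.

```python
# Example data mirroring the YAML above (already parsed into a dict).
CONFIG = {
    "effects": {
        "defaults": {"kitten": ["reverb=20"], "kokoro": [], "piper": [], "coqui": []},
        "presets": {
            "warm": ["reverb=25", "bass=3"],
            "dramatic": ["reverb=70", "echo=0.8:0.6:80:0.3"],
            "broadcast": ["bandpass=80:15000", "normalize"],
        },
    }
}


def resolve_effects(engine, cli_effects, config, no_effects=False):
    """Return the effect chain to apply for one synthesis call."""
    effects_cfg = config.get("effects", {})
    if cli_effects:
        chosen = list(cli_effects)      # CLI flags replace the defaults entirely
    elif no_effects:
        chosen = []                     # --no-effects: skip engine defaults
    else:
        chosen = list(effects_cfg.get("defaults", {}).get(engine, []))

    # Assumption: a user-defined preset name expands to its configured list.
    presets = effects_cfg.get("presets", {})
    expanded = []
    for item in chosen:
        expanded.extend(presets.get(item, [item]))
    return expanded


if __name__ == "__main__":
    print(resolve_effects("kitten", [], CONFIG))
    print(resolve_effects("kitten", ["pitch=200"], CONFIG))
    print(resolve_effects("kokoro", ["warm"], CONFIG))
```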

Daemon Mode

Daemon mode keeps the engine model loaded in RAM, so each synthesis request starts immediately instead of waiting for the model to load.

# Start / stop individual daemons
marmalade-tts daemon start --engine kitten
marmalade-tts daemon stop --engine kitten

# Start all configured daemons
marmalade-tts daemon start-all

# Check what's running
marmalade-tts daemon status

Enable daemon mode per-engine in config:

engines:
  kitten:
    daemon: true    # start automatically on first use
  kokoro:
    daemon: false

Or use systemd to keep the daemon alive across reboots:

systemctl --user enable marmalade-kitten
systemctl --user start  marmalade-kitten

Configuration

marmalade-tts init

The setup wizard configures engines, voices, and defaults. Run it again at any time to change your setup.

Interactive mode (default when stdin is a TTY):

marmalade-tts init

Uses arrow keys + space to multi-select engines, then walks through per-engine options (model size, voice, etc.).

Non-interactive mode (for AI agents, scripts, CI):

marmalade-tts init --non-interactive --engines kitten,piper
marmalade-tts init --non-interactive --engines kitten --set kitten.model_size=nano
marmalade-tts init --non-interactive --engines kitten,kokoro \
  --set kokoro.voice=am_adam --default-engine kokoro --test

Flags:

  • --non-interactive: skip TUI prompts (auto-enabled when stdin is not a TTY)
  • --engines LIST: comma-separated engines to enable
  • --set ENGINE.KEY=VALUE: override engine options (repeatable)
  • --default-engine NAME: set the default engine
  • --test: run a test synthesis after setup

Manual config

Config is stored at ~/.config/marmalade-tts/config.yaml. A default config is written on first run.

# View current config
marmalade-tts config show

# Get a value
marmalade-tts config get defaults.engine

# Set a value
marmalade-tts config set defaults.engine kitten
marmalade-tts config set defaults.speed 1.2
marmalade-tts config set defaults.play false

Value coercion rules (predictable so AI agents don't get surprised):

  • true / false (any case) → bool
  • null / ~ / empty → None
  • Integer-looking strings → int
  • Float-looking strings → float
  • Everything else → string, verbatim

yes / no / on / off are kept as strings, not coerced to bools. This is intentional: YAML 1.1 silently turning bare words such as yes or no into booleans (the "Norway problem") is a common footgun.
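
The coercion rules above can be written out as a small Python function. This is a sketch of the documented behaviour rather than the project's actual code; the regular expressions that decide what "integer-looking" and "float-looking" mean are assumptions, as is matching null in lowercase only.

```python
import re

# Assumed definitions of "integer-looking" and "float-looking".
_INT_RE = re.compile(r"[+-]?\d+")
_FLOAT_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def coerce(raw: str):
    """Coerce a `config set` value following the rules listed above."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in ("true", "false"):        # any case -> bool
        return lowered == "true"
    if text in ("null", "~", ""):           # -> None
        return None
    if _INT_RE.fullmatch(text):             # checked before floats
        return int(text)
    if _FLOAT_RE.fullmatch(text):
        return float(text)
    return raw                              # yes/no/on/off stay strings


if __name__ == "__main__":
    for value in ("True", "~", "42", "1.2", "yes", "kitten"):
        print(repr(value), "->", repr(coerce(value)))
```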

Full config reference

defaults:
  engine: kitten        # default engine when none is specified
  device: cpu           # cpu or cuda
  speed: 1.0            # speech speed multiplier
  play: true            # play audio automatically (false = save only)
  preprocessing: true   # normalize text before synthesis

presets:
  fast:
    kitten: nano
    kokoro: heart
    piper: en_US-lessac-medium
    coqui: tts_models/en/ljspeech/tacotron2-DDC
    pocket: alba
  balanced:
    # ...same structure...
  quality:
    # ...same structure...

engines:
  kokoro:
    device: cpu
    voice: heart        # bare name (or canonical "af_heart" for back-compat)
    # lang: a           # optional; defaults to the voice's natural language
    daemon: false
    # preprocessing: [currency, percent]   # or true / false

  kitten:
    device: cpu
    model_size: micro   # nano / micro / mini
    voice: Kiki
    daemon: true

  piper:
    device: cpu
    model: ~/.local/share/piper/voices/en_US-lessac-medium.onnx
    daemon: false

  coqui:
    device: cpu
    model: tts_models/en/ljspeech/tacotron2-DDC
    daemon: false

  pocket:
    device: cpu
    voice: alba         # built-in voice, or path to .wav / .safetensors
    # No daemon needed: Pocket TTS loads fast (~200 ms)

effects:
  defaults:
    kitten: []
    kokoro: []
    piper:  []
    coqui:  []
  presets:
    warm: ["reverb=25", "bass=3"]

Shell Completion

# bash
eval "$(marmalade-tts --completion bash)"

# zsh
eval "$(marmalade-tts --completion zsh)"

# Or add to your shell rc:
echo 'eval "$(marmalade-tts --completion bash)"' >> ~/.bashrc

KDE Global Hotkeys (speak selected text)

The scripts/ directory contains ready-to-use helpers for binding speech to keyboard shortcuts in KDE.

Install the scripts:

cp scripts/speak-selection scripts/speak-clipboard scripts/marmalade-pipe ~/.local/bin/
chmod +x ~/.local/bin/speak-selection ~/.local/bin/speak-clipboard ~/.local/bin/marmalade-pipe

Dependencies (pick one per display server):

sudo apt install xclip          # X11
sudo apt install wl-clipboard   # Wayland

Bind in KDE:

  1. System Settings → Shortcuts → Custom Shortcuts
  2. New → Script/Command
  3. Set the trigger (e.g. Meta+Shift+S) and the action path

| Script | What it speaks | Suggested shortcut |
|---|---|---|
| speak-selection | Highlighted text (primary selection) | Meta+Shift+S |
| speak-clipboard | Last copied text (Ctrl+C) | Meta+Shift+C |

See scripts/SCRIPTS.md for full details.


Scripting & Agent Use

marmalade-tts is designed to be used from scripts, agents, and pipelines.

# Read from stdin
echo "Hello world" | marmalade-tts --stdin --no-play --out hello.wav
echo "Hello world" | marmalade-pipe --out hello.wav   # convenience wrapper

# Suppress all status output (exit code only)
marmalade-tts --quiet "Hello"

# Print only the output WAV path to stdout
WAV=$(marmalade-tts --print-path --no-play "Hello")
aplay "$WAV"

# JSON result for structured consumption
marmalade-tts --json --no-play "Hello"
# → {"ok": true, "version": "0.4.4", "engine": "kitten", "voice": "Kiki",
#    "out": "/tmp/...", "effects": [], "text": "Hello"}

# Never play back, just generate
marmalade-tts --no-play --out result.wav "Generate but don't play"

# Skip engine-default effects from config (e.g. for a dry signal)
marmalade-tts --no-effects "Hello"

# Combine flags for maximum scriptability
cat script.txt | marmalade-tts --stdin --quiet --json --no-play --out speech.wav

Exit codes:

  • 0: success
  • non-zero: failure. Specific codes are not promised; expect 1 for user-visible errors and 2 from argparse for bad flags.
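
From Python, the JSON output and the exit-code contract above might be consumed like this. It is a minimal sketch: it uses only the flags documented in this section (--json, --no-play, --out), assumes the JSON is written to stdout, and assumes a failed run either exits non-zero or reports "ok": false, since the shape of a failure payload is not documented here.

```python
import json
import subprocess


def parse_result(stdout: str) -> dict:
    """Parse the --json payload and fail loudly if it does not report success."""
    result = json.loads(stdout)
    if not result.get("ok"):
        raise RuntimeError(f"synthesis failed: {result}")
    return result


def synthesize(text: str, out_path: str) -> dict:
    """Run marmalade-tts without playback and return the parsed JSON result."""
    proc = subprocess.run(
        ["marmalade-tts", "--json", "--no-play", "--out", out_path, text],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Specific codes are not promised, so only zero vs non-zero is checked.
        raise RuntimeError(
            f"marmalade-tts exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return parse_result(proc.stdout)


if __name__ == "__main__":
    info = synthesize("Hello", "hello.wav")
    print(info["engine"], info["voice"], info["out"])
```

Text that begins with @ or is exactly - would be read as a file or stdin reference by the CLI, so a wrapper that accepts arbitrary strings may prefer to pipe them in with --stdin instead.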

Text Input Methods

# Literal text
marmalade-tts "Hello world"

# From a file (@ prefix)
marmalade-tts @speech.txt

# From stdin
echo "Hello world" | marmalade-tts -

# Combine with --out to save a file
marmalade-tts @script.txt --out script.wav
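
The three input forms can be summarised as a small dispatch function. The Python below is a sketch of the convention shown above, not the tool's parser; reading files as UTF-8 is an assumption.

```python
import sys
from pathlib import Path


def resolve_text(arg: str, stdin=None) -> str:
    """Return the text to speak for one positional argument."""
    if arg == "-":                      # "-": read everything from stdin
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    if arg.startswith("@"):             # "@path": read the file's contents
        return Path(arg[1:]).read_text(encoding="utf-8")
    return arg                          # anything else is literal text


if __name__ == "__main__":
    print(resolve_text("Hello world"))
```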

Requirements

  • OS: Linux (primary target, tested on Ubuntu 24.04). macOS untested but most engines (piper, kokoro, pocket, coqui) should work. Windows is not supported.
  • Python: 3.10 or newer.
  • CPU-only by default. All engines run on CPU; no GPU needed. Optional CUDA acceleration for kokoro/coqui on supported NVIDIA cards.
  • RAM: ~200 MB for kitten/pocket, ~1.5 GB for kokoro daemon, varies for coqui depending on model.
  • Disk (models, downloaded on first use):
    • Kitten: 23–80 MB (nano/micro/mini)
    • Piper voices: 15–75 MB each
    • Pocket: ~200 MB
    • Kokoro: ~500 MB
    • Coqui: 200 MB – 2 GB depending on model
  • Audio playback: one of paplay, aplay, or ffplay (already present on most Linux desktops).
  • Optional: sox for audio effects, xclip / wl-clipboard for the KDE selection scripts.

The CLI wrapper itself (pipx install marmalade-tts) is tiny; engines live in their own venvs to keep their dependencies isolated. marmalade-tts init walks you through installing whichever engines you want.


Contributing

Want to add a new TTS engine? See ENGINE-GUIDE.md for a step-by-step walkthrough of every file that needs to be touched.

Engines are first-class citizens in this repo. There is no plugin / entry-point mechanism for external engines: adding an engine is a PR, not a third-party install. Each engine addition is treated as a feature and ships in the next minor version bump.


Stability & versioning

marmalade-tts is currently in beta (0.4.x). The CLI surface, config schema, and JSON output are usable today and the project tries hard not to break working commands, but small changes between minor versions are still possible until v1.0.0. From 1.0.0 onward this project follows Semantic Versioning:

  • Patch (1.0.x): bug fixes only, no surface changes.
  • Minor (1.x.0): new engines, new flags, new config keys. Backwards compatible.
  • Major (x.0.0): breaking changes to CLI surface, config keys, or JSON output. Avoided where possible; called out clearly in the changelog when needed.
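
A script that wants to guard against surface changes could compare the version it was tested against with the installed one (for example the "version" field of the --json output). The Python sketch below encodes the policy described above under two assumptions: versions are plain MAJOR.MINOR.PATCH strings, and the installed version is not older than the tested one. The function names are hypothetical.

```python
def parse_version(version: str):
    """'0.4.4' -> (0, 4, 4). Only plain MAJOR.MINOR.PATCH strings are handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break(tested: str, installed: str) -> bool:
    """True if moving from `tested` to `installed` may change documented surfaces."""
    t, i = parse_version(tested), parse_version(installed)
    if t[0] != i[0]:
        return True     # major bump: breaking changes are allowed
    if t[0] == 0 and t[1] != i[1]:
        return True     # before 1.0.0, minor versions may still change things
    return False        # same minor line, or a 1.x minor/patch bump


if __name__ == "__main__":
    print(may_break("0.4.4", "0.4.5"))
    print(may_break("0.4.4", "0.5.0"))
```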

If you're scripting against marmalade-tts today, expect the surfaces documented in this README to be stable. Anything not documented here (help-text wording, init wizard formatting, internal subprocess invocation, daemon socket protocol) may evolve without notice.


Roadmap

Ideas under consideration. No promises on timing; feedback and PRs welcome.

Language detection

Auto-detect the input text's language and route to an appropriate engine / voice / model. For example, Japanese text routes to a kokoro Japanese voice and Mandarin to a kokoro Mandarin voice, while everything else stays on the configured default. Per-language defaults would be configurable in config.yaml.

Emoji-driven emotional prosody

Treat emojis as inline prosody directives: e.g. "Hello 🙂" reads warm, "Hello 😢" reads sad, "Hello! ⚡" reads energetic. Requires upstream model support for emotion conditioning that runs close to real-time on consumer hardware (CPU or modest GPU), with a FOSS licence. Will track FOSS expressive-TTS research and integrate when the stack exists.


Credits & Acknowledgements

marmalade-tts is a unified wrapper; the real work is done by these engines:

  • Piper: ONNX neural TTS by Michael Hansen / Rhasspy (MIT)
  • Kokoro: high-quality multilingual TTS by Hexgrad (Apache 2.0)
  • KittenTTS: fast lightweight neural TTS by KittenML (Apache 2.0)
  • Coqui TTS: open-source TTS toolkit by Coqui AI (MPL 2.0)
  • Pocket TTS: CPU-only 100M param TTS with voice cloning by Kyutai Labs (MIT)
  • sox: audio effects processing (GPL)
  • num2words: number-to-words conversion (LGPL)

The Docker HTTP API server implements endpoints compatible with the OpenAI TTS API and ElevenLabs TTS API interfaces. While we use their API interface for compatibility, no code from either project is used; the server is written from scratch using Python's standard library. This project is not affiliated with or endorsed by OpenAI or ElevenLabs.


License

MIT. See LICENSE.
