Kokoro Desktop

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with advanced multi-voice blending), and various input formats including EPUB books and PDF documents.

Features

  • Multiple language and voice support
  • Advanced multi-voice blending with customizable weights (3+ voices)
  • EPUB, PDF and TXT file input support
  • Standard input (stdin), including piping (|) from other programs
  • Streaming audio playback
  • Split output into chapters
  • Adjustable speech speed
  • WAV and MP3 output formats
  • Chapter merging capability
  • Detailed debug output option
  • GPU Support
  • Graphical User Interface (GUI) for easy access
  • Web-based GUI with modern interface

Demo

Kokoro Desktop is an open-source CLI tool that delivers high-quality text-to-speech right from your terminal. Think of it as your personal voice studio, capable of transforming any text into natural-sounding speech with minimal effort.

https://github.com/user-attachments/assets/8413e640-59e9-490e-861d-49187e967526

Demo Audio (MP3) | Demo Audio (WAV)

TODO

  • [x] Add GPU support
  • [x] Add PDF support
  • [x] Add multi-voice blending (3+ voices)
  • [x] Add GUI

Prerequisites

  • Python 3.9-3.12 (Python 3.13+ is not currently supported)
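
A quick way to confirm that an interpreter falls inside the supported range before installing is sketched below. The function name `is_supported` is hypothetical and not part of Kokoro Desktop; only the 3.9 to 3.12 range comes from the prerequisite above.

```python
import sys

# Supported range taken from the prerequisite above: 3.9 through 3.12.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) falls within the supported range.

    `version` is any tuple starting with (major, minor); it defaults to
    the running interpreter's version.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:2]))
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status} by Kokoro Desktop")
```

Run it with the interpreter you plan to install into, for example the one inside your virtual environment.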

Installation

Method 1: Install from PyPI (Recommended)

The easiest way to install Kokoro Desktop is from PyPI:

# Using uv (recommended)
uv tool install kokoro-desktop

# Using pip
pip install kokoro-desktop

After installation, you can run:

  • Command line: kokoro-desktop --help
  • Desktop GUI: kokoro-desktop-gui
  • Web GUI: kokoro-web

Method 2: Install from Git

Install directly from the repository:

# Using uv (recommended)
uv tool install git+https://github.com/gondaliyashreyan1/Kokoro-Desktop

# Using pip
pip install git+https://github.com/gondaliyashreyan1/Kokoro-Desktop

Method 3: Clone and Install Locally

  1. Clone the repository:
git clone https://github.com/gondaliyashreyan1/Kokoro-Desktop.git
cd Kokoro-Desktop
  2. Install the package:

With uv (recommended):

uv venv
uv pip install -e .

With pip:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
  3. Run the tool:
# If using uv
uv run kokoro-desktop --help

# If using pip with activated venv
kokoro-desktop --help

Method 4: Run Without Installation

If you prefer to run without installing:

  1. Clone the repository:
git clone https://github.com/gondaliyashreyan1/Kokoro-Desktop.git
cd Kokoro-Desktop
  2. Install dependencies only:

With uv:

uv venv
uv sync

With pip:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
  3. Run directly:
# With uv
uv run -m kokoro_tts --help

# With pip (venv activated)
python -m kokoro_tts --help

Download Model Files

After installation, download the required model files to your working directory:

# Download voice data (bin format is preferred)
wget https://github.com/nazdridoy/kokoro-tts/releases/download/v1.0.0/voices-v1.0.bin

# Download the model
wget https://github.com/nazdridoy/kokoro-tts/releases/download/v1.0.0/kokoro-v1.0.onnx

The script requires voices-v1.0.bin and kokoro-v1.0.onnx to be present in the same directory where you run the kokoro-desktop command.
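
Because both files must sit in the working directory, a small pre-flight check can save a failed run. The following is a minimal sketch, not part of Kokoro Desktop; the helper name `missing_model_files` is hypothetical, and the two file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the download instructions above.
REQUIRED_FILES = ("voices-v1.0.bin", "kokoro-v1.0.onnx")


def missing_model_files(directory="."):
    """Return the required model files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```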

Supported voices:

| Category | Voices | Language Code |
| --- | --- | --- |
| 🇺🇸 👩 | af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky | en-us |
| 🇺🇸 👨 | am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck | en-us |
| 🇬🇧 | bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis | en-gb |
| 🇫🇷 | ff_siwis | fr-fr |
| 🇮🇹 | if_sara, im_nicola | it |
| 🇯🇵 | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo | ja |
| 🇨🇳 | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang | cmn |
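
In the voices listed above, the first letter of each name's two-letter prefix lines up with a language code (for example, every `a…` voice is en-us). The sketch below uses that pattern to pick a matching `--lang` value for a voice. The mapping is read off the table only, and `lang_for_voice` is a hypothetical helper, not a function the tool provides.

```python
# First letter of the voice prefix -> language code, read off the voices table.
PREFIX_TO_LANG = {
    "a": "en-us",
    "b": "en-gb",
    "f": "fr-fr",
    "i": "it",
    "j": "ja",
    "z": "cmn",
}


def lang_for_voice(voice):
    """Guess the --lang value for a voice name such as 'af_sarah'."""
    prefix = voice.split("_", 1)[0]
    if len(prefix) != 2 or prefix[0] not in PREFIX_TO_LANG:
        raise ValueError(f"unrecognised voice name: {voice!r}")
    return PREFIX_TO_LANG[prefix[0]]
```

For instance, `lang_for_voice("bm_george")` gives the code shown in the 🇬🇧 row.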

Usage

Basic Usage

kokoro-desktop <input_text_file> [<output_audio_file>] [options]

[!NOTE]

  • If you installed via Method 1 (PyPI) or Method 2 (git install), use kokoro-desktop directly
  • If you installed via Method 3 (local install), use uv run kokoro-desktop or activate your virtual environment first
  • If you're using Method 4 (no install), use uv run -m kokoro_tts or python -m kokoro_tts with activated venv

Commands

  • -h, --help: Show help message
  • --help-languages: List supported languages
  • --help-voices: List available voices
  • --merge-chunks: Merge existing chunks into chapter files

Options

  • --stream: Stream audio instead of saving to file
  • --speed <float>: Set speech speed (default: 1.0)
  • --lang <str>: Set language (default: en-us)
  • --voice <str>: Set voice or blend voices (default: interactive selection)
    • Single voice: Use voice name (e.g., "af_sarah")
    • Blended voices: Use "voice1:weight,voice2:weight" format for 2-way blend
    • Multi-way blended voices: Use "voice1:weight,voice2:weight,voice3:weight,..." format for 3+ way blends
  • --split-output <dir>: Save each chunk as separate file in directory
  • --format <str>: Audio format: wav or mp3 (default: wav)
  • --debug: Show detailed debug information during processing
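
To make the `--voice` blend format concrete, here is a minimal sketch of how a string such as "voice1:weight,voice2:weight" could be parsed and normalized so the weights sum to 1. This is an illustration under stated assumptions, not the tool's actual implementation: `parse_voice_spec` is a hypothetical name, and treating a voice without an explicit weight as weight 1 is an assumption chosen so that "a,b" comes out as an equal blend.

```python
def parse_voice_spec(spec):
    """Parse 'name:weight,name:weight,...' into (name, fraction) pairs.

    Assumption: a voice without an explicit weight gets weight 1, so
    'a,b' is an equal blend. Fractions are normalized to sum to 1.0.
    """
    entries = []
    for part in spec.split(","):
        name, _, weight = part.strip().partition(":")
        if not name:
            raise ValueError(f"empty voice name in {spec!r}")
        value = float(weight) if weight else 1.0
        if value <= 0:
            raise ValueError(f"weight must be positive: {part!r}")
        entries.append((name, value))
    total = sum(value for _, value in entries)
    return [(name, value / total) for name, value in entries]


if __name__ == "__main__":
    for name, fraction in parse_voice_spec("am_adam:40,af_sarah:35,bf_emma:25"):
        print(f"{name}: {fraction:.2f}")
```

Normalizing means the weights do not have to add up to 100; "af_sarah:3,am_adam:2" describes the same 60-40 mix.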

Input Formats

  • .txt: Text file input
  • .epub: EPUB book input (will process chapters)
  • .pdf: PDF document input (extracts chapters from TOC or content)
  • - or /dev/stdin (Linux/macOS) or CONIN$ (Windows): Standard input (stdin)
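
When scripting around the tool, it can help to check an input argument against the formats listed above before launching a long run. The sketch below is hypothetical (the name `classify_input` and the returned labels are not part of Kokoro Desktop) and only encodes the extensions and stdin names from this list.

```python
from pathlib import Path

# Names that select standard input, as listed above.
STDIN_NAMES = {"-", "/dev/stdin", "CONIN$"}
# Supported file extensions mapped to illustrative labels.
FILE_KINDS = {".txt": "text", ".epub": "epub", ".pdf": "pdf"}


def classify_input(arg):
    """Return 'stdin', 'text', 'epub' or 'pdf' for a command-line input."""
    if arg in STDIN_NAMES:
        return "stdin"
    kind = FILE_KINDS.get(Path(arg).suffix.lower())
    if kind is None:
        raise ValueError(f"unsupported input: {arg!r}")
    return kind
```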

Examples

# Basic usage with output file
kokoro-desktop input.txt output.wav --speed 1.2 --lang en-us --voice af_sarah

# Read from standard input (stdin)
echo "Hello World" | kokoro-desktop - --stream
cat input.txt | kokoro-desktop - output.wav

# Cross-platform stdin support: the "-" form works on Linux, macOS and Windows
# echo "text" | kokoro-desktop - --stream
# Alternatively, name the stdin device explicitly:
# kokoro-desktop /dev/stdin --stream   (Linux/macOS)
# kokoro-desktop CONIN$ --stream       (Windows)

# Use voice blending (60-40 mix)
kokoro-desktop input.txt output.wav --voice "af_sarah:60,am_adam:40"

# Use equal voice blend (50-50)
kokoro-desktop input.txt --stream --voice "am_adam,af_sarah"

# Use multi-way voice blend (40-35-25 mix of three voices)
kokoro-desktop input.txt --stream --voice "am_adam:40,af_sarah:35,bf_emma:25"

# Use 4-way voice blend (30-25-25-20 mix of four voices)
kokoro-desktop input.txt --stream --voice "am_adam:30,af_sarah:25,bf_emma:25,zf_xiaoxiao:20"

# Launch Desktop GUI
kokoro-desktop-gui

# Launch Web GUI
kokoro-web

[!TIP] If you're using Method 3, replace kokoro-desktop with uv run kokoro-desktop in the examples above. If you're using Method 4, replace kokoro-desktop with uv run -m kokoro_tts or python -m kokoro_tts in the examples above.
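
The commands above can also be driven from a script, for example to convert every .txt file in a folder. The following is a sketch that assumes `kokoro-desktop` is on your PATH and uses only the flags documented under Options; `build_command` and `convert_all` are hypothetical helper names, and the default voice is set only to avoid the interactive voice prompt.

```python
import subprocess
from pathlib import Path


def build_command(input_path, output_path, voice="af_sarah", speed=1.0, fmt="wav"):
    """Build the kokoro-desktop argument list for one conversion."""
    return [
        "kokoro-desktop", str(input_path), str(output_path),
        "--voice", voice,
        "--speed", str(speed),
        "--format", fmt,
    ]


def convert_all(folder, voice="af_sarah", speed=1.0, fmt="wav"):
    """Convert every .txt file in `folder`, writing audio next to each input."""
    for txt in sorted(Path(folder).glob("*.txt")):
        out = txt.with_suffix("." + fmt)
        # check=True stops the batch on the first failed conversion.
        subprocess.run(build_command(txt, out, voice, speed, fmt), check=True)
```

Calling `convert_all("chapters", voice="af_sarah:60,am_adam:40", fmt="mp3")` would run one conversion per text file in a folder named chapters.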

Features in Detail

EPUB Processing

  • Automatically extracts chapters from EPUB files
  • Preserves chapter titles and structure
  • Creates organized output for each chapter
  • Detailed debug output available for troubleshooting

Audio Processing

  • Chunks long text into manageable segments
  • Supports streaming for immediate playback
  • Voice blending with customizable mix ratios (now supports 3+ voices)
  • Progress indicators for long processes
  • Handles interruptions gracefully
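
As an illustration of what chunking long text into manageable segments can look like, the sketch below greedily packs whole sentences into chunks up to a character limit. It is not the tool's actual algorithm; the function name `chunk_text`, the sentence-splitting rule and the 200-character default are all assumptions made for the example.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed offset keeps each segment a natural unit of speech.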

Output Options

  • Single file output
  • Split output with chapter organization
  • Chunk merging capability
  • Multiple audio format support
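
For readers curious how chunk merging can work for WAV output, here is a minimal sketch using Python's standard `wave` module. It is not the code behind --merge-chunks; `merge_wav_files` is a hypothetical helper, it assumes all chunks share the same channel count, sample width and sample rate, and it holds all audio in memory, so it suits modest file sizes.

```python
import wave


def merge_wav_files(chunk_paths, output_path):
    """Concatenate WAV files that share the same audio parameters."""
    if not chunk_paths:
        raise ValueError("no chunks to merge")
    params, frames = None, []
    for path in chunk_paths:
        with wave.open(str(path), "rb") as chunk:
            fmt = chunk.getparams()[:3]  # channels, sample width, frame rate
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has different audio parameters")
            frames.append(chunk.readframes(chunk.getnframes()))
    # Write the output only after every chunk has been read successfully.
    with wave.open(str(output_path), "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for data in frames:
            out.writeframes(data)
```

Pass the chunk paths in playback order; the function does not sort them.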

Debug Mode

  • Shows detailed information about file processing
  • Displays NCX parsing details for EPUB files
  • Lists all found chapters and their metadata
  • Helps troubleshoot processing issues

Input Options

  • Text file input (.txt)
  • EPUB book input (.epub)
  • PDF document input (.pdf)
  • Standard input (stdin)
  • Supports piping from other programs

Contributing

This is a personal project, but contributions are welcome. Feel free to submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Changelog

Version 2.4.0

  • Added support for multi-voice blending (3+ voices)
  • Enhanced voice blending algorithm to support unlimited voice combinations
  • Updated documentation to reflect new multi-voice capabilities
  • Rebranded from Kokoro TTS to Kokoro Desktop

Version 2.4.1

  • Added desktop GUI for easy access
  • Added web-based GUI with modern interface
  • Fixed voice processing for multi-voice blending
  • Implemented automatic weight normalization

Version 2.4.4

  • Added ASCII art logo with rich formatting
  • Added version variable for easier updates
  • Included rich as dependency for enhanced visuals

Version 2.4.5

  • Added API endpoints for custom emotions and audio effects
  • Added support for registering custom emotion profiles
  • Added support for registering custom audio effects
  • Added comprehensive preset management system
  • Added advanced speaker detection and voice assignment
  • Added model parameter access API
  • Added comprehensive testing suite

Version 2.4.6

  • Added emotion controls to the web GUI
  • Added audio effect controls to the web GUI
  • Enhanced web interface with emotion and effect selectors
  • Improved user experience with visual feedback for emotion/effect settings

Version 2.4.7

  • Replaced emotion controls with speed multiplier controls (since the model only supports speed changes)
  • Added comprehensive web application with full-featured dashboard
  • Enhanced UI with tabbed interface and professional layout
  • Added real-time model status monitoring
  • Improved voice blending controls with dynamic addition/removal
  • Added command: kokoro-app for the enhanced web application
