
Generate audiobooks from EPUB files using Kokoro ONNX TTS.


ttsforge

Convert EPUB files to audiobooks using Kokoro ONNX TTS.

ttsforge is a command-line tool that transforms EPUB ebooks into high-quality audiobooks with support for 54 neural voices across 9 languages.

Features

  • EPUB to Audiobook: Convert EPUB files to M4B, MP3, WAV, FLAC, or OPUS
  • 54 Neural Voices: High-quality TTS in 9 languages
  • SSMD Editing: Edit intermediate SSMD files to fine-tune pronunciation and pacing
  • Custom Phoneme Dictionary: Control pronunciation of names and technical terms
  • Auto Name Extraction: Automatically extract names from books for phoneme customization
  • Mixed-Language Support: Auto-detect and handle multiple languages in text
  • Resumable Conversions: Interrupt and resume long audiobook conversions
  • Phoneme Pre-tokenization: Pre-process text for faster batch conversions
  • Configurable Filenames: Template-based output naming with book metadata
  • Voice Blending: Mix multiple voices for custom narration
  • GPU Acceleration: Optional CUDA support for faster processing
  • Chapter Support: M4B files include chapter markers from EPUB
  • Streaming Read: Listen to EPUB/text directly with the read command

Installation

pip install ttsforge

Optional extras:

# Audio playback (required for --play and read)
pip install "ttsforge[audio]"

# Bundled ffmpeg (if you cannot install system ffmpeg)
pip install "ttsforge[static_ffmpeg]"

# GPU acceleration (CUDA)
pip install "ttsforge[gpu]"

Dependencies

  • ffmpeg: Required for MP3/FLAC/OPUS/M4B output and chapter merging
  • espeak-ng: Required for phonemization
  • sounddevice (optional): Required for audio playback (--play, read)

Ubuntu/Debian:

sudo apt-get install ffmpeg espeak-ng

macOS:

brew install ffmpeg espeak-ng

Quick Start

# Convert an EPUB to audiobook (M4B with chapters)
ttsforge convert book.epub

# Use a specific voice
ttsforge convert book.epub -v am_adam

# Convert specific chapters
ttsforge convert book.epub --chapters 1-5

# List available voices
ttsforge voices

# Generate a voice demo
ttsforge demo

# Read an EPUB aloud (streaming playback)
ttsforge read book.epub

Usage

Basic Conversion

ttsforge convert book.epub

Creates book.m4b with default settings (voice: af_heart, format: M4B).

Voice Selection

# List all voices
ttsforge voices

# List voices for a language
ttsforge voices -l b  # British English

# Convert with specific voice
ttsforge convert book.epub -v bf_emma

Output Formats

ttsforge convert book.epub -f mp3    # MP3
ttsforge convert book.epub -f wav    # WAV (uncompressed)
ttsforge convert book.epub -f flac   # FLAC (lossless)
ttsforge convert book.epub -f opus   # OPUS
ttsforge convert book.epub -f m4b    # M4B audiobook (default)

Chapter Selection

# Preview chapters
ttsforge list book.epub

# Convert range
ttsforge convert book.epub --chapters 1-5

# Convert specific chapters
ttsforge convert book.epub --chapters 1,3,5,7

# Mixed selection
ttsforge convert book.epub --chapters 1-3,5,10-15
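
To make the selection syntax concrete, here is a minimal Python sketch of how a specification such as 1-3,5,10-15 could expand into individual chapter numbers. The function name parse_chapter_spec is hypothetical and this is not ttsforge's own parser; it only illustrates the comma and range notation shown above.

```python
def parse_chapter_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3,5,10-15" into a sorted list of chapter numbers."""
    chapters: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range like "10-15" includes both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            chapters.update(range(start, end + 1))
        else:
            chapters.add(int(part))
    return sorted(chapters)


print(parse_chapter_spec("1-3,5,10-15"))
# [1, 2, 3, 5, 10, 11, 12, 13, 14, 15]
```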

Speed Control

ttsforge convert book.epub -s 1.2   # 20% faster
ttsforge convert book.epub -s 0.9   # 10% slower
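
As a rough worked example, assuming the -s value scales the speaking rate linearly (an assumption; pauses may not scale the same way in practice), the running time of the result is the normal-speed duration divided by the speed. The helper name estimated_duration is hypothetical.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate running time at a given speed multiplier.

    Assumes duration is inversely proportional to speed, which is an
    approximation and not a documented ttsforge guarantee.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


ten_hours = 10 * 3600
for speed in (0.9, 1.0, 1.2):
    hours = estimated_duration(ten_hours, speed) / 3600
    print(f"speed {speed}: about {hours:.2f} hours")
```

Under this assumption, a speed of 1.2 shortens a 10-hour book to roughly 8.3 hours: the speech is 20% faster, but the running time drops by about 17%, not 20%.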

Resumable Conversions

Conversions are resumable by default. If interrupted, re-run the same command:

ttsforge convert book.epub  # Resumes from last chapter
ttsforge convert book.epub --fresh  # Start over

Phoneme Workflow

For large books or batch processing, pre-tokenize to phonemes:

# Export to phonemes (fast, CPU-only)
ttsforge phonemes export book.epub

# Convert phonemes to audio (can run on different machine)
ttsforge phonemes convert book.phonemes.json -v am_adam

Configuration

# View settings
ttsforge config --show

# Set defaults
ttsforge config --set default_voice am_adam
ttsforge config --set default_format mp3
ttsforge config --set use_gpu true

# Reset to defaults
ttsforge config --reset

Filename Templates

Customize output filenames with metadata:

ttsforge config --set output_filename_template "{author} - {book_title}"

Available variables: {book_title}, {author}, {chapter_title}, {chapter_num}, {input_stem}, {chapters_range}
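
The placeholders behave like named fields in a format string. The Python sketch below shows how such a template could be filled from book metadata; the function render_filename, the sample metadata, and the character sanitizing step are all assumptions made for illustration, not ttsforge's actual implementation.

```python
def render_filename(template: str, metadata: dict[str, str], extension: str = "m4b") -> str:
    """Fill a filename template and append the output extension."""
    stem = template.format(**metadata)
    # Assumption: replace characters that are unsafe in filenames on common platforms.
    safe_stem = "".join("_" if ch in '<>:"/\\|?*' else ch for ch in stem)
    return f"{safe_stem}.{extension}"


# Hypothetical metadata for a single book.
metadata = {
    "book_title": "Dracula",
    "author": "Bram Stoker",
    "chapter_title": "Jonathan Harker's Journal",
    "chapter_num": "1",
    "input_stem": "dracula",
    "chapters_range": "1-5",
}

print(render_filename("{author} - {book_title}", metadata))
# Bram Stoker - Dracula.m4b
```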

Voices

ttsforge includes 54 voices across 9 languages:

Language              Code  Voices  Default
American English      a     20      af_heart
British English       b     8       bf_emma
Spanish               e     3       ef_dora
French                f     1       ff_siwis
Hindi                 h     4       hf_alpha
Italian               i     2       if_sara
Japanese              j     5       jf_alpha
Brazilian Portuguese  p     3       pf_dora
Mandarin Chinese      z     8       zf_xiaoxiao

Voice naming: {lang}{gender}_{name} (e.g., am_adam = American Male "Adam")
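
The naming scheme can be decoded mechanically. The Python sketch below splits a voice ID into its parts, using the language codes from the table above. The helper parse_voice_id is hypothetical, and reading the gender letter f as "Female" is an assumption inferred from names such as af_heart and bf_emma.

```python
from typing import NamedTuple

# Language codes as listed in the voices table.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
# "m" = Male comes from the am_adam example; "f" = Female is assumed.
GENDERS = {"f": "Female", "m": "Male"}


class VoiceInfo(NamedTuple):
    language: str
    gender: str
    name: str


def parse_voice_id(voice_id: str) -> VoiceInfo:
    """Split an ID like "am_adam" into language, gender, and name."""
    prefix, separator, name = voice_id.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID: {voice_id!r}")
    lang_code, gender_code = prefix
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown language or gender code in {voice_id!r}")
    return VoiceInfo(LANGUAGES[lang_code], GENDERS[gender_code], name)


print(parse_voice_id("am_adam"))
print(parse_voice_id("bf_emma"))
```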

Voice Demo

# Demo all voices
ttsforge demo

# Demo specific language
ttsforge demo -l a

# Save individual voice files
ttsforge demo --separate -o ./voices/

Voice Blending

Mix multiple voices for custom narration:

# Using --voice parameter (auto-detects blend format)
ttsforge convert book.epub --voice "af_nicole:50,am_michael:50"

# Using --voice-blend parameter (traditional method)
ttsforge convert book.epub --voice-blend "af_nicole:50,am_michael:50"

# Weighted blends (70% Nicole, 30% Michael)
ttsforge convert book.epub --voice "af_nicole:70,am_michael:30"

# Works with all commands
ttsforge sample "Hello world" --voice "af_sky:60,bf_emma:40" -p
ttsforge phonemes preview "Test blend" --voice "am_adam:50,am_michael:50" --play
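
The blend string is a comma-separated list of voice:weight pairs. As an illustration only, the Python sketch below parses such a string and normalizes the weights so they sum to 1. The function parse_voice_blend is hypothetical, and whether ttsforge normalizes weights this way is an assumption.

```python
def parse_voice_blend(spec: str) -> dict[str, float]:
    """Parse "voice:weight,voice:weight" into normalized weights."""
    weights: dict[str, float] = {}
    for part in spec.split(","):
        voice, separator, weight_text = part.strip().partition(":")
        if not separator or not voice:
            raise ValueError(f"expected 'voice:weight', got {part!r}")
        weight = float(weight_text)
        if weight <= 0:
            raise ValueError(f"weight must be positive in {part!r}")
        weights[voice] = weights.get(voice, 0.0) + weight
    # Assumption: weights are relative, so scale them to sum to 1.
    total = sum(weights.values())
    return {voice: weight / total for voice, weight in weights.items()}


print(parse_voice_blend("af_nicole:70,am_michael:30"))
```

With relative weights, "af_nicole:70,am_michael:30" and "af_nicole:7,am_michael:3" would describe the same blend under this sketch.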

Mixed-Language Support

For books with multiple languages (e.g., German text with English technical terms):

# Enable mixed-language auto-detection
ttsforge convert book.epub \
  --use-mixed-language \
  --mixed-language-primary de \
  --mixed-language-allowed de,en-us

# Test with a sample
ttsforge sample \
  "Das ist ein deutscher Satz. This is an English sentence." \
  --use-mixed-language \
  --mixed-language-primary de \
  --mixed-language-allowed de,en-us

Requirements: Install lingua-language-detector for automatic language detection:

pip install lingua-language-detector

Configuration options:

  • --use-mixed-language - Enable mixed-language mode
  • --mixed-language-primary LANG - Primary language (e.g., de, en-us)
  • --mixed-language-allowed LANGS - Comma-separated list of allowed languages
  • --mixed-language-confidence FLOAT - Detection confidence threshold (0.0-1.0, default: 0.7)

Supported languages: en-us, en-gb, de, fr-fr, es, it, pt, pl, tr, ru, ko, ja, zh/cmn
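
One plausible reading of the confidence threshold is sketched below in Python: a detected language is used only if it is in the allowed list and its score reaches the threshold, otherwise the primary language is kept. This fallback rule, the function choose_language, and the example scores are assumptions for illustration; they are not taken from ttsforge or from lingua-language-detector.

```python
def choose_language(
    scores: dict[str, float],
    primary: str,
    allowed: list[str],
    threshold: float = 0.7,
) -> str:
    """Pick a language for one text segment from detector confidence scores."""
    # Ignore detections outside the allowed list.
    candidates = {lang: score for lang, score in scores.items() if lang in allowed}
    if not candidates:
        return primary
    best = max(candidates, key=candidates.get)
    # Assumed rule: fall back to the primary language on low confidence.
    return best if candidates[best] >= threshold else primary


allowed = ["de", "en-us"]
print(choose_language({"en-us": 0.92, "de": 0.08}, "de", allowed))  # en-us
print(choose_language({"en-us": 0.55, "de": 0.45}, "de", allowed))  # de
```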

SSMD Editing

ttsforge uses SSMD (Speech Synthesis Markdown) as an intermediate format between your EPUB and the final audio. This allows you to fine-tune pronunciation, pacing, and emphasis before conversion.

How It Works

During conversion, ttsforge automatically generates .ssmd files for each chapter:

.{book_title}_chapters/
├── chapter_001_intro.ssmd      # Editable text with speech markup
├── chapter_001_intro.wav
├── chapter_002_chapter1.ssmd
├── chapter_002_chapter1.wav

When you resume a conversion, ttsforge detects if you've edited any SSMD files and automatically regenerates the audio.
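
How ttsforge detects edits is not described here. One common approach, shown as a Python sketch under that caveat, is to record a content hash for each .ssmd file when its audio is generated and compare the hashes on resume. The function names snapshot and find_edited are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(chapter_dir: Path) -> dict[str, str]:
    """Record a digest for every .ssmd file in the chapter directory."""
    return {p.name: file_digest(p) for p in sorted(chapter_dir.glob("*.ssmd"))}


def find_edited(chapter_dir: Path, recorded: dict[str, str]) -> list[str]:
    """List .ssmd files whose contents differ from the recorded digests."""
    edited = []
    for ssmd_path in sorted(chapter_dir.glob("*.ssmd")):
        if recorded.get(ssmd_path.name) != file_digest(ssmd_path):
            edited.append(ssmd_path.name)
    return edited
```

Comparing content hashes rather than modification times avoids regenerating audio when a file was merely opened and saved unchanged.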

Basic Workflow

# 1. Start conversion
ttsforge convert book.epub

# 2. Pause conversion (Ctrl+C)

# 3. Edit SSMD files to fix pronunciation or pacing
vim .book_chapters/chapter_001_intro.ssmd

# 4. Resume - automatically detects edits and regenerates audio
ttsforge convert book.epub

SSMD Syntax

SSMD files use a simple markdown-like syntax:

Structural Breaks (control pauses):

...p    # Paragraph break (0.5-1.0s pause)
...s    # Sentence break (0.1-0.3s pause)
...c    # Clause break (shorter pause)

Emphasis:

*text*      # Moderate emphasis
**text**    # Strong emphasis

Custom Phonemes:

[Hermione]{ph="hɝmˈIni"}    # Override pronunciation
[API]{ph="ˌeɪpiˈaɪ"}        # Technical terms

Language Switching (planned):

[Bonjour]{lang="fr"}    # Mark text as French

Example SSMD File

Chapter One ...p

[Harry]{ph="hæɹi"} Potter was a *highly unusual* boy in many ways. ...s
For one thing, he **hated** the summer holidays more than any other
time of year. ...s For another, he really wanted to do his homework,
but was forced to do it in secret, in the dead of the night. ...p

And he also happened to be a wizard. ...p
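
When reviewing an edited file, it can help to list every phoneme override it contains. The Python sketch below does this with a regular expression that matches the [text]{ph="..."} form shown above. It is a review aid with a hypothetical function name, not a full SSMD parser.

```python
import re

# Matches [text]{ph="phonemes"} as used in the Custom Phonemes examples.
PHONEME_OVERRIDE = re.compile(r'\[([^\]]+)\]\{ph="([^"]*)"\}')


def list_phoneme_overrides(ssmd_text: str) -> list[tuple[str, str]]:
    """Return (text, phonemes) pairs for each override in the SSMD text."""
    return PHONEME_OVERRIDE.findall(ssmd_text)


sample = (
    'Chapter One ...p\n'
    '[Harry]{ph="hæɹi"} Potter was a *highly unusual* boy in many ways. ...s\n'
)
print(list_phoneme_overrides(sample))
# [('Harry', 'hæɹi')]
```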

When to Use SSMD Editing

  • Pronunciation issues: Character names, technical terms, foreign words
  • Pacing problems: Adjust paragraph and sentence breaks
  • Emphasis corrections: Add or remove emphasis on specific words
  • Combine with phoneme dictionary: Dictionary entries are applied automatically to SSMD files

For detailed SSMD syntax and examples, see SSMD_QUICKSTART.md.

Custom Phoneme Dictionary

Control pronunciation of character names, technical terms, and foreign words with custom phoneme dictionaries.

Quick Start

# 1. Extract names from your book (requires spacy)
ttsforge extract-names mybook.epub

# 2. Review the generated custom_phonemes.json file
ttsforge list-names custom_phonemes.json

# 3. Test pronunciation with sample
ttsforge sample "Hermione loves Kubernetes" --phoneme-dict custom_phonemes.json -p

# 4. Convert with custom pronunciations
ttsforge convert mybook.epub --phoneme-dict custom_phonemes.json

Requirements

For automatic name extraction (optional but recommended):

pip install spacy
python -m spacy download en_core_web_sm

Workflow

1. Extract names from your book:

# Extract frequent names (≥3 occurrences)
ttsforge extract-names mybook.epub

# Preview without saving
ttsforge extract-names mybook.epub --preview

# Only very frequent names (≥10 occurrences)
ttsforge extract-names mybook.epub --min-count 10 -o names.json

# Include all proper nouns, not just detected person names
ttsforge extract-names mybook.epub --include-all

This creates a custom_phonemes.json file with auto-generated phoneme suggestions.

2. Review and edit the dictionary:

# List all entries
ttsforge list-names custom_phonemes.json

# Sort alphabetically
ttsforge list-names custom_phonemes.json --sort-by alpha

Edit custom_phonemes.json to fix any incorrect phonemes. The file format is:

{
  "_metadata": {
    "generated_from": "mybook.epub",
    "language": "en-us"
  },
  "entries": {
    "Hermione": {
      "phoneme": "hɝmˈIni",
      "occurrences": 847,
      "verified": false
    },
    "Kubernetes": {
      "phoneme": "kubɚnˈɛtɪs",
      "occurrences": 12,
      "verified": false
    }
  }
}

Or use the simple format:

{
  "Hermione": "hɝmˈIni",
  "Kubernetes": "kubɚnˈɛtɪs"
}
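
If you maintain dictionaries with your own scripts, both layouts can be reduced to a plain word-to-phoneme mapping. The Python sketch below shows one way to do that; the function load_phoneme_dict is hypothetical and only assumes the two file shapes shown above.

```python
import json


def load_phoneme_dict(text: str) -> dict[str, str]:
    """Normalize either dictionary layout to a {word: phoneme} mapping."""
    data = json.loads(text)
    if isinstance(data.get("entries"), dict):
        # Full format: metadata plus per-word entries.
        return {word: entry["phoneme"] for word, entry in data["entries"].items()}
    # Simple format: the file is already a word-to-phoneme mapping.
    return {word: phoneme for word, phoneme in data.items() if isinstance(phoneme, str)}


full = """
{
  "_metadata": {"generated_from": "mybook.epub", "language": "en-us"},
  "entries": {
    "Hermione": {"phoneme": "hɝmˈIni", "occurrences": 847, "verified": false}
  }
}
"""
simple = '{"Hermione": "hɝmˈIni", "Kubernetes": "kubɚnˈɛtɪs"}'

print(load_phoneme_dict(full))
print(load_phoneme_dict(simple))
```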

3. Test pronunciation:

# Test specific names
ttsforge sample "Hermione and Harry" --phoneme-dict custom_phonemes.json -p

# Test and save to file
ttsforge sample "Hermione and Harry" --phoneme-dict custom_phonemes.json -o test.wav

4. Convert your book:

# Use the dictionary for conversion
ttsforge convert mybook.epub --phoneme-dict custom_phonemes.json

# Case-sensitive matching (default is case-insensitive)
ttsforge convert mybook.epub \
  --phoneme-dict custom_phonemes.json \
  --phoneme-dict-case-sensitive

Manual Dictionary Creation

You can create a dictionary manually without extraction:

{
  "Katniss": "kætnɪs",
  "Peeta": "pitə",
  "Panem": "pænəm"
}

Getting IPA Phonemes

To find the correct IPA phonemes for a word:

  1. Use ttsforge sample "word" -p to hear the default pronunciation
  2. Look up IPA pronunciation online (e.g., Wiktionary, IPA dictionaries)
  3. Or use the auto-generated phonemes as a starting point

Note: Phoneme matching is case-insensitive by default and respects word boundaries (e.g., "test" won't match "testing").
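
The matching rules in the note can be illustrated with a regular expression that uses word boundaries. The Python sketch below wraps each matched word in the SSMD phoneme markup; the function apply_phoneme_dict and the choice to emit SSMD markup are assumptions for illustration, not ttsforge's internal code.

```python
import re


def apply_phoneme_dict(text: str, phonemes: dict[str, str], case_sensitive: bool = False) -> str:
    """Wrap whole-word dictionary matches in [word]{ph="..."} markup."""
    if not phonemes:
        return text
    flags = 0 if case_sensitive else re.IGNORECASE
    # Try longer entries first so overlapping entries prefer the longest match.
    words = sorted(phonemes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", flags)
    lookup = phonemes if case_sensitive else {w.lower(): p for w, p in phonemes.items()}

    def replace(match: re.Match) -> str:
        word = match.group(0)
        key = word if case_sensitive else word.lower()
        return f'[{word}]{{ph="{lookup[key]}"}}'

    return pattern.sub(replace, text)


print(apply_phoneme_dict("Test the testing", {"test": "tɛst"}))
# [Test]{ph="tɛst"} the testing
```

The \b anchors are what keep "test" from matching inside "testing", and re.IGNORECASE provides the default case-insensitive behavior.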

Commands

Command           Description
convert           Convert EPUB to audiobook
list              List chapters in EPUB
info              Show EPUB metadata
sample            Generate sample audio
read              Stream playback from EPUB/text
voices            List available voices
demo              Generate voice demo
extract-names     Extract names for phoneme dictionary
list-names        List names in phoneme dictionary
download          Download ONNX models
config            Manage configuration
phonemes export   Export EPUB to phonemes
phonemes convert  Convert phonemes to audio
phonemes info     Show phoneme file info
phonemes preview  Preview text as phonemes

GPU Acceleration

For faster processing with CUDA:

pip install onnxruntime-gpu
ttsforge config --set use_gpu true

Or use per-command:

ttsforge convert book.epub --gpu

Configuration Options

Option                     Default       Description
default_voice              af_heart      Default TTS voice
default_language           a             Default language code
default_speed              1.0           Speech speed (0.5-2.0)
default_format             m4b           Output format
use_gpu                    false         Enable GPU acceleration
model_quality              fp32          Model quality/quantization
model_variant              v1.0          Model variant
silence_between_chapters   2.0           Chapter gap (seconds)
pause_clause               0.5           Clause pause (seconds)
pause_sentence             0.7           Sentence pause (seconds)
pause_paragraph            0.9           Paragraph pause (seconds)
pause_variance             0.05          Pause variance (seconds)
pause_mode                 auto          Pause mode (tts, manual, auto)
enable_short_sentence      None          Handle short sentences
announce_chapters          true          Speak chapter titles
chapter_pause_after_title  2.0           Pause after chapter title (seconds)
phonemization_lang         None          Override phonemization language
output_filename_template   {book_title}  Output filename template
default_content_mode       chapters      Content mode for the read command (chapters/pages)
default_page_size          2000          Page size for the read command's pages mode
use_mixed_language         false         Enable mixed-language mode
mixed_language_primary     None          Primary language for mixed mode
mixed_language_allowed     None          Allowed languages (list)
mixed_language_confidence  0.7           Language detection threshold

Documentation

Full documentation: https://ttsforge.readthedocs.io/

Build locally:

cd docs
pip install sphinx sphinx-rtd-theme
make html

Requirements

  • Python 3.10+
  • ffmpeg (for MP3/FLAC/OPUS/M4B output and chapter merging)
  • espeak-ng (for phonemization)
  • ~330MB disk space (ONNX models)
  • sounddevice (optional, for audio playback)

License

MIT License

Credits
