
Convert Ebooks to Audiobooks with [custom] voice samples


kenkui


Freaky fast audiobook generation from ebooks. No GPU. No nonsense.

kenkui turns ebooks into high-quality M4B audiobooks using state-of-the-art text-to-speech — entirely on CPU, and faster than anything else I've used.

It's built on top of Kyutai's pocket-tts, with all the annoying parts handled for you: chapter parsing, batching, metadata, covers, voices, and sane defaults.

If you have ebooks and want audiobooks, kenkui is for you.


✨ Features

  • Freaky fast audiobook generation
  • No GPU needed, 100% CPU
  • Super high-quality text-to-speech
  • Multithreaded
  • Supports EPUB, MOBI/AZW/AZW3/AZW4, and FB2
  • Interactive wizard with smart defaults and Escape to go back
  • Job queue with live progress dashboard
  • Multi-voice narration — different voices for different characters, powered by an LLM
  • Chapter-voice mode — assign a distinct voice to each chapter
  • Three tiers of voices: compiled, built-in, and custom
  • Flexible chapter selection with presets and manual override
  • Broadcast-quality audio post-processing chain
  • Automatic cover embedding
  • Sensible defaults, minimal configuration

🚀 Quick Start

kenkui is intentionally easy to install and easy to use.

One-line installer (macOS / Linux)

curl -sSL https://raw.githubusercontent.com/D1zzl3D0p/kenkui/main/install.sh | bash

One-line installer (Windows)

powershell -Command "irm https://raw.githubusercontent.com/D1zzl3D0p/kenkui/main/install.ps1 | iex"

Requirements

  • Python 3.12+
  • One Python installer: uv (recommended), pip, or pipx

Manual install

uv tool install kenkui

Or with pip/pipx:

pip install kenkui
# or
pipx install kenkui

Compiled voices (~440 MB) are downloaded automatically on first run. To download them ahead of time:

kenkui voices download

Run

kenkui book.epub

That's it. An interactive wizard walks you through the setup, then kenkui queues the job, starts the worker, and shows a live progress dashboard. You'll get a book.m4b alongside your ebook when it's done.


📚 Usage

Interactive wizard (default)

kenkui book.epub

Walks you through:

  1. Chapter selection
  2. Narration mode (single voice, multi-voice, or chapter-voice)
  3. Optional per-job quality overrides
  4. Output directory
  5. Voice assignment (fast character scan for multi-voice, then voice picker)
  6. Final confirmation

Press Escape at any step to go back to the previous one.

The wizard then auto-starts the queue and shows a live Rich dashboard.

Headless mode

Pass a config file with -c to skip the wizard entirely:

kenkui book.epub -c my-config.toml

kenkui loads the config, queues the job, starts the worker, and streams progress to the terminal. It exits with status 0 on success and 1 on failure.
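Because headless mode reports its outcome through the exit code, it is easy to drive from a script. Here is a minimal sketch, assuming kenkui is on your PATH. The helper names headless_command and convert are made up for this example and are not part of kenkui.

```python
import subprocess
from pathlib import Path


def headless_command(book: Path, config: str) -> list[str]:
    """Build the documented headless invocation: kenkui BOOK -c CONFIG."""
    return ["kenkui", str(book), "-c", config]


def convert(book: Path, config: str, run=subprocess.run) -> bool:
    """Run one headless job and report success.

    `run` is injectable so the function can be exercised without kenkui
    installed; by default it is subprocess.run.
    """
    result = run(headless_command(book, config))
    # Exit status 0 means success, 1 means failure.
    return result.returncode == 0
```

Calling convert(Path("book.epub"), "my-config.toml") blocks until the job finishes and returns True or False, which is enough to decide whether to retry or move on to the next book.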

Add to queue without starting

kenkui add book.epub

Queues the job interactively but doesn't auto-start processing. Useful when you want to queue several books first.

# Headless queue-only (no auto-start)
kenkui add book.epub -c my-config.toml
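Queueing several books and then starting the worker can be scripted with the same commands. The sketch below only builds the command lines; queue_commands is a hypothetical helper, not something kenkui ships. Note that kenkui queue start is documented as starting the next pending job, so whether it carries on through the rest of the queue is an assumption you should check.

```python
from pathlib import Path


def queue_commands(books: list[Path], config: str) -> list[list[str]]:
    """One `kenkui add BOOK -c CONFIG` per book, then `kenkui queue start`."""
    commands = [["kenkui", "add", str(book), "-c", config] for book in books]
    commands.append(["kenkui", "queue", "start"])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True) in order.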

Queue management

# Snapshot: show all jobs in a Rich table and exit
kenkui queue

# Live-refreshing dashboard (Ctrl+C to exit)
kenkui queue --live

# Start processing the next pending job
kenkui queue start

# Start processing + enter live dashboard
kenkui queue start --live

# Stop the current job
kenkui queue stop

🎙️ Narration Modes

Single voice

The default. One voice narrates everything.

Multi-voice (character narration)

kenkui uses an NLP pipeline to identify characters in the book and assigns each a distinct voice. The narrator gets its own voice too.

Requirements:

  • Ollama running locally (ollama serve)
  • NLP model pulled (default: llama3.2) — ollama pull llama3.2
  • spaCy model — kenkui downloads this automatically if missing

The wizard checks all requirements and shows a status table before proceeding.

How it works:

The wizard runs a fast character scan (seconds) before voice assignment so you can assign voices and walk away — the slower LLM attribution phase runs unattended in the background before TTS begins.

Assignment modes in the wizard:

  • Simple — all male characters share one voice, all female characters share another
  • Advanced — individual voice per character; kenkui auto-assigns by gender and resolves conflicts for characters that appear in the same chapter, then lets you review and adjust

When reviewing assignments, the voice picker shows which other characters are already using each voice (e.g. ← Rand al'Thor) so you can avoid conflicts at a glance. Press Enter to accept all assignments.
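To make the Advanced mode idea concrete, here is a rough sketch of assigning voices by gender while avoiding a shared voice for characters that appear in the same chapter. This is not kenkui's actual algorithm, just a greedy pass under assumed data shapes; the character tuples, the voice pools, and the voice names are all hypothetical.

```python
def assign_voices(characters, voice_pools):
    """Greedy assignment that avoids voice clashes within a chapter.

    characters: list of (name, gender, set_of_chapter_numbers)
    voice_pools: dict mapping gender -> list of voice names
    """
    assigned = {}
    chapters_of = {name: chapters for name, _, chapters in characters}
    for name, gender, chapters in characters:
        # Voices already taken by characters sharing at least one chapter.
        taken = {
            voice
            for other, voice in assigned.items()
            if chapters_of[other] & chapters
        }
        pool = voice_pools[gender]
        free = [voice for voice in pool if voice not in taken]
        # When every voice in the pool clashes, reuse the first one.
        assigned[name] = free[0] if free else pool[0]
    return assigned
```

Two characters who never share a chapter may end up with the same voice, which is the kind of reuse the review step then lets you adjust.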

The Ollama model used for attribution is configurable via nlp_model in your config.

Chapter-voice mode

Assign a distinct voice to each chapter. The wizard presents each chapter title and lets you pick a voice for it.


🎭 Voice System

Voices come in three tiers:

| Tier | Source | Auth required? |
|---|---|---|
| Compiled | Downloaded from HuggingFace on first run | No |
| Built-in | 8 pocket-tts defaults | No |
| Custom | .wav files (user-provided or fetched) | Yes (HuggingFace) |

Built-in voices:

alba, marius, javert, jean, fantine, cosette, eponine, azelma

Listing voices

kenkui voices
# or
kenkui voices list

# Filter by metadata
kenkui voices list --gender Female
kenkui voices list --accent Scottish
kenkui voices list --dataset VCTK
kenkui voices list --source compiled

Downloading compiled voices

Compiled voices are downloaded automatically on first run. To download them manually or re-download:

kenkui voices download

# Force a fresh re-download
kenkui voices download --force

Voices are stored under ~/.local/share/kenkui/voices/ (compiled voices in compiled/, custom voices in uncompiled/).

Fetching custom voices

kenkui voices fetch --repo user/repo-name

# Or set an env var
KENKUI_VOICES_REPO=user/repo-name kenkui voices fetch

Downloads .wav files from a HuggingFace repo to ~/.local/share/kenkui/voices/uncompiled/. Requires a free HuggingFace account (see FAQ).

Using your own voice

Record a 5–10 second clip of clean speech with minimal background noise or crosstalk. Cleaning the audio makes a noticeable difference — tools like Adobe's Enhance Speech work well: https://podcast.adobe.com/en/enhance

You can pass a local .wav file directly in the wizard, or point at a file in a HuggingFace repo with an hf:// URL:

hf://user/repo/voice.wav
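The hf:// form reads as user, repo, then a file path inside the repo. A small sketch of splitting it apart, assuming exactly that layout; parse_hf_url is an illustrative helper and may not match how kenkui parses these URLs internally.

```python
def parse_hf_url(url: str) -> tuple[str, str]:
    """Split hf://user/repo/path/to/file into (repo_id, file_path)."""
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf:// URL: {url!r}")
    # At most three pieces: user, repo, and the rest as the file path.
    parts = url[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://user/repo/file, got {url!r}")
    user, repo, file_path = parts
    return f"{user}/{repo}", file_path
```

For hf://user/repo/voice.wav this yields the repo id user/repo and the file name voice.wav.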

⚙️ Configuration

kenkui uses TOML config files stored at ~/.config/kenkui/ (XDG).

# Create or edit the default config
kenkui config default

# Create a named config
kenkui config fast-mode

# Use a named config
kenkui book.epub -c fast-mode

Named configs without a path separator are automatically looked up in ~/.config/kenkui/.
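The lookup rule can be sketched in a few lines. This follows the stated rule literally; appending .toml to a bare name is an assumption, and kenkui may apply extra checks (for example, looking in the current directory first). resolve_config is a hypothetical helper.

```python
from pathlib import Path

CONFIG_DIR = Path.home() / ".config" / "kenkui"


def resolve_config(arg: str, config_dir: Path = CONFIG_DIR) -> Path:
    """Map a -c argument to a config file path.

    Arguments containing a path separator are used as given; bare names
    are looked up in the config directory.
    """
    if "/" in arg or "\\" in arg:
        return Path(arg).expanduser()
    # Assumption: named configs are stored as NAME.toml.
    name = arg if arg.endswith(".toml") else f"{arg}.toml"
    return config_dir / name
```

So -c fast-mode would resolve to ~/.config/kenkui/fast-mode.toml, while -c configs/fast-mode.toml is taken as a path.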

Key settings

| Key | Default | Description |
|---|---|---|
| workers | cpu_count - 2 | Parallel TTS worker processes |
| m4b_bitrate | 96k | Output audio bitrate |
| temp | 0.7 | Sampling temperature (lower = stable, higher = expressive) |
| lsd_decode_steps | 1 | LSD decode steps (higher = better quality, slower) |
| noise_clamp | off | Noise clamp (~3.0 reduces audio glitches) |
| eos_threshold | -4.0 | End-of-speech detection threshold |
| frames_after_eos | auto | Frames after EOS cutoff (0 = suppress trailing noise) |
| default_voice | alba | Fallback voice when no per-job override |
| default_chapter_preset | content-only | Default chapter filter preset |
| default_output_dir | next to source | Where to write output files |
| pause_line_ms | 800 | Pause between lines (ms) |
| pause_chapter_ms | 2000 | Pause between chapters (ms) |
| pause_scene_break_ms | 4000 | Pause at scene breaks (ms) |
| nlp_model | llama3.2 | Ollama model for multi-voice speaker attribution |

Per-job quality overrides

The interactive wizard lets you override quality settings for a single job without changing your config. These are the same settings as above, prefixed with job_:

job_temp, job_lsd_decode_steps, job_noise_clamp, job_eos_threshold, job_m4b_bitrate, job_pause_line_ms, job_pause_chapter_ms, job_frames_after_eos
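One way to picture the override behavior: each job_ key, when set, shadows the config key of the same name for that job only. A minimal sketch, assuming overrides arrive as a plain dict and that an unset override is None; effective_settings is illustrative, not kenkui's code.

```python
JOB_PREFIX = "job_"


def effective_settings(config: dict, job: dict) -> dict:
    """Overlay job_* overrides on config values without mutating config."""
    merged = dict(config)
    for key, value in job.items():
        # Only set overrides apply; None means "use the config value".
        if key.startswith(JOB_PREFIX) and value is not None:
            merged[key[len(JOB_PREFIX):]] = value
    return merged
```

With a config of temp 0.7 and a job carrying job_temp 0.5, the job runs at 0.5 and the saved config stays at 0.7.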


📖 Chapter Selection

The wizard lets you choose which chapters to include. Available presets:

| Preset | Description |
|---|---|
| content-only | Body chapters only, skips front/back matter (default) |
| chapters-only | Titled chapters only |
| with-parts | Chapters and part headings |
| all | Every item in the ebook |
| none | Skip everything |

After selecting a preset, the wizard shows a checkbox list of all chapters with the preset's defaults pre-selected. You can toggle individual chapters from there.
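The preset-then-toggle flow amounts to a default selection with individual checkboxes flipped afterwards. A sketch under loud assumptions: the is_body flag on each chapter is hypothetical (how kenkui classifies front and back matter is not described here), and only three presets are modeled.

```python
def apply_selection(chapters, preset, toggled=()):
    """Return sorted chapter indices after preset defaults plus toggles."""
    if preset == "all":
        selected = set(range(len(chapters)))
    elif preset == "none":
        selected = set()
    elif preset == "content-only":
        # Assumption: each chapter dict carries a boolean is_body flag.
        selected = {i for i, ch in enumerate(chapters) if ch["is_body"]}
    else:
        raise ValueError(f"preset not modeled in this sketch: {preset!r}")
    # Each toggle flips one checkbox, which is a symmetric difference.
    return sorted(selected ^ set(toggled))
```

Toggling an index that the preset selected removes it, and toggling one it skipped adds it back.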


🔊 Audio Post-Processing

kenkui applies a broadcast-quality effects chain to every chapter WAV before stitching:

  1. Noise reduction
  2. High-pass filter (removes low-end rumble)
  3. Low shelf EQ (reduces boominess)
  4. Presence boost (clarity)
  5. De-esser
  6. Compressor
  7. Limiter
  8. Autogain (EBU R128 normalization)

Final loudness normalization is available as an optional step. All parameters are configurable via the [post_processing] section of your config.
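The chain runs as an ordered pipeline, with each stage consuming the previous stage's output. The toy sketch below shows that shape with two simplified stages: a one-pole high-pass filter and a hard clipper standing in for the limiter. None of this is kenkui's real DSP; the function names, the 80 Hz cutoff, and the 24000 Hz sample rate used in the usage note are arbitrary assumptions.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """One-pole high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def hard_clip(samples, ceiling=0.95):
    """Crude stand-in for a limiter: clamp every sample to +/- ceiling."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]


def run_chain(samples, stages):
    """Apply each stage in order, feeding its output to the next."""
    for stage in stages:
        samples = stage(samples)
    return samples
```

A call such as run_chain(audio, [lambda s: high_pass(s, 80.0, 24000), hard_clip]) filters first and clips last, mirroring why the limiter sits near the end of the list above.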


FAQ

Do I need a GPU? No. kenkui is 100% CPU-based.

Is it actually fast? Yes. That's the entire point of the project.

What output format does it use? M4B, with chapters, metadata, and embedded covers.

What ebook formats does it support? EPUB, MOBI/AZW/AZW3/AZW4, and FB2.

Can it generate MP3s? No. This is intentional — M4B is a significantly better format for audiobooks.

How does multi-voice narration work? kenkui runs a two-stage NLP pipeline: first BookNLP and spaCy identify characters and their dialogue; then an Ollama LLM resolves ambiguous attribution. The result is a speaker map where each character speaks in their assigned voice and the narrator fills everything else.
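As a rough picture of the final speaker map, imagine the text split into segments, each tagged with a speaker or left untagged for narration. The sketch below assumes that shape; the segment tuples and the fallback of unassigned characters to the narrator voice are assumptions made for the example.

```python
def voices_for_segments(segments, character_voices, narrator_voice):
    """Pair each text segment with the voice that should read it.

    segments: list of (speaker_or_None, text)
    character_voices: dict mapping character name -> voice name
    """
    return [
        # Untagged segments and unknown speakers fall back to the narrator.
        (character_voices.get(speaker, narrator_voice), text)
        for speaker, text in segments
    ]
```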

Why do I need Ollama for multi-voice? The LLM resolves ambiguous dialogue attribution that rule-based systems can't handle reliably. It runs locally via Ollama — nothing leaves your machine.

Why do I need a HuggingFace account for custom voices? The pocket-tts model is gated on HuggingFace, meaning the authors require users to accept their terms before downloading. This only applies to custom uncompiled .wav voices — compiled voices and built-ins require no authentication at all.

When you first use a custom voice, kenkui guides you through creating a free HuggingFace account, generating a read-only token, and accepting the model's terms. You only need to do this once.

Where are voices stored? Compiled voices are downloaded to ~/.local/share/kenkui/voices/compiled/ on first run. Custom voices go to ~/.local/share/kenkui/voices/uncompiled/. Run kenkui voices download --force to re-download from scratch.

Does it upload my books anywhere? No. Everything runs locally. Internet access is only needed to pull models from HuggingFace or Ollama the first time.

Why isn't kenkui finding my ebook in a hidden directory? kenkui doesn't search hidden directories by default. Pass the file directly:

kenkui /path/to/hidden/directory/book.epub
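For intuition, a directory scan that skips hidden folders typically looks like the sketch below. This is a generic illustration, not kenkui's scanner; find_ebooks is a hypothetical helper, and the suffix list is taken from the supported formats above.

```python
import os
from pathlib import Path

EBOOK_SUFFIXES = {".epub", ".mobi", ".azw", ".azw3", ".azw4", ".fb2"}


def find_ebooks(root):
    """Collect ebook files under root, never descending into dot-directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering hidden dirs.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if Path(name).suffix.lower() in EBOOK_SUFFIXES:
                found.append(Path(dirpath) / name)
    return sorted(found)
```

A book inside a dot-directory is never visited by a scan like this, which is why naming the file explicitly sidesteps the problem.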

Non-Goals

kenkui is not meant to be:

  • A general-purpose text-to-speech framework
  • A GUI application
  • An MP3 audiobook generator
  • A pluggable frontend for every TTS backend available

The focus is narrow by design: fast, high-quality audiobook generation from ebooks, with minimal friction.


🙏 Special Thanks

Thanks to Project Gutenberg for providing some of the public-domain books included with kenkui.


Voice Dataset Credits

kenkui's compiled voices are derived from two publicly available speech corpora.

CSTR VCTK Corpus

Veaux, Christoph; Yamagishi, Junichi; MacDonald, Kirsten. (2019). CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit. University of Edinburgh. The Centre for Speech Technology Research (CSTR).

Licensed under Creative Commons Attribution 4.0 (CC BY 4.0). Commercial use is permitted with attribution.

EARS Dataset

Licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).

Note: Compiled voices sourced from EARS (identifiable by EARS in the voice name via kenkui voices list) may not be used for commercial purposes. If you are building a commercial product with kenkui, use only VCTK-sourced or built-in voices.
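If you script voice selection for a commercial project, a name-based filter is one way to apply the rule above. A minimal sketch: the voice names in the usage note are invented, and matching EARS case-insensitively is an assumption, so check the real names from kenkui voices list.

```python
def commercially_usable(voice_names):
    """Drop voices whose name contains 'EARS' (CC BY-NC 4.0 licensed)."""
    return [name for name in voice_names if "EARS" not in name.upper()]
```

For example, commercially_usable(["alba", "VCTK_example", "EARS_example"]) keeps the first two names and drops the last.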
