
Interactive CLI for kenkui — convert ebooks to audiobooks locally


kentui


Freaky fast audiobook generation from ebooks. No GPU. No nonsense.

kentui is the interactive CLI for kenkui — an ebook-to-audiobook converter powered by Kyutai's pocket-tts, running entirely on CPU.

kentui handles the interactive parts: configuration wizard, voice management, chapter selection, and progress display. The actual conversion engine is kenkui, which kentui depends on.


✨ Features

  • Freaky fast audiobook generation
  • No GPU needed, 100% CPU
  • Super high-quality text-to-speech
  • Interactive hub with a live status panel; press Escape to go back
  • Multi-voice narration — different voices for different characters, powered by an LLM
  • Chapter-voice mode — assign a distinct voice to each chapter
  • Voice pool template — persistent global defaults for automatic voice assignment
  • Credits chapter — synthesized audio appended to every m4b
  • Three tiers of voices: compiled, built-in, and custom
  • Flexible chapter selection with presets and manual override
  • Broadcast-quality audio post-processing chain
  • Supports EPUB, MOBI/AZW/AZW3/AZW4, and FB2

🚀 Quick Start

Requirements

  • Python 3.12+

Install

pip install kentui

Or with uv / pipx:

uv tool install kentui
pipx install kentui

Compiled voices (~440 MB) are downloaded automatically on first run. To download them ahead of time:

kentui voices download

Run

kentui book.epub

That's it. An interactive wizard walks you through the setup, then kentui runs the job and shows a live progress bar. You'll get a book.m4b alongside your ebook when it's done.


📚 Usage

Interactive wizard (default)

kentui book.epub

Opens a configuration hub showing a live status panel of your current settings, then a menu:

┌─ Current Settings ─────────────────────────────────────────┐
│  Mode:          Multi-voice                                │
│  NLP:           Anthropic · claude-haiku-4-5               │
│  TTS Provider:  pocket-tts · local                         │
│  Narrator:      sarah                                      │
│  Chapters:      content-only (42 selected)                 │
│  Quality:       temp 0.8 · 30 LSD steps · 96k              │
└────────────────────────────────────────────────────────────┘

  > Submit Job
    Narrator Voice →
    Chapters →
    Narration Mode →
    Series →
    Advanced Options →
    Cancel

Press Escape at any step to go back. All settings persist to ~/.config/kenkui/last_job_profile.toml and pre-load on the next run.

Headless mode

Pass a config file with -c to skip the wizard entirely:

kentui book.epub -c my-config.toml

Exits 0 on success, 1 on failure.
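Because headless mode signals success through its exit code, it can be scripted. The sketch below converts every EPUB in a folder and collects the failures. It assumes kentui is on your PATH and that my-config.toml is a config you created; build_command and convert_all are hypothetical helpers, not part of kentui.

```python
import subprocess
from pathlib import Path


def build_command(book: Path, config: str) -> list[str]:
    """Argument vector for one headless run: kentui <book> -c <config>."""
    return ["kentui", str(book), "-c", config]


def convert_all(folder: Path, config: str) -> list[Path]:
    """Convert each .epub in folder headlessly; return the books that failed.

    Relies only on the documented contract: exit code 0 means success,
    any other code (1) means failure.
    """
    failed: list[Path] = []
    for book in sorted(folder.glob("*.epub")):
        result = subprocess.run(build_command(book, config))
        if result.returncode != 0:
            failed.append(book)
    return failed
```

For example, convert_all(Path("~/Books").expanduser(), "my-config.toml") returns the list of books that need another look; each successful run leaves its .m4b next to the ebook.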

kentui add

# Interactive wizard
kentui add book.epub

# Headless
kentui add book.epub -c my-config.toml

Pipeline step commands

kentui parse book.epub       # Stage 1-2 NLP: entity scan + character clustering
kentui attribute book.epub   # Stage 3-4 NLP: speaker attribution
kentui generate book.epub    # TTS + stitch (requires prior NLP cache)

🎙️ Narration Modes

Single voice

The default. One voice narrates everything.

Multi-voice (character narration)

kenkui uses an NLP pipeline to identify characters and assigns each a distinct voice. The narrator gets its own voice too.

Two NLP backends are available: Ollama (local, default) and cloud providers (Anthropic, OpenAI, Google).

Ollama (default)

Requirements:

  • Ollama running locally (ollama serve)
  • NLP model pulled (default: llama3.2) — ollama pull llama3.2
  • spaCy model — downloaded automatically if missing

Cloud providers (Anthropic, OpenAI, Google)

Run kentui config and answer yes to "Configure a cloud NLP provider API key?" to set up credentials.

Default models:

Provider    Default model
anthropic   claude-sonnet-4-6
openai      gpt-4o
google      gemini/gemini-2.0-flash

How voice assignment works:

After the scan completes, voices are assigned using a three-tier priority system:

  1. Series record — named character → pinned voice (highest priority)
  2. Voice pool template — role + gender + rank → voice
  3. Round-robin pool — any remaining characters
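The priority order works like a fallback chain. The sketch below illustrates it under assumptions and is not kenkui's code: Character is a hypothetical record, series maps character names to pinned voices, template maps (role, gender, rank) to a voice, and the remaining pool is cycled round-robin.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Character:
    """Hypothetical character record; kenkui's real fields may differ."""
    name: str
    role: str    # e.g. "protagonist", "supporting", "minor"
    gender: str  # e.g. "male", "female"
    rank: int    # 1 = most prominent character in its role/gender group


def assign_voices(
    characters: list[Character],
    series: dict[str, str],
    template: dict[tuple[str, str, int], str],
    pool: list[str],
) -> dict[str, str]:
    """Pick a voice per character: series record, then template, then round-robin."""
    round_robin = cycle(pool)
    voices: dict[str, str] = {}
    for ch in characters:
        key = (ch.role, ch.gender, ch.rank)
        if ch.name in series:        # tier 1: voice pinned by a series record
            voices[ch.name] = series[ch.name]
        elif key in template:        # tier 2: role + gender + rank match
            voices[ch.name] = template[key]
        elif pool:                   # tier 3: cycle through the remaining pool
            voices[ch.name] = next(round_robin)
        else:
            raise ValueError(f"no voice available for {ch.name!r}")
    return voices
```

Because the series record is checked first, a character pinned in an earlier book keeps that voice even when the template would pick a different one.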

Chapter-voice mode

Assign a distinct voice to each chapter. The wizard presents each chapter title and lets you pick a voice.


🗂️ Voice Pool Template

The voice pool template (~/.config/kenkui/voice_pool.toml) pre-assigns voices by character role, gender, and rank. It applies automatically to every multi-voice job.

[protagonist.male]
1 = "david"
2 = "james"
pool = ["oliver", "ethan"]

[protagonist.female]
1 = "sarah"
pool = ["emma", "claire"]

[supporting.male]
pool = ["oliver", "ethan", "marcus"]

[minor]
pool = []  # fallback: any non-excluded voice

🎙️ Voice System

Voices come in three tiers:

Tier       Source                                     Auth required?
Compiled   Downloaded from HuggingFace on first run   No
Built-in   8 pocket-tts defaults                      No
Custom     .wav files (user-provided or fetched)      Yes (HuggingFace)

Built-in voices:

alba, marius, javert, jean, fantine, cosette, eponine, azelma

Voice manager

kentui voices

Launches an interactive voice manager: browse, audition, manage the exclusion pool, and look up the character cast for a completed multi-voice book.

Voice commands

# List voices (with optional filters)
kentui voices list
kentui voices list --gender Female
kentui voices list --accent Scottish
kentui voices list --source compiled

# Audition a voice
kentui voices audition <voice>
kentui voices audition <voice> --text "Your preview text here."

# Download compiled voices
kentui voices download
kentui voices download --force

# Fetch custom voices from HuggingFace
kentui voices fetch --repo user/repo-name

# Manage auto-assignment pool
kentui voices exclude <voice>
kentui voices include <voice>

# Look up a book's character cast
kentui voices cast <title>

⚙️ Configuration

# Create or edit the default config
kentui config

# Create a named config profile
kentui config fast-mode

# Use a named config
kentui book.epub -c fast-mode

Key settings

Key                      Default         Description
workers                  cpu_count - 2   Parallel TTS worker processes
m4b_bitrate              96k             Output audio bitrate
temp                     0.7             Sampling temperature
lsd_decode_steps         1               LSD decode steps (higher = better quality, slower)
default_voice            alba            Fallback voice
default_chapter_preset   content-only    Default chapter filter preset
pause_line_ms            800             Pause between lines (ms)
pause_chapter_ms         2000            Pause between chapters (ms)
pause_scene_break_ms     4000            Pause at scene breaks (ms)
nlp_provider             ollama          NLP backend
nlp_model                llama3.2        Model for speaker attribution
credits_enabled          true            Append synthesized credits audio

📖 Chapter Selection

Preset          Description
content-only    Body chapters only (default)
chapters-only   Titled chapters only
with-parts      Chapters and part headings
all             Every item in the ebook
none            Skip everything

After selecting a preset, the wizard shows a checkbox list of all chapters with the preset's defaults pre-selected.


🔊 Audio Post-Processing

kenkui applies a broadcast-quality effects chain: noise reduction → high-pass filter → low shelf EQ → presence boost → de-esser → compressor → limiter → autogain. All parameters are configurable via kentui config.
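In a chain like this, each stage's output feeds the next. The sketch shows that composition with three deliberately simple stand-ins in plain Python: a one-pole high-pass filter, a hard-clip limiter, and peak-normalizing autogain. The 80 Hz cutoff, 0.9 ceiling, 0.7 target, and 24 kHz sample rate are illustrative assumptions; kenkui's actual filters, parameters, and remaining stages are not reproduced.

```python
import math
from typing import Callable

Stage = Callable[[list[float]], list[float]]


def high_pass(cutoff_hz: float, sample_rate: int) -> Stage:
    """One-pole high-pass: attenuates DC offset and low-frequency rumble."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)

    def run(x: list[float]) -> list[float]:
        y: list[float] = []
        prev_x = prev_y = 0.0
        for sample in x:
            prev_y = alpha * (prev_y + sample - prev_x)  # y[n] = a(y[n-1] + x[n] - x[n-1])
            prev_x = sample
            y.append(prev_y)
        return y

    return run


def limiter(ceiling: float) -> Stage:
    """Hard limiter: clamp every sample to [-ceiling, ceiling]."""
    return lambda x: [max(-ceiling, min(ceiling, s)) for s in x]


def autogain(target_peak: float) -> Stage:
    """Scale the whole signal so its loudest sample reaches target_peak."""
    def run(x: list[float]) -> list[float]:
        peak = max((abs(s) for s in x), default=0.0)
        return x if peak == 0 else [s * target_peak / peak for s in x]
    return run


def apply_chain(signal: list[float], stages: list[Stage]) -> list[float]:
    """Run the stages in order, feeding each one the previous stage's output."""
    for stage in stages:
        signal = stage(signal)
    return signal


chain = [high_pass(80.0, 24_000), limiter(0.9), autogain(0.7)]
```

Order matters: normalizing before limiting would let the limiter shave the peak back down, which is why gain staging sits at the end of the chain.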


FAQ

Do I need a GPU? No. kenkui is 100% CPU-based.

What ebook formats does it support? EPUB, MOBI/AZW/AZW3/AZW4, and FB2.

What output format does it use? M4B, with chapters, metadata, and embedded covers.

Do I need Ollama for multi-voice? No. You can use Ollama (local) or Anthropic, OpenAI, or Google. Run kentui config to set up a cloud provider.

Does it upload my books anywhere? With the default Ollama backend: no. With a cloud NLP provider, the book text is sent to that provider's API for the character scan. Nothing else is uploaded.


🙏 Special Thanks

Thanks to Project Gutenberg for providing some of the public-domain books included with kenkui.


Voice Dataset Credits

kenkui's compiled voices are derived from two publicly available speech corpora.

CSTR VCTK Corpus

Veaux, Christophe; Yamagishi, Junichi; MacDonald, Kirsten (2019). CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit. University of Edinburgh, The Centre for Speech Technology Research (CSTR).

Licensed under Creative Commons Attribution 4.0 (CC BY 4.0). Commercial use is permitted with attribution.

EARS Dataset

Licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).

Note: Compiled voices sourced from EARS (their names contain "EARS" in the kentui voices list output) may not be used for commercial purposes. If you are building a commercial product with kenkui, use only VCTK-sourced or built-in voices.
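The naming rule for EARS voices can be applied mechanically to a list of voice names, for example one copied from kentui voices list. A minimal sketch, assuming the marker is always the literal uppercase substring EARS; it cannot tell VCTK-sourced compiled voices apart from custom ones, so review the result by hand. The commercial_safe helper is hypothetical.

```python
def commercial_safe(voice_names: list[str]) -> list[str]:
    """Drop voices whose names mark them as EARS-sourced (CC BY-NC 4.0).

    Assumes the marker is the literal substring "EARS". The check is
    case-sensitive on purpose, so ordinary words such as "years" in a
    custom voice name are not mistaken for the marker.
    """
    return [name for name in voice_names if "EARS" not in name]
```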



Download files


Source Distribution

kentui-0.1.0.tar.gz (56.8 kB)

Uploaded Source

Built Distribution


kentui-0.1.0-py3-none-any.whl (52.3 kB)

Uploaded Python 3

File details

Details for the file kentui-0.1.0.tar.gz.

File metadata

  • Download URL: kentui-0.1.0.tar.gz
  • Upload date:
  • Size: 56.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11

File hashes

Hashes for kentui-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f5c872bf10d5b9f7898a4ff415dc522e6114e6ba70d34cf51c18923ffe3a7451
MD5 58afa24f7e47bbbc8b100123a21cfe0a
BLAKE2b-256 c5449242ed2d74cdebc46445bbc510ba63aae2bf5b6a30bd279e29c8ddae24de


File details

Details for the file kentui-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kentui-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 52.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11

File hashes

Hashes for kentui-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5cb5685cec96553e139b45a0352e1115dcf9a4d80a7b3e248b00e7d64bab1cf1
MD5 c27af57314418f848f8d488308cd98c7
BLAKE2b-256 40b196b14fc0be50955d3523ff2f0836168cd55c6c4a3e0621c31108b12b0d5d

