Interactive CLI for kenkui — convert ebooks to audiobooks locally
kentui
Freaky fast audiobook generation from ebooks. No GPU. No nonsense.
kentui is the interactive CLI for kenkui — an ebook-to-audiobook converter powered by Kyutai's pocket-tts, running entirely on CPU.
kentui handles the interactive parts: configuration wizard, voice management, chapter selection, and progress display. The actual conversion engine is kenkui, which kentui depends on.
✨ Features
- Freaky fast audiobook generation
- No GPU needed, 100% CPU
- Super high-quality text-to-speech
- Interactive hub with live status panel and Escape to go back
- Multi-voice narration — different voices for different characters, powered by an LLM
- Chapter-voice mode — assign a distinct voice to each chapter
- Voice pool template — persistent global defaults for automatic voice assignment
- Credits chapter — synthesized audio appended to every m4b
- Three tiers of voices: compiled, built-in, and custom
- Flexible chapter selection with presets and manual override
- Broadcast-quality audio post-processing chain
- Supports EPUB, MOBI/AZW/AZW3/AZW4, and FB2
🚀 Quick Start
Requirements
- Python 3.12+
Install
pip install kentui
Or with uv / pipx:
uv tool install kentui
pipx install kentui
Compiled voices (~440 MB) are downloaded automatically on first run. To download them ahead of time:
kentui voices download
Run
kentui book.epub
That's it. An interactive wizard walks you through the setup, then kentui runs the job and shows a live progress bar. You'll get a book.m4b alongside your ebook when it's done.
📚 Usage
Interactive wizard (default)
kentui book.epub
Opens a configuration hub showing a live status panel of your current settings, then a menu:
┌─ Current Settings ───────────────────────────────────────────┐
│ Mode: Multi-voice │
│ NLP: Anthropic · claude-haiku-4-5 │
│ TTS Provider: pocket-tts · local │
│ Narrator: sarah │
│ Chapters: content-only (42 selected) │
│ Quality: temp 0.8 · 30 LSD steps · 96k │
└───────────────────────────────────────────────────────────────┘
> Submit Job
Narrator Voice →
Chapters →
Narration Mode →
Series →
Advanced Options →
Cancel
Press Escape at any step to go back. All settings persist to ~/.config/kenkui/last_job_profile.toml and pre-load on the next run.
Headless mode
Pass a config file with -c to skip the wizard entirely:
kentui book.epub -c my-config.toml
Exits 0 on success, 1 on failure.
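Headless mode and its exit codes make batch conversion scriptable. The sketch below assumes `kentui` is on `PATH` and uses only the invocation shown above. `build_command`, `convert_all`, and the `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def build_command(book: Path, config: str) -> list[str]:
    """Headless invocation as documented: kentui <book> -c <config>."""
    return ["kentui", str(book), "-c", config]


def convert_all(
    books: Iterable[Path],
    config: str,
    run: Callable[[list[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> list[Path]:
    """Convert each book in turn and return those whose exit code was not 0."""
    failed = []
    for book in books:
        if run(build_command(book, config)) != 0:
            failed.append(book)
    return failed
```

For example, `convert_all(sorted(Path("library").glob("*.epub")), "my-config.toml")` would process a folder and return the books that need a second look. The injectable `run` argument exists only so the loop can be exercised without invoking the real CLI.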
kentui add
# Interactive wizard
kentui add book.epub
# Headless
kentui add book.epub -c my-config.toml
Pipeline step commands
kentui parse book.epub # Stage 1-2 NLP: entity scan + character clustering
kentui attribute book.epub # Stage 3-4 NLP: speaker attribution
kentui generate book.epub # TTS + stitch (requires prior NLP cache)
🎙️ Narration Modes
Single voice
The default. One voice narrates everything.
Multi-voice (character narration)
kenkui uses an NLP pipeline to identify characters and assigns each a distinct voice. The narrator gets its own voice too.
Two NLP backends are available: Ollama (local, default) and cloud providers (Anthropic, OpenAI, Google).
Ollama (default)
Requirements:
- Ollama running locally (`ollama serve`)
- NLP model pulled (default: `llama3.2`): `ollama pull llama3.2`
- spaCy model, downloaded automatically if missing
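Before starting a multi-voice job it can help to confirm that the model is actually pulled. The sketch below queries Ollama's `/api/tags` endpoint on its default port 11434. It is independent of kentui, and the helper names are hypothetical. The matching rule (`llama3.2` also matches `llama3.2:latest`) is an assumption about how Ollama lists model names.

```python
import json
import urllib.request


def has_model(tags: dict, name: str) -> bool:
    """True if `name` appears in an /api/tags payload, with or without a tag suffix."""
    for model in tags.get("models", []):
        listed = model.get("name", "")
        if listed == name or listed.split(":", 1)[0] == name:
            return True
    return False


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, `has_model(fetch_tags(), "llama3.2")` reports whether the default NLP model is available; a connection error from `fetch_tags` suggests `ollama serve` is not running.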
Cloud providers (Anthropic, OpenAI, Google)
Run kentui config and answer yes to "Configure a cloud NLP provider API key?" to set up credentials.
Default models:
| Provider | Default model |
|---|---|
| anthropic | claude-sonnet-4-6 |
| openai | gpt-4o |
| google | gemini/gemini-2.0-flash |
How voice assignment works:
After the scan completes, voices are assigned using a three-tier priority system:
- Series record — named character → pinned voice (highest priority)
- Voice pool template — role + gender + rank → voice
- Round-robin pool — any remaining characters
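The priority order above can be sketched as a simple fall-through lookup. This is an illustration, not kenkui's implementation: the `Character` fields, the dictionary shapes, and `assign_voices` are all hypothetical, and the round-robin pool is assumed to be non-empty.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Character:
    name: str
    role: str    # e.g. "protagonist", "supporting", "minor"
    gender: str  # e.g. "male", "female"
    rank: int    # 1 = most prominent within its role and gender


def assign_voices(characters, series_record, template, pool):
    """Resolve a voice per character: series record, then template, then round-robin."""
    fallback = cycle(pool)  # tier 3: hand out remaining voices in rotation
    assigned = {}
    for ch in characters:
        key = (ch.role, ch.gender, ch.rank)
        if ch.name in series_record:      # tier 1: voice pinned to a named character
            assigned[ch.name] = series_record[ch.name]
        elif key in template:             # tier 2: role + gender + rank
            assigned[ch.name] = template[key]
        else:                             # tier 3: assumes pool is non-empty
            assigned[ch.name] = next(fallback)
    return assigned
```

Checking the tiers in a fixed order keeps a pinned series voice stable across books even when the template or pool changes.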
Chapter-voice mode
Assign a distinct voice to each chapter. The wizard presents each chapter title and lets you pick a voice.
🗂️ Voice Pool Template
The voice pool template (~/.config/kenkui/voice_pool.toml) pre-assigns voices by character role, gender, and rank. It applies automatically to every multi-voice job.
[protagonist.male]
1 = "david"
2 = "james"
pool = ["oliver", "ethan"]
[protagonist.female]
1 = "sarah"
pool = ["emma", "claire"]
[supporting.male]
pool = ["oliver", "ethan", "marcus"]
[minor]
pool = [] # fallback: any non-excluded voice
🎙️ Voice System
Voices come in three tiers:
| Tier | Source | Auth required? |
|---|---|---|
| Compiled | Downloaded from HuggingFace on first run | No |
| Built-in | 8 pocket-tts defaults | No |
| Custom | .wav files (user-provided or fetched) | Yes (HuggingFace) |
Built-in voices:
alba, marius, javert, jean, fantine, cosette, eponine, azelma
Voice manager
kentui voices
Launches an interactive voice manager: browse, audition, manage the exclusion pool, and look up the character cast for a completed multi-voice book.
Voice commands
# List voices (with optional filters)
kentui voices list
kentui voices list --gender Female
kentui voices list --accent Scottish
kentui voices list --source compiled
# Audition a voice
kentui voices audition <voice>
kentui voices audition <voice> --text "Your preview text here."
# Download compiled voices
kentui voices download
kentui voices download --force
# Fetch custom voices from HuggingFace
kentui voices fetch --repo user/repo-name
# Manage auto-assignment pool
kentui voices exclude <voice>
kentui voices include <voice>
# Look up a book's character cast
kentui voices cast <title>
⚙️ Configuration
# Create or edit the default config
kentui config
# Create a named config profile
kentui config fast-mode
# Use a named config
kentui book.epub -c fast-mode
Key settings
| Key | Default | Description |
|---|---|---|
| workers | cpu_count - 2 | Parallel TTS worker processes |
| m4b_bitrate | 96k | Output audio bitrate |
| temp | 0.7 | Sampling temperature |
| lsd_decode_steps | 1 | LSD decode steps (higher = better quality, slower) |
| default_voice | alba | Fallback voice |
| default_chapter_preset | content-only | Default chapter filter preset |
| pause_line_ms | 800 | Pause between lines (ms) |
| pause_chapter_ms | 2000 | Pause between chapters (ms) |
| pause_scene_break_ms | 4000 | Pause at scene breaks (ms) |
| nlp_provider | ollama | NLP backend |
| nlp_model | llama3.2 | Model for speaker attribution |
| credits_enabled | true | Append synthesized credits audio |
📖 Chapter Selection
| Preset | Description |
|---|---|
| content-only | Body chapters only (default) |
| chapters-only | Titled chapters only |
| with-parts | Chapters and part headings |
| all | Every item in the ebook |
| none | Skip everything |
After selecting a preset, the wizard shows a checkbox list of all chapters with the preset's defaults pre-selected.
🔊 Audio Post-Processing
kenkui applies a broadcast-quality effects chain: noise reduction → high-pass filter → low shelf EQ → presence boost → de-esser → compressor → limiter → autogain. All parameters are configurable via kentui config.
FAQ
Do I need a GPU? No. kenkui is 100% CPU-based.
What ebook formats does it support? EPUB, MOBI/AZW/AZW3/AZW4, and FB2.
What output format does it use? M4B, with chapters, metadata, and embedded covers.
Do I need Ollama for multi-voice? No. You can use Ollama (local) or Anthropic, OpenAI, or Google. Run kentui config to set up a cloud provider.
Does it upload my books anywhere? With the default Ollama backend: no. With a cloud NLP provider, the book text is sent to that provider's API for the character scan. Nothing else is uploaded.
🙏 Special Thanks
Thanks to Project Gutenberg for providing some of the public-domain books included with kenkui.
Voice Dataset Credits
kenkui's compiled voices are derived from two publicly available speech corpora.
CSTR VCTK Corpus
Veaux, Christoph; Yamagishi, Junichi; MacDonald, Kirsten. (2019). CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit. University of Edinburgh. The Centre for Speech Technology Research (CSTR).
Licensed under Creative Commons Attribution 4.0 (CC BY 4.0). Commercial use is permitted with attribution.
EARS Dataset
Licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).
Note: Compiled voices sourced from EARS (identifiable by `EARS` in the voice name via `kentui voices list`) may not be used for commercial purposes. If you are building a commercial product with kenkui, use only VCTK-sourced or built-in voices.
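For commercial projects, the naming rule above lends itself to a simple filter. This sketch assumes the voice names have already been collected (for example, copied by hand from `kentui voices list`) and that the marker is the literal upper-case string `EARS`. `commercial_safe` is a hypothetical helper.

```python
def commercial_safe(voice_names: list[str]) -> list[str]:
    """Drop voices whose name marks them as EARS-derived (CC BY-NC 4.0)."""
    # Assumption: the marker is the exact, case-sensitive substring "EARS".
    return [name for name in voice_names if "EARS" not in name]
```

The surviving names could then be kept in the auto-assignment pool, with the rest removed via `kentui voices exclude <voice>`.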