AI-powered audiobook generator with GPU/NPU acceleration (up to 8x faster than real-time). Built-in Kokoro-82M TTS with character-aware voices and dialogue detection. Supports EPUB, PDF, TXT, MD, RST (use convertext for DOCX/MOBI/HTML).

AI Audiobook Generator: CLI Tool with GPU & NPU Acceleration

If you like this, please consider supporting via GitHub Sponsors. I created and maintain this alone.

Transform long-form text into professional audiobooks with character-aware voices, dialogue detection, and intelligent processing.

Perfect for novels, articles, textbooks, research papers, and other long-form content you want to listen to on your own schedule or offline. Narration uses the Kokoro-82M TTS model for production-quality output. Runs on macOS, Linux, and Windows, with hardware acceleration for Apple Silicon (M1/M2/M3/M4 Neural Engine), NVIDIA GPUs (CUDA), and AMD/Intel GPUs (DirectML on Windows).

✨ Core Features

⚡ High-Performance Conversion

Up to 8x faster than real-time on Apple Silicon (M1/M2/M3/M4) with Neural Engine
GPU acceleration for NVIDIA (CUDA), AMD/Intel (DirectML on Windows)
Efficient CPU processing on all platforms
Kokoro-82M engine optimized for speed + quality balance

🎭 Character-Aware Narration

Automatic character detection in dialogue
Automatically assigns distinct, gender-appropriate voices, inferring gender where possible (e.g., Alice gets af_sarah, Bob gets am_adam)
Perfect for fiction, interviews, dialogues, and multi-speaker content

💾 Checkpoint Resumption

Resume interrupted conversions from where you left off
Essential for extra-long texts (500+ page books, textbooks, research papers)
Reliable production workflow for lengthy content
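Checkpointing at chunk granularity is one common way to get this behavior. The sketch below assumes a JSON file holding the index of the next chunk to synthesize and a caller-supplied `synthesize(index, text)` function; the file format and all function names are hypothetical, not audiobook-reader's actual checkpoint format.

```python
import json
import os
from collections.abc import Callable
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the index of the next chunk to process (0 if no checkpoint exists)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["next_chunk"]


def save_checkpoint(path: Path, next_chunk: int) -> None:
    """Write the checkpoint atomically so an interruption never leaves a half-written file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps({"next_chunk": next_chunk}))
    os.replace(tmp, path)  # atomic rename on POSIX and Windows


def convert_chunks(chunks: list[str], checkpoint: Path,
                   synthesize: Callable[[int, str], None]) -> None:
    """Process chunks in order, resuming after the last completed one."""
    for i in range(load_checkpoint(checkpoint), len(chunks)):
        synthesize(i, chunks[i])            # e.g. write audio for chunk i to disk
        save_checkpoint(checkpoint, i + 1)  # record progress only after success
    checkpoint.unlink(missing_ok=True)      # finished: the checkpoint is no longer needed
```

Writing to a temporary file and then calling `os.replace` means an interruption mid-write leaves the previous checkpoint intact.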

📚 Chapter Management

Automatic chapter detection from EPUB TOC, PDF structure, or text patterns
M4B audiobook format with chapter metadata
Chapter timestamps and navigation
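One common way to embed chapter markers in an M4B is FFmpeg's FFMETADATA1 text format, where each `[CHAPTER]` block carries a `TIMEBASE`, `START`, `END`, and `title`. The sketch below builds such a file from (title, start-seconds) pairs; whether audiobook-reader uses this exact route internally is an assumption, and the helper names are hypothetical.

```python
def _escape(value: str) -> str:
    """Backslash-escape the characters FFmpeg treats specially in metadata values."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first so it isn't double-escaped
        value = value.replace(ch, "\\" + ch)
    return value


def ffmetadata_chapters(chapters: list[tuple[str, float]], total_seconds: float) -> str:
    """Build an FFMETADATA1 document with one [CHAPTER] block per (title, start_seconds).

    Each chapter ends where the next one starts; the last ends at total_seconds.
    """
    lines = [";FFMETADATA1"]
    for idx, (title, start) in enumerate(chapters):
        end = chapters[idx + 1][1] if idx + 1 < len(chapters) else total_seconds
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are expressed in milliseconds
            f"START={round(start * 1000)}",
            f"END={round(end * 1000)}",
            f"title={_escape(title)}",
        ]
    return "\n".join(lines) + "\n"


print(ffmetadata_chapters([("Opening", 0), ("Chapter 1", 95.2)], 1800))
```

A typical invocation to attach such a file to existing audio without re-encoding is `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -codec copy book.m4b` (file names are placeholders).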

📊 Professional Production Tools

4 progress visualization styles: simple, tqdm, rich, timeseries
Real-time metrics: processing speed, ETA, completion percentage
Batch processing with queue management
Multiple output formats: MP3 (default, 48kHz mono), WAV, M4A, M4B
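The real-time metrics above reduce to simple ratios. A minimal sketch, assuming progress is tracked as characters processed and seconds of audio generated (the `Progress` class is hypothetical): the real-time factor is audio duration divided by wall-clock time, so "8x" means 8 seconds of audio per second of processing, and the ETA assumes throughput stays at its average so far.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    chars_done: int
    chars_total: int
    audio_seconds: float    # duration of audio generated so far
    elapsed_seconds: float  # wall-clock time spent so far (must be > 0)

    @property
    def percent(self) -> float:
        return 100.0 * self.chars_done / self.chars_total

    @property
    def realtime_factor(self) -> float:
        """Seconds of audio produced per second of wall time (8.0 means '8x')."""
        return self.audio_seconds / self.elapsed_seconds

    @property
    def eta_seconds(self) -> float:
        """Remaining time, assuming the average character throughput so far holds."""
        rate = self.chars_done / self.elapsed_seconds
        return (self.chars_total - self.chars_done) / rate


p = Progress(chars_done=250_000, chars_total=1_000_000, audio_seconds=4_800, elapsed_seconds=600)
print(f"{p.percent:.0f}% done, {p.realtime_factor:.1f}x real-time, ETA {p.eta_seconds / 60:.0f} min")
```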

🎙️ Production-Quality TTS

Kokoro-82M: 54 high-quality neural voices across 9 languages
Near-human quality narration
Consistent voice throughout long documents
No voice cloning overhead

⚖️ Copyright Notice

IMPORTANT: This software is a tool for converting text to audio. Users are solely responsible for:

Ensuring they have the legal right to convert any text to audio
Obtaining necessary permissions for copyrighted materials
Complying with all applicable copyright laws and licensing terms
Understanding that creating audiobooks from copyrighted text without authorization may constitute copyright infringement

Recommended Use Cases:

✅ Your own original content
✅ Public domain works
✅ Content you have explicit permission to convert
✅ Educational materials you legally own
✅ Open-source or Creative Commons licensed texts (per their terms)

The developers of audiobook-reader do not condone or support copyright infringement. By using this software, you agree to use it only for content you have the legal right to convert.

📚 Supported Input Formats

EPUB, PDF, TXT, Markdown, ReStructuredText

Need to convert other formats first? Use convertext to convert DOCX, ODT, MOBI, HTML, and other document formats to supported formats like EPUB or TXT.

📦 Installation

Prerequisites

FFmpeg Required - Install before using audiobook-reader:

# macOS
brew install ffmpeg

# Windows
winget install ffmpeg

# Linux
sudo apt install ffmpeg

FFmpeg is required for audio format conversion (MP3, M4A, M4B). Models (~310MB) auto-download on first use.

Using pip (recommended for users)

# Default installation (Kokoro TTS + core features)
pip install audiobook-reader

# With all progress visualizations (tqdm, rich, plotext)
pip install "audiobook-reader[progress-full]"

# With system monitoring
pip install "audiobook-reader[monitoring]"

# With everything
pip install "audiobook-reader[all]"

Hardware Acceleration Options

audiobook-reader runs on all platforms out of the box. For maximum performance, enable the hardware acceleration that matches your machine:

✅ Apple Silicon (M1/M2/M3/M4)

Neural Engine (CoreML) works automatically - no additional setup needed!

pip install audiobook-reader
# That's it! CoreML acceleration is built-in

✅ NVIDIA GPU (Windows/Linux)

Get CUDA acceleration with a simple package swap:

pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-gpu

✅ AMD/Intel GPU (Windows)

Get DirectML acceleration:

pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-directml

✅ CPU Only (All Platforms)

No GPU? No problem! The default installation works efficiently on any CPU:

pip install audiobook-reader
# Works great on Intel, AMD, ARM processors
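After swapping packages, you can check which execution providers ONNX Runtime actually sees with `onnxruntime.get_available_providers()`. The preference order and the `pick_providers` helper below are illustrative assumptions; audiobook-reader's own provider selection may differ.

```python
# Assumed preference: Apple CoreML, NVIDIA CUDA, DirectML (AMD/Intel on Windows), then CPU.
PREFERRED = [
    "CoreMLExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available: list[str]) -> list[str]:
    """Return the available providers in preference order, always ending with CPU."""
    chosen = [p for p in PREFERRED if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


try:
    import onnxruntime as ort
    print(pick_providers(ort.get_available_providers()))
except ImportError:
    print("onnxruntime is not installed")
```

The resulting list can be passed as `providers=` to `onnxruntime.InferenceSession(model_path, providers=...)`, which tries them in order.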

🚀 Quick Start

# 1. Install
pip install audiobook-reader

# 2. Models auto-download on first use (~310MB to ~/.cache/)
#    Or manually: reader download models

# 3. Convert any text file directly
reader convert --file mybook.epub

# 4. Find your audiobook in ~/Downloads/mybook_kokoro_am_michael.mp3

# Choose output location:
reader convert --file mybook.epub --output-dir downloads  # ~/Downloads/ (default)
reader convert --file mybook.epub --output-dir same       # Next to source
reader convert --file mybook.epub --output-dir /custom    # Custom path

🎭 Character Voices (Optional)

For books with dialogue, assign different voices to each character:

# Auto-detect characters and generate config
reader characters detect text/mybook.txt --auto-assign

# OR manually create mybook.characters.yaml:
# characters:
#   - name: Alice
#     voice: af_sarah
#     gender: female
#   - name: Bob
#     voice: am_michael
#     gender: male

# Convert with character voices
reader convert --characters --file text/mybook.txt

🐍 Python API (Jupyter Notebooks & Scripts)

For programmatic access in Python scripts or Jupyter notebooks:

import reader

# Simple conversion
output = reader.convert("mybook.epub")
print(f"Audiobook created: {output}")

# Advanced usage
from reader import Reader
r = Reader()
output = r.convert(
    "mybook.epub",
    voice="af_sarah",
    speed=1.2,
    character_voices=True,
    progress_style="tqdm"
)

See Programmatic API for full documentation.

📖 Documentation

Usage Guide - Complete command reference and workflows
Programmatic API - Python API for Jupyter notebooks and scripts
Examples - Real-world examples and use cases
Advanced Features - Professional audiobook production features
Kokoro Setup - Neural TTS model setup guide

🎙️ Command Reference

Basic Conversion

# Convert a single file (uses the Neural Engine automatically on Apple Silicon)
reader convert --file text/book.epub

# Convert with specific voice
reader convert --file text/book.epub --voice am_michael

# Disable text cleanup (keep broken words, bibliography, etc.)
reader convert --file text/book.epub --no-clean-text

# Enable debug mode to see Neural Engine status
reader convert --file text/book.epub --debug

Text Cleanup (Enabled by Default):

Fixes broken words: "exam-\nple" → "example" (common in PDFs)
Removes metadata: ISBN lines, book catalogs
Skips non-narrative chapters: TOC, Bibliography, Index, "About the Author", "Acknowledgments", etc.
Detects narrative boundaries: excludes front and back matter
Result: Cleaner audio, faster processing, fewer mispronunciations, and no narrated metadata
Opt-out: Use --no-clean-text to disable
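Two of these cleanup steps are easy to sketch with regular expressions. The patterns and the skip-list below are assumptions for illustration (audiobook-reader's actual rules are likely broader), and the `clean_text` and `is_narrative` names are hypothetical.

```python
import re

# A hyphen at the end of a line followed by a lowercase letter: "exam-\nple" -> "example".
_BROKEN_WORD = re.compile(r"(\w)-\n([a-z])")
# Whole lines starting with "ISBN", removed together with their newline.
_ISBN_LINE = re.compile(r"^[ \t]*ISBN\b.*(?:\n|$)", re.MULTILINE)
# Hypothetical list of chapter titles treated as non-narrative.
_SKIP_TITLES = {"contents", "table of contents", "bibliography", "index",
                "about the author", "acknowledgments", "acknowledgements"}


def clean_text(text: str) -> str:
    text = _BROKEN_WORD.sub(r"\1\2", text)  # re-join words split across lines
    return _ISBN_LINE.sub("", text)         # drop ISBN metadata lines


def is_narrative(chapter_title: str) -> bool:
    return chapter_title.strip().lower() not in _SKIP_TITLES
```

Requiring a lowercase letter after the line break avoids joining compounds like "Anglo-\nSaxon", but a split compound such as "well-\nknown" is still merged into "wellknown"; without a dictionary, some error in one direction is unavoidable.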

📊 Progress Visualization Options

# Simple text progress (default)
reader convert --progress-style simple --file "book.epub"

# Professional progress bars with speed metrics
reader convert --progress-style tqdm --file "book.epub"

# Beautiful Rich formatted displays with colors
reader convert --progress-style rich --file "book.epub"

# Real-time ASCII charts showing processing speed
reader convert --progress-style timeseries --file "book.epub"

Configuration Management

# Save permanent settings to config file
reader config --voice am_michael --format mp3 --output-dir downloads

# Set custom default output directory
reader config --output-dir /audiobooks
reader config --output-dir same  # Save next to source files

# List available Kokoro voices
reader voices

# View current configuration
reader config

# View application info and features
reader info

Parameter Hierarchy (How Settings Work)

CLI parameters (highest priority) - temporary overrides, never saved
Config file (middle priority) - your saved preferences
Code defaults (lowest priority) - sensible fallbacks

Example:

# Save your preferred settings
reader config --engine kokoro --voice am_michael --format mp3

# Use temporary override (doesn't change your saved config)
reader convert --voice af_sarah

# Your config file still has kokoro/am_michael/mp3 saved
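This lookup order maps naturally onto Python's `collections.ChainMap`, which searches its mappings left to right. A minimal sketch, assuming settings are plain dictionaries and that a CLI option left unset arrives as `None`; `effective_settings` and the `DEFAULTS` values are hypothetical.

```python
from collections import ChainMap

# Hypothetical code defaults (lowest priority).
DEFAULTS = {"engine": "kokoro", "voice": "am_michael", "format": "mp3", "speed": 1.0}


def effective_settings(cli: dict, config: dict) -> ChainMap:
    """Look up each key in CLI flags first, then the saved config, then defaults."""
    # Drop CLI options the user did not pass (None) so they don't mask the config.
    cli = {k: v for k, v in cli.items() if v is not None}
    return ChainMap(cli, config, DEFAULTS)


saved = {"voice": "am_michael", "format": "mp3"}  # as loaded from settings.yaml
settings = effective_settings({"voice": "af_sarah", "format": None}, saved)
print(settings["voice"], settings["format"], settings["speed"])  # from CLI, config, defaults
```

Because `ChainMap` only ever writes to its first mapping, the saved config dictionary is never modified by a CLI override.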

📁 File Support

Input Formats

Format	Extension	Chapter Detection
EPUB	`.epub`	✅ Automatic from TOC
PDF	`.pdf`	✅ Page-based
Text	`.txt`	✅ Simple patterns
Markdown	`.md`	✅ Header-based
ReStructuredText	`.rst`	✅ Header-based

Need other formats? Use convertext to convert DOCX, ODT, MOBI, HTML, and more to supported formats.

Output Formats

MP3 (default) - 48kHz mono, configurable bitrate (32k-64k, default 48k)
WAV - Uncompressed, high quality
M4A - Apple-friendly format
M4B - Audiobook format with chapter support
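For constant-bitrate MP3, file size is simply bitrate × duration ÷ 8 bits per byte, which is why a low bitrate matters for long books. A sketch of the arithmetic (the `mp3_size_mb` helper is hypothetical, assumes constant bitrate, and ignores container and tag overhead):

```python
def mp3_size_mb(bitrate_kbps: int, hours: float) -> float:
    """Approximate CBR MP3 size in megabytes: bits per second x seconds / 8 bits per byte."""
    seconds = hours * 3600
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


for kbps in (32, 48, 64):
    print(f"{kbps}k: {mp3_size_mb(kbps, 10):.0f} MB for a 10-hour book")
```

At the default 48k, a 10-hour book comes to roughly 216 MB, versus about 288 MB at 64k.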

🏗️ File Locations

Reader uses system-standard directories for clean organization:

Working Files (Temporary):

Temp workspace: /tmp/audiobook-reader-{session}/ (auto-cleaned on exit)
Session-specific, isolated from your files
Automatically removed when conversion completes

Persistent Data:

Models: ~/.cache/audiobook-reader/models/ (~310MB, shared across all conversions)
Config: ~/.config/audiobook-reader/ (settings and character mappings)

Output Files (Your Audiobooks):

Default: ~/Downloads/ (configurable)
Options:
- --output-dir downloads → ~/Downloads/
- --output-dir same → Next to source file
- --output-dir /custom/path → Custom location

No directory pollution - only your final audiobooks appear in the output location!
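The three `--output-dir` forms, combined with the `<name>_kokoro_<voice>.mp3` naming seen in this README's examples, can be sketched as follows. The helper names are hypothetical, and MP3 output with `am_michael` as the default voice is assumed.

```python
from pathlib import Path


def resolve_output_dir(setting: str, source_file: Path) -> Path:
    """Map an --output-dir value to a directory: 'downloads', 'same', or a literal path."""
    if setting == "downloads":
        return Path.home() / "Downloads"
    if setting == "same":
        return source_file.resolve().parent
    return Path(setting).expanduser()


def output_path(setting: str, source_file: Path, voice: str = "am_michael") -> Path:
    # Mirrors the naming in this README's examples: <stem>_kokoro_<voice>.mp3
    return resolve_output_dir(setting, source_file) / f"{source_file.stem}_kokoro_{voice}.mp3"


print(output_path("downloads", Path("My Novel.epub")))
```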

🎨 Example Workflows

Simple Book Conversion

# Convert any book directly
reader convert --file "My Novel.epub"

# Result: ~/Downloads/My Novel_kokoro_am_michael.mp3

# Or output next to source file
reader convert --file "My Novel.epub" --output-dir same

# Result: My Novel_kokoro_am_michael.mp3 (in same directory as source)

Voice Comparison

# Test different Kokoro voices on same content
reader convert --voice af_sarah --file text/sample.txt
reader convert --voice am_adam --file text/sample.txt
reader convert --voice bf_emma --file text/sample.txt

# Compare the resulting sample_kokoro_*.mp3 files in ~/Downloads/

Batch Processing

# Convert multiple files with custom output location
reader convert --file book1.epub --output-dir /audiobooks
reader convert --file book2.pdf --output-dir /audiobooks
reader convert --file story.txt --output-dir /audiobooks

# Results: /audiobooks/book1_*.mp3, /audiobooks/book2_*.mp3, /audiobooks/story_*.mp3

# Or set default output directory in config
reader config --output-dir /audiobooks
reader convert --file book1.epub  # → /audiobooks/
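For more than a handful of files, the same commands can be scripted. This sketch shells out to the documented `reader convert --file ... --output-dir ...` flags and runs conversions one at a time; `batch_commands` and `run_batch` are hypothetical helpers, not part of audiobook-reader's API.

```python
import shutil
import subprocess
from pathlib import Path


def batch_commands(files: list[Path], output_dir: str) -> list[list[str]]:
    """Build one `reader convert` command per input file."""
    return [["reader", "convert", "--file", str(f), "--output-dir", output_dir] for f in files]


def run_batch(files: list[Path], output_dir: str) -> list[Path]:
    """Convert files sequentially and return the ones whose conversion failed."""
    if shutil.which("reader") is None:
        raise RuntimeError("the `reader` command is not on PATH")
    failed = []
    for f, cmd in zip(files, batch_commands(files, output_dir)):
        if subprocess.run(cmd).returncode != 0:
            failed.append(f)
    return failed
```

Usage: `run_batch(sorted(Path(".").glob("*.epub")), "/audiobooks")`. Conversions run sequentially because parallel runs would each load the model and would likely contend for the same GPU/NPU.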

⚙️ Configuration

Settings are saved to ~/.config/audiobook-reader/settings.yaml:

tts:
  engine: kokoro           # TTS engine (Kokoro)
  voice: am_michael        # Default voice
  speed: 1.0               # Speech rate multiplier
  volume: 1.0              # Volume level
audio:
  format: mp3              # Output format (mp3, wav, m4a, m4b)
  bitrate: 48k             # MP3 bitrate (32k-64k typical for audiobooks)
  add_metadata: true       # Metadata support
processing:
  chunk_size: 400          # Text chunk size for processing (Kokoro optimal)
  auto_detect_chapters: true  # Chapter detection
output_dir: downloads      # Output location: "downloads", "same", or path
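The README does not say whether `chunk_size` counts characters or tokens. Assuming characters, a minimal sketch of sentence-aware chunking packs whole sentences greedily up to the limit and falls back to word boundaries for an overlong sentence; `chunk_text` is a hypothetical helper, not the tool's internal function.

```python
import re

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _split_long(sentence: str, limit: int) -> list[str]:
    """Fallback: pack words into pieces of at most `limit` characters."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, chunk_size: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        pieces = [sentence] if len(sentence) <= chunk_size else _split_long(sentence, chunk_size)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= chunk_size or not current:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps prosody natural at chunk joins; a single word longer than the limit is kept whole as its own chunk.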

🎯 Quick Examples

See docs/EXAMPLES.md for detailed examples including:

Voice testing and selection
PDF processing workflows
Markdown chapter handling
Batch processing scripts
Configuration optimization

📊 Technical Specs

TTS Engine: Kokoro-82M (82M parameters, Apache 2.0 license)
Model Size: ~310MB ONNX models (auto-downloaded on first use to cache)
Model Cache: Follows XDG standard (~/.cache/audiobook-reader/models/)
Python: 3.10-3.13 compatibility
Platforms: macOS, Linux, Windows (all fully supported)
Audio Quality: 48kHz mono MP3, configurable bitrate (32k-64k, default 48k)
Hardware Acceleration:
- ✅ Apple Silicon (M1/M2/M3/M4): CoreML (Neural Engine) - automatic
- ✅ NVIDIA GPUs: CUDA via onnxruntime-gpu
- ✅ AMD/Intel GPUs: DirectML on Windows
- ✅ CPU: Works efficiently on all processors
Performance: Hardware-accelerated on Apple Silicon, NVIDIA GPUs, and AMD/Intel GPUs on Windows, with CPU fallback everywhere else
Memory: Efficient streaming processing for large books

🎵 Audio Quality

Kokoro TTS (primary engine):

✅ Near-human quality neural voices
✅ 54 voices across 9 languages (American/British English, Spanish, French, Italian, Portuguese, Japanese, Chinese, Hindi)
✅ Apple Neural Engine acceleration
✅ Professional audiobook production
✅ Consistent narration (no hallucinations)

🔧 Troubleshooting

FFmpeg Not Found

Error: FFmpeg not found or Command 'ffmpeg' not found

Solution:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg

Models Not Downloading

Error: Failed to download Kokoro models

Solution: Models auto-download on first use (~310MB). If automatic download fails:

# Download to system cache (default)
reader download models

# Download to local models/ folder (permanent storage)
reader download models --local

# Force re-download
reader download models --force

Model & File Locations:

Models: ~/.cache/audiobook-reader/models/ (all platforms, ~310MB)
Config: ~/.config/audiobook-reader/ (settings and character mappings)
Temp Files: /tmp/audiobook-reader-{session}/ (auto-cleaned on exit)
Output: ~/Downloads/ by default (configurable with --output-dir)

Neural Engine Not Detected (Apple Silicon)

Error: Neural Engine not available, using CPU

Solution:

Ensure you're on Apple Silicon (M1/M2/M3/M4 Mac)
Update macOS to latest version
Reinstall onnxruntime: pip uninstall onnxruntime && pip install onnxruntime
CPU processing works fine but is slower than GPU/NPU

Permission Errors

Error: Permission denied when creating directories

Solution:

# Ensure write permissions in project directory
chmod -R u+w /path/to/reader

# Or run from a directory you own
cd ~/Documents
git clone https://github.com/danielcorsano/reader.git
cd reader

Import Errors

Error: ModuleNotFoundError: No module named 'kokoro_onnx'

Solution:

# Reinstall package
pip install --force-reinstall audiobook-reader

Invalid Input Format

Error: Unsupported file format

Supported formats: .epub, .pdf, .txt, .md, .rst

Solution:

# Use convertext to convert other formats first
pip install convertext
convertext document.docx --format epub  # DOCX to EPUB
convertext book.mobi --format epub      # MOBI to EPUB
convertext file.html --format txt       # HTML to TXT

# Then convert to audiobook
reader convert --file document.epub

GPU Acceleration Issues

NVIDIA GPU: Requires onnxruntime-gpu instead of onnxruntime

pip uninstall onnxruntime
pip install onnxruntime-gpu

AMD/Intel GPU (Windows): Requires onnxruntime-directml

pip uninstall onnxruntime
pip install onnxruntime-directml

Still Having Issues?

Check the GitHub Issues
Run with debug mode: reader convert --debug --file yourfile.txt
Verify Python version: python --version (requires 3.10-3.13)

📜 Credits

Kokoro TTS Model

This project uses the Kokoro-82M text-to-speech model by hexgrad, licensed under Apache 2.0.

Model Credits:

Original Model: hexgrad/Kokoro-82M (Apache 2.0)
ONNX Wrapper: kokoro-onnx by thewh1teagle (MIT)
Training datasets: Koniwa (CC BY 3.0), SIWIS (CC BY 4.0)

💝 Support This Project

If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If donations reach $99, I will use them to pay for the Apple Developer Program so I can make iOS versions of all my open-source apps.

Your support makes a real difference in keeping this project active and growing. Thank you!

License

This tool is licensed under the MIT License. See LICENSE file for details.

Ready to create your first audiobook? Check out the Usage Guide for step-by-step instructions!

Release history

- 0.4.1: Apr 26, 2026
- 0.4.0: Apr 22, 2026
- 0.3.1: Apr 1, 2026
- 0.3.0: Feb 9, 2026
- 0.2.2: Nov 26, 2025
- 0.2.1: Nov 22, 2025
- 0.2.0: Nov 18, 2025 (this version)
- 0.1.9: Oct 29, 2025
- 0.1.8: Oct 24, 2025
- 0.1.7: Oct 18, 2025
- 0.1.6: Oct 13, 2025
- 0.1.5: Oct 13, 2025
- 0.1.4: Oct 13, 2025
- 0.1.3: Oct 11, 2025
- 0.1.2: Oct 10, 2025
- 0.1.1: Oct 9, 2025
- 0.1.0: Oct 9, 2025

Download files

Source Distribution

audiobook_reader-0.2.0.tar.gz (83.0 kB)

Uploaded Nov 18, 2025 Source

Built Distribution

audiobook_reader-0.2.0-py3-none-any.whl (92.2 kB)

Uploaded Nov 18, 2025 Python 3

File details

Details for the file audiobook_reader-0.2.0.tar.gz.

File metadata

Download URL: audiobook_reader-0.2.0.tar.gz
Upload date: Nov 18, 2025
Size: 83.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.6.0

File hashes

Hashes for audiobook_reader-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`26931811ae65e7c17f47ab8254a9fd2222cd0099f906cc3e67cd5a0a208ca956`
MD5	`60e5d5f01784f9a8d2c689ed2875a840`
BLAKE2b-256	`cfef41ecb628ef9b57dffca64311a5e518f8478779c9d8ef49fb7ad3959e5800`

File details

Details for the file audiobook_reader-0.2.0-py3-none-any.whl.

File metadata

Download URL: audiobook_reader-0.2.0-py3-none-any.whl
Upload date: Nov 18, 2025
Size: 92.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/24.6.0

File hashes

Hashes for audiobook_reader-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1d149c64e4960417f3d67dc37eeff3ff56ae921c55edf06e2beb284f2a8285bb`
MD5	`1363ff41b6d0beccbe6ccbd2f2bfff2d`
BLAKE2b-256	`01df16d9c5fb4140c7785383d948e6f08e06db7bd31bab96df145a4631cd6afc`
