
AI-powered audiobook generator with GPU/NPU acceleration (up to 8x faster than real-time). Built-in Kokoro-82M TTS with character-aware voices and dialogue detection. Supports EPUB, PDF, TXT, MD, RST (use convertext for DOCX/MOBI/HTML).


AI Audiobook Generator: CLI Tool with GPU & NPU Acceleration


If you like this, please consider supporting via GitHub Sponsors. I created and maintain this alone.

Transform long-form text into professional audiobooks with character-aware voices, dialogue detection, and intelligent processing.

Perfect for novels, articles, textbooks, research papers, and other long-form content you want to listen to offline or on your own schedule. Built on Kokoro-82M TTS for production-quality narration. Runs on all platforms, with optimizations for Apple Silicon (M1/M2/M3/M4 Neural Engine), NVIDIA GPUs, and AMD/Intel GPUs.

✨ Core Features

⚡ High-Performance Conversion

  • Up to 8x faster than real-time on Apple Silicon (M1/M2/M3/M4) with Neural Engine
  • GPU acceleration for NVIDIA (CUDA), AMD/Intel (DirectML on Windows)
  • Efficient CPU processing on all platforms
  • Kokoro-82M engine optimized for speed + quality balance

🎭 Character-Aware Narration

  • Automatic character detection in dialogue
  • Automatic voice assignment, with gender detection where possible
  • Gender-appropriate voices for each character (e.g., Alice gets af_sarah, Bob gets am_adam)
  • Perfect for fiction, interviews, dialogues, and multi-speaker content

💾 Checkpoint Resumption

  • Resume interrupted conversions from where you left off
  • Essential for extra-long texts (500+ page books, textbooks, research papers)
  • Reliable production workflow for lengthy content

📚 Chapter Management

  • Automatic chapter detection from EPUB TOC, PDF structure, or text patterns
  • M4B audiobook format with chapter metadata
  • Chapter timestamps and navigation

📊 Professional Production Tools

  • 4 progress visualization styles: simple, tqdm, rich, timeseries
  • Real-time metrics: processing speed, ETA, completion percentage
  • Batch processing with queue management
  • Multiple output formats: MP3 (48kHz mono optimized by default), WAV, M4A, M4B

๐ŸŽ™๏ธ Production-Quality TTS

  • Kokoro-82M: 54 high-quality neural voices across 9 languages
  • Near-human quality narration
  • Consistent voice throughout long documents
  • No voice cloning overhead

โš–๏ธ Copyright Notice

IMPORTANT: This software is a tool for converting text to audio. Users are solely responsible for:

  • Ensuring they have the legal right to convert any text to audio
  • Obtaining necessary permissions for copyrighted materials
  • Complying with all applicable copyright laws and licensing terms
  • Understanding that creating audiobooks from copyrighted text without authorization may constitute copyright infringement

Recommended Use Cases:

  • ✅ Your own original content
  • ✅ Public domain works
  • ✅ Content you have explicit permission to convert
  • ✅ Educational materials you legally own
  • ✅ Open-source or Creative Commons licensed texts (per their terms)

The developers of audiobook-reader do not condone or support copyright infringement. By using this software, you agree to use it only for content you have the legal right to convert.


📚 Supported Input Formats

EPUB, PDF, TXT, Markdown, ReStructuredText

Need to convert other formats first? Use convertext to convert DOCX, ODT, MOBI, HTML, and other document formats to supported formats like EPUB or TXT.

📦 Installation

Prerequisites

FFmpeg Required - Install before using audiobook-reader:

# macOS
brew install ffmpeg

# Windows
winget install ffmpeg

# Linux
sudo apt install ffmpeg

FFmpeg is required for audio format conversion (MP3, M4A, M4B). Models (~310MB) auto-download on first use.

Using pip (recommended for users)

# Default installation (Kokoro TTS + core features)
pip install audiobook-reader

# With all progress visualizations (tqdm, rich, plotext)
pip install audiobook-reader[progress-full]

# With system monitoring
pip install audiobook-reader[monitoring]

# With everything
pip install audiobook-reader[all]

Hardware Acceleration Options

audiobook-reader works great on all platforms. For maximum performance, enable hardware acceleration:

✅ Apple Silicon (M1/M2/M3/M4)

Neural Engine (CoreML) works automatically - no additional setup needed!

pip install audiobook-reader
# That's it! CoreML acceleration is built-in

✅ NVIDIA GPU (Windows/Linux)

Get CUDA acceleration with a simple package swap:

pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-gpu

✅ AMD/Intel GPU (Windows)

Get DirectML acceleration:

pip install audiobook-reader
pip uninstall onnxruntime
pip install onnxruntime-directml

✅ CPU Only (All Platforms)

No GPU? No problem! The default installation works efficiently on any CPU:

pip install audiobook-reader
# Works great on Intel, AMD, ARM processors
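
To check which backend your installed onnxruntime build actually exposes, a small script like the following may help. It is only a sketch: the preference order and the helper names (pick_provider, detect_provider) are assumptions made for illustration, not part of audiobook-reader. The provider strings are the standard ONNX Runtime identifiers, and get_available_providers() is the standard ONNX Runtime call.

```python
# Preference order is an assumption for illustration; the provider names
# themselves are the standard ONNX Runtime identifiers.
PREFERRED_PROVIDERS = [
    "CoreMLExecutionProvider",  # Apple Silicon
    "CUDAExecutionProvider",    # NVIDIA (onnxruntime-gpu)
    "DmlExecutionProvider",     # DirectML on Windows (onnxruntime-directml)
    "CPUExecutionProvider",     # fallback available in every build
]


def pick_provider(available):
    """Return the first preferred provider present in `available`."""
    for name in PREFERRED_PROVIDERS:
        if name in available:
            return name
    return "CPUExecutionProvider"


def detect_provider():
    """Ask the installed onnxruntime build which providers it offers."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None  # onnxruntime is not installed in this environment
    return pick_provider(ort.get_available_providers())


print(detect_provider())
```

If this prints CPUExecutionProvider on a machine with a supported GPU, the package swap described above has probably not taken effect in the active environment.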

🚀 Quick Start

# 1. Install
pip install audiobook-reader

# 2. Models auto-download on first use (~310MB to ~/.cache/)
#    Or manually: reader download models

# 3. Convert any text file directly
reader convert --file mybook.epub

# 4. Find your audiobook in ~/Downloads/mybook_kokoro_am_michael.mp3

# Choose output location:
reader convert --file mybook.epub --output-dir downloads  # ~/Downloads/ (default)
reader convert --file mybook.epub --output-dir same       # Next to source
reader convert --file mybook.epub --output-dir /custom    # Custom path

🎭 Character Voices (Optional)

For books with dialogue, assign different voices to each character:

# Auto-detect characters and generate config
reader characters detect text/mybook.txt --auto-assign

# OR manually create mybook.characters.yaml:
# characters:
#   - name: Alice
#     voice: af_sarah
#     gender: female
#   - name: Bob
#     voice: am_michael
#     gender: male

# Convert with character voices
reader convert --characters --file text/mybook.txt
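
If you would rather generate the manual config from a script, the sketch below renders the same layout as the commented YAML above. The function name render_characters_yaml is hypothetical, and the schema is assumed to be exactly the three fields shown (name, voice, gender).

```python
def render_characters_yaml(characters):
    """Render (name, voice, gender) tuples in the layout shown above."""
    lines = ["characters:"]
    for name, voice, gender in characters:
        lines.append(f"  - name: {name}")
        lines.append(f"    voice: {voice}")
        lines.append(f"    gender: {gender}")
    return "\n".join(lines) + "\n"


cast = [("Alice", "af_sarah", "female"), ("Bob", "am_michael", "male")]
print(render_characters_yaml(cast), end="")
```

Write the returned string to mybook.characters.yaml to use it. Names containing YAML special characters (colons, quotes) would need quoting, which this sketch does not handle.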

๐Ÿ Python API (Jupyter Notebooks & Scripts)

For programmatic access in Python scripts or Jupyter notebooks:

import reader

# Simple conversion
output = reader.convert("mybook.epub")
print(f"Audiobook created: {output}")

# Advanced usage
from reader import Reader
r = Reader()
output = r.convert(
    "mybook.epub",
    voice="af_sarah",
    speed=1.2,
    character_voices=True,
    progress_style="tqdm"
)

See Programmatic API for full documentation.

📖 Documentation

🎙️ Command Reference

Basic Conversion

# Convert single file with Neural Engine acceleration
reader convert --file text/book.epub

# Convert with specific voice
reader convert --file text/book.epub --voice am_michael

# Disable text cleanup (keep broken words, bibliography, etc.)
reader convert --file text/book.epub --no-clean-text

# Enable debug mode to see Neural Engine status
reader convert --file text/book.epub --debug

Text Cleanup (Enabled by Default):

  • Fixes broken words: "exam-\nple" → "example" (common in PDFs)
  • Removes metadata: ISBN lines, book catalogs
  • Skips non-narrative chapters: TOC, Bibliography, Index, "About the Author", "Acknowledgments", etc.
  • Extracts narrative boundaries: Excludes all front/back matter
  • Result: Cleaner audio, faster processing, no mispronunciations or metadata narration
  • Opt-out: Use --no-clean-text to disable
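
To make the broken-word fix concrete, here is a minimal sketch of one way such a repair can be done with a regular expression. This is an illustration only, not the tool's actual cleanup code; the function name join_broken_words is hypothetical.

```python
import re

# A lowercase letter, a hyphen, a line break, then another lowercase letter.
_BROKEN_WORD = re.compile(r"([a-z])-\n([a-z])")


def join_broken_words(text):
    """Rejoin words that were hyphenated across a line break."""
    return _BROKEN_WORD.sub(r"\1\2", text)


print(join_broken_words("for exam-\nple, a PDF line break"))
# prints: for example, a PDF line break
```

A pattern this simple also joins genuine compounds that happen to break at the hyphen ("well-\nknown" becomes "wellknown"), so a real implementation would likely need a dictionary check or similar safeguard.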

📊 Progress Visualization Options

# Simple text progress (default)
reader convert --progress-style simple --file "book.epub"

# Professional progress bars with speed metrics
reader convert --progress-style tqdm --file "book.epub"

# Beautiful Rich formatted displays with colors
reader convert --progress-style rich --file "book.epub"

# Real-time ASCII charts showing processing speed
reader convert --progress-style timeseries --file "book.epub"

Configuration Management

# Save permanent settings to config file
reader config --voice am_michael --format mp3 --output-dir downloads

# Set custom default output directory
reader config --output-dir /audiobooks
reader config --output-dir same  # Save next to source files

# List available Kokoro voices
reader voices

# View current configuration
reader config

# View application info and features
reader info

Parameter Hierarchy (How Settings Work)

  1. CLI parameters (highest priority) - temporary overrides, never saved
  2. Config file (middle priority) - your saved preferences
  3. Code defaults (lowest priority) - sensible fallbacks

Example:

# Save your preferred settings
reader config --engine kokoro --voice am_michael --format mp3

# Use temporary override (doesn't change your saved config)
reader convert --voice af_sarah

# Your config file still has kokoro/am_michael/mp3 saved
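
The three-level precedence can be sketched in a few lines of Python. This is a hypothetical illustration of the rule, not the project's implementation: resolve_settings and CODE_DEFAULTS are made-up names, and the default values simply mirror the example above.

```python
from collections import ChainMap

# Values mirror the example above; the function itself is hypothetical.
CODE_DEFAULTS = {"engine": "kokoro", "voice": "am_michael", "format": "mp3"}


def resolve_settings(cli_args, config_file, defaults=CODE_DEFAULTS):
    """Merge the three layers; earlier maps win in a ChainMap."""
    # Drop CLI options the user did not pass so they do not mask saved values.
    passed = {k: v for k, v in cli_args.items() if v is not None}
    return dict(ChainMap(passed, config_file, defaults))


saved = {"voice": "am_michael", "format": "mp3"}
effective = resolve_settings({"voice": "af_sarah", "format": None}, saved)
print(effective["voice"], effective["format"])  # af_sarah mp3
```

The saved dictionary is never modified, which matches the rule that CLI overrides are temporary.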

๐Ÿ“ File Support

Input Formats

Format            | Extension | Chapter Detection
EPUB              | .epub     | ✅ Automatic from TOC
PDF               | .pdf      | ✅ Page-based
Text              | .txt      | ✅ Simple patterns
Markdown          | .md       | ✅ Header-based
ReStructuredText  | .rst      | ✅ Header-based

Need other formats? Use convertext to convert DOCX, ODT, MOBI, HTML, and more to supported formats.
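
As a rough idea of what "header-based" chapter detection means for Markdown, the sketch below splits a document at its # headings. It is an assumption-laden illustration, not the detector audiobook-reader ships: the function name split_markdown_chapters is hypothetical, and only ATX-style headings are recognized.

```python
import re

# ATX-style Markdown headings: one to six '#' characters, then the title.
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_chapters(text):
    """Return a list of (title, body) pairs, one per heading."""
    chapters = []
    title, body = None, []
    for line in text.splitlines():
        match = _HEADING.match(line)
        if match:
            # Close the previous chapter (or untitled preamble) if it has content.
            if title is not None or any(l.strip() for l in body):
                chapters.append((title or "Untitled", "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    if title is not None or any(l.strip() for l in body):
        chapters.append((title or "Untitled", "\n".join(body).strip()))
    return chapters


sample = "# One\nFirst text.\n\n# Two\nSecond text.\n"
for title, body in split_markdown_chapters(sample):
    print(title, "->", body)
```

The sketch treats every heading level as a chapter boundary and does not skip # lines inside fenced code blocks; a production detector would need to handle both.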

Output Formats

  • MP3 (default) - 48kHz mono, configurable bitrate (32k-64k, default 48k)
  • WAV - Uncompressed, high quality
  • M4A - Apple-friendly format
  • M4B - Audiobook format with chapter support
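
For planning storage, the MP3 bitrate translates directly into file size. The estimate below assumes a constant bitrate and ignores metadata and container overhead; estimated_size_mb is a hypothetical helper written for this example.

```python
def estimated_size_mb(hours, bitrate_kbps=48):
    """Approximate constant-bitrate file size in megabytes (10**6 bytes)."""
    seconds = hours * 3600
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


for kbps in (32, 48, 64):
    print(f"{kbps}k: {estimated_size_mb(10, kbps):.0f} MB for 10 hours")
# prints:
# 32k: 144 MB for 10 hours
# 48k: 216 MB for 10 hours
# 64k: 288 MB for 10 hours
```

At the default 48k setting, a 10-hour audiobook therefore comes to roughly 216 MB.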

๐Ÿ—๏ธ File Locations

Reader uses system-standard directories for clean organization:

Working Files (Temporary):

  • Temp workspace: /tmp/audiobook-reader-{session}/ (auto-cleaned on exit)
  • Session-specific, isolated from your files
  • Automatically removed when conversion completes

Persistent Data:

  • Models: ~/.cache/audiobook-reader/models/ (~310MB, shared across all conversions)
  • Config: ~/.config/audiobook-reader/ (settings and character mappings)

Output Files (Your Audiobooks):

  • Default: ~/Downloads/ (configurable)
  • Options:
    • --output-dir downloads → ~/Downloads/
    • --output-dir same → Next to source file
    • --output-dir /custom/path → Custom location

No directory pollution - only your final audiobooks appear in the output location!
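
The output rules above can be expressed as a short function. This is a sketch under assumptions: resolve_output_path is a hypothetical name, and the file name pattern (source name, engine, voice, format) is inferred from the example outputs in this README rather than taken from the tool's source.

```python
from pathlib import Path


def resolve_output_path(source, output_dir="downloads",
                        engine="kokoro", voice="am_michael", fmt="mp3"):
    """Build the output path for a source file, following the rules above."""
    source = Path(source)
    if output_dir == "downloads":
        folder = Path.home() / "Downloads"
    elif output_dir == "same":
        folder = source.parent
    else:
        folder = Path(output_dir).expanduser()
    # File name pattern inferred from the examples: <stem>_<engine>_<voice>.<fmt>
    return folder / f"{source.stem}_{engine}_{voice}.{fmt}"


print(resolve_output_path("text/My Novel.epub", "same"))
```

With "same", the result lands in the text/ folder next to the source file; with the default, it lands under the current user's Downloads folder.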

🎨 Example Workflows

Simple Book Conversion

# Convert any book directly
reader convert --file "My Novel.epub"

# Result: ~/Downloads/My Novel_kokoro_am_michael.mp3

# Or output next to source file
reader convert --file "My Novel.epub" --output-dir same

# Result: My Novel_kokoro_am_michael.mp3 (in same directory as source)

Voice Comparison

# Test different Kokoro voices on same content
reader convert --voice af_sarah --file text/sample.txt
reader convert --voice am_adam --file text/sample.txt
reader convert --voice bf_emma --file text/sample.txt

# Compare the sample_kokoro_*.mp3 outputs in ~/Downloads/ (the default output directory)

Batch Processing

# Convert multiple files with custom output location
reader convert --file book1.epub --output-dir /audiobooks
reader convert --file book2.pdf --output-dir /audiobooks
reader convert --file story.txt --output-dir /audiobooks

# Results: /audiobooks/book1_*.mp3, /audiobooks/book2_*.mp3, /audiobooks/story_*.mp3

# Or set default output directory in config
reader config --output-dir /audiobooks
reader convert --file book1.epub  # → /audiobooks/
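
For a whole folder of books, the same commands can be driven from a short Python script. This sketch only uses the --file and --output-dir flags shown above; the helper names build_commands and run_batch are hypothetical, and the script assumes the reader command is on your PATH.

```python
import subprocess
from pathlib import Path

# Extensions listed under Supported Input Formats.
SUPPORTED = {".epub", ".pdf", ".txt", ".md", ".rst"}


def build_commands(folder, output_dir):
    """Return one `reader convert` command per supported file in `folder`."""
    files = sorted(p for p in Path(folder).iterdir()
                   if p.is_file() and p.suffix.lower() in SUPPORTED)
    return [["reader", "convert", "--file", str(p), "--output-dir", str(output_dir)]
            for p in files]


def run_batch(folder, output_dir):
    """Convert files one after another, stopping at the first failure."""
    for command in build_commands(folder, output_dir):
        subprocess.run(command, check=True)
```

Calling run_batch("books", "/audiobooks") would convert every supported file in books/ in sorted order. Passing the command as a list avoids shell quoting problems with file names that contain spaces.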

โš™๏ธ Configuration

Settings are saved to ~/.config/audiobook-reader/settings.yaml:

tts:
  engine: kokoro           # TTS engine (Kokoro)
  voice: am_michael        # Default voice
  speed: 1.0               # Speech rate multiplier
  volume: 1.0              # Volume level
audio:
  format: mp3              # Output format (mp3, wav, m4a, m4b)
  bitrate: 48k             # MP3 bitrate (32k-64k typical for audiobooks)
  add_metadata: true       # Metadata support
processing:
  chunk_size: 400          # Text chunk size for processing (Kokoro optimal)
  auto_detect_chapters: true  # Chapter detection
output_dir: downloads      # Output location: "downloads", "same", or path

🎯 Quick Examples

See docs/EXAMPLES.md for detailed examples including:

  • Voice testing and selection
  • PDF processing workflows
  • Markdown chapter handling
  • Batch processing scripts
  • Configuration optimization

📊 Technical Specs

  • TTS Engine: Kokoro-82M (82M parameters, Apache 2.0 license)
  • Model Size: ~310MB ONNX models (auto-downloaded on first use to cache)
  • Model Cache: Follows XDG standard (~/.cache/audiobook-reader/models/)
  • Python: 3.10-3.13 compatibility
  • Platforms: macOS, Linux, Windows (all fully supported)
  • Audio Quality: 48kHz mono MP3, configurable bitrate (32k-64k, default 48k)
  • Hardware Acceleration:
    • ✅ Apple Silicon (M1/M2/M3/M4): CoreML (Neural Engine) - automatic
    • ✅ NVIDIA GPUs: CUDA via onnxruntime-gpu
    • ✅ AMD/Intel GPUs: DirectML on Windows
    • ✅ CPU: Works efficiently on all processors
  • Performance: Hardware-accelerated on all major platforms
  • Memory: Efficient streaming processing for large books

🎵 Audio Quality

Kokoro TTS (primary engine):

  • ✅ Near-human quality neural voices
  • ✅ 54 voices across 9 languages (American/British English, Spanish, French, Italian, Portuguese, Japanese, Chinese, Hindi)
  • ✅ Apple Neural Engine acceleration
  • ✅ Professional audiobook production
  • ✅ Consistent narration (no hallucinations)

🔧 Troubleshooting

FFmpeg Not Found

Error: FFmpeg not found or Command 'ffmpeg' not found

Solution:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
# Or use: choco install ffmpeg

Models Not Downloading

Error: Failed to download Kokoro models

Solution: Models auto-download on first use (~310MB). If automatic download fails:

# Download to system cache (default)
reader download models

# Download to local models/ folder (permanent storage)
reader download models --local

# Force re-download
reader download models --force

Model & File Locations:

  • Models: ~/.cache/audiobook-reader/models/ (all platforms, ~310MB)
  • Config: ~/.config/audiobook-reader/ (settings and character mappings)
  • Temp Files: /tmp/audiobook-reader-{session}/ (auto-cleaned on exit)
  • Output: ~/Downloads/ by default (configurable with --output-dir)

Neural Engine Not Detected (Apple Silicon)

Error: Neural Engine not available, using CPU

Solution:

  • Ensure you're on Apple Silicon (M1/M2/M3/M4 Mac)
  • Update macOS to latest version
  • Reinstall onnxruntime: pip uninstall onnxruntime && pip install onnxruntime
  • CPU processing works fine but is slower than GPU/NPU

Permission Errors

Error: Permission denied when creating directories

Solution:

# Ensure write permissions in project directory
chmod -R u+w /path/to/reader

# Or run from a directory you own
cd ~/Documents
git clone https://github.com/danielcorsano/reader.git
cd reader

Import Errors

Error: ModuleNotFoundError: No module named 'kokoro_onnx'

Solution:

# Reinstall package
pip install --force-reinstall audiobook-reader

Invalid Input Format

Error: Unsupported file format

Supported formats: .epub, .pdf, .txt, .md, .rst

Solution:

# Use convertext to convert other formats first
pip install convertext
convertext document.docx --format epub  # DOCX to EPUB
convertext book.mobi --format epub      # MOBI to EPUB
convertext file.html --format txt       # HTML to TXT

# Then convert to audiobook
reader convert --file document.epub

GPU Acceleration Issues

NVIDIA GPU: Requires onnxruntime-gpu instead of onnxruntime

pip uninstall onnxruntime
pip install onnxruntime-gpu

AMD/Intel GPU (Windows): Requires onnxruntime-directml

pip uninstall onnxruntime
pip install onnxruntime-directml

Still Having Issues?

  • Check the GitHub Issues
  • Run with debug mode: reader convert --debug --file yourfile.txt
  • Verify Python version: python --version (requires 3.10-3.13)
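
Before filing an issue, a quick preflight script can rule out the two most common setup problems, an unsupported Python version and a missing FFmpeg. The helper names below are hypothetical; the version range is the 3.10-3.13 range stated above.

```python
import shutil
import sys

SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 13)


def python_supported(version=None):
    """True when major.minor falls in the documented 3.10-3.13 range."""
    major_minor = tuple(version or sys.version_info)[:2]
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def ffmpeg_available():
    """True when an `ffmpeg` executable is on PATH."""
    return shutil.which("ffmpeg") is not None


print("Python OK:", python_supported())
print("FFmpeg found:", ffmpeg_available())
```

If either line prints False, the Installation prerequisites and the FFmpeg Not Found section above are the places to start.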

📜 Credits

Kokoro TTS Model

This project uses the Kokoro-82M text-to-speech model by hexgrad, licensed under Apache 2.0.

Model Credits:

  • Original Model: hexgrad/Kokoro-82M (Apache 2.0)
  • ONNX Wrapper: kokoro-onnx by thewh1teagle (MIT)
  • Training datasets: Koniwa (CC BY 3.0), SIWIS (CC BY 4.0)

๐Ÿ’ Support This Project

If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If donations reach $99, I will use them to pay for the Apple developer program so I can make iOS versions of all my open source apps.

Your support makes a real difference in keeping this project active and growing. Thank you!

License

This tool is licensed under the MIT License. See LICENSE file for details.

Ready to create your first audiobook? Check out the Usage Guide for step-by-step instructions!
