Audify
Convert ebooks and PDFs to audiobooks using AI text-to-speech and translation services.
Audify is a pipeline and REST API that transforms written content into high-quality audio using:
- Multiple TTS Providers - Choose from Kokoro (local), Qwen-TTS (local), OpenAI, AWS Polly, or Google Cloud TTS
- Ollama + LiteLLM for intelligent translation
- LLM-powered audiobook generation for engaging audio content
Features
- Multiple Formats: Convert EPUB ebooks, PDF documents, TXT, and MD files
- Directory Processing: Create audiobooks from multiple files in a directory
- Audiobook Creation: Generate audiobook-style content from books using an LLM
- Flexible Task System: Transform content into audiobooks, podcasts, summaries, meditations, or custom styles
- REST API: HTTP API for programmatic synthesis and audiobook creation
- Multiple TTS Providers: Choose from Kokoro (local), Qwen-TTS (local), OpenAI, AWS Polly, or Google Cloud TTS
- Multi-language Support: Translate content between languages during conversion
- High-Quality TTS: Natural-sounding speech with multiple provider options
- Flexible Configuration: Environment-based settings and .keys file support
Prerequisites
Core Requirements
- Python 3.10-3.13
- UV package manager (installation guide)
For Local TTS Providers (Optional)
Kokoro TTS
- Docker & Docker Compose (for API services)
- CUDA-capable GPU (recommended for optimal performance)
Qwen-TTS
- Qwen-TTS API Server running on port 8890 (see Qwen3-TTS)
- CUDA-capable GPU (recommended for optimal performance)
For Cloud TTS Providers (Optional)
- OpenAI TTS: OpenAI API key (get one here)
- AWS Polly: AWS account with access keys (AWS setup)
- Google Cloud TTS: Google Cloud project with credentials (GCP setup)
Installation as a command-line tool
You can install Audify as a standalone command-line tool using pip or uv:
pip install audify-cli
Or using uv (recommended):
uv pip install audify-cli
This will install the audify command with subcommands:
- audify run: Basic TTS conversion of EPUB/PDF files
- audify audiobook: LLM-powered audiobook generation
Alternatively, you can use the direct commands:
- audify-run: Alias for audify run
- audify-audiobook: Alias for audify audiobook
After installation, you can run audify --help to see available options.
Quick Start with Docker (For Kokoro TTS)
Note: Docker is only required if you want to use the local Kokoro TTS provider. For Qwen-TTS, you'll need to run the Qwen-TTS API separately (see Qwen-TTS Setup below). You can skip to "Quick Start with Cloud TTS" if you prefer using OpenAI, AWS Polly, or Google Cloud TTS.
1. Clone and Setup
git clone https://github.com/garciadias/audify.git
cd audify
2. Start API Services
# Start Kokoro TTS and Ollama services
docker compose up -d
# Wait for services to be ready (~2-3 minutes)
# Check status: docker compose ps
3. Install Python Dependencies
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
4. Setup Ollama Models
# Pull required models for translation and audiobook generation
docker compose exec ollama ollama pull qwen3:30b
# Or use lighter models for testing:
# docker compose exec ollama ollama pull llama3.2:3b
5. Convert Your First Book
# Convert EPUB to audiobook (using Kokoro TTS)
task run path/to/your/book.epub
# Convert PDF to audiobook
task run path/to/your/document.pdf
# Create audiobook from EPUB
task audiobook path/to/your/book.epub
Quick Start with Qwen-TTS (Local)
Qwen-TTS is a high-quality, free, and privacy-friendly local TTS solution with excellent multilingual support.
1. Setup Qwen-TTS API
First, set up the Qwen-TTS API server (requires GPU):
# Clone Qwen-TTS API repository
git clone https://github.com/QwenLM/Qwen3-TTS
cd Qwen3-TTS
# Start with Docker (recommended)
make up
# The API will be available at http://localhost:8890
For detailed setup instructions, see the Qwen3-TTS documentation.
2. Install Audify
git clone https://github.com/garciadias/audify.git
cd audify
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
3. Configure Qwen-TTS
Create a .keys file:
TTS_PROVIDER=qwen
QWEN_API_URL=http://localhost:8890
QWEN_TTS_VOICE=Vivian
4. Convert Your First Book
# Convert using Qwen-TTS
task run path/to/your/book.epub
# Or specify provider explicitly
task --tts-provider qwen run path/to/your/book.epub
Quick Start with Cloud TTS
If you prefer to use cloud TTS providers without Docker:
1. Clone and Install
git clone https://github.com/garciadias/audify.git
cd audify
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
2. Configure Your TTS Provider
Create a .keys file with your credentials:
cp .keys.example .keys
# Edit .keys and add your provider credentials
# See Configuration section for details
3. Convert Books with Cloud TTS
# Using OpenAI TTS
task --tts-provider openai run "book.epub"
# Using AWS Polly
task --tts-provider aws run "book.epub"
# Using Google Cloud TTS
task --tts-provider google run "book.epub"
Usage Examples
Basic Audiobook Conversion
# English EPUB to audiobook
task run "book.epub"
# PDF with specific language
task --language pt run "document.pdf"
# With translation (English to Spanish)
task --language en --translate es run "book.epub"
Audiobook Generation
# Create audiobook from EPUB
task audiobook "book.epub"
# Limit to first 5 chapters
task audiobook "book.epub" --max-chapters 5
# Custom voice and language
task audiobook "book.epub" --voice af_bella --language en
# With translation
task audiobook "book.epub" --translate pt
Task System (Audiobook Styles)
Choose different transformation styles using the --task option or provide custom prompts:
# Podcast-style narration
task audiobook "book.epub" --task podcast
# Concise summary
task audiobook "book.epub" --task summary
# Guided meditation
task audiobook "book.epub" --task meditation
# Classroom lecture
task audiobook "book.epub" --task lecture
# Custom prompt file
task audiobook "book.epub" --prompt-file my-prompt.txt
# List available tasks
audify list-tasks
# Validate a custom prompt file
audify validate-prompt my-prompt.txt
See Tasks Guide for details on creating custom prompts.
Using Commercial APIs (DeepSeek, Claude, GPT-4, Gemini)
Instead of local Ollama models, you can use commercial APIs for better quality or faster processing:
# Using DeepSeek (cost-effective)
task audiobook "book.epub" -m "api:deepseek/deepseek-chat"
# Using Claude 3.5 Sonnet (high quality)
task audiobook "book.epub" -m "api:anthropic/claude-3-5-sonnet-20240620"
# Using GPT-4 (reliable)
task audiobook "book.epub" -m "api:openai/gpt-4-turbo-preview"
# Using Gemini Pro
task audiobook "book.epub" -m "api:gemini/gemini-1.5-pro"
Setup Required: Create a .keys file with your API keys for the provider(s) you intend to use. See Commercial APIs Guide for detailed instructions.
# Copy example file and add your keys
cp .keys.example .keys
# Edit .keys and add keys for your chosen provider(s):
# DEEPSEEK=your-deepseek-api-key-here
# ANTHROPIC=your-anthropic-api-key-here
# OPENAI=your-openai-api-key-here
# GEMINI=your-google-api-key-here
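The model strings above follow an "api:provider/model" convention, while bare names like qwen3:30b refer to local Ollama models. A minimal sketch of parsing that convention, assuming this is how the -m flag is interpreted (the real CLI parsing may differ):

```python
# Hypothetical parser for the "-m" model string shown in the examples
# above, e.g. "api:deepseek/deepseek-chat". The (backend, provider,
# model) split is an illustrative assumption about the format.
def parse_model_spec(spec: str) -> tuple[str, str, str]:
    """Split 'api:provider/model' into (backend, provider, model).

    Bare names such as 'qwen3:30b' are treated as local Ollama models.
    """
    if spec.startswith("api:"):
        provider, _, model = spec[len("api:"):].partition("/")
        return ("api", provider, model)
    return ("ollama", "local", spec)
```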
Directory Input (Multi-file Processing)
Process multiple files from a directory into a single audiobook:
# Create audiobook from directory of files
task audiobook "path/to/directory/"
# Process directory with translation
task --translate es audiobook "path/to/articles/"
# Directory with custom voice
task --voice af_bella --language en audiobook "path/to/papers/"
Supported file types in directory: EPUB, PDF, TXT, MD
The directory mode will:
- Process each file as a separate episode
- Use the filename as the episode title
- Combine all episodes into a single M4B audiobook with chapter markers
- Synthesize the title audio for each episode
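The chapter markers in the combined M4B come from FFmpeg metadata (the chapters.txt file in the output structure). A stdlib-only sketch of generating that metadata from episode titles and durations; the exact fields Audify writes may differ:

```python
# Sketch of FFmpeg FFMETADATA chapter generation from a list of
# (title, duration_ms) episodes. The precise chapters.txt content
# Audify produces is not documented here and may differ.
def ffmetadata_chapters(episodes: list[tuple[str, int]]) -> str:
    lines = [";FFMETADATA1"]
    start = 0
    for title, duration_ms in episodes:
        end = start + duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",   # timestamps below are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={title}",
        ]
        start = end              # chapters are laid out back to back
    return "\n".join(lines) + "\n"
```

FFmpeg can then attach this file with `-i chapters.txt -map_metadata 1` when muxing the M4B.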
Advanced Options
# List available languages
task run --list-languages
# List available TTS models
task --list-models run
# Save extracted text
task --save-text run "book.epub"
# Skip confirmation prompts
task -y run "book.epub"
# Use different TTS provider
task --tts-provider openai run "book.epub" # OpenAI TTS
task --tts-provider aws run "book.epub" # AWS Polly
task --tts-provider google run "book.epub" # Google Cloud TTS
task --tts-provider qwen run "book.epub" # Qwen-TTS (local)
# List available TTS providers
task --list-tts-providers run
# List available tasks
audify list-tasks
# Validate a custom prompt file
audify validate-prompt my-prompt.txt
Configuration
TTS Provider Configuration
Audify supports multiple TTS providers. Configure your preferred provider using environment variables or a .keys file:
Option 1: Using .keys File (Recommended)
Create a .keys file in the project root:
cp .keys.example .keys
Edit .keys and add your credentials:
# OpenAI TTS
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_TTS_MODEL=tts-1-hd
OPENAI_TTS_VOICE=alloy
# AWS Polly
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
AWS_POLLY_VOICE=Joanna
AWS_POLLY_ENGINE=neural
# Google Cloud TTS
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
GOOGLE_TTS_VOICE=en-US-Chirp-HD-F
GOOGLE_TTS_LANGUAGE_CODE=en-US
# Qwen-TTS (Local)
QWEN_API_URL=http://localhost:8890
QWEN_TTS_VOICE=Vivian
# Default TTS Provider
TTS_PROVIDER=kokoro # Options: kokoro, qwen, openai, aws, google
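A .keys file is plain KEY=value lines, as shown above. The following is a minimal stdlib sketch of loading such a file into a dict; Audify's actual loader may handle quoting, inline comments, or precedence differently:

```python
# Minimal sketch of loading a KEY=value .keys file. Blank lines and
# lines starting with '#' are skipped. Audify's real loader may treat
# quoting and inline comments differently; this is illustrative only.
from pathlib import Path

def load_keys(path: str) -> dict[str, str]:
    keys: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        keys[name.strip()] = value.strip()
    return keys
```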
Option 2: Environment Variables
# Kokoro TTS API (Local)
export KOKORO_API_URL="http://localhost:8887/v1/audio"
# OpenAI TTS
export OPENAI_API_KEY="sk-your-key"
export OPENAI_TTS_MODEL="tts-1-hd" # or "tts-1"
export OPENAI_TTS_VOICE="alloy" # alloy, echo, fable, onyx, nova, shimmer
# AWS Polly
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
export AWS_POLLY_VOICE="Joanna" # Neural voices recommended
export AWS_POLLY_ENGINE="neural" # "standard" or "neural"
# Google Cloud TTS
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
export GOOGLE_TTS_VOICE="en-US-Chirp-HD-F"
export GOOGLE_TTS_LANGUAGE_CODE="en-US"
# Qwen-TTS (Local)
export QWEN_API_URL="http://localhost:8890"
export QWEN_TTS_VOICE="Vivian"
# Default Provider
export TTS_PROVIDER="kokoro" # Options: kokoro, qwen, openai, aws, google
# Ollama Configuration
export OLLAMA_API_BASE_URL="http://localhost:11434"
export OLLAMA_TRANSLATION_MODEL="qwen3:30b"
export OLLAMA_MODEL="magistral:24b"
Choosing a TTS Provider
| Provider | Pros | Cons | Best For |
|---|---|---|---|
| Kokoro (Local) | Free, privacy-friendly, GPU-accelerated | Requires local setup | Development, privacy-sensitive projects |
| Qwen-TTS (Local) | Free, privacy-friendly, GPU-accelerated, multilingual | Requires separate API setup | Multilingual projects, privacy-sensitive content |
| OpenAI | High quality, easy setup | Pay per character | Production, high-quality output |
| AWS Polly | Neural voices, scalable | AWS account required | Enterprise, AWS-integrated projects |
| Google Cloud TTS | Natural voices, many languages | GCP account required | Multi-language projects |
Docker Services
The docker-compose.yml configures (only needed for local/Kokoro TTS):
- Kokoro TTS: Port 8887 (GPU-accelerated speech synthesis, local)
- Ollama: Port 11434 (LLM for translation and audiobook generation, optional)
- Audify API: Port 8000 (REST API server, starts after Kokoro and Ollama are healthy)
The api service waits for Kokoro and Ollama to pass their healthchecks before starting, so services are always ready when the API accepts requests.
Note: Docker services are only required for Kokoro (local TTS). Commercial TTS providers (OpenAI, AWS, Google) and LLM APIs (DeepSeek, Claude, GPT-4, Gemini) work without Docker.
Output Structure
data/output/
├── [book_name]/
│   ├── chapters.txt           # Book metadata
│   ├── cover.jpg              # Book cover image
│   ├── chapters_001.mp3       # Individual chapter audio
│   ├── chapters_002.mp3
│   ├── chapters_003.mp3
│   ├── ...                    # More chapters
│   └── book_name.m4b          # Final audiobook
│
└── audiobooks/
    └── [book_name]/
        ├── episodes/
        │   ├── episode_001.mp3        # Audiobook episodes
        │   ├── episode_002.mp3
        │   └── ...
        ├── scripts/                   # Generated scripts
        │   ├── episode_001_script.txt
        │   ├── original_text_001.txt
        │   └── ...
        ├── chapters.txt               # FFmpeg metadata
        └── [book_name].m4b            # Final M4B audiobook
Directory audiobook output:
data/output/
└── [directory_name]/
    ├── episodes/
    │   ├── episode_001.mp3    # Episode from first file
    │   ├── episode_002.mp3    # Episode from second file
    │   └── ...
    ├── scripts/
    │   ├── episode_001_script.txt
    │   └── ...
    ├── chapters.txt           # Chapter metadata
    └── [directory_name].m4b   # Combined audiobook
Development
Available Tasks
task test # Run tests with coverage
task format # Format code with ruff
task run # Convert ebook to audiobook
task audiobook # Create audiobook from content
task up # Start Docker services
task api # Start REST API server (dev mode, port 8000)
You can also use the installed CLI commands directly:
- audify run (or audify-run) - equivalent to task run
- audify audiobook (or audify-audiobook) - equivalent to task audiobook
Local Development Setup
# Install development dependencies
uv sync --group dev
# Run tests
task test
# Format code
task format
# Type checking (included in pre_test)
mypy ./audify ./tests --ignore-missing-imports
REST API
Audify exposes a FastAPI HTTP server for programmatic access to synthesis and audiobook creation.
Starting the API
# Development mode (auto-reload)
task api
# Or via Docker (starts with Kokoro and Ollama)
docker compose up -d
The API runs on http://localhost:8000 by default.
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /providers | List available TTS providers |
| GET | /voices?provider=kokoro&language=en | List voices for a provider |
| POST | /synthesize | Convert EPUB or PDF to MP3 |
| POST | /audiobook | Convert EPUB or PDF to M4B audiobook |
Example: Synthesize an EPUB
curl -X POST http://localhost:8000/synthesize \
-F "file=@book.epub" \
-F "voice=af_bella" \
-F "language=en" \
--output book.mp3
Example: Create an M4B Audiobook
curl -X POST http://localhost:8000/audiobook \
-F "file=@book.epub" \
-F "voice=af_bella" \
-F "language=en" \
--output book.m4b
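The curl calls above send multipart/form-data. For programmatic use without extra dependencies, a stdlib-only Python sketch that builds the equivalent request is shown below; it only constructs the request (sending it requires a running API server), and the field names simply mirror the curl examples:

```python
# Stdlib-only sketch of the multipart/form-data request that the curl
# examples above send to /synthesize. This builds the request without
# sending it, since sending requires a running Audify API server.
import urllib.request
import uuid

def build_synthesize_request(epub_bytes: bytes, voice: str = "af_bella",
                             language: str = "en") -> urllib.request.Request:
    boundary = uuid.uuid4().hex
    crlf = "\r\n"
    parts = []
    # Plain form fields, mirroring -F "voice=..." and -F "language=..."
    for name, value in (("voice", voice), ("language", language)):
        parts.append(
            (f"--{boundary}{crlf}"
             f'Content-Disposition: form-data; name="{name}"{crlf}{crlf}'
             f"{value}{crlf}").encode()
        )
    # File field, mirroring -F "file=@book.epub"
    file_header = (f"--{boundary}{crlf}"
                   'Content-Disposition: form-data; name="file"; filename="book.epub"'
                   f"{crlf}Content-Type: application/epub+zip{crlf}{crlf}")
    parts.append(file_header.encode() + epub_bytes + crlf.encode())
    body = b"".join(parts) + f"--{boundary}--{crlf}".encode()
    return urllib.request.Request(
        "http://localhost:8000/synthesize",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With a server running, `urllib.request.urlopen(req)` would return the MP3 bytes, analogous to curl's `--output book.mp3`.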
API Reference
Interactive docs are available at http://localhost:8000/docs (Swagger UI) once the server is running.
Architecture
Audify uses a flexible multi-provider architecture supporting both local and cloud services:
┌───────────────────────────────────┐
│  Audify REST API (port 8000)      │
│   • POST /synthesize              │
│   • POST /audiobook               │
│   • GET /voices, /providers       │
└───────────────┬───────────────────┘
                │
┌───────────────▼───────────────────┐
│  Audify CLI / Python API          │
│   • EPUB/PDF/TXT Reader           │
│   • LLM Script Generation         │
│   • Audio Combine & M4B Assembly  │
└───────┬───────────────────────────┘
        │
        ├─── TTS Providers ─────────
        │     ├─ Kokoro (local)
        │     ├─ Qwen-TTS (local)
        │     ├─ OpenAI TTS
        │     ├─ AWS Polly
        │     └─ Google Cloud TTS
        │
        └─── LLM APIs ──────────────
              ├─ Ollama (local)
              ├─ DeepSeek
              ├─ Claude
              ├─ GPT-4
              └─ Gemini
Key Components
- Text Extraction: EPUB/PDF parsing with chapter detection
- Translation: LiteLLM + Commercial/Local LLMs for high-quality translation
- Task System: Flexible prompt management for audiobook, podcast, summary, meditation, and lecture styles
- TTS: Multi-provider support (Kokoro, OpenAI, AWS Polly, Google Cloud TTS)
- Audiobook Generation: LLM-powered script creation with commercial API support
- Audio Processing: Pydub for format conversion and combining
- API Management: Unified API key management via .keys file or environment variables
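TTS providers generally cap the amount of text accepted per request, so a synthesis pipeline must split extracted chapter text before sending it. The following sentence-boundary chunker is a hypothetical sketch of that step; Audify's internal chunking logic is not documented here and may differ:

```python
# Hypothetical sentence-boundary chunker for TTS requests. The
# max_chars limit and splitting heuristic are illustrative
# assumptions, not Audify's actual implementation.
import re

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack sentences into chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather
    than split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```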
Supported Languages
Primary: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Hungarian, Korean, Japanese, Hindi
Translation: Any language pair supported by your Ollama model
Troubleshooting
Common Issues
Services not responding (Docker/Kokoro):
# Check service status
docker compose ps
# Restart services
docker compose restart
# Check logs
docker compose logs kokoro
docker compose logs ollama
Commercial API errors:
# Verify API key configuration
cat .keys
# Test API connectivity
uv run audify translate test.txt --model api:deepseek-chat
# Check API key is loaded
# The system will show an error if the API key is missing or invalid
TTS Provider issues:
# List available TTS providers
uv run audify --list-tts-providers
# Test specific provider
uv run audify translate test.txt --tts-provider openai
# Check provider credentials in .keys file
# OpenAI: OPENAI_API_KEY
# AWS: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# Google: GOOGLE_APPLICATION_CREDENTIALS (path to JSON file)
Ollama model not found:
# List available models
docker compose exec ollama ollama list
# Pull required model
docker compose exec ollama ollama pull qwen3:30b
GPU issues:
# Check GPU availability
docker compose exec kokoro nvidia-smi
# If no GPU, services will run on CPU (slower)
Performance Tips
- Use SSD storage for model caching
- Ensure adequate GPU memory (8GB+ recommended) for Kokoro
- Use lighter models for testing: llama3.2:3b instead of magistral:24b
- Commercial TTS providers (OpenAI, AWS, Google) are faster than local Kokoro
- Commercial LLM APIs often provide better latency than local Ollama
- Consider running local services on separate machines for large workloads
- Use cloud providers for production workloads requiring high reliability
Examples
Check the examples/ directory for sample usage patterns and configuration files.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: task test
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Kokoro TTS for high-quality speech synthesis
- Kokoro-FastAPI for making Kokoro accessible via FastAPI
- Ollama for local LLM inference
- LiteLLM for unified LLM API interface
- OpenAI for GPT and TTS APIs
- Anthropic for Claude API
- DeepSeek for DeepSeek API
- Google for Gemini and Cloud TTS
- AWS for the Polly text-to-speech service