A Python library to convert PDF documents into podcasts

PDF2Podcast 🎙️

Transform PDF documents into narrative-driven audio content, using an LLM to write the script and a text-to-speech service to narrate it.

Overview

PDF2Podcast combines RAG (Retrieval Augmented Generation), large language models (LLMs), and text-to-speech (TTS) to convert technical documents into narrative-style podcasts. It extracts and chunks PDF content, maintains context across sections, and synthesizes natural-sounding audio output.

Key Features

🔍 Smart Document Processing

Advanced PDF text extraction with support for complex layouts
Image caption extraction and metadata integration
Intelligent chunking with semantic context preservation

🧠 AI-Powered Content Generation

Context-aware content transformation using RAG technology
Customizable complexity levels and audience targeting
Support for narrative-style content generation

🗣️ Professional Audio Output

High-quality text-to-speech synthesis
Multiple voice provider options (Google TTS, AWS Polly, Azure Speech)
Natural pacing and pronunciation

⚙️ Flexible Configuration

Modular architecture for easy extension
Configurable LLM and TTS providers
Custom prompt builder support
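
To make the chunking feature listed above more concrete, here is a minimal sketch of paragraph-aware chunking under a character budget. It is not the library's implementation: the function name `chunk_paragraphs` and its splitting rules are assumptions made for illustration, and only the idea of a per-chunk character limit (compare `max_chars_per_chunk` in Advanced Usage) comes from this README.

```python
def chunk_paragraphs(text: str, max_chars: int = 6000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical helper, not part of pdf2podcast. Paragraphs are kept
    whole where possible so each chunk stays readable on its own; a
    single paragraph longer than max_chars is hard-split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Paragraphs are separated by blank lines; drop empty ones.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""

    for para in paragraphs:
        # Split oversized paragraphs into pieces that fit the budget.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate      # still fits: extend the open chunk
            else:
                chunks.append(current)   # close the open chunk
                current = piece          # and start a new one

    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraph boundaries intact is one simple way to preserve local context; a real processor may also use headings, sentence boundaries, or overlap between chunks.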

Table of Contents

Installation
Quick Start
Advanced Usage
- Custom Prompt Builders
- Provider Configuration
Configuration Reference
Environment Setup
Error Handling
License

Installation

Install PDF2Podcast using pip:

pip install pdf2podcast

Quick Start

Here's a basic example to get you started with PDF2Podcast:

from pdf2podcast import PodcastGenerator, SimplePDFProcessor
import os
from dotenv import load_dotenv

# Load environment variables for API keys
load_dotenv()

# Initialize the PDF processor
# This component handles document reading, content extraction, and chunking
pdf_processor = SimplePDFProcessor()

# Create a podcast generator instance with basic configuration
# - llm_provider: Specifies which LLM service to use (currently supports "gemini")
# - tts_provider: Specifies which TTS service to use ("google", "aws", or "azure")
# - llm_config: Configuration for the LLM service
# - tts_config: Configuration for the TTS service
generator = PodcastGenerator(
    rag_system=pdf_processor,
    llm_provider="gemini",         # Using Google's Gemini model
    tts_provider="google",         # Using Google's TTS service
    llm_config={
        "api_key": os.getenv("GENAI_API_KEY"),  # API key from environment variables
        "max_output_tokens": 4096,  # Maximum length of generated content
        "temperature": 0.2         # Controls creativity vs determinism (0.0-1.0)
    },
    tts_config={
        "language": "en",          # Output language
        "tld": "com",             # TLD for accent selection
        "slow": False             # Normal speech rate
    }
)

# Generate the podcast
# - complexity: Controls content complexity ("simple", "intermediate", "advanced")
# - audience: Target audience type ("general", "enthusiasts", "professionals", "experts")
result = generator.generate(
    pdf_path="sample.pdf",         # Input PDF file
    output_path="output.mp3",      # Output audio file
    complexity="intermediate",      # Moderate technical depth
    audience="general"             # General audience targeting
)

# The result contains:
# - script: The generated narrative script
# - audio: Dictionary with audio file details (path, size)
print(f"Generated audio file: {result['audio']['path']}")
print(f"Script length: {len(result['script'])} characters")

This basic example demonstrates:

Setting up the PDF processor for content extraction
Configuring the podcast generator with LLM and TTS providers
Generating a podcast with customized complexity and audience targeting
Accessing the generated content and audio file
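
Since the result holds both the script and the audio details, it can be handy to keep the script next to the audio file. The helper below is a sketch, not part of the library: `save_script` is a hypothetical name, and it assumes only the result shape described above (a `script` string and an `audio` dictionary with a `path` entry).

```python
from pathlib import Path


def save_script(result: dict, encoding: str = "utf-8") -> Path:
    """Write result['script'] to a .txt file next to the audio file.

    Hypothetical helper: assumes result looks like
    {"script": "...", "audio": {"path": "output.mp3", ...}}.
    """
    audio_path = Path(result["audio"]["path"])
    # Same directory and base name as the audio, with a .txt suffix.
    script_path = audio_path.with_suffix(".txt")
    script_path.write_text(result["script"], encoding=encoding)
    return script_path
```

Called as `save_script(result)` after `generator.generate(...)`, this would write `output.txt` alongside `output.mp3`.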

Advanced Usage

Custom Prompt Builders

PDF2Podcast supports custom prompt builders to control how content is transformed into narrative form. The library includes a StorytellingPromptBuilder that creates engaging, story-like narratives from technical content:

import os

from pdf2podcast import PodcastGenerator, SimplePDFProcessor
from pdf2podcast.examples.custom_prompts import StorytellingPromptBuilder

# Initialize PDF processor with advanced settings
pdf_processor = SimplePDFProcessor(
    max_chars_per_chunk=6000,  # Larger chunks for better context
    extract_images=True,       # Include image captions in processing
    metadata=True             # Include document metadata
)

# Create generator with storytelling configuration
generator = PodcastGenerator(
    rag_system=pdf_processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={
        "api_key": os.getenv("GENAI_API_KEY"),
        "model_name": "gemini-1.5-flash",    # Specific model version
        "max_output_tokens": 8000,           # Longer output for stories
        "temperature": 0.3,                  # Slightly more creative
        "prompt_builder": StorytellingPromptBuilder()  # Enable storytelling mode
    },
    tts_config={
        "language": "en",
        "tld": "com",
        "slow": False
    }
)

# Generate a narrative-style podcast
result = generator.generate(
    pdf_path="sample.pdf",
    output_path="output.mp3",
    complexity="advanced",        # Detailed technical content
    audience="enthusiasts",      # For technically interested listeners
    query="Explain the main concepts and their practical applications"
)

The StorytellingPromptBuilder transforms content by:

Creating engaging hooks and openings
Building narrative tension around technical concepts
Presenting solutions as revelations
Maintaining technical accuracy while being engaging
Adapting language and examples to the target audience
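
The five behaviours above can be pictured as instructions folded into a prompt. The sketch below is only an illustration of that idea: this README does not show the prompt builder interface, so `build_storytelling_prompt` is a hypothetical standalone function and not the API of `StorytellingPromptBuilder` or of any base class in the library.

```python
from typing import Optional


def build_storytelling_prompt(
    text: str,
    complexity: str = "intermediate",
    audience: str = "general",
    query: Optional[str] = None,
) -> str:
    """Assemble a narrative-style prompt (hypothetical, not library API)."""
    # One instruction per behaviour listed in the README.
    instructions = [
        "Open with an engaging hook.",
        "Build narrative tension around the technical concepts.",
        "Present solutions as revelations.",
        "Keep every technical statement accurate.",
        f"Adapt language and examples to a {audience} audience "
        f"at {complexity} complexity.",
    ]
    parts = ["Rewrite the source text as a podcast script.", ""]
    parts += [f"{i}. {line}" for i, line in enumerate(instructions, start=1)]
    if query:
        # An optional focus, mirroring the query argument of generate().
        parts += ["", f"Focus: {query}"]
    parts += ["", "Source text:", text.strip()]
    return "\n".join(parts)
```

A real custom builder would need to follow whatever interface the library expects; check the package source before writing one.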

Provider Configuration

LLM Provider Settings

The library currently supports Google's Gemini as the LLM provider:

# Full configuration options for Gemini
generator = PodcastGenerator(
    llm_provider="gemini",  # Specify Gemini as provider
    llm_config={
        "api_key": os.getenv("GENAI_API_KEY"),
        "model_name": "gemini-1.5-flash",  # Model version (optional)
        "max_output_tokens": 4096,         # Maximum response length
        "temperature": 0.2,                # Creativity control (0.0-1.0)
        "top_p": 0.9,                     # Nucleus sampling parameter
        "streaming": False                 # Enable/disable streaming
    },
    # ... remaining arguments (rag_system, tts_provider, tts_config) as in Quick Start
)

TTS Provider Settings

Three TTS providers are supported, each with its own configuration options:

Google TTS

# Google TTS configuration
generator = PodcastGenerator(
    tts_provider="google",  # Use Google's TTS service
    tts_config={
        "language": "en",    # Language code
        "tld": "com",       # Top-level domain for accent
        "slow": False,      # Speech rate
    },
    # ... remaining arguments (rag_system, llm_provider, llm_config) as in Quick Start
)

AWS Polly

# AWS Polly configuration
generator = PodcastGenerator(
    tts_provider="aws",  # Use AWS Polly
    tts_config={
        "voice_id": "Joanna",          # Voice selection
        "region_name": "eu-central-1", # AWS region
        "engine": "neural",            # Neural TTS engine
        # Additional options:
        # "sample_rate": 22050,        # Audio sample rate
        # "audio_format": "mp3"        # Output format
    },
    # ... remaining arguments (rag_system, llm_provider, llm_config) as in Quick Start
)

Azure TTS

# Azure TTS configuration
generator = PodcastGenerator(
    tts_provider="azure",  # Use Azure TTS service
    tts_config={
        "subscription_key": os.getenv("AZURE_SPEECH_KEY"),
        "region_name": "eastus",           # Azure region
        "voice_id": "en-US-AvaMultilingualNeural"  # Voice selection
    },
    # ... remaining arguments (rag_system, llm_provider, llm_config) as in Quick Start
)
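
When the TTS provider is chosen at runtime, the three configurations above can be gathered in one place. The following is a sketch under assumptions: `build_tts_config` is a hypothetical helper, the dictionary keys and default values are copied from the examples above, and reading the regions from `AWS_REGION` and `AZURE_SPEECH_REGION` is a choice made here rather than documented library behaviour.

```python
from typing import Mapping


def build_tts_config(provider: str, env: Mapping[str, str]) -> dict:
    """Return a tts_config dict for a provider name (hypothetical helper)."""
    if provider == "google":
        return {"language": "en", "tld": "com", "slow": False}
    if provider == "aws":
        return {
            "voice_id": "Joanna",
            # Fall back to the region used in the README example.
            "region_name": env.get("AWS_REGION", "eu-central-1"),
            "engine": "neural",
        }
    if provider == "azure":
        return {
            # Raises KeyError if the key is not set, so a missing
            # credential fails early rather than at synthesis time.
            "subscription_key": env["AZURE_SPEECH_KEY"],
            "region_name": env.get("AZURE_SPEECH_REGION", "eastus"),
            "voice_id": "en-US-AvaMultilingualNeural",
        }
    raise ValueError(f"Unsupported TTS provider: {provider!r}")
```

With this in place, a call such as `build_tts_config("aws", os.environ)` yields a dictionary that can be passed as `tts_config` together with `tts_provider="aws"`.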

Configuration Reference

Complexity Levels

Level	Description	Best For
simple	Basic terms, clear explanations	General audience, introductory content
intermediate	Balanced technical depth	Students, professionals
advanced	Full technical detail	Experts, technical documentation

Audience Types

Type	Description	Content Adaptation
general	No technical background	Focus on practical understanding
enthusiasts	Interest-driven knowledge	Hobby and DIY applications
professionals	Working knowledge	Industry applications
experts	Deep domain knowledge	Advanced concepts
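
Because `complexity` and `audience` are plain strings, a typo is easy to miss. One option is to check the values against the two tables above before calling `generate`. This is a sketch: `validate_generation_options` is a hypothetical helper, and whether the library itself rejects unknown values is not stated in this README.

```python
# Allowed values, copied from the Complexity Levels and Audience Types tables.
COMPLEXITY_LEVELS = ("simple", "intermediate", "advanced")
AUDIENCE_TYPES = ("general", "enthusiasts", "professionals", "experts")


def validate_generation_options(complexity: str, audience: str) -> None:
    """Raise ValueError if either option is not a documented value."""
    if complexity not in COMPLEXITY_LEVELS:
        raise ValueError(
            f"complexity must be one of {COMPLEXITY_LEVELS}, got {complexity!r}"
        )
    if audience not in AUDIENCE_TYPES:
        raise ValueError(
            f"audience must be one of {AUDIENCE_TYPES}, got {audience!r}"
        )
```

Calling `validate_generation_options("advanced", "enthusiasts")` returns silently, while a misspelled value fails before any API request is made.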

Environment Setup

Environment variables (the Gemini key is always required; the AWS and Azure variables are needed only if you use those providers):

# Gemini API Configuration
GENAI_API_KEY=your_gemini_api_key

# AWS Polly Configuration (if using)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region

# Azure Speech Services Configuration (if using)
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=your_azure_region
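
A quick pre-flight check can report missing variables before any provider is called. The sketch below is an assumption-laden illustration: `missing_env_vars` and the `REQUIRED_ENV` mapping are hypothetical, and the mapping simply mirrors the variable names listed above; how the library or the underlying SDKs actually read these variables is not covered in this README.

```python
from typing import Iterable, Mapping

# Variable names copied from the Environment Setup list above.
# "google" refers to the Google TTS provider, which the Quick Start
# configures without an API key.
REQUIRED_ENV = {
    "gemini": ["GENAI_API_KEY"],
    "google": [],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "azure": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
}


def missing_env_vars(providers: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return the variables that are unset or empty for the given providers."""
    missing: list[str] = []
    for provider in providers:
        if provider not in REQUIRED_ENV:
            raise ValueError(f"Unknown provider: {provider!r}")
        for name in REQUIRED_ENV[provider]:
            # Treat empty strings the same as unset variables.
            if not env.get(name):
                missing.append(name)
    return missing
```

After `load_dotenv()`, calling `missing_env_vars(["gemini", "aws"], os.environ)` returns an empty list when everything needed for that combination is set.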

Error Handling

PDF2Podcast reports failures through standard Python exceptions, which you can catch by category:

try:
    # Attempt to generate podcast
    result = generator.generate(
        pdf_path="sample.pdf",
        output_path="output.mp3"
    )
except ValueError as e:
    # Configuration errors, e.g. invalid API keys or unsupported providers
    print(f"Configuration error: {e}")
except FileNotFoundError as e:
    # File access errors, e.g. PDF not found or output directory issues
    print(f"File error: {e}")
except Exception as e:
    # Other processing errors, e.g. network issues or service failures
    print(f"Processing error: {e}")

Common errors and solutions:

Configuration errors: Check API keys and provider settings
File errors: Verify file paths and permissions
Processing errors: Check network connection and service status
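
Processing errors such as network hiccups are often transient, so retrying can help, whereas configuration and file errors will not fix themselves. The wrapper below sketches that distinction with exponential backoff. `generate_with_retry` is a hypothetical helper, not part of the library, and the mapping of exception types to categories follows the example above rather than a documented guarantee.

```python
import time
from typing import Any, Callable


def generate_with_retry(
    generate: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call generate() and retry other failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return generate()
        except (ValueError, FileNotFoundError):
            # Configuration and file errors: retrying will not help.
            raise
        except Exception:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
```

A typical call wraps the generation step in a lambda, for example `generate_with_retry(lambda: generator.generate(pdf_path="sample.pdf", output_path="output.mp3"))`. The `sleep` parameter exists so the delay can be replaced in tests.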

License

This project is licensed under the MIT License - see the LICENSE file for details.
