A Python library to convert PDF documents into podcasts

🎙️ PDF2Podcast

An advanced Python system for converting PDF documents into audio podcasts featuring natural dialogue between a host and an expert.

✨ Key Features

  • 📄 PDF Extraction: Advanced PDF document processing
  • 🤖 LLM Integration: Support for Google Gemini and other models
  • 🗣️ Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
  • 💬 Natural Dialogues: Realistic conversations between host and expert
  • 🔍 Semantic Retrieval: Intelligent content selection
  • 🎯 Structured Chapters: Automatic organization into thematic chapters
  • 🌍 Multilingual: Support for multiple languages
  • 🛠️ Modular: Extensible and customizable architecture

🚀 Quick Install

pip install pdf2podcast

📖 Basic Usage

Simple Example

from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor

# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"}
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document"
)

print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")

With Direct Text

# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points"
)

🔧 Advanced Configuration

Semantic Retrieval

from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever

chunker = SimpleChunker()
retriever = SemanticRetriever()

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5  # Top 5 most relevant chunks
)

Custom Prompts

from pdf2podcast.core.prompts import PodcastPromptBuilder

# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""

prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True
)

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder}
)

Multi-Voice TTS (Kokoro)

# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en"
    }
)

🎭 Output Modes

Dialogue (Recommended)

  • Activation: dialogue=True
  • Format: Natural conversation between S1 (host) and S2 (expert)
  • Features: Interruptions, questions, clarifications
  • Ideal for: Complex content, educational material

Monologue

  • Activation: dialogue=False
  • Format: Continuous narration
  • Features: Linear flow, storytelling
  • Ideal for: Narrative content, summaries
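
The difference between the two modes comes down to how speaker turns are rendered. A minimal, self-contained sketch of that idea (the format_script helper and the turn structure are hypothetical illustrations, not the library's internals):

```python
def format_script(turns, dialogue=True):
    """Render (speaker, text) turns in the two output modes.

    Dialogue mode keeps the S1 (host) / S2 (expert) tags;
    monologue mode flattens the turns into one continuous narration.
    """
    if dialogue:
        return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return " ".join(text for _, text in turns)


turns = [
    ("S1", "Welcome! Today we look at semantic retrieval."),
    ("S2", "Thanks. In short, it selects the most relevant chunks."),
]

print(format_script(turns, dialogue=True))   # two tagged lines
print(format_script(turns, dialogue=False))  # one untagged narration
```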

🔌 Supported Providers

LLM (Large Language Models)

  • Google Gemini ✅ (Recommended)
  • OpenAI 🔄 (In development)

TTS (Text-to-Speech)

  • Google TTS ✅ - Basic quality, easy setup
  • Amazon Polly ✅ - Professional quality
  • Azure TTS ✅ - Advanced neural voices
  • Kokoro TTS ✅ - Local, multiple voices, precise timing

🛠️ Architecture

📦 pdf2podcast
├── 🎯 PodcastGenerator (Main orchestrator)
├── 📄 RAG Systems (Text extraction)
├── 🧠 LLM Integration (Script generation)
├── 🎵 TTS Engines (Voice synthesis)
├── 🔍 Semantic Retrieval (Intelligent search)
└── 📝 Parser System (Output validation)

Processing Flow

  1. Input → PDF/Text
  2. RAG → Content extraction and cleaning
  3. Chunking → Division into segments
  4. Retrieval → Relevant content selection
  5. LLM → Structured script generation
  6. Parsing → Format validation
  7. TTS → Audio conversion
  8. Output → Audio file + metadata
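
The eight stages above compose into a single pipeline. The sketch below is purely illustrative, with every stage stubbed out (keyword overlap instead of embeddings, a string template instead of an LLM call, a fake TTS step); none of these function names belong to the actual pdf2podcast API:

```python
def extract(source: str) -> str:
    # Stage 2 (RAG): content extraction and cleaning (stubbed)
    return source.strip()

def chunk(text: str, size: int = 80) -> list[str]:
    # Stage 3 (Chunking): division into fixed-size segments
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(chunks: list[str], query: str, k: int = 5) -> list[str]:
    # Stage 4 (Retrieval): naive keyword overlap stands in for embeddings
    def score(c: str) -> int:
        return sum(w in c.lower() for w in query.lower().split())
    return sorted(chunks, key=score, reverse=True)[:k]

def generate_script(context: list[str], dialogue: bool) -> str:
    # Stage 5 (LLM): a template stands in for a real model call
    tag = "S1/S2 dialogue" if dialogue else "monologue"
    return f"[{tag}] " + " ".join(context)

def synthesize(script: str, out_path: str) -> dict:
    # Stage 7 (TTS): stubbed; a real engine would write audio here
    return {"path": out_path, "chars": len(script)}

def run_pipeline(source, query, out_path, dialogue=True) -> dict:
    # Stages 1-8 chained: extract, chunk, retrieve, script, audio
    chunks = chunk(extract(source))
    script = generate_script(retrieve(chunks, query), dialogue)
    return {"script": script, "audio": synthesize(script, out_path)}
```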

📚 Complete Documentation

⚙️ Environment Setup

Environment Variables

# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
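
These variables can then be read at runtime and passed into the llm_config/tts_config dictionaries used in the examples above. A sketch using os.getenv (the key names in azure_tts_config are assumptions for illustration, not confirmed config keys):

```python
import os

# If the variables live in a .env file, load it first, for example
# with python-dotenv: from dotenv import load_dotenv; load_dotenv()

llm_config = {"api_key": os.getenv("GENAI_API_KEY")}

# Hypothetical Azure TTS config; the exact key names may differ.
azure_tts_config = {
    "subscription_key": os.getenv("AZURE_SUBSCRIPTION_KEY"),
    "region": os.getenv("AZURE_REGION_NAME"),
}

print("Gemini key configured:", llm_config["api_key"] is not None)
```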

System Dependencies

# For Kokoro TTS (optional)
pip install torch torchaudio

# For advanced PDF processing
pip install pypdf2 pdfplumber

🎯 Practical Examples

Scientific Paper Podcast

result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations"
)

Technical Documentation Podcast

result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting"
)

Multilingual Podcast

# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"}
)

result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian"
)

🔍 Troubleshooting

Common Issues

1. API Key Error

# Verify configuration
import os

key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", key[:10] + "..." if key else "Not found")

2. Low Audio Quality

# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural"     # Higher quality
}

3. Short Scripts

# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit: git commit -am 'Add new feature'
  4. Push: git push origin feature/new-feature
  5. Create Pull Request

📜 License

MIT License - see LICENSE for details.

🙏 Credits

  • LangChain - LLM Framework
  • Pydantic - Data validation
  • Sentence Transformers - Semantic embeddings
  • Community Contributors - Feedback and improvements

🚀 Roadmap

  • OpenAI Integration - GPT-4 support
  • Batch Processing - Multiple file processing
  • Web Interface - Web-based GUI
  • Audio Effects - Background music, effects
  • Export Formats - MP3, WAV, OGG
  • Cloud Deployment - Docker, AWS Lambda

Last updated: December 2024
Version: 1.0.0
