
A Python library to convert PDF documents into podcasts


๐ŸŽ™๏ธ PDF2Podcast

An advanced Python system for converting PDF documents into audio podcasts featuring natural dialogue between a host and an expert.

✨ Key Features

  • 📄 PDF Extraction: Advanced PDF document processing
  • 🤖 LLM Integration: Support for Google Gemini and other models
  • 🗣️ Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
  • 💬 Natural Dialogues: Realistic conversations between host and expert
  • 🔍 Semantic Retrieval: Intelligent content selection
  • 🎯 Structured Chapters: Automatic organization into thematic chapters
  • 🌍 Multilingual: Support for multiple languages
  • 🛠️ Modular: Extensible and customizable architecture

🚀 Quick Install

pip install pdf2podcast

📖 Basic Usage

Simple Example

from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor

# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"}
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document"
)

print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")

With Direct Text

# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points"
)

🔧 Advanced Configuration

Semantic Retrieval

from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever

chunker = SimpleChunker()
retriever = SemanticRetriever()

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5  # Top 5 most relevant chunks
)

Custom Prompts

from pdf2podcast.core.prompts import PodcastPromptBuilder

# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""

prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True
)

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder}
)

Multi-Voice TTS (Kokoro)

# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en"
    }
)

🎭 Output Modes

Dialogue (Recommended)

  • Activation: dialogue=True
  • Format: Natural conversation between S1 (host) and S2 (expert)
  • Features: Interruptions, questions, clarifications
  • Ideal for: Complex content, educational material

Monologue

  • Activation: dialogue=False
  • Format: Continuous narration
  • Features: Linear flow, storytelling
  • Ideal for: Narrative content, summaries
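In practice, choosing between the two modes comes down to the `dialogue` flag passed to `generate()`. As a small illustration, the heuristic below maps a content type to that flag following the "Ideal for" lists above; the function and category names are this example's own, not part of the library.

```python
def pick_dialogue_flag(content_type: str) -> bool:
    """Heuristic sketch: True (dialogue) for complex or educational
    material, False (monologue) for narrative content and summaries."""
    dialogue_types = {"complex", "educational", "technical"}
    return content_type in dialogue_types

# Then forward the result, e.g. generator.generate(..., dialogue=pick_dialogue_flag("educational"))
print(pick_dialogue_flag("educational"))  # True
print(pick_dialogue_flag("summary"))      # False
```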

🔌 Supported Providers

LLM (Large Language Models)

  • Google Gemini ✅ (Recommended)
  • OpenAI 🔄 (In development)

TTS (Text-to-Speech)

  • Google TTS ✅ - Basic quality, easy setup
  • Amazon Polly ✅ - Professional quality
  • Azure TTS ✅ - Advanced neural voices
  • Kokoro TTS ✅ - Local, multiple voices, precise timing
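Whichever TTS provider you pick, the `tts_config` keys differ slightly between the examples in this README. A small sanity check like the sketch below can catch a missing key before generation; the required-key sets are assumptions drawn from the examples here, not the library's actual validation rules.

```python
# Assumed per-provider keys, inferred from this README's examples only.
REQUIRED_TTS_KEYS = {
    "google": {"language"},
    "polly": {"voice_id", "engine"},
    "azure": {"language"},
    "kokoro": {"voice_id", "language"},
}

def missing_tts_keys(provider: str, tts_config: dict) -> set:
    """Return config keys that appear to be missing for the chosen provider."""
    return REQUIRED_TTS_KEYS.get(provider, set()) - tts_config.keys()

print(missing_tts_keys("kokoro", {"language": "en"}))  # {'voice_id'}
```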

๐Ÿ› ๏ธ Architecture

📦 pdf2podcast
├── 🎯 PodcastGenerator (Main orchestrator)
├── 📄 RAG Systems (Text extraction)
├── 🧠 LLM Integration (Script generation)
├── 🎵 TTS Engines (Voice synthesis)
├── 🔍 Semantic Retrieval (Intelligent search)
└── 📝 Parser System (Output validation)

Processing Flow

  1. Input → PDF/Text
  2. RAG → Content extraction and cleaning
  3. Chunking → Division into segments
  4. Retrieval → Relevant content selection
  5. LLM → Structured script generation
  6. Parsing → Format validation
  7. TTS → Audio conversion
  8. Output → Audio file + metadata
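The eight steps above can be sketched as a chain of functions. Everything below is a stub standing in for the library's real stages, purely to show how data flows from input to output.

```python
def extract(source: str) -> str:
    """Steps 1-2: accept a PDF path or raw text and extract clean text."""
    return f"text from {source}"

def chunk(text: str) -> list:
    """Step 3: split the text into segments (word-level here for brevity)."""
    return text.split()

def retrieve(chunks: list, k: int) -> list:
    """Step 4: keep the k most relevant chunks (simply the first k in this stub)."""
    return chunks[:k]

def write_script(chunks: list) -> str:
    """Steps 5-6: generate and validate a script from the selected chunks."""
    return " ".join(chunks)

def synthesize(script: str) -> dict:
    """Steps 7-8: convert the script to audio and return file path + metadata."""
    return {"audio": "podcast.mp3", "words": len(script.split())}

result = synthesize(write_script(retrieve(chunk(extract("document.pdf")), k=3)))
print(result)  # {'audio': 'podcast.mp3', 'words': 3}
```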

📚 Complete Documentation

โš™๏ธ Environment Setup

Environment Variables

# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
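One convenient way to consume these variables is to fail fast when one is missing. The helper below is an illustration, not part of pdf2podcast; actually loading the .env file can be done with python-dotenv's `load_dotenv()`, an optional extra not listed among this README's dependencies.

```python
import os

def require_env(name: str) -> str:
    """Return the variable's value, or fail with a pointer to the .env example."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; see the .env example above.")
    return value

os.environ.setdefault("GENAI_API_KEY", "demo-key")  # for demonstration only
print("Gemini key loaded:", bool(require_env("GENAI_API_KEY")))
```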

System Dependencies

# For Kokoro TTS (optional)
pip install torch torchaudio

# For advanced PDF processing
pip install pypdf2 pdfplumber

🎯 Practical Examples

Scientific Paper Podcast

result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations"
)

Technical Documentation Podcast

result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting"
)

Multilingual Podcast

# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"}
)

result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian"
)

๐Ÿ” Troubleshooting

Common Issues

1. API Key Error

# Verify configuration
import os

key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", key[:10] + "..." if key else "Not found")

2. Low Audio Quality

# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural"     # Higher quality
}

3. Short Scripts

# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content

๐Ÿค Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit: git commit -am 'Add new feature'
  4. Push: git push origin feature/new-feature
  5. Create Pull Request

📜 License

MIT License - see LICENSE for details.

๐Ÿ™ Credits

  • LangChain - LLM Framework
  • Pydantic - Data validation
  • Sentence Transformers - Semantic embeddings
  • Community Contributors - Feedback and improvements

🚀 Roadmap

  • OpenAI Integration - GPT-4 support
  • Batch Processing - Multiple file processing
  • Web Interface - Web-based GUI
  • Audio Effects - Background music, effects
  • Export Formats - MP3, WAV, OGG
  • Cloud Deployment - Docker, AWS Lambda

Last updated: December 2024. Version: 1.0.0
