A Python library to convert PDF documents into podcasts

🎙️ PDF2Podcast

An advanced Python system for converting PDF documents into audio podcasts featuring natural dialogue between a host and an expert.

✨ Key Features

  • 📄 PDF Extraction: Advanced PDF document processing
  • 🤖 LLM Integration: Support for Google Gemini and other models
  • 🗣️ Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
  • 💬 Natural Dialogues: Realistic conversations between host and expert
  • 🔍 Semantic Retrieval: Intelligent content selection
  • 🎯 Structured Chapters: Automatic organization into thematic chapters
  • 🌍 Multilingual: Support for multiple languages
  • 🛠️ Modular: Extensible and customizable architecture

🚀 Quick Install

pip install pdf2podcast

📖 Basic Usage

Simple Example

from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor

# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"}
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document"
)

print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")

With Direct Text

# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points"
)

🔧 Advanced Configuration

Semantic Retrieval

from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever

chunker = SimpleChunker()
retriever = SemanticRetriever()

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5  # Top 5 most relevant chunks
)

Custom Prompts

from pdf2podcast.core.prompts import PodcastPromptBuilder

# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""

prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True
)

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder}
)

Multi-Voice TTS (Kokoro)

# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en"
    }
)

🎭 Output Modes

Dialogue (Recommended)

  • Activation: dialogue=True
  • Format: Natural conversation between S1 (host) and S2 (expert)
  • Features: Interruptions, questions, clarifications
  • Ideal for: Complex content, educational material

Monologue

  • Activation: dialogue=False
  • Format: Continuous narration
  • Features: Linear flow, storytelling
  • Ideal for: Narrative content, summaries
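
The difference between the two modes comes down to how speaker turns are rendered. A minimal, self-contained sketch of that idea (the format_script helper and the turn structure are hypothetical illustrations, not the library's internals):

```python
def format_script(turns, dialogue=True):
    """Render (speaker, text) turns in the two output modes.

    Dialogue mode keeps the S1 (host) / S2 (expert) tags;
    monologue mode flattens the turns into one continuous narration.
    """
    if dialogue:
        return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return " ".join(text for _, text in turns)


turns = [
    ("S1", "Welcome! Today we look at semantic retrieval."),
    ("S2", "Thanks. In short, it selects the most relevant chunks."),
]

print(format_script(turns, dialogue=True))   # two tagged lines
print(format_script(turns, dialogue=False))  # one untagged narration
```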

🔌 Supported Providers

LLM (Large Language Models)

  • Google Gemini ✅ (Recommended)
  • OpenAI 🔄 (In development)

TTS (Text-to-Speech)

  • Google TTS ✅ - Basic quality, easy setup
  • Amazon Polly ✅ - Professional quality
  • Azure TTS ✅ - Advanced neural voices
  • Kokoro TTS ✅ - Local, multiple voices, precise timing

🛠️ Architecture

📦 pdf2podcast
├── 🎯 PodcastGenerator (Main orchestrator)
├── 📄 RAG Systems (Text extraction)
├── 🧠 LLM Integration (Script generation)
├── 🎵 TTS Engines (Voice synthesis)
├── 🔍 Semantic Retrieval (Intelligent search)
└── 📝 Parser System (Output validation)

Processing Flow

  1. Input → PDF/Text
  2. RAG → Content extraction and cleaning
  3. Chunking → Division into segments
  4. Retrieval → Relevant content selection
  5. LLM → Structured script generation
  6. Parsing → Format validation
  7. TTS → Audio conversion
  8. Output → Audio file + metadata
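
The eight stages above compose into a single pipeline. The sketch below is purely illustrative, with every stage stubbed out (keyword overlap instead of embeddings, a string template instead of an LLM call, a fake TTS step); none of these function names belong to the actual pdf2podcast API:

```python
def extract(source: str) -> str:
    # Stage 2 (RAG): content extraction and cleaning (stubbed)
    return source.strip()

def chunk(text: str, size: int = 80) -> list[str]:
    # Stage 3 (Chunking): division into fixed-size segments
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(chunks: list[str], query: str, k: int = 5) -> list[str]:
    # Stage 4 (Retrieval): naive keyword overlap stands in for embeddings
    def score(c: str) -> int:
        return sum(w in c.lower() for w in query.lower().split())
    return sorted(chunks, key=score, reverse=True)[:k]

def generate_script(context: list[str], dialogue: bool) -> str:
    # Stage 5 (LLM): a template stands in for a real model call
    tag = "S1/S2 dialogue" if dialogue else "monologue"
    return f"[{tag}] " + " ".join(context)

def synthesize(script: str, out_path: str) -> dict:
    # Stage 7 (TTS): stubbed; a real engine would write audio here
    return {"path": out_path, "chars": len(script)}

def run_pipeline(source, query, out_path, dialogue=True) -> dict:
    # Stages 1-8 chained: extract, chunk, retrieve, script, audio
    chunks = chunk(extract(source))
    script = generate_script(retrieve(chunks, query), dialogue)
    return {"script": script, "audio": synthesize(script, out_path)}
```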

📚 Complete Documentation

⚙️ Environment Setup

Environment Variables

# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
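
These variables can then be read at runtime and passed into the llm_config/tts_config dictionaries used in the examples above. A sketch using os.getenv (the key names in azure_tts_config are assumptions for illustration, not confirmed config keys):

```python
import os

# If the variables live in a .env file, load it first, for example
# with python-dotenv: from dotenv import load_dotenv; load_dotenv()

llm_config = {"api_key": os.getenv("GENAI_API_KEY")}

# Hypothetical Azure TTS config; the exact key names may differ.
azure_tts_config = {
    "subscription_key": os.getenv("AZURE_SUBSCRIPTION_KEY"),
    "region": os.getenv("AZURE_REGION_NAME"),
}

print("Gemini key configured:", llm_config["api_key"] is not None)
```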

System Dependencies

# For Kokoro TTS (optional)
pip install torch torchaudio

# For advanced PDF processing
pip install pypdf2 pdfplumber

🎯 Practical Examples

Scientific Paper Podcast

result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations"
)

Technical Documentation Podcast

result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting"
)

Multilingual Podcast

# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"}
)

result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian"
)

🔍 Troubleshooting

Common Issues

1. API Key Error

# Verify configuration
import os

key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", key[:10] + "..." if key else "Not found")

2. Low Audio Quality

# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural"     # Higher quality
}

3. Short Scripts

# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit: git commit -am 'Add new feature'
  4. Push: git push origin feature/new-feature
  5. Create Pull Request

📜 License

MIT License - see LICENSE for details.

🙏 Credits

  • LangChain - LLM Framework
  • Pydantic - Data validation
  • Sentence Transformers - Semantic embeddings
  • Community Contributors - Feedback and improvements

🚀 Roadmap

  • OpenAI Integration - GPT-4 support
  • Batch Processing - Multiple file processing
  • Web Interface - Web-based GUI
  • Audio Effects - Background music, effects
  • Export Formats - MP3, WAV, OGG
  • Cloud Deployment - Docker, AWS Lambda

Last updated: December 2024
Version: 1.0.0
