A Python library to convert PDF documents into podcasts

PDF2Podcast

An advanced Python system for converting PDF documents into audio podcasts with natural dialogues between hosts and experts.

Key Features

  • PDF Extraction: Advanced PDF document processing
  • LLM Integration: Support for Google Gemini and other models
  • Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
  • Natural Dialogues: Realistic conversations between host and expert
  • Semantic Retrieval: Intelligent content selection
  • Structured Chapters: Automatic organization into thematic chapters
  • Multilingual: Support for multiple languages
  • Modular: Extensible and customizable architecture

Quick Install

pip install pdf2podcast

Basic Usage

Simple Example

from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor

# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"}
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document"
)

print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")
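
The two fields printed above can be wrapped in a small helper. This is a minimal sketch: `summarize_result` is a hypothetical name, not part of pdf2podcast, and it assumes only the `audio.path` and `total_duration` keys shown in the example.

```python
def summarize_result(result: dict) -> str:
    """Build a one-line summary from a generate()-style result dict."""
    # Keys mirror the example above; .get() avoids a KeyError if one is absent.
    path = result.get("audio", {}).get("path", "<unknown>")
    seconds = float(result.get("total_duration", 0.0))
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{path} ({minutes}:{secs:02d})"


# Stand-in dict with the same shape as the result used above.
print(summarize_result({"audio": {"path": "podcast.mp3"}, "total_duration": 185.4}))
```

Formatting the duration as minutes and seconds is a presentation choice only; the library itself reports seconds.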

With Direct Text

# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points"
)

Advanced Configuration

Semantic Retrieval

from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever

chunker = SimpleChunker()
retriever = SemanticRetriever()

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5  # Top 5 most relevant chunks
)
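
To make the role of `k` concrete, here is a minimal sketch of top-k selection by cosine similarity. It is not the library's `SemanticRetriever`: it swaps in plain word counts for learned embeddings so that it runs with the standard library alone, and every function name in it is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, bag_of_words(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Transformers use attention to weigh tokens.",
    "The appendix lists funding sources.",
    "Attention scores are normalised with softmax.",
]
print(top_k("how does attention work", chunks, k=2))
```

A larger `k` passes more chunks to the script-generation step, which is why raising it tends to produce longer scripts.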

Custom Prompts

from pdf2podcast.core.prompts import PodcastPromptBuilder

# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""

prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True
)

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder}
)

Multi-Voice TTS (Kokoro)

# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en"
    }
)
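
How a two-element `voice_id` list might be applied per speaker can be sketched as follows. This is an assumption about the behaviour, not the Kokoro integration itself: `assign_voices` and `SPEAKER_INDEX` are hypothetical names, and the S1 (host) / S2 (expert) tags are the ones described under Output Modes.

```python
# Hypothetical helper: pair each dialogue turn with a voice by speaker tag.
VOICES = ["af_heart", "am_liam"]  # [host, expert], as in the config above
SPEAKER_INDEX = {"S1": 0, "S2": 1}  # S1 = host, S2 = expert


def assign_voices(turns, voices=VOICES):
    """Map (speaker, text) turns to (voice_id, text) pairs."""
    return [(voices[SPEAKER_INDEX[speaker]], text) for speaker, text in turns]


turns = [("S1", "Welcome to the show."), ("S2", "Glad to be here.")]
for voice, text in assign_voices(turns):
    print(f"{voice}: {text}")
```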

Output Modes

Dialogue (Recommended)

  • Activation: dialogue=True
  • Format: Natural conversation between S1 (host) and S2 (expert)
  • Features: Interruptions, questions, clarifications
  • Ideal for: Complex content, educational material

Monologue

  • Activation: dialogue=False
  • Format: Continuous narration
  • Features: Linear flow, storytelling
  • Ideal for: Narrative content, summaries
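
The difference between the two modes can be illustrated with a small parser. This is a sketch under a stated assumption: the `S1: ...` / `S2: ...` line format and the `parse_script` function are hypothetical, since the actual script format produced by the library is not documented here.

```python
import re

# Matches a line that opens a new turn, e.g. "S1: Hello".
TURN = re.compile(r"^(S[12]):\s*(.+)$")


def parse_script(script: str, dialogue: bool = True):
    """Split a script into (speaker, text) turns.

    Assumes a hypothetical "S1: ..." / "S2: ..." line format for dialogue;
    a monologue is returned as a single narrator turn.
    """
    if not dialogue:
        return [("narrator", " ".join(script.split()))]
    turns = []
    for line in script.splitlines():
        stripped = line.strip()
        match = TURN.match(stripped)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns and stripped:
            # Continuation line: append to the previous speaker's turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {stripped}")
    return turns


script = "S1: What is RAG?\nS2: Retrieval-augmented generation.\nIt grounds the model."
print(parse_script(script))
```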

Supported Providers

LLM (Large Language Models)

  • Google Gemini ✅ (Recommended)
  • OpenAI 🔄 (In development)

TTS (Text-to-Speech)

  • Google TTS ✅ - Basic quality, easy setup
  • Amazon Polly ✅ - Professional quality
  • Azure TTS ✅ - Advanced neural voices
  • Kokoro TTS ✅ - Local, multiple voices, precise timing

Architecture

pdf2podcast
├── PodcastGenerator (Main orchestrator)
├── RAG Systems (Text extraction)
├── LLM Integration (Script generation)
├── TTS Engines (Voice synthesis)
├── Semantic Retrieval (Intelligent search)
└── Parser System (Output validation)

Processing Flow

  1. Input → PDF/Text
  2. RAG → Content extraction and cleaning
  3. Chunking → Division into segments
  4. Retrieval → Relevant content selection
  5. LLM → Structured script generation
  6. Parsing → Format validation
  7. TTS → Audio conversion
  8. Output → Audio file + metadata
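
The flow above can be sketched as a chain of plain functions. This is a toy illustration, not the library's internals: every function name is hypothetical, the LLM step is replaced by a placeholder that alternates speakers, and the TTS and output steps are left out because they need a real provider.

```python
import re


def clean(text: str) -> str:
    """Step 2: collapse whitespace left over from extraction."""
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = 40) -> list[str]:
    """Step 3: split into segments of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Step 4: keep the k chunks sharing the most words with the query."""
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)[:k]


def write_script(chunks: list[str]) -> list[dict]:
    """Step 5: placeholder for the LLM call; alternates host and expert."""
    return [{"speaker": "S1" if i % 2 == 0 else "S2", "text": c} for i, c in enumerate(chunks)]


def validate(script: list[dict]) -> list[dict]:
    """Step 6: reject turns with an unknown speaker or empty text."""
    for turn in script:
        if turn["speaker"] not in {"S1", "S2"} or not turn["text"].strip():
            raise ValueError(f"invalid turn: {turn!r}")
    return script


def run_pipeline(text: str, query: str) -> list[dict]:
    """Steps 1-6 chained; step 7 (TTS) would consume the returned script."""
    return validate(write_script(retrieve(chunk(clean(text)), query)))


for turn in run_pipeline("alpha beta " * 50, "alpha"):
    print(turn["speaker"], len(turn["text"].split()), "words")
```

Keeping each stage a separate function mirrors the modular layout in the architecture tree: any one stage can be replaced without touching the others.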

Complete Documentation

Environment Setup

Environment Variables

# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
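
Before starting a long run, it can help to check that the variables for the chosen providers are set. A minimal sketch, assuming the variable names from the `.env` example above; the grouping by provider and the `missing_vars` helper are hypothetical and not part of pdf2podcast.

```python
import os

# Variable names come from the .env example above; the grouping by provider
# is an assumption made for this sketch.
REQUIRED = {
    "gemini": ["GENAI_API_KEY"],
    "polly": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure": ["AZURE_SUBSCRIPTION_KEY", "AZURE_REGION_NAME"],
}


def missing_vars(providers, env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for p in providers for name in REQUIRED[p] if not env.get(name)]


# A dict stands in for the real environment so the example is reproducible.
print(missing_vars(["gemini", "azure"], env={"GENAI_API_KEY": "abc"}))
```

Calling `missing_vars(["gemini"])` with no `env` argument checks the real process environment instead.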

System Dependencies

# For Kokoro TTS (optional)
pip install torch torchaudio

# For advanced PDF processing
pip install pypdf2 pdfplumber

Practical Examples

Scientific Paper Podcast

result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations"
)

Technical Documentation Podcast

result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting"
)

Multilingual Podcast

# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"}
)

result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian"
)

Troubleshooting

Common Issues

1. API Key Error

# Verify that the key is visible to the process (prints only a short prefix)
import os

key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", f"{key[:10]}..." if key else "Not found")

2. Low Audio Quality

# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural"     # Higher quality
}

3. Short Scripts

# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content

Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit: git commit -am 'Add new feature'
  4. Push: git push origin feature/new-feature
  5. Create Pull Request

License

MIT License - see LICENSE for details.

Credits

  • LangChain - LLM Framework
  • Pydantic - Data validation
  • Sentence Transformers - Semantic embeddings
  • Community Contributors - Feedback and improvements

Roadmap

  • OpenAI Integration - GPT-4 support
  • Batch Processing - Multiple file processing
  • Web Interface - Web-based GUI
  • Audio Effects - Background music, effects
  • Export Formats - MP3, WAV, OGG
  • Cloud Deployment - Docker, AWS Lambda

Last updated: December 2024
Version: 1.0.0
