
A Python library to convert PDF documents into podcasts


๐ŸŽ™๏ธ PDF2Podcast

An advanced Python system for converting PDF documents into audio podcasts, featuring natural dialogue between a host and an expert.

✨ Key Features

  • 📄 PDF Extraction: Advanced PDF document processing
  • 🤖 LLM Integration: Support for Google Gemini and other models
  • 🗣️ Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
  • 💬 Natural Dialogues: Realistic conversations between host and expert
  • 🔍 Semantic Retrieval: Intelligent content selection
  • 🎯 Structured Chapters: Automatic organization into thematic chapters
  • 🌍 Multilingual: Support for multiple languages
  • 🛠️ Modular: Extensible and customizable architecture

🚀 Quick Install

pip install pdf2podcast

📖 Basic Usage

Simple Example

from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor

# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"}
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document"
)

print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")

With Direct Text

# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points"
)

🔧 Advanced Configuration

Semantic Retrieval

from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever

chunker = SimpleChunker()
retriever = SemanticRetriever()

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5  # Top 5 most relevant chunks
)

Custom Prompts

from pdf2podcast.core.prompts import PodcastPromptBuilder

# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""

prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True
)

generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder}
)

Multi-Voice TTS (Kokoro)

# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en"
    }
)

🎭 Output Modes

Dialogue (Recommended)

  • Activation: dialogue=True
  • Format: Natural conversation between S1 (host) and S2 (expert)
  • Features: Interruptions, questions, clarifications
  • Ideal for: Complex content, educational material

Monologue

  • Activation: dialogue=False
  • Format: Continuous narration
  • Features: Linear flow, storytelling
  • Ideal for: Narrative content, summaries
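
The two modes share one entry point; only the `dialogue` flag changes. A minimal sketch (the helper name below is hypothetical, not part of the pdf2podcast API):

```python
# Hypothetical helper: assemble generate() keyword arguments for either mode.
def generate_kwargs(output_path, query, dialogue=True):
    return {
        "output_path": output_path,
        "query": query,
        "dialogue": dialogue,  # True -> S1/S2 conversation, False -> narration
    }

talk = generate_kwargs("podcast.mp3", "Explain the key points")                  # dialogue
mono = generate_kwargs("summary.mp3", "Summarize the document", dialogue=False)  # monologue
```

The same keyword dict can then be unpacked into `generator.generate(**talk)`.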

🔌 Supported Providers

LLM (Large Language Models)

  • Google Gemini ✅ (Recommended)
  • OpenAI 🔄 (In development)

TTS (Text-to-Speech)

  • Google TTS ✅ - Basic quality, easy setup
  • Amazon Polly ✅ - Professional quality
  • Azure TTS ✅ - Advanced neural voices
  • Kokoro TTS ✅ - Local, multiple voices, precise timing
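
As a rough starting point, per-provider defaults can be kept in one table. The config keys and voice names below are illustrative guesses, not confirmed pdf2podcast options (only `voice_id`, `engine`, and `language` appear elsewhere on this page):

```python
# Illustrative per-provider TTS defaults; treat keys and values as placeholders.
TTS_DEFAULTS = {
    "google": {"language": "en"},
    "polly": {"voice_id": "Joanna", "engine": "neural"},
    "azure": {"voice_id": "en-US-JennyNeural"},
    "kokoro": {"voice_id": ["af_heart", "am_liam"], "language": "en"},
}

def tts_config_for(provider, **overrides):
    """Merge user overrides onto the provider's default config."""
    config = dict(TTS_DEFAULTS.get(provider, {}))
    config.update(overrides)
    return config

config = tts_config_for("polly", voice_id="Matthew")
```

Keeping the defaults in one dict makes switching providers a one-argument change.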

🛠️ Architecture

📦 pdf2podcast
├── 🎯 PodcastGenerator (Main orchestrator)
├── 📄 RAG Systems (Text extraction)
├── 🧠 LLM Integration (Script generation)
├── 🎵 TTS Engines (Voice synthesis)
├── 🔍 Semantic Retrieval (Intelligent search)
└── 📝 Parser System (Output validation)

Processing Flow

  1. Input → PDF/Text
  2. RAG → Content extraction and cleaning
  3. Chunking → Division into segments
  4. Retrieval → Relevant content selection
  5. LLM → Structured script generation
  6. Parsing → Format validation
  7. TTS → Audio conversion
  8. Output → Audio file + metadata
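
The eight steps above can be sketched as a single function. None of these bodies are the real pdf2podcast implementation; they are stubs that only make the data handoffs between stages explicit:

```python
# Stand-in sketch of the processing flow; every stage is a stub.
def run_pipeline(text, query, k=5):
    cleaned = " ".join(text.split())                                   # 1-2. input + cleaning
    chunks = [cleaned[i:i + 200] for i in range(0, len(cleaned), 200)]  # 3. chunking
    relevant = chunks[:k]                                              # 4. retrieval (stub: first k)
    script = f"Script about {query!r} from {len(relevant)} chunks"     # 5. LLM (stub)
    if not script:                                                     # 6. parsing / validation
        raise ValueError("empty script")
    audio = script.encode("utf-8")                                     # 7. TTS (stub: raw bytes)
    return {"script": script, "audio": audio,                          # 8. output + metadata
            "total_duration": len(audio) / 100}

result = run_pipeline("Some extracted document text. " * 20, "main concepts")
```

In the real library each stub corresponds to a pluggable component (chunker, retriever, LLM provider, TTS engine).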

📚 Complete Documentation

⚙️ Environment Setup

Environment Variables

# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
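
One way to pull these variables into provider configs at startup, using only the standard library (python-dotenv's `load_dotenv()` can read the .env file first). The config key names here are illustrative, not confirmed pdf2podcast options:

```python
import os

def provider_configs(env=os.environ):
    """Build provider config dicts from environment variables.

    Key names below are illustrative; check pdf2podcast's docs for the real ones.
    """
    llm = {"api_key": env.get("GENAI_API_KEY", "")}
    azure = {
        "subscription_key": env.get("AZURE_SUBSCRIPTION_KEY", ""),
        "region": env.get("AZURE_REGION_NAME", ""),
    }
    if not llm["api_key"]:
        # Fail fast instead of surfacing an opaque provider error later.
        raise RuntimeError("GENAI_API_KEY is not set")
    return llm, azure

# Example with explicit values instead of the real environment:
llm_config, tts_config = provider_configs({"GENAI_API_KEY": "demo-key"})
```

Failing fast on a missing key avoids the opaque API errors covered under Troubleshooting below.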

System Dependencies

# For Kokoro TTS (optional)
pip install torch torchaudio

# For advanced PDF processing
pip install pypdf2 pdfplumber

🎯 Practical Examples

Scientific Paper Podcast

result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations"
)

Technical Documentation Podcast

result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting"
)

Multilingual Podcast

# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"}
)

result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian"
)

๐Ÿ” Troubleshooting

Common Issues

1. API Key Error

# Check that the key is set (print only a prefix, never the full key)
import os
key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", key[:10] + "..." if key else "Not found")

2. Low Audio Quality

# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural"     # Higher quality
}

3. Short Scripts

# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content

๐Ÿค Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit: git commit -am 'Add new feature'
  4. Push: git push origin feature/new-feature
  5. Create Pull Request

📜 License

MIT License - see LICENSE for details.

๐Ÿ™ Credits

  • LangChain - LLM Framework
  • Pydantic - Data validation
  • Sentence Transformers - Semantic embeddings
  • Community Contributors - Feedback and improvements

🚀 Roadmap

  • OpenAI Integration - GPT-4 support
  • Batch Processing - Multiple file processing
  • Web Interface - Web-based GUI
  • Audio Effects - Background music, effects
  • Export Formats - MP3, WAV, OGG
  • Cloud Deployment - Docker, AWS Lambda

Last updated: December 2024 · Version: 1.0.0
