A Python library to convert PDF documents into podcasts
PDF2Podcast
An advanced Python system for converting PDF documents into audio podcasts with natural dialogues between hosts and experts.
Key Features
- PDF Extraction: Advanced PDF document processing
- LLM Integration: Support for Google Gemini and other models
- Multi-TTS: Amazon Polly, Google TTS, Azure TTS, Kokoro TTS
- Natural Dialogues: Realistic conversations between host and expert
- Semantic Retrieval: Intelligent content selection
- Structured Chapters: Automatic organization into thematic chapters
- Multilingual: Support for multiple languages
- Modular: Extensible and customizable architecture
Quick Install
pip install pdf2podcast
Basic Usage
Simple Example
from pdf2podcast import PodcastGenerator
from pdf2podcast.core.rag import AdvancedPDFProcessor
# Setup
processor = AdvancedPDFProcessor()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"api_key": "your-gemini-key"},
    tts_config={"language": "en"},
)
# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    dialogue=True,  # Dialogue between host and expert
    query="Explain the main concepts of the document",
)
print(f"Podcast generated: {result['audio']['path']}")
print(f"Duration: {result['total_duration']:.1f} seconds")
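The two print calls read `result['audio']['path']` and `result['total_duration']` directly. A minimal sketch of a more defensive summary follows. The helper name `summarize_result` and the sample dictionary are illustrative assumptions; only the two keys come from the example.

```python
def summarize_result(result: dict) -> str:
    """Format the fields read in the basic example, tolerating missing keys."""
    # 'audio' -> 'path' and 'total_duration' are the keys used in the example.
    path = result.get("audio", {}).get("path", "<unknown>")
    seconds = float(result.get("total_duration", 0.0))
    minutes, rest = divmod(int(round(seconds)), 60)
    return f"Podcast generated: {path} ({minutes}:{rest:02d}, {seconds:.1f} s)"


# Hypothetical result dictionary, shaped like the one the example reads from.
sample = {"audio": {"path": "podcast.mp3"}, "total_duration": 754.26}
print(summarize_result(sample))
```

Using `dict.get` with defaults keeps the summary from raising `KeyError` if a run returns only part of the metadata.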
With Direct Text
# From text instead of PDF
result = generator.generate(
    text="Your text content here...",
    output_path="podcast.mp3",
    dialogue=True,
    query="Discuss the key points",
)
Advanced Configuration
Semantic Retrieval
from pdf2podcast.core.processing import SimpleChunker, SemanticRetriever
chunker = SimpleChunker()
retriever = SemanticRetriever()
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    chunker=chunker,
    retriever=retriever,
    k=5,  # Top 5 most relevant chunks
)
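To make the role of `k` concrete, here is a toy sketch of top-k retrieval. It is not the library's `SemanticRetriever`: it swaps sentence embeddings for plain bag-of-words cosine similarity so that it runs with the standard library alone. The function names `bow`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = bow(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The methodology uses a randomized controlled trial.",
    "Acknowledgements and funding sources.",
    "Results show a significant improvement over the baseline.",
]
print(top_k("methodology results", chunks, k=2))
```

A larger `k` passes more chunks to the script-generation step, which is why raising it tends to produce longer scripts.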
Custom Prompts
from pdf2podcast.core.prompts import PodcastPromptBuilder
# Custom instructions
instructions = """
Focus on practical aspects and real-world applications.
Use concrete examples and accessible language.
"""
prompt_builder = PodcastPromptBuilder(
    instructions=instructions,
    dialogue=True,
)
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="azure",
    llm_config={"prompt_builder": prompt_builder},
)
Multi-Voice TTS (Kokoro)
# Different voices for host and expert
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="kokoro",
    tts_config={
        "voice_id": ["af_heart", "am_liam"],  # [host, expert]
        "language": "en",
    },
)
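The `[host, expert]` ordering suggests a simple speaker-to-voice mapping. The sketch below illustrates that idea only. How the library assigns voices internally is not documented here, and `assign_voices` and the sample turns are hypothetical. The labels S1 (host) and S2 (expert) follow the convention described under Output Modes.

```python
def assign_voices(turns, voice_ids):
    """Pair each (speaker, text) turn with a voice ID.

    voice_ids follows the [host, expert] order used in tts_config.
    """
    if len(voice_ids) != 2:
        raise ValueError("expected exactly two voice IDs: [host, expert]")
    voice_for = {"S1": voice_ids[0], "S2": voice_ids[1]}  # S1 = host, S2 = expert
    return [(voice_for[speaker], text) for speaker, text in turns]


turns = [("S1", "Welcome to the show."), ("S2", "Glad to be here.")]
for voice, text in assign_voices(turns, ["af_heart", "am_liam"]):
    print(f"{voice}: {text}")
```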
Output Modes
Dialogue (Recommended)
- Activation: dialogue=True
- Format: Natural conversation between S1 (host) and S2 (expert)
- Features: Interruptions, questions, clarifications
- Ideal for: Complex content, educational material
Monologue
- Activation: dialogue=False
- Format: Continuous narration
- Features: Linear flow, storytelling
- Ideal for: Narrative content, summaries
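As an illustration of the dialogue format, the following sketch splits a script into speaker turns. It assumes each turn starts on its own line with an `S1:` or `S2:` prefix, which is a guess at the layout rather than the library's documented script format. `parse_dialogue` is a hypothetical helper.

```python
import re

# Assumed turn layout: "S1: text" or "S2: text" at the start of a line.
TURN = re.compile(r"^(S[12]):\s*(.*)$")


def parse_dialogue(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) turns.

    Unprefixed lines are appended to the previous turn; unprefixed lines
    before the first turn are dropped.
    """
    turns: list[tuple[str, str]] = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}".strip())
    return turns


script = """
S1: What problem does the paper address?
S2: It looks at retrieval quality
in long documents.
S1: And the main result?
"""
for speaker, text in parse_dialogue(script):
    print(speaker, "->", text)
```

A monologue script would need no such splitting, since it has a single narrator.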
Supported Providers
LLM (Large Language Models)
- Google Gemini (Recommended)
- OpenAI (In development)
TTS (Text-to-Speech)
- Google TTS - Basic quality, easy setup
- Amazon Polly - Professional quality
- Azure TTS - Advanced neural voices
- Kokoro TTS - Local, multiple voices, precise timing
Architecture
pdf2podcast
├── PodcastGenerator (Main orchestrator)
├── RAG Systems (Text extraction)
├── LLM Integration (Script generation)
├── TTS Engines (Voice synthesis)
├── Semantic Retrieval (Intelligent search)
└── Parser System (Output validation)
Processing Flow
- Input → PDF/Text
- RAG → Content extraction and cleaning
- Chunking → Division into segments
- Retrieval → Relevant content selection
- LLM → Structured script generation
- Parsing → Format validation
- TTS → Audio conversion
- Output → Audio file + metadata
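The flow can be read as a chain of functions, each consuming the previous stage's output. The sketch below wires up toy stand-ins for four of the stages (cleaning, chunking, retrieval, script generation) to show the data hand-offs. Every function here is hypothetical, the LLM step is replaced by a string template, and the parsing and TTS steps are omitted.

```python
def clean(text: str) -> str:
    """RAG stage stand-in: collapse whitespace left over from extraction."""
    return " ".join(text.split())


def chunk(text: str, size: int = 8) -> list[str]:
    """Chunking stage: fixed-size word windows (the size is an arbitrary choice)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Retrieval stage stand-in: rank chunks by words shared with the query."""
    terms = set(query.lower().split())

    def score(candidate: str) -> int:
        return len(terms & set(candidate.lower().split()))

    return sorted(chunks, key=score, reverse=True)[:k]


def write_script(selected: list[str]) -> str:
    """LLM stage stand-in: a template instead of a model call."""
    return "\n".join(f"S2: {c}" for c in selected)


def run_pipeline(raw: str, query: str) -> str:
    """Compose the stages in the order listed in the processing flow."""
    return write_script(retrieve(chunk(clean(raw)), query))


print(run_pipeline("Wind   turbines\nconvert motion into electricity.", "wind"))
```

Keeping each stage as a separate function mirrors the modular design: any one stage can be swapped without touching the others.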
Complete Documentation
- Module Documentation - Detailed architecture
- Advanced Examples - Use cases and configurations
- API Reference - Complete API documentation
Environment Setup
Environment Variables
# .env file
GENAI_API_KEY=your_gemini_api_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION_NAME=your_azure_region
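A quick way to catch a missing variable before a run is to check these names per provider. In this sketch the grouping of variables by provider is inferred from the variable names, not taken from the library. The provider labels (in particular `"polly"`) and the helper `missing_vars` are hypothetical.

```python
import os

# Variable names come from the .env example; the grouping and the
# provider labels used as keys are assumptions made for illustration.
REQUIRED = {
    "gemini": ["GENAI_API_KEY"],
    "polly": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure": ["AZURE_SUBSCRIPTION_KEY", "AZURE_REGION_NAME"],
}


def missing_vars(providers, env=None):
    """Return required variables that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [
        name
        for provider in providers
        for name in REQUIRED.get(provider, [])
        if not env.get(name)
    ]


# Example with an explicit mapping instead of the real environment.
print(missing_vars(["gemini", "azure"], env={"GENAI_API_KEY": "abc"}))
```

Passing `env` explicitly makes the check easy to test; omitting it reads the real process environment.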
System Dependencies
# For Kokoro TTS (optional)
pip install torch torchaudio
# For advanced PDF processing
pip install pypdf2 pdfplumber
Practical Examples
Scientific Paper Podcast
result = generator.generate(
    pdf_path="research_paper.pdf",
    output_path="research_podcast.mp3",
    dialogue=True,
    query="Explain methodology, results and implications",
    instructions="Focus on practical applications and limitations",
)
Technical Documentation Podcast
result = generator.generate(
    pdf_path="technical_manual.pdf",
    output_path="tutorial_podcast.mp3",
    dialogue=True,
    query="Create a step-by-step tutorial",
    instructions="Use concrete examples and common troubleshooting",
)
Multilingual Podcast
# Italian
generator = PodcastGenerator(
    rag_system=processor,
    llm_provider="gemini",
    tts_provider="google",
    llm_config={"language": "it"},
    tts_config={"language": "it"},
)
result = generator.generate(
    text="Italian content here...",
    query="Discuss the main points in Italian",
)
Troubleshooting
Common Issues
1. API Key Error
# Verify configuration: show only a short prefix of the key if it is set
import os
key = os.getenv("GENAI_API_KEY")
print("Gemini Key:", key[:10] + "..." if key else "Not found")
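Printing part of a key into logs can leak more than intended. A slightly safer variant, sketched here with a hypothetical `mask_secret` helper, shows only the last few characters:

```python
import os


def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if not value:
        return "Not found"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


print("Gemini Key:", mask_secret(os.getenv("GENAI_API_KEY")))
```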
2. Low Audio Quality
# Use premium TTS provider
tts_config = {
    "voice_id": "Joanna",  # Polly
    "engine": "neural",  # Higher quality
}
3. Short Scripts
# Increase retrieval chunks
generator = PodcastGenerator(..., k=10)  # More content
Contributing
- Fork the repository
- Create feature branch: git checkout -b feature/new-feature
- Commit: git commit -am 'Add new feature'
- Push: git push origin feature/new-feature
- Create Pull Request
License
MIT License - see LICENSE for details.
Credits
- LangChain - LLM Framework
- Pydantic - Data validation
- Sentence Transformers - Semantic embeddings
- Community Contributors - Feedback and improvements
Roadmap
- OpenAI Integration - GPT-4 support
- Batch Processing - Multiple file processing
- Web Interface - Web-based GUI
- Audio Effects - Background music, effects
- Export Formats - MP3, WAV, OGG
- Cloud Deployment - Docker, AWS Lambda
Last updated: December 2024 Version: 1.0.0