pdf2podcast

A Python library to convert PDF documents into podcasts using LLMs and Text-to-Speech.

Installation

pip install pdf2podcast

Requirements

- Python 3.8 or higher
- Google API key for Gemini LLM
- AWS credentials for Polly TTS (optional, can use Google TTS instead)

Dependencies

This library uses several key technologies:

Text Processing
- PyMuPDF: Advanced PDF processing with metadata and image caption extraction
- sentence-transformers: Text embeddings for semantic analysis
- faiss-cpu: Fast similarity search for text chunks

Language Models
- langchain-google-genai: Integration with Google's Gemini LLM
- langchain-community: Community-maintained LangChain integrations
- accelerate: ML model optimization

Text-to-Speech
- boto3: AWS Polly integration for high-quality TTS
- ffmpeg-python: Audio processing and manipulation
- gTTS: Google Text-to-Speech alternative

Utils
- python-dotenv: Environment variable management
- pydantic: Data validation and settings management
- datasets: Data handling utilities
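
To see which of the packages listed above are present in the current environment, a small standard-library check along these lines may help. It is only a sketch: the helper name `missing_distributions` is made up for this example, and the names in the list are copied from this section rather than read from the package's own metadata.

```python
from importlib import metadata

# Distribution names as listed in the Dependencies section above.
REQUIRED_DISTRIBUTIONS = [
    "PyMuPDF", "sentence-transformers", "faiss-cpu",
    "langchain-google-genai", "langchain-community", "accelerate",
    "boto3", "ffmpeg-python", "gTTS",
    "python-dotenv", "pydantic", "datasets",
]


def missing_distributions(names=REQUIRED_DISTRIBUTIONS):
    """Return the distribution names that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_distributions()` after `pip install pdf2podcast` should normally return an empty list; anything it does return is a candidate for a manual `pip install`.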

Quick Start

from pdf2podcast import PodcastGenerator, SimplePDFProcessor

# Initialize PDF processor with advanced features
pdf_processor = SimplePDFProcessor(
    max_chars_per_chunk=8000,    # Customize chunk size
    extract_images=True,         # Include image captions
    metadata=True               # Include document metadata
)

# Create podcast generator with configuration
generator = PodcastGenerator(
    rag_system=pdf_processor,
    llm_type="gemini",      # Specify LLM provider
    tts_type="aws",         # Specify TTS provider
    llm_config={
        "api_key": "your_google_api_key",
        "model_name": "gemini-1.5-flash",
        "temperature": 0.2
    },
    tts_config={
        "voice_id": "Joanna",
        "region_name": "us-west-2"
    }
)

# Generate podcast
result = generator.generate(
    pdf_path="document.pdf",
    output_path="podcast.mp3",
    complexity="intermediate",  # Options: "simple", "intermediate", "advanced"
    voice_id="Joanna"  # Optional: override default voice
)

# Access results
print(f"Generated podcast: {result['audio']['path']}")
print(f"Audio size: {result['audio']['size']} bytes")
print(f"Script length: {len(result['script'])} characters")
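
The result shown above is a dictionary with a `script` string and an `audio` entry holding `path` and `size`. As a sketch of how that structure might be consumed, the helper below formats it as a single line. `summarize_result` is a name invented for this example, and the sample dictionary is a stand-in rather than real output.

```python
def summarize_result(result):
    """Format the fields of a generate() result as a one-line summary."""
    audio = result.get("audio", {})
    script = result.get("script", "")
    size_kib = audio.get("size", 0) / 1024  # size is reported in bytes
    return (
        f"{audio.get('path', '<no audio>')}: "
        f"{size_kib:.1f} KiB, script of {len(script)} characters"
    )


# Stand-in for the dictionary returned by generator.generate(...)
example = {"audio": {"path": "podcast.mp3", "size": 2048}, "script": "Hello"}
print(summarize_result(example))  # podcast.mp3: 2.0 KiB, script of 5 characters
```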

Available Providers

LLM Providers

"gemini": Google's Gemini LLM
- Requires: a Google API key, passed as api_key in llm_config or set in the GENAI_API_KEY environment variable
- Configuration options:
  - model_name: Model version to use
  - temperature: Output randomness (0-1)
  - max_output_tokens: Maximum output length
  - top_p: Nucleus sampling parameter
  - streaming: Enable/disable streaming mode
  - prompt_builder: Custom prompt builder instance
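
A typo in an option name is easy to miss in a plain dictionary. The sketch below checks an `llm_config` against the options listed above (plus `api_key` from the Quick Start) before it is handed to the generator. `check_llm_config` is a hypothetical helper, and the library may well perform its own validation.

```python
# Option names taken from this README; the library may accept others.
ALLOWED_LLM_KEYS = {
    "api_key", "model_name", "temperature", "max_output_tokens",
    "top_p", "streaming", "prompt_builder",
}


def check_llm_config(config):
    """Return a list of problems found in an llm_config dictionary."""
    problems = []
    for key in sorted(set(config) - ALLOWED_LLM_KEYS):
        problems.append(f"unknown option: {key}")
    temperature = config.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append(f"temperature {temperature} is outside 0-1")
    max_tokens = config.get("max_output_tokens")
    if max_tokens is not None and max_tokens <= 0:
        problems.append("max_output_tokens must be positive")
    return problems
```

An empty list means nothing suspicious was found; for example, `check_llm_config({"temperture": 0.2})` reports the misspelled key.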

TTS Providers

"aws": Amazon Polly
- Requires: AWS credentials
- Configuration options:
  - voice_id: Voice to use (e.g., "Joanna", "Matthew")
  - region_name: AWS region
  - engine: "standard" or "neural"

"google": Google Text-to-Speech
- No API key required
- Configuration options:
  - language: Language code (e.g., "en", "es")
  - tld: Top-level domain for accent (e.g., "com", "co.uk")
  - slow: Set to True for slower speech
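
Since "aws" needs credentials and "google" does not, one possible pattern is to fall back to Google TTS when no AWS keys are set. The sketch below assumes credentials arrive through the environment variables documented in this README; `pick_tts` is a hypothetical helper, and it will not notice credentials that boto3 could find elsewhere (for example in a shared credentials file).

```python
import os


def pick_tts(env=None):
    """Choose a TTS provider and config from the credentials found in `env`."""
    env = os.environ if env is None else env
    has_aws = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    if has_aws:
        return "aws", {
            "voice_id": "Joanna",
            "engine": "neural",
            "region_name": env.get("AWS_DEFAULT_REGION", "us-west-2"),
        }
    # No AWS keys: use the provider that needs no API key.
    return "google", {"language": "en", "tld": "com"}
```

The two return values map onto the `tts_type` and `tts_config` arguments of `PodcastGenerator`.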

PDF Processing Features

The library offers advanced PDF processing capabilities:

Basic Features

- Metadata extraction (title, author, subject, keywords)
- Image caption extraction from documents
- Efficient processing of large documents
- Support for complex PDF layouts

Text Processing

- Smart text chunking with customizable size
- Paragraph-aware text splitting
- Sentence boundary preservation
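
To make the idea of paragraph-aware chunking concrete, here is a minimal standalone sketch. It is not the library's implementation: `chunk_text` is written for illustration only, using simple regular expressions to find blank-line paragraph breaks and sentence-ending punctuation.

```python
import re


def chunk_text(text, max_chars=8000):
    """Group paragraphs into chunks of up to max_chars characters.

    Paragraphs longer than max_chars are split at sentence boundaries.
    A single sentence longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    # Each unit carries the separator to use when it joins an existing chunk.
    units = []
    for paragraph in paragraphs:
        if len(paragraph) <= max_chars:
            units.append((paragraph, "\n\n"))
            continue
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        for i, sentence in enumerate(sentences):
            units.append((sentence, "\n\n" if i == 0 else " "))

    chunks, current = [], ""
    for unit, joiner in units:
        candidate = unit if not current else current + joiner + unit
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

With a generous limit the whole text stays in one chunk; as `max_chars` shrinks, the breaks fall first between paragraphs and only then between sentences.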

Semantic Search & Retrieval

- Vector-based semantic search using FAISS
- Embedding generation with Sentence Transformers
- Retrieval of relevant text chunks based on queries
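
The retrieval step can be pictured as "embed the query, embed each chunk, return the closest chunks". The toy sketch below follows that shape using only the standard library. It deliberately swaps in word-count vectors and a brute-force cosine ranking where the library uses Sentence Transformers embeddings and a FAISS index, so it shows the flow, not the quality, of real semantic search. All function names are invented for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The methodology uses a randomized trial.",
    "Revenue grew in the third quarter.",
    "Key findings show a strong effect.",
]
top = retrieve("methodology and key findings", chunks, k=2)
```

Here `top` holds the "Key findings" chunk followed by the "methodology" chunk, while the unrelated revenue sentence is left out.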

Configuration

Environment Variables

Instead of passing credentials in code, you can set these environment variables:

GENAI_API_KEY=your_google_api_key
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_DEFAULT_REGION=your_aws_region
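
If you prefer to fail fast when the Gemini key is missing, rather than partway through a long run, a check like the one below can be done up front. It is a sketch under the assumption that an explicit `api_key` in `llm_config` should take precedence over `GENAI_API_KEY`; `with_api_key` is a hypothetical helper, not part of the library.

```python
import os


def with_api_key(llm_config, env=None):
    """Return a copy of llm_config with api_key filled from GENAI_API_KEY if absent."""
    env = os.environ if env is None else env
    config = dict(llm_config)  # leave the caller's dictionary untouched
    if not config.get("api_key"):
        key = env.get("GENAI_API_KEY")
        if not key:
            raise RuntimeError("Set GENAI_API_KEY or pass llm_config['api_key']")
        config["api_key"] = key
    return config
```

For example, `llm_config=with_api_key({"temperature": 0.2})` raises immediately when neither source provides a key.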

Advanced Configuration Examples

High-Quality Production Setup

generator = PodcastGenerator(
    rag_system=pdf_processor,  # SimplePDFProcessor instance from Quick Start
    llm_type="gemini",
    tts_type="aws",
    llm_config={
        "model_name": "gemini-1.5-flash",
        "temperature": 0.2,
        "top_p": 0.9,
        "max_output_tokens": 8192,
        "streaming": True
    },
    tts_config={
        "voice_id": "Joanna",
        "engine": "neural",
        "region_name": "us-west-2"
    }
)

Fast Development Setup

generator = PodcastGenerator(
    rag_system=pdf_processor,  # SimplePDFProcessor instance from Quick Start
    llm_type="gemini",
    tts_type="google",  # Faster, no API key needed
    llm_config={
        "temperature": 0.3,
        "max_output_tokens": 4096
    },
    tts_config={
        "language": "en",
        "tld": "com"
    }
)
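
The two setups above differ only in their keyword arguments, so one way to switch between them is to keep the arguments in named profiles. The sketch below copies the values from this section. The `PROFILES` table and `generator_kwargs` helper are illustrative names, not library features.

```python
import copy

# Values copied from the two setups above.
PROFILES = {
    "production": {
        "llm_type": "gemini",
        "tts_type": "aws",
        "llm_config": {
            "model_name": "gemini-1.5-flash",
            "temperature": 0.2,
            "top_p": 0.9,
            "max_output_tokens": 8192,
            "streaming": True,
        },
        "tts_config": {"voice_id": "Joanna", "engine": "neural", "region_name": "us-west-2"},
    },
    "development": {
        "llm_type": "gemini",
        "tts_type": "google",
        "llm_config": {"temperature": 0.3, "max_output_tokens": 4096},
        "tts_config": {"language": "en", "tld": "com"},
    },
}


def generator_kwargs(profile, **llm_overrides):
    """Return PodcastGenerator keyword arguments for a named profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    kwargs = copy.deepcopy(PROFILES[profile])  # never mutate the shared table
    kwargs["llm_config"].update(llm_overrides)
    return kwargs

# Usage (requires the library):
# generator = PodcastGenerator(rag_system=pdf_processor, **generator_kwargs("development"))
```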

Custom Prompt Builders

You can customize how content is processed by creating custom prompt builders:

from pdf2podcast.core.base import BasePromptBuilder

class TechnicalPromptBuilder(BasePromptBuilder):
    """Specialized for technical documentation."""
    
    def build_prompt(self, text: str, **kwargs) -> str:
        return (
            "Create a technical podcast script following these guidelines:\n"
            "1. Start with a high-level overview\n"
            "2. Break down complex concepts step by step\n"
            "3. Include practical examples\n\n"
            f"Content: {text}\n"
            f"Complexity: {kwargs.get('complexity', 'intermediate')}"
        )

# Use custom prompt builder
generator = PodcastGenerator(
    rag_system=pdf_processor,  # SimplePDFProcessor instance from Quick Start
    llm_type="gemini",
    tts_type="aws",
    llm_config={
        "prompt_builder": TechnicalPromptBuilder(),
        "temperature": 0.2
    }
)
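
A prompt builder can also vary its instructions with the `complexity` value that is passed through `kwargs`. The sketch below shows one way to do that. So that it runs without the library installed, it defines a local stand-in for `BasePromptBuilder`; the real base class in `pdf2podcast.core.base` may require more than the single `build_prompt` method shown in this README. `AudienceLevelPromptBuilder` and its guidance strings are invented for illustration.

```python
from abc import ABC, abstractmethod


class BasePromptBuilder(ABC):
    """Local stand-in for pdf2podcast.core.base.BasePromptBuilder (assumption)."""

    @abstractmethod
    def build_prompt(self, text: str, **kwargs) -> str:
        ...


class AudienceLevelPromptBuilder(BasePromptBuilder):
    """Adjusts the instructions to the requested complexity level."""

    GUIDANCE = {
        "simple": "Use everyday language and define every technical term.",
        "intermediate": "Assume basic familiarity and explain specialised terms.",
        "advanced": "Assume an expert audience and keep the technical detail.",
    }

    def build_prompt(self, text: str, **kwargs) -> str:
        complexity = kwargs.get("complexity", "intermediate")
        # Unknown levels fall back to the intermediate guidance.
        guidance = self.GUIDANCE.get(complexity, self.GUIDANCE["intermediate"])
        return f"Create a podcast script.\n{guidance}\n\nContent: {text}"
```

Because `build_prompt` is a pure function of its inputs, it can be unit-tested without calling the LLM.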

Common Use Cases

Academic Paper Processing

from pdf2podcast import PodcastGenerator, SimplePDFProcessor

generator = PodcastGenerator(
    rag_system=SimplePDFProcessor(
        max_chars_per_chunk=8000,
        extract_images=True,
        metadata=True
    ),
    llm_type="gemini",
    tts_type="aws",
    llm_config={
        "temperature": 0.2,
        "max_output_tokens": 8192
    },
    tts_config={
        "voice_id": "Joanna",
        "engine": "neural"
    }
)

result = generator.generate(
    pdf_path="paper.pdf",
    output_path="paper_podcast.mp3",
    complexity="advanced",
    query="Focus on methodology and key findings"
)

Business Report Summary

from pdf2podcast import PodcastGenerator, SimplePDFProcessor

generator = PodcastGenerator(
    rag_system=SimplePDFProcessor(
        max_chars_per_chunk=4000
    ),
    llm_type="gemini",
    tts_type="aws",
    llm_config={
        "temperature": 0.3,
        "max_output_tokens": 4096
    }
)

result = generator.generate(
    pdf_path="report.pdf",
    output_path="summary.mp3",
    complexity="intermediate",
    query="Summarize key business metrics and trends"
)

Educational Content

from pdf2podcast import PodcastGenerator, SimplePDFProcessor

generator = PodcastGenerator(
    rag_system=SimplePDFProcessor(extract_images=True),
    llm_type="gemini",
    tts_type="google",
    llm_config={
        "temperature": 0.4,
        "max_output_tokens": 6144
    },
    tts_config={
        "language": "en",
        "tld": "com",
        "slow": True  # Better for learning
    }
)

result = generator.generate(
    pdf_path="lesson.pdf",
    output_path="tutorial.mp3",
    complexity="simple"
)
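
The use cases above each convert a single file. For a folder of PDFs, a loop such as the following could reuse one generator. It relies only on the `generate(pdf_path=..., output_path=..., complexity=...)` call shown in this README; `convert_folder` is a hypothetical helper and the one-MP3-per-PDF naming scheme is an assumption.

```python
from pathlib import Path


def convert_folder(generator, pdf_dir, out_dir, complexity="intermediate"):
    """Call generator.generate() for every PDF in pdf_dir.

    Returns a dictionary mapping each PDF file name to its result.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        output_path = out_dir / f"{pdf_path.stem}.mp3"
        results[pdf_path.name] = generator.generate(
            pdf_path=str(pdf_path),
            output_path=str(output_path),
            complexity=complexity,
        )
    return results
```

Any object with a compatible `generate` method works here, which also makes the loop easy to test with a fake generator.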

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
