AI-powered tool to convert blog articles into podcast audio with optional voice cloning

🎙️ Blog2Podcasts

PyPI version Python 3.9+ License: MIT

An AI-powered tool that converts any blog article into a podcast audio file with optional voice cloning from YouTube.

Features

  • 🌐 Web Scraping: Extracts main content from any blog URL using trafilatura
  • 🤖 AI Summarization: Converts articles into engaging podcast scripts using local LLMs (Ollama)
  • 🎵 Text-to-Speech: Generates high-quality audio using Microsoft Edge TTS (free)
  • 🎤 Voice Cloning: Clone voices from YouTube videos using Coqui TTS (XTTS-v2)

Installation

From PyPI

pip install blog2podcasts

From Source

git clone https://github.com/QuantBender/blog2podcasts.git
cd blog2podcasts
pip install -e .

With Voice Cloning Support

# Quote the extra so shells such as zsh do not expand the brackets
pip install "blog2podcasts[voice-cloning]"

Prerequisites

1. Install Ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# macOS
brew install ollama

# Start Ollama service
ollama serve

2. Pull an LLM Model

# Recommended: Llama 3.2 (fast and capable)
ollama pull llama3.2

# Alternative options:
ollama pull mistral      # Fast, good quality
ollama pull llama3.1     # More capable, slower
ollama pull phi3         # Small, fast

3. Install ffmpeg (for audio processing)

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Usage

Command Line

# Convert a blog to podcast
blog2podcasts https://example.com/blog-article

# Use a different voice
blog2podcasts https://example.com/blog --voice en-GB-RyanNeural

# Use a different LLM model
blog2podcasts https://example.com/blog --model mistral

# Adjust script length (words)
blog2podcasts https://example.com/blog --length 1200

# Preview script without generating audio
blog2podcasts https://example.com/blog --preview

# List available voices
blog2podcasts --list-voices

# Custom output name
blog2podcasts https://example.com/blog -o my_podcast

# Adjust speech rate
blog2podcasts https://example.com/blog --rate "+10%"
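
To get a feel for how `--length` and `--rate` interact, the sketch below estimates the running time of an episode. It is only a back-of-the-envelope aid: the 150 words-per-minute baseline and the idea that the rate percentage scales speaking speed linearly are assumptions, not values taken from blog2podcasts or Edge TTS, and `parse_rate` and `estimate_minutes` are hypothetical helpers.

```python
import re

# Assumed baseline speaking speed in words per minute; real voices vary.
BASELINE_WPM = 150


def parse_rate(rate: str) -> float:
    """Turn a rate string such as "+10%" or "-5%" into a speed multiplier."""
    match = re.fullmatch(r"([+-])(\d+)%", rate.strip())
    if match is None:
        raise ValueError(f"expected a rate like '+10%' or '-5%', got {rate!r}")
    sign = 1 if match.group(1) == "+" else -1
    return 1 + sign * int(match.group(2)) / 100


def estimate_minutes(script_words: int, rate: str = "+0%") -> float:
    """Estimate audio length in minutes for a script of `script_words` words."""
    multiplier = parse_rate(rate)
    if multiplier <= 0:
        raise ValueError("rate must keep the speaking speed above zero")
    return script_words / (BASELINE_WPM * multiplier)


if __name__ == "__main__":
    print(f"{estimate_minutes(1200):.1f} min at normal speed")
    print(f"{estimate_minutes(1200, '+10%'):.1f} min at +10%")
```

Under these assumptions a 1200-word script runs for about eight minutes at normal speed and a little over seven minutes at "+10%".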

🎤 Voice Cloning from YouTube

Clone any voice from YouTube videos and use it for your podcasts!

# Clone voice from a YouTube video
blog2podcasts --clone-voice "https://www.youtube.com/watch?v=VIDEO_ID" --voice-name "my_host"

# Generate podcast with cloned voice
blog2podcasts https://example.com/blog --use-cloned-voice my_host

Python API

from blog2podcasts import BlogScraper, ContentSummarizer, AudioGenerator
from blog2podcasts.cli import BlogToPodcastAgent, PodcastConfig

# Create agent with custom config
config = PodcastConfig(
    voice="en-US-JennyNeural",  # Female US voice
    model="llama3.2",           # Ollama model
    script_length=1000,         # Target words
    output_dir="podcasts",      # Output folder
)

agent = BlogToPodcastAgent(config)

# Convert blog to podcast
result = agent.convert("https://example.com/interesting-article")

print(f"Audio: {result['audio_path']}")
print(f"Script: {result['script_path']}")

Use Individual Components

from blog2podcasts import BlogScraper, ContentSummarizer, AudioGenerator
import asyncio

# Just scrape a blog
scraper = BlogScraper()
content = scraper.scrape("https://example.com/blog")
print(content.title, content.text)

# Just create a podcast script
summarizer = ContentSummarizer(model="llama3.2")
script = summarizer.generate_podcast_script(content.text, content.title)

# Just generate audio
generator = AudioGenerator(voice="en-US-GuyNeural")
asyncio.run(generator.generate_audio(script, "output.mp3"))

Available Voices

Recommended Podcast Voices

| Voice | ID | Style |
|---|---|---|
| 🇺🇸 Guy (Male) | en-US-GuyNeural | Professional, clear |
| 🇺🇸 Jenny (Female) | en-US-JennyNeural | Friendly, warm |
| 🇬🇧 Ryan (Male) | en-GB-RyanNeural | British, authoritative |
| 🇬🇧 Sonia (Female) | en-GB-SoniaNeural | British, professional |
| 🇦🇺 William (Male) | en-AU-WilliamNeural | Australian, casual |
| 🇦🇺 Natasha (Female) | en-AU-NatashaNeural | Australian, friendly |

Run blog2podcasts --list-voices to see all available voices.
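
The voice IDs in the table appear to follow a `language-REGION-NameNeural` pattern. If you want to filter a long voice list by accent, a small parser like the one below may help. The pattern is inferred from the IDs shown here rather than from any specification, and `parse_voice_id` is a hypothetical helper, not part of the blog2podcasts API.

```python
from typing import NamedTuple


class VoiceId(NamedTuple):
    language: str  # e.g. "en"
    region: str    # e.g. "GB"
    name: str      # e.g. "Ryan"


def parse_voice_id(voice_id: str) -> VoiceId:
    """Split an ID such as "en-GB-RyanNeural" into language, region, and name."""
    parts = voice_id.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    language, region = parts[0], parts[1]
    # Everything after the region is treated as the voice name.
    name = "-".join(parts[2:]).removesuffix("Neural")
    return VoiceId(language, region, name)


voices = ["en-US-GuyNeural", "en-GB-RyanNeural", "en-GB-SoniaNeural"]
british = [v for v in voices if parse_voice_id(v).region == "GB"]
print(british)
```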

Tech Stack

| Component | Tool | Why |
|---|---|---|
| Scraping | Trafilatura | Best-in-class article extraction |
| LLM | Ollama | Free, local, private LLM inference |
| TTS | Edge-TTS | High-quality, free Microsoft voices |
| Voice Cloning | Coqui TTS | Open-source XTTS-v2 voice cloning |
| YouTube Download | yt-dlp | Extract audio from YouTube videos |

Project Structure

blog2podcasts/
├── pyproject.toml             # Package configuration
├── LICENSE                    # MIT License
├── README.md                  # This file
├── CHANGELOG.md               # Version history
├── blog2podcasts/
│   ├── __init__.py            # Package exports
│   ├── cli.py                 # Command-line interface
│   ├── scraper.py             # Blog content extraction
│   ├── summarizer.py          # LLM-based script generation
│   ├── audio_generator.py     # Text-to-speech (Edge TTS)
│   └── voice_cloner.py        # YouTube voice extraction & cloning
├── voices/                    # Saved voice profiles
└── output/                    # Generated podcasts

How It Works

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Blog URL  │ -> │   Scraper   │ -> │ Summarizer  │ -> │  Edge TTS   │
│             │    │(trafilatura)│    │  (Ollama)   │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                          │                  │                  │
                          v                  v                  v
                    Blog Content      Podcast Script      Audio (.mp3)
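
The flow in the diagram can be sketched as three functions chained together. This is an illustration of the data flow only, not the package's actual implementation: `run_pipeline`, `PipelineResult`, and the `fake_*` stages are hypothetical names, and the stand-in stages return canned values so the example runs without network access, Ollama, or a TTS engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    content: str     # text extracted from the blog
    script: str      # podcast script produced from the content
    audio_path: str  # where the synthesized audio was written


def run_pipeline(
    url: str,
    scrape: Callable[[str], str],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> PipelineResult:
    """Chain the stages: URL -> content -> script -> audio file."""
    content = scrape(url)
    if not content.strip():
        raise ValueError(f"no content extracted from {url}")
    script = summarize(content)
    audio_path = synthesize(script)
    return PipelineResult(content, script, audio_path)


# Stand-in stages (assumptions for illustration only).
def fake_scrape(url: str) -> str:
    return f"Article text fetched from {url}"


def fake_summarize(content: str) -> str:
    return f"Welcome to the show. Today: {content}"


def fake_synthesize(script: str) -> str:
    return "output/episode.mp3"


result = run_pipeline(
    "https://example.com/blog", fake_scrape, fake_summarize, fake_synthesize
)
print(result.audio_path)
```

Passing the stages in as callables keeps each step replaceable, which mirrors how the individual components can be used on their own.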

Troubleshooting

"Ollama not available"

# Start Ollama service
ollama serve

# Check if running
curl http://localhost:11434/api/tags
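
The same check can be made from Python before starting a long conversion. The sketch below uses only the standard library and assumes Ollama listens on the default address used in the curl command above; `ollama_is_running` is a hypothetical helper, not a function exported by blog2podcasts.

```python
import http.client
import json
import urllib.request


def ollama_is_running(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """Return True if GET <base_url>/api/tags answers with valid JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return True
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; try `ollama serve`")
```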

"Model not found"

# Pull the model
ollama pull llama3.2

# List available models
ollama list
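
If you already have the JSON returned by `/api/tags`, you can check for a model without shelling out to `ollama list`. The sketch assumes the response has a top-level `models` list whose entries carry a `name` such as `llama3.2:latest`, and that a name without a tag refers to `:latest`; treat both as assumptions to verify against your Ollama version. `has_model` is a hypothetical helper.

```python
import json


def _with_tag(name: str) -> str:
    """Append the assumed default tag when none is given."""
    return name if ":" in name else f"{name}:latest"


def has_model(tags_payload: str, wanted: str) -> bool:
    """Check whether `wanted` appears in an /api/tags JSON response."""
    models = json.loads(tags_payload).get("models", [])
    target = _with_tag(wanted)
    return any(_with_tag(entry.get("name", "")) == target for entry in models)


# Hand-written sample payload, trimmed to the one field the helper reads.
sample = '{"models": [{"name": "llama3.2:latest"}, {"name": "mistral:7b"}]}'
print(has_model(sample, "llama3.2"))
print(has_model(sample, "phi3"))
```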

"Content extraction failed"

  • Some sites block scraping - try a different blog
  • Check if the URL is accessible
  • The fallback scraper will try BeautifulSoup
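
The fallback idea in the last bullet can be sketched as a chain of extractors tried in order. This is not the package's scraper: the README names trafilatura and BeautifulSoup, while the sketch below substitutes a plain standard-library `html.parser` fallback so it runs without extra dependencies. `extract_with_fallback`, `plain_text_fallback`, and the `min_chars` threshold are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def plain_text_fallback(html: str) -> Optional[str]:
    """Crude last-resort extractor: all visible text joined by spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks) or None


def extract_with_fallback(
    html: str,
    extractors: Iterable[Callable[[str], Optional[str]]],
    min_chars: int = 20,
) -> Optional[str]:
    """Return the first extractor result with at least `min_chars` characters."""
    for extract in extractors:
        try:
            text = extract(html)
        except Exception:
            continue  # a failing extractor should not abort the chain
        if text and len(text) >= min_chars:
            return text
    return None


page = "<html><body><script>var x = 1;</script><p>Hello world, this is the article body.</p></body></html>"
primary = lambda html: None  # stands in for an extractor that found nothing
print(extract_with_fallback(page, [primary, plain_text_fallback]))
```

In a real setup the first entry in the list would be the main extractor and the crude fallback would come last, so it is only used when better options return nothing.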

License

MIT License - Use freely for personal and commercial projects.

Contributing

Pull requests welcome! See CONTRIBUTING.md for guidelines.
