
Turn any web article into a two-voice podcast via Gemini TTS, with iterative Google Search enrichment.


🎙️ tts-podcast


Turn any article, document, or search query into a two-voice podcast — scraped, researched, scripted, and voiced by Google Gemini.

Feed it URLs, local files, or a topic to search. It scrapes the sources, optionally runs iterative Google-Search-grounded research, writes a natural back-and-forth dialogue between two hosts, and synthesises an MP3 (or WAV) with Gemini's multi-speaker TTS — plus a tidy folder of Markdown reports.


✨ Features

| Feature | Description |
| --- | --- |
| 🌐 Any URL → podcast | Feed one or several article URLs; scraping, dialogue, and audio are handled end-to-end. |
| 📄 Local documents | Include .txt, .md, .html, or .pdf files with -f; no network request is made. |
| 🔍 Web-search queries | Pass a natural-language topic with -s; the research stage investigates it via Google Search grounding. |
| 🧠 Iterative research | --research N runs N sequential grounded rounds, each drilling into the gaps the last one left. |
| 🎭 Multi-voice TTS | Two distinct Gemini voices with configurable personalities, scene, and delivery cues. |
| 👥 Named voice duos | Five built-in pairings (contrast default, warm, explorer, journalist, debate), or define your own from all 30 prebuilt Gemini voices. |
| 🎨 Style & angle control | Presets, free-text style, per-episode angle, and per-speaker overlays, all without touching the baseline voice acting. |
| 📑 Report folder | Generates overview.md, sources.md, script.md, research.md, and summary.md next to the audio. |
| 💸 Token & cost tracking | Accumulates per-model token usage and estimates cost from configurable pricing. |
| 🥷 Stealth fallback | Optional CloakBrowser retry for pages that block plain scraping (Cloudflare, 403/429, JS-only). |

🚀 Quickstart

Get a podcast out of a single URL in three steps:

# 1. Get the Gemini API key into your environment
export GEMINI_API_KEY=<your key>

# 2. Make sure ffmpeg is available (audio export needs it)
brew install ffmpeg            # macOS  ·  apt: sudo apt install ffmpeg

# 3. Run it — no install required
uvx tts-podcast run https://blog.example.com/article

That's it: you get an .mp3 plus a tts_<stem>/ folder of Markdown reports. Want to read the script before spending TTS tokens? Add -n for a dry run, which prints the dialogue to stdout.

Prefer a permanent install or pip? See Installation.


👥 Voice duos

A duo bundles both speakers — name, prebuilt Gemini voice, and baseline personality — under one slug, so you swap the whole pairing at once instead of editing speaker1 / speaker2 by hand.

tts-podcast duos          # list them (no API key needed)
tts-podcast run --duo journalist https://blog.example.com/article

Built-in duos

| Slug | Speaker 1 | Speaker 2 | Vibe |
| --- | --- | --- | --- |
| contrast (default) | Puck (Upbeat) | Kore (Firm) | High timbre contrast; Google's own multi-speaker pairing |
| warm | Sulafat (Warm) | Achird (Friendly) | Accessible, mainstream feel |
| explorer | Fenrir (Excitable) | Sadaltager (Knowledgeable) | Excited explorer + calm expert; vulgarisation-friendly |
| journalist | Zephyr (Bright) | Algieba (Smooth) | Fast-paced tech-journalism feel |
| debate | Laomedeia (Upbeat) | Algenib (Gravelly) | Opposing viewpoints: optimist vs skeptic (pair with --preset debate) |

Gemini doesn't officially document voice gender; pairings are curated from each voice's official descriptor plus community reports. Audition them in Google AI Studio before committing.

Custom duos

Define your own under gemini.duos; they merge over the built-ins (same slug overrides, a new slug adds one):

gemini:
  default_duo: my_duo
  duos:
    my_duo:
      description: "my custom pairing"
      speaker1:
        name: Robin
        voice: Laomedeia   # Upbeat
        personality: "techno-optimist; champions the upside"
      speaker2:
        name: Sasha
        voice: Algenib     # Gravelly
        personality: "hard-nosed skeptic; probes risks and costs"
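
The merge rule can be pictured with a minimal Python sketch. It is not the project's actual code: `BUILTIN_DUOS` and `merge_duos` are hypothetical names, only two of the five built-ins are shown, and it assumes that a custom duo replaces a built-in entry wholesale rather than being deep-merged field by field.

```python
from copy import deepcopy

# Hypothetical stand-in for the built-in table (voices taken from the table above).
BUILTIN_DUOS = {
    "contrast": {"speaker1": {"voice": "Puck"}, "speaker2": {"voice": "Kore"}},
    "debate": {"speaker1": {"voice": "Laomedeia"}, "speaker2": {"voice": "Algenib"}},
}


def merge_duos(builtin, custom):
    """Layer custom duos over the built-ins, keyed by slug."""
    merged = deepcopy(builtin)
    for slug, duo in (custom or {}).items():
        # Assumption: a matching slug replaces the whole entry; a new slug adds one.
        merged[slug] = deepcopy(duo)
    return merged


custom_duos = {
    "my_duo": {"speaker1": {"voice": "Laomedeia"}, "speaker2": {"voice": "Algenib"}},
    "contrast": {"speaker1": {"voice": "Zephyr"}, "speaker2": {"voice": "Algieba"}},
}
all_duos = merge_duos(BUILTIN_DUOS, custom_duos)
print(sorted(all_duos))
```

Copying the entries keeps the built-in table untouched, so a config reload can always start from the same baseline.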

Resolution precedence: --duo › gemini.default_duo › legacy gemini.speaker1 / speaker2 blocks › built-in contrast. A config that defines only the legacy speakerN blocks keeps working unchanged.
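
Read as a lookup chain, the precedence is: the --duo flag first, then gemini.default_duo, then the legacy speaker blocks, then the built-in contrast duo. The sketch below illustrates that order only; `resolve_duo` and `DUOS` are hypothetical names, and how the real tool reports an unknown slug is not covered.

```python
# Hypothetical duo table; voices follow the built-in list.
DUOS = {
    "contrast": {"speaker1": {"voice": "Puck"}, "speaker2": {"voice": "Kore"}},
    "debate": {"speaker1": {"voice": "Laomedeia"}, "speaker2": {"voice": "Algenib"}},
}


def resolve_duo(cli_duo, gemini_cfg, duos):
    """Pick the speaker pair following the documented precedence."""
    # 1. An explicit --duo flag wins.
    if cli_duo:
        return duos[cli_duo]
    # 2. Then gemini.default_duo from the config file.
    default_slug = gemini_cfg.get("default_duo")
    if default_slug:
        return duos[default_slug]
    # 3. Then legacy gemini.speaker1 / speaker2 blocks, if both are present.
    if "speaker1" in gemini_cfg and "speaker2" in gemini_cfg:
        return {"speaker1": gemini_cfg["speaker1"], "speaker2": gemini_cfg["speaker2"]}
    # 4. Finally the built-in contrast duo.
    return duos["contrast"]


legacy_cfg = {"speaker1": {"voice": "Zephyr"}, "speaker2": {"voice": "Algieba"}}
print(resolve_duo(None, legacy_cfg, DUOS)["speaker1"]["voice"])
```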


🎚️ Usage

# Single URL, no research
tts-podcast run https://blog.example.com/article

# Multiple URLs with two rounds of complementary research
tts-podcast run -R 2 https://blog.example.com/a https://blog.example.com/b

# Local document — no network request
tts-podcast run -n -f paper.pdf

# Web-search query — research auto-bumped to 1 if it's the only input
tts-podcast run -n -s "agentic AI memory systems"

# Mixed: URL + local file + search query in one episode
tts-podcast run -n https://blog.example.com/article -f notes.md -s "follow-up topic"

# Preview the dialogue without calling TTS
tts-podcast run -n https://blog.example.com/article

# Generate script + report but skip audio synthesis
tts-podcast run -A https://blog.example.com/article

# Style & angle: nudge tone via preset + free text, focus on one angle
tts-podcast run -R 1 \
    --preset academic \
    --style "extra rigorous, French academic feel" \
    --angle "the regulatory implications" \
    https://blog.example.com/article

# Per-episode speaker overlay (TTS voice acting stays unchanged)
tts-podcast run \
    --speaker1-style "more skeptical than usual" \
    --speaker2-style "extra warm and forgiving" \
    https://blog.example.com/article

# Opposing viewpoints, structured as a debate
tts-podcast run --duo debate --preset debate https://blog.example.com/article

Running from a source checkout? Prefix every command with uv run (e.g. uv run tts-podcast run …).

Key flags

| Flag | Description |
| --- | --- |
| -f, --file FILE | Local document to include (repeatable). .txt, .md, .html, .pdf. |
| -s, --search QUERY | Web-search query to seed the podcast (repeatable). Auto-bumps research to 1 if search-only. |
| -R, --research N | Number of Google-Search-grounded research rounds (default 0). |
| --duo NAME | Named voice duo (contrast, warm, explorer, journalist, debate). |
| --preset NAME | Style preset: casual, academic, humorous, debate, vulgarized, or none. |
| --style TEXT | Free-text style guidance (≤ 500 chars). Composes with --preset. |
| --speaker1-style / --speaker2-style | Per-episode overlay for one speaker; baseline voice unchanged. |
| --angle TEXT | Episode angle. Steers the dialogue and the first research round only. |
| -d, --duration MIN | Target episode duration in minutes. |
| -n, --dry-run | Print dialogue to stdout, no TTS. |
| -A, --no-audio | Generate script + report only. |
| -o, --output-dir DIR | Output directory (overrides config). |
| --no-report | Skip the report folder. |
| -v, --verbose | Enable DEBUG logging. |

Run tts-podcast run --help for the full list.
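
The search-only auto-bump mentioned for -s can be expressed as a small rule. This is a sketch of the documented behaviour, not the tool's source; `effective_research_rounds` is a hypothetical helper name.

```python
def effective_research_rounds(requested, urls, files, queries):
    """Return the research round count after the search-only auto-bump."""
    # A run is search-only when queries are the sole input.
    search_only = bool(queries) and not urls and not files
    if search_only and requested < 1:
        # Without at least one grounded round there would be no material to script.
        return 1
    return requested


print(effective_research_rounds(0, [], [], ["agentic AI memory systems"]))
```

With a URL or file alongside the query, the requested value (default 0) is left as given.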


⚙️ Configuration

Scaffold a config file, then export your Gemini API key:

tts-podcast config init
export GEMINI_API_KEY=<your key>

The config lives at $XDG_CONFIG_HOME/tts-podcast/config.yaml (typically ~/.config/tts-podcast/config.yaml). The full schema is in config.example.yaml. The API key is read at runtime from the environment variable named by gemini.api_key_env (default GEMINI_API_KEY); a local .env file is loaded automatically.

gemini:
  api_key_env: GEMINI_API_KEY
  default_duo: contrast        # persistent voice pairing
  dialogue:
    target_duration_minutes: 8
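
For readers who want to see the lookup spelled out, here is a rough Python sketch of the two rules described above: where the config file is expected, and how the key's variable name is indirected through gemini.api_key_env. The function names are hypothetical, the error message is invented for illustration, and .env loading is left out.

```python
import os
from pathlib import Path


def config_path(env=None):
    """Locate config.yaml under XDG_CONFIG_HOME, falling back to ~/.config."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "tts-podcast" / "config.yaml"


def read_api_key(gemini_cfg, env=None):
    """Read the key from the variable named by gemini.api_key_env."""
    env = os.environ if env is None else env
    var_name = gemini_cfg.get("api_key_env", "GEMINI_API_KEY")
    key = env.get(var_name)
    if not key:
        # Hypothetical error handling; the real tool may report this differently.
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


print(config_path({"XDG_CONFIG_HOME": "/tmp/xdg"}))
```

Because only the variable name lives in the config, the key itself never needs to be written to disk.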

📦 Installation

uvx tts-podcast                 # run without installing
uv tool install tts-podcast      # persistent install via uv
pipx install tts-podcast         # via pipx
pip install tts-podcast          # plain pip

Optional stealth-browser fallback (pulls a ~200 MB Chromium on first run):

uv tool install "tts-podcast[cloak]"

ffmpeg is required for audio export — skip only if you stick to --no-audio / --dry-run:

brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Debian / Ubuntu

From source

git clone https://github.com/obeone/tts-podcast.git
cd tts-podcast
uv sync                      # Python 3.13+
uv run tts-podcast --help

📂 Output layout

<output_dir>/
├── <stem>.mp3
└── tts_<stem>/
    ├── overview.md       # metadata, link breakdown, token/cost summary
    ├── sources.md        # per-source content (title, URL, summary, full text)
    ├── script.md         # full two-host dialogue
    ├── research.md       # only when --research >= 1
    └── summary.md        # synthetic reference sheet with categorised links

The stem combines the first URL's hostname, a 6-char digest of the URL list, and today's date — e.g. arxiv.org-a1b2c3-2026-06-07.mp3.
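
A stem of that shape could be built roughly as follows. The README does not say which hash is used or how the URL list is serialised, so SHA-256 over newline-joined URLs is an assumption here, as are the `output_stem` name and the `"podcast"` fallback for inputs without a hostname.

```python
import hashlib
from datetime import date
from urllib.parse import urlparse


def output_stem(urls, today):
    """Build '<hostname>-<6-char digest>-<ISO date>' from the URL list."""
    # Hostname of the first URL; the fallback value is hypothetical.
    host = urlparse(urls[0]).hostname or "podcast"
    # Assumption: SHA-256 of the newline-joined URLs, truncated to 6 hex chars.
    digest = hashlib.sha256("\n".join(urls).encode("utf-8")).hexdigest()[:6]
    return f"{host}-{digest}-{today.isoformat()}"


stem = output_stem(["https://arxiv.org/abs/0000.00000"], date(2026, 6, 7))
print(stem + ".mp3")
```

Hashing the whole URL list means two runs on the same day with different sources get different folders.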


💸 Research cost note

Each --research round is a separate Gemini call with Google Search grounding enabled, which adds search overhead to the standard input-token cost. The tool logs the cumulative cost after each round, so you can watch the bill while iterating.
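
As an illustration of cumulative per-model tracking, the sketch below accumulates token counts and prices them from a table. The model names and prices are placeholders, not real Gemini pricing, `UsageTracker` is a hypothetical class, and any separate charge for search grounding is not modelled.

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens; NOT real Gemini pricing.
PRICING = {
    "example-text-model": {"input": 1.00, "output": 4.00},
    "example-tts-model": {"input": 2.00, "output": 10.00},
}


class UsageTracker:
    """Accumulate token usage per model and estimate the running cost."""

    def __init__(self, pricing):
        self.pricing = pricing
        self.tokens = defaultdict(lambda: {"input": 0, "output": 0})

    def add(self, model, input_tokens, output_tokens):
        self.tokens[model]["input"] += input_tokens
        self.tokens[model]["output"] += output_tokens

    def cost(self):
        total = 0.0
        for model, used in self.tokens.items():
            # Models missing from the pricing table are counted as free.
            price = self.pricing.get(model, {"input": 0.0, "output": 0.0})
            total += used["input"] / 1_000_000 * price["input"]
            total += used["output"] / 1_000_000 * price["output"]
        return total


tracker = UsageTracker(PRICING)
for round_no in (1, 2):
    tracker.add("example-text-model", 500_000, 125_000)  # one research round
    print(f"after round {round_no}: ${tracker.cost():.2f}")
```

Each simulated round here costs 0.50 USD of input plus 0.50 USD of output, so the running total grows by 1.00 USD per round.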


🧪 Development

uv sync                          # install deps (Python 3.13+)
uv run pytest tests/ -q          # run the test suite
uv run ruff check src/ tests/    # lint

Tests mock the Gemini SDK rather than hitting the network. See CLAUDE.md for the architecture deep-dive and key invariants.


🔊 How it works

flowchart TB
    subgraph IN[" Inputs "]
        U[🌐 URLs]
        F[📄 Files<br/>txt · md · html · pdf]
        S[🔍 Search queries]
    end

    U --> SC[web_scraper]
    F --> LL[local_loader]
    S --> SY[synthetic source]

    SC --> R{🧠 Research?<br/>--research N}
    LL --> R
    SY --> R

    R -->|optional| RR[Google Search<br/>grounded rounds]
    R --> D[💬 llm_summarizer<br/>two-host dialogue]
    RR --> D

    D --> T[🎙️ Gemini multi-speaker TTS<br/>parallel chunks]
    T --> A[🎧 audio_exporter<br/>MP3 / WAV]
    D --> REP[📑 report_generator<br/>Markdown folder]

The pipeline is strictly linear: each stage hands typed data to the next, with no hidden shared state. Scrape failures don't abort the run; it continues with whatever succeeded.
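
The failure-tolerant scraping step might look something like this in outline. It is a sketch under assumptions, not the project's web_scraper: `Source`, `scrape_all`, and `fake_fetch` are hypothetical, and the fetcher is injected so the example runs without network access.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Typed record handed from the scrape stage to the next stage."""
    url: str
    text: str


def scrape_all(urls, fetch):
    """Fetch every URL, collecting failures instead of aborting the run."""
    sources, failed = [], []
    for url in urls:
        try:
            sources.append(Source(url=url, text=fetch(url)))
        except Exception as exc:
            # Record the failure and keep going with the remaining URLs.
            failed.append((url, str(exc)))
    return sources, failed


def fake_fetch(url):
    """Stand-in fetcher: one URL is made to fail on purpose."""
    if "blocked" in url:
        raise RuntimeError("403 Forbidden")
    return f"article text from {url}"


sources, failed = scrape_all(
    ["https://blog.example.com/a", "https://blog.example.com/blocked"], fake_fetch
)
print(len(sources), "scraped,", len(failed), "failed")
```

Returning the failures alongside the successes lets later stages, such as the report, list what was skipped.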


📝 License

MIT © Grégoire Compagnon
