Turn any web article into a two-voice podcast via Gemini TTS, with iterative Google Search enrichment.
# 🎙️ tts-podcast
Turn any article, document, or search query into a two-voice podcast — scraped, researched, scripted, and voiced by Google Gemini.
Feed it URLs, local files, or a topic to search. It scrapes the sources, optionally runs iterative Google-Search-grounded research, writes a natural back-and-forth dialogue between two hosts, and synthesises an MP3 (or WAV) with Gemini's multi-speaker TTS — plus a tidy folder of Markdown reports.
## ✨ Features

| | Feature | Description |
|---|---|---|
| 🌐 | Any URL → podcast | Feed one or several article URLs; scraping, dialogue, and audio are handled end-to-end. |
| 📄 | Local documents | Include .txt, .md, .html, or .pdf files with -f — no network request. |
| 🔍 | Web-search queries | Pass a natural-language topic with -s; the research stage investigates it via Google Search grounding. |
| 🧠 | Iterative research | --research N runs N sequential grounded rounds, each drilling into the gaps the last one left. |
| 🎭 | Multi-voice TTS | Two distinct Gemini voices with configurable personalities, scene, and delivery cues. |
| 👥 | Named voice duos | Five built-in pairings (`contrast`, the default, plus `warm`, `explorer`, `journalist`, `debate`), or define your own from all 30 prebuilt Gemini voices. |
| 🎨 | Style & angle control | Presets, free-text style, per-episode angle, and per-speaker overlays — without touching the baseline voice acting. |
| 📑 | Report folder | Generates overview.md, sources.md, script.md, research.md, and summary.md next to the audio. |
| 💸 | Token & cost tracking | Accumulates per-model token usage and estimates cost from configurable pricing. |
| 🥷 | Stealth fallback | Optional CloakBrowser retry for pages that block plain scraping (Cloudflare, 403/429, JS-only). |
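The stealth fallback only kicks in when a plain fetch looks blocked. Below is a minimal Python sketch of that decision. It assumes a fetch result exposes a status code and raw HTML. `FetchResult`, `looks_blocked`, `fetch_with_fallback`, the challenge markers and the 200-character threshold are all hypothetical, not tts-podcast's actual API. Both fetchers are injected, so no real CloakBrowser call is shown.

```python
from typing import Callable, NamedTuple, Optional


class FetchResult(NamedTuple):
    status: int
    html: str


# Hypothetical Cloudflare-interstitial markers; real detection may differ.
CHALLENGE_MARKERS = ("cf-chl", "Just a moment...", "challenge-platform")
# Assumed threshold: bodies shorter than this are treated as JS-only shells.
MIN_BODY_CHARS = 200


def looks_blocked(result: FetchResult) -> bool:
    """Return True when a plain fetch looks blocked or content-free."""
    if result.status in (403, 429):
        return True
    if any(marker in result.html for marker in CHALLENGE_MARKERS):
        return True
    # A near-empty body often means the content is rendered client-side.
    return len(result.html.strip()) < MIN_BODY_CHARS


def fetch_with_fallback(
    url: str,
    plain_fetch: Callable[[str], FetchResult],
    stealth_fetch: Optional[Callable[[str], FetchResult]] = None,
) -> FetchResult:
    """Try a plain fetch first; retry with a stealth browser only if needed."""
    result = plain_fetch(url)
    if looks_blocked(result) and stealth_fetch is not None:
        return stealth_fetch(url)
    # Without the optional extra installed, the blocked result is returned as-is.
    return result
```

With this structure, the heavy browser is never launched for pages that scrape cleanly.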
## 🚀 Quickstart
Get a podcast out of a single URL in three steps:
```bash
# 1. Get the Gemini API key into your environment
export GEMINI_API_KEY=<your key>

# 2. Make sure ffmpeg is available (audio export needs it)
brew install ffmpeg    # macOS · apt: sudo apt install ffmpeg

# 3. Run it (no install required)
uvx tts-podcast run https://blog.example.com/article
```
That's it: you get an `.mp3` plus a `tts_<stem>/` folder of Markdown reports.
Want to hear the script before spending TTS tokens? Add `-n` for a dry run.
Prefer a permanent install or pip? See Installation.
## 👥 Voice duos
A duo bundles both speakers — name, prebuilt Gemini voice, and baseline
personality — under one slug, so you swap the whole pairing at once instead of
editing speaker1 / speaker2 by hand.
```bash
tts-podcast duos    # list them (no API key needed)
tts-podcast run --duo journalist https://blog.example.com/article
```
### Built-in duos
| Slug | Speaker 1 | Speaker 2 | Vibe |
|---|---|---|---|
| `contrast` (default) | Puck (Upbeat) | Kore (Firm) | High timbre contrast (Google's own multi-speaker pairing) |
| `warm` | Sulafat (Warm) | Achird (Friendly) | Accessible, mainstream feel |
| `explorer` | Fenrir (Excitable) | Sadaltager (Knowledgeable) | Excited explorer + calm expert; vulgarisation-friendly |
| `journalist` | Zephyr (Bright) | Algieba (Smooth) | Fast-paced tech-journalism feel |
| `debate` | Laomedeia (Upbeat) | Algenib (Gravelly) | Opposing viewpoints: optimist vs skeptic (pair with `--preset debate`) |
Gemini doesn't officially document voice gender; pairings are curated from each voice's official descriptor plus community reports. Audition them in Google AI Studio before committing.
### Custom duos

Define your own under `gemini.duos`; they merge over the built-ins (a matching slug overrides a built-in, a new slug adds one):
```yaml
gemini:
  default_duo: my_duo
  duos:
    my_duo:
      description: "my custom pairing"
      speaker1:
        name: Robin
        voice: Laomedeia   # Upbeat
        personality: "techno-optimist; champions the upside"
      speaker2:
        name: Sasha
        voice: Algenib     # Gravelly
        personality: "hard-nosed skeptic; probes risks and costs"
```
Resolution precedence: `--duo` › `gemini.default_duo` › legacy `gemini.speaker1` / `gemini.speaker2` blocks › built-in `contrast`. A config that defines only the legacy `speakerN` blocks keeps working unchanged.
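The merge and precedence rules can be sketched in Python over plain dicts. `BUILTIN_DUOS`, `merge_duos` and `resolve_speakers` are hypothetical names, and personalities are omitted. Two behaviours are assumptions rather than documented: raising `KeyError` on an unknown slug, and requiring both legacy blocks to be present.

```python
from typing import Optional

# Built-in voice pairs from the table above (personalities omitted).
BUILTIN_DUOS: dict[str, dict] = {
    "contrast": {"speaker1": {"voice": "Puck"}, "speaker2": {"voice": "Kore"}},
    "warm": {"speaker1": {"voice": "Sulafat"}, "speaker2": {"voice": "Achird"}},
    "explorer": {"speaker1": {"voice": "Fenrir"}, "speaker2": {"voice": "Sadaltager"}},
    "journalist": {"speaker1": {"voice": "Zephyr"}, "speaker2": {"voice": "Algieba"}},
    "debate": {"speaker1": {"voice": "Laomedeia"}, "speaker2": {"voice": "Algenib"}},
}


def merge_duos(user_duos: dict[str, dict]) -> dict[str, dict]:
    """User duos overlay the built-ins: same slug overrides, new slug adds."""
    return {**BUILTIN_DUOS, **user_duos}


def resolve_speakers(gemini_cfg: dict, cli_duo: Optional[str] = None) -> dict:
    """Pick the speaker pair following the documented precedence."""
    duos = merge_duos(gemini_cfg.get("duos", {}))
    # 1. --duo on the command line, then 2. gemini.default_duo.
    for slug in (cli_duo, gemini_cfg.get("default_duo")):
        if slug:
            if slug not in duos:
                raise KeyError(f"unknown duo: {slug}")  # assumed behaviour
            return duos[slug]
    # 3. Legacy speaker1 / speaker2 blocks (assumed: both must be present).
    if "speaker1" in gemini_cfg and "speaker2" in gemini_cfg:
        return {"speaker1": gemini_cfg["speaker1"], "speaker2": gemini_cfg["speaker2"]}
    # 4. The built-in default pairing.
    return BUILTIN_DUOS["contrast"]
```

Because merging happens before lookup, `--duo my_duo` and `default_duo: my_duo` resolve custom and built-in slugs the same way.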
## 🎚️ Usage

```bash
# Single URL, no research
tts-podcast run https://blog.example.com/article

# Multiple URLs with two rounds of complementary research
tts-podcast run -R 2 https://blog.example.com/a https://blog.example.com/b

# Local document (no network request)
tts-podcast run -n -f paper.pdf

# Web-search query: research is auto-bumped to 1 when it's the only input
tts-podcast run -n -s "agentic AI memory systems"

# Mixed: URL + local file + search query in one episode
tts-podcast run -n https://blog.example.com/article -f notes.md -s "follow-up topic"

# Preview the dialogue without calling TTS
tts-podcast run -n https://blog.example.com/article

# Generate script + report but skip audio synthesis
tts-podcast run -A https://blog.example.com/article

# Style & angle: nudge tone via preset + free text, focus on one angle
tts-podcast run -R 1 \
  --preset academic \
  --style "extra rigorous, French academic feel" \
  --angle "the regulatory implications" \
  https://blog.example.com/article

# Per-episode speaker overlay (TTS voice acting stays unchanged)
tts-podcast run \
  --speaker1-style "more skeptical than usual" \
  --speaker2-style "extra warm and forgiving" \
  https://blog.example.com/article

# Opposing viewpoints, structured as a debate
tts-podcast run --duo debate --preset debate https://blog.example.com/article
```
Running from a source checkout? Prefix every command with `uv run` (e.g. `uv run tts-podcast run …`).
### Key flags

| Flag | Description |
|---|---|
| `-f, --file FILE` | Local document to include (repeatable): `.txt`, `.md`, `.html`, `.pdf`. |
| `-s, --search QUERY` | Web-search query to seed the podcast (repeatable). Auto-bumps research to 1 if search queries are the only input. |
| `-R, --research N` | Number of Google-Search-grounded research rounds (default 0). |
| `--duo NAME` | Named voice duo (`contrast`, `warm`, `explorer`, `journalist`, `debate`). |
| `--preset NAME` | Style preset: `casual`, `academic`, `humorous`, `debate`, `vulgarized`, or `none`. |
| `--style TEXT` | Free-text style guidance (≤ 500 chars). Composes with `--preset`. |
| `--speaker1-style` / `--speaker2-style` | Per-episode overlay for one speaker; the baseline voice is unchanged. |
| `--angle TEXT` | Episode angle. Steers the dialogue and the first research round only. |
| `-d, --duration MIN` | Target episode duration in minutes. |
| `-n, --dry-run` | Print the dialogue to stdout; no TTS. |
| `-A, --no-audio` | Generate the script and report only. |
| `-o, --output-dir DIR` | Output directory (overrides config). |
| `--no-report` | Skip the report folder. |
| `-v, --verbose` | Enable DEBUG logging. |

Run `tts-podcast run --help` for the full list.
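Two flags shape the research stage. A search-only run is bumped to one round, and `--angle` steers only the first round while later rounds drill into the gaps. Below is a rough sketch of that control flow. `run_round` stands in for one Google-Search-grounded Gemini call. The function names and prompt wording are illustrative, not the tool's real prompts.

```python
from typing import Callable, Optional


def effective_rounds(research: int, urls: list[str], files: list[str], searches: list[str]) -> int:
    """Auto-bump research to 1 when search queries are the only input."""
    search_only = bool(searches) and not urls and not files
    return max(research, 1) if search_only else research


def run_research(
    rounds: int,
    topic: str,
    run_round: Callable[[str], str],  # hypothetical: one grounded Gemini call
    angle: Optional[str] = None,
) -> list[str]:
    """Run sequential rounds; each later round builds on earlier findings."""
    findings: list[str] = []
    for i in range(rounds):
        if i == 0:
            prompt = f"Research: {topic}"
            if angle:
                # The angle steers only the first round.
                prompt += f"\nFocus on: {angle}"
        else:
            # Later rounds see everything found so far and target the gaps.
            prompt = (
                f"Research: {topic}\nPrevious findings:\n"
                + "\n".join(findings)
                + "\nInvestigate what is still missing."
            )
        findings.append(run_round(prompt))
    return findings
```

Rounds run strictly one after another, because each prompt depends on the previous results.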
## ⚙️ Configuration

Scaffold a config file, then export your Gemini API key:

```bash
tts-podcast config init
export GEMINI_API_KEY=<your key>
```
The config lives at `$XDG_CONFIG_HOME/tts-podcast/config.yaml` (typically `~/.config/tts-podcast/config.yaml`). The full schema is in `config.example.yaml`. At runtime, the API key is read from the environment variable named by `gemini.api_key_env` (default `GEMINI_API_KEY`). A local `.env` file is loaded automatically, so the variable can also be defined there.
```yaml
gemini:
  api_key_env: GEMINI_API_KEY
  default_duo: contrast   # persistent voice pairing
dialogue:
  target_duration_minutes: 8
```
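Here is a Python sketch of config lookup and key loading. It assumes the tool follows the usual XDG fallback to `~/.config` when `XDG_CONFIG_HOME` is unset or empty. `config_path` and `api_key` are hypothetical helpers. The `.env` step is not shown; a library such as python-dotenv would typically populate `os.environ` before these run.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve config.yaml under $XDG_CONFIG_HOME, falling back to ~/.config."""
    env = os.environ if env is None else env
    # Per the XDG spec, an unset or empty XDG_CONFIG_HOME means ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "tts-podcast" / "config.yaml"


def api_key(gemini_cfg: dict, env: Optional[Mapping[str, str]] = None) -> str:
    """Read the key from the env var named by gemini.api_key_env."""
    env = os.environ if env is None else env
    var = gemini_cfg.get("api_key_env", "GEMINI_API_KEY")
    key = env.get(var)
    if not key:
        raise RuntimeError(f"set {var} in your environment or a local .env file")
    return key
```

Passing `env` explicitly keeps the helpers testable without touching the real environment.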
## 📦 Installation

```bash
uvx tts-podcast …               # run without installing
uv tool install tts-podcast     # persistent install via uv
pipx install tts-podcast        # via pipx
pip install tts-podcast         # plain pip
```
Optional stealth-browser fallback (pulls a ~200 MB Chromium on first run):
```bash
uv tool install "tts-podcast[cloak]"
```
ffmpeg is required for audio export. You can skip it only if you stick to `--no-audio` / `--dry-run`:

```bash
brew install ffmpeg        # macOS
sudo apt install ffmpeg    # Debian / Ubuntu
```
### From source

```bash
git clone https://github.com/obeone/tts-podcast.git
cd tts-podcast
uv sync    # Python 3.13+
uv run tts-podcast --help
```
## 📂 Output layout

```text
<output_dir>/
├── <stem>.mp3
└── tts_<stem>/
    ├── overview.md    # metadata, link breakdown, token/cost summary
    ├── sources.md     # per-source content (title, URL, summary, full text)
    ├── script.md      # full two-host dialogue
    ├── research.md    # only when --research >= 1
    └── summary.md     # synthetic reference sheet with categorised links
```
The stem combines the first URL's hostname, a 6-char digest of the URL list, and today's date, e.g. `arxiv.org-a1b2c3-2026-06-07.mp3`.
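Here is a sketch of how such a stem could be built. The hash algorithm and the way URLs are combined before hashing are not documented. SHA-256 over the newline-joined URLs is an assumption, as are the name `output_stem` and the `"unknown"` host fallback.

```python
import hashlib
from datetime import date
from typing import Optional
from urllib.parse import urlparse


def output_stem(urls: list[str], today: Optional[date] = None) -> str:
    """Build '<host>-<6-char digest>-<YYYY-MM-DD>' from the URL list."""
    today = today or date.today()
    # Hypothetical fallback for URLs without a hostname.
    host = urlparse(urls[0]).hostname or "unknown"
    # Assumption: SHA-256 over newline-joined URLs; the real hash may differ.
    digest = hashlib.sha256("\n".join(urls).encode()).hexdigest()[:6]
    return f"{host}-{digest}-{today.isoformat()}"
```

Hashing the whole list lets two episodes from the same host on the same day get distinct file names.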
## 💸 Research cost note
Each --research round is a separate Gemini call with Google Search grounding
enabled, which adds search overhead to the standard input-token cost. The tool
logs the cumulative cost after each round, so you can watch the bill while
iterating.
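Per-model cost tracking can be sketched as a small accumulator. The pricing table and the flat per-call grounding fee are placeholders you would fill from your own config; they are not real Gemini prices. `CostTracker` is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    grounded_calls: int = 0


@dataclass
class CostTracker:
    """Accumulate per-model token usage and estimate cost.

    pricing maps model -> (USD per 1M input tokens, USD per 1M output tokens).
    search_fee_per_call is an assumed flat surcharge per grounded request.
    """

    pricing: dict[str, tuple[float, float]]
    search_fee_per_call: float = 0.0
    usage: dict[str, Usage] = field(default_factory=lambda: defaultdict(Usage))

    def add(self, model: str, input_tokens: int, output_tokens: int, grounded: bool = False) -> None:
        u = self.usage[model]
        u.input_tokens += input_tokens
        u.output_tokens += output_tokens
        if grounded:
            u.grounded_calls += 1

    def total_cost(self) -> float:
        total = 0.0
        for model, u in self.usage.items():
            # Models missing from the pricing table are counted as free.
            in_rate, out_rate = self.pricing.get(model, (0.0, 0.0))
            total += u.input_tokens / 1_000_000 * in_rate
            total += u.output_tokens / 1_000_000 * out_rate
            total += u.grounded_calls * self.search_fee_per_call
        return total
```

In a research loop, you would call `add(..., grounded=True)` after each round and log `total_cost()`. That mirrors the per-round cost log described above.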
## 🧪 Development

```bash
uv sync                          # install deps (Python 3.13+)
uv run pytest tests/ -q          # run the test suite
uv run ruff check src/ tests/    # lint
```
Tests mock the Gemini SDK rather than hitting the network. See
CLAUDE.md for the architecture deep-dive and key invariants.
## 🔊 How it works

```mermaid
flowchart TB
    subgraph IN[" Inputs "]
        U[🌐 URLs]
        F[📄 Files<br/>txt · md · html · pdf]
        S[🔍 Search queries]
    end
    U --> SC[web_scraper]
    F --> LL[local_loader]
    S --> SY[synthetic source]
    SC --> R{🧠 Research?<br/>--research N}
    LL --> R
    SY --> R
    R -->|optional| RR[Google Search<br/>grounded rounds]
    R --> D[💬 llm_summarizer<br/>two-host dialogue]
    RR --> D
    D --> T[🎙️ Gemini multi-speaker TTS<br/>parallel chunks]
    T --> A[🎧 audio_exporter<br/>MP3 / WAV]
    D --> REP[📑 report_generator<br/>Markdown folder]
```
The pipeline is strictly linear: each stage hands typed data to the next, with no hidden shared state. Scrape failures don't abort the run; it continues with whatever sources loaded successfully.
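The typed, failure-tolerant hand-off can be sketched as below. `Source`, `Script`, `load_sources` and `run_pipeline` are hypothetical stand-ins for the real modules (`web_scraper`, `llm_summarizer`, `audio_exporter`). Stopping when no source loads at all is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Source:
    origin: str  # URL, file path, or search query
    text: str


@dataclass
class Script:
    turns: list[tuple[str, str]]  # (speaker, line) pairs


def load_sources(urls: list[str], scrape: Callable[[str], Source]) -> tuple[list[Source], list[str]]:
    """Scrape every URL, collecting failures instead of aborting the run."""
    sources: list[Source] = []
    failures: list[str] = []
    for url in urls:
        try:
            sources.append(scrape(url))
        except Exception as exc:  # one bad page must not stop the episode
            failures.append(f"{url}: {exc}")
    return sources, failures


def run_pipeline(
    urls: list[str],
    scrape: Callable[[str], Source],
    write_dialogue: Callable[[list[Source]], Script],
    synthesize: Optional[Callable[[Script], bytes]] = None,
) -> tuple[Script, Optional[bytes], list[str]]:
    sources, failures = load_sources(urls, scrape)
    if not sources:
        # Assumption: with nothing loaded there is nothing to discuss.
        raise RuntimeError("no source could be loaded")
    script = write_dialogue(sources)
    # synthesize=None plays the role of --no-audio in this sketch.
    audio = synthesize(script) if synthesize is not None else None
    return script, audio, failures
```

Each stage only sees the previous stage's return value, which is what keeps the stages independently testable with mocks.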
## 📝 License
MIT © Grégoire Compagnon
## Download files

| File | Type | Size | Uploaded via |
|---|---|---|---|
| `tts_podcast-0.5.0.tar.gz` | Source distribution | 80.4 kB | twine/6.1.0 CPython/3.13.12 |
| `tts_podcast-0.5.0-py3-none-any.whl` | Built distribution (Python 3) | 64.3 kB | twine/6.1.0 CPython/3.13.12 |

File hashes:

| File | Algorithm | Hash digest |
|---|---|---|
| `tts_podcast-0.5.0.tar.gz` | SHA256 | `607d6ad3c4dd24baf9b5cf18eaca1c70b03c55a9760f63f0e7613cf9c1b23dbc` |
| | MD5 | `b137483c7472f0fb6fae93a4fe6572ab` |
| | BLAKE2b-256 | `8e2ee3157f4186c88b5afd915f7f61bf42d767f8acbd9461a60db46dc9673d7f` |
| `tts_podcast-0.5.0-py3-none-any.whl` | SHA256 | `663fc9934669f41bb574c1f7d723726c6aa41426683b58507051d3dad24aaf34` |
| | MD5 | `90bc1451e202ab1e119e2ce805fd01c9` |
| | BLAKE2b-256 | `66845891236d024658e66b4ad263b40a048652e689fe73d8c71c5a4fc00285b8` |

Provenance: both files were uploaded using Trusted Publishing. Their attestation bundles have the following properties:

- Statement type: `https://in-toto.io/Statement/v1`
- Predicate type: `https://docs.pypi.org/attestations/publish/v1`
- Publisher: `publish.yml` on obeone/tts-podcast
- Trigger event: a release at `refs/tags/v0.5.0`
- Permalink: obeone/tts-podcast@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140
- Publication workflow: `publish.yml@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140`
- Owner: https://github.com/obeone (access: private)
- Token issuer: `https://token.actions.githubusercontent.com`
- Runner environment: github-hosted

The subject digests match the SHA256 hashes above. The Sigstore transparency entries are 1754862371 (sdist) and 1754862377 (wheel).