Turn any web article into a two-voice podcast via Gemini TTS, with iterative Google Search enrichment.

Maintainers

obeone


🎙️ tts-podcast


Turn any article, document, or search query into a two-voice podcast — scraped, researched, scripted, and voiced by Google Gemini.

Feed it URLs, local files, or a topic to search. It scrapes the sources, optionally runs iterative Google-Search-grounded research, writes a natural back-and-forth dialogue between two hosts, and synthesises an MP3 (or WAV) with Gemini's multi-speaker TTS — plus a tidy folder of Markdown reports.

✨ Features

|  | Feature | Description |
|---|---|---|
| 🌐 | Any URL → podcast | Feed one or several article URLs; scraping, dialogue, and audio are handled end-to-end. |
| 📄 | Local documents | Include `.txt`, `.md`, `.html`, or `.pdf` files with `-f`; no network request is made. |
| 🔍 | Web-search queries | Pass a natural-language topic with `-s`; the research stage investigates it via Google Search grounding. |
| 🧠 | Iterative research | `--research N` runs N sequential grounded rounds, each drilling into the gaps the last one left. |
| 🎭 | Multi-voice TTS | Two distinct Gemini voices with configurable personalities, scene, and delivery cues. |
| 👥 | Named voice duos | Five built-in pairings (`contrast` default, `warm`, `explorer`, `journalist`, `debate`), or define your own from all 30 prebuilt Gemini voices. |
| 🎨 | Style & angle control | Presets, free-text style, per-episode angle, and per-speaker overlays, all without touching the baseline voice acting. |
| 📑 | Report folder | Generates `overview.md`, `sources.md`, `script.md`, `research.md`, and `summary.md` next to the audio. |
| 💸 | Token & cost tracking | Accumulates per-model token usage and estimates cost from configurable pricing. |
| 🥷 | Stealth fallback | Optional CloakBrowser retry for pages that block plain scraping (Cloudflare, 403/429, JS-only). |

🚀 Quickstart

Get a podcast out of a single URL in three steps:

# 1. Get the Gemini API key into your environment
export GEMINI_API_KEY=<your key>

# 2. Make sure ffmpeg is available (audio export needs it)
brew install ffmpeg            # macOS  ·  apt: sudo apt install ffmpeg

# 3. Run it — no install required
uvx tts-podcast run https://blog.example.com/article

That's it: you get an .mp3 plus a tts_<stem>/ folder of Markdown reports. Want to read the script before spending TTS tokens? Add -n for a dry run, which prints the dialogue to stdout without calling TTS.

Prefer a permanent install or pip? See Installation.

👥 Voice duos

A duo bundles both speakers — name, prebuilt Gemini voice, and baseline personality — under one slug, so you swap the whole pairing at once instead of editing speaker1 / speaker2 by hand.

tts-podcast duos          # list them (no API key needed)
tts-podcast run --duo journalist https://blog.example.com/article

Built-in duos

| Slug | Speaker 1 | Speaker 2 | Vibe |
|---|---|---|---|
| `contrast` (default) | Puck (Upbeat) | Kore (Firm) | High timbre contrast; Google's own multi-speaker pairing |
| `warm` | Sulafat (Warm) | Achird (Friendly) | Accessible, mainstream feel |
| `explorer` | Fenrir (Excitable) | Sadaltager (Knowledgeable) | Excited explorer + calm expert; well suited to popularising technical topics |
| `journalist` | Zephyr (Bright) | Algieba (Smooth) | Fast-paced tech-journalism feel |
| `debate` | Laomedeia (Upbeat) | Algenib (Gravelly) | Opposing viewpoints: optimist vs skeptic (pair with `--preset debate`) |

Gemini doesn't officially document voice gender; pairings are curated from each voice's official descriptor plus community reports. Audition them in Google AI Studio before committing.

Custom duos

Define your own under gemini.duos; they merge over the built-ins (same slug overrides, a new slug adds one):

gemini:
  default_duo: my_duo
  duos:
    my_duo:
      description: "my custom pairing"
      speaker1:
        name: Robin
        voice: Laomedeia   # Upbeat
        personality: "techno-optimist; champions the upside"
      speaker2:
        name: Sasha
        voice: Algenib     # Gravelly
        personality: "hard-nosed skeptic; probes risks and costs"

Resolution precedence: --duo › gemini.default_duo › legacy gemini.speaker1 / speaker2 blocks › built-in contrast. A config that defines only the legacy speakerN blocks keeps working unchanged.
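The merge rule and the precedence order above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `BUILTIN_DUOS`, `merge_duos`, and `resolve_duo` are hypothetical names, only two of the five built-in slugs are listed, and it assumes a custom slug replaces a built-in one wholesale and that an unknown slug is an error.

```python
from __future__ import annotations

# Hypothetical registry: two of the built-in slugs, voices taken from the table above.
BUILTIN_DUOS = {
    "contrast": {"speaker1": {"voice": "Puck"}, "speaker2": {"voice": "Kore"}},
    "debate": {"speaker1": {"voice": "Laomedeia"}, "speaker2": {"voice": "Algenib"}},
}


def merge_duos(builtin: dict, custom: dict | None) -> dict:
    """Overlay custom duos on the built-ins: same slug overrides, new slug adds."""
    return {**builtin, **(custom or {})}


def resolve_duo(cli_duo: str | None, gemini_cfg: dict) -> dict:
    """Pick the speaker pair following the documented precedence order."""
    duos = merge_duos(BUILTIN_DUOS, gemini_cfg.get("duos"))
    # 1. --duo on the command line, then 2. gemini.default_duo
    for slug in (cli_duo, gemini_cfg.get("default_duo")):
        if slug is not None:
            if slug not in duos:
                raise KeyError(f"unknown duo: {slug!r}")  # assumed behaviour
            return duos[slug]
    # 3. legacy gemini.speaker1 / speaker2 blocks
    if "speaker1" in gemini_cfg and "speaker2" in gemini_cfg:
        return {"speaker1": gemini_cfg["speaker1"], "speaker2": gemini_cfg["speaker2"]}
    # 4. built-in default
    return duos["contrast"]
```

With a config that sets `default_duo: my_duo`, `resolve_duo(None, cfg)` returns the custom pair, while `resolve_duo("contrast", cfg)` still wins because the CLI flag is checked first.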

🎚️ Usage

# Single URL, no research
tts-podcast run https://blog.example.com/article

# Multiple URLs with two rounds of complementary research
tts-podcast run -R 2 https://blog.example.com/a https://blog.example.com/b

# Local document — no network request
tts-podcast run -n -f paper.pdf

# Web-search query — research auto-bumped to 1 if it's the only input
tts-podcast run -n -s "agentic AI memory systems"

# Mixed: URL + local file + search query in one episode
tts-podcast run -n https://blog.example.com/article -f notes.md -s "follow-up topic"

# Preview the dialogue without calling TTS
tts-podcast run -n https://blog.example.com/article

# Generate script + report but skip audio synthesis
tts-podcast run -A https://blog.example.com/article

# Style & angle: nudge tone via preset + free text, focus on one angle
tts-podcast run -R 1 \
    --preset academic \
    --style "extra rigorous, French academic feel" \
    --angle "the regulatory implications" \
    https://blog.example.com/article

# Per-episode speaker overlay (TTS voice acting stays unchanged)
tts-podcast run \
    --speaker1-style "more skeptical than usual" \
    --speaker2-style "extra warm and forgiving" \
    https://blog.example.com/article

# Opposing viewpoints, structured as a debate
tts-podcast run --duo debate --preset debate https://blog.example.com/article

Running from a source checkout? Prefix every command with uv run (e.g. uv run tts-podcast run …).

Key flags

| Flag | Description |
|---|---|
| `-f, --file FILE` | Local document to include (repeatable). `.txt`, `.md`, `.html`, `.pdf`. |
| `-s, --search QUERY` | Web-search query to seed the podcast (repeatable). Auto-bumps research to 1 if search-only. |
| `-R, --research N` | Number of Google-Search-grounded research rounds (default `0`). |
| `--duo NAME` | Named voice duo (`contrast`, `warm`, `explorer`, `journalist`, `debate`). |
| `--preset NAME` | Style preset: `casual`, `academic`, `humorous`, `debate`, `vulgarized`, or `none`. |
| `--style TEXT` | Free-text style guidance (≤ 500 chars). Composes with `--preset`. |
| `--speaker1-style` / `--speaker2-style` | Per-episode overlay for one speaker; baseline voice unchanged. |
| `--angle TEXT` | Episode angle. Steers the dialogue and the first research round only. |
| `-d, --duration MIN` | Target episode duration in minutes. |
| `-n, --dry-run` | Print dialogue to stdout, no TTS. |
| `-A, --no-audio` | Generate script + report only. |
| `-o, --output-dir DIR` | Output directory (overrides config). |
| `--no-report` | Skip the report folder. |
| `-v, --verbose` | Enable DEBUG logging. |

Run tts-podcast run --help for the full list.
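The auto-bump rule for `-s` noted in the flags table can be pictured as a small function. This is a sketch under one assumption: "search-only" is taken to mean at least one search query with no URL and no file. The function name `effective_research_rounds` is hypothetical.

```python
def effective_research_rounds(urls, files, searches, research=0):
    """Return how many research rounds to run.

    Assumption: "search-only" means at least one -s query and no URL or -f file.
    """
    search_only = bool(searches) and not urls and not files
    if search_only and research < 1:
        # A bare query has no scraped content, so one grounded round is needed.
        return 1
    return research
```

A run such as `tts-podcast run -s "topic"` would map to `effective_research_rounds([], [], ["topic"])`, while an explicit `-R 3` is left untouched.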

⚙️ Configuration

Scaffold a config file, then export your Gemini API key:

tts-podcast config init
export GEMINI_API_KEY=<your key>

The config lives at $XDG_CONFIG_HOME/tts-podcast/config.yaml (typically ~/.config/tts-podcast/config.yaml). The full schema is in config.example.yaml. The API key is read at runtime from the environment variable named by gemini.api_key_env (default GEMINI_API_KEY); a local .env file is loaded automatically.

gemini:
  api_key_env: GEMINI_API_KEY
  default_duo: contrast        # persistent voice pairing
  dialogue:
    target_duration_minutes: 8
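The lookup described above (config path first, then the API key through the variable named in the config) might look like the following. It is a rough sketch, not the tool's implementation: `config_path` and `read_api_key` are hypothetical helpers, `.env` loading is left out, and raising on a missing key is an assumption.

```python
import os
from pathlib import Path


def config_path(env=None) -> Path:
    """Resolve config.yaml under $XDG_CONFIG_HOME, falling back to ~/.config."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "tts-podcast" / "config.yaml"


def read_api_key(gemini_cfg: dict, env=None) -> str:
    """Read the key from the variable named by gemini.api_key_env."""
    env = os.environ if env is None else env
    var = gemini_cfg.get("api_key_env", "GEMINI_API_KEY")
    key = env.get(var)
    if not key:
        # Assumed behaviour: fail early rather than send an empty key.
        raise RuntimeError(f"{var} is not set")
    return key
```

Passing `env` explicitly keeps both helpers easy to test without touching the real environment.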

📦 Installation

uvx tts-podcast …                # run without installing
uv tool install tts-podcast      # persistent install via uv
pipx install tts-podcast         # via pipx
pip install tts-podcast          # plain pip

Optional stealth-browser fallback (pulls a ~200 MB Chromium on first run):

uv tool install "tts-podcast[cloak]"

ffmpeg is required for audio export — skip only if you stick to --no-audio / --dry-run:

brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Debian / Ubuntu

From source

git clone https://github.com/obeone/tts-podcast.git
cd tts-podcast
uv sync                      # Python 3.13+
uv run tts-podcast --help

📂 Output layout

<output_dir>/
├── <stem>.mp3
└── tts_<stem>/
    ├── overview.md       # metadata, link breakdown, token/cost summary
    ├── sources.md        # per-source content (title, URL, summary, full text)
    ├── script.md         # full two-host dialogue
    ├── research.md       # only when --research >= 1
    └── summary.md        # synthetic reference sheet with categorised links

The stem combines the first URL's hostname, a 6-character digest of the URL list, and today's date, e.g. arxiv.org-a1b2c3-2026-06-07 for the audio file arxiv.org-a1b2c3-2026-06-07.mp3.
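A stem of that shape could be built as follows. The digest algorithm is not stated in this README, so SHA-256 over the newline-joined URL list is an assumption, as are the `output_stem` name and the `"local"` fallback for inputs without a hostname.

```python
import hashlib
from datetime import date
from urllib.parse import urlparse


def output_stem(urls, today=None):
    """Build '<hostname>-<6-char digest>-<YYYY-MM-DD>' from a list of URLs."""
    today = today or date.today()
    # Hostname of the first URL; "local" is a hypothetical fallback.
    host = urlparse(urls[0]).hostname or "local"
    # Assumed digest: first 6 hex characters of SHA-256 over the URL list.
    digest = hashlib.sha256("\n".join(urls).encode("utf-8")).hexdigest()[:6]
    return f"{host}-{digest}-{today.isoformat()}"
```

Because the digest depends only on the URL list, rerunning the same inputs on the same day yields the same stem.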

💸 Research cost note

Each --research round is a separate Gemini call with Google Search grounding enabled, which adds search overhead to the standard input-token cost. The tool logs the cumulative cost after each round, so you can watch the bill while iterating.
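To make the running total concrete, here is a hedged sketch of per-model token accumulation with a configurable price table. `UsageTracker` is a hypothetical class, the model name and prices in the usage note are placeholders rather than real Gemini pricing, and the search-grounding overhead is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    # {model: (input_price, output_price)} in currency units per 1M tokens.
    pricing: dict
    # {model: [input_tokens, output_tokens]}, accumulated across calls.
    tokens: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def add(self, model, input_tokens, output_tokens):
        """Record the token usage of one model call."""
        self.tokens[model][0] += input_tokens
        self.tokens[model][1] += output_tokens

    def cost(self):
        """Estimate the cumulative cost; unpriced models count as zero."""
        total = 0.0
        for model, (tin, tout) in self.tokens.items():
            pin, pout = self.pricing.get(model, (0.0, 0.0))
            total += tin / 1_000_000 * pin + tout / 1_000_000 * pout
        return total
```

Calling `add(...)` once per research round and logging `cost()` afterwards gives the kind of per-round running bill described above, for example with `UsageTracker({"model-a": (1.0, 2.0)})`.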

🧪 Development

uv sync                          # install deps (Python 3.13+)
uv run pytest tests/ -q          # run the test suite
uv run ruff check src/ tests/    # lint

Tests mock the Gemini SDK rather than hitting the network. See CLAUDE.md for the architecture deep-dive and key invariants.

🔊 How it works

flowchart TB
    subgraph IN[" Inputs "]
        U[🌐 URLs]
        F[📄 Files<br/>txt · md · html · pdf]
        S[🔍 Search queries]
    end

    U --> SC[web_scraper]
    F --> LL[local_loader]
    S --> SY[synthetic source]

    SC --> R{🧠 Research?<br/>--research N}
    LL --> R
    SY --> R

    R -->|optional| RR[Google Search<br/>grounded rounds]
    R --> D[💬 llm_summarizer<br/>two-host dialogue]
    RR --> D

    D --> T[🎙️ Gemini multi-speaker TTS<br/>parallel chunks]
    T --> A[🎧 audio_exporter<br/>MP3 / WAV]
    D --> REP[📑 report_generator<br/>Markdown folder]

The pipeline is strictly linear: each stage hands typed data to the next, with no hidden shared state. A failed scrape doesn't abort the run; it continues with whatever sources succeeded.
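The failure-tolerant hand-off between stages can be illustrated with a typed record and a collection loop. This is a sketch of the idea only: `Source` and `collect_sources` are hypothetical names, and the scraper is injected as a plain callable instead of the project's `web_scraper` module.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Source:
    """Immutable record passed from the scraping stage to the next stage."""
    origin: str
    text: str


def collect_sources(urls, scrape: Callable[[str], str]):
    """Scrape every URL; keep the successes and record the failures."""
    sources, failures = [], []
    for url in urls:
        try:
            sources.append(Source(origin=url, text=scrape(url)))
        except Exception as exc:
            # One bad page must not abort the whole run.
            failures.append((url, str(exc)))
    return sources, failures
```

The later stages then receive only the `sources` list, while `failures` can be surfaced in logs or in the report.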

📝 License

MIT © Grégoire Compagnon


This version: 0.5.0 (Jun 8, 2026)

Download files


Source Distribution

tts_podcast-0.5.0.tar.gz (80.4 kB view details)

Uploaded Jun 8, 2026 Source

Built Distribution


tts_podcast-0.5.0-py3-none-any.whl (64.3 kB view details)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file tts_podcast-0.5.0.tar.gz.

File metadata

Download URL: tts_podcast-0.5.0.tar.gz
Upload date: Jun 8, 2026
Size: 80.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tts_podcast-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `607d6ad3c4dd24baf9b5cf18eaca1c70b03c55a9760f63f0e7613cf9c1b23dbc` |
| MD5 | `b137483c7472f0fb6fae93a4fe6572ab` |
| BLAKE2b-256 | `8e2ee3157f4186c88b5afd915f7f61bf42d767f8acbd9461a60db46dc9673d7f` |


Provenance

The following attestation bundles were made for tts_podcast-0.5.0.tar.gz:

Publisher: publish.yml on obeone/tts-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tts_podcast-0.5.0.tar.gz
- Subject digest: 607d6ad3c4dd24baf9b5cf18eaca1c70b03c55a9760f63f0e7613cf9c1b23dbc
- Sigstore transparency entry: 1754862371
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: obeone/tts-podcast@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/obeone
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140
- Trigger Event: release

File details

Details for the file tts_podcast-0.5.0-py3-none-any.whl.

File metadata

Download URL: tts_podcast-0.5.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 64.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tts_podcast-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `663fc9934669f41bb574c1f7d723726c6aa41426683b58507051d3dad24aaf34` |
| MD5 | `90bc1451e202ab1e119e2ce805fd01c9` |
| BLAKE2b-256 | `66845891236d024658e66b4ad263b40a048652e689fe73d8c71c5a4fc00285b8` |


Provenance

The following attestation bundles were made for tts_podcast-0.5.0-py3-none-any.whl:

Publisher: publish.yml on obeone/tts-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tts_podcast-0.5.0-py3-none-any.whl
- Subject digest: 663fc9934669f41bb574c1f7d723726c6aa41426683b58507051d3dad24aaf34
- Sigstore transparency entry: 1754862377
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: obeone/tts-podcast@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/obeone
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7faf4b7f77559a35ffc2502ebeaf72a3b91e140
- Trigger Event: release
