

rss2podcast

Turn an RSS feed into a listenable podcast. Fetches each entry, extracts the article body, sends it to gopipertts for Piper TTS synthesis, and publishes the resulting MP3s as a standards-compliant podcast RSS feed.

How it works

  1. Parse the source RSS feed.
  2. For each entry not yet processed (tracked in state.json):
    • Fetch the linked article and extract the body with trafilatura (falls back to RSS summary).
    • Strip HTML to plain speakable text.
    • POST to gopipertts /api/tts, save the MP3 to disk.
    • Append the entry to state.json immediately (crash-safe).
  3. Rewrite feed.xml from the full state with feedgen (iTunes namespace).
  4. Write style.xsl to the output root so browsers render feeds as a readable HTML page.
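The per-entry loop above can be sketched in a few lines of Python. This is an illustration only: the function names (`process_entries`, `load_state`) and the state fields (`id`, `title`) are assumptions, not the project's actual internals.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> list[dict]:
    """Return previously processed entries, or an empty list on first run."""
    if state_path.exists():
        return json.loads(state_path.read_text())
    return []


def process_entries(entries: list[dict], state_path: Path, synthesize) -> list[dict]:
    """Skip already-seen entry ids; persist state after each new entry (crash-safe)."""
    state = load_state(state_path)
    seen = {e["id"] for e in state}
    for entry in entries:
        if entry["id"] in seen:
            continue  # already processed on a previous run: idempotent re-runs
        synthesize(entry)  # fetch, extract, TTS -> MP3
        state.append({"id": entry["id"], "title": entry["title"]})
        state_path.write_text(json.dumps(state))  # commit immediately
    return state
```

Because the state file is rewritten after every entry, a crash mid-run loses at most the entry in flight; the next run picks up where it left off.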

Install

uv sync

Or install as a tool:

uv tool install .

Usage

Single feed (CLI)

EFF Updates (equivalent to the config.sample.yaml entry):

uv run rss2podcast \
  --feed-url https://www.eff.org/rss/updates.xml \
  --feed-name "EFF Updates" \
  --output-dir podcasts \
  --url-root https://podcasts.example.com \
  --tts-endpoint http://localhost:8080/ \
  --voice en_US-amy-medium \
  --description "EFF updates, narrated by Piper TTS" \
  --author EFF \
  --limit 5

Ars Technica:

uv run rss2podcast \
  --feed-url https://feeds.arstechnica.com/arstechnica/index \
  --feed-name "Ars Technica" \
  --output-dir podcasts \
  --url-root https://podcasts.example.com \
  --tts-endpoint http://localhost:8080/ \
  --voice en_US-amy-medium \
  --description "Ars Technica articles, narrated by Piper TTS" \
  --author "Ars Technica" \
  --limit 1 \
  --prune-xpath '//div[contains(@class,"author-bio")]' \
  --merge-xpath '//div[contains(@class,"post-content")]'

Hackaday:

uv run rss2podcast \
  --feed-url https://hackaday.com/blog/feed/ \
  --feed-name Hackaday \
  --output-dir podcasts \
  --url-root https://podcasts.example.com \
  --tts-endpoint http://localhost:8080/ \
  --voice en_US-amy-low \
  --description "Hackaday articles, narrated by Piper TTS" \
  --author Hackaday \
  --prune-xpath '//div[contains(@class,"author-bio")]' \
  --prune-xpath '//section[contains(@class,"related")]'

Multi-feed (YAML)

uv run rss2podcast --config config.yaml

See config.sample.yaml for a fully annotated example.

Output layout

{output_dir}/
  style.xsl
  {feed-slug}/
    state.json
    feed.xml
    2026-04-16-some-post-title-abc12345.mp3
    ...

Serve {output_dir} over HTTP at {url_root} and subscribe to {url_root}/{feed-slug}/feed.xml in a podcast app. When style_rss_feed is enabled (the default), opening feed.xml directly in a browser renders it as a human-readable HTML page with an inline audio player.
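The MP3 filenames in the layout above follow a date, title slug, short hash pattern. A plausible scheme is sketched below; the exact slug rules and hash source are assumptions for illustration, not the tool's documented behavior.

```python
import hashlib
import re


def slugify(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with hyphens, trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def episode_filename(published: str, title: str, entry_id: str) -> str:
    """Date + title slug + 8-char hash of the entry id, matching the layout above."""
    digest = hashlib.sha256(entry_id.encode()).hexdigest()[:8]
    return f"{published}-{slugify(title)}-{digest}.mp3"
```

The hash suffix keeps filenames unique even when two posts share a title on the same day.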

Scheduling

Designed to run as a cron / scheduled job. Re-runs are idempotent — entries already in state.json are skipped. Long runs are fine (no time limits, state is committed after each entry).
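A typical crontab entry might look like this; the schedule, working directory, and log path are illustrative.

```
# Run hourly at minute 15; paths are examples, adjust to your install
15 * * * * cd /opt/rss2podcast && uv run rss2podcast --config config.yaml >> /var/log/rss2podcast.log 2>&1
```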

CLI reference

Feed selection

--config PATH: YAML config file; enables multi-feed mode (--limit may still be used to override the YAML value)
--feed-url URL: Source RSS feed URL (single-feed mode)
--feed-name NAME: Feed display name; also determines the output subdirectory slug
--output-dir PATH: Directory to write state.json, feed.xml, and MP3s
--url-root URL: Public base URL where output-dir is served

TTS

--tts-endpoint URL (default: http://localhost:8080): gopipertts base URL
--voice MODEL (default: en_US-amy-low): Piper voice model name
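The synthesis call is an HTTP POST to the gopipertts endpoint. The sketch below builds such a request with the standard library; note that the JSON field names (`voice`, `text`) are assumptions, not the documented gopipertts schema, and only the /api/tts path comes from this README.

```python
import json
import urllib.request


def build_tts_request(endpoint: str, voice: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the /api/tts endpoint.

    The payload field names here are guesses for illustration.
    """
    payload = json.dumps({"voice": voice, "text": text}).encode()
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return the MP3 bytes to write to disk.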

Feed metadata

--description TEXT: Channel description
--author TEXT: Channel author
--image-url URL: Channel artwork URL

Processing

--limit N: Keep only the N newest articles per feed; entries that roll out of the window are evicted from state and removed from the podcast feed
--save-text (default: off): Persist raw/clean text in state.json (useful for debugging)
--no-fetch (default: off): Skip external crawling; use only RSS content/description
--no-style-rss-feed (default: styling on): Disable XSLT styling; skip style.xsl and omit the processing instruction from feed.xml
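The eviction behavior of --limit can be sketched as follows. The oldest-first state ordering and the `filename` field are assumptions for illustration, not the tool's actual data model.

```python
from pathlib import Path


def apply_limit(state: list[dict], limit: int, feed_dir: Path) -> list[dict]:
    """Keep only the `limit` newest entries (state assumed ordered oldest-first);
    delete the MP3s of evicted entries so the served files match the feed."""
    if limit <= 0 or len(state) <= limit:
        return state
    kept, evicted = state[-limit:], state[:-limit]
    for entry in evicted:
        (feed_dir / entry["filename"]).unlink(missing_ok=True)
    return kept
```

Deleting evicted MP3s keeps the output directory from growing without bound on long-running feeds.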

Extraction tuning

These control how trafilatura extracts article text from fetched pages. Defaults are tuned for broad recall; tighten them for noisy feeds.

--no-favor-recall (default: recall on): Disable recall-biased extraction; fall back to trafilatura's default balanced mode
--favor-precision (default: off): Prefer fewer, higher-confidence text blocks; reduces sidebar/bio bleed-through at the cost of occasionally truncating real content
--include-comments (default: off): Include comment sections in extracted text
--include-tables (default: off): Include table content
--deduplicate (default: off): Remove duplicate text blocks (useful for feeds that repeat headlines or teasers)
--fast-extraction (default: off): Skip fallback extractors; faster but may miss content on harder pages
--prune-xpath XPATH: XPath expression to remove from the DOM before extraction; repeatable. Use this to surgically excise author bios, related-article widgets, cookie banners, etc.
--merge-xpath XPATH: XPath matching split article containers; children of all matches are concatenated into the first match before extraction. Repeatable. Use this when a site breaks the article body across sibling containers around mid-article ads (e.g. Ars Technica's split post-content divs), which otherwise causes trafilatura to truncate at the first ad break.

--prune-xpath examples:

# Remove Ars Technica author bio
--prune-xpath '//div[contains(@class,"author-bio")]'

# Remove multiple sections
--prune-xpath '//aside' --prune-xpath '//div[@id="related"]'

--merge-xpath example:

# Ars Technica splits the article body across multiple <div class="post-content">
# siblings separated by ad wrappers; merging them avoids mid-article truncation.
--merge-xpath '//div[contains(@class,"post-content")]'
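The prune and merge operations can be demonstrated with the standard library's ElementTree. Caveat: ElementTree supports only a limited XPath subset (no `contains()`), so this sketch matches exact class attributes; the real tool's full-XPath examples above would need a library like lxml. This is a conceptual illustration, not the project's implementation.

```python
import xml.etree.ElementTree as ET


def prune(root: ET.Element, xpath: str) -> None:
    """Remove every element matching `xpath` from the tree before extraction."""
    doomed = set(root.findall(xpath))
    for parent in root.iter():
        for child in list(parent):
            if child in doomed:
                parent.remove(child)


def merge(root: ET.Element, xpath: str) -> None:
    """Move the children of later matches into the first match, so a split
    article body reads as one container."""
    matches = root.findall(xpath)
    if len(matches) < 2:
        return
    first = matches[0]
    for container in matches[1:]:
        for child in list(container):
            container.remove(child)
            first.append(child)
```

After merging, an extractor that stops at the first container's boundary sees the whole article instead of truncating at the first ad break.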

YAML config reference

Top-level keys:

output_dir: /var/www/podcasts        # required
url_root: https://podcasts.example.com  # required
tts_endpoint: http://gopipertts:8080 # default: http://localhost:8080
limit: 5                             # optional: process only N newest per feed
save_text: false                     # optional: persist text in state.json
no_fetch: false                      # optional: skip external crawling globally
style_rss_feed: true                 # default: true; set to false to disable XSLT browser rendering

Per-feed keys:

feeds:
  - name: My Feed        # required
    url: https://...     # required

    # TTS
    voice: en_US-amy-medium   # default: en_US-amy-low

    # Feed metadata
    description: "..."
    author: "..."
    image_url: "https://..."

    # Processing (override top-level defaults)
    limit: 5             # optional: overrides top-level limit for this feed

    # Extraction tuning
    favor_recall: true         # default: true
    favor_precision: false     # default: false
    include_comments: false    # default: false
    include_tables: false      # default: false
    deduplicate: false         # default: false
    fast_extraction: false     # default: false
    prune_xpath:               # default: null
      - '//div[contains(@class,"author-bio")]'
      - '//section[@id="related"]'
    merge_xpath:               # default: null
      - '//div[contains(@class,"post-content")]'

favor_recall and favor_precision are independent trafilatura flags. Setting both to true is valid; trafilatura will apply both biases simultaneously.
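Per-feed keys override top-level ones, which in turn override built-in defaults. A minimal merge sketch, assuming plain dict precedence (the defaults listed come from this README; the exact merge mechanics are an assumption):

```python
def effective_config(top: dict, feed: dict) -> dict:
    """Resolve one feed's settings: built-in defaults < top-level YAML < per-feed keys."""
    defaults = {
        "tts_endpoint": "http://localhost:8080",
        "voice": "en_US-amy-low",
        "favor_recall": True,
        "favor_precision": False,
        "save_text": False,
        "no_fetch": False,
        "style_rss_feed": True,
    }
    return {**defaults, **top, **feed}
```

With dict unpacking, later sources win, so a per-feed `limit: 1` beats a top-level `limit: 5`.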

Tests

uv sync --extra test
uv run pytest
