Turn an RSS feed into a podcast by synthesizing entries with Piper TTS
rss2podcast
Turn an RSS feed into a listenable podcast. Fetches each entry, extracts the article body, sends it to gopipertts for Piper TTS synthesis, and publishes the resulting MP3s as a standards-compliant podcast RSS feed.
How it works
- Parse the source RSS feed.
- For each entry not yet processed (tracked in state.json):
  - Fetch the linked article and extract the body with trafilatura (falls back to RSS summary).
  - Strip HTML to plain speakable text.
  - POST to gopipertts /api/tts, save the MP3 to disk.
  - Append the entry to state.json immediately (crash-safe).
- Rewrite feed.xml from the full state with feedgen (iTunes namespace).
- Write style.xsl to the output root so browsers render feeds as a readable HTML page.
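The per-entry loop above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`load_state`, `save_state`, `process_feed`), the state layout (a JSON list of objects keyed by `id`), and the injected `synthesize` callable standing in for the extraction and TTS steps are all assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_state(path: Path) -> list:
    """Return previously processed entries, or an empty list on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def save_state(path: Path, state: list) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)


def process_feed(
    entries: Iterable[dict],
    state_path: Path,
    synthesize: Callable[[str], str],
) -> int:
    """Process unseen entries and return how many were newly synthesized."""
    state = load_state(state_path)
    seen = {item["id"] for item in state}
    processed = 0
    for entry in entries:
        if entry["id"] in seen:
            continue  # already handled on an earlier run
        audio_file = synthesize(entry["text"])  # hypothetical TTS hook
        state.append(
            {"id": entry["id"], "title": entry["title"], "audio": audio_file}
        )
        save_state(state_path, state)  # commit after every entry
        seen.add(entry["id"])
        processed += 1
    return processed
```

Committing state after each entry, rather than once at the end, is what lets an interrupted run resume without re-synthesizing audio that already exists.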
Install
uv sync
Or install as a tool:
uv tool install .
Usage
Single feed (CLI)
EFF Updates (equivalent to the config.sample.yaml entry):
uv run rss2podcast --feed-url https://www.eff.org/rss/updates.xml --feed-name "EFF Updates" --output-dir podcasts --url-root https://podcasts.example.com --tts-endpoint http://localhost:8080/ --voice en_US-amy-medium --description "EFF updates, narrated by Piper TTS" --author EFF --limit 5
Ars Technica:
uv run rss2podcast --feed-url https://feeds.arstechnica.com/arstechnica/index --feed-name "Ars Technica" --output-dir podcasts --url-root https://podcasts.example.com --tts-endpoint http://localhost:8080/ --voice en_US-amy-medium --description "Ars Technica articles, narrated by Piper TTS" --author "Ars Technica" --limit 1 --prune-xpath '//div[contains(@class,"author-bio")]' --merge-xpath '//div[contains(@class,"post-content")]'
Hackaday:
uv run rss2podcast --feed-url https://hackaday.com/blog/feed/ --feed-name Hackaday --output-dir podcasts --url-root https://podcasts.example.com --tts-endpoint http://localhost:8080/ --voice en_US-amy-low --description "Hackaday articles, narrated by Piper TTS" --author Hackaday --prune-xpath '//div[contains(@class,"author-bio")]' --prune-xpath '//section[contains(@class,"related")]'
Multi-feed (YAML)
uv run rss2podcast --config config.yaml
See config.sample.yaml for a fully annotated example.
Output layout
{output_dir}/
  style.xsl
  {feed-slug}/
    state.json
    feed.xml
    2026-04-16-some-post-title-abc12345.mp3
    ...
Serve {output_dir} over HTTP at {url_root} and subscribe to {url_root}/{feed-slug}/feed.xml in a podcast app. When style_rss_feed is enabled (the default), opening feed.xml directly in a browser renders it as a human-readable HTML page with an inline audio player.
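As a rough illustration of how the subscription URL relates to the layout above, the sketch below joins a URL root and a feed name. The `slugify` rule shown here (lowercase, non-alphanumeric runs collapsed to a hyphen) is an assumption for the example; the slug rss2podcast actually generates may differ, so check the directory name it creates.

```python
import re


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, collapse non-alphanumerics to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def feed_url(url_root: str, feed_name: str) -> str:
    """Build {url_root}/{feed-slug}/feed.xml, tolerating a trailing slash."""
    return f"{url_root.rstrip('/')}/{slugify(feed_name)}/feed.xml"


print(feed_url("https://podcasts.example.com", "EFF Updates"))
# https://podcasts.example.com/eff-updates/feed.xml
```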
Scheduling
Designed to run as a cron or other scheduled job. Re-runs are idempotent: entries already in state.json are skipped. Long runs are safe because there are no time limits and state is committed after each entry.
CLI reference
Feed selection
| Flag | Default | Description |
|---|---|---|
| --config PATH | — | YAML config file; enables multi-feed mode (--limit may still be used to override the YAML value) |
| --feed-url URL | — | Source RSS feed URL (single-feed mode) |
| --feed-name NAME | — | Feed display name; also determines the output subdirectory slug |
| --output-dir PATH | — | Directory to write state.json, feed.xml, and MP3s |
| --url-root URL | — | Public base URL where output-dir is served |
TTS
| Flag | Default | Description |
|---|---|---|
| --tts-endpoint URL | http://localhost:8080 | gopipertts base URL |
| --voice MODEL | en_US-amy-low | Piper voice model name |
Feed metadata
| Flag | Default | Description |
|---|---|---|
| --description TEXT | — | Channel description |
| --author TEXT | — | Channel author |
| --image-url URL | — | Channel artwork URL |
Processing
| Flag | Default | Description |
|---|---|---|
| --limit N | — | Keep only the N newest articles per feed; entries that roll out of the window are evicted from state and removed from the podcast feed |
| --save-text | off | Persist raw/clean text in state.json (useful for debugging) |
| --no-fetch | off | Skip external crawling; use only RSS content/description |
| --no-style-rss-feed | style on | Disable XSLT styling; skip style.xsl and omit the processing instruction from feed.xml |
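The eviction behaviour of --limit can be pictured with the sketch below. It is only an illustration under stated assumptions: the `apply_limit` helper and the `published` key (an ISO 8601 date string, so plain string sorting orders entries by date) are hypothetical, and whether the real tool also deletes the evicted MP3 files is not stated here.

```python
from typing import List, Optional, Tuple


def apply_limit(
    state: List[dict], limit: Optional[int]
) -> Tuple[List[dict], List[dict]]:
    """Split state into (kept, evicted), keeping the `limit` newest entries."""
    # Newest first; ISO 8601 date strings sort chronologically as text.
    ordered = sorted(state, key=lambda e: e["published"], reverse=True)
    if limit is None:
        return ordered, []  # no limit set: nothing is evicted
    return ordered[:limit], ordered[limit:]


state = [
    {"id": "a", "published": "2026-04-14"},
    {"id": "b", "published": "2026-04-16"},
    {"id": "c", "published": "2026-04-15"},
]
kept, evicted = apply_limit(state, 2)
print([e["id"] for e in kept], [e["id"] for e in evicted])
# ['b', 'c'] ['a']
```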
Extraction tuning
These control how trafilatura extracts article text from fetched pages. Defaults are tuned for broad recall; tighten them for noisy feeds.
| Flag | Default | Description |
|---|---|---|
| --no-favor-recall | recall on | Disable recall-biased extraction; fall back to trafilatura's default balanced mode |
| --favor-precision | off | Prefer fewer, higher-confidence text blocks; reduces sidebar/bio bleed-through at the cost of occasionally truncating real content |
| --include-comments | off | Include comment sections in extracted text |
| --include-tables | off | Include table content |
| --deduplicate | off | Remove duplicate text blocks (useful for feeds that repeat headlines or teasers) |
| --fast-extraction | off | Skip fallback extractors; faster but may miss content on harder pages |
| --prune-xpath XPATH | — | XPath expression to remove from the DOM before extraction; repeatable. Use this to surgically excise author bios, related-article widgets, cookie banners, etc. |
| --merge-xpath XPATH | — | XPath matching split article containers; children of all matches are concatenated into the first match before extraction. Repeatable. Use this when a site breaks the article body across sibling containers around mid-article ads (e.g. Ars Technica's split post-content divs), which otherwise causes trafilatura to truncate at the first ad break. |
--prune-xpath examples:
# Remove Ars Technica author bio
--prune-xpath '//div[contains(@class,"author-bio")]'
# Remove multiple sections
--prune-xpath '//aside' --prune-xpath '//div[@id="related"]'
--merge-xpath example:
# Ars Technica splits the article body across multiple <div class="post-content">
# siblings separated by ad wrappers; merging them avoids mid-article truncation.
--merge-xpath '//div[contains(@class,"post-content")]'
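To make the two DOM operations concrete, here is a simplified sketch of pruning and merging on a small tree. It is not the tool's implementation: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `contains()`), so the paths below use exact attribute matches instead of the expressions shown above. The `prune` and `merge` helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def prune(root: ET.Element, path: str) -> None:
    """Remove every element matching `path` (its tail text goes with it)."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(path):
        parent = parents.get(node)
        if parent is not None:
            parent.remove(node)


def merge(root: ET.Element, path: str) -> None:
    """Move child elements of all later matches into the first match."""
    matches = root.findall(path)
    if len(matches) < 2:
        return
    first = matches[0]
    for node in matches[1:]:
        for child in list(node):  # copy: we mutate node while iterating
            node.remove(child)
            first.append(child)


page = (
    "<article>"
    "<div class='post-content'><p>Part one.</p></div>"
    "<div class='ad'>Advert</div>"
    "<div class='post-content'><p>Part two.</p></div>"
    "<div class='author-bio'><p>About the author.</p></div>"
    "</article>"
)
root = ET.fromstring(page)
prune(root, ".//div[@class='author-bio']")
merge(root, ".//div[@class='post-content']")
first = root.find(".//div[@class='post-content']")
print([p.text for p in first.findall("p")])
# ['Part one.', 'Part two.']
```

After merging, the whole article body sits in a single container, so an extractor that stops at the first ad wrapper no longer loses the second half.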
YAML config reference
Top-level keys:
output_dir: /var/www/podcasts # required
url_root: https://podcasts.example.com # required
tts_endpoint: http://gopipertts:8080 # default: http://localhost:8080
limit: 5 # optional: keep only the N newest entries per feed
save_text: false # optional: persist text in state.json
no_fetch: false # optional: skip external crawling globally
style_rss_feed: true # default: true; set to false to disable XSLT browser rendering
Per-feed keys:
feeds:
  - name: My Feed # required
    url: https://... # required
    # TTS
    voice: en_US-amy-medium # default: en_US-amy-low
    # Feed metadata
    description: "..."
    author: "..."
    image_url: "https://..."
    # Processing (override top-level defaults)
    limit: 5 # optional: overrides top-level limit for this feed
    # Extraction tuning
    favor_recall: true # default: true
    favor_precision: false # default: false
    include_comments: false # default: false
    include_tables: false # default: false
    deduplicate: false # default: false
    fast_extraction: false # default: false
    prune_xpath: # default: null
      - '//div[contains(@class,"author-bio")]'
      - '//section[@id="related"]'
    merge_xpath: # default: null
      - '//div[contains(@class,"post-content")]'
favor_recall and favor_precision are independent trafilatura flags. Setting both to true is valid; trafilatura will apply both biases simultaneously.
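The way per-feed keys layer over top-level values and built-in defaults can be sketched as a simple dictionary merge. This is an assumption-laden illustration: the `resolve_feed` helper is hypothetical, the defaults are copied from the comments above, and only `limit` is treated as inherited from the top level because that is the only inheritance the reference describes.

```python
# Defaults taken from the per-feed key comments in this reference.
FEED_DEFAULTS = {
    "voice": "en_US-amy-low",
    "favor_recall": True,
    "favor_precision": False,
    "include_comments": False,
    "include_tables": False,
    "deduplicate": False,
    "fast_extraction": False,
    "prune_xpath": None,
    "merge_xpath": None,
}

# Assumption: only `limit` falls through from the top level to each feed.
INHERITED_KEYS = ("limit",)


def resolve_feed(top_level: dict, feed: dict) -> dict:
    """Layer settings: built-in defaults < top-level values < per-feed values."""
    settings = dict(FEED_DEFAULTS)
    for key in INHERITED_KEYS:
        if key in top_level:
            settings[key] = top_level[key]
    settings.update(feed)  # per-feed keys win
    return settings


config = {"output_dir": "/var/www/podcasts", "limit": 5}
feed = {"name": "My Feed", "url": "https://example.com/rss", "limit": 1}
resolved = resolve_feed(config, feed)
print(resolved["limit"], resolved["voice"])
# 1 en_US-amy-low
```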
Tests
uv sync --extra test
uv run pytest