A local-first, generic-entity podcast knowledge base.
easy-podcast
A local-first, generic-entity podcast knowledge base. Your data lives in a folder you own — durable, human-readable, and portable — with full-text search and a query/graph index on top. Built to grow into transcript search, story segmentation, and cross-podcast connections (same story, author, voice).
Status: v0.7 — feature-complete. The generic store, RSS ingestion, transcription, story segmentation, cross-podcast connections, and a local web UI are all implemented and tested (see What it does). Everything is local and private — no LLMs, no cloud.
Quick start (Docker)
The server ships the phone app inside it. Run it hardened, non-root, exposed
only on localhost — one command:
docker compose up -d --build # serves on http://localhost:8765
# populate your library (via the CLI inside the container):
docker compose exec easypod easy-podcast --data-dir /data add <feed-url>
docker compose exec easypod easy-podcast --data-dir /data sync
docker compose exec easypod easy-podcast --data-dir /data download <podcast_id>
Open http://localhost:8765 and "Add to Home Screen" — that's the installable, offline PWA. To reach it from your phone, the deploy stack (docker-compose.deploy.yml) fronts it with a bundled Tailscale sidecar (HTTPS over the tailnet, no open ports); the full runbook is in docs/SECURE_DEPLOY.md, and a from-scratch, layer-by-layer explainer is in docs/SYNC_DEEP_DIVE.md. Prefer to run it directly (no Docker)? See Install.
Principles — built to respect creators
This is a personal tool for organizing and exploring podcasts you listen to. It is built to always respect human authors and human voice actors:
- No AI training on anyone's content. It honors creators who ask not to be used for AI — their feeds are excluded from examples and tests.
- No LLMs. Transcription (via whisperx) and all analysis run locally and privately; nothing leaves your machine.
- Identify and credit, never imitate. The "connections" features detect and link human authors and voice actors across episodes in order to credit their work — never to clone, synthesize, or impersonate a voice or an author's writing.
- Polite by default. Downloads are rate-limited, use conditional requests, and respect server back-off, so the app never hammers anyone's servers.
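The "polite by default" behaviour can be pictured with a small sketch. It is illustrative only and is not the project's downloader: the helper names `conditional_headers` and `backoff_seconds` are hypothetical, and it assumes the `ETag` / `Last-Modified` validators were saved from an earlier response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_headers(etag=None, last_modified=None):
    """Build headers for a conditional GET from validators saved earlier."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def backoff_seconds(retry_after, now=None, default=60.0):
    """Interpret a Retry-After header, which is delta-seconds or an HTTP-date."""
    if not retry_after:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to a fixed wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A client built this way would send the conditional headers on every refresh, treat a `304 Not Modified` as "nothing to do", and sleep for `backoff_seconds(...)` before retrying after a `429` or `503`.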
Design in one breath
Truth is a Ledger: one JSON file per entity under docs/<kind>/, written
atomically. A Lens (an embedded SQLite database, lens.sqlite) is a
derived, disposable index over the Ledger providing exact-match queries,
full-text search, and referential edges. Delete the Lens and rebuild it from
the Ledger at any time. The store is domain-agnostic — any frozen-dataclass
Entity is stored and queried through one mechanism.
from easy_podcast import Store, Podcast, Episode, DEFAULT_SPECS

store = Store.open("~/PodcastLibrary")  # the only storage decision; back up = copy the folder
for spec in DEFAULT_SPECS:
    store.register(spec)

show = Podcast.create("https://feeds.example/magnus-archives", title="The Magnus Archives")
store.repo(Podcast).put(show)
store.repo(Episode).put(Episode(
    id=Episode.id_for("guid-42", "https://cdn/ep42.mp3"),
    podcast_id=show.id, title="The Lighthouse", description="A keeper vanishes.",
    audio_url="https://cdn/ep42.mp3",
))

# exact-match query on a promoted field, and full-text search
store.repo(Episode).find(podcast_id=show.id)
for hit in store.search("lighthouse keeper"):
    print(hit.ref, hit.snippet)
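To make the Ledger/Lens split concrete, here is a minimal standard-library sketch of the two ideas: an atomic JSON write and a throwaway index rebuilt from the files. It is not the package's implementation. The function names, the `<id>.json` file naming, and the single `entity` table are assumptions made for illustration; the real Lens also provides full-text search and edges, which this sketch leaves out.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path


def write_entity_atomically(root, kind, entity_id, data):
    """Write docs/<kind>/<id>.json via a temp file and an atomic rename."""
    folder = Path(root) / "docs" / kind
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{entity_id}.json"  # file naming is an assumption
    fd, tmp_name = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Readers see either the old file or the new one, never a half-written one.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def rebuild_index(root, db_path):
    """Delete the index and re-derive it from the JSON files on disk."""
    db_path = Path(db_path)
    if db_path.exists():
        db_path.unlink()
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE entity (kind TEXT, id TEXT, body TEXT, PRIMARY KEY (kind, id))"
    )
    for path in sorted((Path(root) / "docs").glob("*/*.json")):
        conn.execute(
            "INSERT INTO entity VALUES (?, ?, ?)",
            (path.parent.name, path.stem, path.read_text(encoding="utf-8")),
        )
    conn.commit()
    return conn
```

Because the index is derived purely from the files, losing or corrupting it costs only a rebuild, and a backup really is a copy of the folder.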
Adding a new entity type is a dataclass + one line
from dataclasses import dataclass
from typing import ClassVar

from easy_podcast import Entity, EntitySpec


@dataclass(frozen=True)
class Bookmark(Entity):
    kind: ClassVar[str] = "bookmark"
    episode_id: str = ""
    note: str = ""
    t: float = 0.0


store.register(EntitySpec(Bookmark, promote=["episode_id"], fts=["note"]))
# `ep` stands for an Episode that already exists in the store
store.repo(Bookmark).put(Bookmark(id="b1", episode_id=ep.id, note="great twist", t=842.0))
No new persistence module, no schema migration.
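One way to see why a new entity type needs no migration: if the index only ever receives a generic row derived from the dataclass, new types simply produce new rows. The sketch below is a guess at that idea and not the library's code. `BookmarkSketch` and `index_row` are hypothetical names, and reading `promote` as "fields copied out for exact-match queries" and `fts` as "fields fed to full-text search" is an interpretation of the example above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BookmarkSketch:
    """Stand-in for an entity; the real base class is easy_podcast.Entity."""
    id: str
    episode_id: str = ""
    note: str = ""
    t: float = 0.0


def index_row(entity, promote, fts):
    """Split an entity into the pieces a generic index would need."""
    data = asdict(entity)
    return {
        "id": data["id"],
        # fields copied out so they can be matched exactly
        "promoted": {name: data[name] for name in promote},
        # text concatenated for a full-text index
        "fts_text": " ".join(str(data[name]) for name in fts),
    }


row = index_row(
    BookmarkSketch(id="b1", episode_id="e42", note="great twist", t=842.0),
    promote=["episode_id"],
    fts=["note"],
)
```

Fields that are neither promoted nor full-text indexed (here `t`) would still live in the JSON document; they are just not queryable through the index.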
What it does
The core is pure-Python (feedparser + requests) and the intelligence layers are optional extras you add only if you want them. Nothing uses an LLM; the ML is local transcription and small local embeddings.
| Capability | Module | Extra |
|---|---|---|
| Generic store (Ledger + Lens), atomic writes, FTS, edges | store/ | core |
| RSS ingestion: fault-tolerant parse, self-throttling verified download, content-based sync, SSRF/decompression-bomb defenses | ingest/ | core |
| Transcription with word-level timestamps (whisperx align) + optional speaker diarization | pipelines/transcribe.py | transcribe |
| Creator-authored chapters (Podcasting 2.0 podcast:chapters) | pipelines/chapters.py | core |
| Story segmentation of anthology episodes + author/narrator extraction (heuristics, no LLM) | pipelines/segment.py | transcribe |
| Local text + voice embeddings → cross-podcast connections (same story / author / transcript / voice) | pipelines/{embed,connect}.py | connections |
| Full-text + transcript search | store/lens.py (FTS5) | core |
| Local web UI (localhost-only, CSRF-guarded) | web/ | web |
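As an illustration of what an SSRF defense in feed ingestion typically involves, the sketch below rejects addresses that point back into the local machine or network. It is a generic technique written with the standard `ipaddress` module, not the code in ingest/; the function name `is_public_ip` is hypothetical.

```python
import ipaddress


def is_public_ip(address):
    """Return True only for addresses that are reasonable to fetch a feed from."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP literal at all: refuse rather than guess
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local   # includes the 169.254.169.254 metadata address
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

A complete defense would resolve the feed's hostname, apply this check to every resolved address, connect to the address that was checked, and repeat the check on each redirect.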
Install
pip install -e ".[dev]" # core: feedparser, requests (pure-python, 3.10–3.13)
pip install -e ".[transcribe]" # whisperx transcription/diarization (needs Python 3.10–3.12)
pip install -e ".[connections]" # local embeddings + connection-finding (sentence-transformers)
pip install -e ".[web]" # local Flask web UI
Transcription needs the ffmpeg binary on PATH. Speaker diarization
additionally needs a HuggingFace token (--hf-token or HF_TOKEN).
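If you want to check these prerequisites before starting a long transcription run, a small preflight along these lines may help. It is a hypothetical helper, not part of the easy-podcast CLI; the `which` and `env` parameters exist only so the function can be exercised without touching the real system.

```python
import os
import shutil


def transcription_preflight(diarize=False, which=shutil.which, env=os.environ):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if diarize and not env.get("HF_TOKEN"):
        problems.append("speaker diarization needs HF_TOKEN (or --hf-token)")
    return problems


if __name__ == "__main__":
    for problem in transcription_preflight(diarize=True):
        print("missing:", problem)
```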
Command line
easy-podcast add https://rss.acast.com/themagnusarchives # subscribe
easy-podcast sync # fetch new episodes
easy-podcast download <podcast_id> # polite, throttled
easy-podcast transcribe <episode_id> --model base # + --diarize (needs HF token)
easy-podcast chapters <episode_id> # creator chapters, if published
easy-podcast segment <episode_id> # split into stories + credits
easy-podcast embed <episode_id> # local story embeddings
easy-podcast connections # find cross-episode links
easy-podcast search "lighthouse keeper" # full-text + transcript
easy-podcast serve # local web UI on 127.0.0.1:8000
The web UI binds to 127.0.0.1 only — there is intentionally no option to
bind a public address, and the app refuses any non-loopback client, so it can't
be exposed to a network by accident. The UI is unauthenticated by design; to
reach it from another device (e.g. your phone), put a proxy that adds auth + TLS
in front of the loopback port — tailscale serve <port>
is the easy, private option (only your own enrolled devices can connect).
Listen on your phone
The phone app is a PWA — no app store, no native build. Your library lives on the phone (offline-capable); the home computer keeps the MP3s and does the heavy ML. They sync over an authenticated, encrypted channel, paired once by scanning a QR. The built PWA ships inside the package, so the server serves it with no Node toolchain on the user's side.
1 — set up the server
pip install -e ".[sync]" # adds the sync server (pynacl + qrcode)
easy-podcast add <feed-url> # subscribe, then `sync` / `download` as usual
2 — run the sync server. It binds loopback by default — safe, invisible to the network. Reaching the phone is an explicit opt-in; pick a mode:
easy-podcast sync-serve # 127.0.0.1 — same machine only (default)
# off-box: over a private Tailscale net; no open port on Wi-Fi, reachable anywhere
easy-podcast sync-serve --host tailscale \
  --allow-peer <phone-tailscale-ip>
Add --tls-cert/--tls-key/--tls-host (a free Tailscale cert) to serve HTTPS —
the secure context the phone needs to install it as a real offline PWA. The
full step-by-step runbook (Tailscale, a least-privilege jail, certs) is
docs/SECURE_DEPLOY.md.
3 — pair the phone. easy-podcast pair "my phone" prints a QR (and a paste-able
{"secret":…} line). On the phone, open the server URL and paste that line — or scan
the QR with your camera and paste what it decodes — then tap Connect. The app
pulls your podcasts/transcripts, fetches audio on demand, plays with lock-screen
controls, and "Add to Home Screen" makes it standalone. Pairing a device (or
revoking one with unpair) takes effect on a running server with no restart.
Why it's safe (and why an "open port" isn't an exposure)
- Local/overlay clients only, fail-closed: the real TCP peer and the Host header must be loopback / private / your-tailnet, never an arbitrary domain (closing DNS-rebinding). --allow-peer pins it to one exact device, and every refusal is logged.
- Paired devices only. Every request is sealed under the QR secret (authenticated encryption, direction-bound, replay-deduped); the secret is shown as a QR and never crosses the wire. Unpaired → flat 403.
- No discoverable port. Bound to loopback (unreachable) or the Tailscale overlay (the app port lives on a virtual interface nothing physical can address; WireGuard's own socket is silent to non-peers). A scan finds nothing.
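The first point, checking both the TCP peer and the Host header and refusing by default, can be sketched as follows. This is an illustration of the idea rather than the server's actual check. All function names are hypothetical, and two assumptions are baked in: tailnet peers are recognised by Tailscale's 100.64.0.0/10 address range, and tailnet hostnames by the `.ts.net` suffix.

```python
import ipaddress

# Assumption: Tailscale hands out addresses from the CGNAT range.
TAILNET = ipaddress.ip_network("100.64.0.0/10")


def ip_allowed(text):
    """Loopback, private, or tailnet address; anything else is refused."""
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        return False
    return ip.is_loopback or ip.is_private or (ip.version == 4 and ip in TAILNET)


def host_allowed(host_header):
    """Accept only local names/addresses, so a rebinding domain is rejected."""
    host = host_header.strip().lower()
    if host.startswith("["):        # bracketed IPv6 literal, e.g. [::1]:8765
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:      # hostname or IPv4 followed by a port
        host = host.rsplit(":", 1)[0]
    if host == "localhost" or host.endswith(".ts.net"):
        return True
    return ip_allowed(host)


def request_allowed(peer_ip, host_header):
    """Fail closed: both the real peer and the Host header must pass."""
    return ip_allowed(peer_ip) and host_allowed(host_header)
```

Checking the Host header is what defeats DNS rebinding: a malicious page can make a browser connect to 127.0.0.1, but the request still carries the attacker's domain in Host. Accepting any `.ts.net` name is looser than "your tailnet"; a stricter version would compare against the exact tailnet name.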
A from-scratch, layer-by-layer walkthrough — the protocol, the networking, and a full request trace down to the physical layer — is in docs/SYNC_DEEP_DIVE.md.
Deliberate limits (so it doesn't overclaim): encryption is at the message
layer, with no forward secrecy (the pairing secret is long-lived but never
crosses the wire); the /audio body leans on the transport (WireGuard/TLS) for
confidentiality; a lost unlocked device is out of scope.
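To show what "direction-bound" and "replay-deduped" mean in practice, here is a deliberately simplified sketch. It swaps in HMAC-SHA256 from the standard library, which authenticates but does not encrypt, so it is not the scheme the sync server uses (that relies on the sync extra's dependencies). The names `seal`, `Receiver`, and the `c2s` / `s2c` direction labels are hypothetical.

```python
import hashlib
import hmac
import os


def _tag(secret, direction, nonce, payload):
    # Direction labels are fixed constants and nonces are fixed-length,
    # so this simple concatenation is unambiguous in the sketch.
    return hmac.new(secret, direction + b"|" + nonce + b"|" + payload,
                    hashlib.sha256).digest()


def seal(secret, direction, payload, nonce=None):
    """Tag a payload so it is bound to one direction and one nonce."""
    nonce = nonce or os.urandom(16)
    return nonce, payload, _tag(secret, direction, nonce, payload)


class Receiver:
    """Accepts each correctly tagged message exactly once."""

    def __init__(self, secret, expected_direction):
        self.secret = secret
        self.expected_direction = expected_direction
        self.seen = set()  # a real server would bound or expire this set

    def open(self, nonce, payload, tag):
        expected = _tag(self.secret, self.expected_direction, nonce, payload)
        if not hmac.compare_digest(expected, tag):
            raise PermissionError("bad tag: wrong secret, wrong direction, or tampered")
        if nonce in self.seen:
            raise PermissionError("replayed message")
        self.seen.add(nonce)
        return payload
```

Binding the direction into the tag means a server response cannot be reflected back to the server as if it were a request, and remembering nonces means a captured request cannot be sent twice. Neither property provides forward secrecy, which matches the limit stated above.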
Developing the phone app
The PWA is a SolidJS "kernel + one folder per feature" architecture — adding a
feature means dropping a self-registering folder under pwa/src/features/, editing
no shared file. A built copy already ships in the package; to build it yourself,
run cd pwa && npm install && npm run build, which compiles straight into
src/easy_podcast/sync/_webapp. npm run e2e drives a real sync end-to-end.
Three docs:
- pwa/README.md — start here: how the code works, how to add a feature in one folder, and how to test it (typecheck → Vitest → Node e2e → browser smoke).
- docs/PWA_ARCHITECTURE.md — the deep design ("Ports & SolidJS"): the kernel, the Solid ownership model, every contract, the footguns.
- docs/PWA_FEATURE_CATALOG.md — the ~430-feature backlog with IDs, a dependency map, and the build waves.
Roadmap — delivered
| Phase | Scope | |
|---|---|---|
| 0 | Generic store (Ledger + Lens), atomic writes, FTS, edges, models | ✅ |
| 1 | RSS ingestion (fault-tolerant parse, rate-limited verified download, content-based sync) + CLI | ✅ |
| 2 | Transcription via easy-whisperx; transcript full-text search | ✅ |
| 3 | Story segmentation (anthology-aware) + author/narrator extraction — no LLMs | ✅ |
| 4 | Text + voice embeddings; cross-podcast connections (same story / author / transcript / voice) | ✅ |
| 5 | Local web app | ✅ |
| 6 | Phone PWA (local-first replica) + authenticated LAN sync — secure by construction | ✅ |
| 7 | PWA feature platform — "Ports & SolidJS" kernel + one-folder-per-feature; features in progress | 🚧 |
License
MIT — see LICENSE.