

easy-podcast

A local-first, generic-entity podcast knowledge base. Your data lives in a folder you own (durable, human-readable, and portable), with full-text search and a query/graph index on top. On that foundation it adds transcript search, story segmentation, and cross-podcast connections (same story, author, voice).

Status: v0.7 — feature-complete. The generic store, RSS ingestion, transcription, story segmentation, cross-podcast connections, and a local web UI are all implemented and tested (see What it does). Everything is local and private — no LLMs, no cloud.

Quick start (Docker)

The server ships with the phone app built in. One command runs it hardened, as a non-root user, and exposed only on localhost:

docker compose up -d --build      # serves on http://localhost:8765
# populate your library (via the CLI inside the container):
docker compose exec easypod easy-podcast --data-dir /data add <feed-url>
docker compose exec easypod easy-podcast --data-dir /data sync
docker compose exec easypod easy-podcast --data-dir /data download <podcast_id>

Open http://localhost:8765 and choose "Add to Home Screen" to install it as an offline PWA. To reach it from your phone, use the deploy stack (docker-compose.deploy.yml), which fronts the server with a bundled Tailscale sidecar (HTTPS over your tailnet, no open ports). The full runbook is in docs/SECURE_DEPLOY.md, and a from-scratch, layer-by-layer explainer is in docs/SYNC_DEEP_DIVE.md. Prefer to run it directly, without Docker? See Install.

Principles — built to respect creators

This is a personal tool for organizing and exploring podcasts you listen to. It is built to always respect human authors and human voice actors:

  • No AI training on anyone's content. It honors creators who ask that their work not be used for AI: their feeds are excluded from examples and tests.
  • No LLMs. Transcription (via whisperx) and all analysis run locally and privately; nothing leaves your machine.
  • Identify and credit, never imitate. The "connections" features detect and link human authors and voice actors across episodes in order to credit their work — never to clone, synthesize, or impersonate a voice or an author's writing.
  • Polite by default. Downloads are rate-limited, use conditional requests, and respect server back-off, so the app never hammers anyone's servers.
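"Polite by default" comes down to three standard HTTP habits: send conditional-request validators (ETag / Last-Modified) so an unchanged resource costs a 304 instead of a full body, honor a Retry-After back-off, and keep a minimum gap between requests to the same host. The stdlib-only sketch below shows those techniques under stated assumptions; the names conditional_headers, retry_after_seconds, and MinIntervalThrottle are hypothetical and are not easy-podcast's download code.

```python
from __future__ import annotations

import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_headers(etag: str | None, last_modified: str | None) -> dict[str, str]:
    """Validators from the last response, so an unchanged resource costs a 304, not a body."""
    headers: dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def retry_after_seconds(value: str | None, now: datetime | None = None) -> float | None:
    """Parse Retry-After, which may be delta-seconds ("120") or an HTTP-date."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable date
        return None
    if when.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class MinIntervalThrottle:
    """Keep at least `min_interval` seconds between requests to the same host."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}

    def wait(self, host: str) -> float:
        """Block until `host` may be contacted again; return the seconds slept."""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0 if last is None else max(0.0, self.min_interval - (now - last))
        if delay:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

The clock and sleep functions are injectable so the throttle can be tested without real waiting.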

Design in one breath

The source of truth is the Ledger: one JSON file per entity under docs/<kind>/, each written atomically. The Lens (an embedded SQLite database, lens.sqlite) is a derived, disposable index over the Ledger that provides exact-match queries, full-text search, and referential edges; you can delete it at any time and rebuild it from the Ledger. The store is domain-agnostic: any frozen-dataclass Entity is stored and queried through the same mechanism.

from easy_podcast import Store, Podcast, Episode, DEFAULT_SPECS

store = Store.open("~/PodcastLibrary")          # the only storage decision; back up = copy the folder
for spec in DEFAULT_SPECS:
    store.register(spec)

show = Podcast.create("https://feeds.example/magnus-archives", title="The Magnus Archives")
store.repo(Podcast).put(show)

ep = Episode(
    id=Episode.id_for("guid-42", "https://cdn/ep42.mp3"),
    podcast_id=show.id, title="The Lighthouse", description="A keeper vanishes.",
    audio_url="https://cdn/ep42.mp3",
)
store.repo(Episode).put(ep)

# exact-match query on a promoted field, and full-text search
episodes = store.repo(Episode).find(podcast_id=show.id)
for hit in store.search("lighthouse keeper"):
    print(hit.ref, hit.snippet)
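"Written atomically" usually means the write-to-temp-then-rename pattern: readers see either the old file or the new one, never a half-written file. A minimal stdlib sketch, assuming a docs/<kind>/<id>.json layout; write_entity and read_entity are hypothetical helpers for illustration, not the easy-podcast API (use Store and repo for real work).

```python
import json
import os
import tempfile
from pathlib import Path


def write_entity(root: Path, kind: str, entity_id: str, data: dict) -> Path:
    folder = root / "docs" / kind
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{entity_id}.json"
    # Temp file in the SAME directory, so the rename never crosses filesystems.
    fd, tmp = tempfile.mkstemp(dir=folder, prefix=f".{entity_id}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp, target)  # atomic swap: old file or new file, never half of one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # don't leave debris next to the real entities
        raise
    return target


def read_entity(root: Path, kind: str, entity_id: str) -> dict:
    with open(root / "docs" / kind / f"{entity_id}.json", encoding="utf-8") as fh:
        return json.load(fh)
```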

Adding a new entity type is a dataclass + one line

from dataclasses import dataclass
from typing import ClassVar
from easy_podcast import Entity, EntitySpec

@dataclass(frozen=True)
class Bookmark(Entity):
    kind: ClassVar[str] = "bookmark"
    episode_id: str = ""
    note: str = ""
    t: float = 0.0

store.register(EntitySpec(Bookmark, promote=["episode_id"], fts=["note"]))
store.repo(Bookmark).put(Bookmark(id="b1", episode_id=ep.id, note="great twist", t=842.0))

No new persistence module, no schema migration.
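One way to see why a new entity type needs no schema migration: if the index stores promoted fields generically (one row per kind, id, field, value) rather than as per-kind columns, registering a kind only changes the data fed to the rebuild, never the table layout. The sketch below assumes that EAV-style layout and hypothetical rebuild_lens/find helpers; easy-podcast's real Lens schema may differ, and full-text search is left out.

```python
from __future__ import annotations

import json
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE fields (
    kind  TEXT NOT NULL,
    id    TEXT NOT NULL,
    field TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX fields_lookup ON fields (kind, field, value);
"""


def rebuild_lens(root: Path, promote: dict[str, list[str]]) -> sqlite3.Connection:
    """Build a fresh, throwaway index from docs/<kind>/*.json.

    `promote` maps each kind to the fields that should be exact-match queryable.
    Registering a new kind only adds a dict entry; the table layout never changes.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for kind, fields in promote.items():
        for path in sorted((root / "docs" / kind).glob("*.json")):
            doc = json.loads(path.read_text(encoding="utf-8"))
            entity_id = doc.get("id", path.stem)
            conn.executemany(
                "INSERT INTO fields (kind, id, field, value) VALUES (?, ?, ?, ?)",
                [(kind, entity_id, f, str(doc[f])) for f in fields if f in doc],
            )
    conn.commit()
    return conn


def find(conn: sqlite3.Connection, kind: str, **criteria: object) -> list[str]:
    """Ids of `kind` whose promoted fields equal every criterion (no criteria -> [])."""
    ids: set[str] | None = None
    for field, value in criteria.items():
        rows = conn.execute(
            "SELECT id FROM fields WHERE kind = ? AND field = ? AND value = ?",
            (kind, field, str(value)),
        ).fetchall()
        matched = {row[0] for row in rows}
        ids = matched if ids is None else ids & matched
    return sorted(ids or ())
```

Because the index is rebuilt from the JSON files, throwing it away loses nothing.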

What it does

The core is pure-Python (feedparser + requests) and the intelligence layers are optional extras you add only if you want them. Nothing uses an LLM; the ML is local transcription and small local embeddings.

| Capability | Module | Extra |
|---|---|---|
| Generic store (Ledger + Lens), atomic writes, FTS, edges | store/ | core |
| RSS ingestion: fault-tolerant parse, self-throttling verified download, content-based sync, SSRF/decompression-bomb defenses | ingest/ | core |
| Transcription with word-level timestamps (whisperx align) + optional speaker diarization | pipelines/transcribe.py | transcribe |
| Creator-authored chapters (Podcasting 2.0 podcast:chapters) | pipelines/chapters.py | core |
| Story segmentation of anthology episodes + author/narrator extraction (heuristics, no LLM) | pipelines/segment.py | transcribe |
| Local text + voice embeddings → cross-podcast connections (same story / author / transcript / voice) | pipelines/{embed,connect}.py | connections |
| Full-text + transcript search | store/lens.py (FTS5) | core |
| Local web UI (localhost-only, CSRF-guarded) | web/ | web |
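The ingest/ row lists SSRF and decompression-bomb defenses. Two common stdlib techniques for those are sketched below, under assumptions: check_url, is_public_ip, and inflate_capped are hypothetical names, and Python's ipaddress `is_global` flag stands in for whatever address policy the real code uses. A production SSRF guard must also connect to the already-validated address (otherwise DNS can change between check and connect) and re-run the check on every redirect.

```python
import ipaddress
import socket
import zlib
from urllib.parse import urlsplit


class UnsafeURL(ValueError):
    """Raised when a URL points somewhere the fetcher must not go."""


class BodyTooLarge(ValueError):
    """Raised when a compressed body would inflate past the allowed size."""


def is_public_ip(addr: str) -> bool:
    # is_global is False for loopback, RFC 1918, link-local, CGNAT (100.64/10), etc.
    return ipaddress.ip_address(addr).is_global


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Reject non-HTTP(S) URLs and hosts that resolve to any non-public address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise UnsafeURL(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for *_, sockaddr in resolve(parts.hostname, port, proto=socket.IPPROTO_TCP):
        if not is_public_ip(sockaddr[0]):
            raise UnsafeURL(f"{parts.hostname} resolves to non-public {sockaddr[0]}")


def inflate_capped(data: bytes, max_bytes: int) -> bytes:
    """Inflate a gzip- or zlib-wrapped body, producing at most max_bytes."""
    inflater = zlib.decompressobj(wbits=32 + zlib.MAX_WBITS)  # auto-detect gzip/zlib header
    out = inflater.decompress(data, max_bytes + 1)  # ask for one byte past the cap
    if len(out) > max_bytes:
        raise BodyTooLarge(f"body inflates past {max_bytes} bytes")
    if not inflater.eof:
        raise ValueError("truncated compressed body")
    return out
```

Passing a fake resolver keeps the check testable without touching the network.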

Install

pip install -e ".[dev]"          # core: feedparser, requests (pure-python, 3.10–3.13)
pip install -e ".[transcribe]"   # whisperx transcription/diarization (needs Python 3.10–3.12)
pip install -e ".[connections]"  # local embeddings + connection-finding (sentence-transformers)
pip install -e ".[web]"          # local Flask web UI
pip install -e ".[sync]"         # phone sync server (pynacl + qrcode)

Transcription needs the ffmpeg binary on PATH. Speaker diarization additionally needs a HuggingFace token (--hf-token or HF_TOKEN).
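Both requirements can be checked before any audio is touched. A small stdlib sketch of such a preflight; the preflight function is hypothetical (the CLI may report these problems differently), and it only checks that a token is present via the --hf-token value or the HF_TOKEN variable, not that it is valid.

```python
from __future__ import annotations

import os
import shutil
from collections.abc import Callable, Mapping


def preflight(
    diarize: bool,
    hf_token_flag: str | None = None,
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], str | None] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means transcription can start."""
    problems: list[str] = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if diarize and not (hf_token_flag or env.get("HF_TOKEN")):
        problems.append("diarization needs a HuggingFace token (--hf-token or HF_TOKEN)")
    return problems
```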

Command line

easy-podcast add https://rss.acast.com/themagnusarchives   # subscribe
easy-podcast sync                                          # fetch new episodes
easy-podcast download <podcast_id>                         # polite, throttled
easy-podcast transcribe <episode_id> --model base          # + --diarize (needs HF token)
easy-podcast chapters <episode_id>                         # creator chapters, if published
easy-podcast segment <episode_id>                          # split into stories + credits
easy-podcast embed <episode_id>                            # local story embeddings
easy-podcast connections                                   # find cross-episode links
easy-podcast search "lighthouse keeper"                    # full-text + transcript
easy-podcast serve                                         # local web UI on 127.0.0.1:8000
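The `connections` command above links stories through their local embeddings. The general idea can be sketched as pairwise cosine similarity with a cut-off. The vectors, the 0.85 default threshold, and find_links are illustrative assumptions; easy-podcast's actual model, scoring, and threshold may differ.

```python
from __future__ import annotations

import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_links(vectors: dict[str, list[float]], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair at or above threshold, best first."""
    links = []
    for (ia, va), (ib, vb) in combinations(sorted(vectors.items()), 2):
        score = cosine(va, vb)
        if score >= threshold:
            links.append((ia, ib, score))
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Comparing every pair is quadratic, which is fine for a personal library; a much larger collection would call for an approximate nearest-neighbour index.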

The web UI binds to 127.0.0.1 only — there is intentionally no option to bind a public address, and the app refuses any non-loopback client, so it can't be exposed to a network by accident. The UI is unauthenticated by design; to reach it from another device (e.g. your phone), put a proxy that adds auth + TLS in front of the loopback port — tailscale serve <port> is the easy, private option (only your own enrolled devices can connect).
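Refusing non-loopback clients is a fail-closed check on the TCP peer address. A sketch as plain WSGI middleware (WSGI being the interface Flask apps expose), assuming the server fills in REMOTE_ADDR with the real peer; LoopbackOnly is a hypothetical name, and the real web UI may enforce this differently.

```python
import ipaddress


def is_loopback(addr: str) -> bool:
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:  # missing or malformed address: fail closed
        return False


class LoopbackOnly:
    """Wrap a WSGI app so that only 127.0.0.0/8 and ::1 clients reach it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if not is_loopback(environ.get("REMOTE_ADDR", "")):
            body = b"Forbidden: loopback clients only\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```

Behind a proxy such as `tailscale serve`, the peer seen by the app is the proxy on loopback, which is why the proxy itself has to supply auth and TLS.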

Listen on your phone

The phone app is a PWA — no app store, no native build. Your library lives on the phone (offline-capable); the home computer keeps the MP3s and does the heavy ML. They sync over an authenticated, encrypted channel, paired once by scanning a QR. The built PWA ships inside the package, so the server serves it with no Node toolchain on the user's side.
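A local-first replica needs a cheap way to decide what to pull. One common approach is to compare content-hash manifests: each side hashes the canonical JSON of every entity, and the replica fetches ids whose hash differs and drops ids the server no longer has. The helpers below (content_hash, manifest, plan_pull) are hypothetical and illustrate the technique, not easy-podcast's actual sync protocol.

```python
from __future__ import annotations

import hashlib
import json


def content_hash(doc: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal content hashes equally.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest(docs: dict[str, dict]) -> dict[str, str]:
    """Map entity id -> content hash."""
    return {entity_id: content_hash(doc) for entity_id, doc in docs.items()}


def plan_pull(server: dict[str, str], replica: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (ids to fetch, ids to delete) so the replica matches the server."""
    fetch = sorted(i for i, h in server.items() if replica.get(i) != h)
    delete = sorted(i for i in replica if i not in server)
    return fetch, delete
```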

1 — set up the server

pip install -e ".[sync]"        # adds the sync server (pynacl + qrcode)
easy-podcast add <feed-url>     # subscribe, then `sync` / `download` as usual

2 — run the sync server. It binds to loopback by default, which keeps it invisible to the network. Reaching the phone is an explicit opt-in; pick a mode:

easy-podcast sync-serve                          # 127.0.0.1: same machine only (default)
# off-box, over a private Tailscale net: no open port on Wi-Fi, reachable anywhere
easy-podcast sync-serve --host tailscale --allow-peer <phone-tailscale-ip>

Add --tls-cert, --tls-key, and --tls-host (a free Tailscale certificate works) to serve HTTPS, the secure context the phone needs to install the app as a real offline PWA. The full step-by-step runbook (Tailscale, a least-privilege jail, certificates) is in docs/SECURE_DEPLOY.md.

3 — pair the phone. easy-podcast pair "my phone" prints a QR (and a paste-able {"secret":…} line). On the phone, open the server URL and paste that line — or scan the QR with your camera and paste what it decodes — then tap Connect. The app pulls your podcasts/transcripts, fetches audio on demand, plays with lock-screen controls, and "Add to Home Screen" makes it standalone. Pairing a device (or revoking one with unpair) takes effect on a running server with no restart.

Why it's safe (and why an "open port" isn't an exposure)

  • Local/overlay clients only, fail-closed: the real TCP peer and the Host header must be loopback, private, or on your tailnet, never an arbitrary domain (this closes off DNS rebinding). --allow-peer pins it to one exact device, and every refusal is logged.
  • Paired devices only. Every request is sealed under the QR secret (authenticated encryption, direction-bound, replay-deduped); the secret is shown as a QR and never crosses the wire. Unpaired → flat 403.
  • No discoverable port. Bound to loopback (unreachable) or the Tailscale overlay (the app port lives on a virtual interface nothing physical can address; WireGuard's own socket is silent to non-peers). A scan finds nothing.
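The first two guarantees can be made concrete with a stdlib-only sketch, under loud assumptions. peer_allowed, host_allowed, seal, and Unsealer are hypothetical names; treating --allow-peer as an exact pin that excludes every other address is an assumption; the ranges used are Tailscale's 100.64.0.0/10 and fd7a:115c:a1e0::/48. Most importantly, the real sync uses authenticated encryption (the sync extra pulls in pynacl), while this sketch swaps in HMAC-SHA256, which authenticates frames, binds them to a direction, and drops replays, but does not encrypt anything.

```python
from __future__ import annotations

import hashlib
import hmac
import ipaddress
import secrets

# Tailscale assigns node addresses from these ranges.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")
TAILNET_V6 = ipaddress.ip_network("fd7a:115c:a1e0::/48")


def peer_allowed(addr: str, allow_peer: str | None = None) -> bool:
    """Fail closed: loopback/private/tailnet only, or exactly `allow_peer` when pinned."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    if allow_peer is not None:
        return ip == ipaddress.ip_address(allow_peer)
    return ip.is_loopback or ip.is_private or ip in TAILNET_V4 or ip in TAILNET_V6


def host_allowed(host_header: str, names: frozenset[str] = frozenset({"localhost"})) -> bool:
    """Reject Host headers naming arbitrary domains (the DNS-rebinding vector)."""
    host = host_header.strip()
    if host.startswith("["):            # "[::1]:8765"
        end = host.find("]")
        host = host[1:end] if end != -1 else ""
    elif host.count(":") == 1:          # "name:port" or "1.2.3.4:port"
        host = host.rsplit(":", 1)[0]
    return host.lower() in names or peer_allowed(host)


NONCE_LEN, TAG_LEN = 16, 32


def _tag(secret: bytes, direction: str, nonce: bytes, body: bytes) -> bytes:
    # The direction label is inside the MAC, so a frame sealed one way fails the other way.
    msg = direction.encode() + b"\x00" + nonce + body
    return hmac.new(secret, msg, hashlib.sha256).digest()


def seal(secret: bytes, direction: str, body: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    return nonce + body + _tag(secret, direction, nonce, body)


class Unsealer:
    """Verifies frames for one direction and drops replays (authenticity only, no secrecy)."""

    def __init__(self, secret: bytes, direction: str):
        self.secret = secret
        self.direction = direction
        self.seen: set[bytes] = set()  # unbounded here; a real server would expire old nonces

    def unseal(self, frame: bytes) -> bytes | None:
        if len(frame) < NONCE_LEN + TAG_LEN:
            return None
        nonce, body, tag = frame[:NONCE_LEN], frame[NONCE_LEN:-TAG_LEN], frame[-TAG_LEN:]
        if not hmac.compare_digest(tag, _tag(self.secret, self.direction, nonce, body)):
            return None  # forged, corrupted, wrong secret, or wrong direction
        if nonce in self.seen:
            return None  # replay
        self.seen.add(nonce)
        return body
```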

A from-scratch, layer-by-layer walkthrough — the protocol, the networking, and a full request trace down to the physical layer — is in docs/SYNC_DEEP_DIVE.md.

Deliberate limits (so it doesn't overclaim): encryption is at the message layer, with no forward secrecy (the pairing secret is long-lived but never crosses the wire); the /audio body leans on the transport (WireGuard/TLS) for confidentiality; a lost unlocked device is out of scope.

Developing the phone app

The PWA is a SolidJS "kernel + one folder per feature" architecture — adding a feature means dropping a self-registering folder under pwa/src/features/, editing no shared file. To build it yourself (a built copy already ships in the package): cd pwa && npm install && npm run build compiles straight into src/easy_podcast/sync/_webapp; npm run e2e drives a real sync end-to-end. Three docs:

  • pwa/README.md — start here: how the code works, how to add a feature in one folder, and how to test it (typecheck → Vitest → Node e2e → browser smoke).
  • docs/PWA_ARCHITECTURE.md — the deep design ("Ports & SolidJS"): the kernel, the Solid ownership model, every contract, the footguns.
  • docs/PWA_FEATURE_CATALOG.md — the ~430-feature backlog with IDs, a dependency map, and the build waves.

Roadmap — delivered

| Phase | Scope |
|---|---|
| 0 | Generic store (Ledger + Lens), atomic writes, FTS, edges, models |
| 1 | RSS ingestion (fault-tolerant parse, rate-limited verified download, content-based sync) + CLI |
| 2 | Transcription via easy-whisperx; transcript full-text search |
| 3 | Story segmentation (anthology-aware) + author/narrator extraction, no LLMs |
| 4 | Text + voice embeddings; cross-podcast connections (same story / author / transcript / voice) |
| 5 | Local web app |
| 6 | Phone PWA (local-first replica) + authenticated LAN sync, secure by construction |
| 7 | PWA feature platform: "Ports & SolidJS" kernel + one-folder-per-feature; features in progress 🚧 |

License

MIT — see LICENSE.

Download files


Source Distribution

easy_podcast-0.8.1.tar.gz (660.0 kB)

Built Distribution

easy_podcast-0.8.1-py3-none-any.whl (640.1 kB)

File details

Details for the file easy_podcast-0.8.1.tar.gz.

File metadata

  • Download URL: easy_podcast-0.8.1.tar.gz
  • Size: 660.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for easy_podcast-0.8.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea66499db9cdc5c3d0978504b70db6b808507e88821471f2573e826837f28c55 |
| MD5 | adeebbff216eda434662214901682b09 |
| BLAKE2b-256 | ae3ef10a73f1e1531e6253fd81b43bcd18cef789bd0a2d87637364899b9d5ff1 |


Provenance

The following attestation bundles were made for easy_podcast-0.8.1.tar.gz:

Publisher: python-publish.yml on falahat/easy-podcast


File details

Details for the file easy_podcast-0.8.1-py3-none-any.whl.

File metadata

  • Download URL: easy_podcast-0.8.1-py3-none-any.whl
  • Size: 640.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for easy_podcast-0.8.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88423b8cdcaaca7e25ac6853895ecbdd77c23f7133a285f7056235eb282faee0 |
| MD5 | e9e809385e1939897ff859e886d124be |
| BLAKE2b-256 | 90f0d5f1d4adaf55a44af5d525b3e67f192d769f0cd7af61e188e1bc4b5e41ad |


Provenance

The following attestation bundles were made for easy_podcast-0.8.1-py3-none-any.whl:

Publisher: python-publish.yml on falahat/easy-podcast

