A modular podcast episode downloader with RSS feed parsing and progress tracking

Podcast Tracker

A modular Python package for downloading podcast episodes from RSS feeds and transcribing them using AI. Features progress tracking, metadata management, duplicate detection, and WhisperX-powered transcription with speaker diarization.

Python Version Requirements

This package requires Python 3.10, 3.11, or 3.12. Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
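
The supported range above can be checked at startup before any heavy imports. This is a minimal sketch, not part of the package; the helper names `is_supported_python` and `require_supported_python` are hypothetical.

```python
import sys
from typing import Optional, Tuple

# Range taken from the requirement stated above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if the (major, minor) pair falls inside the supported range."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


def require_supported_python() -> None:
    """Raise a readable error when running on an unsupported interpreter."""
    if not is_supported_python():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        raise RuntimeError(
            f"Python 3.10 to 3.12 is required, but this interpreter is {found}"
        )
```

Calling `require_supported_python()` at the top of a script turns a confusing dependency failure on Python 3.13+ into a clear message.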

Features

- RSS Feed Parsing: Download and parse podcast RSS feeds
- Episode Management: Track downloaded episodes with JSONL metadata
- Progress Tracking: Visual progress bars for downloads
- AI Transcription: WhisperX-powered transcription with speaker diarization
- Duplicate Detection: Automatically skip already downloaded episodes
- Type Safety: Comprehensive type hints throughout

Installation

Standard Installation

git clone https://github.com/falahat/podcast.git
cd podcast
pip install -e .

Installation with Transcription Support

git clone https://github.com/falahat/podcast.git
cd podcast
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install -e ".[transcribe]"

Development Installation

git clone https://github.com/falahat/podcast.git
cd podcast
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install -e ".[dev,notebook,transcribe]"

Quick Start

Command Line Interface

# Download episodes from an RSS feed
podcast_downloader "https://example.com/podcast/rss.xml"

# Specify custom data directory  
podcast_downloader "https://example.com/podcast/rss.xml" --data-dir ./my_podcasts

# List episodes without downloading
podcast_downloader "https://example.com/podcast/rss.xml" --list-only

# Disable progress bars
podcast_downloader "https://example.com/podcast/rss.xml" --no-progress

Python API

from easy_podcast.manager import PodcastManager

# Create manager from RSS URL (downloads and parses automatically)
manager = PodcastManager.from_rss_url("https://example.com/podcast/rss.xml")

if manager:
    podcast = manager.get_podcast()
    print(f"Podcast: {podcast.title}")
    
    # Get new episodes to download
    new_episodes = manager.get_new_episodes()
    print(f"Found {len(new_episodes)} new episodes")
    
    # Download episodes with progress tracking
    successful, skipped, failed = manager.download_episodes(new_episodes)
    print(f"Downloaded: {successful}, Skipped: {skipped}, Failed: {failed}")

Working with Existing Podcast Data

from easy_podcast.manager import PodcastManager

# Load manager from existing podcast folder
manager = PodcastManager.from_podcast_folder("data/My Podcast/")

if manager:
    # Continue downloading new episodes
    new_episodes = manager.get_new_episodes()
    manager.download_episodes(new_episodes)

Audio Transcription

The package can transcribe downloaded audio with WhisperX, including GPU acceleration and speaker diarization. Transcription is an optional extra and is not part of the standard installation.

Installation: To use transcription features, install with the [transcribe] extra:

pip install -e ".[transcribe]"

Prerequisites for Transcription

- NVIDIA GPU with CUDA support
- Hugging Face token (for speaker diarization models)
- PyTorch with GPU support

Note: PyTorch is installed automatically as a dependency of easy-whisperx, so no manual installation is required.

Setting up Transcription Environment

Get a Hugging Face Token:
- Go to Hugging Face Settings
- Create a token with "read" permissions
- Accept user agreements for segmentation and diarization models

Set Environment Variable:

# Windows PowerShell
$env:HF_TOKEN="your_token_here"

# Linux/macOS
export HF_TOKEN="your_token_here"
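
Before starting a long transcription run, it can help to confirm that the token is actually visible to Python. The sketch below reads the `HF_TOKEN` variable set above; the helper name `get_hf_token` is hypothetical and not part of the package.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored in HF_TOKEN, or raise if it is unset or blank."""
    # Accepting a mapping makes the function easy to test without touching os.environ.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running speaker diarization"
        )
    return token
```

Failing early here avoids loading models only to discover that the diarization download is not authorized.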

Using Transcription in Python

from easy_whisperx.transcriber import Transcriber

# Initialize transcriber
transcriber = Transcriber(
    model_size="base",
    device="cuda",  # or "cpu"
    compute_type="float16",
    batch_size=16
)

# Transcribe audio file
with transcriber:
    result = transcriber("path/to/audio.mp3")
    print(result["text"])
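
The `# or "cpu"` comment above suggests a CPU fallback, although the prerequisites list an NVIDIA GPU. The sketch below picks settings based on whether CUDA is available. It assumes the Transcriber accepts `"int8"` as a `compute_type` (half-precision compute is generally a GPU feature in the WhisperX stack) and the smaller CPU batch size is an arbitrary choice. Both helper names are hypothetical.

```python
from typing import Dict


def pick_transcriber_settings(cuda_available: bool) -> Dict[str, object]:
    """Choose device, compute type and batch size for the Transcriber arguments."""
    if cuda_available:
        # Values copied from the GPU example above.
        return {"device": "cuda", "compute_type": "float16", "batch_size": 16}
    # Assumption: int8 is accepted on CPU; the batch size is reduced arbitrarily.
    return {"device": "cpu", "compute_type": "int8", "batch_size": 4}


def cuda_is_available() -> bool:
    """Report CUDA availability, treating a missing PyTorch install as no GPU."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

Assuming the keyword names shown in the example above, the result could be passed along as `Transcriber(model_size="base", **pick_transcriber_settings(cuda_is_available()))`.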

Data Storage Structure

Podcast data is organized in a clear directory structure:

data/
└── [Sanitized Podcast Name]/
    ├── episodes.jsonl      # Episode metadata (one JSON object per line)
    ├── rss.xml            # Cached RSS feed
    └── downloads/         # Downloaded audio files
        ├── episode1.mp3
        ├── episode2.mp3
        └── ...

Important: Episode objects store filenames only (e.g., "727175.mp3"), not full paths. Use manager.get_episode_audio_path(episode) to get complete file paths.
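
To illustrate the layout above, the following sketch reads `episodes.jsonl` directly and resolves stored filenames against `downloads/`. It is independent of the package's own classes, and the field name `audio_filename` is a hypothetical placeholder, since the actual metadata schema is not documented here.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List


def read_episode_records(podcast_dir: Path) -> Iterator[Dict[str, object]]:
    """Yield one dict per non-empty line of episodes.jsonl."""
    jsonl_path = podcast_dir / "episodes.jsonl"
    if not jsonl_path.exists():
        return
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def audio_path(podcast_dir: Path, filename: str) -> Path:
    """Turn a stored filename such as "727175.mp3" into a full path."""
    return podcast_dir / "downloads" / filename


def missing_episodes(
    podcast_dir: Path, filename_key: str = "audio_filename"
) -> List[Dict[str, object]]:
    """Return records whose audio file is not yet present in downloads/."""
    return [
        record
        for record in read_episode_records(podcast_dir)
        if not audio_path(podcast_dir, str(record[filename_key])).exists()
    ]
```

Storing only filenames keeps the metadata valid when the data directory is moved. In application code, `manager.get_episode_audio_path(episode)` remains the supported way to get a full path.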

Development

Setting up Development Environment

git clone https://github.com/falahat/podcast.git
cd podcast

# Create virtual environment (note the .venv name)
python -m venv .venv

# Activate virtual environment
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Linux/macOS:
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev,notebook]"

Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=easy_podcast --cov-report=html

# Run specific test file
pytest tests/test_manager.py -v

Code Quality Tools

The project uses:

- Black for code formatting
- mypy for type checking
- flake8 for linting
- pytest for testing

# Format code
black src/ tests/

# Type checking
mypy src/easy_podcast/

# Linting
flake8 src/easy_podcast/

Core Components

The package is built with a modular architecture:

- PodcastManager: Main orchestrator for the complete workflow
- Episode/Podcast: Data models with computed properties
- EpisodeTracker: JSONL-based metadata persistence
- PodcastParser: RSS feed parsing with custom episode ID extraction
- PodcastDownloader: HTTP downloads with progress tracking
- Transcription: WhisperX-based transcription module
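
As a rough illustration of the parsing step, the sketch below extracts episode titles and audio URLs from an RSS 2.0 document using only the standard library. It is not the PodcastParser implementation. The `FeedEpisode` class and `parse_feed` function are hypothetical, and deriving the filename from the enclosure URL is an assumption about how names like "727175.mp3" might arise.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse


@dataclass
class FeedEpisode:
    title: str
    audio_url: str
    filename: str


def parse_feed(rss_text: str) -> List[FeedEpisode]:
    """Collect every <item> that carries an <enclosure> with a URL."""
    root = ET.fromstring(rss_text)
    episodes: List[FeedEpisode] = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        url = enclosure.get("url") if enclosure is not None else None
        if not url:
            continue  # Items without audio are skipped.
        # Use the last path segment of the URL, ignoring any query string.
        filename = PurePosixPath(urlparse(url).path).name
        title = (item.findtext("title") or "").strip()
        episodes.append(FeedEpisode(title=title, audio_url=url, filename=filename))
    return episodes
```

A real parser would also need to handle feeds whose enclosure URLs do not end in a usable filename, which is presumably where custom episode ID extraction comes in.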

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes with tests
4. Ensure all tests pass (pytest)
5. Check code quality (black src/ tests/, mypy src/easy_podcast/, and flake8 src/easy_podcast/)
6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

- 0.0.3 (Sep 15, 2025)
- 0.0.1 (Sep 14, 2025), the version described on this page

Download files

Source Distribution

easy_podcast-0.0.1.tar.gz (40.8 kB)

Uploaded Sep 14, 2025 Source

Built Distribution

easy_podcast-0.0.1-py3-none-any.whl (16.2 kB)

Uploaded Sep 14, 2025 Python 3

File details

Details for the file easy_podcast-0.0.1.tar.gz.

File metadata

Download URL: easy_podcast-0.0.1.tar.gz
Upload date: Sep 14, 2025
Size: 40.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for easy_podcast-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`69ea5db9174d8d6dc8387f19036e59f7a54a6a9450c5426ebc9e8ebd6b2b5768`
MD5	`54e574ac8c93a6ddbd949b94111bc3d6`
BLAKE2b-256	`adc0d48a94dd908fde3235c2285068ad8f126c719565dae181fffd2dfd0b411e`

Provenance

The following attestation bundles were made for easy_podcast-0.0.1.tar.gz:

Publisher: python-publish.yml on falahat/easy-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_podcast-0.0.1.tar.gz
- Subject digest: 69ea5db9174d8d6dc8387f19036e59f7a54a6a9450c5426ebc9e8ebd6b2b5768
- Sigstore transparency entry: 516450934
- Sigstore integration time: Sep 14, 2025
Source repository:
- Permalink: falahat/easy-podcast@3f3c5ba0975b99686af2cb736e8cc960a71567fa
- Branch / Tag: refs/tags/0.0.2
- Owner: https://github.com/falahat
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3f3c5ba0975b99686af2cb736e8cc960a71567fa
- Trigger Event: release

File details

Details for the file easy_podcast-0.0.1-py3-none-any.whl.

File metadata

Download URL: easy_podcast-0.0.1-py3-none-any.whl
Upload date: Sep 14, 2025
Size: 16.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for easy_podcast-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`572cde6220a3297c9da8592c7931848191dcab9b686590edbe1ca002a402dd15`
MD5	`51cf74d93f80307e91392bf26ba2ea0a`
BLAKE2b-256	`7067a2ded323eed765473b0cae3224e5d083ea1bef586acc0e8b474e8367d963`

Provenance

The following attestation bundles were made for easy_podcast-0.0.1-py3-none-any.whl:

Publisher: python-publish.yml on falahat/easy-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: easy_podcast-0.0.1-py3-none-any.whl
- Subject digest: 572cde6220a3297c9da8592c7931848191dcab9b686590edbe1ca002a402dd15
- Sigstore transparency entry: 516450940
- Sigstore integration time: Sep 14, 2025
Source repository:
- Permalink: falahat/easy-podcast@3f3c5ba0975b99686af2cb736e8cc960a71567fa
- Branch / Tag: refs/tags/0.0.2
- Owner: https://github.com/falahat
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3f3c5ba0975b99686af2cb736e8cc960a71567fa
- Trigger Event: release
