
Convert podcast transcripts from HTML, SRT, WebVTT, Podlove, and other formats into PodcastIndex JSON.


podcast-transcript-convert



Installation

pipx is the recommended way to install and run the CLI tool (the commands below install pipx itself with Homebrew). To use the package as a library, install it with pip instead.

brew install pipx
pipx install podcast-transcript-convert

If you've already installed the package and wish to upgrade:

pipx upgrade podcast-transcript-convert

Usage

Run the converter on your transcripts directory, passing the input directory followed by the output directory.

transcript2json transcripts/ converted/

You can then inspect the output JSON files in the converted/ directory.
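As a quick way to inspect the results, the sketch below counts the segments in each converted file. It assumes the output files end in .json and follow the PodcastIndex transcript layout, with a top-level "segments" list. The helper name summarize_transcripts is made up for this example and is not part of the package.

```python
import json
from pathlib import Path


def summarize_transcripts(directory):
    """Return {file name: segment count} for every .json file in directory.

    Hypothetical helper for illustration; not part of podcast-transcript-convert.
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: each file is a JSON object with a top-level "segments" list,
        # as in the PodcastIndex transcript format.
        summary[path.name] = len(data.get("segments", []))
    return summary


if __name__ == "__main__":
    for name, count in summarize_transcripts("converted/").items():
        print(f"{name}: {count} segments")
```

A file that reports zero segments is a good first candidate to check by hand.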

Library Usage

from podcast_transcript_convert.convert import bulk_convert

bulk_convert("transcripts_dir/", "converted_dir/")

Individual file type converters are in the converters package. You can use them directly if you know the file type.
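To give a sense of what a single-format conversion involves, here is a minimal standalone sketch that turns SRT text into a PodcastIndex-style dictionary. It is not the package's converter: the function name srt_to_podcastindex and the parsing rules are assumptions made for illustration, and it only handles plain cues with a timing line followed by text.

```python
import json
import re

# Timing line of an SRT cue: "HH:MM:SS,mmm --> HH:MM:SS,mmm".
CUE_TIMES = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_podcastindex(srt_text):
    """Hypothetical sketch: parse SRT text into a PodcastIndex-style dict."""
    segments = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = CUE_TIMES.search(line)
            if match:
                parts = match.groups()
                # Everything after the timing line is the cue text.
                body = " ".join(t.strip() for t in lines[index + 1:] if t.strip())
                segments.append({
                    "startTime": _seconds(*parts[:4]),
                    "endTime": _seconds(*parts[4:]),
                    "body": body,
                })
                break
    return {"version": "1.0.0", "segments": segments}


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n"
    print(json.dumps(srt_to_podcastindex(sample), indent=2))
```

For real transcripts, prefer the converters shipped with the package. This sketch ignores speaker names, styling tags, and malformed cues.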

You can use file_typing.identify_file_type(file) to determine the file type of a transcript file.
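The return value of identify_file_type is not documented here, so the sketch below does not call it. Instead it illustrates the general idea of content-based detection with a standalone function. The name guess_transcript_format, the returned labels, and the heuristics are all assumptions for illustration and may differ from what the library actually does.

```python
import json
import re

# An SRT timing line uses a comma before the milliseconds.
SRT_TIMING = re.compile(
    r"^\d+:\d{2}:\d{2},\d{3}\s*-->\s*\d+:\d{2}:\d{2},\d{3}", re.MULTILINE
)
# A document that opens with a doctype or any HTML-looking tag.
HTML_START = re.compile(r"<(!doctype\s+html|[a-z][a-z0-9]*[\s>/])", re.IGNORECASE)


def guess_transcript_format(text):
    """Hypothetical sketch: guess a transcript format from its content.

    Returns one of "vtt", "json", "html", "srt", or "unknown".
    """
    # Ignore a byte-order mark and leading whitespace.
    stripped = text.lstrip("\ufeff \t\r\n")
    if stripped.startswith("WEBVTT"):
        return "vtt"
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # Not valid JSON; fall through to the other checks.
    if HTML_START.match(stripped):
        return "html"
    if SRT_TIMING.search(stripped):
        return "srt"
    return "unknown"


if __name__ == "__main__":
    print(guess_transcript_format("WEBVTT\n\n00:00.000 --> 00:01.000\nHi"))
```

Checking content rather than file extensions helps when downloaded transcripts carry misleading or missing extensions.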

Development

Pull requests are very welcome! For major changes, please open an issue first to discuss what you would like to change.

git clone git@github.com:hbmartin/podcast-transcript-convert.git
cd podcast-transcript-convert
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Replace with the actual path to your transcript files
python -m podcast_transcript_convert ~/Downloads/overcast-to-sqlite/archive/transcripts converted/

Code Formatting

This project is linted with ruff and uses Black code formatting.

Authors
