podcast-transcript-convert
Convert podcast transcripts from HTML, SRT, WebVTT, Podlove, etc. into PodcastIndex JSON.
Installation
We recommend pipx for installing and running the CLI tool (the example below installs pipx with Homebrew). If you want to use the library instead, install it with pip.
brew install pipx
pipx install podcast-transcript-convert
If you've already installed the package and wish to upgrade:
pipx upgrade podcast-transcript-convert
Usage
Run the converter on your transcripts directory, passing the input directory first and the output directory second.
transcript2json transcripts/ converted/
You can then inspect the output JSON files in the converted/
directory.
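As a quick sanity check on the converted output, a short script along these lines could count the segments in each file. This is only a sketch: it assumes the output files use a `.json` extension and follow the PodcastIndex transcript layout (a top-level `segments` list), and `summarize_transcripts` is a hypothetical helper, not part of this package.

```python
import json
from pathlib import Path


def summarize_transcripts(converted_dir):
    """Return {filename: segment_count} for each .json file in converted_dir.

    Hypothetical helper for illustration; not part of podcast-transcript-convert.
    """
    summary = {}
    for path in sorted(Path(converted_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: PodcastIndex-style layout with a top-level "segments" list.
        segments = data.get("segments", []) if isinstance(data, dict) else []
        summary[path.name] = len(segments)
    return summary
```

Calling `summarize_transcripts("converted/")` returns a dictionary mapping each output file name to its segment count, which makes empty or unexpectedly short conversions easy to spot.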
Library Usage
from podcast_transcript_convert.convert import bulk_convert
bulk_convert("transcripts_dir/", "converted_dir/")
Individual file type converters live in the converters package; you can use them directly if you know the file type.
To determine the file type of a transcript file, use file_typing.identify_file_type(file).
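To give a sense of what a single-format converter has to do, here is a minimal standalone sketch that turns SRT text into a PodcastIndex-style dictionary. It does not use or reproduce this package's converters: `srt_to_segments` and its helpers are hypothetical names, and the output fields (`version`, `segments`, `startTime`, `endTime`, `body`) are assumed from the PodcastIndex transcript format rather than taken from this project's code.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_segments(srt_text):
    """Parse SRT text into a PodcastIndex-style dict (illustrative only)."""
    segments = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue  # skip the numeric cue index before the timing line
            groups = match.groups()
            # Everything after the timing line is the cue text.
            body = " ".join(lines[i + 1:])
            if body:
                segments.append(
                    {
                        "startTime": _seconds(*groups[:4]),
                        "endTime": _seconds(*groups[4:]),
                        "body": body,
                    }
                )
            break
    return {"version": "1.0.0", "segments": segments}
```

The sketch ignores speaker labels and styling tags, which a real converter would likely need to handle.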
Development
Pull requests are very welcome! For major changes, please open an issue first to discuss what you would like to change.
git clone git@github.com:hbmartin/podcast-transcript-convert.git
cd podcast-transcript-convert
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Replace with the actual path to your transcript files
python -m podcast_transcript_convert ~/Downloads/overcast-to-sqlite/archive/transcripts converted/
Code Formatting
This project is linted with ruff and uses Black code formatting.
Authors
- Harold Martin - harold.martin at gmail
- Icon courtesy of Vecteezy.com