Fetch YouTube video transcripts and save them to markdown files.
Project description
YouTube Video to Text Markdown Converter
yt-video-text-md is a Python package designed to retrieve and convert YouTube video transcripts/subtitles into Markdown files. This tool is particularly useful for extracting text from entire playlists or individual videos. It leverages the youtube-transcript-api for direct subtitle extraction and whisper for audio-to-text conversion when transcripts are unavailable.
Features
- Playlist and Video Support: Extracts subtitles from both individual videos and entire playlists.
- Fallback Mechanism: Utilizes whisper to transcribe audio if subtitles are not available.
- Markdown Formatting: Outputs transcripts in Markdown format with video titles as headers.
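The fallback behaviour described in the features above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names `transcript_with_fallback` and `to_markdown` are hypothetical, the two fetchers are passed in as plain callables so the sketch makes no claims about the youtube-transcript-api or whisper call signatures, and the use of a level-one `#` header for the title is an assumption.

```python
from typing import Callable, Optional, Tuple


def transcript_with_fallback(
    video_id: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, source), where source is "subtitles" or "whisper".

    Both callables are hypothetical stand-ins: the first would wrap a
    subtitle lookup, the second an audio download plus transcription.
    """
    try:
        text = fetch_subtitles(video_id)
    except Exception:
        # Simplification: any failure of the subtitle lookup triggers the fallback.
        text = None
    if text:
        return text, "subtitles"
    # No usable subtitles, so fall back to audio transcription.
    return transcribe_audio(video_id), "whisper"


def to_markdown(title: str, text: str) -> str:
    """Render one transcript with the video title as a header (assumed to be '#')."""
    return f"# {title}\n\n{text.strip()}\n"
```

Passing the fetchers in as arguments keeps the decision logic separate from network access, so it can be exercised with simple stubs.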
Installation
Via pip
To install the latest version directly from the GitHub repository, use:
pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git
Building from Source
- Clone the repository:
  git clone https://github.com/kothiyarajesh/yt-video-text-md.git
- Navigate to the project directory:
  cd yt-video-text-md
- Install the package:
  python setup.py install
- If installing from source, make sure to install the dependencies manually:
  pip install -r requirements.txt
Usage
Python Script
Here's a simple example of how to use the yt-video-text-md library in a Python script:
from yt_video_text_md import YTVideoTextMD
# Define the URL of the YouTube video or playlist you want to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"
# Specify the directory where the output Markdown file will be saved
output_directory = "."
# Set the default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"
# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)
temporary_audio_directory = "/tmp"
# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory,
)
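Since the `url` parameter accepts either a single video or a playlist, it can help to see how the two URL shapes used in this README differ. The sketch below is illustrative only: `classify_youtube_url` is a hypothetical helper, not part of the package, and it handles just the `/watch?v=...` and `/playlist?list=...` forms shown here.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url: str) -> Tuple[str, str]:
    """Return ("video", video_id) or ("playlist", playlist_id).

    Hypothetical helper covering only the two URL shapes in this README.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.path == "/playlist" and "list" in query:
        return "playlist", query["list"][0]
    if parsed.path == "/watch" and "v" in query:
        return "video", query["v"][0]
    raise ValueError(f"Unrecognised YouTube URL: {url}")
```

Other forms, such as shortened links, would need extra cases; they are left out to keep the sketch small.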
Command-Line Interface
You can also use the package from the command line:
yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"
Options:
- -u or --url: URL of the YouTube video or playlist.
- -d or --output-dir: Directory where the output Markdown file will be saved.
- -f or --file-name: Name for the generated Markdown file.
- -ad or --audio-dir: Directory where temporary audio files will be stored (used only if a transcript is not available).
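For readers who want to wrap the library in their own script, the options above map naturally onto an `argparse` parser. The following is a sketch under stated assumptions, not the package's real command-line code: `build_parser` is a hypothetical name, and the defaults and the choice to make `--url` required are guesses based on the examples in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (defaults are assumptions)."""
    parser = argparse.ArgumentParser(
        prog="yt-video-text-md",
        description="Fetch YouTube transcripts and save them as Markdown.",
    )
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the YouTube video or playlist.")
    parser.add_argument("-d", "--output-dir", default=".",
                        help="Directory where the Markdown file is saved.")
    parser.add_argument("-f", "--file-name", default="yt_video_2_text_md_",
                        help="Name for the generated Markdown file.")
    parser.add_argument("-ad", "--audio-dir", default="/tmp",
                        help="Directory for temporary audio files.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.output_dir, args.file_name, args.audio_dir)
```

The parsed values could then be passed to `YTVideoTextMD` using the keyword arguments shown in the Python Script section.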
Notes
- Dependencies: This package relies on external libraries, including youtube-transcript-api and whisper. Make sure all dependencies are installed before use.
- Audio Extraction: If a video does not have an available transcript, the script will download the video, extract the audio, and convert it to text. This process requires a stable internet connection and may be resource-intensive, especially for long videos.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file yt_video_text_md-0.1.0.tar.gz.
File metadata
- Download URL: yt_video_text_md-0.1.0.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c40d46c74c8e793bd41f6e5c712d6624c825a27ef5ebf395af030fd0b8e587f
MD5 | c1fbbb5c1f119109f45635a518c027f9
BLAKE2b-256 | 00b886ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969
File details
Details for the file yt_video_text_md-0.1.0-py3-none-any.whl.
File metadata
- Download URL: yt_video_text_md-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8540f50634659f2e73db191725a1ed7df6a5934645c7c215e2fb2caeaca7433a
MD5 | fa2b1af9c491629951baf18aeece1587
BLAKE2b-256 | 6280d124b3e75d37ef8c91ff1f2ead5996b792ca08bc45f722aaf2270cf3ebd4
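The SHA256 digests listed for each file can be checked against a manually downloaded copy. The sketch below uses only the standard library; the helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("yt_video_text_md-0.1.0.tar.gz", "5c40d46c74c8e793bd41f6e5c712d6624c825a27ef5ebf395af030fd0b8e587f")` should return True for an unmodified copy of the source distribution saved in the current directory.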