
Fetch YouTube video transcripts and save them to markdown files.


YouTube Video to Text Markdown Converter

yt-video-text-md is a Python package that retrieves YouTube video transcripts (subtitles) and saves them as Markdown files. It works on individual videos and on entire playlists. It uses youtube-transcript-api to fetch existing subtitles and falls back to whisper for audio-to-text conversion when no transcript is available.

Features

  • Playlist and Video Support: Extracts subtitles from both individual videos and entire playlists.
  • Fallback Mechanism: Uses whisper to transcribe the audio when subtitles are not available.
  • Markdown Formatting: Outputs transcripts in Markdown format with video titles as headers.
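
To make the Markdown Formatting feature concrete, here is a minimal sketch of how a transcript could be rendered with the video title as a header. The function name transcript_to_markdown, the level-1 heading, and the choice to join the lines into one paragraph are assumptions for illustration, not the package's actual output layout.

```python
def transcript_to_markdown(title: str, lines: list[str]) -> str:
    """Render a transcript as Markdown with the video title as a header.

    Hypothetical helper: the heading level and paragraph layout are
    assumptions, not taken from yt-video-text-md itself.
    """
    heading = f"# {title.strip()}"
    # Drop empty caption lines and join the rest into a single paragraph.
    body = " ".join(line.strip() for line in lines if line.strip())
    return f"{heading}\n\n{body}\n"
```

Calling transcript_to_markdown("My Video", ["first caption", "second caption"]) returns a string that can be written straight to a .md file.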

Installation

Via pip

To install the latest version directly from the GitHub repository, use:

pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git

Building from Source

  1. Clone the repository:

    git clone https://github.com/kothiyarajesh/yt-video-text-md.git
    
  2. Navigate to the project directory:

    cd yt-video-text-md
    
  3. Install the package:

    python setup.py install
    
  4. Install the dependencies manually:

    pip install -r requirements.txt
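
After installing from source, a quick check like the one below can confirm that the two libraries named in the description are importable. This is a sketch under assumptions: the import names youtube_transcript_api and whisper are the usual module names for youtube-transcript-api and openai-whisper, and requirements.txt may list other packages that are not checked here.

```python
from importlib.util import find_spec

# Assumed import names for the libraries named in the project description.
# requirements.txt may list additional packages that are not covered here.
REQUIRED_MODULES = ("youtube_transcript_api", "whisper")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from missing_modules() suggests both libraries are installed; any names it returns still need to be installed with pip.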
    

Usage

Python Script

Here's a simple example of how to use the yt-video-text-md library in a Python script:

from yt_video_text_md import YTVideoTextMD

# Define the URL of the YouTube video or playlist you want to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"

# Specify the directory where the output Markdown file will be saved
output_directory = "."

# Set the default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"

# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)
temporary_audio_directory = "/tmp"

# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory
)
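
The example above hard-codes "/tmp", which does not exist on every platform. A possible variation, sketched below, builds the same keyword arguments with the system temporary directory and creates the output directory up front. The helper names build_converter_kwargs and convert are hypothetical. The keyword names are copied from the example above, and whether the package creates missing directories itself is not known, so the helper does it as a precaution.

```python
import tempfile
from pathlib import Path


def build_converter_kwargs(url, output_dir=".", file_prefix="yt_video_2_text_md_"):
    """Build keyword arguments for YTVideoTextMD (hypothetical helper)."""
    out = Path(output_dir)
    # Create the output directory in case the package does not (assumption).
    out.mkdir(parents=True, exist_ok=True)
    return {
        "url": url,
        "output_dir": str(out),
        "default_md_file_name": file_prefix,
        # Portable replacement for the hard-coded "/tmp".
        "audio_output_dir": tempfile.gettempdir(),
    }


def convert(url, **overrides):
    """Run the converter; the package is imported only when this is called."""
    from yt_video_text_md import YTVideoTextMD

    return YTVideoTextMD(**build_converter_kwargs(url, **overrides))
```

With the package installed, convert("https://www.youtube.com/watch?v=pzo13OPXZS4", output_dir="transcripts") would then behave like the example above, writing into a transcripts directory.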

Command-Line Interface

You can also use the package from the command line:

yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"

Options:

  • -u or --url: URL of the YouTube video or playlist.
  • -d or --output-dir: Directory where the output Markdown file will be saved.
  • -f or --file-name: Name for the generated Markdown file.
  • -ad or --audio-dir: Directory where temporary audio files will be stored (used only if a transcript is not available).
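
To drive the command-line tool from another Python script, the options listed above can be assembled into an argument list and passed to subprocess. This is a minimal sketch: build_command and run_cli are hypothetical helper names, the defaults simply mirror the example command, and the yt-video-text-md executable is assumed to be on PATH.

```python
import subprocess


def build_command(url, output_dir=".", file_name="playlist_video_", audio_dir="/tmp"):
    """Assemble the CLI call using only the flags documented above."""
    return [
        "yt-video-text-md",
        "-u", url,
        "-d", output_dir,
        "-f", file_name,
        "-ad", audio_dir,
    ]


def run_cli(url, **options):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(url, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.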

Notes

  • Dependencies: This package relies on external libraries, including youtube-transcript-api and whisper. Install everything listed in requirements.txt before running it.
  • Audio Extraction: If a video has no available transcript, the script downloads the video, extracts the audio, and converts it to text. This requires a stable internet connection and may be resource-intensive, especially for long videos.
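
The transcript-first, audio-second flow described in these notes can be sketched as a small function that takes the two steps as callables. This is an illustration of the control flow only, not the package's implementation: get_text, fetch_transcript, and transcribe_audio are hypothetical names, and the real calls into youtube-transcript-api and whisper are deliberately left to the caller.

```python
from typing import Callable, Optional


def get_text(
    video_id: str,
    fetch_transcript: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), preferring an existing transcript.

    Both callables are placeholders supplied by the caller (assumption).
    """
    try:
        text = fetch_transcript(video_id)
    except Exception:
        # Broad on purpose for the sketch; real code should catch only the
        # transcript library's "not available" errors.
        text = None
    if text:
        return text, "transcript"
    # Slow path: download, extract audio, and transcribe.
    return transcribe_audio(video_id), "whisper"
```

Keeping the two steps injectable makes the expensive audio path easy to stub out in tests and makes it obvious which videos needed transcription.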

License

This project is licensed under the MIT License. See the LICENSE file for details.


