Project description

YouTube Transcript Loader

This loader fetches the text transcripts of YouTube videos using the youtube_transcript_api Python package.

Usage

To use this loader, first install its dependency with pip install youtube_transcript_api.

Then pass a list of YouTube links to load_data:

from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

loader = YoutubeTranscriptReader()
documents = loader.load_data(
    ytlinks=["https://www.youtube.com/watch?v=i3OYlaoj-BM"]
)

Supported URL formats:

+ youtube.com/watch?v={video_id} (with or without 'www.')
+ youtube.com/embed?v={video_id} (with or without 'www.')
+ youtu.be/{video_id} (never includes the 'www.' subdomain)
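As an illustration of the formats listed above, here is a minimal sketch that pulls the video ID out of a URL with regular expressions. It is not the library's own implementation: the function name extract_video_id and the patterns are hypothetical. The patterns also accept the path form youtube.com/embed/{video_id}, which is an assumption added here because YouTube embed links commonly take that shape.

```python
import re
from typing import Optional

# Hypothetical patterns mirroring the supported formats listed above.
# They are an illustration only, not the loader's actual matching logic.
_VIDEO_ID = r"([\w-]+)"
_PATTERNS = [
    # youtube.com/watch?v={video_id}, with or without 'www.'
    re.compile(r"^https?://(?:www\.)?youtube\.com/watch\?v=" + _VIDEO_ID),
    # youtube.com/embed?v={video_id}; the /embed/{video_id} form is an added assumption
    re.compile(r"^https?://(?:www\.)?youtube\.com/embed(?:\?v=|/)" + _VIDEO_ID),
    # youtu.be/{video_id}, never with 'www.'
    re.compile(r"^https?://youtu\.be/" + _VIDEO_ID),
]


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL matches a listed format, else None."""
    for pattern in _PATTERNS:
        match = pattern.match(url)
        if match:
            return match.group(1)
    return None


print(extract_video_id("https://www.youtube.com/watch?v=i3OYlaoj-BM"))  # i3OYlaoj-BM
print(extract_video_id("https://vimeo.com/272134160"))  # None
```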

To programmatically check if a URL is supported:

from llama_index.readers.youtube_transcript.utils import is_youtube_video

is_youtube_video("https://youtube.com/watch?v=j83jrh2")  # => True
is_youtube_video("https://vimeo.com/272134160")  # => False
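A check like this is useful for filtering a mixed list of links before loading. The sketch below shows one way to do that. The helper partition_links and the stand-in predicate looks_like_youtube are hypothetical names introduced for illustration. The stand-in is only there so the example runs without the package installed.

```python
from typing import Callable, Iterable, List, Tuple


def partition_links(
    links: Iterable[str], is_supported: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split links into (supported, skipped) using the given predicate."""
    supported: List[str] = []
    skipped: List[str] = []
    for link in links:
        (supported if is_supported(link) else skipped).append(link)
    return supported, skipped


# Stand-in predicate for demonstration only (hypothetical, deliberately crude).
def looks_like_youtube(url: str) -> bool:
    return "youtube.com/" in url or "youtu.be/" in url


supported, skipped = partition_links(
    [
        "https://youtube.com/watch?v=j83jrh2",
        "https://vimeo.com/272134160",
    ],
    looks_like_youtube,
)
print(supported)  # ['https://youtube.com/watch?v=j83jrh2']
print(skipped)    # ['https://vimeo.com/272134160']
```

In practice you would pass is_youtube_video as the predicate and hand the supported list to load_data via the ytlinks argument.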

This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.

Project details

Release history

Version	Release date
0.1.4 (this version)	Feb 21, 2024
0.1.3	Feb 16, 2024
0.1.2	Feb 13, 2024
0.1.1	Feb 12, 2024
0.1.0	Feb 10, 2024
0.0.1	Feb 4, 2024

Download files

Source Distribution

llama_index_readers_youtube_transcript-0.1.4.tar.gz (3.0 kB)

Uploaded Feb 21, 2024 Source

Built Distribution

llama_index_readers_youtube_transcript-0.1.4-py3-none-any.whl (3.7 kB)

Uploaded Feb 21, 2024 Python 3

Hashes for llama_index_readers_youtube_transcript-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`1ccdf6ab06146432e5ad7187c61ce273d73d15e6999814252d9947907189503d`
MD5	`c5d9c4c463f205da156c14f0e10af74c`
BLAKE2b-256	`1f719af3f99c1cf32328bc9a4327f4795fc200a23e4b08723864333fec2b3408`

Hashes for llama_index_readers_youtube_transcript-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eb3909095dee3566248f42ece972415417ed648aed272662e3969aed85c3ec5e`
MD5	`4f2b5ad8a10162cf9a98101c9c63109d`
BLAKE2b-256	`8afd6139f84e21c68f79a3898617abd875b537169ca99692fbd758bdf37e5b9d`