Retrieve relevant transcript chunks from YouTube videos

videorag

A simple Python library to extract and retrieve the most relevant transcript chunks from a YouTube video for a user query using vector search and semantic retrieval. It helps developers quickly index video transcripts and answer questions about the video content.


🧠 What It Does

videorag fetches the transcript of a YouTube video, indexes it in a vector store, and uses MMR (Maximal Marginal Relevance) retrieval to find the transcript chunks most relevant to a user query. This makes it easy to build tools like video Q&A bots, summarizers, and RAG systems.


🚀 Features

  • Automatically download YouTube transcripts (English first, then fallback languages)
  • Clean and split transcripts into searchable chunks
  • Build a FAISS vector index for fast retrieval
  • Retrieve top-k relevant chunks for a query
  • Minimal and easy-to-use API
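
To illustrate the "split transcripts into searchable chunks" feature, here is a minimal sketch of word-based chunking with overlap. The function name `split_into_chunks` and its `chunk_size` and `overlap` parameters are hypothetical and are not part of the videorag API; the library's actual splitting strategy is not documented here and may differ.

```python
def split_into_chunks(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    (Illustrative helper only; not videorag's implementation.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()  # also collapses newlines and repeated spaces
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(12))
    for chunk in split_into_chunks(sample, chunk_size=5, overlap=2):
        print(chunk)
```

Overlap trades a little index size for better recall near chunk boundaries; setting `overlap=0` gives disjoint chunks.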

🧾 Installation

Install directly via pip in your Python environment:

pip install videorag

Or install from the local repository (after cloning):

pip install -e .

Make sure you are using a virtual environment (venv) when installing dependencies.


📦 Usage Example

Here is a simple usage example in Python:

from videorag import get_relevant_chunks_from_video

video_link = "https://www.youtube.com/watch?v=HbZD0XoN5fc"
query = "Who is Sunita Williams?"
k = 3

relevant_text = get_relevant_chunks_from_video(video_link, query, k)
print(relevant_text)

This returns the top-k relevant transcript text chunks that best match your query.
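
The example above passes a full watch URL. Transcript lookups are generally keyed by the video ID rather than the URL, so a caller may want to validate links before handing them to the library. The sketch below shows one way to do that with the standard library; `extract_video_id` is a hypothetical helper, not a videorag function, and how videorag parses links internally is an assumption left open here.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube watch or youtu.be URL, else None.

    Illustrative helper only; not part of the videorag API.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None

    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=HbZD0XoN5fc"))
```

A `None` result can be used to reject a link early, before any network request is made.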


🛠️ How It Works

  1. Fetch the YouTube video transcript
  2. Clean and split it into chunks
  3. Create or load a FAISS vector index
  4. Run MMR (Maximal Marginal Relevance) to get diverse, relevant chunks
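
Step 4 can be sketched in plain Python. MMR picks chunks one at a time, scoring each candidate by its similarity to the query minus its similarity to chunks already selected, so near-duplicate chunks are penalized. The code below is an illustration under stated assumptions, not videorag's implementation: the function names, the `lambda_mult` parameter, and the 2-D toy vectors are all hypothetical, and the library's embedding model and FAISS-backed retriever are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values put more weight on diversity among the selected documents.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []

    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [
        [0.9, 0.1],    # most relevant
        [0.9, 0.12],   # near-duplicate of the first
        [0.7, -0.7],   # less relevant, but different
    ]
    print(mmr(query, docs, k=2, lambda_mult=0.5))
    print(mmr(query, docs, k=2, lambda_mult=1.0))
```

With the toy vectors, the balanced setting skips the near-duplicate in favor of the more distinct chunk, while `lambda_mult=1.0` falls back to plain top-k by relevance.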

🧪 Contributing

Contributions are welcome! You can:

  • Report bugs
  • Suggest new features
  • Improve documentation
  • Add tests

Feel free to open issues or submit pull requests.

Legal Notice

This software is provided under the MIT License. Users are responsible for complying with YouTube's Terms of Service when using this library. The author assumes no liability for how this library is used.

Download files

Source Distribution

videorag-0.1.0.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

videorag-0.1.0-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file videorag-0.1.0.tar.gz.

File metadata

  • Download URL: videorag-0.1.0.tar.gz
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for videorag-0.1.0.tar.gz
Algorithm Hash digest
SHA256 873e59e1b85e30bd7b648558c4c4e590adcfa7451674f0c931faeb117b97ae75
MD5 161a8c7df5b46dc93346f35f6a9d5846
BLAKE2b-256 406c74dd4dbe48c979cb57b1186c45882edde3e073ee8d95bdf35a4fbecc5631

File details

Details for the file videorag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: videorag-0.1.0-py3-none-any.whl
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for videorag-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ea2b29d8484bf78805d0047f3e5a27809e2f5d45e2ec99011e960d7882d1f13a
MD5 c66617c990c136284c34f1cb2f55b27e
BLAKE2b-256 54249c6563eccdd0bac20ceef5d4c623d4498c9444d1b53cd0bef113e7ad0b9f
