Retrieve relevant transcript chunks from YouTube videos
videorag
A simple Python library to extract and retrieve the most relevant transcript chunks from a YouTube video for a user query using vector search and semantic retrieval. It helps developers quickly index video transcripts and answer questions about the video content.
🧠 What It Does
videorag fetches the transcript of a YouTube video, indexes it in a vector store, and uses MMR (Maximal Marginal Relevance) retrieval to find the transcript passages most relevant to a user query.
This makes it easy to build tools like video Q&A bots, summarizers, and RAG systems.
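Transcript-fetching libraries generally work with a video ID rather than a full URL. As a rough illustration of that first step, the sketch below pulls the ID out of a few common YouTube URL forms using only the standard library. The helper name `extract_video_id` is hypothetical, and whether videorag parses URLs this way is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an illustrative, non-exhaustive set).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard watch URLs carry the ID in the "v" query parameter.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=HbZD0XoN5fc"))
```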
🚀 Features
- Automatically download YouTube transcripts (English first, then fallback languages)
- Clean and split transcripts into searchable chunks
- Build a FAISS vector index for fast retrieval
- Retrieve top-k relevant chunks for a query
- Minimal and easy-to-use API
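The cleaning and splitting feature listed above can be pictured with a small standard-library sketch. The function names, the rule for dropping bracketed caption cues, and the default sizes are illustrative assumptions, not videorag's actual implementation.

```python
import re


def clean_text(text: str) -> str:
    """Drop bracketed caption cues such as [Music] and collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # The current window already reaches the end of the text.
            break
    return chunks


transcript = clean_text("  [Music] hello   world \n [Applause]")
print(transcript)
print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbours, at the cost of some duplicated text in the index.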
🧾 Installation
Install directly via pip in your Python environment:
pip install videorag
Or install from the local repository (after cloning):
pip install -e .
Make sure you are using a virtual environment (venv) when installing dependencies.
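To confirm that the interpreter is running inside a virtual environment and that the package is visible to it, a short standard-library check such as the following may help. The helper names are hypothetical; `importlib.metadata` and the `sys.prefix` comparison are standard Python.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("virtual environment:", in_virtualenv())
print("videorag version:", installed_version("videorag"))
```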
📦 Usage Example
Here is a simple usage example in Python:
from videorag import get_relevant_chunks_from_video

# URL of the YouTube video to search
video_link = "https://www.youtube.com/watch?v=HbZD0XoN5fc"
# Question to answer from the transcript
query = "Who is Sunita Williams?"
# Number of chunks to return
k = 3

relevant_text = get_relevant_chunks_from_video(video_link, query, k)
print(relevant_text)
This returns the k transcript chunks that best match your query.
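For a Q&A bot or RAG pipeline, the retrieved text is usually placed into a prompt for a language model. The sketch below shows one way to do that. The return type of `get_relevant_chunks_from_video` is not documented here, so the hypothetical helper `build_prompt` accepts either a single string or an iterable of strings; the prompt wording is likewise only an example.

```python
from typing import Iterable, Union


def build_prompt(chunks: Union[str, Iterable[str]], question: str) -> str:
    """Combine retrieved transcript chunks and a question into one prompt."""
    if isinstance(chunks, str):
        # Treat a single string as one chunk.
        chunks = [chunks]
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the transcript excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_prompt(["First excerpt. ", "Second excerpt."], "Who is Sunita Williams?"))
```

Numbering the excerpts makes it possible to ask the model to cite which chunk supports its answer.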
🛠️ How It Works
- Fetch the YouTube video transcript
- Clean and split it into chunks
- Create or load a FAISS vector index
- Run MMR (Maximal Marginal Relevance) to get diverse, relevant chunks
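The last step above, MMR, picks chunks one at a time, each time trading relevance to the query against similarity to chunks already chosen. The following is a minimal pure-Python sketch of that idea on toy vectors. The function names, the `lambda_mult` parameter, and the use of cosine similarity are assumptions for illustration; videorag's own retrieval may be configured differently.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            # Penalize similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=0.5))
print(mmr(query, docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favour of the more distinct vector, while `lambda_mult=1.0` reduces to plain relevance ranking.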
🧪 Contributing
Contributions are welcome! You can:
- Report bugs
- Suggest new features
- Improve documentation
- Add tests
Feel free to open issues or submit pull requests.
Legal Notice
This software is provided under the MIT License. Users are responsible for complying with YouTube's Terms of Service when using this library. The author assumes no liability for how this library is used.
File details
Details for the file videorag-0.1.0.tar.gz.
File metadata
- Download URL: videorag-0.1.0.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 873e59e1b85e30bd7b648558c4c4e590adcfa7451674f0c931faeb117b97ae75 |
| MD5 | 161a8c7df5b46dc93346f35f6a9d5846 |
| BLAKE2b-256 | 406c74dd4dbe48c979cb57b1186c45882edde3e073ee8d95bdf35a4fbecc5631 |
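A downloaded archive can be checked against the published SHA256 digest before installing it. The sketch below uses only `hashlib`; the helper names and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_of(path, block_size=1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved as `videorag-0.1.0.tar.gz` in the current directory, `matches_published_hash("videorag-0.1.0.tar.gz", "873e59e1b85e30bd7b648558c4c4e590adcfa7451674f0c931faeb117b97ae75")` should return `True` for an intact download.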
File details
Details for the file videorag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: videorag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea2b29d8484bf78805d0047f3e5a27809e2f5d45e2ec99011e960d7882d1f13a |
| MD5 | c66617c990c136284c34f1cb2f55b27e |
| BLAKE2b-256 | 54249c6563eccdd0bac20ceef5d4c623d4498c9444d1b53cd0bef113e7ad0b9f |