Video Retrieval-Augmented Chunking and Q&A Generation Toolkit
🌐 VideoRAC: Retrieval-Adaptive Chunking for Lecture Video RAG
🏛️ Official CSICC 2025 Implementation
"Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset"
(Presented at the 30th International Computer Society of Iran Computer Conference — CSICC 2025)
📊 Project Pipeline
📖 Overview
VideoRAC (Video Retrieval-Adaptive Chunking) is a framework for multimodal retrieval-augmented generation (RAG) over educational videos. The toolkit combines visual-semantic chunking, entropy-based keyframe selection, and LLM-driven question generation, so that retrieval can draw on both the visual and the textual content of a lecture.
This repository is the official implementation of the CSICC 2025 paper by Hemmat et al.
Hemmat, A., Vadaei, K., Shirian, M., Heydari, M.H., Fatemi, A. “Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset.” Proceedings of the 30th International Computer Society of Iran Computer Conference (CSICC 2025), University of Isfahan.
🧩 Core Components
| Module | Class | Description |
|---|---|---|
| processing.chunking.py | HybridChunker | Detects slide transitions using CLIP embeddings and SSIM to segment videos into coherent chunks. |
| processing.entropy_utils.py | EntropyUtils | Computes frame entropy for selecting representative keyframes. |
| processing.qa_generation.py | VideoQAGenerator | Generates structured Q&A pairs using transcripts and visual frame descriptions. |
🧠 Research Background
This framework underpins the EduViQA bilingual dataset, designed for evaluating lecture-based RAG systems in both Persian and English. The dataset and code form a unified ecosystem for multimodal question generation and retrieval evaluation.
Key Contributions:
- 🎥 Adaptive Hybrid Chunking — Combines CLIP cosine similarity with SSIM-based visual comparison.
- 🧮 Entropy-Based Keyframe Selection — Extracts high-information frames for retrieval.
- 🗣️ Transcript–Frame Alignment — Synchronizes ASR transcripts with visual semantics.
- 🔍 Multimodal Retrieval — Integrates visual and textual embeddings for RAG.
- 🧠 Benchmark Dataset — 20 bilingual educational videos with 50 QA pairs each.
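To make the entropy-based keyframe idea concrete, here is a minimal sketch that scores each frame by the Shannon entropy of its grayscale intensity histogram and keeps the highest-scoring one. It is not the `EntropyUtils` implementation: the function names `frame_entropy` and `select_keyframe` are hypothetical, and frames are assumed to be plain 2D lists of 0-255 intensities so the example needs no image library.

```python
import math
from collections import Counter


def frame_entropy(frame):
    """Shannon entropy, in bits, of the pixel-intensity distribution.

    `frame` is assumed to be a 2D list of integer intensities (0-255).
    """
    pixels = [p for row in frame for p in row]
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_keyframe(frames):
    """Return the index of the highest-entropy frame (first one on ties)."""
    if not frames:
        raise ValueError("frames must not be empty")
    return max(range(len(frames)), key=lambda i: frame_entropy(frames[i]))


# Toy 2x2 "frames": a blank slide, a two-tone slide, a four-tone slide.
blank = [[0, 0], [0, 0]]
two_tone = [[0, 255], [0, 255]]
four_tone = [[0, 85], [170, 255]]

best = select_keyframe([blank, two_tone, four_tone])
print("keyframe index:", best)
```

A uniform frame carries no information under this measure, so blank or transition frames are naturally ranked last.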
⚙️ Installation
# Clone repository
git clone https://github.com/your-org/VideoRAC.git
cd VideoRAC
# Create environment & install
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
🚀 Usage Example
1️⃣ Hybrid Chunking
from VideoRAC.Modules import HybridChunker
chunker = HybridChunker(alpha=0.6, threshold_embedding=0.85)
chunks, timestamps, duration = chunker.chunk("lecture.mp4")
chunker.evaluate()
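The README does not spell out how `alpha` and `threshold_embedding` interact, so the following is only one plausible reading, written as a standalone sketch: a weighted sum of CLIP cosine similarity and SSIM between consecutive frames, with a chunk boundary wherever the combined score drops below a threshold. The function `find_boundaries`, the weighting formula, and the precomputed `ssim_scores` input are all assumptions for illustration; the actual `HybridChunker` logic may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_boundaries(embeddings, ssim_scores, alpha=0.6, threshold=0.85):
    """Return frame indices at which a new chunk is assumed to start.

    embeddings:  one embedding vector per sampled frame (e.g. from CLIP).
    ssim_scores: ssim_scores[i - 1] is the precomputed SSIM between
                 frame i - 1 and frame i.
    Assumed rule: score = alpha * cosine + (1 - alpha) * ssim, and a
    boundary is placed where score < threshold.
    """
    if len(ssim_scores) != max(len(embeddings) - 1, 0):
        raise ValueError("need exactly one SSIM score per consecutive frame pair")
    boundaries = []
    for i in range(1, len(embeddings)):
        cos = cosine_similarity(embeddings[i - 1], embeddings[i])
        score = alpha * cos + (1 - alpha) * ssim_scores[i - 1]
        if score < threshold:
            boundaries.append(i)
    return boundaries


# Four toy frames: the slide changes between frame 1 and frame 2.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_ssim = [0.95, 0.20, 0.97]
print(find_boundaries(toy_embeddings, toy_ssim))
```

Under this reading, a larger `alpha` leans on semantic similarity from the embeddings, while a smaller one leans on pixel-level structure from SSIM.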
2️⃣ Q&A Generation
from VideoRAC.Modules import VideoQAGenerator
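def my_llm_fn(messages):
    # Forward the chat messages to the OpenAI Chat Completions API
    # and return the text of the first reply.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content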
urls = ["https://www.youtube.com/watch?v=2uYu8nMR5O4"]
qa = VideoQAGenerator(video_urls=urls, llm_fn=my_llm_fn)
qa.process_videos()
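Because `llm_fn` is an ordinary callable that takes chat messages and returns a string, it can be wrapped before being handed to the generator. The sketch below adds retries with exponential backoff for transient API failures. The helper `with_retries` and its parameters are hypothetical and not part of VideoRAC; whether the generator already retries internally is not stated in this README.

```python
import time


def with_retries(llm_fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap an llm_fn so failed calls are retried with exponential backoff.

    The wrapped function keeps the same contract as the original:
    it takes a list of chat messages and returns the reply string.
    `sleep` is injectable so the wrapper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(messages):
        for attempt in range(attempts):
            try:
                return llm_fn(messages)
            except Exception:
                # Re-raise once the final attempt has failed.
                if attempt == attempts - 1:
                    raise
                sleep(base_delay * 2 ** attempt)

    return wrapped
```

With the names from the example above, usage would look like `VideoQAGenerator(video_urls=urls, llm_fn=with_retries(my_llm_fn))`.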
📈 Results Summary (CSICC 2025)
| Method | AR | CR | F | Notes |
|---|---|---|---|---|
| VideoRAC (CLIP+SSIM) | 0.87 | 0.82 | 0.91 | Best performance overall |
| CLIP-only | 0.80 | 0.75 | 0.83 | Weaker temporal segmentation |
| Simple Slicing | 0.72 | 0.67 | 0.76 | Time-based only |
Evaluated using RAGAS metrics: Answer Relevance (AR), Context Relevance (CR), and Faithfulness (F).
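To read the table as gains over each baseline, the short script below computes absolute and relative differences per metric. The scores are copied from the table above; the `gains` helper is only an illustration and not part of the toolkit.

```python
# Scores copied from the results table (AR, CR, F).
RESULTS = {
    "VideoRAC (CLIP+SSIM)": {"AR": 0.87, "CR": 0.82, "F": 0.91},
    "CLIP-only": {"AR": 0.80, "CR": 0.75, "F": 0.83},
    "Simple Slicing": {"AR": 0.72, "CR": 0.67, "F": 0.76},
}


def gains(results, method, baseline):
    """Per-metric (absolute gain, relative gain in percent) of method over baseline."""
    out = {}
    for metric, value in results[method].items():
        base = results[baseline][metric]
        diff = value - base
        out[metric] = (round(diff, 2), round(100 * diff / base, 1))
    return out


vs_slicing = gains(RESULTS, "VideoRAC (CLIP+SSIM)", "Simple Slicing")
vs_clip = gains(RESULTS, "VideoRAC (CLIP+SSIM)", "CLIP-only")

for name, table in (("Simple Slicing", vs_slicing), ("CLIP-only", vs_clip)):
    for metric, (absolute, relative) in table.items():
        print(f"vs {name:15s} {metric}: +{absolute:.2f} ({relative:.1f}%)")
```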
🧾 License
Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
You may share and adapt this work with attribution. Please cite our paper when using VideoRAC or EduViQA:
@inproceedings{hem2025videorac,
title={Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset},
author={Hemmat, Arshia and Vadaei, Kianoosh and Shirian, Melika and Heydari, Mohammad Hassan and Fatemi, Afsaneh},
booktitle={30th International Computer Society of Iran Computer Conference (CSICC 2025)},
year={2025},
organization={IEEE}
}
👥 Authors
University of Isfahan — Department of Computer Engineering
- Kianoosh Vadaei — k.vadaei@eng.ui.ac.ir
- Melika Shirian — m.shirian@eng.ui.ac.ir
- Arshia Hemmat — a.hemmat@eng.ui.ac.ir
- Mohammad Hassan Heydari — mh.heydari@eng.ui.ac.ir
- Afsaneh Fatemi — a.fatemi@eng.ui.ac.ir
⭐ Official CSICC 2025 Implementation — Give it a star if you use it in your research! ⭐ Made with ❤️ at University of Isfahan