
A YouTube video search and management tool with a Gradio interface


Offline YouTube Video Search Application

This application allows users to extract transcripts from YouTube videos, upload their own video/audio files, create searchable vector databases, and perform semantic searches using a Gradio web interface or command-line interface (CLI). It's powered by faster-whisper for transcription, FAISS for vector search, and sentence-transformers for text embeddings.


Features

  • Extract transcripts from individual videos, playlists, and entire channels.
  • Upload your own video or audio files for processing.
  • Automatically detect playlists, channels, and individual video links.
  • Automatically download video thumbnails.
  • Store transcripts and create a searchable vector database.
  • Perform semantic searches on video content.
  • Use either the Gradio web interface or the CLI.
  • Easily add more videos or your own files to the dataset.

Web Interface

Add Videos Tab

  • Enter playlist, channel, and/or video URLs (comma-separated).
  • Upload your own video/audio files.
  • Option to process entire channels when a channel URL is provided.
  • Option to keep videos stored locally or not.

Search Tab

  • Enter your search query to find relevant snippets.
  • View top relevant videos with thumbnails and play local videos if available.
  • View detailed results with timestamps and direct links.
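
As an illustration of what a "timestamp and direct link" result could look like, here is a minimal sketch. The helper names `timestamped_link` and `format_timestamp` are hypothetical and not taken from the project; the only fact relied on is that YouTube watch URLs accept a `t` parameter in seconds.

```python
from urllib.parse import urlencode


def timestamped_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given offset."""
    # YouTube accepts a `t` query parameter in whole seconds, e.g. t=95s.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def format_timestamp(start_seconds: float) -> str:
    """Render an offset in seconds as H:MM:SS for display next to a snippet."""
    total = int(start_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    print(format_timestamp(3725), timestamped_link("VIDEO_ID", 3725))
```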

Installation

Ensure you have Python 3.8 or later installed. Then install the package, which pulls in the required dependencies:

pip install offlineyoutube

Usage

The app provides two ways to interact:

  1. Gradio Web Interface
  2. Command-Line Interface (CLI)

1. Running the Gradio Web Interface

Launch the web interface:

offlineyoutube ui

or simply:

offlineyoutube

Then, open the URL (usually http://127.0.0.1:7860) in your browser.

Gradio Interface Tabs:

  • Add Videos:

    • Enter playlist URLs, channel URLs, and/or individual video URLs (comma-separated).
    • Upload your own video or audio files for processing.
    • Option to process entire YouTube channels when a channel URL is provided.
    • Option to keep videos stored locally or not.
    • The app automatically detects whether each link is a playlist, a channel, or a video.
    • Videos and uploaded files will be transcribed, and the database will be updated with the content.
  • Search:

    • Enter search queries to find relevant snippets from the video transcripts.
    • Results are ranked based on semantic similarity and include video thumbnails.
    • If local videos are available, you can play them directly in the interface.
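
The automatic link detection mentioned above could be approximated as follows. This is a sketch under stated assumptions, not the project's actual logic: `classify_url` and `classify_input` are hypothetical names, and the URL patterns are the common public YouTube forms.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube in this sketch (an assumption).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def classify_url(url: str) -> str:
    """Return 'playlist', 'channel', 'video', or 'unknown' for a URL."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    query = parse_qs(parsed.query)

    # Short links such as https://youtu.be/VIDEO_ID always point at a video.
    if host == "youtu.be" and parsed.path.strip("/"):
        return "video"
    if host not in YOUTUBE_HOSTS:
        return "unknown"
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path == "/watch" and "v" in query:
        return "video"
    if parsed.path.startswith(("/channel/", "/c/", "/user/", "/@")):
        return "channel"
    return "unknown"


def classify_input(raw: str) -> List[Tuple[str, str]]:
    """Split a comma-separated input string and classify each URL."""
    urls = [part.strip() for part in raw.split(",") if part.strip()]
    return [(url, classify_url(url)) for url in urls]
```

Checking the path before the query string keeps a `watch?v=...&list=...` link classified as a single video rather than a playlist, which is one possible design choice; the app may resolve that case differently.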

2. Command-Line Interface (CLI)

The CLI provides more flexibility for programmatic use.

Commands Overview

Use the --help flag to view available commands and examples:

offlineyoutube --help

Output:

usage: offlineyoutube [-h] {add,search,ui} ...

YouTube Video Search Application

positional arguments:
  {add,search,ui}   Available commands
    add             Add videos to the database
    search          Search the video database
    ui              Run the Gradio web interface

optional arguments:
  -h, --help        Show this help message and exit

Examples:
  # Add videos from a playlist and keep videos locally
  offlineyoutube add --input "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID" --keep_videos

  # Add specific videos without keeping videos locally
  offlineyoutube add --input "https://www.youtube.com/watch?v=VIDEO_ID1,https://www.youtube.com/watch?v=VIDEO_ID2"

  # Add videos from a channel (process entire channel)
  offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel

  # Search the database with a query
  offlineyoutube search --query "Your search query" --top_k 5

  # Run the Gradio web interface
  offlineyoutube ui
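
For readers curious how a command layout like this can be expressed, here is a minimal argparse sketch. It is an assumption about structure, not the project's source: the option names come from the examples above, while treating `--keep_videos` and `--process_channel` as boolean switches and defaulting `--top_k` to 5 are guesses made for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the add/search/ui subcommands shown in the help."""
    parser = argparse.ArgumentParser(
        prog="offlineyoutube",
        description="YouTube Video Search Application",
    )
    subparsers = parser.add_subparsers(dest="command", help="Available commands")

    add = subparsers.add_parser("add", help="Add videos to the database")
    add.add_argument("--input", required=True, help="Comma-separated URLs")
    # Assumed to be switches: present means True, absent means False.
    add.add_argument("--keep_videos", action="store_true")
    add.add_argument("--process_channel", action="store_true")

    search = subparsers.add_parser("search", help="Search the video database")
    search.add_argument("--query", required=True)
    search.add_argument("--top_k", type=int, default=5)  # default is a guess

    subparsers.add_parser("ui", help="Run the Gradio web interface")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments, falling back to the 'ui' command when none is given."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        args.command = "ui"
    return args
```

The fallback in `parse_cli` mirrors the documented behavior that running `offlineyoutube` with no arguments launches the web interface.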

Examples of CLI Usage

1. Adding Videos

  • Add Playlists and Videos:

    offlineyoutube add --input "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID,https://www.youtube.com/watch?v=VIDEO_ID"
    
  • Add Specific Videos Without Keeping Them Locally:

    offlineyoutube add --input "https://www.youtube.com/watch?v=dQw4w9WgXcQ,https://www.youtube.com/watch?v=9bZkp7q19f0"
    
  • Add Videos from a Channel (Process Entire Channel):

    offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel
    
  • Add Videos and Keep Videos Stored Locally:

    offlineyoutube add --input "https://www.youtube.com/watch?v=VIDEO_ID" --keep_videos
    

2. Searching the Database

  • Perform a Search:

    offlineyoutube search --query "machine learning tutorials" --top_k 5
    

How It Works

  1. Adding Videos and Uploaded Files:

    • The app accepts a list of links and automatically detects whether each link is a playlist, a channel, or an individual video.
    • You can upload your own video or audio files for processing.
    • It downloads each video's audio track (or uses the uploaded file) and transcribes it using faster-whisper.
    • Thumbnails are downloaded and saved locally.
    • The transcript data is saved in datasets/transcript_dataset.csv.
    • A vector database is updated using FAISS with embeddings generated by sentence-transformers.
  2. Incremental Updating:

    • Videos and uploaded files are processed one by one, and the dataset and vector database are updated incrementally.
    • This avoids rebuilding the whole dataset and index each time new content is added, which matters most for large datasets.
  3. Searching the Database:

    • When a query is entered, the app computes its embedding and searches the FAISS index for relevant video snippets.
    • The top results are displayed with thumbnails, titles, and links to the videos.
    • If local videos are available, you can play them directly in the interface.
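
The incremental-update and search steps above can be illustrated with a toy in-memory index. This sketch deliberately swaps in a brute-force cosine-similarity search over hand-written vectors in place of FAISS and sentence-transformers, so it runs with the standard library only; `TinyIndex` and the example vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class TinyIndex:
    """Brute-force stand-in for a vector index that supports incremental adds."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.snippets: List[str] = []

    def add(self, snippet: str, vector: Sequence[float]) -> None:
        # New items are appended; nothing already stored is recomputed.
        self.vectors.append(normalize(vector))
        self.snippets.append(snippet)

    def search(self, query: Sequence[float], top_k: int = 5) -> List[Tuple[float, str]]:
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), s)
            for v, s in zip(self.vectors, self.snippets)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


index = TinyIndex()
index.add("intro to neural networks", [0.9, 0.1, 0.0])
index.add("baking sourdough bread", [0.0, 0.2, 0.9])
index.add("gradient descent explained", [0.8, 0.3, 0.1])  # added later, no rebuild
results = index.search([1.0, 0.2, 0.0], top_k=2)
```

In the real application the vectors would come from a sentence-transformers model and the lookup from a FAISS index; the ranking idea (embed the query, score it against stored vectors, keep the top k) is the same.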

FAQ

1. How do I add multiple playlists, channels, and videos at once?

Simply provide a comma-separated list of URLs, and the app will automatically detect and process each link:

offlineyoutube add --input "https://www.youtube.com/playlist?list=PLAYLIST_ID1,https://www.youtube.com/watch?v=VIDEO_ID,https://www.youtube.com/channel/CHANNEL_ID"

If you want to process entire channels, make sure to include the --process_channel flag:

offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel

2. How can I upload my own video or audio files for processing?

In the Gradio web interface, navigate to the Add Videos tab. Use the "Upload your own video/audio files" option to upload one or multiple files. The app will process these files and add them to the database.

3. Why aren’t new videos or uploaded files showing up in search results?

Ensure that the videos or files have been fully processed and that the vector database has been updated. The app handles this automatically, but processing may take time for large videos, playlists, or channels.

4. How do I prevent videos from being stored locally?

In the CLI examples above, --keep_videos acts as a switch: pass it to keep downloaded videos, and omit it to avoid storing them locally:

offlineyoutube add --input "VIDEO_OR_PLAYLIST_URL"

In the Gradio interface, uncheck the "Keep videos stored locally" option in the Add Videos tab.

5. Can I process entire YouTube channels?

Yes! Use the --process_channel flag when adding videos via the CLI:

offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel

In the Gradio interface, check the "Process entire channel when a channel URL is provided" option in the Add Videos tab.

6. Can I search the database without launching the Gradio interface?

Yes! Use the search command via the CLI:

offlineyoutube search --query "Your query" --top_k 5
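
If you want to call the search command from another Python script, one option is to shell out to the CLI. The sketch below only uses the `search`, `--query`, and `--top_k` arguments documented above; the function names are hypothetical, and the format of the command's output is not specified here, so it is returned as raw text.

```python
import shutil
import subprocess
from typing import List, Optional


def build_search_command(query: str, top_k: int = 5) -> List[str]:
    """Assemble the argv list for an `offlineyoutube search` call."""
    return ["offlineyoutube", "search", "--query", query, "--top_k", str(top_k)]


def run_search(query: str, top_k: int = 5) -> Optional[str]:
    """Run the search and return its stdout, or None if the CLI is not installed."""
    if shutil.which("offlineyoutube") is None:
        return None
    completed = subprocess.run(
        build_search_command(query, top_k),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.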

Project Structure

.
├── app.py                       # Main application script (Gradio + CLI)
├── functions.py                 # Helper functions for transcription, FAISS, etc.
├── datasets/
│   ├── transcript_dataset.csv   # CSV file storing transcripts
│   └── vector_index.faiss       # FAISS vector index
├── thumbnails/                  # Folder for storing video thumbnails
├── videos/                      # Folder for storing downloaded videos (if keep_videos is True)
├── tmp/                         # Temporary folder for videos (if keep_videos is False)
└── uploaded_files/              # Folder for storing uploaded files
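
A small sketch of how this layout could be created and used follows. The directory names are taken from the tree above; `ensure_layout` and `video_dir` are hypothetical helpers, and the app itself may create these folders differently.

```python
from pathlib import Path
from typing import List

# Directory names taken from the project structure above.
LAYOUT = ["datasets", "thumbnails", "videos", "tmp", "uploaded_files"]


def ensure_layout(root: Path) -> List[Path]:
    """Create the working directories under root if they are missing."""
    created = []
    for name in LAYOUT:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def video_dir(root: Path, keep_videos: bool) -> Path:
    """Pick videos/ when videos are kept and tmp/ otherwise, per the tree comments."""
    return root / ("videos" if keep_videos else "tmp")
```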

Known Limitations

  • Processing Time: Transcribing videos and generating embeddings can be time-consuming, especially for long videos, large playlists, or channels.
  • Storage Requirements: Keeping videos stored locally requires additional disk space. Omit the --keep_videos flag if storage is a concern.
  • Large Datasets: As the dataset grows, querying may take longer. Consider optimizing the FAISS index for very large datasets.

Contributing

Feel free to fork the repository, open issues, or submit pull requests if you'd like to contribute to this project.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • faster-whisper for fast transcription.
  • FAISS for efficient vector search.
  • Gradio for the interactive web interface.
  • yt-dlp for downloading video content.

