
Build a knowledge base from YouTube video transcripts


ytq



Overview

ytq (short for YouTube Query) is a CLI tool that processes YouTube videos to create a searchable knowledge base. It:

  • Downloads and extracts transcripts from YouTube videos
  • Uses LLMs to generate structured summaries
  • Splits each transcript into chunks (subsections), each of which keeps its start and end timestamps, and embeds the chunks with OpenAI's text-embedding-3-small model. These embeddings are used when the --semantic search flag is enabled.
  • Stores everything in a searchable SQLite database
  • Provides a CLI for adding videos, searching, and viewing summaries
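Each chunk keeps its start and end timestamps. To do that, a chunker needs some way to map a span of transcript text back to video time.

The sketch below shows one way to do it. It assumes, hypothetically, that the transcript arrives as segments with `text`, `start` and `duration` fields and that these are joined with single spaces. The names `Segment`, `join_segments` and `span_to_times` are illustrative only and are not part of ytq.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical transcript segment shape (not ytq's actual data model).
    text: str
    start: float     # seconds from the start of the video
    duration: float  # seconds


def join_segments(segments):
    """Join segment texts with spaces; return the text and each segment's start offset."""
    offsets, parts, pos = [], [], 0
    for seg in segments:
        offsets.append(pos)
        parts.append(seg.text)
        pos += len(seg.text) + 1  # +1 for the joining space
    return " ".join(parts), offsets


def span_to_times(segments, offsets, char_start, char_end):
    """Map the character span [char_start, char_end) to (start_seconds, end_seconds)."""
    first = bisect_right(offsets, char_start) - 1   # segment containing the first char
    last = bisect_right(offsets, char_end - 1) - 1  # segment containing the last char
    end_seg = segments[last]
    return segments[first].start, end_seg.start + end_seg.duration


segments = [
    Segment("hello world", 0.0, 2.5),
    Segment("this is ytq", 2.5, 3.0),
    Segment("goodbye", 5.5, 1.5),
]
text, offsets = join_segments(segments)
print(span_to_times(segments, offsets, 0, 15))  # (0.0, 5.5)
```

A chunk's reported time range covers every segment it touches, even partially. A chunk that starts mid-segment therefore reports that segment's start time.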

Installation

Install this tool using pip:

pip install ytq

If you use uv, you can run the CLI directly in a temporary environment:

uvx ytq <command> <args>

Alternatively, install it as a tool:

uv tool install ytq
# and then
ytq <command> <args>

Usage

Adding a Video to the Knowledge Base

To add a YouTube video to your knowledge base, use the add command:

ytq add <video_url>

Optional parameters:

  • --chunk-size: Maximum size of each text chunk (default: 1000 characters)
  • --chunk-overlap: Overlap between chunks (default: 100 characters)
  • --provider: LLM summarization provider (default: "openai")
  • --model: LLM summarization model (default: "gpt-4o-mini")
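The two chunking options can be pictured as a sliding window. Each new chunk begins `chunk_size - chunk_overlap` characters after the previous one, so with the defaults a plain sliding window would start a chunk every 900 characters.

The following is a minimal character-level sketch of that arithmetic. `chunk_text` is a hypothetical helper, and ytq's real splitter may additionally snap chunk edges to word or sentence boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into (start, end, chunk) windows of at most chunk_size characters.

    Hypothetical helper: consecutive windows share chunk_overlap characters, so
    each window starts chunk_size - chunk_overlap characters after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):  # this window already reaches the end of the text
            break
    return chunks


text = "x" * 2500
print([(s, e) for s, e, _ in chunk_text(text)])
# [(0, 1000), (900, 1900), (1800, 2500)]
print([(s, e) for s, e, _ in chunk_text(text, chunk_size=1500)])
# [(0, 1500), (1400, 2500)]
```

A larger overlap repeats more context across chunk boundaries, at the cost of storing and embedding more text.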

Example:

ytq add https://youtube.com/watch?v=example --chunk-size 1500 --provider anthropic

If you add a video that is already in the database, the existing entry is removed and replaced with the new version.
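One way to reason about replacing a video is as a delete-then-insert inside a single transaction. In this sketch, a failure partway through rolls back instead of leaving a half-written video behind.

The code uses Python's built-in `sqlite3` module with a hypothetical two-table schema (`videos` and `chunks`). ytq's actual schema, table names and column names may differ.

```python
import sqlite3

# Hypothetical schema for illustration; ytq's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    video_id TEXT PRIMARY KEY,
    title    TEXT,
    summary  TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id         INTEGER PRIMARY KEY,
    video_id   TEXT NOT NULL,
    start_time REAL,
    end_time   REAL,
    text       TEXT
);
"""


def replace_video(conn, video_id, title, summary, chunks):
    """Delete any stored copy of video_id, then insert the new data in one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        # Delete the old chunks too, so no orphaned chunk rows remain.
        conn.execute("DELETE FROM chunks WHERE video_id = ?", (video_id,))
        conn.execute("DELETE FROM videos WHERE video_id = ?", (video_id,))
        conn.execute(
            "INSERT INTO videos (video_id, title, summary) VALUES (?, ?, ?)",
            (video_id, title, summary),
        )
        conn.executemany(
            "INSERT INTO chunks (video_id, start_time, end_time, text) VALUES (?, ?, ?, ?)",
            [(video_id, start, end, text) for start, end, text in chunks],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
replace_video(conn, "dQw4w9WgXcQ", "First run", "old summary", [(0.0, 5.5, "a"), (5.5, 9.0, "b")])
replace_video(conn, "dQw4w9WgXcQ", "Second run", "new summary", [(0.0, 4.0, "c")])
print(conn.execute("SELECT title FROM videos").fetchall())  # [('Second run',)]
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])  # 1
```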

Searching the Knowledge Base

Search your knowledge base using the query command:

ytq query <search_term>

Search options:

  • --chunks: Enable chunk-level search
  • --semantic: Enable semantic search (applies only when --chunks is also set)
  • --limit: Maximum number of results (default: 3)
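At its simplest, chunk-level keyword search filters the stored chunks by the search term and caps the number of rows returned at --limit.

As an illustration, the sketch below uses a plain SQL `LIKE` match over a hypothetical `chunks` table. In SQLite, `LIKE` is case-insensitive for ASCII letters. ytq may instead use SQLite full-text search or a different ranking; `keyword_search` is not part of its API.

```python
import sqlite3


def keyword_search(conn, term, limit=3):
    """Return up to `limit` chunk rows whose text contains `term`.

    Hypothetical schema: chunks(video_id, start_time, end_time, text).
    SQLite's LIKE ignores ASCII case, so "neural networks" matches "Neural Networks".
    """
    # Escape LIKE wildcards so that '%' or '_' in the term match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT video_id, start_time, end_time, text FROM chunks "
        "WHERE text LIKE ? ESCAPE '\\' "
        "ORDER BY video_id, start_time LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()


# Usage with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (video_id TEXT, start_time REAL, end_time REAL, text TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?)",
    [
        ("vid1", 0.0, 30.0, "Intro to neural networks and backprop"),
        ("vid1", 30.0, 60.0, "Decision trees are another model"),
        ("vid2", 0.0, 20.0, "Recurrent Neural Networks handle sequences"),
        ("vid2", 20.0, 40.0, "More on neural networks"),
    ],
)
for row in keyword_search(conn, "neural networks"):
    print(row[0], row[1], row[3])
```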

Examples:

# Video-level search (default)
ytq query "machine learning"

# Chunk-level keyword search
ytq query "neural networks" --chunks

# Semantic chunk-level search
ytq query "types of algorithms" --chunks --semantic
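Semantic search compares embeddings rather than words. The query is typically embedded as well, and chunks are ranked by how close their vectors are to the query vector.

The sketch below ranks by cosine similarity, a common choice for this, and returns the top `limit` results. Whether ytq uses cosine similarity, a dot product or a vector index is an assumption here. The 2-D toy vectors stand in for real text-embedding-3-small embeddings, and `semantic_search` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, chunk_vecs, limit=3):
    """Return the `limit` (chunk_id, score) pairs most similar to query_vec, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 2-D vectors stand in for real embeddings of the query and the chunks.
query = [1.0, 0.0]
chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
for chunk_id, score in semantic_search(query, chunks, limit=2):
    print(chunk_id, round(score, 3))
# a 1.0
# c 0.707
```

Unlike the keyword search, this matches "types of algorithms" against chunks that never contain those exact words, as long as their embeddings point in a similar direction.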

Viewing Video Summary

To view a summary of a specific video:

ytq summary <video_id>

Example:

ytq summary dQw4w9WgXcQ
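`ytq add` takes a full URL, while `ytq summary` and `ytq delete` take a video ID. For a standard watch URL, the ID is the value of its `v` query parameter (for example, `dQw4w9WgXcQ`).

The helper below is a hypothetical convenience for pulling that ID out of common URL forms using only the standard library. It is not part of ytq, and its list of recognized hosts and path prefixes is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the YouTube video ID in url, or None if the URL is not recognized.

    Hypothetical helper (not part of ytq) covering a few common URL shapes.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        # Short links put the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the v query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


print(video_id_from_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(video_id_from_url("https://youtu.be/dQw4w9WgXcQ?t=42"))  # dQw4w9WgXcQ
```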

Deleting a Video

To remove a video from the knowledge base:

ytq delete <video_id>

Version Information

To check the version of ytq:

ytq --version

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd ytq
python -m venv venv
source venv/bin/activate

Now install the package in editable mode along with its test dependencies:

pip install -e '.[test]'

To run the tests:

python -m pytest


This version

0.1

Download files


Source Distribution

ytq-0.1.tar.gz (31.1 kB)

Uploaded Source

Built Distribution


ytq-0.1-py3-none-any.whl (22.5 kB)

Uploaded Python 3

File details

Details for the file ytq-0.1.tar.gz.

File metadata

  • Download URL: ytq-0.1.tar.gz
  • Upload date:
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ytq-0.1.tar.gz
Algorithm Hash digest
SHA256 b4436feaa91e6c0902e8d1e5cf6445a50d03ec859e28c4dde08d66eafcc5c404
MD5 7043d758e864207a7e4aa360e537ea87
BLAKE2b-256 0ce4cb55bbfb5b86372251a6fd20f53753de4c01ff082101f11ce6b58440f032


Provenance

The following attestation bundles were made for ytq-0.1.tar.gz:

Publisher: publish.yml on LVG77/ytq


File details

Details for the file ytq-0.1-py3-none-any.whl.

File metadata

  • Download URL: ytq-0.1-py3-none-any.whl
  • Upload date:
  • Size: 22.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ytq-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 00c5cb17ce17e41c2fb82ade7b9dc7d80d8ed7c8d0f50171ab05272a79834ce7
MD5 487693265664fba0329c5fc20b496714
BLAKE2b-256 09ce01d0a5b68eea64cf4382c9f6d511caf6fe4f31aebc65d65e33fd0392b745


Provenance

The following attestation bundles were made for ytq-0.1-py3-none-any.whl:

Publisher: publish.yml on LVG77/ytq

