Build knowledge base from YouTube video transcripts
ytq
Overview
ytq (short for YouTube Query) is a CLI tool that processes YouTube videos to create a searchable knowledge base. It:
- Downloads and extracts transcripts from YouTube videos
- Uses LLMs to generate structured summaries
- Splits each transcript into multiple chunks (subsections), each preserving its start and end timestamps
- Embeds the chunks using the OpenAI 'text-embedding-3-small' model; the embeddings are used when the --semantic search flag is enabled
- Stores everything in a searchable SQLite database
- Provides a CLI for adding videos, searching, and viewing summaries
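To illustrate how chunks can keep their start and end timestamps, here is a minimal sketch. It is not ytq's actual implementation: the Segment and Chunk types and the chunk_segments function are hypothetical names, and the sketch assumes the transcript arrives as a list of timed text segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed transcript line (hypothetical input shape)."""
    text: str
    start: float  # seconds from the beginning of the video
    end: float


@dataclass
class Chunk:
    """A group of consecutive segments with the time span it covers."""
    text: str
    start: float
    end: float


def _merge(segments):
    # The chunk spans from the first segment's start to the last one's end.
    return Chunk(
        text=" ".join(s.text for s in segments),
        start=segments[0].start,
        end=segments[-1].end,
    )


def chunk_segments(segments, chunk_size=1000):
    """Group consecutive segments into chunks of at most chunk_size characters.

    A single segment longer than chunk_size becomes its own oversized chunk.
    Overlap between chunks is left out to keep the sketch short.
    """
    chunks = []
    current = []
    length = 0
    for seg in segments:
        # Count one joining space when the chunk already has content.
        added = len(seg.text) + (1 if current else 0)
        if current and length + added > chunk_size:
            chunks.append(_merge(current))
            current, length = [], 0
            added = len(seg.text)
        current.append(seg)
        length += added
    if current:
        chunks.append(_merge(current))
    return chunks
```

Because each chunk records the time span it covers, a search hit on a chunk can point back to the matching moment in the video.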
Installation
Install this tool using pip:
pip install ytq
If you are using uv, you can run the CLI directly in a temporary environment like so:
uvx ytq <command> <args>
or you can also install it as a tool:
uv tool install ytq
# and then
ytq <command> <args>
Usage
Adding a Video to the Knowledge Base
To add a YouTube video to your knowledge base, use the add command:
ytq add <video_url>
Optional parameters:
- --chunk-size: Maximum size of each text chunk (default: 1000 characters)
- --chunk-overlap: Overlap between chunks (default: 100 characters)
- --provider: LLM summarization provider (default: "openai")
- --model: LLM summarization model (default: "gpt-4o-mini")
Example:
ytq add https://youtube.com/watch?v=example --chunk-size 1500 --provider anthropic
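The interaction between --chunk-size and --chunk-overlap can be pictured as a sliding window over the text. The sketch below is an assumption about how such parameters typically behave, not a copy of ytq's code; split_with_overlap is a hypothetical helper name.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A larger overlap repeats more context across chunk boundaries, which increases the number of chunks to embed and store.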
If you add a video that is already in the database, the existing entry is removed and replaced with the new version.
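The replace-on-re-add behaviour can be sketched with Python's built-in sqlite3 module. The table layout (videos and chunks) and the function names below are hypothetical and chosen only for illustration; the real ytq schema may differ.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical two-table layout: one row per video, many chunks per video.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS videos (
            video_id TEXT PRIMARY KEY,
            title    TEXT
        );
        CREATE TABLE IF NOT EXISTS chunks (
            id       INTEGER PRIMARY KEY,
            video_id TEXT,
            text     TEXT
        );
        """
    )


def replace_video(conn, video_id, title, chunks):
    """Remove any stored copy of the video, then insert the new version."""
    # "with conn" wraps the statements in one transaction, so a failure
    # partway through rolls back and leaves the old version in place.
    with conn:
        conn.execute("DELETE FROM chunks WHERE video_id = ?", (video_id,))
        conn.execute("DELETE FROM videos WHERE video_id = ?", (video_id,))
        conn.execute(
            "INSERT INTO videos (video_id, title) VALUES (?, ?)",
            (video_id, title),
        )
        conn.executemany(
            "INSERT INTO chunks (video_id, text) VALUES (?, ?)",
            [(video_id, text) for text in chunks],
        )
```

Deleting the old chunks before inserting the new ones avoids stale chunks lingering when a video is re-added with a different chunk size.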
Searching the Knowledge Base
Search your knowledge base using the query command:
ytq query <search_term>
Search options:
- --chunks: Enable chunk-level search
- --semantic: Enable semantic search (when chunk search is enabled)
- --limit: Maximum number of results (default: 3)
Examples:
# Video-level search (default)
ytq query "machine learning"
# Chunk-level keyword search
ytq query "neural networks" --chunks
# Semantic chunk-level search
ytq query "types of algorithms" --chunks --semantic
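Semantic search over embedded chunks usually ranks chunks by how close their vectors are to the query vector. The sketch below assumes cosine similarity as the ranking measure, which this README does not confirm; the function names and the plain-list vector format are hypothetical, and the embedding call itself is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_chunks(query_vec, chunk_vecs, limit=3):
    """Return the ids of the `limit` chunks most similar to the query.

    chunk_vecs maps a chunk id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:limit]]
```

Unlike keyword search, this kind of ranking can surface a chunk that never contains the query words, as long as its embedding lies close to the query's.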
Viewing Video Summary
To view a summary of a specific video:
ytq summary <video_id>
Example:
ytq summary dQw4w9WgXcQ
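The add command takes a URL, while summary and delete take a video ID. If you only have the URL at hand, the ID can be recovered with the standard library. This is a hedged sketch covering two common URL shapes; extract_video_id is a hypothetical helper, not part of ytq.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if it is not recognised."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short links carry the ID as the first path component: youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    # Watch links carry the ID in the "v" query parameter.
    if host in ("youtube.com", "m.youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

For example, passing https://youtube.com/watch?v=dQw4w9WgXcQ yields the ID used in the summary example above.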
Deleting a Video
To remove a video from the knowledge base:
ytq delete <video_id>
Version Information
To check the version of ytq:
ytq --version
Development
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd ytq
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
python -m pytest
File details
Details for the file ytq-0.1.tar.gz.
File metadata
- Download URL: ytq-0.1.tar.gz
- Upload date:
- Size: 31.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4436feaa91e6c0902e8d1e5cf6445a50d03ec859e28c4dde08d66eafcc5c404 |
| MD5 | 7043d758e864207a7e4aa360e537ea87 |
| BLAKE2b-256 | 0ce4cb55bbfb5b86372251a6fd20f53753de4c01ff082101f11ce6b58440f032 |
Provenance
The following attestation bundles were made for ytq-0.1.tar.gz:
Publisher: publish.yml on LVG77/ytq
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ytq-0.1.tar.gz
- Subject digest: b4436feaa91e6c0902e8d1e5cf6445a50d03ec859e28c4dde08d66eafcc5c404
- Sigstore transparency entry: 178308510
- Sigstore integration time:
- Permalink: LVG77/ytq@86ba0eab56caa0f1fec061bf11b12d3fdc8d28f2
- Branch / Tag: refs/tags/0.1
- Owner: https://github.com/LVG77
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@86ba0eab56caa0f1fec061bf11b12d3fdc8d28f2
- Trigger Event: release
File details
Details for the file ytq-0.1-py3-none-any.whl.
File metadata
- Download URL: ytq-0.1-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00c5cb17ce17e41c2fb82ade7b9dc7d80d8ed7c8d0f50171ab05272a79834ce7 |
| MD5 | 487693265664fba0329c5fc20b496714 |
| BLAKE2b-256 | 09ce01d0a5b68eea64cf4382c9f6d511caf6fe4f31aebc65d65e33fd0392b745 |
Provenance
The following attestation bundles were made for ytq-0.1-py3-none-any.whl:
Publisher: publish.yml on LVG77/ytq
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ytq-0.1-py3-none-any.whl
- Subject digest: 00c5cb17ce17e41c2fb82ade7b9dc7d80d8ed7c8d0f50171ab05272a79834ce7
- Sigstore transparency entry: 178308513
- Sigstore integration time:
- Permalink: LVG77/ytq@86ba0eab56caa0f1fec061bf11b12d3fdc8d28f2
- Branch / Tag: refs/tags/0.1
- Owner: https://github.com/LVG77
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@86ba0eab56caa0f1fec061bf11b12d3fdc8d28f2
- Trigger Event: release