Convert YouTube educational videos to crisp PDF notes

Glimpsify (ytvideo2pdf)

Glimpsify extracts slide-like frames from educational videos and builds a PDF of the key visuals (diagrams, formulas, charts). It is optimized for lecture-style videos where text appears on screen over time.

Try now (without setup)

Try it out here: https://colab.research.google.com/drive/1xz6uHeY0QAzMTR8DbXJY8BSvNmKhI24Q?usp=sharing

Quick start

  1. Install an OCR engine (required for text detection)

    • Windows: install Tesseract OCR and make sure tesseract is on PATH.
    • macOS: brew install tesseract
    • Debian/Ubuntu: sudo apt-get install tesseract-ocr

  2. Install the package

pip install ytvideo2pdf

  3. Run the CLI

ytvideo2pdf --input=youtube --url="https://youtu.be/Z_MLrbI1s2E"
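
Before the first run, it can help to confirm that the OCR engine from step 1 is actually reachable. The snippet below is a minimal sketch using only the Python standard library; the helper name tesseract_status is made up for illustration and is not part of ytvideo2pdf.

```python
import shutil


def tesseract_status() -> str:
    """Report whether a `tesseract` executable is reachable via PATH."""
    # shutil.which performs the same lookup the shell does for a bare command name.
    path = shutil.which("tesseract")
    if path is None:
        return "tesseract not found on PATH; install it before running ytvideo2pdf"
    return f"tesseract found at {path}"


print(tesseract_status())
```

If the message reports that tesseract was not found, revisit the platform-specific install step and open a new terminal so that PATH changes take effect.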

Common usage

Extract from a local folder (expects a single video file in the directory):

ytvideo2pdf --input=local --dir="C:\path\to\video_dir"

Run with a specific extraction strategy:

ytvideo2pdf --input=youtube --url="https://youtu.be/Z_MLrbI1s2E" --extraction=prominent_peaks

Extract a fixed number of frames:

ytvideo2pdf --input=youtube --url="https://youtu.be/Z_MLrbI1s2E" --k=10

Extract frames at explicit timestamps (seconds):

ytvideo2pdf --input=youtube --url="https://youtu.be/Z_MLrbI1s2E" --extraction=timestamps --timestamps="30, 95.5, 120"
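
Video players usually show positions as MM:SS or HH:MM:SS, while --timestamps expects plain seconds. The following sketch converts such labels into the comma-separated form used in the command above. Both helper names (to_seconds, timestamps_arg) are hypothetical and live outside the package.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (fractions allowed) to seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60x the one to its right.
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def timestamps_arg(stamps) -> str:
    """Build the value for --timestamps from player-style time labels."""
    formatted = []
    for stamp in stamps:
        # Keep at most millisecond precision and drop trailing zeros.
        text = f"{to_seconds(stamp):.3f}".rstrip("0").rstrip(".")
        formatted.append(text)
    return ", ".join(formatted)


print(timestamps_arg(["0:30", "1:35.5", "2:00"]))  # 30, 95.5, 120
```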

What you get

  • A PDF file in output/ with the extracted frames.
  • A JSON metadata file alongside the PDF (same name, .json).
  • Intermediate folders for extracted frames and cached objects, kept when cleanup is disabled (--no-cleanup).
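
Since each PDF is paired with a JSON file of the same name, the results of a run can be collected with a few lines of Python. This is a sketch that assumes the default output/ directory relative to the working directory; the function name find_outputs is illustrative, and the JSON schema is not documented here, so the example only locates the files.

```python
from pathlib import Path


def find_outputs(output_dir="output"):
    """Return (pdf_path, json_path_or_None) pairs found in output_dir."""
    pairs = []
    for pdf in sorted(Path(output_dir).glob("*.pdf")):
        # The metadata file shares the PDF's name, with a .json suffix.
        meta = pdf.with_suffix(".json")
        pairs.append((pdf, meta if meta.exists() else None))
    return pairs


for pdf, meta in find_outputs():
    print(pdf.name, "->", meta.name if meta else "no metadata file")
```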

Key features

  • Multiple extraction strategies to pick the most informative frames.
  • OCR-based signal processing (Tesseract by default).
  • Optional caching of processed frames for reuse.
  • Optional plots of the OCR signal (for debugging and tuning).
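
To give a feel for what picking frames from an OCR signal can look like, here is a toy sketch. It assumes the signal is a list with one number per sampled frame (for example, the count of recognized characters) and keeps local maxima whose prominence exceeds a threshold. This illustrates the general idea only; it is not the package's implementation, and every name in it is hypothetical.

```python
def prominence(signal, i):
    """Height of the peak at i above the higher of its two surrounding valleys."""
    peak = signal[i]
    left_min = right_min = peak
    j = i - 1
    # Walk left until a strictly higher sample (or the start) is reached.
    while j >= 0 and signal[j] <= peak:
        left_min = min(left_min, signal[j])
        j -= 1
    j = i + 1
    # Walk right until a strictly higher sample (or the end) is reached.
    while j < len(signal) and signal[j] <= peak:
        right_min = min(right_min, signal[j])
        j += 1
    return peak - max(left_min, right_min)


def prominent_peak_indices(signal, min_prominence):
    """Indices of interior local maxima with at least min_prominence."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and prominence(signal, i) >= min_prominence:
            peaks.append(i)
    return peaks


# Text builds up on a slide, then drops when the slide changes.
chars_per_frame = [0, 10, 40, 42, 5, 8, 60, 61, 3, 4, 2]
print(prominent_peak_indices(chars_per_frame, min_prominence=20))  # [3, 7]
```

Frames 3 and 7 are the moments where each slide carries the most text just before it is replaced, which is the kind of frame worth keeping in the PDF.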

CLI options (summary)

  • --input: youtube | local | pickle
  • --url: YouTube video or playlist URL (for youtube input)
  • --dir: local directory path (for local or pickle input)
  • --ocr: tesseract | easy_ocr | paddleocr
  • --ocr_approval: phash | pixel_comparison | approve_all | reject_all
  • --extraction: prominent_peaks | k_transactions | key_moments | timestamps | rate_change_threshold
  • --k: number of frames to extract, or auto
  • --timestamps: comma-separated seconds (for timestamps extraction)
  • --threshold: integer threshold for rate_change_threshold
  • --cache-frames/--no-cache-frames
  • --skip-plot/--no-skip-plot
  • --cleanup/--no-cleanup

For Python API usage, see LIBRARY.md.
